
Re: app guid uniqueness

Nicholas Calugar
 

Hi John,

An application's guid will never change. If you delete an app, and push the
code again, you are creating a new app with another guid.
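
One way to check this yourself is to record the guid before and after a delete and re-push. The sketch below is only an illustration: it shells out to `cf app APP_NAME --guid`, and the helper names (`app_guid`, `is_same_app`) and the injectable `run` parameter are assumptions made for this example, not part of the CLI.

```python
import subprocess


def app_guid(app_name, run=subprocess.run):
    """Return the guid printed by `cf app APP_NAME --guid`.

    `run` is injectable (hypothetical design choice) so the lookup can be
    exercised without a real Cloud Foundry target.
    """
    result = run(
        ["cf", "app", app_name, "--guid"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def is_same_app(guid_before, guid_after):
    """True when both lookups refer to the same app record."""
    return guid_before == guid_after
```

Calling `app_guid` before and after a restart or restage should give the same value, while calling it around a `cf delete` followed by `cf push` should give two different values.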


Thanks,

Nick

On Mon, Apr 18, 2016 at 9:53 AM, John Wong <gokoproject(a)gmail.com> wrote:

Based on my brief testing and observation, the guid of an app sticks
around for as long as the app remains running (whether we restart or
restage). But after removing the app and running cf push again, a new guid
is generated.

Is this a true statement?

Thanks.

John


app guid uniqueness

John Wong
 

Based on my brief testing and observation, the guid of an app sticks around
for as long as the app remains running (whether we restart or restage). But
after removing the app and running cf push again, a new guid is generated.

Is this a true statement?

Thanks.

John


Re: CF Job Failure

Daniel Mikusa
 

On Mon, Apr 18, 2016 at 7:10 AM, Gupta, Abhik <abhik.gupta(a)sap.com> wrote:

Hi,

We are trying to push a node.js application using the Cloud Controller
REST APIs. The flow that we follow is similar to the flow followed by CF
CLI:



Create Application Metadata > Create Route Metadata > Associate Route with
Application > Get cached resources from Cloud Foundry using the Resource
Match API > Upload the bits (sending the fingerprints + application.zip)
asynchronously > Poll for the Job Status



This flow works perfectly fine till the last step but the polling for the
job status gives back an error response like:



{
  "metadata": {
    "guid": "cd5bf18d-249b-4f00-9ee9-6328081d3d77",
    "created_at": "2016-04-18T10:55:29Z",
    "url": "/v2/jobs/cd5bf18d-249b-4f00-9ee9-6328081d3d77"
  },
  "entity": {
    "guid": "cd5bf18d-249b-4f00-9ee9-6328081d3d77",
    "status": "failed",
    "error": "Use of entity>error is deprecated in favor of entity>error_details.",
    "error_details": {
      "error_code": "UnknownError",
      "description": "An unknown error occurred.",
      "code": 10001
    }
  }
}

Does the app actually push and get started? i.e. if you run `cf apps`
after you get this message is the app up and running? Also, do you see
similar issues when you push with `cf push`?




Apparently, this error is also pretty well-known because it’s documented
in the API documentation as well here:
http://apidocs.cloudfoundry.org/228/jobs/retrieve_job_with_unknown_failure.html

What could be the reason for this error from the Controller?

Take a look at the cloud controller logs,
`/var/vcap/sys/log/cloud_controller_ng`. There should be more information
about the problem there.
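
For the polling step itself, a client can at least separate "still running" from "failed" and surface `error_details` instead of the deprecated `error` field. This is a minimal sketch based only on the response shown in this thread; the function name `job_outcome` is hypothetical, and treating `finished` as the success status is an assumption about the v2 jobs endpoint.

```python
import json


def job_outcome(job):
    """Classify a /v2/jobs/:guid response as done, failed, or pending.

    Returns a (state, message) tuple. Only the fields visible in the
    response quoted in this thread are read.
    """
    entity = job.get("entity", {})
    status = entity.get("status")
    if status == "finished":  # assumed success status
        return ("done", None)
    if status == "failed":
        details = entity.get("error_details") or {}
        message = "{} ({}): {}".format(
            details.get("error_code"),
            details.get("code"),
            details.get("description"),
        )
        return ("failed", message)
    # Anything else is treated as "keep polling".
    return ("pending", None)


SAMPLE = json.loads("""
{
  "metadata": {"guid": "cd5bf18d-249b-4f00-9ee9-6328081d3d77"},
  "entity": {
    "guid": "cd5bf18d-249b-4f00-9ee9-6328081d3d77",
    "status": "failed",
    "error_details": {
      "error_code": "UnknownError",
      "description": "An unknown error occurred.",
      "code": 10001
    }
  }
}
""")
```

With the response from this thread, `job_outcome(SAMPLE)` reports a failure with code 10001, which is the cue to go and read the Cloud Controller logs rather than keep polling.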

Dan


Re: Maven: Resolve Dependencies on Platform?

Daniel Mikusa
 

On Sat, Apr 16, 2016 at 11:30 PM, Josh Long <starbuxman(a)gmail.com> wrote:

I'm not sure if this is the right forum. I doubt it.

* you could achieve what you want by forking the buildpack used. If you're
using the Java buildpack then it's
https://github.com/cloudfoundry/java-buildpack. the `cf push` command
supports providing an override URL for the buildpack.

As an experiment, I created a buildpack that would do this. It hasn't
been updated in a while, I don't plan to update it and it was never very
solid to begin with. It was more to just see if I could make it work. It
did, but the benefit was very small. I wouldn't recommend using it, but
it's there if you want to look at it.

https://github.com/dmikusa-pivotal/cf-maven-buildpack


* that said, this is a TERRIBLE idea. Instead, prefer that one build be
promoted from development to staging, QA, and production. Ideally, that
promotion should be automatic, the result of a continuous delivery pipeline
that sees code committed to version control, then run through continuous
integration, then pushed to a testing environment where it's certified and
smoke-tested, validated by QA, and ultimately promoted to production. You
can support this process with continuous integration tools like Jenkins,
Travis, Spinnaker, or Concourse.CI, which will monitor version control and
can be scripted to package and cf push code.

+1 - It's also worth mentioning that when you `cf push` something, your
platform will cache any resources that are larger than 65k (default
threshold, your platform's actual value may differ). The cache is global
so it's not just per app or per user. Once any user pushes a file, it will
be cached. This helps a ton with Java apps since JAR files are generally
over the threshold and the same JAR files are used across many users &
apps. Long story short, when you go to push your app you likely won't need
to upload as much data as you think.
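
To get a feel for how much of an app might already be cached, you can list the files above the threshold together with the fingerprints a client would send for matching. The sketch below is an illustration under assumptions: the 65536-byte constant stands in for the "65k" default mentioned above, the SHA-1 plus size fingerprint mirrors what the resource match step is understood to use, and the function names are hypothetical.

```python
import hashlib
import os

# Assumption: "65k" is taken as 65536 bytes; your platform's value may differ.
CACHE_THRESHOLD_BYTES = 65536


def fingerprint(path):
    """Return the SHA-1 digest and size of one file."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha1.update(chunk)
    return {"sha1": sha1.hexdigest(), "size": os.path.getsize(path)}


def cache_candidates(root, threshold=CACHE_THRESHOLD_BYTES):
    """List files under `root` that are larger than the caching threshold."""
    candidates = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > threshold:
                entry = fingerprint(path)
                entry["fn"] = os.path.relpath(path, root)
                candidates.append(entry)
    return candidates
```

Running `cache_candidates` on an exploded Java app typically returns mostly JAR files, which are the ones most likely to be shared with other apps on the platform.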

Hope that helps!

Dan



On Sat, Apr 16, 2016 at 7:15 PM Matthew Tyson <matthewcarltyson(a)gmail.com>
wrote:

Please let me know if there is a more appropriate forum for this type of
question.


How can i configure HA Doppler at cf.yml?

inho cho
 

I read "Overview of the Loggregator System" - https://docs.cloudfoundry.org/loggregator/architecture.html

According to that document, metron_agent can forward metrics or logs to N Doppler instances.

But I don't know how to set that up.

Could you let me know how to configure it in cf.yml?

Thanks & Regards


CF Job Failure

Gupta, Abhik
 

Hi,
We are trying to push a node.js application using the Cloud Controller REST APIs. The flow that we follow is similar to the flow followed by CF CLI:

Create Application Metadata > Create Route Metadata > Associate Route with Application > Get cached resources from Cloud Foundry using the Resource Match API > Upload the bits (sending the fingerprints + application.zip) asynchronously > Poll for the Job Status

This flow works perfectly fine till the last step but the polling for the job status gives back an error response like:

{
"metadata": {
"guid": "cd5bf18d-249b-4f00-9ee9-6328081d3d77",
"created_at": "2016-04-18T10:55:29Z",
"url": "/v2/jobs/cd5bf18d-249b-4f00-9ee9-6328081d3d77"
},
"entity": {
"guid": "cd5bf18d-249b-4f00-9ee9-6328081d3d77",
"status": "failed",
"error": "Use of entity>error is deprecated in favor of entity>error_details.",
"error_details": {
"error_code": "UnknownError",
"description": "An unknown error occurred.",
"code": 10001
}
}
}

Apparently, this error is also pretty well-known because it's documented in the API documentation as well here: http://apidocs.cloudfoundry.org/228/jobs/retrieve_job_with_unknown_failure.html
What could be the reason for this error from the Controller?

Thanks & Regards
Abhik


Re: How can we customized "404 Not Found"

Stefan Mayr
 

Hi Amit,

On 17.04.2016 at 21:10, Amit Gupta wrote:
Hi Stefan, Mike,

For real applications, would you want a common 503 page for all
applications on the platform, or would your different applications have
different custom domains, with custom 503 pages for each domain?

For my current use case we need the 503 page as a catch-all for stopped
applications in all application domains. It displays a "nice" generic
message and tries not to break marketing's SEO efforts :-)
For more specific customer information (down until Monday hh:mm) we can
deploy another staticfile application and map the application domain to it.

If the latter, then you probably wouldn't have the wildcard route on the
same domain as the default app domain used in the smoke tests.

If the former, then what sort of assertion do you think the smoke tests
should make to assert that the app has been cleaned up and is no longer
routable (without introducing false positives where the app is still
routable but returning a bad response code for some other reason)?
Thinking about the smoke tests in isolation, you might think to make the
response-code-check injectible as configuration to the smoke tests. But
from the perspective of the operator setting up the whole system and
maybe deciding whether or not to run smoke tests, they might not even
have deployed CF yet, let alone set up a 503 page, so it would be
awkward to have the operator decide and configure up front what
output/response code the smoke tests look for.

Not sure what the optimal answer is here, any thoughts?

I've been thinking about a correct answer since I read the source code last
Friday. Regardless of my preference for a 503 Service Unavailable over a
404 Not Found, having a feature (a wildcard route for the default
application domain) break the general smoke tests feels wrong.

At the moment I like the way some applications do domain validation:
they require you to put a file containing some kind of key or UUID on
your web server or into a DNS TXT record. For Cloud Foundry we could use
something similar: deploy a file containing a specific ID. If we fetch
the content of this URL and it doesn't contain this ID, the application
is gone. It is basically a negated version of the current test. Instead
of expecting a specific string (404) when the application is deleted, we
expect a specific string (0xdeadbeef?) not to be in the response.
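
The difference between the two assertions can be sketched in a few lines. This is only an illustration of the idea described above, not the smoke tests' actual code: both function names are hypothetical, and the "current" check is reduced to the substring match on "404" reported earlier in this thread.

```python
def current_check_passes(response_body):
    """Simplified stand-in for the existing assertion: look for "404"."""
    return "404" in response_body


def app_is_gone(response_body, marker):
    """Negated check: the app is gone when its unique marker is absent."""
    return marker not in response_body


# A custom 503 page without the "<!-- CF: 404 -->" workaround comment.
CUSTOM_503_PAGE = "<html><body>Service unavailable</body></html>"
# What the still-running test app would serve (marker value is made up).
LIVE_APP_PAGE = "<html><body>id=0xdeadbeef</body></html>"
```

The negated check does not care what the router or a catch-all app returns for the deleted route, so a custom 503 page passes without the embedded comment; the trade-off is that the marker has to be unique enough never to appear in an error page.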

Amit

On Sun, Apr 17, 2016 at 11:43 AM, Stefan Mayr <stefan(a)mayr-stefan.de> wrote:

Hi,

On 14.04.2016 at 18:14, Mike Youngstrom wrote:

We passed the smoke tests by:

* Only returning a 503 if the requested route exists.
* Embed the old 404 page text in a comment of the returned html.

Mike


I verified this today: only the text "404" is required to pass
the smoke tests. So I created a 503 service unavailable page
containing an html comment: <!-- CF: 404 -->

Good enough as a temporary workaround. I'd still consider this as a
bug in the smoke tests. When you use wildcard routes for real
applications this won't work.

Regards,

Stefan Mayr


Re: How can we customized "404 Not Found"

Amit Kumar Gupta
 

Hi Stefan, Mike,

For real applications, would you want a common 503 page for all
applications on the platform, or would your different applications have
different custom domains, with custom 503 pages for each domain?

If the latter, then you probably wouldn't have the wildcard route on the
same domain as the default app domain used in the smoke tests.

If the former, then what sort of assertion do you think the smoke tests
should make to assert that the app has been cleaned up and is no longer
routable (without introducing false positives where the app is still
routable but returning a bad response code for some other reason)?
Thinking about the smoke tests in isolation, you might think to make the
response-code-check injectible as configuration to the smoke tests. But
from the perspective of the operator setting up the whole system and maybe
deciding whether or not to run smoke tests, they might not even have
deployed CF yet, let alone set up a 503 page, so it would be awkward to
have the operator decide and configure up front what output/response code
the smoke tests look for.

Not sure what the optimal answer is here, any thoughts?

Amit

On Sun, Apr 17, 2016 at 11:43 AM, Stefan Mayr <stefan(a)mayr-stefan.de> wrote:

Hi,

On 14.04.2016 at 18:14, Mike Youngstrom wrote:

We passed the smoke tests by:

* Only returning a 503 if the requested route exists.
* Embed the old 404 page text in a comment of the returned html.

Mike
I verified this today: only the text "404" is required to pass the
smoke tests. So I created a 503 service unavailable page containing an html
comment: <!-- CF: 404 -->

Good enough as a temporary workaround. I'd still consider this as a bug in
the smoke tests. When you use wildcard routes for real applications this
won't work.

Regards,

Stefan


Re: How can we customized "404 Not Found"

Stefan Mayr
 

Hi,

On 14.04.2016 at 18:14, Mike Youngstrom wrote:
We passed the smoke tests by:

* Only returning a 503 if the requested route exists.
* Embed the old 404 page text in a comment of the returned html.

Mike
I verified this today: only the text "404" is required to pass the
smoke tests. So I created a 503 service unavailable page containing an
html comment: <!-- CF: 404 -->

Good enough as a temporary workaround. I'd still consider this as a bug
in the smoke tests. When you use wildcard routes for real applications
this won't work.

Regards,

Stefan


Re: Maven: Resolve Dependencies on Platform?

Josh Long <starbuxman@...>
 

I'm not sure if this is the right forum. I doubt it.

* you could achieve what you want by forking the buildpack used. If you're
using the Java buildpack then it's
https://github.com/cloudfoundry/java-buildpack. the `cf push` command
supports providing an override URL for the buildpack.
* that said, this is a TERRIBLE idea. Instead, prefer that one build be
promoted from development to staging, QA, and production. Ideally, that
promotion should be automatic, the result of a continuous delivery pipeline
that sees code committed to version control, then run through continuous
integration, then pushed to a testing environment where it's certified and
smoke-tested, validated by QA, and ultimately promoted to production. You
can support this process with continuous integration tools like Jenkins,
Travis, Spinnaker, or Concourse.CI, which will monitor version control and
can be scripted to package and cf push code.

On Sat, Apr 16, 2016 at 7:15 PM Matthew Tyson <matthewcarltyson(a)gmail.com>
wrote:

Please let me know if there is a more appropriate forum for this type of
question.


Re: Maven: Resolve Dependencies on Platform?

Matthew Tyson
 

Please let me know if there is a more appropriate forum for this type of question.


Maven: Resolve Dependencies on Platform?

Matthew Tyson
 

Is anyone aware of a way to deploy a Maven application via Cloud Foundry that will:

1) Take only the application bits and upload them

2) Run the Maven build based on the pom.xml

3) Pull down the defined dependencies on the platform

Thereby NOT uploading the dependencies from the client to the platform?


Help needed building CF Certifications

Timothy Harris <tharris@...>
 

Hello Cloud Foundry Experts -

As you may know, the Cloud Foundry Foundation is working on building a
certification program for professionals who use Cloud Foundry. Initially
we are tackling a CF Developer certification, and a bit later will be
tackling a CF Operator certification. Thanks to the contributors who helped
us define the scope, domains and tasks, we know what we want to cover in
the exam. The next step in our process is taking the tasks and turning them
into good test questions.

We need your help with item writing! We are looking for pairs of Cloud
Foundry contributors to help us develop the test. Specifically, we need
help in building out test questions - taking an initial summary of a task
and creating a specific question, with associated pre-requisites and final
correct answers. Some questions will have a coding component, as we also
need some sample (very simple) applications as part of the question.

We expect this activity to require roughly one full day per question, and
we’d like to have volunteers work on 2-4 of the questions. (If you plan to
take the test, you’ll need to limit the number of questions you work on -
let us know in advance!) So the ideal pair of volunteers would have 3-4
days in May to dedicate to this task, and we’d like to all work on it
together so we can get it done quickly. We’ll start out with a 2 hour
training session to explain how to write scorable test questions.

Are you someone that could help us with this important task? At this point
we are looking for interested and knowledgeable volunteers who likely can
help, and we’ll work together to detail the logistics of specifically how
and when we work together on the effort. So simply reply to this email
(and perhaps share your specialty in CF if you have one) and we’ll take it
from there...

Thanks in advance to those that can help!

Tim and Stormy

--
Tim Harris | Director of Certification Programs
Cloud Foundry Foundation
(415) 518-6807


Re: openstack / CF v234 deployment

Amit Kumar Gupta
 

Hi William,

Yes, this looks like a mixup between domain and system_domain. Both should
be the same value, system.domain.com. The two separate properties exist
for historical reasons; we plan to reduce them back down to one to avoid
this confusion in the future.
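
A quick way to spot this kind of mismatch is to compare the URIs in registrar_settings.yml with the system domain the CLI is dialing. The sketch below is an illustration only: `routes_outside_domain` is a hypothetical helper, and the routes are the ones quoted from the loggregator_trafficcontroller instance in this thread.

```python
import json

# Routes as quoted from registrar_settings.yml in this thread.
ROUTES = json.loads(
    '[{"name":"doppler","port":8081,"registration_interval":"20s",'
    '"uris":["doppler.domain.com"]},'
    '{"name":"loggregator","port":8080,"registration_interval":"20s",'
    '"uris":["loggregator.domain.com"]}]'
)


def routes_outside_domain(routes, system_domain):
    """Return registered URIs that are not hosts under `system_domain`."""
    suffix = "." + system_domain
    return [
        uri
        for route in routes
        for uri in route.get("uris", [])
        if not uri.endswith(suffix)
    ]
```

With `system.domain.com` as the system domain, both registered URIs are reported, which matches the 404 `unknown_route` response for `loggregator.system.domain.com`.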

Cheers,
Amit

On Fri, Apr 15, 2016 at 11:43 AM, Bean William R <BeanWilliamR(a)johndeere.com> wrote:

We've ventured through
https://docs.cloudfoundry.org/deploying/openstack/index.html and have CF
deployed on OpenStack.

[root(a)cfinstaller my-bosh]# bosh deployments
...

+------------------+--------------+------------------------------------------------+--------------+
| Name             | Release(s)   | Stemcell(s)                                    | Cloud Config |
+------------------+--------------+------------------------------------------------+--------------+
| cloudfoundry-lab | cf/234+dev.1 | bosh-openstack-kvm-ubuntu-trusty-go_agent/3215 | none         |
+------------------+--------------+------------------------------------------------+--------------+

After a bosh deploy, we are able to login with the admin credentials, and
cf push a staticfile app:


[root(a)cfinstaller billhello-project]# cf push billhello -m 64M
...
requested state: started
instances: 1/1
usage: 64M x 1 instances
urls: billhello.domain.com
last uploaded: Fri Apr 15 18:30:38 UTC 2016
stack: unknown
buildpack: staticfile 1.3.5

state since cpu memory disk details
#0 running 2016-04-15 06:30:50 PM 0.0% 3.6M of 64M 5.4M of 1G


However we are not able to use the loggregator service from the cf-cli:

[root(a)cfinstaller billhello-project]# export CF_TRACE=true
[root(a)cfinstaller billhello-project]# cf logs billhello
...
WEBSOCKET REQUEST: [2016-04-15T18:35:22Z]
GET /tail/?app=d0a97a19-4794-4584-82ce-1b2fe596cf78 HTTP/1.1
Host: wss://loggregator.system.domain.com:4443
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: [HIDDEN]
Origin: http://localhost
Authorization: [PRIVATE DATA HIDDEN]


WEBSOCKET RESPONSE: [2016-04-15T18:35:22Z]
HTTP/1.1 404 Not Found
Content-Length: 91
Content-Type: text/plain; charset=utf-8
X-Cf-Routererror: unknown_route
X-Content-Type-Options: nosniff
X-Vcap-Request-Id: c8a05edd-a6b9-48d0-6f05-1c0aaa76f45f
Date: Fri, 15 Apr 2016 18:35:23 GMT

FAILED
Error dialing loggregator server: websocket: bad handshake.
Please ask your Cloud Foundry Operator to check the platform configuration
(loggregator endpoint is wss://loggregator.system.cflab.deere.com:4443).


We've already tried just restarting the services on the
loggregator_trafficcontroller_z1/0 instance, and logging out & back in with
the cf-cli... neither helps.

How does this wss://loggregator.system.domain.com:4443 get added to
nats? From the loggregator_trafficcontroller_z1/0 instance, the only
routes in /var/vcap/jobs/route_registrar/config/registrar_settings.yml are:

routes: [{"name":"doppler","port":8081,"registration_interval":"20s","uris":["doppler.domain.com"]},{"name":"loggregator","port":8080,"registration_interval":"20s","uris":["loggregator.domain.com"]}]

Should there be one for loggregator.system.domain.com? Is this just a
mismatch between system_domain and domain? Any troubleshooting tips?

Thanks,
William Bean


CF CLI v6.17.0 Released Today

Koper, Dies <diesk@...>
 

The CF CLI team just cut 6.17.0. Binaries and link to release notes are available at:

https://github.com/cloudfoundry/cli#downloads

The command reference guide on http://cli.cloudfoundry.org is being updated now.

Built with Golang 1.6.1 which addresses two security vulnerabilities

Golang 1.6.1 has just been released, addressing two vulnerabilities that could affect cf CLI users.
See https://groups.google.com/forum/#!topic/golang-nuts/9eqIHqaWvck for details.

TCP Routing

Various commands have been enhanced to support TCP routes for apps deployed to the Diego runtime.
This feature requires the target CF release to be v234 (CC API v2.53.0) or higher and Diego and the Routing API to be enabled.

App Instance Quotas

Quota related commands have been enhanced to expose app instance quotas.
This feature requires the target CF release to be v214 (CC API v2.33.0) or higher for org quotas and v221 (CC API v2.40.0) or higher for space quotas.
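
Scripts that wrap the CLI may want to check the target's CC API version before using these features. The sketch below only encodes the minimum versions stated in these notes; the feature keys and function names are made up for the example, and TCP routing additionally needs Diego and the Routing API enabled, which this check does not cover.

```python
# Minimum CC API versions taken from the release notes above.
MIN_CC_API = {
    "tcp_routing": (2, 53, 0),
    "org_instance_quota": (2, 33, 0),
    "space_instance_quota": (2, 40, 0),
}


def parse_version(text):
    """Turn "2.53.0" into (2, 53, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def supports(feature, cc_api_version):
    """True when the target's CC API version meets the feature's minimum."""
    return parse_version(cc_api_version) >= MIN_CC_API[feature]
```

Comparing tuples of integers rather than strings matters here: as strings, "2.9.0" would sort after "2.33.0", even though it is the older version.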

Native build on Mac OS

Prevents a fatal runtime error on certain Mac OS versions and Anti-Virus/Security software. See #783<https://github.com/cloudfoundry/cli/issues/783>,#789<https://github.com/cloudfoundry/cli/issues/789>

New Commands

* cf router-groups lists the router groups available to your targeted Cloud Foundry. Once an admin creates a new shared domain associated with a TCP router group, developers may create TCP routes from this domain.
* cf version shows the cf CLI version. cf --version and cf -v continue to offer the same functionality but are omitted from cf help's GLOBAL OPTIONS section in favor of the new command.

Updated Commands

* create-shared-domain now accepts a router group to create a domain for (http://cli.cloudfoundry.org/en-US/cf/create-shared-domain.html)
* domains now displays the routing type of each domain
* create-route, map-route, unmap-route, delete-route and push now support an additional option to specify a TCP route's port number
* create-route and map-route now support an additional option to request a random port for a TCP route
* routes output now includes the port number and type of route
* create-space help output now doesn't incorrectly indicate there is a default space quota (#774<https://github.com/cloudfoundry/cli/issues/774>)
* create-space now correctly looks for the specified quota in the specified org (#775<https://github.com/cloudfoundry/cli/issues/775>)
* create-service-broker now has an alias, csb
* curl now defaults to performing a POST when the -d option is specified (#788<https://github.com/cloudfoundry/cli/issues/788>)
* curl no longer displays a message that you should be logged in
* buildpacks no longer tries to retrieve information from the CF endpoint after displaying you are not logged in
* map-route and unmap-route now map and unmap routes with paths correctly (#792<https://github.com/cloudfoundry/cli/issues/792>)
* app now reports the correct URL when using routes with paths (#809<https://github.com/cloudfoundry/cli/issues/809>)
* create-app-manifest now doesn't wrap an extra set of quotes around environment variables (#800<https://github.com/cloudfoundry/cli/issues/800>)
* copy-source usage in its help page now correctly reflects that specification of a space also requires specification of the space's org

Updated Global Options

* -h is now accepted also after the command to display help, e.g. cf push myapp -h
* -v when used with a command, e.g. cf apps -v, prints API request diagnostics. This makes enabling trace for a single command much easier, particularly on Windows
* --version and -v when used by themselves still display the CLI version, but are omitted from the cf help listing in favor of the new version command
* --build, -b still display the Golang version the cf CLI was built with, but is omitted from the cf help listing as it's not relevant to most users

Updated Plugins:

* cf willitconnect v1.1.0: https://github.com/gambtho/cf_will_it_connect_plugin
* Diego Enabler v1.1.0: http://github.com/cloudfoundry-incubator/Diego-Enabler<https://github.com/cloudfoundry-incubator/Diego-Enabler>
* Usage Report v1.3.0: http://github.com/krujos/usagereport-plugin<https://github.com/krujos/usagereport-plugin>
Enjoy!

Regards,
Dies Koper
Cloud Foundry CLI PM


openstack / CF v234 deployment

Bean William R
 

We've ventured through https://docs.cloudfoundry.org/deploying/openstack/index.html and have CF deployed on OpenStack.

[root(a)cfinstaller my-bosh]# bosh deployments
...
+------------------+--------------+------------------------------------------------+--------------+
| Name             | Release(s)   | Stemcell(s)                                    | Cloud Config |
+------------------+--------------+------------------------------------------------+--------------+
| cloudfoundry-lab | cf/234+dev.1 | bosh-openstack-kvm-ubuntu-trusty-go_agent/3215 | none         |
+------------------+--------------+------------------------------------------------+--------------+

After a bosh deploy, we are able to login with the admin credentials, and cf push a staticfile app:


[root(a)cfinstaller billhello-project]# cf push billhello -m 64M
...
requested state: started
instances: 1/1
usage: 64M x 1 instances
urls: billhello.domain.com
last uploaded: Fri Apr 15 18:30:38 UTC 2016
stack: unknown
buildpack: staticfile 1.3.5

state since cpu memory disk details
#0 running 2016-04-15 06:30:50 PM 0.0% 3.6M of 64M 5.4M of 1G


However we are not able to use the loggregator service from the cf-cli:

[root(a)cfinstaller billhello-project]# export CF_TRACE=true
[root(a)cfinstaller billhello-project]# cf logs billhello
...
WEBSOCKET REQUEST: [2016-04-15T18:35:22Z]
GET /tail/?app=d0a97a19-4794-4584-82ce-1b2fe596cf78 HTTP/1.1
Host: wss://loggregator.system.domain.com:4443
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: [HIDDEN]
Origin: http://localhost
Authorization: [PRIVATE DATA HIDDEN]


WEBSOCKET RESPONSE: [2016-04-15T18:35:22Z]
HTTP/1.1 404 Not Found
Content-Length: 91
Content-Type: text/plain; charset=utf-8
X-Cf-Routererror: unknown_route
X-Content-Type-Options: nosniff
X-Vcap-Request-Id: c8a05edd-a6b9-48d0-6f05-1c0aaa76f45f
Date: Fri, 15 Apr 2016 18:35:23 GMT

FAILED
Error dialing loggregator server: websocket: bad handshake.
Please ask your Cloud Foundry Operator to check the platform configuration (loggregator endpoint is wss://loggregator.system.cflab.deere.com:4443).


We've already tried just restarting the services on the loggregator_trafficcontroller_z1/0 instance, and logging out & back in with the cf-cli... neither helps.

How does this wss://loggregator.system.domain.com:4443 get added to nats? From the loggregator_trafficcontroller_z1/0 instance, the only routes in /var/vcap/jobs/route_registrar/config/registrar_settings.yml are:

routes: [{"name":"doppler","port":8081,"registration_interval":"20s","uris":["doppler.domain.com"]},{"name":"loggregator","port":8080,"registration_interval":"20s","uris":["loggregator.domain.com"]}]

Should there be one for loggregator.system.domain.com? Is this just a mismatch between system_domain and domain? Any troubleshooting tips?

Thanks,
William Bean


Re: Remarks about the “confab” wrapper for consul

Benjamin Gandon
 

As an update, it looks like I’m running into the Node health flapping <https://github.com/hashicorp/consul/issues/1212> issue that is more frequent with consul 0.5.x servers compared to 0.6.x servers.

→ Q1: Are you planning to upgrade the consul version used in CF and Diego from 0.5.2 to 0.6.4 in the near future?


Also, people recommend the following settings to mitigate the issue.
"dns_config": {
"allow_stale": true,
"node_ttl": "5s",
"service_ttl": {
"*": "5s"
}
}
I’ll try those and keep you updated with the results next week. Unfortunately, I’ll have to fork the consul-release <https://github.com/cloudfoundry-incubator/consul-release> because those settings are also hardwired to their default <https://github.com/cloudfoundry-incubator/consul-release/blob/master/src/confab/config/consul_config_definer.go#L13-L35> in confab.
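
Since confab currently hard-wires these values, any override has to be merged into the generated agent configuration somewhere. As a rough illustration of what such a merge could look like, here is a small recursive merge; `deep_merge` is a hypothetical helper and is not how confab builds its configuration.

```python
import copy

# The DNS settings recommended above.
DNS_OVERRIDES = {
    "dns_config": {
        "allow_stale": True,
        "node_ttl": "5s",
        "service_ttl": {"*": "5s"},
    }
}


def deep_merge(base, overrides):
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

A recursive merge keeps any keys that already exist in a nested section such as `dns_config`, which a plain `dict.update` would silently drop.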

→ Q2: Are you planning to update confab so that people can tweak their consul settings directly from the BOSH deployment?


Regarding my previous remark about properly configuring “skip_leave_on_interrupt” and “leave_on_terminate” in confab, I understand that the default value of “true” for “leave_on_terminate” might be necessary to properly scale down a consul cluster with BOSH.

But I saw today that skip_leave_on_interrupt will default to true <https://github.com/hashicorp/consul/blob/master/CHANGELOG.md> for consul servers in the upcoming version 0.7.0. Currently, this config is hard-wired to its default value of “false” in confab.

→ Q3: Are you planning to update this “skip_leave_on_interrupt” config in confab?


/Benjamin

On 14 Apr 2016, at 17:00, Benjamin Gandon <benjamin(a)gandon.org> wrote:

Thank you Amit for your answer.


I ran into the “all-consuls-go-crazy” situation again today, as I do almost every day, actually. As soon as the flapping membership issue starts, the whole cf+diego deployment goes down.

Before I delete the content of the persistent storage, when I restart the consul servers, they don’t manage to elect a leader:
https://gist.github.com/bgandon/08707466324be7c9a093a56fd95a64e4

After I delete /var/vcap/store/consul_agent on all 3 consul servers, a consul leader is properly elected, but the cluster rapidly starts flapping again with failure suspicions, missing acks, and timeouts:
https://gist.github.com/bgandon/cab53c22da66b24beff46389ba7f0bdc

And at that time, the load of the bosh-lite VM goes up to 280+ and everything becomes very unresponsive.

How is it possible to bring the consul cluster back to a healthy state? I don’t want to reboot the bosh-lite VM and recreate all deployments with cloudchecks anymore.


/Benjamin


On 11 Apr 2016, at 22:40, Amit Gupta <agupta(a)pivotal.io> wrote:

Orchestrating a raft cluster in a way that requires no manual intervention is incredibly difficult. We write the PID file late for a specific reason:

https://www.pivotaltracker.com/story/show/112018069

For dealing with wedged states like the one you encountered, we have some recommendations in the documentation:

https://github.com/cloudfoundry-incubator/consul-release/#disaster-recovery

We have acceptance tests we run in CI that exercise rolling a 3 node cluster, so if you hit a failure it would be useful to get logs if you have any.

Cheers,
Amit

On Mon, Apr 11, 2016 at 9:38 AM, Benjamin Gandon <benjamin(a)gandon.org> wrote:
Actually, doing some further tests, I realize a mere 'join' is definitely not enough.

Instead, you need to restore the raft/peers.json on each one of the 3 consul server nodes:

monit stop consul_agent
echo '["10.244.0.58:8300","10.244.2.54:8300","10.244.0.54:8300"]' > /var/vcap/store/consul_agent/raft/peers.json

And make sure you start them at roughly the same time with “monit start consul_agent”.
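
If the server list changes often, the peers.json content can be generated rather than typed by hand. This is a minimal sketch that only produces the JSON string in the same shape as the echo command above; the helper name `peers_json` is hypothetical, and port 8300 is simply the port used in the addresses above.

```python
import json

RAFT_PORT = 8300  # port used in the peer addresses above


def peers_json(server_ips, port=RAFT_PORT):
    """Render the raft/peers.json content for the given consul server IPs."""
    peers = ["{}:{}".format(ip, port) for ip in server_ips]
    # Compact separators give the same one-line form as the echo command.
    return json.dumps(peers, separators=(",", ":"))
```

For the three servers in this deployment, `peers_json(["10.244.0.58", "10.244.2.54", "10.244.0.54"])` yields exactly the string written by the echo command.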

So this argues strongly for setting skip_leave_on_interrupt=true and leave_on_terminate=false in confab, because losing the peers.json is really something we don't want in our CF deployments!

/Benjamin


On 11 Apr 2016, at 18:15, Benjamin Gandon <benjamin(a)gandon.org> wrote:

Hi cf devs,


I’m running a CF deployment with redundancy, and I just experienced my consul servers not being able to elect any leader.
That’s a VERY frustrating situation that keeps the whole CF deployment down, until you get a deeper understanding of consul, and figure out they just need a silly manual 'join' so that they get back together.

But that was definitely not easy to nail down because at first look, I could just see monit restarting the “agent_ctl” every 60 seconds because confab was not writing the damn PID file.


More specifically, the 3 consul servers (i.e. consul_z1/0, consul_z1/1 and consul_z2/0) had properly left one another upon a graceful shutdown. This state was persisted in /var/vcap/store/raft/peers.json being “null” on each one of them, so they would not get back together on restart. A manual 'join' was necessary. But it took me hours to get there because I’m no expert with consul.

And until the 'join' is made, VerifySynced() was negative in confab, and monit was constantly starting and stopping it every 60 seconds. But once you step back, you realize confab was actually waiting for the new leader to be elected before it writes the PID file. Which is questionable.

So, I’m asking 3 questions here:

1. Does writing the PID file in confab that late really makes sense?
2. Could someone please write some minimal documentation about confab, at least to tell what it is supposed to do?
3. Wouldn’t it be wiser for the cluster to become unhealthy whenever any of the consul servers is missing?

With this 3rd question, I mean that even on a graceful TERM or INT, no consul server should perform a graceful 'leave'. With this different approach, they would properly come back up even after a complete graceful restart of the cluster.

This can be done with those extra configs from the “confab” wrapper:

{
"skip_leave_on_interrupt": true,
"leave_on_terminate": false
}
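
As a rough sketch of what that override amounts to when applied on top of an existing agent config (the `with_no_leave` helper and the sample starting config are hypothetical and only here for illustration; the two option names are the ones listed above):

```python
import json

# The two consul agent options discussed above.
NO_LEAVE_OVERRIDES = {
    "skip_leave_on_interrupt": True,
    "leave_on_terminate": False,
}


def with_no_leave(config):
    """Return a copy of an agent config dict with the no-leave options forced."""
    merged = dict(config)  # shallow copy, the caller's dict is left untouched
    merged.update(NO_LEAVE_OVERRIDES)
    return merged


if __name__ == "__main__":
    # Hypothetical starting config, for illustration only.
    base = {"server": True, "leave_on_terminate": True}
    print(json.dumps(with_no_leave(base), indent=2, sort_keys=True))
```

The point of forcing both values is that neither INT nor TERM triggers a 'leave', so the peer list survives a full graceful restart.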

What do you guys think of it?


/Benjamin


Changes to cf-test-helpers

David Sabeti
 

Hi all,

tl;dr We've introduced a breaking change to cf-test-helpers (https://github.com/cloudfoundry-incubator/cf-test-helpers), so be careful if you decide to update to the newest version in your test suite.

The Release Integration team is making changes to cf-test-helpers, with a few different goals. One is to ensure that cf-test-helpers stops leaking credentials. Another is to push assertions from the test helpers up into the tests themselves. As a result, we've had to redesign a few parts of cf-test-helpers, specifically the `cmdRunner`. The class included a good deal of logic around running sub-processes (like the cf cli), including retries and making assertions on process output. In addition to occasionally leaking credentials, this logic was difficult to work with and should probably have been implemented in test cases rather than a helper package. So, we've removed the `cmdRunner` entirely. If your tests use `cmdRunner` (with the constructor `NewCmdRunner`), you'll need to modify your tests when you upgrade cf-test-helpers.

It's likely that this change will not affect anybody too seriously, since most people are using the package function `Run()` to shell out to sub-processes, and that interface has not changed at all. Still, some of you may be using the `cmdRunner` and deserve a heads-up about the change.

If you'd like to update your dependency on cf-test-helpers, and you're using the `cmdRunner` in your tests, please feel free to reach out to the Release Integration team for help in migrating your test code.

David && Dennis
CF Release Integration


Re: Scope Error Insufficient scope for user

Filip Hanik
 

ClientAuthenticationFailure ('Bad credentials'): principal=admin_ui_client

Your password for the client admin_ui_client is incorrect.
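
If it helps to find such entries quickly, here is a small Python sketch that pulls the audit failures out of uaa.log lines shaped like the ones quoted below. The regular expression and the `failed_principals` helper are my own assumptions based only on those quoted lines, not an official UAA log format, and the sketch expects each log entry on a single unwrapped line.

```python
import re

# Matches audit failure entries in the shape shown in the quoted uaa.log.
AUDIT_FAILURE = re.compile(
    r"Audit: (?P<event>\w+AuthenticationFailure) "
    r"\('(?P<reason>[^']*)'\): principal=(?P<principal>[^,\s]+)"
)


def failed_principals(lines):
    """Yield (event, reason, principal) for each audit failure line."""
    for line in lines:
        match = AUDIT_FAILURE.search(line)
        if match:
            yield match.group("event", "reason", "principal")


if __name__ == "__main__":
    sample = (
        "[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... INFO --- "
        "Audit: ClientAuthenticationFailure ('Bad credentials'): "
        "principal=admin_ui_client, "
        "origin=[remoteAddress=10.22.0.82, clientId=admin_ui_client]"
    )
    for event, reason, principal in failed_principals([sample]):
        print(event, reason, principal)
```

A principal that shows up here is the one whose credentials were rejected, which in this thread is the client and not the user.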

On Thu, Apr 14, 2016 at 1:32 PM, V Kumar <vikramvilli(a)gmail.com> wrote:

Recently I started using the Cloud Foundry admin_ui. When logging in, I am
getting "Scope Error Insufficient scope for user". I
followed all the steps in
https://github.com/cloudfoundry-incubator/admin-ui/blob/534fd698ff504c286531022110b6205cf91cd029/README.md#running-with-bosh-lite-cloudfoundry
I am giving the correct user name and password while logging in. I checked
uaa.log and found that the password is not matching. I even tried creating a new user
and I am getting the same issue. Please help me with this.

uaa.log
DEBUG --- JdbcTemplate: Executing prepared SQL statement [select
client_id, client_secret, resource_ids, scope, authorized_grant_types,
web_server_redirect_uri, authorities, access_token_validity,
refresh_token_validity, additional_information, autoapprove from
oauth_client_details where client_id = ?]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... DEBUG ---
DaoAuthenticationProvider: Authentication failed: password does not match
stored value
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... INFO ---
Audit: PrincipalAuthenticationFailure ('null'): principal=admin_ui_client,
origin=[10.22.0.82]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... INFO ---
Audit: ClientAuthenticationFailure ('Bad credentials'):
principal=admin_ui_client, origin=[remoteAddress=10.22.0.82,
clientId=admin_ui_client]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... DEBUG ---
BasicAuthenticationFilter: Authentication request for failed:
org.springframework.security.authentication.BadCredentialsException: Bad
credentials


Scope Error Insufficient scope for user

V Kumar
 

Recently I started using the Cloud Foundry admin_ui. When logging in, I am getting "Scope Error Insufficient scope for user". I followed all the steps in https://github.com/cloudfoundry-incubator/admin-ui/blob/534fd698ff504c286531022110b6205cf91cd029/README.md#running-with-bosh-lite-cloudfoundry
I am giving the correct user name and password while logging in. I checked uaa.log and found that the password is not matching. I even tried creating a new user and I am getting the same issue. Please help me with this.

uaa.log
DEBUG --- JdbcTemplate: Executing prepared SQL statement [select client_id, client_secret, resource_ids, scope, authorized_grant_types, web_server_redirect_uri, authorities, access_token_validity, refresh_token_validity, additional_information, autoapprove from oauth_client_details where client_id = ?]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... DEBUG --- DaoAuthenticationProvider: Authentication failed: password does not match stored value
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... INFO --- Audit: PrincipalAuthenticationFailure ('null'): principal=admin_ui_client, origin=[10.22.0.82]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... INFO --- Audit: ClientAuthenticationFailure ('Bad credentials'): principal=admin_ui_client, origin=[remoteAddress=10.22.0.82, clientId=admin_ui_client]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... DEBUG --- BasicAuthenticationFilter: Authentication request for failed: org.springframework.security.authentication.BadCredentialsException: Bad credentials
