If this is your first local set-up attempt, I encourage you to do it using bosh-lite. You can find very comprehensive information in the READMEs on setting up bosh-lite <https://github.com/cloudfoundry/bosh-lite#bosh-lite> and deploying CF <https://github.com/cloudfoundry/bosh-lite/blob/master/docs/deploy-cf.md#deploy-cloud-foundry>, and be up and running quickly. If you insist on a manual set-up, then assuming that your UAA is up and running and that there are no obvious errors in the logs, you can try setting up uaac <http://docs.cloudfoundry.org/adminguide/uaa-user-management.html> to see if your default admin user made it into UAA. And before you can do all that, you may have to add an admin OAuth client to your config/uaa.yml like this one here <https://github.com/cloudfoundry/uaa/blob/091c5e5961dd33c8c7ca5a15f4020e47d266a1c3/uaa/src/test/resources/test/profiles/vcap/uaa.yml#L42> (as I'm not sure what your current set-up looks like). So how about trying bosh-lite first? On Thu, May 28, 2015 at 9:04 PM, Pravin Mishra <pravinmishra88(a)gmail.com> wrote: Hi Ivan,
I have tried, and it's not working. I am following this <https://groups.google.com/a/cloudfoundry.org/forum/#!msg/vcap-dev/IC8U-AdtPLg/Mq02EvPsJBAJ> for the UAA setup and configuration. As per Elisabeth, the default user and password should be configured via the config/uaa.yml file. Somehow, it's not working for me.
Best Regards, Pravin Mishra
On 29 May 2015 at 08:56, Ivan Sim <ivans(a)activestate.com> wrote:
Hi Pravin,
Have you tried
cf auth admin admin
as suggested in the bosh-lite readme <https://github.com/cloudfoundry/bosh-lite/blob/01db9da7b4122f7d02858d92e0fe938e91256649/docs/deploy-cf.md#try-your-cloud-foundry-deployment>? If you are intending to create other admin users, you will need to use the uaac tool as outlined in the uaac doc <http://docs.cloudfoundry.org/adminguide/uaa-user-management.html#creating-admin-users>.
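[Reference: the uaac flow outlined in that doc looks roughly like the following; the target URL, client secret, user name, and password are placeholders and will differ per deployment.]
    uaac target https://uaa.<your-cf-domain> --skip-ssl-validation    # point uaac at your deployment's UAA
    uaac token client get admin -s <admin-client-secret>              # authenticate as the admin OAuth client
    uaac user add new-admin -p <password> --emails new-admin@example.com
    uaac member add cloud_controller.admin new-admin                  # grant the usual admin scopes
    uaac member add uaa.admin new-admin
    uaac member add scim.read new-admin
    uaac member add scim.write new-admin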
On Thu, May 28, 2015 at 7:59 PM, Pravin Mishra <pravinmishra88(a)gmail.com> wrote:
Hello All,
I am doing a local setup of the Cloud Foundry golang client, cloud controller ng, UAA, and NATS. So far I have configured everything and the components are communicating with each other. Now when I target the API:
cf login -a 127.0.0.1:8181
I need the default admin user credentials to log in and create users, orgs, and spaces for further investigation.
What is the best way to create the default admin user?
Best Regards, Pravin Mishra
-- Ivan Sim
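[Reference: the admin OAuth client Ivan points to lives under oauth.clients in config/uaa.yml; a minimal sketch with a placeholder secret follows — see the linked uaa.yml for the authoritative example.]
    oauth:
      clients:
        admin:
          secret: <admin-client-secret>        # placeholder; set your own secret
          authorized-grant-types: client_credentials
          authorities: uaa.admin,clients.read,clients.write,clients.secret,scim.read,scim.write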
Re: Setting Org Manager via API
Hi Daniel, The purpose of the Associate User with the Organization endpoint is to add a user to an organization. This endpoint needs to be called before a user can be added to the organization's spaces. This is why the CLI automatically adds the user to the organization's user list. Associate Managed Organization with the User is the same call as Associate Manager with the Organization <http://apidocs.cloudfoundry.org/210/organizations/associate_manager_with_the_organization.html>, but it uses the relation starting from the user rather than the organization. We don't recommend using this endpoint, as only admins have full access to the users list, and they may not be able to look the user up this way. We have stories in our backlog to address this. The CF Runtime Team, Utako && Dan On Thu, May 28, 2015 at 12:38 AM, Daniel Jones <daniel.jones(a)engineerbetter.com> wrote: Hi all,
I'm working on some automation for my client to declaratively configure orgs and spaces across multiple Cloud Foundry instances (hopefully they'll permit open-sourcing this).
I erroneously tried to set a user as an OrgManager by first calling Associate Managed Organization with the User <http://apidocs.cloudfoundry.org/210/users/associate_managed_organization_with_the_user.html>; after getting InvalidRelation errors I used CF_TRACE to spy on the CLI, and realised that it instead uses Associate Manager with the Organization <http://apidocs.cloudfoundry.org/210/organizations/associate_manager_with_the_organization.html>.
I've got a few questions:
- What's the purpose of the Associate User with the Organization <http://apidocs.cloudfoundry.org/210/organizations/associate_user_with_the_organization.html> CC API call?
- If I don't call Associate User with the Organization <http://apidocs.cloudfoundry.org/210/organizations/associate_user_with_the_organization.html>, what effects can I expect to see?
- Is Associate User with the Organization <http://apidocs.cloudfoundry.org/210/organizations/associate_user_with_the_organization.html> something that only exists for the benefit of the Pivotal console app?
- What's the correct usage of Associate Managed Organization with the User <http://apidocs.cloudfoundry.org/210/users/associate_managed_organization_with_the_user.html>?
[The original message included a small table here, with rows "Admin" and "Not an admin" and columns "Adding a user to an org", "Adding user as manager to org", and "Adding user to manager list"; the cell values did not survive the archive formatting.] Many thanks in advance.
-- Regards,
Daniel Jones EngineerBetter.com
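[Reference: in the order the reply above describes (and the CLI follows), the two v2 calls look roughly like this; the API endpoint, GUIDs, and token are placeholders.]
    # 1. Associate User with the Organization (must happen before space/role assignments)
    curl -X PUT "https://api.<your-cf-domain>/v2/organizations/<org-guid>/users/<user-guid>" -H "Authorization: bearer <token>"
    # 2. Associate Manager with the Organization
    curl -X PUT "https://api.<your-cf-domain>/v2/organizations/<org-guid>/managers/<user-guid>" -H "Authorization: bearer <token>"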
Re: api and api_worker jobs fail to bosh update, but monit start OK

Guillaume Berche
On Thu, May 28, 2015 at 7:57 PM, Mike Youngstrom <youngm(a)gmail.com> wrote: We discovered this fix out of sheer luck. So, no help there sorry. :)
Mike
On Thu, May 28, 2015 at 1:56 AM, Guillaume Berche <bercheg(a)gmail.com> wrote:
Thanks a lot Mike and Dieu. Indeed, moving nfs_mounter last seemed to fix the issue in v207. In case this reproduces on master and it can help, I have submitted https://github.com/cloudfoundry/cf-release/pull/689 against the develop branch.
Out of curiosity, and to improve my next diagnostic session: how was the root cause diagnosed? I was not observing any faulty traces in the job outputs: [...]/cloud_controller_worker_ctl.log, /var/vcap/sys/log/cloud_controller_ng_ctl.err.log or [...]/cloud_controller_ng/cloud_controller_ng.log
@Dieu, is there a way the runtime pipelines' output could be shared with the community (of course hiding sensitive data), to help the community better understand which cases went through the automated tests and to report issues on different settings? E.g. a public Concourse job for the pipeline running stemcell 2977 (runtime-bb-2?).
Thanks,
Guillaume.
On Wed, May 27, 2015 at 7:55 PM, Dieu Cao <dcao(a)pivotal.io> wrote:
We have environments on stemcell 2977 that are running well.
We have an environment using NFS that ran into that same issue, and we have this bug open. [1] Specifying the nfs_mounter job last should work in the meantime until we get the order switched. This was apparently introduced when we added consul_agent to the cloud controller jobs. I'll update the release notes for the affected releases.
-Dieu CF Runtime PM
[1] https://www.pivotaltracker.com/story/show/94152506
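[Reference: in deployment-manifest terms, the workaround amounts to listing nfs_mounter last among the api job's templates; a sketch, with the other template names abbreviated and not necessarily matching your manifest.]
    jobs:
    - name: api_z1
      templates:
      - name: cloud_controller_ng
      - name: consul_agent
      - name: metron_agent
      - name: nfs_mounter   # moved to the end as the workaround discussed above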
On Wed, May 27, 2015 at 10:09 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:
We recently experienced a similar issue. Not sure if it is the same, but it was caused when we moved the nfs_mounter job template to the first item in the list of templates for the CC job. We moved nfs_mounter to the last job template in the list and we haven't had a problem since. It was really strange, because you'd think you'd want nfs_mounter first. Anyway, something to try.
Mike
On Wed, May 27, 2015 at 4:51 AM, Guillaume Berche <bercheg(a)gmail.com> wrote:
Hi,
I'm experiencing a weird situation where the api and api_worker jobs fail to update through bosh and end up being reported as "not running". However, after manually running "monit start cloud_controller_ng" (or rebooting the VM), the faulty jobs start fine and the bosh deployment proceeds without errors. Looking at the monit logs, it seems that there is an extra monit stop request for the cc_ng.
Below are detailed traces illustrating the issue.
$ bosh deploy
[..]
Started updating job ha_proxy_z1 > ha_proxy_z1/0 (canary). Done (00:00:39)
Started updating job api_z1 > api_z1/0 (canary). Failed: `api_z1/0' is not running after update (00:10:44)
When instructing bosh to update the job (in this case only a config change), we indeed see the bosh agent asking monit to stop the jobs, restart monit itself, and start the jobs, and then we see the extra stop (at 12:33:26) before the bosh director eventually times out and marks the canary as failed.
$ less /var/vcap/monit/monit.log
[UTC May 22 12:33:17] info : Awakened by User defined signal 1
[UTC May 22 12:33:17] info : Awakened by the SIGHUP signal
[UTC May 22 12:33:17] info : Reinitializing monit - Control file '/var/vcap/bosh/etc/monitrc'
[UTC May 22 12:33:17] info : Shutting down monit HTTP server
[UTC May 22 12:33:18] info : monit HTTP server stopped
[UTC May 22 12:33:18] info : Starting monit HTTP server at [127.0.0.1:2822]
[UTC May 22 12:33:18] info : monit HTTP server started
[UTC May 22 12:33:18] info : 'system_897cdb8d-f9f7-4bfa-a748-512489b676e0' Monit reloaded
[UTC May 22 12:33:23] info : start service 'consul_agent' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:23] info : Awakened by User defined signal 1
[UTC May 22 12:33:23] info : 'consul_agent' start: /var/vcap/jobs/consul_agent/bin/agent_ctl
[UTC May 22 12:33:23] info : start service 'nfs_mounter' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:23] info : start service 'metron_agent' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:23] info : start service 'cloud_controller_worker_1' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:24] info : 'consul_agent' start action done
[UTC May 22 12:33:24] info : 'nfs_mounter' start: /var/vcap/jobs/nfs_mounter/bin/nfs_mounter_ctl
[UTC May 22 12:33:24] info : 'cloud_controller_worker_1' start: /var/vcap/jobs/cloud_controller_worker/bin/cloud_controller_worker_ctl
[UTC May 22 12:33:25] info : 'cloud_controller_worker_1' start action done
[UTC May 22 12:33:25] info : 'metron_agent' start: /var/vcap/jobs/metron_agent/bin/metron_agent_ctl
[UTC May 22 12:33:26] info : 'metron_agent' start action done
[UTC May 22 12:33:26] info : 'cloud_controller_worker_1' stop: /var/vcap/jobs/cloud_controller_worker/bin/cloud_controller_worker_ctl
[UTC May 22 12:33:27] info : 'nfs_mounter' start action done
[UTC May 22 12:33:27] info : Awakened by User defined signal 1
There is no associated traces of the bosh agent asking this extra stop:
$ less /var/vcap/bosh/log/current
2015-05-22_12:33:23.73606 [monitJobSupervisor] 2015/05/22 12:33:23 DEBUG - Starting service cloud_controller_worker_1
2015-05-22_12:33:23.73608 [http-client] 2015/05/22 12:33:23 DEBUG - Monit request: url='http://127.0.0.1:2822/cloud_controller_worker_1' body='action=start'
2015-05-22_12:33:23.73608 [attemptRetryStrategy] 2015/05/22 12:33:23 DEBUG - Making attempt #0
2015-05-22_12:33:23.73609 [clientRetryable] 2015/05/22 12:33:23 DEBUG - [requestID=52ede4f0-427d-4e65-6da1-d3b5c4b5cafd] Requesting (attempt=1): Request{ Method: 'POST', URL: 'http://127.0.0.1:2822/cloud_controller_worker_1' }
2015-05-22_12:33:23.73647 [clientRetryable] 2015/05/22 12:33:23 DEBUG - [requestID=52ede4f0-427d-4e65-6da1-d3b5c4b5cafd] Request succeeded (attempts=1), response: Response{ StatusCode: 200, Status: '200 OK'}
2015-05-22_12:33:23.73648 [MBus Handler] 2015/05/22 12:33:23 INFO - Responding
2015-05-22_12:33:23.73650 [MBus Handler] 2015/05/22 12:33:23 DEBUG - Payload
2015-05-22_12:33:23.73650 ********************
2015-05-22_12:33:23.73651 {"value":"started"}
2015-05-22_12:33:23.73651 ********************
2015-05-22_12:33:36.69397 [NATS Handler] 2015/05/22 12:33:36 DEBUG - Message Payload
2015-05-22_12:33:36.69397 ********************
2015-05-22_12:33:36.69397 {"job":"api_worker_z1","index":0,"job_state":"failing","vitals":{"cpu":{"sys":"6.5","user":"14.4","wait":"0.4"},"disk":{"ephemeral":{"inode_percent":"10","percent":"14"},"persistent":{"inode_percent":"36","percent":"48"},"system":{"inode_percent":"36","percent":"48"}},"load":["0.19","0.06","0.06"],"mem":{"kb":"81272","percent":"8"},"swap":{"kb":"0","percent":"0"}}}
This is reproducing systematically on our setup using bosh release 152 with stemcell bosh-vcloud-esxi-ubuntu-trusty-go_agent version 2889, and cf release 207 running stemcell 2889.
Enabling monit verbose logs ruled out the theory of monit restarting the cc_ng jobs because of too much RAM usage or a failed HTTP health check (along with the short time window in which the extra stop is requested: ~15s). I also ruled out the possibility of multiple monit instances, or a pid inconsistency with the cc_ng process. I'm now suspecting either the bosh agent sending an extra stop request, or something in the cc_ng ctl scripts.
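[Reference: the checks described above can be reproduced on the api VM with something like the following; the paths are typical BOSH locations and may differ in a given release.]
    /var/vcap/bosh/bin/monit summary                                       # what monit believes about each process
    ps aux | grep [m]onit                                                  # rule out a second monit instance
    cat /var/vcap/sys/run/cloud_controller_ng/cloud_controller_ng.pid      # pid monit tracks...
    ps aux | grep [c]loud_controller                                       # ...versus the process actually running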
As a side question, can someone explain how the cc_ng ctl script works? I'm surprised by the following process tree, where ruby seems to call the ctl script. Is the CC spawning itself?
$ ps auxf --cols=2000 | less
[...]
vcap  8011 0.6 7.4 793864 299852 ? S<l May26 6:01 ruby /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/bin/cloud_controller -m -c /var/vcap/jobs/cloud_controller_ng/config/cloud_controller_ng.yml
root  8014 0.0 0.0  19596   1436 ? S<  May26 0:00  \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root  8023 0.0 0.0   5924   1828 ? S<  May26 0:00  |   \_ tee -a /dev/fd/63
root  8037 0.0 0.0  19600   1696 ? S<  May26 0:00  |   |   \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root  8061 0.0 0.0   5916   1924 ? S<  May26 0:00  |   |       \_ logger -p user.info -t vcap.cloud_controller_ng_ctl.stdout
root  8024 0.0 0.0   7552   1788 ? S<  May26 0:00  |   \_ awk -W Interactive {lineWithDate="echo [`date +\"%Y-%m-%d %H:%M:%S%z\"`] \"" $0 "\""; system(lineWithDate) }
root  8015 0.0 0.0  19600   1440 ? S<  May26 0:00  \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root  8021 0.0 0.0   5924   1832 ? S<  May26 0:00      \_ tee -a /dev/fd/63
root  8033 0.0 0.0  19600   1696 ? S<  May26 0:00      |   \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root  8060 0.0 0.0   5912   1920 ? S<  May26 0:00      |       \_ logger -p user.error -t vcap.cloud_controller_ng_ctl.stderr
root  8022 0.0 0.0   7552   1748 ? S<  May26 0:00      \_ awk -W Interactive {lineWithDate="echo [`date +\"%Y-%m-%d %H:%M:%S%z\"`] \"" $0 "\""; system(lineWithDate) }
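[Reference: the tee/logger/awk children in the tree above are consistent with the ctl script redirecting its own stdout/stderr through process substitution before exec'ing the Ruby server, so the CC is not spawning itself; the redirection subshells simply keep the ctl script's command line and end up parented under the exec'd ruby process. A sketch of the pattern, not the actual cf-release script:]
    # redirect this shell's stdout/stderr through tee into syslog via logger
    exec 1> >(tee -a >(logger -p user.info  -t vcap.cloud_controller_ng_ctl.stdout))
    exec 2> >(tee -a >(logger -p user.error -t vcap.cloud_controller_ng_ctl.stderr))
    # replace this shell with the server; the redirection subshells survive as its children
    exec ruby /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/bin/cloud_controller \
         -m -c /var/vcap/jobs/cloud_controller_ng/config/cloud_controller_ng.yml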
I was wondering whether this could come from our setup running CF with a more recent stemcell version (2922) than what the cf release notes mention as the "tested configuration". Are the latest stemcells tested against the latest CF release? Is there any way to see what stemcell version the runtime team's pipelines are using? [1] seemed to accept env vars and [2] required logging in. I scanned through the bosh agent commit logs to spot something related, but without luck so far.
Thanks in advance for your help,
Guillaume.
[1] https://github.com/cloudfoundry/bosh-lite/blob/master/ci/ci-stemcell-bats.sh
[2] https://concourse.diego-ci.cf-app.com/
Re: api and api_worker jobs fail to bosh update, but monit start OK
Hey Guillaume,
To be fair, this did take some time to figure out on our end. It has to do with the interaction between BOSH and Monit. As Dieu mentioned, the Runtime team has a story to potentially determine a better solution than "nfs_mounter must be last in the list."
Nick
On Thursday, May 28, 2015, Guillaume Berche <bercheg(a)gmail.com> wrote: Oops, sorry, I had probably overlooked this faulty trace, which should have hinted at the problem.
[2015-05-26 16:37:51+0000] ------------ STARTING cloud_controller_ng_ctl at Tue May 26 16:37:51 UTC 2015 --------------
[2015-05-26 16:37:51+0000] chown: changing ownership of ‘/var/vcap/nfs/shared’: Operation not permitted
Re: Custom Login Server with UAA 2.0+
On Thu, May 28, 2015 at 8:38 AM, Filip Hanik <fhanik(a)pivotal.io> wrote: Hi Bob, you can still leverage an outside login server; none of the endpoints you need to accommodate have changed. We are 100% backwards compatible. I don't believe that we ever stated that we would not use the login.<domain> entry, as we used it with the previous login server. The name was never up for grabs, as we weren't even aware that it had been recycled in outside installations.
Now, with that being said, I don't think there is an issue with freeing that name up as a configuration option. That's just a matter of a request being made, a story created and implemented.
Filip
On Thu, May 28, 2015 at 9:33 AM, Bob Brumfield <bob.brumfield(a)gmail.com> wrote:
Filip,
When the uaa and login servers were merged, it was stated that replacing the login server would remain a supported scenario, and this seems a departure from that statement. At the very least, it's more complex than it was pre-merge and requires us to modify and issue our own version of cf-release, which we've tried to avoid as much as possible.
Is there some other approach we should be taking here that we're missing?
Thanks,
Bob Brumfield
On Thu, May 28, 2015 at 6:49 AM, Filip Hanik <fhanik(a)pivotal.io> wrote:
hi Sree and Matt,
Matt is actually not referring to the wild cards. He wants the login.<domain> for his own application.
Matt, at this time we are claiming that domain name, as we did with the login job; we just moved it from one job to another. You may certainly take it out of the cf-registrar script and use it yourself. It is not a configuration that we have tested yet, but I don't foresee you running into any major challenges. There may be some additional settings that you have to tinker with:
https://github.com/cloudfoundry/cf-release/blob/master/jobs/uaa/templates/login.yml.erb#L87-L89
You can correspond with Sree if there is a need for us to completely free up the 'login' subdomain.
Filip
On Thu, May 28, 2015 at 7:44 AM, Sree Tummidi <stummidi(a)pivotal.io> wrote:
Hi Matt, This new wildcard route pattern was introduced for multi-tenancy in UAA post-merge. Anything before login or uaa in the URL is now treated as a zone subdomain, and the zone context is derived from it.
We will have to look into various approaches to solve this, because even if you take over the login subdomain there is a possibility for the code to misinterpret the URL as a zone-specific one.
Let me discuss this with the team and get back to you with possible solutions for the same.
Thanks, Sree
Sent from my iPad
On May 27, 2015, at 9:58 PM, Matt Cholick <cholick(a)gmail.com> wrote:
Prior to the consolidation of UAA and the login server in UAA release 2.0, we were running our own login server to handle auth to our platform. We simply reduced the instance count of the bundled CF login server to 0 and put our own in place, which snagged the login subdomain. This worked just fine; our solution implemented all the endpoints needed to log in.
We're now upgrading to a newer release with uaa 2.0+ and having difficulties. The uaa registrar hardcodes grabbing the login subdomains:
    ...
    - login.<%= properties.domain %>
    - '*.login.<%= properties.domain %>'
    ...
See:
https://github.com/cloudfoundry/cf-release/blob/master/jobs/uaa/templates/cf-registrar.config.yml.erb
This prevents us from taking over login. We locally removed those list items and our custom login server does continue to work. We have some questions about the right approach going forward though.
Are uaa and the login server going to continue to merge, to the point where we can no longer take over the login subdomain? Will this strategy no longer be feasible? What's the right answer for non-LDAP/SAML environments, if the uaa project's roadmap makes this replacement impossible?
If our current solution will continue to work for the foreseeable future, would the uaa team be amenable to a pull-request making the uri values configurable, so we can continue to take over the login subdomain?
-Matt Cholick
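[Reference: the configurability Matt asks about could be as small as guarding the hardcoded entries behind a property; the register_login_routes flag below is hypothetical, not an existing cf-release option.]
    # in jobs/uaa/templates/cf-registrar.config.yml.erb (surrounding config elided)
    <% if properties.uaa.register_login_routes %>
    - login.<%= properties.domain %>
    - '*.login.<%= properties.domain %>'
    <% end %>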
_______________________________________________ cf-dev mailing list cf-dev(a)lists.cloudfoundry.org https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
_______________________________________________ cf-dev mailing list cf-dev(a)lists.cloudfoundry.org https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
_______________________________________________ cf-dev mailing list cf-dev(a)lists.cloudfoundry.org https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
_______________________________________________ cf-dev mailing list cf-dev(a)lists.cloudfoundry.org https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
|
|
Re: Custom Login Server with UAA 2.0+
hi Bob, you can still leverage an outside login server, none of the endpoints to accommodate have changed. We are 100% backwards compatible. I don't believe that we ever stated that we would not use the login.<domain> entry, as we used it with the previous login server. The name was never up for grabs as we weren't even aware that it had been recycled in outside installation. Now, with that being said, I don't think there is an issue with freeing that name up as a configuration option. That's just a matter of a request being made, a story created and implemented. Filip On Thu, May 28, 2015 at 9:33 AM, Bob Brumfield <bob.brumfield(a)gmail.com> wrote: Filip,
When the uaa and login were merged it was state that replacing replacing the login server would remain a supported scenario and this seems a departure from that statement. At the very least, its more complex that it was pre-merge and requires us to modify and issue our own version of cf-release which we've tried to avoid as much as possible.
Is there some other approach we should be taking here that we're missing?
Thanks,
Bob Brumfield
On Thu, May 28, 2015 at 6:49 AM, Filip Hanik <fhanik(a)pivotal.io> wrote:
hi Sree and Matt,
Matt is actually not referring to the wild cards. He wants the login.<domain> for his own application.
Matt, at this time we are claiming that domain name, as we did with the login job. We just moved it from one job to another. You may certainly take it out of the cf-registrar script and use it yourself. It is not a configuration that we have tested yet, but I don't foresee that you run into any major challenges. There may be some additional settings that you may have to tinker with
https://github.com/cloudfoundry/cf-release/blob/master/jobs/uaa/templates/login.yml.erb#L87-L89
You can correspond with Sree, if there is a need for us to completely free up the 'login' sub domain
Filip
On Thu, May 28, 2015 at 7:44 AM, Sree Tummidi <stummidi(a)pivotal.io> wrote:
Hi Matt, This new wild card route pattern was introduced for multi-tenancy in UAA post merge. Anything before login or uaa in the URL is now treated as a zone subdomain and the zone context is derived from it.
We will have to look into various approaches to solve this because even if you take over the login subdomain there is possibility for the code to misinterpret the url as a zone specific one.
Let me discuss this with the team and get back to you with possible solutions for the same.
Thanks, Sree
Sent from my iPad
On May 27, 2015, at 9:58 PM, Matt Cholick <cholick(a)gmail.com> wrote:
Prior to the consolidation of uaa and the login server in uaa release 2.0, we were running our own login server to handle auth to our platform. We simply reduced the instance of the bundled CF login server to 0 and put our own in place, which snagged the login subdomain. This worked just fine; our solution implemented all the needed endpoints to login.
We're now upgrading to a newer release with uaa 2.0+ and having difficulties. The uaa registrar hardcodes grabbing the login subdomains: ... - login.<%= properties.domain %> - '*.login.<%= properties.domain %>' ...
See:
https://github.com/cloudfoundry/cf-release/blob/master/jobs/uaa/templates/cf-registrar.config.yml.erb
This prevents us from taking over login. We locally removed those list items and our custom login server does continue to work. We have some questions about the right approach going forward though.
Are uaa and the login server going to continue to merge: to the point where we can no longer take over the login subdomain? Will this strategy no longer be feasible? What's the right answer non ldap/saml environments, if the uaa project's roadmap makes this replacement impossible?
If our current solution will continue to work for the foreseeable future, would the uaa team be amenable to a pull-request making the uri values configurable, so we can continue to take over the login subdomain?
-Matt Cholick
_______________________________________________ cf-dev mailing list cf-dev(a)lists.cloudfoundry.org https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
_______________________________________________ cf-dev mailing list cf-dev(a)lists.cloudfoundry.org https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
_______________________________________________ cf-dev mailing list cf-dev(a)lists.cloudfoundry.org https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
|
|
Re: Custom Login Server with UAA 2.0+
Bob Brumfield <bob.brumfield@...>
Filip,
When the uaa and login were merged it was state that replacing replacing the login server would remain a supported scenario and this seems a departure from that statement. At the very least, its more complex that it was pre-merge and requires us to modify and issue our own version of cf-release which we've tried to avoid as much as possible.
Is there some other approach we should be taking here that we're missing?
Thanks,
Bob Brumfield
toggle quoted messageShow quoted text
On Thu, May 28, 2015 at 6:49 AM, Filip Hanik <fhanik(a)pivotal.io> wrote: hi Sree and Matt,
Matt is actually not referring to the wild cards. He wants the login.<domain> for his own application.
Matt, at this time we are claiming that domain name, as we did with the login job. We just moved it from one job to another. You may certainly take it out of the cf-registrar script and use it yourself. It is not a configuration that we have tested yet, but I don't foresee that you run into any major challenges. There may be some additional settings that you may have to tinker with
https://github.com/cloudfoundry/cf-release/blob/master/jobs/uaa/templates/login.yml.erb#L87-L89
You can correspond with Sree, if there is a need for us to completely free up the 'login' sub domain
Filip
On Thu, May 28, 2015 at 7:44 AM, Sree Tummidi <stummidi(a)pivotal.io> wrote:
Hi Matt, This new wild card route pattern was introduced for multi-tenancy in UAA post merge. Anything before login or uaa in the URL is now treated as a zone subdomain and the zone context is derived from it.
We will have to look into various approaches to solve this because even if you take over the login subdomain there is possibility for the code to misinterpret the url as a zone specific one.
Let me discuss this with the team and get back to you with possible solutions for the same.
Thanks, Sree
Sent from my iPad
On May 27, 2015, at 9:58 PM, Matt Cholick <cholick(a)gmail.com> wrote:
Prior to the consolidation of uaa and the login server in uaa release 2.0, we were running our own login server to handle auth to our platform. We simply reduced the instance of the bundled CF login server to 0 and put our own in place, which snagged the login subdomain. This worked just fine; our solution implemented all the needed endpoints to login.
We're now upgrading to a newer release with uaa 2.0+ and having difficulties. The uaa registrar hardcodes grabbing the login subdomains: ... - login.<%= properties.domain %> - '*.login.<%= properties.domain %>' ...
See:
https://github.com/cloudfoundry/cf-release/blob/master/jobs/uaa/templates/cf-registrar.config.yml.erb
This prevents us from taking over login. We locally removed those list items and our custom login server does continue to work. We have some questions about the right approach going forward though.
Are uaa and the login server going to continue to merge: to the point where we can no longer take over the login subdomain? Will this strategy no longer be feasible? What's the right answer non ldap/saml environments, if the uaa project's roadmap makes this replacement impossible?
If our current solution will continue to work for the foreseeable future, would the uaa team be amenable to a pull-request making the uri values configurable, so we can continue to take over the login subdomain?
-Matt Cholick
_______________________________________________ cf-dev mailing list cf-dev(a)lists.cloudfoundry.org https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
_______________________________________________ cf-dev mailing list cf-dev(a)lists.cloudfoundry.org https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
|
|
Re: Custom Login Server with UAA 2.0+
hi Sree and Matt, Matt is actually not referring to the wild cards. He wants the login.<domain> for his own application. Matt, at this time we are claiming that domain name, as we did with the login job. We just moved it from one job to another. You may certainly take it out of the cf-registrar script and use it yourself. It is not a configuration that we have tested yet, but I don't foresee that you run into any major challenges. There may be some additional settings that you may have to tinker with https://github.com/cloudfoundry/cf-release/blob/master/jobs/uaa/templates/login.yml.erb#L87-L89You can correspond with Sree, if there is a need for us to completely free up the 'login' sub domain Filip
toggle quoted messageShow quoted text
On Thu, May 28, 2015 at 7:44 AM, Sree Tummidi <stummidi(a)pivotal.io> wrote: Hi Matt, This new wild card route pattern was introduced for multi-tenancy in UAA post merge. Anything before login or uaa in the URL is now treated as a zone subdomain and the zone context is derived from it.
We will have to look into various approaches to solve this because even if you take over the login subdomain there is possibility for the code to misinterpret the url as a zone specific one.
Let me discuss this with the team and get back to you with possible solutions for the same.
Thanks, Sree
Sent from my iPad
On May 27, 2015, at 9:58 PM, Matt Cholick <cholick(a)gmail.com> wrote:
Prior to the consolidation of uaa and the login server in uaa release 2.0, we were running our own login server to handle auth to our platform. We simply reduced the instance of the bundled CF login server to 0 and put our own in place, which snagged the login subdomain. This worked just fine; our solution implemented all the needed endpoints to login.
We're now upgrading to a newer release with uaa 2.0+ and having difficulties. The uaa registrar hardcodes grabbing the login subdomains: ... - login.<%= properties.domain %> - '*.login.<%= properties.domain %>' ...
See:
https://github.com/cloudfoundry/cf-release/blob/master/jobs/uaa/templates/cf-registrar.config.yml.erb
This prevents us from taking over login. We locally removed those list items and our custom login server does continue to work. We have some questions about the right approach going forward though.
Are uaa and the login server going to continue to merge: to the point where we can no longer take over the login subdomain? Will this strategy no longer be feasible? What's the right answer non ldap/saml environments, if the uaa project's roadmap makes this replacement impossible?
If our current solution will continue to work for the foreseeable future, would the uaa team be amenable to a pull-request making the uri values configurable, so we can continue to take over the login subdomain?
-Matt Cholick
_______________________________________________ cf-dev mailing list cf-dev(a)lists.cloudfoundry.org https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
|
|
Re: Custom Login Server with UAA 2.0+
Hi Matt, This new wild card route pattern was introduced for multi-tenancy in UAA post merge. Anything before login or uaa in the URL is now treated as a zone subdomain and the zone context is derived from it.
We will have to look into various approaches to solve this because even if you take over the login subdomain there is possibility for the code to misinterpret the url as a zone specific one.
Let me discuss this with the team and get back to you with possible solutions for the same.
Thanks, Sree
Sent from my iPad
On May 27, 2015, at 9:58 PM, Matt Cholick <cholick(a)gmail.com> wrote:
Prior to the consolidation of uaa and the login server in uaa release 2.0, we were running our own login server to handle auth to our platform. We simply reduced the instance count of the bundled CF login server to 0 and put our own in place, which snagged the login subdomain. This worked just fine; our solution implemented all the endpoints needed for login.
We're now upgrading to a newer release with uaa 2.0+ and having difficulties. The uaa registrar hardcodes grabbing the login subdomains: ... - login.<%= properties.domain %> - '*.login.<%= properties.domain %>' ...
See: https://github.com/cloudfoundry/cf-release/blob/master/jobs/uaa/templates/cf-registrar.config.yml.erb
This prevents us from taking over login. We locally removed those list items and our custom login server does continue to work. We have some questions about the right approach going forward though.
Are uaa and the login server going to continue to merge, to the point where we can no longer take over the login subdomain? Will this strategy no longer be feasible? What's the right answer for non-LDAP/SAML environments if the uaa project's roadmap makes this replacement impossible?
If our current solution will continue to work for the foreseeable future, would the uaa team be amenable to a pull request making the URI values configurable, so we can continue to take over the login subdomain?
-Matt Cholick
Re: [vcap-dev] bosh create release --force
Thanks again Filip, I got that resolved. Gradle needed my proxy settings to be entered here:

$ cat ~/.gradle/gradle.properties
systemProp.http.proxyHost=proxyServer
systemProp.http.proxyPort=proxyport

Also adding a '--debug' flag along with the gradlew command helped me progress:

$ ./gradlew assemble --info --debug

Regards, Dhilip

From: Dhilip Kumar S Sent: Thursday, May 28, 2015 11:17 AM To: 'Filip Hanik'; CF Developers Mailing List Subject: RE: [vcap-dev] bosh create release --force

Hi, Thanks for the response. It seems to get stuck here:

./gradlew assemble --info
Downloading http://localhost:8585/gradle-2.0-bin.zip ..........
Unzipping /home/dhilip/.gradle/wrapper/dists/gradle-2.0-bin/9snioba15mo3vjvn9rteu43rt/gradle-2.0-bin.zip to /home/dhilip/.gradle/wrapper/dists/gradle-2.0-bin/9snioba15mo3vjvn9rteu43rt
Set executable permissions for: /home/dhilip/.gradle/wrapper/dists/gradle-2.0-bin/9snioba15mo3vjvn9rteu43rt/gradle-2.0/bin/gradle
Starting Build
Settings evaluated using settings file '/home/dhilip/workspace/cf-release/src/uaa/settings.gradle'.
Projects loaded. Root project using build file '/home/dhilip/workspace/cf-release/src/uaa/build.gradle'.
Included projects: [root project 'cloudfoundry-identity-parent', project ':cloudfoundry-identity-common', project ':cloudfoundry-identity-login', project ':cloudfoundry-identity-samples', project ':cloudfoundry-identity-scim', project ':cloudfoundry-identity-uaa', project ':cloudfoundry-identity-samples:cloudfoundry-identity-api', project ':cloudfoundry-identity-samples:cloudfoundry-identity-app', project ':cloudfoundry-identity-samples:cloudfoundry-identity-oauth-showcase']
Evaluating root project 'cloudfoundry-identity-parent' using build file '/home/dhilip/workspace/cf-release/src/uaa/build.gradle'.
Configuring > 0/9 projects > root project > Resolving dependencies ':classpath'

Is it trying to download something?

Regards, Dhilip

From: Filip Hanik [mailto:fhanik(a)pivotal.io] Sent: Wednesday, May 27, 2015 7:03 PM To: CF Developers Mailing List; Dhilip Kumar S Subject: Re: [vcap-dev] bosh create release --force

The script that is executing at the time is: https://github.com/cloudfoundry/cf-release/blob/master/packages/uaa/pre_packaging#L36
So my suggestion, to test if this works, is that you can do:
1. 'cd src/uaa'
2. ensure that you have a JDK 7 installed
3. run the command './gradlew assemble --info'
and this will tell us if the build process works on your machine. We're looking for the output:
BUILD SUCCESSFUL
Total time: 40.509 secs
Task timings:
579ms :cloudfoundry-identity-common:jar
7056ms :cloudfoundry-identity-common:javadoc
1981ms :cloudfoundry-identity-scim:compileJava
747ms :cloudfoundry-identity-login:compileJava
3800ms :cloudfoundry-identity-scim:javadoc
3141ms :cloudfoundry-identity-login:javadoc
3055ms :cloudfoundry-identity-uaa:war
1379ms :cloudfoundry-identity-samples:cloudfoundry-identity-api:javadoc
2176ms :cloudfoundry-identity-samples:cloudfoundry-identity-api:war
1443ms :cloudfoundry-identity-samples:cloudfoundry-identity-app:javadoc
2178ms :cloudfoundry-identity-samples:cloudfoundry-identity-app:war

On Wed, May 27, 2015 at 7:22 AM, Dhilip Kumar S <dhilip.kumar.s(a)huawei.com> wrote:
Hi All, While I was following the bosh release steps to deploy diego in a bosh-lite environment, it got stuck at the area below for hours. How do I debug this? Any clue would be great.
Building golang1.4... Using final version 'f57ddbc8d55d7a0f08775bf76bb6a27dc98c7ea7'
Building cloud_controller_ng... Using final version 'e20142a32939a531038ace16a3cbe3b8242987e9'
Building libpq... Using final version '49cc7477fcf9a3fef7a1f61e1494b32288587ed8'
Building nginx... Using final version 'c916c10937c83a8be507d3100133101eb403c826'
Building rtr... Using final version 'cd0d40ad56132a4d1cbc19223078f8ff96727d22'
Building doppler... Using final version '2135434c91dc5e6f4aab6406b03ac02f9c2207fa'
Building uaa... No artifact found for uaa
Generating...
Pre-packaging...
Regards, Dhilip

From: Matthew Sykes [mailto:matthew.sykes(a)gmail.com] Sent: Friday, May 22, 2015 3:32 PM To: vcap-dev(a)cloudfoundry.org Subject: Re: [vcap-dev] container cannot communicate with the host

Warden explicitly disables access to the container host. If you move up to a more recent level of cf-release, that behavior is configurable with the `allow_host_access` flag. When that flag is true, this line is skipped: https://github.com/cloudfoundry/warden/blob/4f1e5c049a12199fdd1f29cde15c9a786bd5fac8/warden/root/linux/net.sh#L128
At the level you're at, that rule is always specified, so you'd have to manually change it: https://github.com/cloudfoundry/warden/blob/17f34e2d7ff1994856a61961210a82e83f24ecac/warden/root/linux/net.sh#L124

On Fri, May 22, 2015 at 3:17 AM, Youzhi Zhu <zhuyouzhi03(a)gmail.com> wrote:
Hi all, I have an app A and a service B. Service B is running on the dea server (ip 10.0.0.254), and app A needs to connect to service B through tcp. It works normally in my LAN, but when I push A to cf, it cannot connect to B. I then executed bin/wsh to get into the container and ping the host ip, and it's unreachable, as below:
root(a)18mkbd9n808:~# ping 10.0.0.254
PING 10.0.0.254 (10.0.0.254) 56(84) bytes of data.
From 10.0.0.254 icmp_seq=1 Destination Port Unreachable
From 10.0.0.254 icmp_seq=2 Destination Port Unreachable
^C
--- 10.0.0.254 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1002ms
But if I ping another host in the LAN, it can be reached!!!
root(a)18mkbd9n808:~# ping 10.0.0.253
PING 10.0.0.253 (10.0.0.253) 56(84) bytes of data.
64 bytes from 10.0.0.253: icmp_seq=1 ttl=63 time=1.60 ms
64 bytes from 10.0.0.253: icmp_seq=2 ttl=63 time=0.421 ms
^C
--- 10.0.0.253 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.421/1.013/1.606/0.593 ms
It's weird!!! My cf-release is cf-175 and I have only one dea server. Has anyone met this situation before? Thanks!

-- Matthew Sykes matthew.sykes(a)gmail.com
Re: api and api_worker jobs fail to bosh update, but monit start OK

Guillaume Berche
Oops, sorry, I had probably overlooked this faulty trace, which should have pointed me at the root cause.
[2015-05-26 16:37:51+0000] ------------ STARTING cloud_controller_ng_ctl at Tue May 26 16:37:51 UTC 2015 --------------
[2015-05-26 16:37:51+0000] chown: changing ownership of ‘/var/vcap/nfs/shared’: Operation not permitted
On Thu, May 28, 2015 at 9:56 AM, Guillaume Berche <bercheg(a)gmail.com> wrote: Thanks a lot Mike and Dieu. Indeed, moving the nfs_mounter job last seemed to fix the issue in v207. In case this reproduces on master and it can help, I submitted https://github.com/cloudfoundry/cf-release/pull/689 against the develop branch.
Out of curiosity, and to improve my next diagnostic task, how was the root cause diagnosed? I was not observing any faulty traces in the job outputs: [...]/cloud_controller_worker_ctl.log, /var/vcap/sys/log/cloud_controller_ng_ctl.err.log or [...]/cloud_controller_ng/cloud_controller_ng.log
@Dieu, is there a way the runtime pipelines' output could be shared with the community (of course hiding sensitive data), to help the community better understand which cases went through the automated tests and to report issues with different settings? E.g. a public concourse job for the pipeline running stemcell 2977 (runtime-bb-2?).
Thanks,
Guillaume.
On Wed, May 27, 2015 at 7:55 PM, Dieu Cao <dcao(a)pivotal.io> wrote:
We have environments on stemcell 2977 that are running well.
We have an environment using NFS that ran into that same issue, and we have this bug open. [1] Specifying the nfs_mounter job last should work in the meantime until we get the order switched. This was apparently introduced when we added consul_agent to the cloud controller jobs. I'll update the release notes for the affected releases.
-Dieu CF Runtime PM
[1] https://www.pivotaltracker.com/story/show/94152506
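As a rough sketch of the workaround described above (the job name and the other template entries are assumptions for illustration; the actual template set for the api job in cf-release varies by version), the deployment manifest would list nfs_mounter last:

jobs:
- name: api_z1
  templates:
  - name: consul_agent          # assumed entry
  - name: cloud_controller_ng
  - name: metron_agent          # assumed entry
  - name: nfs_mounter           # moved to the end, per the workaround above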
On Wed, May 27, 2015 at 10:09 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:
We recently experienced a similar issue. Not sure if it is the same, but it was caused when we moved the nfs_mounter job template to the first item in the list of templates for the CC job. We moved nfs_mounter to the last job template in the list and we haven't had a problem since. It was really strange, because you'd think you'd want nfs_mounter first. Anyway, something to try.
Mike
On Wed, May 27, 2015 at 4:51 AM, Guillaume Berche <bercheg(a)gmail.com> wrote:
Hi,
I'm experiencing a weird situation where the api and api_worker jobs fail to update through bosh and end up being reported as "not running". However, when manually running "monit start cloud_controller_ng" (or rebooting the VM), the faulty jobs start fine and the bosh deployment proceeds without errors. Looking at the monit logs, it seems that there is an extra monit stop request for the cc_ng job.
Below are detailed traces illustrating the issue.
$ bosh deploy
[..]
Started updating job ha_proxy_z1 > ha_proxy_z1/0 (canary). Done (00:00:39)
Started updating job api_z1 > api_z1/0 (canary). Failed: `api_z1/0' is not running after update (00:10:44)
When instructing bosh to update the job (in this case only a config change), we indeed see the bosh agent asking monit to stop jobs, restart monit itself, and start jobs, and then we see the extra stop (at 12:33:26) before the bosh director ends up timing out and marking the canary as failed.
$ less /var/vcap/monit/monit.log
[UTC May 22 12:33:17] info : Awakened by User defined signal 1
[UTC May 22 12:33:17] info : Awakened by the SIGHUP signal
[UTC May 22 12:33:17] info : Reinitializing monit - Control file '/var/vcap/bosh/etc/monitrc'
[UTC May 22 12:33:17] info : Shutting down monit HTTP server
[UTC May 22 12:33:18] info : monit HTTP server stopped
[UTC May 22 12:33:18] info : Starting monit HTTP server at [127.0.0.1:2822]
[UTC May 22 12:33:18] info : monit HTTP server started
[UTC May 22 12:33:18] info : 'system_897cdb8d-f9f7-4bfa-a748-512489b676e0' Monit reloaded
[UTC May 22 12:33:23] info : start service 'consul_agent' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:23] info : Awakened by User defined signal 1
[UTC May 22 12:33:23] info : 'consul_agent' start: /var/vcap/jobs/consul_agent/bin/agent_ctl
[UTC May 22 12:33:23] info : start service 'nfs_mounter' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:23] info : start service 'metron_agent' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:23] info : start service 'cloud_controller_worker_1' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:24] info : 'consul_agent' start action done
[UTC May 22 12:33:24] info : 'nfs_mounter' start: /var/vcap/jobs/nfs_mounter/bin/nfs_mounter_ctl
[UTC May 22 12:33:24] info : 'cloud_controller_worker_1' start: /var/vcap/jobs/cloud_controller_worker/bin/cloud_controller_worker_ctl
[UTC May 22 12:33:25] info : 'cloud_controller_worker_1' start action done
[UTC May 22 12:33:25] info : 'metron_agent' start: /var/vcap/jobs/metron_agent/bin/metron_agent_ctl
[UTC May 22 12:33:26] info : 'metron_agent' start action done
[UTC May 22 12:33:26] info : 'cloud_controller_worker_1' stop: /var/vcap/jobs/cloud_controller_worker/bin/cloud_controller_worker_ctl
[UTC May 22 12:33:27] info : 'nfs_mounter' start action done
[UTC May 22 12:33:27] info : Awakened by User defined signal 1
There are no associated traces of the bosh agent asking for this extra stop:
$ less /var/vcap/bosh/log/current
2015-05-22_12:33:23.73606 [monitJobSupervisor] 2015/05/22 12:33:23 DEBUG - Starting service cloud_controller_worker_1
2015-05-22_12:33:23.73608 [http-client] 2015/05/22 12:33:23 DEBUG - Monit request: url='http://127.0.0.1:2822/cloud_controller_worker_1' body='action=start'
2015-05-22_12:33:23.73608 [attemptRetryStrategy] 2015/05/22 12:33:23 DEBUG - Making attempt #0
2015-05-22_12:33:23.73609 [clientRetryable] 2015/05/22 12:33:23 DEBUG - [requestID=52ede4f0-427d-4e65-6da1-d3b5c4b5cafd] Requesting (attempt=1): Request{ Method: 'POST', URL: 'http://127.0.0.1:2822/cloud_controller_worker_1' }
2015-05-22_12:33:23.73647 [clientRetryable] 2015/05/22 12:33:23 DEBUG - [requestID=52ede4f0-427d-4e65-6da1-d3b5c4b5cafd] Request succeeded (attempts=1), response: Response{ StatusCode: 200, Status: '200 OK'}
2015-05-22_12:33:23.73648 [MBus Handler] 2015/05/22 12:33:23 INFO - Responding
2015-05-22_12:33:23.73650 [MBus Handler] 2015/05/22 12:33:23 DEBUG - Payload
2015-05-22_12:33:23.73650 ********************
2015-05-22_12:33:23.73651 {"value":"started"}
2015-05-22_12:33:23.73651 ********************
2015-05-22_12:33:36.69397 [NATS Handler] 2015/05/22 12:33:36 DEBUG - Message Payload
2015-05-22_12:33:36.69397 ********************
2015-05-22_12:33:36.69397 {"job":"api_worker_z1","index":0,"job_state":"failing","vitals":{"cpu":{"sys":"6.5","user":"14.4","wait":"0.4"},"disk":{"ephemeral":{"inode_percent":"10","percent":"14"},"persistent":{"inode_percent":"36","percent":"48"},"system":{"inode_percent":"36","percent":"48"}},"load":["0.19","0.06","0.06"],"mem":{"kb":"81272","percent":"8"},"swap":{"kb":"0","percent":"0"}}}
This is reproducing systematically on our setup using bosh release 152 with stemcell bosh-vcloud-esxi-ubuntu-trusty-go_agent version 2889, and cf release 207 running stemcell 2889.
Enabling monit verbose logs discarded the theory of monit restarting the cc_ng jobs because of too much RAM usage or a failed HTTP health check (along with the short time window in which the extra stop is requested: ~15 s). I also discarded the possibility of multiple monit instances, or a PID inconsistency with the cc_ng process. I'm now suspecting either the bosh agent sending an extra stop request, or something in the cc_ng ctl scripts.
As a side question, can someone explain how the cc_ng ctl script works? I'm surprised by the following process tree, where ruby seems to call the ctl script. Is the CC spawning itself?
$ ps auxf --cols=2000 | less
[...]
vcap 8011 0.6 7.4 793864 299852 ? S<l May26 6:01 ruby /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/bin/cloud_controller -m -c /var/vcap/jobs/cloud_controller_ng/config/cloud_controller_ng.yml
root 8014 0.0 0.0 19596 1436 ? S< May26 0:00 \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root 8023 0.0 0.0 5924 1828 ? S< May26 0:00 | \_ tee -a /dev/fd/63
root 8037 0.0 0.0 19600 1696 ? S< May26 0:00 | | \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root 8061 0.0 0.0 5916 1924 ? S< May26 0:00 | | \_ logger -p user.info -t vcap.cloud_controller_ng_ctl.stdout
root 8024 0.0 0.0 7552 1788 ? S< May26 0:00 | \_ awk -W Interactive {lineWithDate="echo [`date +\"%Y-%m-%d %H:%M:%S%z\"`] \"" $0 "\""; system(lineWithDate) }
root 8015 0.0 0.0 19600 1440 ? S< May26 0:00 \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root 8021 0.0 0.0 5924 1832 ? S< May26 0:00 \_ tee -a /dev/fd/63
root 8033 0.0 0.0 19600 1696 ? S< May26 0:00 | \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root 8060 0.0 0.0 5912 1920 ? S< May26 0:00 | \_ logger -p user.error -t vcap.cloud_controller_ng_ctl.stderr
root 8022 0.0 0.0 7552 1748 ? S< May26 0:00 \_ awk -W Interactive {lineWithDate="echo [`date +\"%Y-%m-%d %H:%M:%S%z\"`] \"" $0 "\""; system(lineWithDate) }
I was wondering whether this could come from our setup running CF with a more recent stemcell version (2922) than what the cf release notes mention as the "tested configuration". Are the latest stemcells tested against the latest CF release? Is there any way to see which stemcell version the runtime team's pipelines are using? [1] seemed to accept env vars and [2] required logging in. I scanned through the bosh agent commit logs to spot something related, but without luck so far.
Thanks in advance for your help,
Guillaume.
[1] https://github.com/cloudfoundry/bosh-lite/blob/master/ci/ci-stemcell-bats.sh
[2] https://concourse.diego-ci.cf-app.com/
Re: api and api_worker jobs fail to bosh update, but monit start OK

Guillaume Berche
Thanks a lot Mike and Dieu. Indeed, moving the nfs_mounter job last seemed to fix the issue in v207. In case this reproduces on master and it can help, I submitted https://github.com/cloudfoundry/cf-release/pull/689 against the develop branch. Out of curiosity, and to improve my next diagnostic task, how was the root cause diagnosed? I was not observing any faulty traces in the job outputs: [...]/cloud_controller_worker_ctl.log, /var/vcap/sys/log/cloud_controller_ng_ctl.err.log or [...]/cloud_controller_ng/cloud_controller_ng.log @Dieu, is there a way the runtime pipelines' output could be shared with the community (of course hiding sensitive data), to help the community better understand which cases went through the automated tests and to report issues with different settings? E.g. a public concourse job for the pipeline running stemcell 2977 (runtime-bb-2?). Thanks, Guillaume.
On Wed, May 27, 2015 at 7:55 PM, Dieu Cao <dcao(a)pivotal.io> wrote: We have environments on stemcell 2977 that are running well.
We have an environment using NFS that ran into that same issue, and we have this bug open. [1] Specifying the nfs_mounter job last should work in the meantime until we get the order switched. This was apparently introduced when we added consul_agent to the cloud controller jobs. I'll update the release notes for the affected releases.
-Dieu CF Runtime PM
[1] https://www.pivotaltracker.com/story/show/94152506
On Wed, May 27, 2015 at 10:09 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:
We recently experienced a similar issue. Not sure if it is the same, but it was caused when we moved the nfs_mounter job template to the first item in the list of templates for the CC job. We moved nfs_mounter to the last job template in the list and we haven't had a problem since. It was really strange, because you'd think you'd want nfs_mounter first. Anyway, something to try.
Mike
On Wed, May 27, 2015 at 4:51 AM, Guillaume Berche <bercheg(a)gmail.com> wrote:
Hi,
I'm experiencing a weird situation where the api and api_worker jobs fail to update through bosh and end up being reported as "not running". However, when manually running "monit start cloud_controller_ng" (or rebooting the VM), the faulty jobs start fine and the bosh deployment proceeds without errors. Looking at the monit logs, it seems that there is an extra monit stop request for the cc_ng job.
Below are detailed traces illustrating the issue.
$ bosh deploy
[..]
Started updating job ha_proxy_z1 > ha_proxy_z1/0 (canary). Done (00:00:39)
Started updating job api_z1 > api_z1/0 (canary). Failed: `api_z1/0' is not running after update (00:10:44)
When instructing bosh to update the job (in this case only a config change), we indeed see the bosh agent asking monit to stop jobs, restart monit itself, and start jobs, and then we see the extra stop (at 12:33:26) before the bosh director ends up timing out and marking the canary as failed.
$ less /var/vcap/monit/monit.log
[UTC May 22 12:33:17] info : Awakened by User defined signal 1
[UTC May 22 12:33:17] info : Awakened by the SIGHUP signal
[UTC May 22 12:33:17] info : Reinitializing monit - Control file '/var/vcap/bosh/etc/monitrc'
[UTC May 22 12:33:17] info : Shutting down monit HTTP server
[UTC May 22 12:33:18] info : monit HTTP server stopped
[UTC May 22 12:33:18] info : Starting monit HTTP server at [127.0.0.1:2822]
[UTC May 22 12:33:18] info : monit HTTP server started
[UTC May 22 12:33:18] info : 'system_897cdb8d-f9f7-4bfa-a748-512489b676e0' Monit reloaded
[UTC May 22 12:33:23] info : start service 'consul_agent' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:23] info : Awakened by User defined signal 1
[UTC May 22 12:33:23] info : 'consul_agent' start: /var/vcap/jobs/consul_agent/bin/agent_ctl
[UTC May 22 12:33:23] info : start service 'nfs_mounter' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:23] info : start service 'metron_agent' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:23] info : start service 'cloud_controller_worker_1' on user request
[UTC May 22 12:33:23] info : monit daemon at 1050 awakened
[UTC May 22 12:33:24] info : 'consul_agent' start action done
[UTC May 22 12:33:24] info : 'nfs_mounter' start: /var/vcap/jobs/nfs_mounter/bin/nfs_mounter_ctl
[UTC May 22 12:33:24] info : 'cloud_controller_worker_1' start: /var/vcap/jobs/cloud_controller_worker/bin/cloud_controller_worker_ctl
[UTC May 22 12:33:25] info : 'cloud_controller_worker_1' start action done
[UTC May 22 12:33:25] info : 'metron_agent' start: /var/vcap/jobs/metron_agent/bin/metron_agent_ctl
[UTC May 22 12:33:26] info : 'metron_agent' start action done
[UTC May 22 12:33:26] info : 'cloud_controller_worker_1' stop: /var/vcap/jobs/cloud_controller_worker/bin/cloud_controller_worker_ctl
[UTC May 22 12:33:27] info : 'nfs_mounter' start action done
[UTC May 22 12:33:27] info : Awakened by User defined signal 1
There are no associated traces of the bosh agent asking for this extra stop:
$ less /var/vcap/bosh/log/current
2015-05-22_12:33:23.73606 [monitJobSupervisor] 2015/05/22 12:33:23 DEBUG - Starting service cloud_controller_worker_1
2015-05-22_12:33:23.73608 [http-client] 2015/05/22 12:33:23 DEBUG - Monit request: url='http://127.0.0.1:2822/cloud_controller_worker_1' body='action=start'
2015-05-22_12:33:23.73608 [attemptRetryStrategy] 2015/05/22 12:33:23 DEBUG - Making attempt #0
2015-05-22_12:33:23.73609 [clientRetryable] 2015/05/22 12:33:23 DEBUG - [requestID=52ede4f0-427d-4e65-6da1-d3b5c4b5cafd] Requesting (attempt=1): Request{ Method: 'POST', URL: 'http://127.0.0.1:2822/cloud_controller_worker_1' }
2015-05-22_12:33:23.73647 [clientRetryable] 2015/05/22 12:33:23 DEBUG - [requestID=52ede4f0-427d-4e65-6da1-d3b5c4b5cafd] Request succeeded (attempts=1), response: Response{ StatusCode: 200, Status: '200 OK'}
2015-05-22_12:33:23.73648 [MBus Handler] 2015/05/22 12:33:23 INFO - Responding
2015-05-22_12:33:23.73650 [MBus Handler] 2015/05/22 12:33:23 DEBUG - Payload
2015-05-22_12:33:23.73650 ********************
2015-05-22_12:33:23.73651 {"value":"started"}
2015-05-22_12:33:23.73651 ********************
2015-05-22_12:33:36.69397 [NATS Handler] 2015/05/22 12:33:36 DEBUG - Message Payload
2015-05-22_12:33:36.69397 ********************
2015-05-22_12:33:36.69397 {"job":"api_worker_z1","index":0,"job_state":"failing","vitals":{"cpu":{"sys":"6.5","user":"14.4","wait":"0.4"},"disk":{"ephemeral":{"inode_percent":"10","percent":"14"},"persistent":{"inode_percent":"36","percent":"48"},"system":{"inode_percent":"36","percent":"48"}},"load":["0.19","0.06","0.06"],"mem":{"kb":"81272","percent":"8"},"swap":{"kb":"0","percent":"0"}}}
This is reproducing systematically on our setup using bosh release 152 with stemcell bosh-vcloud-esxi-ubuntu-trusty-go_agent version 2889, and cf release 207 running stemcell 2889.
Enabling monit verbose logs discarded the theory of monit restarting the cc_ng jobs because of too much RAM usage or a failed HTTP health check (along with the short time window in which the extra stop is requested: ~15 s). I also discarded the possibility of multiple monit instances, or a PID inconsistency with the cc_ng process. I'm now suspecting either the bosh agent sending an extra stop request, or something in the cc_ng ctl scripts.
As a side question, can someone explain how the cc_ng ctl script works? I'm surprised by the following process tree, where ruby seems to call the ctl script. Is the CC spawning itself?
$ ps auxf --cols=2000 | less
[...]
vcap 8011 0.6 7.4 793864 299852 ? S<l May26 6:01 ruby /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/bin/cloud_controller -m -c /var/vcap/jobs/cloud_controller_ng/config/cloud_controller_ng.yml
root 8014 0.0 0.0 19596 1436 ? S< May26 0:00 \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root 8023 0.0 0.0 5924 1828 ? S< May26 0:00 | \_ tee -a /dev/fd/63
root 8037 0.0 0.0 19600 1696 ? S< May26 0:00 | | \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root 8061 0.0 0.0 5916 1924 ? S< May26 0:00 | | \_ logger -p user.info -t vcap.cloud_controller_ng_ctl.stdout
root 8024 0.0 0.0 7552 1788 ? S< May26 0:00 | \_ awk -W Interactive {lineWithDate="echo [`date +\"%Y-%m-%d %H:%M:%S%z\"`] \"" $0 "\""; system(lineWithDate) }
root 8015 0.0 0.0 19600 1440 ? S< May26 0:00 \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root 8021 0.0 0.0 5924 1832 ? S< May26 0:00 \_ tee -a /dev/fd/63
root 8033 0.0 0.0 19600 1696 ? S< May26 0:00 | \_ /bin/bash /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl start
root 8060 0.0 0.0 5912 1920 ? S< May26 0:00 | \_ logger -p user.error -t vcap.cloud_controller_ng_ctl.stderr
root 8022 0.0 0.0 7552 1748 ? S< May26 0:00 \_ awk -W Interactive {lineWithDate="echo [`date +\"%Y-%m-%d %H:%M:%S%z\"`] \"" $0 "\""; system(lineWithDate) }
I was wondering whether this could come from our setup running CF with a more recent stemcell version (2922) than what the cf release notes mention as the "tested configuration". Are the latest stemcells tested against the latest CF release? Is there any way to see which stemcell version the runtime team's pipelines are using? [1] seemed to accept env vars and [2] required logging in. I scanned through the bosh agent commit logs to spot something related, but without luck so far.
Thanks in advance for your help,
Guillaume.
[1] https://github.com/cloudfoundry/bosh-lite/blob/master/ci/ci-stemcell-bats.sh
[2] https://concourse.diego-ci.cf-app.com/
Setting Org Manager via API
Re: [vcap-dev] bosh create release --force
Hi, Thanks for the response. It seems to get stuck here:

./gradlew assemble --info
Downloading http://localhost:8585/gradle-2.0-bin.zip ..........
Unzipping /home/dhilip/.gradle/wrapper/dists/gradle-2.0-bin/9snioba15mo3vjvn9rteu43rt/gradle-2.0-bin.zip to /home/dhilip/.gradle/wrapper/dists/gradle-2.0-bin/9snioba15mo3vjvn9rteu43rt
Set executable permissions for: /home/dhilip/.gradle/wrapper/dists/gradle-2.0-bin/9snioba15mo3vjvn9rteu43rt/gradle-2.0/bin/gradle
Starting Build
Settings evaluated using settings file '/home/dhilip/workspace/cf-release/src/uaa/settings.gradle'.
Projects loaded. Root project using build file '/home/dhilip/workspace/cf-release/src/uaa/build.gradle'.
Included projects: [root project 'cloudfoundry-identity-parent', project ':cloudfoundry-identity-common', project ':cloudfoundry-identity-login', project ':cloudfoundry-identity-samples', project ':cloudfoundry-identity-scim', project ':cloudfoundry-identity-uaa', project ':cloudfoundry-identity-samples:cloudfoundry-identity-api', project ':cloudfoundry-identity-samples:cloudfoundry-identity-app', project ':cloudfoundry-identity-samples:cloudfoundry-identity-oauth-showcase']
Evaluating root project 'cloudfoundry-identity-parent' using build file '/home/dhilip/workspace/cf-release/src/uaa/build.gradle'.
Configuring > 0/9 projects > root project > Resolving dependencies ':classpath'

Is it trying to download something?

Regards, Dhilip

From: Filip Hanik [mailto:fhanik(a)pivotal.io] Sent: Wednesday, May 27, 2015 7:03 PM To: CF Developers Mailing List; Dhilip Kumar S Subject: Re: [vcap-dev] bosh create release --force

The script that is executing at the time is: https://github.com/cloudfoundry/cf-release/blob/master/packages/uaa/pre_packaging#L36
So my suggestion, to test if this works, is that you can do:
1. 'cd src/uaa'
2. ensure that you have a JDK 7 installed
3. run the command './gradlew assemble --info'
and this will tell us if the build process works on your machine. We're looking for the output:
BUILD SUCCESSFUL
Total time: 40.509 secs
Task timings:
579ms :cloudfoundry-identity-common:jar
7056ms :cloudfoundry-identity-common:javadoc
1981ms :cloudfoundry-identity-scim:compileJava
747ms :cloudfoundry-identity-login:compileJava
3800ms :cloudfoundry-identity-scim:javadoc
3141ms :cloudfoundry-identity-login:javadoc
3055ms :cloudfoundry-identity-uaa:war
1379ms :cloudfoundry-identity-samples:cloudfoundry-identity-api:javadoc
2176ms :cloudfoundry-identity-samples:cloudfoundry-identity-api:war
1443ms :cloudfoundry-identity-samples:cloudfoundry-identity-app:javadoc
2178ms :cloudfoundry-identity-samples:cloudfoundry-identity-app:war

On Wed, May 27, 2015 at 7:22 AM, Dhilip Kumar S <dhilip.kumar.s(a)huawei.com> wrote:
Hi All, While I was following the bosh release steps to deploy diego in a bosh-lite environment, it got stuck at the area below for hours. How do I debug this? Any clue would be great.
Building golang1.4... Using final version 'f57ddbc8d55d7a0f08775bf76bb6a27dc98c7ea7'
Building cloud_controller_ng... Using final version 'e20142a32939a531038ace16a3cbe3b8242987e9'
Building libpq... Using final version '49cc7477fcf9a3fef7a1f61e1494b32288587ed8'
Building nginx... Using final version 'c916c10937c83a8be507d3100133101eb403c826'
Building rtr... Using final version 'cd0d40ad56132a4d1cbc19223078f8ff96727d22'
Building doppler... Using final version '2135434c91dc5e6f4aab6406b03ac02f9c2207fa'
Building uaa... No artifact found for uaa
Generating...
Pre-packaging...
Regards, Dhilip

From: Matthew Sykes [mailto:matthew.sykes(a)gmail.com] Sent: Friday, May 22, 2015 3:32 PM To: vcap-dev(a)cloudfoundry.org Subject: Re: [vcap-dev] container cannot communicate with the host

Warden explicitly disables access to the container host. If you move up to a more recent level of cf-release, that behavior is configurable with the `allow_host_access` flag. When that flag is true, this line is skipped: https://github.com/cloudfoundry/warden/blob/4f1e5c049a12199fdd1f29cde15c9a786bd5fac8/warden/root/linux/net.sh#L128
At the level you're at, that rule is always specified, so you'd have to manually change it: https://github.com/cloudfoundry/warden/blob/17f34e2d7ff1994856a61961210a82e83f24ecac/warden/root/linux/net.sh#L124

On Fri, May 22, 2015 at 3:17 AM, Youzhi Zhu <zhuyouzhi03(a)gmail.com> wrote:
Hi all, I have an app A and a service B. Service B is running on the dea server (ip 10.0.0.254), and app A needs to connect to service B through tcp. It works normally in my LAN, but when I push A to cf, it cannot connect to B. I then executed bin/wsh to get into the container and ping the host ip, and it's unreachable, as below:
root(a)18mkbd9n808:~# ping 10.0.0.254
PING 10.0.0.254 (10.0.0.254) 56(84) bytes of data.
From 10.0.0.254 icmp_seq=1 Destination Port Unreachable
From 10.0.0.254 icmp_seq=2 Destination Port Unreachable
^C
--- 10.0.0.254 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1002ms
But if I ping another host in the LAN, it can be reached!!!
root(a)18mkbd9n808:~# ping 10.0.0.253
PING 10.0.0.253 (10.0.0.253) 56(84) bytes of data.
64 bytes from 10.0.0.253: icmp_seq=1 ttl=63 time=1.60 ms
64 bytes from 10.0.0.253: icmp_seq=2 ttl=63 time=0.421 ms
^C
--- 10.0.0.253 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.421/1.013/1.606/0.593 ms
It's weird!!! My cf-release is cf-175 and I have only one dea server. Has anyone met this situation before? Thanks!

-- Matthew Sykes matthew.sykes(a)gmail.com
Custom Login Server with UAA 2.0+
Prior to the consolidation of uaa and the login server in uaa release 2.0, we were running our own login server to handle auth to our platform. We simply reduced the instance count of the bundled CF login server to 0 and put our own in place, which snagged the login subdomain. This worked just fine; our solution implemented all the endpoints needed for login. We're now upgrading to a newer release with uaa 2.0+ and having difficulties. The uaa registrar hardcodes grabbing the login subdomains:
...
- login.<%= properties.domain %>
- '*.login.<%= properties.domain %>'
...
See: https://github.com/cloudfoundry/cf-release/blob/master/jobs/uaa/templates/cf-registrar.config.yml.erb
This prevents us from taking over login. We locally removed those list items and our custom login server does continue to work. We have some questions about the right approach going forward, though. Are uaa and the login server going to continue to merge, to the point where we can no longer take over the login subdomain? Will this strategy no longer be feasible? What's the right answer for non-LDAP/SAML environments if the uaa project's roadmap makes this replacement impossible? If our current solution will continue to work for the foreseeable future, would the uaa team be amenable to a pull request making the URI values configurable, so we can continue to take over the login subdomain? -Matt Cholick
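For context, a minimal manifest sketch of the old approach Matt describes, assuming the bundled login server job is named login_z1 (the name and surrounding keys are illustrative assumptions, not taken from cf-release):

jobs:
- name: login_z1
  instances: 0   # bundled CF login server scaled to zero; a custom login
                 # application registers login.<domain> in its place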
Re: Multiple Availability Zone
I updated my bosh (using bosh-init), enabling ignore_server_availability_zone, but it still failed when I deployed my cf. Any suggestions?
openstack: &openstack
  auth_url: http://137.172.74.78:5000/v2.0 # <--- Replace with OpenStack Identity API endpoint
  tenant: cf # <--- Replace with OpenStack tenant name
  username: cf-admin # <--- Replace with OpenStack username
  api_key: passw0rd # <--- Replace with OpenStack password
  default_key_name: cf-keypair
  default_security_groups: [default, bosh]
  ignore_server_availability_zone: true
Error message from the deployment of cf:
Started updating job etcd_z1 > etcd_z1/0 (canary). Failed: OpenStack API Bad Request (Invalid input received: Availability zone 'cloud-cf-az2' is invalid). Check task debug log for details. (00:00:19)
Error 100: OpenStack API Bad Request (Invalid input received: Availability zone 'cloud-cf-az2' is invalid). Check task debug log for details.
I checked the API request on the first compute node (/var/log/cinder/api.log):
2015-05-27 16:28:40.652 32174 DEBUG cinder.api.v1.volumes [req-4df6ac85-e986-438a-a953-5a2190ec5f62 8b0d5a75bd9c4539ba7fa64e5669c6c8 48a0898a9c4944f1b321da699ca4c37a - - -] Create volume request body: {u'volume': {'scheduler_hints': {}, u'availability_zone': u'cloud-cf-az2', u'display_name': u'volume-36f9a2eb-8bc9-4f27-9530-34c9d24fa881', u'display_description': u'', u'size': 10}} create /usr/lib/python2.6/site-packages/cinder/api/v1/volumes.py:316
Attached my cf deployment file for reference: cf-deployment-single-az.yml <http://cf-dev.70369.x6.nabble.com/file/n206/cf-deployment-single-az.yml>
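For readers hitting the same error, a sketch of the two places availability zones typically show up in this kind of setup (the layouts and names below are assumptions for illustration, not a verified fix for the problem above): ignore_server_availability_zone is an OpenStack CPI property of the director itself, so it has to sit in the director's (bosh-init) manifest and the director has to be redeployed for it to take effect, while the nova availability zone that cinder is rejecting comes from the CF deployment manifest's resource pools:

# Director (bosh-init) manifest, assumed layout:
properties:
  openstack:
    ignore_server_availability_zone: true  # create disks in cinder's default AZ
                                           # instead of the server's nova AZ

# CF deployment manifest, assumed resource pool name:
resource_pools:
- name: small_z2
  cloud_properties:
    availability_zone: cloud-cf-az2        # nova AZ reported in the error above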