`api_z1/0' is not running after update to CF v231
Wayne Ha <wayne.h.ha@...>
Sorry for the late response. I didn't get a chance to try again until today. It turned out that setting require_https to false lets me run "cf login".
Properties
uaa
  + require_https: false

Meta
No changes

Deploying
---------
Director task 10

Started preparing deployment
  Started preparing deployment > Binding deployment. Done (00:00:00)
  Started preparing deployment > Binding releases. Done (00:00:00)
  Started preparing deployment > Binding existing deployment. Done (00:00:00)
  Started preparing deployment > Binding resource pools. Done (00:00:00)
  Started preparing deployment > Binding stemcells. Done (00:00:00)
  Started preparing deployment > Binding templates. Done (00:00:00)
  Started preparing deployment > Binding properties. Done (00:00:00)
  Started preparing deployment > Binding unallocated VMs. Done (00:00:00)
  Started preparing deployment > Binding instance networks. Done (00:00:00)
Done preparing deployment (00:00:00)

Started preparing package compilation > Finding packages to compile. Done (00:00:00)
Started preparing dns > Binding DNS. Done (00:00:00)
Started preparing configuration > Binding configuration. Done (00:00:03)
Started updating job uaa_z1 > uaa_z1/0. Done (00:01:09)
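For reference, the deploy diff above corresponds to a manifest fragment roughly like the following. This is a sketch based only on the diff shown; the exact nesting inside bosh-lite-v231.yml may differ.

```yaml
properties:
  uaa:
    # Workaround from this thread: stop UAA from forcing HTTPS redirects,
    # which avoids the `cf login` redirect loop on bosh-lite.
    # Re-enable HTTPS enforcement in any non-local environment.
    require_https: false
```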
Filip Hanik
Follow the steps in https://github.com/cloudfoundry/bosh-lite/blob/master/bin/provision_cf for a working bosh-lite. The key is to start with a VirtualBox image; the Fusion/Workstation images are not up to date.

Filip

On Monday, March 7, 2016, Wayne Ha <wayne.h.ha(a)gmail.com> wrote:
Amit Kumar Gupta
Hey Wayne,
What command did you run to generate your manifest?

Amit

On Mon, Mar 7, 2016 at 8:03 PM, Wayne Ha <wayne.h.ha(a)gmail.com> wrote:
Wayne Ha <wayne.h.ha@...>
Filip,
I am running with the latest CF v231. Initially, I ran it with an older stemcell and got:

  `api_z1/0' is not running after update

After running with the latest stemcell, I got a successful deployment but failed to log in with the error:

  Error performing request: Get https://login.bosh-lite.com/login: stopped after 1 redirect

Could there be some configuration that I missed? Note that I am using the default bosh-lite-v231.yml.

Thanks,
Filip Hanik
Error performing request: Get https://login.bosh-lite.com/login: stopped after 1 redirect

That's the error right there: it's a redirect loop. What version of CF is this? Upgrade to the latest.

On Monday, March 7, 2016, sridhar vennela <sridhar.vennela(a)gmail.com> wrote:
sridhar vennela
Hi Wayne,
I am not seeing any errors in the above. To capture UAA errors, it is better to open two terminals: in one, run tail -f uaa.log; in the other, try cf login -a api.bosh-lite.com -u admin -p admin --skip-ssl-validation.

Thank you,
Sridhar
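The two-terminal suggestion above can also be done in one terminal by backgrounding the tail. A sketch, assuming the uaa.log path seen later in this thread; the capture file name is invented:

```shell
#!/bin/sh
# Capture only the UAA log lines written while the login attempt runs.
UAA_LOG=${UAA_LOG:-/var/vcap/sys/log/uaa/uaa.log}
CAPTURE=${CAPTURE:-/tmp/uaa-during-login.log}

# -n 0 skips existing content; -F keeps retrying if the log rotates.
tail -n 0 -F "$UAA_LOG" > "$CAPTURE" 2>/dev/null &
TAIL_PID=$!
sleep 1   # give tail a moment to open the log

# The login attempt from the thread; `|| true` so a failed login still
# lets us inspect whatever UAA logged.
cf login -a api.bosh-lite.com -u admin -p admin --skip-ssl-validation || true

kill "$TAIL_PID" 2>/dev/null
echo "UAA activity during login captured in $CAPTURE"
```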
Wayne Ha <wayne.h.ha@...>
Zach,
After using the latest stemcell, I got a successful deployment. But after that, cf login fails:

vagrant(a)agent-id-bosh-0:~$ cf login -a api.bosh-lite.com -u admin -p admin
API endpoint: api.bosh-lite.com
FAILED
Invalid SSL Cert for api.bosh-lite.com
TIP: Use 'cf login --skip-ssl-validation' to continue with an insecure API endpoint

vagrant(a)agent-id-bosh-0:~$ cf login -a api.bosh-lite.com -u admin -p admin --skip-ssl-validation
API endpoint: api.bosh-lite.com
FAILED
Error performing request: Get https://login.bosh-lite.com/login: stopped after 1 redirect
API endpoint: https://api.bosh-lite.com (API version: 2.51.0)
Not logged in. Use 'cf login' to log in.

I saw the following in uaa.log:

root(a)d142fabc-f823-40df-b9ea-97d306bf7209:/var/vcap/sys/log/uaa# grep -A9 -i error uaa.log | cut -c 65-650
DEBUG --- AntPathRequestMatcher: Checking match of request : '/login'; against '/error'
DEBUG --- AntPathRequestMatcher: Checking match of request : '/login'; against '/email_sent'
DEBUG --- AntPathRequestMatcher: Checking match of request : '/login'; against '/create_account*'
DEBUG --- AntPathRequestMatcher: Checking match of request : '/login'; against '/accounts/email_sent'
DEBUG --- AntPathRequestMatcher: Checking match of request : '/login'; against '/invalid_request'
DEBUG --- AntPathRequestMatcher: Checking match of request : '/login'; against '/saml_error'
DEBUG --- UaaRequestMatcher: [loginAuthenticateRequestMatcher] Checking match of request : '/login'; '/authenticate' with parameters={} and headers {Authorization=[bearer ], accept=[application/json]}
DEBUG --- AntPathRequestMatcher: Checking match of request : '/login'; against '/authenticate/**'
DEBUG --- UaaRequestMatcher: [loginAuthorizeRequestMatcher] Checking match of request : '/login'; '/oauth/authorize' with parameters={source=login} and headers {accept=[application/json]}
DEBUG --- UaaRequestMatcher: [loginTokenRequestMatcher] Checking match of request : '/login'; '/oauth/token' with parameters={source=login, grant_type=password, add_new=} and headers {Authorization=[bearer ], accept=[application/json]}
DEBUG --- UaaRequestMatcher: [loginAuthorizeRequestMatcherOld] Checking match of request : '/login'; '/oauth/authorize' with parameters={login={} and headers {accept=[application/json]}
DEBUG --- AntPathRequestMatcher: Checking match of request : '/login'; against '/password_*'
DEBUG --- AntPathRequestMatcher: Checking match of request : '/login'; against '/email_*'
DEBUG --- AntPathRequestMatcher: Checking match of request : '/login'; against '/oauth/token/revoke/**'
DEBUG --- UaaRequestMatcher: [passcodeTokenMatcher] Checking match of request : '/login'; '/oauth/token' with parameters={grant_type=password, passcode=} and headers {accept=[application/json, application/x-www-form-urlencoded]}

But I don't know what the above means.

Thanks,
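The AntPathRequestMatcher/UaaRequestMatcher lines above appear to be routine Spring Security DEBUG traces rather than errors. A sketch to strip that noise and surface lines that look like real problems; the log path and filter terms are assumptions, and the helper name is invented:

```shell
#!/bin/sh
# uaa_errors LOGFILE — drop request-matcher DEBUG chatter, keep likely errors.
uaa_errors() {
  grep -v -E 'AntPathRequestMatcher|UaaRequestMatcher' "$1" \
    | grep -i -E 'error|exception|fail' \
    | tail -n 20
}

uaa_errors "${UAA_LOG:-/var/vcap/sys/log/uaa/uaa.log}"
```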
Wayne Ha <wayne.h.ha@...>
Zach,
Thanks for the hints. You are right, I am not using the latest stemcell:

vagrant(a)agent-id-bosh-0:~$ bosh stemcells
+---------------------------------------------+---------+--------------------------------------+
| Name                                        | Version | CID                                  |
+---------------------------------------------+---------+--------------------------------------+
| bosh-warden-boshlite-ubuntu-trusty-go_agent | 389*    | cb6ee28c-a703-4a7e-581b-b63be2302e3d |

I will try the stemcell you recommended to see if it helps.

Thanks,
Zach Robinson
Wayne,
Can you verify that you are using the latest bosh-lite stemcell, 3147? Older stemcells are known to have issues with consul, which many of the CF components use for service discovery. The latest bosh-lite stemcells can be found at http://bosh.io (just search for "lite"). See this similar issue: https://github.com/cloudfoundry/cf-release/issues/919

-Zach
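A tiny sketch of the check Zach describes: compare the Version column from `bosh stemcells` (389* above; bosh marks the in-use stemcell with a trailing `*`) against 3147. The helper name is invented:

```shell
#!/bin/sh
# stemcell_ok VERSION — report whether the bosh-lite stemcell is at least 3147.
stemcell_ok() {
  v=${1%\*}    # strip the trailing '*' bosh prints for the in-use stemcell
  if [ "$v" -ge 3147 ]; then
    echo "stemcell $v is recent enough"
  else
    echo "stemcell $v predates 3147: upgrade (search for 'lite' on bosh.io)"
  fi
}

stemcell_ok '389*'
```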
Amit Kumar Gupta
As of cf v231, CC has switched from using NFS to WebDav as the default blobstore. There are more details in the release notes: https://github.com/cloudfoundry/cf-release/releases/tag/v231. I don't know off-hand how to debug the issue you're seeing, but I will reach out to some folks with more knowledge of Cloud Controller.

Best,
Amit

On Mon, Mar 7, 2016 at 8:48 AM, Wayne Ha <wayne.h.ha(a)gmail.com> wrote:
Wayne Ha <wayne.h.ha@...>
Kayode,
I am using the default bosh-lite-v231.yml file, and the instances for the nfs server are set to 0:

vagrant(a)agent-id-bosh-0:~$ egrep -i "name:.*nfs|instances" bosh-lite-v231.yml.1603041454
etc...
- instances: 0
- instances: 0
- instances: 0
  name: nfs_z1
- name: debian_nfs_server
- instances: 1
- instances: 1
- instances: 1
etc...

So it is not running.

Thanks,
Paul Bakare
Wayne, is the nfs_server-partition running?
On Mon, Mar 7, 2016 at 1:43 AM, Wayne Ha <wayne.h.ha(a)gmail.com> wrote:
Wayne Ha <wayne.h.ha@...>
I checked that the blobstore is running:
root(a)e83575d2-dfbf-4f7c-97ee-5112560fa137:/var/vcap/sys/log# monit summary
The Monit daemon 5.2.4 uptime: 4h 14m
Process 'consul_agent'     running
Process 'metron_agent'     running
Process 'blobstore_nginx'  running
Process 'route_registrar'  running
System 'system_e83575d2-dfbf-4f7c-97ee-5112560fa137' running

But there are thousands of errors saying "DopplerForwarder: can't forward message, loggregator client pool is empty":

root(a)e83575d2-dfbf-4f7c-97ee-5112560fa137:/var/vcap/sys/log# find . -name "*.log" | xargs grep -i error | cut -c 73-500 | sort -u
,"process_id":246,"source":"metron","log_level":"error","message":"DopplerForwarder: can't forward message","data":{"error":"loggregator client pool is empty"},"file":"/var/vcap/data/compile/metron_agent/loggregator/src/metron/writers/dopplerforwarder/doppler_forwarder.go","line":104,"method":"metron/writers/dopplerforwarder.(*DopplerForwarder).networkWrite"}

Not sure what is wrong.
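The find/grep pipeline above can be wrapped so the same scan works on any log directory and also shows how often each distinct error repeats. A sketch; the helper name and directory default are assumptions:

```shell
#!/bin/sh
# unique_errors DIR — list each distinct error line found under DIR, with counts,
# most frequent first.
unique_errors() {
  find "$1" -name '*.log' -exec grep -h -i error {} + 2>/dev/null \
    | sort | uniq -c | sort -rn
}

unique_errors "${LOG_DIR:-/var/vcap/sys/log}"
```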
Wayne Ha <wayne.h.ha@...>
Amit,
Thanks for letting me know I might have looked at the wrong log files. I saw the following in the cloud_controller log files:

root(a)7a1f2221-c31a-494b-b16c-d4a97c16c9ab:/var/vcap/sys/log# tail ./cloud_controller_ng_ctl.log
[2016-03-06 22:40:28+0000] ------------ STARTING cloud_controller_ng_ctl at Sun Mar 6 22:40:28 UTC 2016 --------------
[2016-03-06 22:40:28+0000] Checking for blobstore availability
[2016-03-06 22:41:03+0000] Blobstore is not available

root(a)7a1f2221-c31a-494b-b16c-d4a97c16c9ab:/var/vcap/sys/log# tail ./cloud_controller_worker_ctl.log
[2016-03-06 22:41:13+0000] Killing /var/vcap/sys/run/cloud_controller_ng/cloud_controller_worker_2.pid: 12145
[2016-03-06 22:41:13+0000] .Stopped
[2016-03-06 22:41:36+0000] Blobstore is not available
[2016-03-06 22:41:48+0000] ------------ STARTING cloud_controller_worker_ctl at Sun Mar 6 22:41:48 UTC 2016 --------------
[2016-03-06 22:41:48+0000] Checking for blobstore availability
[2016-03-06 22:41:48+0000] Removing stale pidfile...

So maybe the cause is that the blobstore is not available?

Thanks,

On Sun, Mar 6, 2016 at 1:15 PM, Amit Gupta <agupta(a)pivotal.io> wrote:
Amit Kumar Gupta
The log lines saying "/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock is not found" are probably just a symptom of the problem, not the root cause. You're probably seeing those in the nginx logs? Cloud Controller is failing to start, hence it is not establishing a connection on the socket. You need to dig deeper into the failures in the logs in /var/vcap/sys/log/cloud_controller_ng.

On Sun, Mar 6, 2016 at 10:00 AM, sridhar vennela <sridhar.vennela(a)gmail.com> wrote:
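A sketch of the digging Amit suggests: pull recent failure-looking lines out of the Cloud Controller log directory. The directory is the one he names; the filter terms and helper name are assumptions:

```shell
#!/bin/sh
# cc_failures DIR — show the last failure-looking lines from Cloud Controller logs.
cc_failures() {
  grep -h -i -E 'error|fail|not available' "$1"/*.log 2>/dev/null | tail -n 20
}

cc_failures "${CC_LOG_DIR:-/var/vcap/sys/log/cloud_controller_ng}"
```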
sridhar vennela
Hi Wayne,
Looks like it. It is trying to connect to loggregator and failing, I guess: https://github.com/cloudfoundry/cloud_controller_ng/blob/master/app/controllers/runtime/syslog_drain_urls_controller.rb

Thank you,
Sridhar
Wayne Ha <wayne.h.ha@...>
Since it is complaining that /var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock is not found, I thought I would just touch that file. Now I get:

2016/03/06 17:14:11 [error] 18497#0: *5 connect() to unix:/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock failed (111: Connection refused) while connecting to upstream, client: <bosh director>, server: _, request: "GET /v2/syslog_drain_urls?batch_size=1000 HTTP/1.1", upstream: "http://unix:/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock:/v2/syslog_drain_urls?batch_size=1000", host: "api.bosh-lite.com"

Maybe there is a network configuration problem in my environment?
Wayne Ha <wayne.h.ha@...>
Sridhar,
Thanks for your response. I have tried your suggestion and it doesn't help. But I might have misled you with the consul error. That error only got logged once, at the beginning. So, like you said, maybe the VM was not able to join the consul server before it came up. But after that, the following error keeps logging every minute or so:

2016/03/06 17:04:41 [crit] 11480#0: *4 connect() to unix:/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock failed (2: No such file or directory) while connecting to upstream, server: _, request: "GET /v2/syslog_drain_urls?batch_size=1000 HTTP/1.1", upstream: "http://unix:/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock:/v2/syslog_drain_urls?batch_size=1000", host: "api.bosh-lite.com"

So maybe the above is the cause of the problem?

Thanks,

On Sun, Mar 6, 2016 at 12:51 AM, sridhar vennela <sridhar.vennela(a)gmail.com> wrote:
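Given nginx's complaint, a quick way to tell the two failure modes apart (file missing vs. connection refused) is to check whether Cloud Controller ever created its unix socket. The socket path is taken from the error above; the helper name is invented:

```shell
#!/bin/sh
# cc_socket_status PATH — report whether the Cloud Controller unix socket exists.
cc_socket_status() {
  if [ -S "$1" ]; then
    echo "socket present: nginx should be able to proxy to Cloud Controller"
  else
    echo "socket missing: Cloud Controller has not started (check its ctl logs)"
  fi
}

cc_socket_status "${CC_SOCK:-/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock}"
```

Note that touching the path with `touch` creates a plain file, not a socket, which is why that produced "Connection refused" rather than a fix.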
sridhar vennela
Hi Wayne,
Somehow the VM is not able to join the consul server. You can try the steps below:

ps -ef | grep consul
kill <consul-server-pid>
monit restart <consul-job>

Thank you,
Sridhar
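Sridhar's three steps can be sketched as a small script with a dry-run guard so the commands can be reviewed before running them on the VM. Assumptions: the job name consul_agent comes from a monit summary elsewhere in this thread, and pkill replaces the manual ps/kill pair:

```shell
#!/bin/sh
# restart_consul — kill the stuck consul agent and let monit restart it.
# With DRY_RUN set, the commands are printed instead of executed.
restart_consul() {
  run=${DRY_RUN:+echo}
  $run pkill -f 'consul agent'
  $run /var/vcap/bosh/bin/monit restart consul_agent
}

DRY_RUN=1 restart_consul
```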
Wayne Ha <wayne.h.ha@...>
Sridhar,
Thanks for your response. I found the VM is listening on port 8500:

root(a)c6822dcb-fb02-4858-ae5d-3ab45d593896:/var/vcap/sys/log# netstat -anp | grep LISTEN
tcp 0 0 127.0.0.1:8400    0.0.0.0:* LISTEN 18162/consul
tcp 0 0 127.0.0.1:8500    0.0.0.0:* LISTEN 18162/consul
tcp 0 0 127.0.0.1:53      0.0.0.0:* LISTEN 18162/consul
tcp 0 0 127.0.0.1:2822    0.0.0.0:* LISTEN 72/monit
tcp 0 0 0.0.0.0:22        0.0.0.0:* LISTEN 31/sshd
tcp 0 0 10.244.0.138:8301 0.0.0.0:* LISTEN 18162/consul

If I run "monit stop all" then it only listens on the following:

root(a)c6822dcb-fb02-4858-ae5d-3ab45d593896:/var/vcap/sys/log# netstat -anp | grep LISTEN
tcp 0 0 127.0.0.1:2822 0.0.0.0:* LISTEN 72/monit
tcp 0 0 0.0.0.0:22     0.0.0.0:* LISTEN 31/sshd

Note that 10.244.0.138 is the IP of this VM.

Thanks,

On Sat, Mar 5, 2016 at 12:58 AM, sridhar vennela <sridhar.vennela(a)gmail.com> wrote:
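The netstat check above can be condensed into a small filter that prints only the addresses the consul process is listening on. This is just a sketch that parses the output shown; the helper name is invented:

```shell
#!/bin/sh
# consul_listeners — read `netstat -anp` output on stdin and print the local
# addresses (column 4) of lines that are both LISTEN and owned by consul.
consul_listeners() {
  awk '/LISTEN/ && /consul/ {print $4}'
}

netstat -anp 2>/dev/null | consul_listeners
```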