BOSH deployment times out pinging agent after 600 seconds (s390x platform)
R M
I see the following messages in the NATS log and am not sure whether they indicate a problem.
Interestingly, the Director is getting authorization errors, but the agent VM is able to post to NATS:
/==================================/
[1] 2019/06/11 12:32:14.863072 [TRC] 192.168.20.10:35232 - cid:3 - ->> [CONNECT {"verbose":false,"pedantic":false,"lang":"ruby","version":"0.9.2","protocol":1,"ssl_required":true,"tls_required":true}]
[1] 2019/06/11 12:32:14.863095 [ERR] 192.168.20.10:35232 - cid:3 - Authorization Error <--- Director VM to NATS fails
[1] 2019/06/11 12:32:14.863101 [TRC] 192.168.20.10:35232 - cid:3 - <<- [-ERR Authorization Violation]
[1] 2019/06/11 12:32:14.863138 [DBG] 192.168.20.10:35232 - cid:3 - Client connection closed
[1] 2019/06/11 12:32:14.865154 [DBG] 192.168.20.10:35234 - cid:4 - Client connection created
....
[1] 2019/06/11 12:32:48.757875 [TRC] 192.168.20.3:42022 - cid:5 - ->> [CONNECT {"user":"nats","pass":"739rv2lksjyyfdhmqo49","verbose":true,"pedantic":true}]
[1] 2019/06/11 12:32:48.757925 [TRC] 192.168.20.3:42022 - cid:5 - <<- [OK] <--- Agent VM seems ok and is able to publish using PWD authentication (?)
[1] 2019/06/11 12:32:48.759034 [TRC] 192.168.20.3:42022 - cid:5 - ->> [PUB hm.agent.heartbeat.e01ca0da-d340-4f08-ad1a-8a29ebb8abc9 356]
[1] 2019/06/11 12:32:48.759056 [TRC] 192.168.20.3:42022 - cid:5 - ->> MSG_PAYLOAD: [{"deployment":"","job":null,"index":null,"job_state":"running","vitals":{"cpu":{"sys":"0.4","user":"0.3","wait":"0.0"},"disk":{"ephemeral":{"inode_percent":"0","percent":"0"},"system":{"inode_percent":"28","percent":"42"}},"load":["0.00","0.00","0.00"],"mem":{"kb":"152908","percent":"2"},"swap":{"kb":"0","percent":"0"},"uptime":{"secs":24}},"node_id":""}]
[1] 2019/06/11 12:32:48.759069 [TRC] 192.168.20.3:42022 - cid:5 - <<- [OK]
[1] 2019/06/11 12:32:48.759637 [TRC] 192.168.20.3:42022 - cid:5 - ->> [SUB agent.e01ca0da-d340-4f08-ad1a-8a29ebb8abc9 1]
/==================================/
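For a longer trace, the same comparison can be done per connection. Below is a minimal Python sketch that groups NATS trace lines by connection ID (cid) and records whether the connection hit an authorization error and which subjects it used. The function name summarize_nats_trace and the regular expression are my own assumptions, based only on the line layout visible in the excerpt above; other NATS server versions may log differently.

```python
import re

# Assumed layout, taken from the excerpt above:
# [1] 2019/06/11 12:32:14.863095 [ERR] 192.168.20.10:35232 - cid:3 - Authorization Error
LINE_RE = re.compile(
    r"\[(?P<level>[A-Z]{3})\]\s+(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)"
    r"\s+-\s+cid:(?P<cid>\d+)\s+-\s+(?P<msg>.*)$"
)
# Subject of a PUB or SUB protocol line, e.g. "[SUB agent.<id> 1]".
SUBJECT_RE = re.compile(r"\[(PUB|SUB) (\S+)")


def summarize_nats_trace(lines):
    """Return {cid: {"ip": str, "auth_error": bool, "subjects": [str]}}."""
    summary = {}
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # elided lines ("....") or anything in another format
        entry = summary.setdefault(
            match["cid"],
            {"ip": match["ip"], "auth_error": False, "subjects": []},
        )
        msg = match["msg"]
        if match["level"] == "ERR" and "Authorization" in msg:
            entry["auth_error"] = True
        subject = SUBJECT_RE.search(msg)
        if subject:
            entry["subjects"].append(subject.group(2))
    return summary
```

Fed the lines above, this should flag cid 3 (192.168.20.10) as rejected and list the heartbeat and agent subjects under cid 5 (192.168.20.3). It only summarizes what the log says; it does not explain why the Director's connection is refused.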
Do I need to specify NATS certs as part of my deployment request, for example: BOSH_LOG_LEVEL=info bosh -e bosh-1 -d redis-deployment deploy manifest.yml --certs?
R M
Hi there,
- Using OpenStack Rocky on s390x
I built a Xenial stemcell and ported BOSH to the s390x platform. For the most part it seems to work. However, the deployment times out during the "Compiling packages" stage, and I am unable to figure out why. The Director VM and the compilation VM seem to be able to ping each other, and the compilation VM is also posting NATS messages. Please let me know where else I could look for clues.
Here are my steps:
/====================================/
BOSH_LOG_LEVEL=info bosh -e bosh-1 -d redis-deployment deploy manifest.yml
....
Task 61 | 19:19:07 | Preparing deployment: Preparing deployment (00:00:00)
Task 61 | 19:19:07 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 61 | 19:19:07 | Compiling packages: redis/b8455f0a7551849b841b759fc44d2c1eff79331b (00:10:27)
L Error: Timed out pinging to c3080c5f-d79b-48f8-a117-8629cf4b6c3c after 600 seconds
Task 61 | 19:29:34 | Error: Timed out pinging to c3080c5f-d79b-48f8-a117-8629cf4b6c3c after 600 seconds
Task 61 Started Fri Jun 7 19:19:07 UTC 2019
Task 61 Finished Fri Jun 7 19:29:34 UTC 2019
Task 61 Duration 00:10:27
Task 61 error
[CLI] 2019/06/07 15:29:34 ERROR - Updating deployment: Expected task '61' to succeed but state is 'error'
/====================================/
My compilation VM agent logs from /var/vcap/bosh/log/current don't seem to indicate any issues:
/====================================/
...
2019-06-07_18:02:48.15948 [File System] 2019/06/07 18:02:48 DEBUG - Checking if file exists /var/vcap/bosh/spec.json
2019-06-07_18:02:48.15948 [File System] 2019/06/07 18:02:48 DEBUG - Stat '/var/vcap/bosh/spec.json'
2019-06-07_18:02:48.15949 [File System] 2019/06/07 18:02:48 DEBUG - Writing /var/vcap/instance/health.json
2019-06-07_18:02:48.15949 [File System] 2019/06/07 18:02:48 DEBUG - Making dir /var/vcap/instance with perm 0777
2019-06-07_18:02:48.15949 [File System] 2019/06/07 18:02:48 DEBUG - Write content
2019-06-07_18:02:48.15949 ********************
2019-06-07_18:02:48.15950 {"state":"running"}
2019-06-07_18:02:48.15950 ********************
2019-06-07_18:02:48.15950 [NATS Handler] 2019/06/07 18:02:48 INFO - Sending hm message 'heartbeat'
2019-06-07_18:02:48.15950 [NATS Handler] 2019/06/07 18:02:48 DEBUG - Message Payload
2019-06-07_18:02:48.15951 ********************
2019-06-07_18:02:48.15951 {"deployment":"","job":null,"index":null,"job_state":"running","vitals":{"cpu":{"sys":"0.0","user":"0.0","wait":"0.0"},"disk":{"ephemeral":{"inode_percent":"0","percent":"0"},"system":{"inode_percent":"28","percent":"42"}},"load":["0.00","0.00","0.00"],"mem":{"kb":"156596","percent":"2"},"swap":{"kb":"0","percent":"0"},"uptime":{"secs":289}},"node_id":""}
2019-06-07_18:02:48.15952 ********************
2019-06-07_18:02:48.15952 [Cmd Runner] 2019/06/07 18:02:48 DEBUG - Running command 'route -n'
2019-06-07_18:02:48.16047 [Cmd Runner] 2019/06/07 18:02:48 DEBUG - Successful: true (0)
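To check over the whole agent log whether heartbeats keep going out and what they report, the payloads can be pulled out of the log. This is a rough Python sketch, assuming each payload sits on its own line after a timestamp token, as in the excerpt above; the function name extract_heartbeats is hypothetical, and "vitals" is used as the marker only because it appears in the payload shown here.

```python
import json


def extract_heartbeats(log_lines):
    """Collect heartbeat payloads (JSON objects that carry a "vitals" key)."""
    heartbeats = []
    for line in log_lines:
        # Drop the leading "2019-06-07_18:02:48.15951 " timestamp token.
        _, _, rest = line.strip().partition(" ")
        if not rest.startswith("{"):
            continue  # separators, DEBUG/INFO lines, and so on
        try:
            payload = json.loads(rest)
        except json.JSONDecodeError:
            continue
        # Skip other JSON lines such as {"state":"running"}.
        if isinstance(payload, dict) and "vitals" in payload:
            heartbeats.append(payload)
    return heartbeats


def describe(heartbeat):
    """One-line summary of the fields visible in the excerpt."""
    return "deployment={!r} job={!r} job_state={!r} uptime_secs={!r}".format(
        heartbeat.get("deployment"),
        heartbeat.get("job"),
        heartbeat.get("job_state"),
        heartbeat.get("vitals", {}).get("uptime", {}).get("secs"),
    )
```

Calling extract_heartbeats on the lines of /var/vcap/bosh/log/current and printing describe(hb) for each result gives a quick timeline of what the agent reported.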
I have also removed the "ephemeral" option from my OpenStack flavor as per https://github.com/cloudfoundry/bosh/issues/2044
Any tips to debug this further would be greatly appreciated.
Thanks.