BOSH deployment times out pinging agent after 600 seconds (s390x platform)


R M
 

Hi there,

- Using OpenStack Rocky on s390x

I built a Xenial stemcell and ported BOSH to the s390x platform. For the most part it seems to work. However, the deployment times out during the "Compiling packages" stage, and I have not been able to figure out why. The Director VM and the compilation VM seem to be able to ping each other, and the compilation VM is also posting NATS messages. Please let me know where else I could look for clues.

Here are my steps:

/====================================/
BOSH_LOG_LEVEL=info bosh -e bosh-1 -d redis-deployment deploy manifest.yml
....

Task 61 | 19:19:07 | Preparing deployment: Preparing deployment (00:00:00)
Task 61 | 19:19:07 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 61 | 19:19:07 | Compiling packages: redis/b8455f0a7551849b841b759fc44d2c1eff79331b (00:10:27)
                   L Error: Timed out pinging to c3080c5f-d79b-48f8-a117-8629cf4b6c3c after 600 seconds
Task 61 | 19:29:34 | Error: Timed out pinging to c3080c5f-d79b-48f8-a117-8629cf4b6c3c after 600 seconds
 
Task 61 Started  Fri Jun  7 19:19:07 UTC 2019
Task 61 Finished Fri Jun  7 19:29:34 UTC 2019
Task 61 Duration 00:10:27
Task 61 error
[CLI] 2019/06/07 15:29:34 ERROR - Updating deployment: Expected task '61' to succeed but state is 'error'
/====================================/
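As a quick sanity check on the timing in the task output above, the sketch below computes the task duration from the Started/Finished lines and compares it with the 600-second ping timeout. The function names and the `PING_TIMEOUT_SECS` constant are hypothetical helpers for illustration. Reading the remainder as time spent before the ping wait began (for example, creating the compilation VM) is an assumption, not something the task output states.

```python
from datetime import datetime

# Timestamps copied from the "Task 61 Started/Finished" lines above.
STARTED = "Fri Jun  7 19:19:07 UTC 2019"
FINISHED = "Fri Jun  7 19:29:34 UTC 2019"
PING_TIMEOUT_SECS = 600  # taken from the "after 600 seconds" error message


def parse_task_time(stamp: str) -> datetime:
    # Split on whitespace to collapse the padded day ("Jun  7"),
    # then drop the weekday before parsing the rest.
    fields = stamp.split()
    return datetime.strptime(" ".join(fields[1:]), "%b %d %H:%M:%S UTC %Y")


def task_duration_secs(started: str, finished: str) -> int:
    delta = parse_task_time(finished) - parse_task_time(started)
    return int(delta.total_seconds())


duration = task_duration_secs(STARTED, FINISHED)
# Prints the total duration and the part not covered by the ping timeout.
print(duration, duration - PING_TIMEOUT_SECS)
```

The result matches the reported duration of 00:10:27, which suggests that nearly the whole task was spent waiting for the agent to answer.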

The BOSH Agent logs on my compilation VM, from /var/vcap/bosh/log/current, don't seem to indicate any issues:

/====================================/
...
2019-06-07_18:02:48.15948 [File System] 2019/06/07 18:02:48 DEBUG - Checking if file exists /var/vcap/bosh/spec.json
2019-06-07_18:02:48.15948 [File System] 2019/06/07 18:02:48 DEBUG - Stat '/var/vcap/bosh/spec.json'
2019-06-07_18:02:48.15949 [File System] 2019/06/07 18:02:48 DEBUG - Writing /var/vcap/instance/health.json
2019-06-07_18:02:48.15949 [File System] 2019/06/07 18:02:48 DEBUG - Making dir /var/vcap/instance with perm 0777
2019-06-07_18:02:48.15949 [File System] 2019/06/07 18:02:48 DEBUG - Write content
2019-06-07_18:02:48.15949 ********************
2019-06-07_18:02:48.15950 {"state":"running"}
2019-06-07_18:02:48.15950 ********************
2019-06-07_18:02:48.15950 [NATS Handler] 2019/06/07 18:02:48 INFO - Sending hm message 'heartbeat'
2019-06-07_18:02:48.15950 [NATS Handler] 2019/06/07 18:02:48 DEBUG - Message Payload
2019-06-07_18:02:48.15951 ********************
2019-06-07_18:02:48.15951 {"deployment":"","job":null,"index":null,"job_state":"running","vitals":{"cpu":{"sys":"0.0","user":"0.0","wait":"0.0"},"disk":{"ephemeral":{"inode_percent":"0","percent":"0"},"system":{"inode_percent":"28","percent":"42"}},"load":["0.00","0.00","0.00"],"mem":{"kb":"156596","percent":"2"},"swap":{"kb":"0","percent":"0"},"uptime":{"secs":289}},"node_id":""}
2019-06-07_18:02:48.15952 ********************
2019-06-07_18:02:48.15952 [Cmd Runner] 2019/06/07 18:02:48 DEBUG - Running command 'route -n'
2019-06-07_18:02:48.16047 [Cmd Runner] 2019/06/07 18:02:48 DEBUG - Successful: true (0)
 
/====================================/
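To make the heartbeat payload in the log above easier to read, here is a minimal sketch that decodes it and pulls out a few fields. The payload string is copied verbatim from the log; the helper name `summarize_heartbeat` is hypothetical. Note that a heartbeat being sent only shows that the agent can publish to NATS. By itself it does not show that the agent received and answered the Director's ping.

```python
import json

# Heartbeat payload copied verbatim from the agent log above.
HEARTBEAT = (
    '{"deployment":"","job":null,"index":null,"job_state":"running",'
    '"vitals":{"cpu":{"sys":"0.0","user":"0.0","wait":"0.0"},'
    '"disk":{"ephemeral":{"inode_percent":"0","percent":"0"},'
    '"system":{"inode_percent":"28","percent":"42"}},'
    '"load":["0.00","0.00","0.00"],'
    '"mem":{"kb":"156596","percent":"2"},'
    '"swap":{"kb":"0","percent":"0"},'
    '"uptime":{"secs":289}},"node_id":""}'
)


def summarize_heartbeat(raw: str) -> dict:
    """Return the fields most relevant to checking agent state."""
    msg = json.loads(raw)
    return {
        "job_state": msg["job_state"],
        "deployment": msg["deployment"],
        "job": msg["job"],
        "uptime_secs": msg["vitals"]["uptime"]["secs"],
    }


print(summarize_heartbeat(HEARTBEAT))
```

In this payload the agent reports `job_state` as "running" with an uptime of 289 seconds, while `deployment` is empty and `job` is null.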

I have also removed the "ephemeral" option from my OpenStack flavor, as per
https://github.com/cloudfoundry/bosh/issues/2044

Any tips for debugging this further would be greatly appreciated.

Thanks.
