How should I debug a blobstore error?


Eyal Shalev
 

Hello, I am trying to deploy Cloud Foundry on OpenStack.

Running "bosh deploy" gives me the following error:
Started updating job consul_z1 > consul_z1/0 (d248a6da-476a-4fb1-bc08-01ce23c94eb4) (canary). Done (00:01:03)
Started updating job ha_proxy_z1 > ha_proxy_z1/0 (f416b7c5-24d3-4d58-bbf9-066740971065) (canary). Done (00:00:43)
Started updating job nats_z1 > nats_z1/0 (074d00e8-96b3-4b7c-af1a-347549ff51dd) (canary). Done (00:00:44)
Started updating job etcd_z1 > etcd_z1/0 (ef27b75d-e0d0-4a95-8ec1-4b0c9371fa93) (canary). Done (00:01:23)
Started updating job stats_z1 > stats_z1/0 (20c1f5e9-3b7a-4c84-a940-3628cadf942c) (canary). Done (00:00:47)
Started updating job blobstore_z1 > blobstore_z1/0 (28f3cf80-502b-4e83-8dff-ba3477894a13) (canary). Done (00:01:20)
Started updating job postgres_z1 > postgres_z1/0 (601262d5-97d2-436c-b67d-1e63680f9dc6) (canary). Done (00:01:16)
Started updating job uaa_z1 > uaa_z1/0 (9792327b-05c6-4bd8-8434-8a48fa364a91) (canary). Done (00:00:52)
Started updating job api_z1 > api_z1/0 (03f63cee-dfc1-4a93-9ddc-1bf129eb7396) (canary). Failed: 'api_z1/0 (03f63cee-dfc1-4a93-9ddc-1bf129eb7396)' is not running after update. Review logs for failed jobs: cloud_controller_ng, cloud_controller_worker_local_1, cloud_controller_worker_local_2, nginx_cc, cloud_controller_worker_1 (00:11:11)

Error 400007: 'api_z1/0 (03f63cee-dfc1-4a93-9ddc-1bf129eb7396)' is not running after update. Review logs for failed jobs: cloud_controller_ng, cloud_controller_worker_local_1, cloud_controller_worker_local_2, nginx_cc, cloud_controller_worker_1



"bosh vms" command seem to show blobstore be up and running:
+---------------------------------------------------------------------------+---------+-----+-----------+---------------+
| VM | State | AZ | VM Type | IPs |
+---------------------------------------------------------------------------+---------+-----+-----------+---------------+
| api_z1/0 (03f63cee-dfc1-4a93-9ddc-1bf129eb7396) | failing | n/a | large_z1 | 192.168.10.54 |
| blobstore_z1/0 (28f3cf80-502b-4e83-8dff-ba3477894a13) | running | n/a | medium_z1 | 192.168.10.52 |
| consul_z1/0 (d248a6da-476a-4fb1-bc08-01ce23c94eb4) | running | n/a | small_z1 | 192.168.10.76 |
| doppler_z1/0 (ec0232f4-e736-4bbf-9fe4-b35b3d0c673d) | running | n/a | medium_z1 | 192.168.10.57 |
| etcd_z1/0 (ef27b75d-e0d0-4a95-8ec1-4b0c9371fa93) | running | n/a | medium_z1 | 192.168.10.72 |
| ha_proxy_z1/0 (f416b7c5-24d3-4d58-bbf9-066740971065) | running | n/a | router_z1 | 192.168.10.64 |
| | | | | XX.XX.XX.XX |
| hm9000_z1/0 (c50fbd8e-1c8f-4b96-a1c8-a427fdabba79) | running | n/a | medium_z1 | 192.168.10.55 |
| loggregator_trafficcontroller_z1/0 (e187f9c1-dc5d-4d9b-850d-ca6118513228) | running | n/a | small_z1 | 192.168.10.58 |
| nats_z1/0 (074d00e8-96b3-4b7c-af1a-347549ff51dd) | running | n/a | medium_z1 | 192.168.10.66 |
| postgres_z1/0 (601262d5-97d2-436c-b67d-1e63680f9dc6) | running | n/a | medium_z1 | 192.168.10.68 |
| router_z1/0 (4da988b9-5cbb-4a6f-b082-5c4ed25e6a7c) | running | n/a | router_z1 | 192.168.10.69 |
| runner_z1/0 (4bce5874-2edc-4a0f-a074-0a136933de65) | running | n/a | runner_z1 | 192.168.10.56 |
| stats_z1/0 (20c1f5e9-3b7a-4c84-a940-3628cadf942c) | running | n/a | small_z1 | 192.168.10.51 |
| uaa_z1/0 (9792327b-05c6-4bd8-8434-8a48fa364a91) | running | n/a | medium_z1 | 192.168.10.53 |
+---------------------------------------------------------------------------+---------+-----+-----------+---------------+

The vcap job logs also seem to indicate that the blobstore node is healthy.

However, when I ssh to the api node, I see problems:
"monit summary" yields:
Process 'cloud_controller_ng' not monitored
Process 'cloud_controller_worker_local_1' not monitored
Process 'cloud_controller_worker_local_2' not monitored
Process 'nginx_cc' not monitored
Process 'cloud_controller_migration' running
Process 'cloud_controller_clock' running
Process 'cloud_controller_worker_1' running
Process 'metron_agent' running
Process 'statsd-injector' running
Process 'route_registrar' running
System 'system_localhost' running


and /var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng_ctl.log looks like this:
[2016-06-24 12:32:51+0000] Checking for blobstore availability
[2016-06-24 12:33:38+0000] ------------ STARTING cloud_controller_ng_ctl at Fri Jun 24 12:33:38 UTC 2016 --------------
[2016-06-24 12:33:38+0000] Checking for blobstore availability
[2016-06-24 12:34:24+0000] ------------ STARTING cloud_controller_ng_ctl at Fri Jun 24 12:34:24 UTC 2016 --------------
[2016-06-24 12:34:24+0000] Checking for blobstore availability
[2016-06-24 12:34:41+0000] Blobstore is not available

Other jobs on the same node are failing as well, all with similar messages both in stderr and stdout logs.
However, I do not see any additional information, such as an IP address or URL that I can test.

Additionally, on the blobstore node, both "monit" and the vcap job logs indicate that it is healthy.

How should I go about debugging this error? (I am looking for more verbose information behind the "Blobstore is not available" message.)

I do not mind tearing down this deployment and starting over with more verbose flags to expose the problem.
It seems to occur consistently.

PS: my stemcell is bosh-stemcell-3232.6-openstack-kvm-ubuntu-trusty-go_agent.tgz, and I have not done anything special to configure the blobstore in cf-stub besides adding TLS CAs and certs. Also, I could find no instructions on how to add these certs to cf-stub, so I just added some self-generated keys & certs.
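
A minimal sketch of connectivity checks that can narrow down "Blobstore is not available", run from the failing api VM. The internal hostname, the blobstore IP, and the job paths below come from the default webdav blobstore setup and the "bosh vms" output above; everything else is an assumption, not something verified in this thread:

bosh ssh api_z1 0

# 1. Does the internal blobstore name resolve? (consul-provided DNS is assumed)
nslookup blobstore.service.cf.internal

# 2. Can the api VM reach the blobstore at all? Try the internal name and the
#    raw IP reported by `bosh vms` (192.168.10.52 here).
curl -vk https://blobstore.service.cf.internal/
curl -v http://192.168.10.52/

# 3. The ctl script is what prints "Blobstore is not available"; reading it
#    shows the exact URL and credentials it probes.
grep -n -i blobstore /var/vcap/jobs/cloud_controller_ng/bin/*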


Amit Kumar Gupta
 

Hi Eyal,

No need to tear things down. By any chance, did you generate your manifest
using a stub? Do you have something that looks like this in your manifest?

https://github.com/cloudfoundry/cf-release/blob/3290ed8/spec/fixtures/openstack/cf-stub.yml#L155-L187

Best,
Amit
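
To see exactly which block that link points at, on a local cf-release checkout (the commit SHA and line range come from the URL above; the checkout location is an assumption):

cd cf-release
git show 3290ed8:spec/fixtures/openstack/cf-stub.yml | sed -n '155,187p'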




Pramod Mandagere
 

Hi Amit,
We have the exact same issue, and our cf-stub looks exactly like the one you
pointed out. We are stuck on this issue.
Regards,
Pramod




Amit Kumar Gupta
 

Those stubs are out of date. Our OpenStack stubs may remain out of date
for a bit while we find more resources to support automated testing of the
stubs against an OpenStack environment. In the meantime, you should be
fine to just delete those offending lines. Please try that and let me know
if it works.

Best,
Amit
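
A rough sketch of that workflow with the BOSH v1 CLI; the generate script and argument order are the usual cf-release ones, and the file names are assumptions:

# after deleting the offending blobstore lines from your cf-stub.yml
./scripts/generate_deployment_manifest openstack cf-stub.yml > cf-deployment.yml
bosh deployment cf-deployment.yml
bosh deploy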




Eyal Shalev
 

Hello Amit,
I used the latest stub from Git. As I was working last week, I could swear I saw the stub change under me...
I guess that explains it.
Should I remove the lines in the highlighted block?

Thanks,
Eyal


Amit Kumar Gupta
 

Yes sir, please try removing them and regenerating your manifest before
redeploying.

Cheers,
Amit



Eyal Shalev
 

Hello Amit,
I have removed the lines that you have marked.

Now I am getting a different error...
Process 'consul_agent' running
Process 'cloud_controller_ng' Connection failed
Process 'cloud_controller_worker_local_1' not monitored
Process 'cloud_controller_worker_local_2' not monitored
Process 'nginx_cc' initializing
Process 'cloud_controller_migration' running
Process 'metron_agent' running
Process 'statsd-injector' running
Process 'route_registrar' running
System 'system_localhost' running


The blobstore is available, but the process still fails:
[2016-06-24 20:13:17+0000] ------------ STARTING cloud_controller_worker_ctl at Fri Jun 24 20:13:17 UTC 2016 --------------
[2016-06-24 20:13:17+0000] Removing stale pidfile
[2016-06-24 20:13:17+0000] Checking for blobstore availability
[2016-06-24 20:13:17+0000] Blobstore is available
[2016-06-24 20:13:18+0000] Buildpacks installation failed


and also:
[2016-06-24 20:33:16+0000] ------------ STARTING cloud_controller_ng_ctl at Fri Jun 24 20:33:16 UTC 2016 --------------
[2016-06-24 20:33:16+0000] Checking for blobstore availability
[2016-06-24 20:33:16+0000] Blobstore is available
[2016-06-24 20:33:38+0000] Killing /var/vcap/sys/run/cloud_controller_ng/cloud_controller_ng.pid: 28368
[2016-06-24 20:33:38+0000] Stopped
[2016-06-24 20:33:39+0000] ------------ STARTING cloud_controller_ng_ctl at Fri Jun 24 20:33:39 UTC 2016 --------------
[2016-06-24 20:33:39+0000] Checking for blobstore availability
[2016-06-24 20:33:39+0000] Blobstore is available
[2016-06-24 20:34:02+0000] Killing /var/vcap/sys/run/cloud_controller_ng/cloud_controller_ng.pid: 28818
[2016-06-24 20:34:03+0000] Stopped

Which brings me to another question:
Do you have a stable old release of CF for OpenStack? I don't mind downgrading if the new releases are unstable. If that is not possible, can you post a valid cf-stub.yml that does not need any manual removal of invalid lines? (That way I have a reference for what a tried and tested stub should look like.)

Thanks a lot for your help,
Eyal
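
For reference, final cf-release versions are tagged in Git, so pinning to an older release is a checkout away. A sketch; the tag below (v237) is only an example of a mid-2016 final release, and ./scripts/update is the cf-release helper that syncs submodules:

git clone https://github.com/cloudfoundry/cf-release.git
cd cf-release
git tag                  # final releases are tagged, e.g. v237
git checkout v237        # example only; pick whichever final release you want
./scripts/update         # sync submodules for that version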


Ronak Banka
 

Hello Eyal,

Can you paste the domain properties from your final manifest (the system
domain and app domain parts)?

Thanks
Ronak



Eyal Shalev
 

As I was not planning on exposing this deployment to the outside, I was planning on using HAProxy through its floating IP.

I did not set a domain because I have not yet interfaced with my DNS server, so I just left the domain as "DOMAIN".

When looking at the manifest generated from the stub, I see the following two lines repeated several times:
private_endpoint: https://blobstore.service.cf.internal
public_endpoint: http://blobstore.DOMAIN

Could it be that the api node is trying to access the blobstore through the HAProxy node?

Is there any way to avoid setting up a domain for CF? (This is just a lab experiment.)
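
One way to tell which of those two endpoints is actually the problem is to try both from the api VM. A sketch; the hostnames come from the two manifest lines quoted above, and which endpoint the availability check really uses is not claimed here:

bosh ssh api_z1 0
# internal endpoint, resolved via consul DNS, does not go through HAProxy
curl -vk https://blobstore.service.cf.internal/
# public endpoint, which goes through the router; with the literal "DOMAIN"
# placeholder this will not even resolve, which is itself a strong hint
curl -v http://blobstore.DOMAIN/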


Ronak Banka
 

You can use xip.io for a temporary domain; try generating the manifest again and
redeploying.
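
For context, xip.io is a public wildcard DNS service: any name of the form <something>.<ip>.xip.io resolves to <ip>, so no DNS server of your own is needed. A quick check from any workstation (the IP is the HAProxy floating IP from later in this thread):

dig +short api.10.60.18.186.xip.io
# 10.60.18.186
dig +short login.sys.10.60.18.186.xip.io
# 10.60.18.186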



Tom Sherrod <tom.sherrod@...>
 

I got past the api_z1 failure by adding:
- name: consul_agent
  release: cf

to the api_z1 section.

I'm now confused by having to remove the other lines. I will need to test
this out.

Tom



Eyal Shalev
 

Hello Ronak,
I have used xip.io.
It does not seem to have helped; I got much the same result.
The API node does not complete the job.

Looking at the vcap logs, things look similar to the above:
cloud_controller_ng_ctl reports that the blobstore is accessible, but it is still stuck in an endless restart loop.

Looking at the generated cf-deployment.yml as you requested, I find the following lines (I have obfuscated the public IP):
properties:
  acceptance_tests: null
  app_domains:
  - APPDOMAIN
  ...
  packages:
    app_package_directory_key: 10.60.18.186.xip.io-cc-packages
    blobstore_type: webdav
  ...
  public_endpoint: http://blobstore.10.60.18.186.xip.io



Also, in my original cf-stub I configured the properties as follows:
properties:
  domain: 10.60.18.186.xip.io
  system_domain: SYSDOMAIN
  system_domain_organization: EYALDOMAIN
  app_domains:
  - APPDOMAIN

So unless I was using it improperly, adding the xip.io domain did not seem to help.


Eyal Shalev
 

Following up on my previously posted config,
I found the following message in /var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.log (the error log was empty...):

The problem is that I don't understand how "APPDOMAIN" violates the rules in the error message.

{"timestamp":1467040322.8311825,"message":"Encountered error: Error for shared domain name APPDOMAIN: name can contain multiple subdomains, each having only alphanumeric characters and hyphens of up to 63 characters, see RFC 1035.\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/sequel-4.29.0/lib/sequel/model/base.rb:1543:in `save'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/models/runtime/shared_domain.rb:35:in `block in find_or_create'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/sequel-4.29.0/lib/sequel/database/transactions.rb:134:in `_transaction'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/sequel-4.29.0/lib/sequel/database/transactions.rb:108:in `block in transaction'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/sequel-4.29.0/lib/sequel/database/connecting.rb:249:in `block in synchron
ize'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/sequel-4.29.0/lib/sequel/connection_pool/threaded.rb:103:in `hold'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/sequel-4.29.0/lib/sequel/database/connecting.rb:249:in `synchronize'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/sequel-4.29.0/lib/sequel/database/transactions.rb:97:in `transaction'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/models/runtime/shared_domain.rb:27:in `find_or_create'\n/var/vcap/data/packages/cloud_controller_ng/f87cb49aa2cd87792cb9c2211a79e1542502be4d.1-08d000452ce2b287c00720587f20dc62976a73b6/cloud_controller_ng/lib/cloud_controller/seeds.rb:57:in `block in create_seed_domains'\n/var/vcap/data/packages/cloud_controller_ng/f87cb49aa2cd87792cb9c2211a79e1542502be4d.1-08d000452ce2b287c00720587f20dc62976a73b6/cloud_controller_ng/lib/cloud_controller/seeds.rb:56
:in `
each'\n/var/vcap/data/packages/cloud_controller_ng/f87cb49aa2cd87792cb9c2211a79e1542502be4d.1-08d000452ce2b287c00720587f20dc62976a73b6/cloud_controller_ng/lib/cloud_controller/seeds.rb:56:in `create_seed_domains'\n/var/vcap/data/packages/cloud_controller_ng/f87cb49aa2cd87792cb9c2211a79e1542502be4d.1-08d000452ce2b287c00720587f20dc62976a73b6/cloud_controller_ng/lib/cloud_controller/seeds.rb:9:in `write_seed_data'\n/var/vcap/data/packages/cloud_controller_ng/f87cb49aa2cd87792cb9c2211a79e1542502be4d.1-08d000452ce2b287c00720587f20dc62976a73b6/cloud_controller_ng/lib/cloud_controller/runner.rb:88:in `block in run!'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:193:in `run_machine'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/2.3.0/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:193:in `run'\n/var/vcap/data/packages/cloud_controller_ng/f87cb49aa2cd87792cb9c2211a79e1542502be4d
.1-08d000452ce2b287c00720587f20dc62976a73b6/cloud_controller_ng/lib/cloud_controller/runner.rb:82:in `run!'\n/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/bin/cloud_controller:8:in `<main>'","log_level":"error","source":"cc.runner","data":{},"thread_id":47219093041420,"fiber_id":47219133477120,"process_id":27911,"file":"/var/vcap/data/packages/cloud_controller_ng/f87cb49aa2cd87792cb9c2211a79e1542502be4d.1-08d000452ce2b287c00720587f20dc62976a73b6/cloud_controller_ng/lib/cloud_controller/runner.rb","lineno":102,"method":"rescue in block in run!"}


Amit Kumar Gupta
 

Please try replacing all occurrences of "SYSTEM_DOMAIN" in your manifest
with "sys.10.60.18.186.xip.io" and all instances of "APP_DOMAIN" with
"apps.10.60.18.186.xip.io".
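
A one-liner sketch of that replacement; note the stub pasted earlier used the literal placeholders SYSDOMAIN and APPDOMAIN, so adjust the patterns to whatever your stub actually contains (the file name is an assumption):

sed -i 's/SYSDOMAIN/sys.10.60.18.186.xip.io/g; s/APPDOMAIN/apps.10.60.18.186.xip.io/g' cf-stub.yml
# then regenerate the manifest and redeploy as before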



Eyal Shalev
 

Can I replace it in the manifest stub and rerun generate, or do I need to replace it in the generated manifest?


Amit Kumar Gupta
 

You can replace it in the stub and rerun generate.



Eyal Shalev
 

That works, but now I cannot connect the cf client.
I am getting a 404.
It does not explicitly say so in the docs, so I am assuming that the API endpoint is
https://api.domain_for_haproxy_node. Is this correct?

My client is not accessing CF from within the security groups (an OpenStack limitation in the deployment that I use). As such, I only opened ports 80, 443, 4443 & 2222 in the firewall. [Internally, all TCP traffic is enabled.]

These are the commands that I ran (see the 404):

bosh vms
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Acting as user 'admin' on 'my-bosh'
Deployment 'ENVIRONMENT'

Director task 33

Task 33 done

+---------------------------------------------------------------------------+---------+-----+-----------+---------------+
| VM | State | AZ | VM Type | IPs |
+---------------------------------------------------------------------------+---------+-----+-----------+---------------+
| api_worker_z1/0 (e9f91b0e-ad01-4053-975f-47715023b4cb) | running | n/a | small_z1 | 192.168.10.56 |
| api_z1/0 (34bf56c5-5bcc-496c-859d-c56a917a8901) | running | n/a | large_z1 | 192.168.10.54 |
| blobstore_z1/0 (4f12e375-1003-4a66-ac8b-a5eb5571f920) | running | n/a | medium_z1 | 192.168.10.52 |
| clock_global/0 (f099a159-9ae2-4d92-b88b-d0d55fdd5f3e) | running | n/a | medium_z1 | 192.168.10.55 |
| consul_z1/0 (ff08d8b8-fbba-474c-9640-a03577acf586) | running | n/a | small_z1 | 192.168.10.76 |
| doppler_z1/0 (437a1ab7-b6b8-4ae2-be0f-cd75b62b8228) | running | n/a | medium_z1 | 192.168.10.59 |
| etcd_z1/0 (a2527fc7-3e3e-489c-8ea0-cd3a443f1c7d) | running | n/a | medium_z1 | 192.168.10.72 |
| ha_proxy_z1/0 (e4fd4fdd-8d5e-4e85-90e5-6774f277c4a8) | running | n/a | router_z1 | 192.168.10.64 |
| | | | | 10.60.18.186 |
| hm9000_z1/0 (14d70eac-2687-4961-99f7-3f3f8f4e55c8) | running | n/a | medium_z1 | 192.168.10.57 |
| loggregator_trafficcontroller_z1/0 (ea59e739-15f9-4149-8d1a-cca3b1fbfb55) | running | n/a | small_z1 | 192.168.10.60 |
| nats_z1/0 (7a31a162-e5a3-4b29-82f8-fe76897d587d) | running | n/a | medium_z1 | 192.168.10.66 |
| postgres_z1/0 (8ed03c6f-8ea5-403a-bbb5-f1bc091b96b4) | running | n/a | medium_z1 | 192.168.10.68 |
| router_z1/0 (9749bd15-48f3-4b7d-a82e-d0aac34554fe) | running | n/a | router_z1 | 192.168.10.69 |
| runner_z1/0 (54e20fba-3185-45d2-9f3b-8da00de495f5) | running | n/a | runner_z1 | 192.168.10.58 |
| stats_z1/0 (9a107f21-7eb3-4df8-ac7b-13bd1d709e1f) | running | n/a | small_z1 | 192.168.10.51 |
| uaa_z1/0 (9b58319d-451a-4726-a4bf-e9431a467f47) | running | n/a | medium_z1 | 192.168.10.53 |
+---------------------------------------------------------------------------+---------+-----+-----------+---------------+

VMs total: 16


cf api api.10.60.18.186.xip.io --skip-ssl-validation
Setting api endpoint to api.10.60.18.186.xip.io...
OK


API endpoint: https://api.10.60.18.186.xip.io (API version: 2.56.0)
Not logged in. Use 'cf login' to log in.



cf -v login --skip-ssl-validation
API endpoint: https://api.10.60.18.186.xip.io

REQUEST: [2016-06-27T21:36:51Z]
GET /v2/info HTTP/1.1
Host: api.10.60.18.186.xip.io
Accept: application/json
Content-Type: application/json
User-Agent: go-cli 6.19.0+b29b4e0 / linux



RESPONSE: [2016-06-27T21:36:51Z]
HTTP/1.1 200 OK
Content-Length: 580
Content-Type: application/json;charset=utf-8
Date: Mon, 27 Jun 2016 21:36:57 GMT
Server: nginx
X-Content-Type-Options: nosniff
X-Vcap-Request-Id: 9170d9a4-3dce-45aa-7576-377a6d9c2940
X-Vcap-Request-Id: 9170d9a4-3dce-45aa-7576-377a6d9c2940::a4533964-ae04-4aa1-93ef-4626f4336187

{"name":"","build":"","support":"http://support.cloudfoundry.com","version":0,"description":"","authorization_endpoint":"http://login.sysdomain.10.60.18.186.xip.io","token_endpoint":"https://uaa.10.60.18.186.xip.io","min_cli_version":null,"min_recommended_cli_version":null,"api_version":"2.56.0","app_ssh_endpoint":"ssh.sysdomain.10.60.18.186.xip.io:2222","app_ssh_host_key_fingerprint":null,"app_ssh_oauth_client":"ssh-proxy","logging_endpoint":"wss://loggregator.sysdomain.10.60.18.186.xip.io:4443","doppler_logging_endpoint":"wss://doppler.sysdomain.10.60.18.186.xip.io:4443"}

REQUEST: [2016-06-27T21:36:52Z]
GET /login HTTP/1.1
Host: login.sysdomain.10.60.18.186.xip.io
Accept: application/json
Content-Type: application/json
User-Agent: go-cli 6.19.0+b29b4e0 / linux



RESPONSE: [2016-06-27T21:36:52Z]
HTTP/1.1 404 Not Found
Content-Length: 87
Content-Type: text/plain; charset=utf-8
Date: Mon, 27 Jun 2016 21:36:57 GMT
X-Cf-Routererror: unknown_route
X-Content-Type-Options: nosniff
X-Vcap-Request-Id: 4419650f-6a06-4b9d-5475-0f2790934fd5

404 Not Found: Requested route ('login.sysdomain.10.60.18.186.xip.io') does not exist.



API endpoint: https://api.10.60.18.186.xip.io (API version: 2.56.0)
Not logged in. Use 'cf login' to log in.
FAILED
Server error, status code: 404, error code: , message:


Ronak Banka
 

Eyal,

In your final manifest, can you check what the properties under
route_registrar for the uaa job are?

https://github.com/cloudfoundry/cf-release/blob/master/templates/cf.yml#L194
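
A quick way to pull that block out of the generated manifest for inspection; the file name is an assumption, and the context sizes may need adjusting:

grep -n -A 30 'route_registrar:' cf-deployment.yml
# or, to see the whole uaa_z1 job in context:
grep -n -A 60 'name: uaa_z1' cf-deployment.yml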



Eyal Shalev
 

It seems to have generated two of them, even though I am not using two zones.
Also, I see port 8080 mentioned in there. As mentioned before, port 8080 is only opened internally in the security group (between the CF nodes). Should it also be opened up for the client? (What are the ports the client needs to function? I have identified ports 80 and 443.)

Here is the config:

- instances: 1
  name: uaa_z1
  networks:
  - name: cf1
  properties:
    consul:
      agent:
        services:
          uaa: {}
    metron_agent:
      zone: z1
    route_registrar:
      routes:
      - health_check:
          name: uaa-healthcheck
          script_path: /var/vcap/jobs/uaa/bin/health_check
        name: uaa
        port: 8080
        registration_interval: 4s
        tags:
          component: uaa
        uris:
        - uaa.10.60.18.186.xip.io
        - '*.uaa.10.60.18.186.xip.io'
        - login.10.60.18.186.xip.io
        - '*.login.10.60.18.186.xip.io'
    uaa:
      proxy:
        servers:
        - 192.168.10.69
  resource_pool: medium_z1
  templates:
  - name: uaa
    release: cf
  - name: metron_agent
    release: cf
  - name: consul_agent
    release: cf
  - name: route_registrar
    release: cf
  - name: statsd-injector
    release: cf
  update: {}
- instances: 0
  name: uaa_z2
  networks:
  - name: cf2
  properties:
    consul:
      agent:
        services:
          uaa: {}
    metron_agent:
      zone: z2
    route_registrar:
      routes:
      - health_check:
          name: uaa-healthcheck
          script_path: /var/vcap/jobs/uaa/bin/health_check
        name: uaa
        port: 8080
        registration_interval: 4s
        tags:
          component: uaa
        uris:
        - uaa.10.60.18.186.xip.io
        - '*.uaa.10.60.18.186.xip.io'
        - login.10.60.18.186.xip.io
        - '*.login.10.60.18.186.xip.io'
    uaa:
      proxy:
        servers:
        - 192.168.10.69
  resource_pool: medium_z2
  templates:
  - name: uaa
    release: cf
  - name: metron_agent
    release: cf
  - name: consul_agent
    release: cf
  - name: route_registrar
    release: cf
  - name: statsd-injector
    release: cf
  update: {}


Ronak Banka
 

Regarding z2: the number of instances is 0, so it is the same as having just one zone.

As for the login route error, the route registrar on the uaa job is adding
login.10.60.18.186.xip.io to the routes, but according to the cloud controller
config the login endpoint is http://login.sysdomain.10.60.18.186.xip.io, which
is why you are not able to log in.

Can you check the route registrar merge, and use the system domain
instead of the domain?
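
One way to see that mismatch directly in the generated manifest, as a sketch (the file name is an assumption):

grep -n 'login\.' cf-deployment.yml
# the uaa route_registrar uris should advertise login.<system domain>
# (login.sysdomain.10.60.18.186.xip.io here) to match the cloud controller's
# authorization endpoint; a bare login.10.60.18.186.xip.io entry is the mismatch
# described above.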
