Re: Help Required : Stem Upgrade 3008 to 3155 is not working while deploying cloud foundry

Dmitriy Kalinin
 

Seems like monit was not happy on that machine. I would recommend taking a look at monit summary and /var/vcap/monit/monit.log to see what's going on. There is nothing special about those stemcell versions that would lead to this problem.
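
For illustration, the checks would look roughly like this on the affected VM (the job/index is taken from the error quoted below, and the paths are the standard stemcell locations; commands are a sketch, not from this thread):

$ bosh ssh cloud_controller 0              # or SSH to the VM through your jumpbox
$ sudo /var/vcap/bosh/bin/monit summary    # list monitored services and their current states
$ sudo tail -n 100 /var/vcap/monit/monit.log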

Sent from my iPhone

On Mar 14, 2016, at 10:03 PM, Arun Kumar Vinayagamoorthy -X (arvinaya - TECH MAHINDRA LIM at Cisco) <arvinaya(a)cisco.com> wrote:

Hi Team,

I am working with a customer where we need to upgrade the stemcell from version 3008 to 3155, so we are trying it out locally in our environment first, but it is failing.

Please find the details below.

Cloud Foundry was initially deployed with stemcell version 3008.

While trying to upgrade to stemcell 3155, I followed the commands below.
=====================
$bosh upload stemcell bosh-stemcell-3155-openstack-kvm-ubuntu-trusty-go_agent-raw.tgz
...
Stemcell uploaded and created.
Then I ran $bosh -n deploy, and it ends with the error below:

Started updating job cloud_controller > cloud_controller/0.

Failed: Action Failed get_task: Task ef79facb-3fbb-4b29-470b-ffbfdf860d3c result: Stopping Monitored Services: Stopping service cloud_controller_clock: Sending stop request to monit: Request failed, response: Response{ StatusCode: 503, Status: '503 Service Unavailable' } (00:05:22)

Error 450001: Action Failed get_task: Task ef79facb-3fbb-4b29-470b-ffbfdf860d3c result: Stopping Monitored Services: Stopping service cloud_controller_clock: Sending stop request to monit: Request failed, response: Response{ StatusCode: 503, Status: '503 Service Unavailable' }

=================================

Please guide me on what needs to be done to resolve this issue.

A fresh install works fine with stemcell 3008, but we need to upgrade the stemcell version in the customer environment.

Thanks in Advance

--- Arun

Help Required : Stem Upgrade 3008 to 3155 is not working while deploying cloud foundry

Arun Kumar Vinayagamoorthy -X (arvinaya - TECH MAHINDRA LIM@Cisco) <arvinaya at cisco.com...>
 

Hi Team,

I am working with a customer where we need to upgrade the stemcell from version 3008 to 3155, so we are trying it out locally in our environment first, but it is failing.

Please find the details below.

Cloud Foundry was initially deployed with stemcell version 3008.

While trying to upgrade to stemcell 3155, I followed the commands below.
=====================
$bosh upload stemcell bosh-stemcell-3155-openstack-kvm-ubuntu-trusty-go_agent-raw.tgz
...
Stemcell uploaded and created.
Then I ran $bosh -n deploy, and it ends with the error below:

Started updating job cloud_controller > cloud_controller/0.

Failed: Action Failed get_task: Task ef79facb-3fbb-4b29-470b-ffbfdf860d3c result: Stopping Monitored Services: Stopping service cloud_controller_clock: Sending stop request to monit: Request failed, response: Response{ StatusCode: 503, Status: '503 Service Unavailable' } (00:05:22)

Error 450001: Action Failed get_task: Task ef79facb-3fbb-4b29-470b-ffbfdf860d3c result: Stopping Monitored Services: Stopping service cloud_controller_clock: Sending stop request to monit: Request failed, response: Response{ StatusCode: 503, Status: '503 Service Unavailable' }

=================================

Please guide me on what needs to be done to resolve this issue.

A fresh install works fine with stemcell 3008, but we need to upgrade the stemcell version in the customer environment.

Thanks in Advance

--- Arun

Re: vm state change into 'unknown/unknown' after a while

于长江 <yuchangjiang at cmss.chinamobile.com...>
 

Then, after this problem, when I run 'bosh deploy' another problem comes up. How can I continue the deploy?


Deploying
---------


Director task 6259
Started preparing deployment
Started preparing deployment > Binding deployment. Done (00:00:00)
Started preparing deployment > Binding releases. Done (00:00:00)
Started preparing deployment > Binding existing deployment. Failed: VM `5fe98f76-5207-471b-9286-204e1f855076' is out of sync: expected to be a part of deployment `cf' but is actually a part of deployment `' (00:00:00)


Error 400003: VM `5fe98f76-5207-471b-9286-204e1f855076' is out of sync: expected to be a part of deployment `cf' but is actually a part of deployment `'


Task 6259 error
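
(One common recovery path for an out-of-sync VM record like this, offered only as a hedged sketch rather than something from this thread, is to let BOSH cloud check scan the deployment and resolve or delete the stale VM reference before deploying again:)

$ bosh cloudcheck    # a.k.a. bosh cck; prompts for how to handle each detected problem
$ bosh deploy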




于长江
15101057694


Original Message
From: 于长江 yuchangjiang(a)cmss.chinamobile.com
To: cf-dev(a)lists.cloudfoundry.org; cf-bosh(a)lists.cloudfoundry.org
Sent: Tuesday, March 15, 2016, 11:11
Subject: vm state change into 'unknown/unknown' after a while


Hello everybody,
when I run 'bosh deploy' all VMs look fine, but after a few hours some VMs' state turns into 'unknown/unknown', like this:


+------------------------------------+--------------------+-----------+--------------+
| VM                                 | State              | VM Type   | IPs          |
+------------------------------------+--------------------+-----------+--------------+
| unknown/unknown                    | unresponsive agent |           |              |
| unknown/unknown                    | running            |           |              |
| consul_z1/0                        | running            | small_z1  | 10.120.1.53  |
| doppler_z1/0                       | running            | medium_z1 | 10.120.1.105 |
| etcd_z1/0                          | running            | medium_z1 | 10.120.1.49  |
| ha_proxy_z1/0                      | running            | router_z1 | 10.120.1.41  |
|                                    |                    |           | 10.133.0.233 |
| hm9000_z1/0                        | running            | medium_z1 | 10.120.1.103 |
| loggregator_trafficcontroller_z1/0 | running            | small_z1  | 10.120.1.106 |
| nats_z1/0                          | running            | medium_z1 | 10.120.1.43  |
| nfs_z1/0                           | running            | medium_z1 | 10.120.1.44  |
| router_z1/0                        | running            | router_z1 | 10.120.1.46  |
| runner_z1/0                        | running            | runner_z1 | 10.120.1.104 |
| uaa_z1/0                           | running            | medium_z1 | 10.120.1.101 |
+------------------------------------+--------------------+-----------+--------------+


VMs total: 13


---------------------------------------------------------------------------------------------
Then I logged into the unknown VM; 'monit summary' displays no results, and there is nothing in the '/var/vcap/jobs/' directory. Logs below:


/var/vcap/bosh/etc/monitrc:8: Warning: include files not found '/var/vcap/monit/job/*.monitrc'
The Monit daemon 5.2.4 uptime: 1h 3m


System 'system_5ad45340-005b-4f74-9a63-524cbe627634’ running


# ls /var/vcap/jobs/
consul_agent dea_logging_agent dea_next metron_agent
---------------------------------------------------------------------------------------------
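
(For reference, checks along these lines on the affected VM can show why the job templates were never applied; the paths are the standard BOSH stemcell locations and the commands are only illustrative:)

$ sudo /var/vcap/bosh/bin/monit summary
$ ls /var/vcap/monit/job/                      # the per-job *.monitrc files the warning above says are missing
$ sudo tail -n 200 /var/vcap/bosh/log/current  # BOSH agent log on go_agent stemcells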


Has anyone met this problem?


---------------------------------------------------------------------------------------------
bosh version:v250
bosh-openstack-cpi-release: v4






于长江
15101057694

vm state change into 'unknown/unknown' after a while

于长江 <yuchangjiang at cmss.chinamobile.com...>
 

Hello everybody,
when I run 'bosh deploy' all VMs look fine, but after a few hours some VMs' state turns into 'unknown/unknown', like this:


+------------------------------------+--------------------+-----------+--------------+
| VM                                 | State              | VM Type   | IPs          |
+------------------------------------+--------------------+-----------+--------------+
| unknown/unknown                    | unresponsive agent |           |              |
| unknown/unknown                    | running            |           |              |
| consul_z1/0                        | running            | small_z1  | 10.120.1.53  |
| doppler_z1/0                       | running            | medium_z1 | 10.120.1.105 |
| etcd_z1/0                          | running            | medium_z1 | 10.120.1.49  |
| ha_proxy_z1/0                      | running            | router_z1 | 10.120.1.41  |
|                                    |                    |           | 10.133.0.233 |
| hm9000_z1/0                        | running            | medium_z1 | 10.120.1.103 |
| loggregator_trafficcontroller_z1/0 | running            | small_z1  | 10.120.1.106 |
| nats_z1/0                          | running            | medium_z1 | 10.120.1.43  |
| nfs_z1/0                           | running            | medium_z1 | 10.120.1.44  |
| router_z1/0                        | running            | router_z1 | 10.120.1.46  |
| runner_z1/0                        | running            | runner_z1 | 10.120.1.104 |
| uaa_z1/0                           | running            | medium_z1 | 10.120.1.101 |
+------------------------------------+--------------------+-----------+--------------+


VMs total: 13


---------------------------------------------------------------------------------------------
Then I logged into the unknown VM; 'monit summary' displays no results, and there is nothing in the '/var/vcap/jobs/' directory. Logs below:


/var/vcap/bosh/etc/monitrc:8: Warning: include files not found '/var/vcap/monit/job/*.monitrc'
The Monit daemon 5.2.4 uptime: 1h 3m


System 'system_5ad45340-005b-4f74-9a63-524cbe627634’ running


# ls /var/vcap/jobs/
consul_agent dea_logging_agent dea_next metron_agent
---------------------------------------------------------------------------------------------


Has anyone met this problem?


---------------------------------------------------------------------------------------------
bosh version:v250
bosh-openstack-cpi-release: v4






于长江
15101057694

Re: bosh deploy failed with error: undefined method `split'' for nil:NilClass''

Yunlong Yang
 

I was updating my CloudFoundry deployment.

bosh deploy failed with error: undefined method `split'' for nil:NilClass''

Yunlong Yang
 

Hi bosh experts,
I am facing a blocking issue. I was trying to increase the RAM and disk in my resource_pool and DEA.


[The console log]
Director task 712
Started unknown
Started unknown > Binding deployment. Done (00:00:00)

Started preparing deployment
Started preparing deployment > Binding releases. Done (00:00:00)
Started preparing deployment > Binding existing deployment. Done (00:00:00)
Started preparing deployment > Binding resource pools. Done (00:00:00)
Started preparing deployment > Binding stemcells. Done (00:00:00)
Started preparing deployment > Binding templates. Done (00:00:00)
Started preparing deployment > Binding properties. Done (00:00:00)
Started preparing deployment > Binding unallocated VMs. Done (00:00:00)
Started preparing deployment > Binding instance networks. Done (00:00:00)

Started preparing package compilation > Finding packages to compile. Done (00:00:00)

Started preparing dns > Binding DNS. Done (00:00:00)

Started creating bound missing vms > runner_z1/0. Failed: Unknown CPI error 'Unknown' with message 'undefined method `split' for nil:NilClass' (00:00:03)

Error 100: Unknown CPI error 'Unknown' with message 'undefined method `split' for nil:NilClass'


[The log from "bosh task 712 --debug"]
[2016-03-15 01:58:39 #10090] [create_missing_vm(runner_z1, 0/1)] ERROR -- DirectorJobRunner: error creating vm: Unknown CPI error 'Unknown' with message 'undefined method `split' for nil:NilClass'
I, [2016-03-15 01:58:39 #10090] [create_missing_vm(runner_z1, 0/1)] INFO -- DirectorJobRunner: Cleaning up the created VM due to an error: Unknown CPI error 'Unknown' with message 'undefined method `split' for nil:NilClass'
D, [2016-03-15 01:58:39 #10090] [] DEBUG -- DirectorJobRunner: Worker thread raised exception: Unknown CPI error 'Unknown' with message 'undefined method `split' for nil:NilClass' - /var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_cpi-1.3138.0/lib/cloud/external_cpi.rb:108:in `handle_error'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_cpi-1.3138.0/lib/cloud/external_cpi.rb:89:in `invoke_cpi_method'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_cpi-1.3138.0/lib/cloud/external_cpi.rb:51:in `create_vm'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3138.0/lib/bosh/director/vm_creator.rb:41:in `create'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3138.0/lib/bosh/director/resource_pool_updater.rb:51:in `create_missing_vm'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3138.0/lib/bosh/director/resource_pool_updater.rb:34:in `block (4 levels) in create_missing_vms'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_common-1.3138.0/lib/common/thread_formatter.rb:49:in `with_thread_name'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3138.0/lib/bosh/director/resource_pool_updater.rb:32:in `block (3 levels) in create_missing_vms'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3138.0/lib/bosh/director/event_log.rb:97:in `call'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3138.0/lib/bosh/director/event_log.rb:97:in `advance_and_track'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3138.0/lib/bosh/director/event_log.rb:50:in `track'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3138.0/lib/bosh/director/resource_pool_updater.rb:31:in `block (2 levels) in create_missing_vms'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_common-1.3138.0/lib/common/thread_pool.rb:77:in `call'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_common-1.3138.0/lib/common/thread_pool.rb:77:in `block (2 levels) in create_thread'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_common-1.3138.0/lib/common/thread_pool.rb:63:in `loop'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_common-1.3138.0/lib/common/thread_pool.rb:63:in `block in create_thread'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `call'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'

Re: Stemcell Upgrade 3008 to 3115 failing

Arunkumar Vinayagamoorthy
 

Typo in the thread:

I tried to upgrade to stemcell version 3155.

Thanks,
Arun

Stemcell Upgrade 3008 to 3115 failing

Arunkumar Vinayagamoorthy
 

Cloud Foundry was initially deployed with stemcell version 3008.

While trying to upgrade to stemcell 3155, I followed the commands below.
=====================
$bosh upload stemcell bosh-stemcell-3155-openstack-kvm-ubuntu-trusty-go_agent-raw.tgz
>>>>> Stemcell uploaded and created.

Then,
$bosh -n deploy

it ends with the error below:

Started updating job cloud_controller > cloud_controller/0. Failed: Action Failed get_task: Task ef79facb-3fbb-4b29-470b-ffbfdf860d3c result: Stopping Monitored Services: Stopping service cloud_controller_clock: Sending stop request to monit: Request failed, response: Response{ StatusCode: 503, Status: '503 Service Unavailable' } (00:05:22)

Error 450001: Action Failed get_task: Task ef79facb-3fbb-4b29-470b-ffbfdf860d3c result: Stopping Monitored Services: Stopping service cloud_controller_clock: Sending stop request to monit: Request failed, response: Response{ StatusCode: 503, Status: '503 Service Unavailable' }
=================================

Please guide me on what needs to be done to resolve this issue. A fresh install works fine with stemcell 3008, but we need to upgrade the stemcell version in the customer environment.

Thanks in Advance

--- Arun

Re: Proposing a change for a NTP issue in stemcell

Tomoe Sugihara
 

Hi Marco,

Thanks for the comment. Response inline:

On Wed, Mar 9, 2016 at 7:20 PM, Voelz, Marco <marco.voelz(a)sap.com> wrote:

Dear Tomoe,

you say that one of the use-cases is when port 123 is blocked by e.g. a
company firewall. Speaking for us at SAP, blocking this port is
*intentional*.
I understand it is natural that companies block privileged ports *intentionally*.


We have internal NTP servers and want everybody deploying a VM in the
infrastructure to actually use those servers.
I cannot speak for other companies, but switching to a different port
should not be the solution here.
I agree that is good, encouraged practice. However, we had a Cloud Foundry
deployment that pointed to a public NTP server, e.g. 0.pool.ntp.org,
and we tried to change it to an internal one because of the firewall issue,
but we couldn't propagate the change to all the VMs easily.

So, I thought using a non-privileged port would be a good change (illustrated below) because:
- the chances of the returning packets getting blocked are small, and yet
- it doesn't sacrifice much security, because ntpdate is not a daemon listening on
a well-known port; it just uses a port from the ephemeral range for the lifetime
of the command execution.
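
For illustration only, the change boils down to the difference below (the server name is just an example):

$ ntpdate 0.pool.ntp.org       # replies are addressed back to privileged source port 123 and may be dropped by strict firewalls
$ ntpdate -u 0.pool.ntp.org    # -u sends from an unprivileged ephemeral port, which firewalls typically allow back in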


Best,
Tomoe









Warm regards
Marco

On 09/03/16 03:19, "Tomoe Sugihara" <tsugihara(a)pivotal.io> wrote:

A friendly ping.

Could we decide on go/no-go for this change?
https://github.com/cloudfoundry/bosh/pull/1130

Best,
Tomoe



On Fri, Feb 19, 2016 at 12:31 AM, Tomoe Sugihara <tsugihara(a)pivotal.io>
wrote:

Hi bosh team,

I'd like to start a discussion about a change
<https://github.com/cloudfoundry/bosh/pull/1130> in the stemcell regarding an
NTP issue we have seen multiple times.
The symptom of the issue was that the system clocks of the VMs got out of sync
because they were unable to sync time with the NTP server.

Currently in bosh stemcells, there is a crontab entry for the root user to
run ntpdate every 15 minutes, which I *think* (I could be wrong as I'm
pretty new to bosh and CF) is coming from here:

https://github.com/cloudfoundry/bosh/tree/develop/stemcell_builder/stages/bosh_ntpdate

The root cause of the problem was that incoming packets (responses) from
the NTP server were blocked, as they are destined for port 123 and the
firewalls didn't allow that.
Since port 123 (<1024) is a privileged port, it is not surprising that
some firewalls would block that traffic. In fact, we have seen this
happen multiple times, and annoyingly, this problem is tricky and
time-consuming to track down.

So, I have submitted a pull request to use the -u option, which directs
ntpdate to use an unprivileged port so that returning packets won't be blocked
by those firewalls:
https://github.com/cloudfoundry/bosh/pull/1130

I would argue that this change would reduce that risk without introducing
any new risk, but again I'm new to this field and wanted to get feedback from
the community.

Comments appreciated, and hopefully the patch can be merged to get rid
of the problem.

Best,
Tomoe


Full-bosh deploy Test fail(ntp-server sample)

mbosh <jin-su.moon@...>
 

I finished deploying full bosh (known as multi-VM bosh) with micro-bosh on
OpenStack IaaS, but I hit an error when I deploy the ntp-server sample manifest.

Does anyone know this situation?
The compilation VM is created, but a few seconds later errors are raised.
I think full bosh isn't working well.
Can anyone help me?

root(a)sjsj:/home/ubuntu/bosh/deployment# bosh deploy
Acting as user 'admin' on deployment 'ntp-server' on 'bosh'
Getting deployment properties from director...
Please review all changes carefully

Deploying
---------
Are you sure you want to deploy? (type 'yes' to continue): yes

Director task 7
Started unknown
Started unknown > Binding deployment. Done (00:00:00)

Started preparing deployment
Started preparing deployment > Binding releases. Done (00:00:00)
Started preparing deployment > Binding existing deployment. Done
(00:00:00)
Started preparing deployment > Binding resource pools. Done (00:00:00)
Started preparing deployment > Binding stemcells. Done (00:00:00)
Started preparing deployment > Binding templates. Done (00:00:00)
Started preparing deployment > Binding properties. Done (00:00:00)
Started preparing deployment > Binding unallocated VMs. Done (00:00:00)
Started preparing deployment > Binding instance networks. Done (00:00:00)

Started preparing package compilation > Finding packages to compile. Done
(00:00:00)

Started compiling packages >
ntp-4.2.8p2/543219fbdaf6ec6f8af2956016055f2fb100d782. Failed: Cannot update
settings for 'vm-20265395-185e-41f9-923e-d5e38ff8e5a2', got HTTP 401
(00:00:16)

Error 100: Cannot update settings for
'vm-20265395-185e-41f9-923e-d5e38ff8e5a2', got HTTP 401

Task 7 error

For a more detailed error report, run: bosh task 7 --debug






I, [2016-03-10T09:50:23.235380 #4195] INFO : Analyzing agents...
I, [2016-03-10T09:50:23.235718 #4195] INFO : Analyzed 0 agents, took
6.411e-05 seconds
I, [2016-03-10T09:50:23.906135 #4195] INFO : [ALERT] Alert @ 2016-03-10
09:50:23 UTC, severity 4: Begin update deployment for 'ntp-server' against
Director 'bc7c40bf-31ab-4bb9-ac66-341ba0c02a12'
W, [2016-03-10T09:50:23.906426 #4195] WARN : (Resurrector) event did not
have deployment, job and index: Alert @ 2016-03-10 09:50:23 UTC, severity 4:
Begin update deployment for 'ntp-server' against Director
'bc7c40bf-31ab-4bb9-ac66-341ba0c02a12'
I, [2016-03-10T09:50:39.165888 #4195] INFO : [ALERT] Alert @ 2016-03-10
09:50:39 UTC, severity 3: Error during update deployment for 'ntp-server'
against Director 'bc7c40bf-31ab-4bb9-ac66-341ba0c02a12':
#<Bosh::Clouds::VMCreationFailed: Cannot update settings for
'vm-07bc8978-902a-497b-8be8-9aa623748157', got HTTP 401>
W, [2016-03-10T09:50:39.166286 #4195] WARN : (Resurrector) event did not
have deployment, job and index: Alert @ 2016-03-10 09:50:39 UTC, severity 3:
Error during update deployment for 'ntp-server' against Director
'bc7c40bf-31ab-4bb9-ac66-341ba0c02a12': #<Bosh::Clouds::VMCreationFailed:
Cannot update settings for 'vm-07bc8978-902a-497b-8be8-9aa623748157', got
HTTP 401>




Re: Deploying Full bosh with micro-bosh

sridhar vennela
 

Perfect. Good Luck.

Re: Deploying Full bosh with micro-bosh

sridhar vennela
 

Hi,

You can find more details about the errors on your microbosh VM. Please go to the directory below and attach the logs:
/var/vcap/sys/log/director
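
(For example, something along these lines on the microbosh VM; the exact file names vary between director versions:)

$ sudo ls /var/vcap/sys/log/director/
$ sudo tail -n 200 /var/vcap/sys/log/director/director.debug.log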

Thank you,
Sridhar

Re: Deploying Full bosh with micro-bosh

mbosh <jin-su.moon@...>
 

Thanks for your answer.
I solved this issue by adding listen_address properties in the full bosh
manifest.yml.
I had missed the listen_address properties on redis and postgres.
I think bosh sets the default listen_address to 127.0.0.1,
so the director VM was not allowed to access redis (25255) and postgres (5432).


---

properties:
  nats:
    address: 192.168.1.51
    user: nats
    password: nats

  redis:
    listen_address: 0.0.0.0
    address: 192.168.1.52
    password: redis
    port: 25255

  postgres: &bosh_db
    listen_address: 0.0.0.0
    host: 192.168.1.53
    port: 5432
    user: postgres
    password: postgres
    database: bosh
    adapter: postgres

  dns:
    address: 14.63.202.132
    db: *bosh_db
    recursor: 8.8.8.8

  blobstore:
    address: 192.168.1.55
    port: 25250
    provider: dav
    agent:
      user: agent
      password: agent
    director:
      user: admin
      password: admin

  director:
    name: bosh
    address: 192.168.1.56
    db: *bosh_db
    cpi_job: cpi
    max_threads: 3

  registry:
    address: 192.168.1.57
    host: 192.168.1.57
    db: *bosh_db
    http:
      user: registry
      password: registry
      port: 25777
    username: admin
    password: admin

  hm:
    http:
      user: admin
      password: admin
    director_account:
      user: admin
      password: admin
    resurrector_enabled: true





Deploying Full bosh with micro-bosh

mbosh <jin-su.moon@...>
 

Hi everyone,
I'm raising a question for the first time, and I need help.

I'm trying to deploy full bosh (known as multi-VM bosh) with micro-bosh on
OpenStack IaaS, but I hit an error.
Does anyone know this situation?
When bosh performs the canary test, the director repeatedly cycles between starting and failing status.
I've attached bosh.yml;
please check my bosh manifest file.

+------------------+----------+---------------+---------------+
| Job/index        | State    | Resource Pool | IPs           |
+------------------+----------+---------------+---------------+
| blobstore/0      | running  | common        | 192.168.1.55  |
| director/0       | starting | common        | 14.63.202.142 |
|                  |          |               | 192.168.1.56  |
| health_monitor/0 | running  | common        | 192.168.1.58  |
| nats/0           | running  | common        | 192.168.1.51  |
| postgres/0       | running  | common        | 192.168.1.53  |
| powerdns/0       | running  | common        | 14.63.202.132 |
|                  |          |               | 192.168.1.54  |
| redis/0          | running  | common        | 192.168.1.52  |
| registry/0       | running  | common        | 192.168.1.57  |
+------------------+----------+---------------+---------------+


---
name: bosh-openstack

director_uuid: 4048d97d-f435-495a-9a70-33ca4dcd7e0e

release:
  name: bosh
  version: latest

compilation:
  workers: 2
  network: private
  reuse_compilation_vms: false
  cloud_properties:
    instance_type: m1.small
    availability_zone: nova

update:
  canaries: 1
  canary_watch_time: 3000-120000
  update_watch_time: 3000-120000
  max_in_flight: 1

networks:
- name: private
  type: manual
  subnets:
  - range: 192.168.1.0/24
    gateway: 192.168.1.1
    static:
    - 192.168.1.11 - 192.168.1.99
    reserved:
    - 192.168.1.2 - 192.168.1.10
    - 192.168.1.100 - 192.168.1.120
    - 192.168.1.250 - 192.168.1.254
    dns: [8.8.8.8]
    cloud_properties: {net_id: db791934-bbba-4b75-bb20-219aa4b7430d}

- name: floating
  type: vip

resource_pools:
- name: common
  network: private
  stemcell:
    name: bosh-openstack-kvm-ubuntu-trusty-go_agent
    version: latest
  cloud_properties:
    instance_type: m1.small
    availability_zone: nova

disk_pools:
- name: disks
  disk_size: 10_000

jobs:
- name: nats
  template: nats
  instances: 1
  resource_pool: common
  networks:
  - name: private
    default: [dns, gateway]
    static_ips: [192.168.1.51]

- name: redis
  template: redis
  instances: 1
  resource_pool: common
  networks:
  - name: private
    default: [dns, gateway]
    static_ips: [192.168.1.52]

- name: postgres
  template: postgres
  instances: 1
  resource_pool: common
  persistent_disk: 4096
  networks:
  - name: private
    default: [dns, gateway]
    static_ips: [192.168.1.53]

- name: powerdns
  template: powerdns
  instances: 1
  resource_pool: common
  networks:
  - name: private
    default: [dns, gateway]
    static_ips: [192.168.1.54]
  - name: floating
    static_ips:
    - 14.63.202.132

- name: blobstore
  template: blobstore
  instances: 1
  resource_pool: common
  persistent_disk: 4096
  networks:
  - name: private
    default: [dns, gateway]
    static_ips: [192.168.1.55]

- name: director
  template: director
  instances: 1
  resource_pool: common
  persistent_disk: 4096
  networks:
  - name: private
    default: [dns, gateway]
    static_ips: [192.168.1.56]
  - name: floating
    static_ips:
    - 14.63.202.142

- name: registry
  template: registry
  instances: 1
  resource_pool: common
  networks:
  - name: private
    default: [dns, gateway]
    static_ips: [192.168.1.57]

- name: health_monitor
  template: health_monitor
  instances: 1
  resource_pool: common
  networks:
  - name: private
    default: [dns, gateway]
    static_ips: [192.168.1.58]

properties:
  nats:
    address: 192.168.1.51
    user: nats
    password: nats

  redis:
    address: 192.168.1.52
    password: redis
    port: 25255

  postgres: &bosh_db
    host: 192.168.1.53
    port: 5432
    user: postgres
    password: postgres
    database: bosh

  dns:
    address: 14.63.202.132
    db: *bosh_db
    recursor: 8.8.8.8

  blobstore:
    address: 192.168.1.55
    agent:
      user: agent
      password: agent
    director:
      user: admin
      password: admin

  director:
    name: bosh
    address: 192.168.1.56
    db: *bosh_db

  registry:
    address: 192.168.1.57
    db: *bosh_db
    http:
      user: registry
      password: registry

  hm:
    http:
      user: admin
      password: admin
    director_account:
      user: admin
      password: admin
    resurrector_enabled: true

  ntp: &ntp [0.pool.ntp.org, 1.pool.ntp.org]

  openstack:
    auth_url: http://14.63.202.11:5000/v2.0/tokens
    username: admin
    api_key: pass
    tenant: admin
    default_security_groups: [manager]
    default_key_name: joo




Re: Proposing a change for a NTP issue in stemcell

Marco Voelz
 

Dear Tomoe,

you say that one of the use-cases is when port 123 is blocked by e.g. a company firewall. Speaking for us at SAP, blocking this port is *intentional*. We have internal NTP servers and want everybody deploying a VM in the infrastructure to actually use those servers.

I cannot speak for other companies, but switching to a different port should not be the solution here.

Warm regards
Marco

On 09/03/16 03:19, "Tomoe Sugihara" <tsugihara(a)pivotal.io<mailto:tsugihara(a)pivotal.io>> wrote:

A friendly ping.

Could we decide on go/no-go for this change?
https://github.com/cloudfoundry/bosh/pull/1130

Best,
Tomoe



On Fri, Feb 19, 2016 at 12:31 AM, Tomoe Sugihara <tsugihara(a)pivotal.io<mailto:tsugihara(a)pivotal.io>> wrote:
Hi bosh team,

I'd like to start a discussion about a change <https://github.com/cloudfoundry/bosh/pull/1130> in the stemcell regarding an NTP issue we have seen multiple times.
The symptom of the issue was that the system clocks of the VMs got out of sync because they were unable to sync time with the NTP server.

Currently in bosh stemcells, there is a crontab entry for the root user to run ntpdate every 15 minutes, which I *think* (I could be wrong as I'm pretty new to bosh and CF) is coming from here:
https://github.com/cloudfoundry/bosh/tree/develop/stemcell_builder/stages/bosh_ntpdate

The root cause of the problem was that incoming packets (responses) from the NTP server were blocked, as they are destined for port 123 and the firewalls didn't allow that.
Since port 123 (<1024) is a privileged port, it is not surprising that some firewalls would block that traffic. In fact, we have seen this happen multiple times, and annoyingly, this problem is tricky and time-consuming to track down.

So, I have submitted a pull request to use the -u option, which directs ntpdate to use an unprivileged port so that returning packets won't be blocked by those firewalls:
https://github.com/cloudfoundry/bosh/pull/1130

I would argue that this change would reduce that risk without introducing any new risk, but again I'm new to this field and wanted to get feedback from the community.

Comments appreciated, and hopefully the patch can be merged to get rid of the problem.

Best,
Tomoe

test

Laurent Wilfred
 

test

--
m.

Re: Proposing a change for a NTP issue in stemcell

Tomoe Sugihara
 

A friendly ping.

Could we decide on go/no-go for this change?
https://github.com/cloudfoundry/bosh/pull/1130

Best,
Tomoe



On Fri, Feb 19, 2016 at 12:31 AM, Tomoe Sugihara <tsugihara(a)pivotal.io>
wrote:

Hi bosh team,

I'd like to start a discussion about a change
<https://github.com/cloudfoundry/bosh/pull/1130> in the stemcell regarding an
NTP issue we have seen multiple times.
The symptom of the issue was that the system clocks of the VMs got out of sync
because they were unable to sync time with the NTP server.

Currently in bosh stemcells, there is a crontab entry for the root user to run
ntpdate every 15 minutes, which I *think* (I could be wrong as I'm pretty
new to bosh and CF) is coming from here:

https://github.com/cloudfoundry/bosh/tree/develop/stemcell_builder/stages/bosh_ntpdate

The root cause of the problem was that incoming packets (responses) from
the NTP server were blocked, as they are destined for port 123 and the
firewalls didn't allow that.
Since port 123 (<1024) is a privileged port, it is not surprising that some
firewalls would block that traffic. In fact, we have seen this happen
multiple times, and annoyingly, this problem is tricky and time-consuming
to track down.

So, I have submitted a pull request to use the -u option, which directs ntpdate
to use an unprivileged port so that returning packets won't be blocked by those
firewalls:
https://github.com/cloudfoundry/bosh/pull/1130

I would argue that this change would reduce that risk without introducing any
new risk, but again I'm new to this field and wanted to get feedback from
the community.

Comments appreciated, and hopefully the patch can be merged to get rid of
the problem.

Best,
Tomoe

Re: Consul not starting due to confab usage error "at least one "expected-member" must be provided"

Lenny Ilyashov
 

Thank you for the explanation, Amit. We re-enabled consul jobs in our manifest and the upgrade ran successfully.

Much appreciated.
Lenny

Re: Consul not starting due to confab usage error "at least one "expected-member" must be provided"

Amit Kumar Gupta
 

For v228, the flag is provided to confab like this:

https://github.com/cloudfoundry-incubator/consul-release/blob/0f6aeb83d33728828dd1661f663b6eedb8113e83/jobs/consul_agent/templates/agent_ctl.sh.erb#L125-L127

It expects you to have the "consul.agent.servers.lan" property in your
manifest, usually set as a global property, e.g.:

https://github.com/cloudfoundry/cf-release/blob/v228/spec/fixtures/aws/cf-manifest.yml#L969-L976

If you don't have that property, I assume you're not deploying a consul
server cluster with your CF deployment (it's currently part of the
cf-release, so you might have those jobs set to 0 instances in your
manifest, or not in your manifest at all). You will eventually want to
have consul servers, because newer features rely on it for internal service
discovery. I don't recall whether v228 will even work without it. If you
don't want to deploy consul, you should remove the "consul_agent" job from
all your other jobs.
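
As a rough sketch only (the indentation and the IP here are placeholders rather than values from your manifest), that global property would look like:

properties:
  consul:
    agent:
      servers:
        lan:
        - 10.0.16.37   # IP(s) of your consul server job instances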

Best,
Amit

On Tue, Mar 8, 2016 at 9:59 AM, Lenny Ilyashov <lenny.ilyashov(a)wwt.com>
wrote:

I am attempting to upgrade CloudFoundry from v226 to v228. Upgrade bombs
out when it gets to a role running consul_agent (i.e., uaa). From
consul_agent.stderr.log it appears that the way consul_agent is being
started has changed and a tool called confab is now being used to start
consul. Unfortunately, it appears it is requiring the "expected-member"
parameter which is not being passed to the startup script.

It appears this is how it is attempting to start consul_agent (from the
logs):

+ chpst -u vcap:vcap /var/vcap/packages/confab/bin/confab start
-server=false -agent-path=/var/vcap/packages/consul/bin/consul
-consul-config-dir=/var/vcap/jobs/consul_agent/config
-pid-file=/var/vcap/sys/run/consul_agent/consul_agent.pid
-recursor=10.220.1.33 -recursor=10.220.1.34 -recursor=10.220.142.35
-ssl-disabled --config-file /var/vcap/jobs/consul_agent/confab.json

If I run the command above and add '-expected-member=[the ip of my uaa
box]', consul starts successfully and monit reports everything healthy on
the uaa box.
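
(Spelled out, that manually working invocation is roughly the following; the -expected-member value is a placeholder for the uaa VM's IP:)

chpst -u vcap:vcap /var/vcap/packages/confab/bin/confab start \
  -server=false \
  -agent-path=/var/vcap/packages/consul/bin/consul \
  -consul-config-dir=/var/vcap/jobs/consul_agent/config \
  -pid-file=/var/vcap/sys/run/consul_agent/consul_agent.pid \
  -recursor=10.220.1.33 -recursor=10.220.1.34 -recursor=10.220.142.35 \
  -ssl-disabled \
  -expected-member=<IP of the uaa box> \
  --config-file /var/vcap/jobs/consul_agent/confab.json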

Any idea if this is fixed in a future release or how to address this in
v228?

From the logs:

==> monit/consul_agent.err.log <==
+ case $1 in
+ start /var/vcap/packages/confab
+ local confab_package
+ confab_package=/var/vcap/packages/confab
+ pid_guard /var/vcap/sys/run/consul_agent/consul_agent.pid consul_agent
+ tee /dev/stderr
tail: monit/consul_agent.err.log: file truncated
++ basename /var/vcap/jobs/consul_agent/bin/agent_ctl
++ date
+ echo '------------ STARTING agent_ctl at Tue Mar 8 17:52:33 UTC 2016
--------------'

==> monit/consul_agent.out.log <==
------------ STARTING agent_ctl at Tue Mar 8 17:52:33 UTC 2016
--------------

==> monit/consul_agent.err.log <==
+ pidfile=/var/vcap/sys/run/consul_agent/consul_agent.pid
+ name=consul_agent
+ '[' -f /var/vcap/sys/run/consul_agent/consul_agent.pid ']'
+ mkdir -p /var/vcap/sys/log/consul_agent
+ chown -R vcap:vcap /var/vcap/sys/log/consul_agent
+ mkdir -p /var/vcap/sys/run/consul_agent
+ chown -R vcap:vcap /var/vcap/sys/run/consul_agent
+ mkdir -p /var/vcap/store/consul_agent
+ chown -R vcap:vcap /var/vcap/store/consul_agent
+ mkdir -p /var/vcap/jobs/consul_agent/config
+ chown -R vcap:vcap /var/vcap/jobs/consul_agent/config
+ ulimit -v unlimited
+ ulimit -n 4096
+ setup_resolvconf
+ local resolvconf_file
+ resolvconf_file=/etc/resolv.conf
+ resolvconf --updates-are-enabled
+ resolvconf_file=/etc/resolvconf/resolv.conf.d/head
+ grep -q 127.0.0.1 /etc/resolvconf/resolv.conf.d/head
+ set +e
+ resolvconf -u
+ set -e
+ local server
+ server=false
+ '[' false '!=' true ']'
+ rm -f /var/vcap/store/consul_agent/serf/local.keyring
+ setcap cap_net_bind_service=+ep /var/vcap/packages/consul/bin/consul
++ nproc
+ GOMAXPROCS=1
+ '[' 1 = 1 ']'
+ GOMAXPROCS=2
+ export GOMAXPROCS
+ local nameservers
+ nameservers=("$(cat /etc/resolv.conf | grep nameserver | awk '{print
$2}' | grep -v 127.0.0.1)")
++ grep -v 127.0.0.1
++ awk '{print $2}'
++ grep nameserver
++ cat /etc/resolv.conf
+ local recursors
+ recursors=
+ for nameserver in '${nameservers[@]}'
+ recursors=' -recursor=10.220.1.33'
+ for nameserver in '${nameservers[@]}'
+ recursors=' -recursor=10.220.1.33 -recursor=10.220.1.34'
+ for nameserver in '${nameservers[@]}'
+ recursors=' -recursor=10.220.1.33 -recursor=10.220.1.34
-recursor=10.220.142.35'
+ chpst -u vcap:vcap /var/vcap/packages/confab/bin/confab start
-server=false -agent-path=/var/vcap/packages/consul/bin/consul
-consul-config-dir=/var/vcap/jobs/consul_agent/config
-pid-file=/var/vcap/sys/run/consul_agent/consul_agent.pid
-recursor=10.220.1.33 -recursor=10.220.1.34 -recursor=10.220.142.35
-ssl-disabled --config-file /var/vcap/jobs/consul_agent/confab.json
++ logger -p user.error -t vcap.consul-agent
++ tee -a /var/vcap/sys/log/consul_agent/consul_agent.stderr.log

==> consul_agent/consul_agent.stderr.log <==
++ logger -p user.info -t vcap.consul-agent
++ tee -a /var/vcap/sys/log/consul_agent/consul_agent.stdout.log
at least one "expected-member" must be provided

usage: confab COMMAND OPTIONS

COMMAND: "start" or "stop"

OPTIONS:
  -agent-path executable
        path to the on-filesystem consul executable
  -config-file file
        specifies the config file
  -consul-config-dir directory
        path to consul configuration directory
  -encryption-key key
        key used to encrypt consul traffic, may be specified multiple times (default [])
  -expected-member list
        address list of the expected members, may be specified multiple times (default [])
  -pid-file file
        path to consul PID file
  -recursor server
        specifies the address of an upstream DNS server, may be specified multiple times (default [])
  -server
        whether to start the agent in server mode
  -ssl-disabled
        whether to run the server without ssl
  -sync-max-retries number
        specifies the maximum number of sync retry attempts (default 60)