
Re: `api_z1/0' is not running after update to CF v231

Wayne Ha <wayne.h.ha@...>
 

Zach,

Thanks for the hints. You are right, I am not using the latest stemcell:

vagrant@agent-id-bosh-0:~$ bosh stemcells
+---------------------------------------------+---------+--------------------------------------+
| Name | Version | CID |
+---------------------------------------------+---------+--------------------------------------+
| bosh-warden-boshlite-ubuntu-trusty-go_agent | 389* | cb6ee28c-a703-4a7e-581b-b63be2302e3d |

I will try the stemcell you recommended to see if it helps.

Thanks,


Re: Increase CAB meeting to 1.5 or 2 hours?

Steven Benario
 

One of the things we've started doing in the Runtime-specific meetings is
asking that individual team status updates are provided ahead of time, and
then we use the time in the meeting for Q&A only, instead of reading those
notes aloud.

I'm a big fan and recommend this model.

Cheers

On Sun, Feb 14, 2016 at 11:19 AM, Dr Nic Williams <drnicwilliams(a)gmail.com>
wrote:

Since we started the CAB calls in late 2013, the size of the Cloud Foundry
community, the number of engineering teams, and the number of code bases
(both core & community) has grown.

There was a time when a community project might be introduced once during a
CAB meeting, and there was shared discussion of what was going on amongst
the community.

Recently, due to the hard limit of 1 hour, there is only just enough time for
core team projects to discuss their status.

With the increased number of CF office locations, the trend will continue to
create more teams, which in turn will create more repositories & Tracker
roadmaps, and more demand on the limited CAB call time slot.

I like the name "Community Advisory Board meeting". It seems like a good
goal for the meeting.

Could we increase the CAB meeting to 1.5 or 2 hours?

Or could the summaries by PMs be delivered before the meeting (I have the
feeling many of them are organized and have notes that they are reading).
This might give us more time to discuss the projects, and discuss
non-core/community projects and ambitions.

It might also allow time for the community to chat with itself and bond as
a community.

with valentine's love
Dr Nic


--
Dr Nic Williams
Stark & Wayne LLC - consultancy for Cloud Foundry users
http://drnicwilliams.com
http://starkandwayne.com
cell +1 (415) 860-2185
twitter @drnic


Re: `api_z1/0' is not running after update to CF v231

Zach Robinson
 

Wayne,

Can you verify that you are using the latest bosh-lite stemcell, 3147? Older stemcells are known to have issues with consul, which many of the CF components use for service discovery.

The latest bosh-lite stemcells can be found at http://bosh.io; just search for "lite".

See this similar issue: https://github.com/cloudfoundry/cf-release/issues/919
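The version check Zach suggests can be scripted. The sketch below is a hypothetical helper, not part of the BOSH CLI: it assumes the table layout printed by `bosh stemcells` earlier in this thread (where a `*` after the version marks, in the BOSH v1 CLI, a stemcell currently in use) and flags any bosh-lite stemcell older than 3147.

```shell
#!/bin/sh
# Hypothetical helper: flag bosh-lite stemcells older than the version
# recommended above. Assumes the table layout printed by `bosh stemcells`
# in this thread; pipe that output into check_stemcells.
MIN_VERSION=3147

# Print "name version" for each data row, skipping "+---+" border lines
# and the header row, and stripping padding and the "*" in-use marker.
stemcell_rows() {
  awk -F'|' 'NF >= 4 && $2 !~ /Name/ {
    name = $2; ver = $3
    gsub(/[ *]/, "", name); gsub(/[ *]/, "", ver)
    if (name != "") print name, ver
  }'
}

check_stemcells() {
  stemcell_rows | while read -r name ver; do
    case "$name" in
      *boshlite*)
        major=${ver%%.*}   # compare only the major part, e.g. 3147.1 -> 3147
        if [ "$major" -lt "$MIN_VERSION" ]; then
          echo "OUTDATED $name $ver"
        else
          echo "OK $name $ver"
        fi
        ;;
    esac
  done
}

# Usage on the machine where the BOSH CLI is targeted at the director:
#   bosh stemcells | check_stemcells
```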

-Zach


Re: `api_z1/0' is not running after update to CF v231

Amit Kumar Gupta
 

As of cf v231, CC has switched from using NFS to WebDAV as the default
blobstore. There are more details in the release notes:
https://github.com/cloudfoundry/cf-release/releases/tag/v231. I don't know
off-hand how to debug the issue you're seeing, but I will reach out to some
folks with more knowledge of Cloud Controller.

Best,
Amit

On Mon, Mar 7, 2016 at 8:48 AM, Wayne Ha <wayne.h.ha(a)gmail.com> wrote:

Kayode,

I am using the default bosh-lite-v231.yml file and the number of instances
for the NFS server is set to 0:

vagrant@agent-id-bosh-0:~$ egrep -i "name:.*nfs|instances" bosh-lite-v231.yml.1603041454
etc...
- instances: 0
- instances: 0
- instances: 0
name: nfs_z1
- name: debian_nfs_server
- instances: 1
- instances: 1
- instances: 1
etc...

So it is not running.

Thanks,


Re: `api_z1/0' is not running after update to CF v231

Wayne Ha <wayne.h.ha@...>
 

Kayode,

I am using the default bosh-lite-v231.yml file and the number of instances for the NFS server is set to 0:

vagrant@agent-id-bosh-0:~$ egrep -i "name:.*nfs|instances" bosh-lite-v231.yml.1603041454
etc...
- instances: 0
- instances: 0
- instances: 0
name: nfs_z1
- name: debian_nfs_server
- instances: 1
- instances: 1
- instances: 1
etc...

So it is not running.

Thanks,


Update Parallelization in Cloud Foundry

Omar Elazhary <omazhary@...>
 

Hello everyone,

I know it is possible to update and redeploy components in parallel in Cloud Foundry by setting the "serial" property in the deployment manifest to "false". However, is this recommended? Are there particular job dependencies that I need to pay attention to?

Regards,
Omar


Re: New CF Service Broker "chaos-galago" - a chaos-monkey for your Cloud Foundry

Sam Bryant
 

For anyone interested, we have now also added a smoke tests project for chaos-galago that can be used to monitor the service broker. It can be found at https://github.com/FidelityInternational/chaos-galago-smoke-tests

Details are also on the README for chaos-galago.

Regards,
Sam


Reg the minimal-openstack yml files

Nithiyasri Gnanasekaran -X (ngnanase - TECH MAHINDRA LIM@Cisco) <ngnanase at cisco.com...>
 

Hi

We are trying to upgrade our deployment to the latest Cloud Foundry, from release 205 to 230, as per your advice.

We can see minimal-aws.yml available in the Git repo. Could a similar one be made available for the OpenStack environment, with which we can deploy a basic Cloud Foundry and make our custom changes on top of it?

In parallel, we are updating our stub to match the template yml files, guided by the errors reported by the generate_deployment_manifest script. Kindly let us know if this is the correct way to generate the manifest.


Regards
Nithiyasri


Re: `api_z1/0' is not running after update to CF v231

Paul Bakare
 

Wayne, is the nfs_server-partition running?

On Mon, Mar 7, 2016 at 1:43 AM, Wayne Ha <wayne.h.ha(a)gmail.com> wrote:

I checked that the blobstore is running:

root@e83575d2-dfbf-4f7c-97ee-5112560fa137:/var/vcap/sys/log# monit summary
The Monit daemon 5.2.4 uptime: 4h 14m
Process 'consul_agent' running
Process 'metron_agent' running
Process 'blobstore_nginx' running
Process 'route_registrar' running
System 'system_e83575d2-dfbf-4f7c-97ee-5112560fa137' running

But there are thousands of errors saying "DopplerForwarder: can't forward
message" with the error "loggregator client pool is empty":

root@e83575d2-dfbf-4f7c-97ee-5112560fa137:/var/vcap/sys/log# find . -name "*.log" | xargs grep -i error | cut -c 73-500 | sort -u
,"process_id":246,"source":"metron","log_level":
"error","message":"DopplerForwarder: can't forward message","data":{
"error":"loggregator client pool is empty"},
"file":"/var/vcap/data/compile/metron_agent/loggregator/src/metron/writers/dopplerforwarder/doppler_forwarder.go",
"line":104,
"method":"metron/writers/dopplerforwarder.(*DopplerForwarder).networkWrite"}

Not sure what is wrong.


Re: `api_z1/0' is not running after update to CF v231

Wayne Ha <wayne.h.ha@...>
 

I checked that the blobstore is running:

root@e83575d2-dfbf-4f7c-97ee-5112560fa137:/var/vcap/sys/log# monit summary
The Monit daemon 5.2.4 uptime: 4h 14m
Process 'consul_agent' running
Process 'metron_agent' running
Process 'blobstore_nginx' running
Process 'route_registrar' running
System 'system_e83575d2-dfbf-4f7c-97ee-5112560fa137' running

But there are thousands of errors saying "DopplerForwarder: can't forward message" with the error "loggregator client pool is empty":

root@e83575d2-dfbf-4f7c-97ee-5112560fa137:/var/vcap/sys/log# find . -name "*.log" | xargs grep -i error | cut -c 73-500 | sort -u
,"process_id":246,"source":"metron","log_level":
"error","message":"DopplerForwarder: can't forward message","data":{
"error":"loggregator client pool is empty"},
"file":"/var/vcap/data/compile/metron_agent/loggregator/src/metron/writers/dopplerforwarder/doppler_forwarder.go",
"line":104,
"method":"metron/writers/dopplerforwarder.(*DopplerForwarder).networkWrite"}

Not sure what is wrong.


Re: `api_z1/0' is not running after update to CF v231

Wayne Ha <wayne.h.ha@...>
 

Amit,

Thanks for letting me know I might have looked at the wrong log files. I
saw the following in the cloud_controller log files:

root@7a1f2221-c31a-494b-b16c-d4a97c16c9ab:/var/vcap/sys/log# tail ./cloud_controller_ng_ctl.log
[2016-03-06 22:40:28+0000] ------------ STARTING cloud_controller_ng_ctl at Sun Mar 6 22:40:28 UTC 2016 --------------
[2016-03-06 22:40:28+0000] Checking for blobstore availability
[2016-03-06 22:41:03+0000] Blobstore is not available

root@7a1f2221-c31a-494b-b16c-d4a97c16c9ab:/var/vcap/sys/log# tail ./cloud_controller_worker_ctl.log
[2016-03-06 22:41:13+0000] Killing /var/vcap/sys/run/cloud_controller_ng/cloud_controller_worker_2.pid: 12145
[2016-03-06 22:41:13+0000] .Stopped
[2016-03-06 22:41:36+0000] Blobstore is not available
[2016-03-06 22:41:48+0000] ------------ STARTING cloud_controller_worker_ctl at Sun Mar 6 22:41:48 UTC 2016 --------------
[2016-03-06 22:41:48+0000] Checking for blobstore availability
[2016-03-06 22:41:48+0000] Removing stale pidfile...

So maybe the cause is that the blobstore is not available?

Thanks,

On Sun, Mar 6, 2016 at 1:15 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

The log lines saying "/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock
is not found" is probably just a symptom of the problem, not the root
cause. You're probably seeing those in the nginx logs? Cloud Controller
is failing to start, hence it is not establishing a connection on the
socket. You need to dig deeper into failures in logs in
/var/vcap/sys/log/cloud_controller_ng.

On Sun, Mar 6, 2016 at 10:00 AM, sridhar vennela <
sridhar.vennela(a)gmail.com> wrote:

Hi Wayne,

Looks like it. It is trying to connect to loggregator and failing, I guess.


https://github.com/cloudfoundry/cloud_controller_ng/blob/master/app/controllers/runtime/syslog_drain_urls_controller.rb

Thank you,
Sridhar
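Wayne's ctl logs point at the blobstore availability check. As a minimal sketch, the same check can be run across every ctl log under a log directory. The `*_ctl.log` naming and the "Blobstore is not available" message are taken from the logs quoted in this thread; the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "Blobstore is not available" lines in each
# *_ctl.log file under a log directory (default: /var/vcap/sys/log).
blobstore_status() {
  log_dir=${1:-/var/vcap/sys/log}
  find "$log_dir" -name '*_ctl.log' | sort | while read -r f; do
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that.
    n=$(grep -c 'Blobstore is not available' "$f" || true)
    if [ "$n" -gt 0 ]; then
      echo "$(basename "$f"): $n"
    fi
  done
}

# Usage on the api VM:
#   blobstore_status /var/vcap/sys/log
```

A non-zero count only shows that the ctl script gave up waiting; it does not say why the blobstore VM is unreachable.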


Re: app auto-scaling in OSS CF contribution

Padmashree B
 

Hi,

Is the solution the same as the one offered in IBM Bluemix?
Where can I find more information on IBM's solution [open-Autoscaler]: current/planned features, roadmap, timeline, etc.?

Kind Regards,
Padma


Re: `api_z1/0' is not running after update to CF v231

Amit Kumar Gupta
 

The log lines saying
"/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock
is not found" is probably just a symptom of the problem, not the root
cause. You're probably seeing those in the nginx logs? Cloud Controller
is failing to start, hence it is not establishing a connection on the
socket. You need to dig deeper into failures in logs in
/var/vcap/sys/log/cloud_controller_ng.
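To follow this advice of digging into the Cloud Controller logs, a rough sketch like the one below pulls the distinct error messages out of JSON-formatted log lines. It assumes each line carries `"log_level":"error"` and a `"message":"..."` field, as in the metron output quoted in this thread; the helper name is hypothetical, and the sed extraction is a simple heuristic rather than a JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: list distinct error messages, most frequent first,
# from *.log files under a directory (default: the CC log directory).
# The sed step keeps only the value of the "message" field; it is a
# heuristic, not a JSON parser.
cc_errors() {
  log_dir=${1:-/var/vcap/sys/log/cloud_controller_ng}
  find "$log_dir" -name '*.log' -exec grep -h '"log_level":"error"' {} + |
    sed -n 's/.*"message":"\([^"]*\)".*/\1/p' |
    sort | uniq -c | sort -rn
}

# Usage on the api VM:
#   cc_errors
```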

On Sun, Mar 6, 2016 at 10:00 AM, sridhar vennela <sridhar.vennela(a)gmail.com>
wrote:

Hi Wayne,

Looks like it. It is trying to connect to loggregator and failing, I guess.


https://github.com/cloudfoundry/cloud_controller_ng/blob/master/app/controllers/runtime/syslog_drain_urls_controller.rb

Thank you,
Sridhar


Re: `api_z1/0' is not running after update to CF v231

sridhar vennela
 

Hi Wayne,

Looks like it. It is trying to connect to loggregator and failing, I guess.

https://github.com/cloudfoundry/cloud_controller_ng/blob/master/app/controllers/runtime/syslog_drain_urls_controller.rb

Thank you,
Sridhar


Re: monit definitions

Benjamin Gandon
 

Hi,
I'm no expert, but “monit” is a component of BOSH, not Cloud Foundry.
Your question is more likely to get answered on the “bosh-dev” mailing list.
Cheers

On 18 Feb 2016 at 11:19, Nitta, Minoru <minoru.nitta(a)jp.fujitsu.com> wrote:

Hi guys,

I know monit performs connection tests against some processes in Cloud Foundry VMs
by issuing HTTP requests.
For example, in the UAA case:
if failed port <%= p('uaa.port') %> protocol http
request "/healthz"
with timeout 60 seconds for 10 cycles
then restart

I am wondering how the timeout and cycles values are decided. I mean, are there
any policies or guidelines for setting these values? It seems that different values are set for
each process, so I guessed there might be some policy in Cloud Foundry.

Regards,
Minoru Nitta


Re: `api_z1/0' is not running after update to CF v231

Wayne Ha <wayne.h.ha@...>
 

Since it complains that /var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock is not found, I thought I would just touch that file. Now I get:

2016/03/06 17:14:11 [error] 18497#0: *5 connect() to unix:/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock failed (111: Connection refused) while connecting to upstream, client: <bosh director>,
server: _, request: "GET /v2/syslog_drain_urls?batch_size=1000 HTTP/1.1", upstream: "http://unix:/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock:/v2/syslog_drain_urls?batch_size=1000", host: "api.bosh-lite.com"

Maybe there is a network configuration problem in my environment?
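A note on why the error changed: `touch` creates an ordinary file, not a listening socket. On Linux, connect() to a Unix-socket path that does not exist fails with ENOENT (errno 2, "No such file or directory"), while a path that exists but is not a socket, or a socket nobody is accepting on, fails with ECONNREFUSED (errno 111, "Connection refused"). That matches the two nginx errors in this thread and points back at Cloud Controller not starting rather than at the network. A minimal sketch for telling these cases apart (the helper name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: report whether a path is a Unix socket, some other
# kind of file (e.g. one created with touch), or missing entirely.
sock_state() {
  if [ -S "$1" ]; then
    echo socket
  elif [ -e "$1" ]; then
    echo not-a-socket
  else
    echo missing
  fi
}

# Usage on the api VM:
#   sock_state /var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock
```

Because bind() on a Unix socket path fails when a file already exists there, the touched placeholder may also need to be removed before Cloud Controller can create its socket.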


Re: `api_z1/0' is not running after update to CF v231

Wayne Ha <wayne.h.ha@...>
 

Sridhar,

Thanks for your response. I have tried your suggestion and it doesn't
help. But I might have misled you with the consul error. That error was only
logged once, at the beginning. So, like you said, maybe the VM was not able
to join the consul server before it came up. But after that, the following
error keeps being logged every minute or so:

2016/03/06 17:04:41 [crit] 11480#0: *4 connect() to unix:/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock failed (2: No such file or directory) while connecting to upstream,
server: _, request: "GET /v2/syslog_drain_urls?batch_size=1000 HTTP/1.1", upstream: "http://unix:/var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock:/v2/syslog_drain_urls?batch_size=1000", host: "api.bosh-lite.com"

So maybe the above is the cause of the problem?

Thanks,

On Sun, Mar 6, 2016 at 12:51 AM, sridhar vennela <sridhar.vennela(a)gmail.com>
wrote:

Hi Wayne,

Somehow the VM is not able to join the consul server. You can try the steps below.

ps -ef | grep consul

kill <consul-server-pid>

monit restart <consul-job>

Thank you,
Sridhar


Re: `api_z1/0' is not running after update to CF v231

sridhar vennela
 

Hi Wayne,

Somehow the VM is not able to join the consul server. You can try the steps below.

ps -ef | grep consul

kill <consul-server-pid>

monit restart <consul-job>

Thank you,
Sridhar
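The steps above can be strung together as below. This is a sketch under assumptions: the consul binary shows up in `ps -ef` with a command name ending in `consul` (the netstat output in this thread lists the process as `consul`), the command is in column 8 of the standard `ps -ef` layout, and the monit job is `consul_agent`, the name shown by `monit summary` earlier in the thread. Filtering on the command column also avoids matching the `grep consul` line itself.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID of
# any process whose command (column 8) is a binary named "consul".
consul_pid() {
  awk '$8 ~ /(^|\/)consul$/ { print $2 }'
}

# Usage (as root on the affected VM); consul_agent is assumed to be the
# monit job name, as shown by `monit summary` in this thread:
#   pid=$(ps -ef | consul_pid)
#   [ -n "$pid" ] && kill $pid
#   monit restart consul_agent
```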


Re: `api_z1/0' is not running after update to CF v231

Wayne Ha <wayne.h.ha@...>
 

Sridhar,

Thanks for your response. I found that the VM is listening on port 8500:

root@c6822dcb-fb02-4858-ae5d-3ab45d593896:/var/vcap/sys/log# netstat -anp | grep LISTEN
tcp 0 0 127.0.0.1:8400 0.0.0.0:* LISTEN 18162/consul
tcp 0 0 127.0.0.1:8500 0.0.0.0:* LISTEN 18162/consul
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 18162/consul
tcp 0 0 127.0.0.1:2822 0.0.0.0:* LISTEN 72/monit
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 31/sshd
tcp 0 0 10.244.0.138:8301 0.0.0.0:* LISTEN 18162/consul

If I run "monit stop all", then it only listens on the following:

root@c6822dcb-fb02-4858-ae5d-3ab45d593896:/var/vcap/sys/log# netstat -anp | grep LISTEN
tcp 0 0 127.0.0.1:2822 0.0.0.0:* LISTEN 72/monit
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 31/sshd

Note that 10.244.0.138 is the IP of this VM.

Thanks,

On Sat, Mar 5, 2016 at 12:58 AM, sridhar vennela <sridhar.vennela(a)gmail.com>
wrote:

Hi Wayne,

Can you please verify that port 8500 is listening? The output of netstat -anp
may help.

{"timestamp":"1457136496.397377968","source":"confab","message":"confab.agent-client.verify-joined.members.request.failed","log_level":2,"data":{"error":"Get http://127.0.0.1:8500/v1/agent/members: dial tcp 127.0.0.1:8500: getsockopt: connection refused","wan":false}}

Thank you,
Sridhar
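The port check asked for above can be reduced to a small filter over `netstat -anp` output. The sketch assumes the Linux net-tools column layout seen in Wayne's output (local address in column 4, state in column 6, PID/program in column 7); the helper name is hypothetical. It exits non-zero when nothing is listening on the port, which is the situation the confab "connection refused" log line describes.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -anp` output on stdin and print the
# local address and PID/program of any LISTEN socket on the given port.
# Exit status is 0 if at least one listener was found, 1 otherwise.
listening_on() {
  awk -v port="$1" '
    $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1; print $4, $7 }
    END { exit (found ? 0 : 1) }'
}

# Usage:
#   netstat -anp | listening_on 8500 || echo "nothing listening on 8500"
```

Since consul can come up after confab's first join attempt, a listener found now does not rule out an earlier refused connection.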


Re: User defined variable "key" validation doesn't happen at cf set-env phase

Nicholas Calugar
 

Hi Ponraj,

I don't think the CC can make any determination regarding the validity of
environment variables as the CC doesn't (and shouldn't) know how each
buildpack will use these environment variables.

Thanks,

Nick

On Thu, Mar 3, 2016 at 9:22 AM Ponraj E <ponraj.e(a)gmail.com> wrote:

Hi CF Colleagues,

I see various PaaS providers provide a UI for entering the key and value of
user-provided variables, but they don't seem to validate the "key" at the
save ["set-env"] phase; the validation happens only at the restage phase.
This is also because CF does the same. Is there any reason that CC doesn't
validate the keys of user-defined environment variables at the cf set-env
phase?

Examples:
1. cf set-env spring-music !@#$$%% "foobar" succeeds, but the restage
fails, throwing /bin/bash: line 6: export: `!@#49%%=foobar': not a valid
identifier
2. cf set-env spring-music "!@#$$%%" "foobar" succeeds; the restage also
succeeds, but the same error as above is shown as a message.

P.S.: The above variables are used only for testing purposes; it goes
without saying that they would cause errors if the application used them at
runtime.


Regards,
Ponraj
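For context on the restage error: bash's `export` only accepts names made of letters, digits and underscores that do not start with a digit (the POSIX shell definition of a name). A client-side pre-check along those lines could be sketched as below. `is_valid_env_name` is a hypothetical wrapper, not part of the cf CLI, and, per Nick's point, it only mirrors the bash restriction seen above; what a given buildpack accepts may differ.

```shell
#!/bin/sh
# Hypothetical pre-check: accept only names that bash's export would accept,
# i.e. letters, digits and underscores, not starting with a digit.
is_valid_env_name() {
  case "$1" in
    '' | [0-9]* | *[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage (key and value are placeholders):
#   is_valid_env_name "$key" && cf set-env spring-music "$key" "$value"
```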