
Why does HM9000 write messages to /var/log/syslog?

MaggieMeng
 

Hi,

We recently encountered a CF crash caused by a full root file system on the CC and HM9000 VMs. I found that HM9000 writes log messages to /var/log/syslog. Does anybody know why? I think CF jobs should not use the root file system, since the stemcell only has 3 GB of space there.

CF version is 197.

Thanks,
Maggie


Re: dynamic networks

Vik R <vagcom.ben@...>
 

OpenStack + BOSH dynamic networks.

For instance, cf-231/cf-233 + dynamic networks work on BOSH releases up to
240 (or stemcell versions up to 3160).
The moment I upgrade the stemcell to 3163 or later, I run into
this metron_agent template-filling issue.


Ben R

On Mon, Apr 11, 2016 at 3:06 PM, Release Integration <cfruntime(a)gmail.com>
wrote:

Can you clarify the question further? What do you mean by dynamic
networks? Can you describe what network topology you would like to see
Cloud Foundry deployed to?

Thanks,
Rob & Dennis
CF Release Integration
Pivotal


Re: [cf-bosh] Re: Re: [Metron Agent] failed to generate job templates with metron agent on top of OpenStack Dynamic network

Amit Kumar Gupta
 

This will not work with dynamic networks. Many jobs in cf-release rely on
data from BOSH to determine their IP so that configuration files can be
rendered up-front by the director rather than at runtime, requiring system
calls to determine IP. metron_agent is one such job, and it tends to be
colocated with each other job (it is what allows all system component logs
to be aggregated through the loggregator system), so this would require all
Cloud Foundry VMs to be on a manual network. You don't need to manually
pick the IPs, you just need to tell BOSH which IPs in the network not to
use and specify these in the "reserved" range.
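
To make the "reserved" range suggestion concrete, here is a rough sketch of what the equivalent manual network block could look like in the deployment manifest. This is only an illustration: the subnet range, gateway and reserved IPs below are made-up values, and the cloud_properties are copied from the dynamic-net snippet quoted later in this thread.

networks:
- name: cf1
  type: manual
  subnets:
  - range: 10.0.0.0/24
    gateway: 10.0.0.1            # illustrative
    dns: [114.114.114.114, 8.8.8.8]
    reserved:
    - 10.0.0.2 - 10.0.0.99       # IPs BOSH must not hand out to CF VMs
    cloud_properties:
      net_id: 0700ae03-4b38-464e-b40d-0a9c8dd18ff0
      security_groups: [Test OS SG_20160128T070152Z]

BOSH then picks addresses for the CF jobs from the unreserved part of the subnet, so metron_agent and the other jobs can have their IPs rendered into their templates up-front.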

Since so many different components depend on being able to determine their
IP via BOSH data, there's no quick workaround if you want to stick to using
dynamic networks, but we're aware of this current limitation.

Best,
Amit

On Mon, Apr 11, 2016 at 7:23 PM, Yitao Jiang <jiangyt.cn(a)gmail.com> wrote:

Is it a bug in CF or in BOSH?

On Fri, Apr 8, 2016 at 12:08 PM, Ben R <vagcom.ben(a)gmail.com> wrote:

I have the same issue. It has to do with every release since bosh 248.
However, dynamic networks with older bosh releases + cf-231/cf-231 work.

This must be a bug.

Ben R


On Thu, Apr 7, 2016 at 8:55 PM, Yitao Jiang <jiangyt.cn(a)gmail.com> wrote:

Hi guys,

When deploying CF on top of OpenStack with a dynamic network, the jobs fail
with metron_agent:
Error filling in template 'syslog_forwarder.conf.erb' (line 44:
undefined method `strip' for nil:NilClass)

Here are the related logs:

Detecting deployment changes
----------------------------
Releases
cf
version type changed: String -> Fixnum
- 233
+ 233

Compilation
No changes

Update
± canaries:
- 1
+ 0

Resource pools
No changes

Disk pools
No changes

Networks
dynamic-net
+ name: dynamic-net
subnets
10.0.0.0/24
cloud_properties
+ net_id: 0700ae03-4b38-464e-b40d-0a9c8dd18ff0
+ security_groups: ["Test OS SG_20160128T070152Z"]
+ dns: ["114.114.114.114", "8.8.8.8"]

+ range: 10.0.0.0/24
+ name: Test OS Sub Internal Network_20160128T070152Z

+ type: dynamic


Jobs
stats_z1

± networks:

- {"name"=>"cf1"}

+ {"name"=>"dynamic-net"}


Properties
No changes


Meta
No changes


Please review all changes carefully

Deploying
---------

Are you sure you want to deploy? (type 'yes' to continue): yes


Director task 57
Started preparing deployment > Preparing deployment. Done (00:00:03)


Error 100: Unable to render instance groups for deployment. Errors are:
- Unable to render jobs for instance group 'stats_z1'. Errors are:

- Unable to render templates for job 'metron_agent'. Errors are:

- Error filling in template 'syslog_forwarder.conf.erb' (line
44: undefined method `strip' for nil:NilClass)

Task 57 error

For a more detailed error report, run: bosh task 57 --debug
Since the IPs are managed by OpenStack, BOSH cannot get the actual IP address of
each VM until the VM is alive, so the generated job spec doesn't contain
any IP address info.
So, must I configure the network type to manual?

snippets of deployment yml

- name: dynamic-net
  subnets:
  - cloud_properties:
      net_id: 0700ae03-4b38-464e-b40d-0a9c8dd18ff0
      security_groups:
      - Test OS SG_20160128T070152Z
    dns:
    - 114.114.114.114
    - 8.8.8.8
    range: 10.0.0.0/24
    name: Test OS Sub Internal Network_20160128T070152Z
    type: dynamic

​Rendered job spec

{"deployment"=>"staging-01", "job"=

{"name"=>"stats_z1", "templates"=>[{"name"=>"collector",
"version"=>"6c210292f18d129e9a037fe7053836db2d494344",
"sha1"=>"38927f47b15c2daf6c8a2e7c760e73e5ff90
dfd4", "blobstore_id"=>"23531029-0ee1-4267-8863-b5f931afaecb"},
{"name"=>"metron_agent",
"version"=>"2b80a211127fc642fc8bb0d14d7eb30c37730db3", "sha1"=>"150f2
7445c2ef960951c1f26606525d41ec629b2",
"blobstore_id"=>"e87174dc-f3f7-4768-94cd-74f299813528"}],
"template"=>"collector", "version"=>"6c210292f18d129e9a037fe70
53836db2d494344", "sha1"=>"38927f47b15c2daf6c8a2e7c760e73e5ff90dfd4",
"blobstore_id"=>"23531029-0ee1-4267-8863-b5f931afaecb"}, "index"=>0,
"bootstrap"=>true,
"name"=>"stats_z1", "id"=>"99f349d0-fb5d-4de7-9912-3de5559d2f19",
"az"=>nil,

*"networks"=>{"dynamic-net"=>{"type"=>"dynamic",
"cloud_properties"=>{"net_id"=>"0700ae03-4b38-464e-b40d-0a9c8dd18ff0",
"security_groups"=>["Test OS SG_20160128T070152Z"]},
"dns"=>["114.114.114.114", "8.8.8.8", "10.0.0.13"], "default"=>["dns",
"gateway"],
"dns_record_name"=>"0.stats-z1.dynamic-net.staging-01.microbosh"}}*,
"properties"=>{"collector"=>{"aws"=>{
"access_key_id"=>nil, "secret_access_key"=>nil},
"datadog"=>{"api_key"=>nil, "application_key"=>nil},
"deployment_name"=>nil, "logging_level"=>"info", "interv
als"=>{"discover"=>60, "healthz"=>30, "local_metrics"=>30,
"nats_ping"=>30, "prune"=>300, "varz"=>30}, "use_aws_cloudwatch"=>false,
"use_datadog"=>false, "use
_tsdb"=>false, "opentsdb"=>{"address"=>nil, "port"=>nil},
"use_graphite"=>false, "graphite"=>{"address"=>nil, "port"=>nil},
"memory_threshold"=>800}, "nats"=>
{"machines"=>["10.0.0.127"], "password"=>"NATS_PASSWORD", "port"=>4222,
"user"=>"NATS_USER"}, "syslog_daemon_config"=>{"address"=>nil, "port"=>nil,
"transport
"=>"tcp", "fallback_addresses"=>[], "custom_rule"=>"",
"max_message_size"=>"4k"},
"metron_agent"=>{"dropsonde_incoming_port"=>3457, "preferred_protocol"=>"udp
", "tls"=>{"client_cert"=>"", "client_key"=>""}, "debug"=>false,
"zone"=>"z1", "deployment"=>"ya-staging-01",
"tcp"=>{"batching_buffer_bytes"=>10240, "batchin
g_buffer_flush_interval_milliseconds"=>100},
"logrotate"=>{"freq_min"=>5, "rotate"=>7, "size"=>"50M"},
"buffer_size"=>10000, "enable_buffer"=>false}, "metron_
endpoint"=>{"shared_secret"=>"LOGGREGATOR_ENDPOINT_SHARED_SECRET"},
"loggregator"=>{"tls"=>{"ca_cert"=>""}, "dropsonde_incoming_port"=>3457,
"etcd"=>{"machine
s"=>["10.0.0.133"], "maxconcurrentrequests"=>10}}},
"dns_domain_name"=>"microbosh", "links"=>{},
"address"=>"99f349d0-fb5d-4de7-9912-3de5559d2f19.stats-z1.dyn
amic-net.ya-staging-01.microbosh", "persistent_disk"=>0,
"resource_pool"=>"small_z1"}​

--

Regards,

Yitao

--

Regards,

Yitao


Re: dynamic networks

James Myers
 

I believe he is referring to bosh's dynamic networks:
https://bosh.io/docs/networks.html#dynamic. Per another thread there seems
to be an issue with metron_agent templates and dynamic networks in recent
cf releases.

Best,

Jim

On Mon, Apr 11, 2016 at 3:06 PM, Release Integration <cfruntime(a)gmail.com>
wrote:

Can you clarify the question further? What do you mean by dynamic
networks? Can you describe what network topology you would like to see
Cloud Foundry deployed to?

Thanks,
Rob & Dennis
CF Release Integration
Pivotal


Re: Request for Multibuildpack Use Cases

Mike Youngstrom
 

Ah, I think that helps a lot. So to restate: this would be a feature of the
buildpacks themselves and the current buildpack contract (perhaps
triggering off of detect or something), not a generic Cloud Controller
feature that would allow you to apply and order multiple buildpacks
against a single app, correct?

Thanks,
Mike

On Mon, Apr 11, 2016 at 8:12 PM, Danny Rosen <danny.rosen(a)gmail.com> wrote:

The Cloud Foundry Buildpacks project does not contain an "oracle-library"
buildpack.

For a list of Cloud Foundry buildpacks, please see the "System Buildpacks"
section of the official Cloud Foundry documentation [1].

If your use case includes multiple buildpacks from this list, we would be
interested in hearing about it as we recognize that this type of
functionality may be useful to those in the community.

I apologize if that was not clear in my original post.

[1] - http://docs.cloudfoundry.org/buildpacks/
On Apr 11, 2016 6:57 PM, "Mike Youngstrom" <youngm(a)gmail.com> wrote:

Although, correct me if I'm wrong, this feature could be used as a
solution for some of those cases. For example, if my application depends
upon an oracle-service my node application will require the oracle binary
libraries to be configured in my environment before the nodejs buildpack
runs. If I create an "oracle-library" buildpack that installs the oracle
libraries in my staging container before npm install then this is
potentially another way to cover some of the use cases of the other
extension points right?

Mike

On Mon, Apr 11, 2016 at 3:38 PM, JT Archie <jarchie(a)pivotal.io> wrote:

Mike, this is so buildpacks can be composed together.

For example, if a user wants to use NodeJS to compile JavaScript assets
for a Java app (yes, there's always Maven support, but some have different
workflows).

Multi-buildpack doesn't alleviate forking of buildpacks. In my opinion,
the above proposals for extensions to the buildpack app life cycle
are orthogonal.


On Mon, Apr 11, 2016 at 10:44 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

This seems to be yet another way to extend buildpacks without forking
to go along with [0] and [1]. My only hope is that all these newly
proposed extension mechanisms come together in a simple, coherent, and
extensible way.

Mike

[0]
https://github.com/cloudfoundry-incubator/buildpack_app_lifecycle/pull/13
[1]
https://docs.google.com/document/d/145aOpNoq7BpuB3VOzUIDh-HBx0l3v4NHLYfW8xt2zK0/edit#



On Sun, Apr 10, 2016 at 6:15 PM, Danny Rosen <drosen(a)pivotal.io> wrote:

Hi there,

The CF Buildpacks team is considering taking on a line of work to
provide more formal support for multibuildpacks. Before we start, we would
be interested in learning if any community users have compelling use cases
they could share with us.

For more information on multibuildpacks, see Heroku's documentation [1]

[1] -
https://devcenter.heroku.com/articles/using-multiple-buildpacks-for-an-app


Re: [Metron Agent] failed to generate job templates with metron agent on top of OpenStack Dynamic network

Yitao Jiang
 

Is it a bug in CF or in BOSH?

On Fri, Apr 8, 2016 at 12:08 PM, Ben R <vagcom.ben(a)gmail.com> wrote:

I have the same issue. It has to do with every release since bosh 248.
However, dynamic networks with older bosh releases + cf-231/cf-231 work.

This must be a bug.

Ben R


On Thu, Apr 7, 2016 at 8:55 PM, Yitao Jiang <jiangyt.cn(a)gmail.com> wrote:

Hi guys,

When deploying CF on top of OpenStack with a dynamic network, the jobs fail
with metron_agent:
Error filling in template 'syslog_forwarder.conf.erb' (line 44: undefined
method `strip' for nil:NilClass)

Here are the related logs:

Detecting deployment changes
----------------------------
Releases
cf
version type changed: String -> Fixnum
- 233
+ 233

Compilation
No changes

Update
± canaries:
- 1
+ 0

Resource pools
No changes

Disk pools
No changes

Networks
dynamic-net
+ name: dynamic-net
subnets
10.0.0.0/24
cloud_properties
+ net_id: 0700ae03-4b38-464e-b40d-0a9c8dd18ff0
+ security_groups: ["Test OS SG_20160128T070152Z"]
+ dns: ["114.114.114.114", "8.8.8.8"]

+ range: 10.0.0.0/24
+ name: Test OS Sub Internal Network_20160128T070152Z

+ type: dynamic


Jobs
stats_z1

± networks:

- {"name"=>"cf1"}

+ {"name"=>"dynamic-net"}


Properties
No changes


Meta
No changes


Please review all changes carefully

Deploying
---------

Are you sure you want to deploy? (type 'yes' to continue): yes


Director task 57
Started preparing deployment > Preparing deployment. Done (00:00:03)


Error 100: Unable to render instance groups for deployment. Errors are:
- Unable to render jobs for instance group 'stats_z1'. Errors are:

- Unable to render templates for job 'metron_agent'. Errors are:

- Error filling in template 'syslog_forwarder.conf.erb' (line 44:
undefined method `strip' for nil:NilClass)

Task 57 error

For a more detailed error report, run: bosh task 57 --debug
Since the IPs are managed by OpenStack, BOSH cannot get the actual IP address of
each VM until the VM is alive, so the generated job spec doesn't contain
any IP address info.
So, must I configure the network type to manual?

snippets of deployment yml

- name: dynamic-net
  subnets:
  - cloud_properties:
      net_id: 0700ae03-4b38-464e-b40d-0a9c8dd18ff0
      security_groups:
      - Test OS SG_20160128T070152Z
    dns:
    - 114.114.114.114
    - 8.8.8.8
    range: 10.0.0.0/24
    name: Test OS Sub Internal Network_20160128T070152Z
    type: dynamic

​Rendered job spec

{"deployment"=>"staging-01", "job"=

{"name"=>"stats_z1", "templates"=>[{"name"=>"collector",
"version"=>"6c210292f18d129e9a037fe7053836db2d494344",
"sha1"=>"38927f47b15c2daf6c8a2e7c760e73e5ff90
dfd4", "blobstore_id"=>"23531029-0ee1-4267-8863-b5f931afaecb"},
{"name"=>"metron_agent",
"version"=>"2b80a211127fc642fc8bb0d14d7eb30c37730db3", "sha1"=>"150f2
7445c2ef960951c1f26606525d41ec629b2",
"blobstore_id"=>"e87174dc-f3f7-4768-94cd-74f299813528"}],
"template"=>"collector", "version"=>"6c210292f18d129e9a037fe70
53836db2d494344", "sha1"=>"38927f47b15c2daf6c8a2e7c760e73e5ff90dfd4",
"blobstore_id"=>"23531029-0ee1-4267-8863-b5f931afaecb"}, "index"=>0,
"bootstrap"=>true,
"name"=>"stats_z1", "id"=>"99f349d0-fb5d-4de7-9912-3de5559d2f19",
"az"=>nil,

*"networks"=>{"dynamic-net"=>{"type"=>"dynamic",
"cloud_properties"=>{"net_id"=>"0700ae03-4b38-464e-b40d-0a9c8dd18ff0",
"security_groups"=>["Test OS SG_20160128T070152Z"]},
"dns"=>["114.114.114.114", "8.8.8.8", "10.0.0.13"], "default"=>["dns",
"gateway"],
"dns_record_name"=>"0.stats-z1.dynamic-net.staging-01.microbosh"}}*,
"properties"=>{"collector"=>{"aws"=>{
"access_key_id"=>nil, "secret_access_key"=>nil},
"datadog"=>{"api_key"=>nil, "application_key"=>nil},
"deployment_name"=>nil, "logging_level"=>"info", "interv
als"=>{"discover"=>60, "healthz"=>30, "local_metrics"=>30,
"nats_ping"=>30, "prune"=>300, "varz"=>30}, "use_aws_cloudwatch"=>false,
"use_datadog"=>false, "use
_tsdb"=>false, "opentsdb"=>{"address"=>nil, "port"=>nil},
"use_graphite"=>false, "graphite"=>{"address"=>nil, "port"=>nil},
"memory_threshold"=>800}, "nats"=>
{"machines"=>["10.0.0.127"], "password"=>"NATS_PASSWORD", "port"=>4222,
"user"=>"NATS_USER"}, "syslog_daemon_config"=>{"address"=>nil, "port"=>nil,
"transport
"=>"tcp", "fallback_addresses"=>[], "custom_rule"=>"",
"max_message_size"=>"4k"},
"metron_agent"=>{"dropsonde_incoming_port"=>3457, "preferred_protocol"=>"udp
", "tls"=>{"client_cert"=>"", "client_key"=>""}, "debug"=>false,
"zone"=>"z1", "deployment"=>"ya-staging-01",
"tcp"=>{"batching_buffer_bytes"=>10240, "batchin
g_buffer_flush_interval_milliseconds"=>100},
"logrotate"=>{"freq_min"=>5, "rotate"=>7, "size"=>"50M"},
"buffer_size"=>10000, "enable_buffer"=>false}, "metron_
endpoint"=>{"shared_secret"=>"LOGGREGATOR_ENDPOINT_SHARED_SECRET"},
"loggregator"=>{"tls"=>{"ca_cert"=>""}, "dropsonde_incoming_port"=>3457,
"etcd"=>{"machine
s"=>["10.0.0.133"], "maxconcurrentrequests"=>10}}},
"dns_domain_name"=>"microbosh", "links"=>{},
"address"=>"99f349d0-fb5d-4de7-9912-3de5559d2f19.stats-z1.dyn
amic-net.ya-staging-01.microbosh", "persistent_disk"=>0,
"resource_pool"=>"small_z1"}​

--

Regards,

Yitao
--

Regards,

Yitao


Re: Request for Multibuildpack Use Cases

Danny Rosen
 

The Cloud Foundry Buildpacks project does not contain an "oracle-library"
buildpack.

For a list of Cloud Foundry buildpacks, please see the "System Buildpacks"
section of the official Cloud Foundry documentation [1].

If your use case includes multiple buildpacks from this list, we would be
interested in hearing about it as we recognize that this type of
functionality may be useful to those in the community.

I apologize if that was not clear in my original post.

[1] - http://docs.cloudfoundry.org/buildpacks/

On Apr 11, 2016 6:57 PM, "Mike Youngstrom" <youngm(a)gmail.com> wrote:

Although, correct me if I'm wrong, this feature could be used as a
solution for some of those cases. For example, if my application depends
upon an oracle-service my node application will require the oracle binary
libraries to be configured in my environment before the nodejs buildpack
runs. If I create an "oracle-library" buildpack that installs the oracle
libraries in my staging container before npm install then this is
potentially another way to cover some of the use cases of the other
extension points right?

Mike

On Mon, Apr 11, 2016 at 3:38 PM, JT Archie <jarchie(a)pivotal.io> wrote:

Mike, this is so buildpacks can be composed together.

For example, if a user wants to use NodeJS to compile JavaScript assets
for a Java app (yes, there's always Maven support, but some have different
workflows).

Multi-buildpack doesn't alleviate forking of buildpacks. In my opinion,
the above proposals for extensions to the buildpack app life cycle
are orthogonal.


On Mon, Apr 11, 2016 at 10:44 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

This seems to be yet another way to extend buildpacks without forking
to go along with [0] and [1]. My only hope is that all these newly
proposed extension mechanisms come together in a simple, coherent, and
extensible way.

Mike

[0]
https://github.com/cloudfoundry-incubator/buildpack_app_lifecycle/pull/13
[1]
https://docs.google.com/document/d/145aOpNoq7BpuB3VOzUIDh-HBx0l3v4NHLYfW8xt2zK0/edit#



On Sun, Apr 10, 2016 at 6:15 PM, Danny Rosen <drosen(a)pivotal.io> wrote:

Hi there,

The CF Buildpacks team is considering taking on a line of work to
provide more formal support for multibuildpacks. Before we start, we would
be interested in learning if any community users have compelling use cases
they could share with us.

For more information on multibuildpacks, see Heroku's documentation [1]

[1] -
https://devcenter.heroku.com/articles/using-multiple-buildpacks-for-an-app


Re: Request for Multibuildpack Use Cases

Mike Youngstrom
 

Although, correct me if I'm wrong, this feature could be used as a solution
for some of those cases. For example, if my application depends upon an
oracle-service, my node application will require the Oracle binary libraries
to be configured in my environment before the nodejs buildpack runs. If I
create an "oracle-library" buildpack that installs the Oracle libraries in
my staging container before npm install, then this is potentially another
way to cover some of the use cases of the other extension points, right?
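
To sketch what such a hypothetical "oracle-library" buildpack could look like (purely illustrative, not an existing buildpack; the actual download step is elided), its bin/compile might do something like:

#!/usr/bin/env bash
# bin/compile <build-dir> <cache-dir> for a hypothetical "oracle-library" buildpack
set -eu
BUILD_DIR="$1"
mkdir -p "$BUILD_DIR/vendor/oracle" "$BUILD_DIR/.profile.d"
# ...fetch and unpack the Oracle client libraries into "$BUILD_DIR/vendor/oracle" here...
cat > "$BUILD_DIR/.profile.d/oracle.sh" <<'EOF'
export LD_LIBRARY_PATH="$HOME/vendor/oracle:${LD_LIBRARY_PATH:-}"
EOF

The nodejs buildpack would then run afterwards with the libraries already staged in the container.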

Mike

On Mon, Apr 11, 2016 at 3:38 PM, JT Archie <jarchie(a)pivotal.io> wrote:

Mike, this is so buildpacks can be composed together.

For example, if a user wants to use NodeJS to compile JavaScript assets for
a Java app (yes, there's always Maven support, but some have different
workflows).

Multi-buildpack doesn't alleviate forking of buildpacks. In my opinion,
the above proposals for extensions to the buildpack app life cycle
are orthogonal.


On Mon, Apr 11, 2016 at 10:44 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

This seems to be yet another way to extend buildpacks without forking to
go along with [0] and [1]. My only hope is that all these newly proposed
extension mechanisms come together in a simple, coherent, and extensible
way.

Mike

[0]
https://github.com/cloudfoundry-incubator/buildpack_app_lifecycle/pull/13
[1]
https://docs.google.com/document/d/145aOpNoq7BpuB3VOzUIDh-HBx0l3v4NHLYfW8xt2zK0/edit#



On Sun, Apr 10, 2016 at 6:15 PM, Danny Rosen <drosen(a)pivotal.io> wrote:

Hi there,

The CF Buildpacks team is considering taking on a line of work to
provide more formal support for multibuildpacks. Before we start, we would
be interested in learning if any community users have compelling use cases
they could share with us.

For more information on multibuildpacks, see Heroku's documentation [1]

[1] -
https://devcenter.heroku.com/articles/using-multiple-buildpacks-for-an-app


Re: dynamic networks

CF Runtime
 

Can you clarify the question further? What do you mean by dynamic networks? Can you describe what network topology you would like to see Cloud Foundry deployed to?

Thanks,
Rob & Dennis
CF Release Integration
Pivotal


Re: Request for Multibuildpack Use Cases

JT Archie <jarchie@...>
 

Mike, this is so buildpacks can be composed together.

For example, if a user wants to use NodeJS to compile JavaScript assets for
a Java app (yes, there's always Maven support, but some have different
workflows).

Multi-buildpack doesn't alleviate forking of buildpacks. In my opinion, the
above proposals for extensions to the buildpack app life cycle
are orthogonal.

On Mon, Apr 11, 2016 at 10:44 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

This seems to be yet another way to extend buildpacks without forking to
go along with [0] and [1]. My only hope is that all these newly proposed
extension mechanisms come together in a simple, coherent, and extensible
way.

Mike

[0]
https://github.com/cloudfoundry-incubator/buildpack_app_lifecycle/pull/13
[1]
https://docs.google.com/document/d/145aOpNoq7BpuB3VOzUIDh-HBx0l3v4NHLYfW8xt2zK0/edit#



On Sun, Apr 10, 2016 at 6:15 PM, Danny Rosen <drosen(a)pivotal.io> wrote:

Hi there,

The CF Buildpacks team is considering taking on a line of work to provide
more formal support for multibuildpacks. Before we start, we would be
interested in learning if any community users have compelling use cases
they could share with us.

For more information on multibuildpacks, see Heroku's documentation [1]

[1] -
https://devcenter.heroku.com/articles/using-multiple-buildpacks-for-an-app


Re: Remarks about the “confab” wrapper for consul

Amit Kumar Gupta
 

Orchestrating a raft cluster in a way that requires no manual intervention
is incredibly difficult. We write the PID file late for a specific reason:

https://www.pivotaltracker.com/story/show/112018069

For dealing with wedged states like the one you encountered, we have some
recommendations in the documentation:

https://github.com/cloudfoundry-incubator/consul-release/#disaster-recovery

We have acceptance tests we run in CI that exercise rolling a 3 node
cluster, so if you hit a failure it would be useful to get logs if you have
any.

Cheers,
Amit

On Mon, Apr 11, 2016 at 9:38 AM, Benjamin Gandon <benjamin(a)gandon.org>
wrote:

Actually, doing some further tests, I realize a mere 'join' is definitely
not enough.

Instead, you need to restore the raft/peers.json on each one of the 3
consul server nodes:

monit stop consul_agent
echo '["10.244.0.58:8300","10.244.2.54:8300","10.244.0.54:8300"]' >
/var/vcap/store/consul_agent/raft/peers.json


And make sure you start them at roughly the same time with “monit start
consul_agent”.
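
If it helps, a quick way to confirm the cluster re-formed after starting all three servers (the consul binary path below is an assumption about where consul-release installs it; adjust if yours differs):

monit start consul_agent
# once the three servers are back, ask any of them about cluster state
/var/vcap/packages/consul/bin/consul info | grep -E 'leader|num_peers'
# a healthy 3-node cluster shows num_peers = 2 and, on the leader, leader = true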

So this advocates strongly for setting *skip_leave_on_interrupt=true*
and *leave_on_terminate=false* in confab, because losing the peers.json
is really something we don't want in our CF deployments!

/Benjamin


On 11 Apr 2016, at 18:15, Benjamin Gandon <benjamin(a)gandon.org> wrote:

Hi cf devs,


I’m running a CF deployment with redundancy, and I just experienced my
consul servers not being able to elect any leader.
That’s a VERY frustrating situation that keeps the whole CF deployment
down, until you get a deeper understanding of consul, and figure out they
just need a silly manual 'join' so that they get back together.

But that was definitely not easy to nail down because at first look, I
could just see monit restarting the “agent_ctl” every 60 seconds because
confab was not writing the damn PID file.


More specifically, the 3 consul servers (i.e. consul_z1/0, consul_z1/1 and
consul_z2/0) had properly left one another upon a graceful shutdown. This
state was persisted in /var/vcap/store/raft/peers.json being “null” on each
one of them, so they would not get back together on restart. A manual
'join' was necessary. But it took me hours to get there because I’m no
expert with consul.

And until the 'join' is made, VerifySynced() was negative in confab, and
monit was constantly starting and stopping it every 60 seconds. But once
you step back, you realize confab was actually waiting for the new leader
to be elected before it writes the PID file. Which is questionable.

So, I’m asking 3 questions here:

1. Does writing the PID file in confab *that* late really make sense?
2. Could someone please write some minimal documentation about confab, at
least to tell what it is supposed to do?
3. Wouldn’t it be wiser that whenever any of the consul servers is absent,
the cluster just becomes unhealthy?

With this 3rd question, I mean that even on a graceful TERM or INT, no
consul server should perform a graceful 'leave'. With this different
approach, they would properly come back up even when performing a
complete graceful restart of the cluster.

This can be done with those extra configs from the “confab” wrapper:

{
"skip_leave_on_interrupt": true,
"leave_on_terminate": false
}

What do you guys think of it?


/Benjamin



Re: cf stop not sending SIGTERM

Will Tran
 

Thanks! This was happening in JBP v3.3.1; I just tested with JBP v3.6 and it's working.


Re: Doppler/Firehose - Multiline Log Entry

Mike Youngstrom
 

Finally got around to testing this. Preliminary testing shows that "\u2028"
doesn't function as a newline character in bash and causes the Eclipse console
to wig out. I don't think "\u2028" is a viable long-term solution. Hope
you make progress on a metric format available to an app in a container. I
too would like a tracker link to such a feature if there is one.

Thanks,
Mike

On Mon, Mar 14, 2016 at 2:28 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Hi Jim,

So, to be clear, what we're basically doing is using a Unicode newline
character to fool loggregator (which is looking for \n) into thinking that
it isn't a new log event, right? Does \u2028 work as a newline character
when tailing logs in the CLI? Has anyone tried this Unicode newline character
in various consoles? IDE, xterm, etc.? I'm wondering if developers will
need to have different config for development.

Mike

On Mon, Mar 14, 2016 at 12:17 PM, Jim CF Campbell <jcampbell(a)pivotal.io>
wrote:

Hi Mike and Alex,

Two things - for Java, we are working toward defining an enhanced metric
format that will support transport of multi-line messages.

The second is this workaround that David Laing suggested for Logstash.
Think you could use it for Splunk?

With the Java Logback library you can do this by adding
"%replace(%xException){'\n','\u2028'}%nopex" to your logging config [1],
and then using the following logstash conf [2] to
replace the Unicode newline character \u2028 with \n, which Kibana will
display as a new line.

mutate {

gsub => [ "[@message]", '\u2028', "

"]
^^^ Seems that passing a string with an actual newline in it is the only
way to make gsub work

}

to replace the token with a regular newline again so it displays
"properly" in Kibana.

[1]
https://github.com/dpinto-pivotal/cf-SpringBootTrader-config/blob/master/application.yml#L12

[2]
https://github.com/logsearch/logsearch-for-cloudfoundry/blob/master/src/logsearch-config/src/logstash-filters/snippets/firehose.conf#L60-L64


On Mon, Mar 14, 2016 at 11:11 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I'll let the Loggregator team respond formally. But, in my
conversations with the Loggregator team, I think we're basically stuck, not
sure what the right thing to do is on the client side. How does the client
signal to loggregator that this is a multi-line log message, or what is the
right way for loggregator to detect that the client is trying to send a
multi-line log message? Any ideas?

Mike

On Mon, Mar 14, 2016 at 10:25 AM, Aliaksandr Prysmakou <
prysmakou(a)gmail.com> wrote:

Hi guys,
Are there any updates on the "Multiline Log Entry" issue? What is the correct
way to deal with stack traces?
Any links to the tracker we could read?
----
Alex Prysmakou / Altoros
Tel: (617) 841-2121 ext. 5161 | Toll free: 855-ALTOROS
Skype: aliaksandr.prysmakou
www.altoros.com | blog.altoros.com | twitter.com/altoros


--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io |
303.618.0963


[PROPOSAL]: Removing ability to specify npm version

John Shahid
 

Hi all,

The buildpacks team would like to propose a change to the nodejs buildpack.
It was recently brought to our attention in this issue
<https://github.com/cloudfoundry/nodejs-buildpack/issues/54> that the
nodejs buildpack will try to download npm if the version specified in
package.json doesn't match the version shipped with nodejs. According to
Heroku
<https://devcenter.heroku.com/articles/nodejs-support#specifying-an-npm-version>,
this is a feature that exists for historical reasons that no longer apply.

We would like to ask if anyone relies on this feature or has an objection
to this change.
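
For reference, the behaviour in question is triggered by an npm pin in the engines block of package.json, along these lines (the version numbers here are purely illustrative):

{
  "engines": {
    "node": "4.4.x",
    "npm": "3.8.x"
  }
}

If the pinned npm differs from the npm bundled with the resolved node version, the buildpack currently downloads the pinned npm; under this proposal that download would go away and the bundled npm would always be used.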

Thanks,

The Buildpacks Team


Re: Staging and Runtime Hooks Feature Narrative

Mike Youngstrom
 

An interesting proposal. Any thoughts about this proposal in relation to
multi-buildpacks [0]? How many of the use cases for this feature go away
given multi-buildpack support? I think it would be interesting to be
able to apply hooks without checking scripts into the application (like
multi-buildpack).

This feature also appears to be somewhat related to [1]. I hope that
someone is overseeing all these newly proposed buildpack features to help
ensure they are coherent.

Mike


[0]
https://lists.cloudfoundry.org/archives/list/cf-dev(a)lists.cloudfoundry.org/message/H64GGU6Z75CZDXNWC7CKUX64JNPARU6Y/
[1]
https://lists.cloudfoundry.org/archives/list/cf-dev(a)lists.cloudfoundry.org/thread/GRKFQ2UOQL7APRN6OTGET5HTOJZ7DHRQ/#SEA2RWDCAURSVPIMBXXJMWN7JYFQICL3

On Fri, Apr 8, 2016 at 4:16 PM, Troy Topnik <troy.topnik(a)hpe.com> wrote:

This feature allows developers more control of the staging and deployment
of their application code, without them having to fork existing buildpacks
or create their own.


https://docs.google.com/document/d/1PnTtTLwXOTG7f70ilWGlbTbi1LAXZu9zYnrUVvjr31I/edit

Hooks give developers the ability to optionally:
* run scripts in the staging container before and/or after the
bin/compile scripts executed by the buildpack, and
* run scripts in each app container before the app starts (via .profile
as per the Heroku buildpack API)

A similar feature has been available and used extensively in Stackato for
a few years, and we'd like to contribute this functionality back to Cloud
Foundry.
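
To make the second kind of hook concrete, a minimal hypothetical .profile checked into the app root might look like this (the variable values are made up; the mechanism is the standard Heroku-style .profile sourced before the start command):

# .profile -- sourced in each app container before the app starts
export ORACLE_HOME="$HOME/vendor/oracle"
export LD_LIBRARY_PATH="$ORACLE_HOME/lib:${LD_LIBRARY_PATH:-}"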

A proof-of-concept of this feature has already been submitted as a pull
request, and the Feature Narrative addresses many of the questions raised
in the PR discussion:

https://github.com/cloudfoundry-incubator/buildpack_app_lifecycle/pull/13

Please weigh in with comments in the document itself or in this thread.

Thanks,

TT


Re: Request for Multibuildpack Use Cases

Mike Youngstrom
 

This seems to be yet another way to extend buildpacks without forking to
go along with [0] and [1]. My only hope is that all these newly proposed
extension mechanisms come together in a simple, coherent, and extensible
way.

Mike

[0]
https://github.com/cloudfoundry-incubator/buildpack_app_lifecycle/pull/13
[1]
https://docs.google.com/document/d/145aOpNoq7BpuB3VOzUIDh-HBx0l3v4NHLYfW8xt2zK0/edit#

On Sun, Apr 10, 2016 at 6:15 PM, Danny Rosen <drosen(a)pivotal.io> wrote:

Hi there,

The CF Buildpacks team is considering taking on a line of work to provide
more formal support for multibuildpacks. Before we start, we would be
interested in learning if any community users have compelling use cases
they could share with us.

For more information on multibuildpacks, see Heroku's documentation [1]

[1] -
https://devcenter.heroku.com/articles/using-multiple-buildpacks-for-an-app


Re: Remarks about the “confab” wrapper for consul

Benjamin Gandon
 

Actually, doing some further tests, I realize a mere 'join' is definitely not enough.

Instead, you need to restore the raft/peers.json on each one of the 3 consul server nodes:

monit stop consul_agent
echo '["10.244.0.58:8300","10.244.2.54:8300","10.244.0.54:8300"]' > /var/vcap/store/consul_agent/raft/peers.json

And make sure you start them at roughly the same time with “monit start consul_agent”.

So this advocates strongly for setting skip_leave_on_interrupt=true and leave_on_terminate=false in confab, because losing the peers.json is really something we don't want in our CF deployments!

/Benjamin

On 11 Apr 2016, at 18:15, Benjamin Gandon <benjamin(a)gandon.org> wrote:

Hi cf devs,


I’m running a CF deployment with redundancy, and I just experienced my consul servers not being able to elect any leader.
That’s a VERY frustrating situation that keeps the whole CF deployment down, until you get a deeper understanding of consul, and figure out they just need a silly manual 'join' so that they get back together.

But that was definitely not easy to nail down because at first look, I could just see monit restarting the “agent_ctl” every 60 seconds because confab was not writing the damn PID file.


More specifically, the 3 consul servers (i.e. consul_z1/0, consul_z1/1 and consul_z2/0) had properly left one another upon a graceful shutdown. This state was persisted in /var/vcap/store/raft/peers.json being “null” on each one of them, so they would not get back together on restart. A manual 'join' was necessary. But it took me hours to get there because I’m no expert with consul.

And until the 'join' is made, VerifySynced() was negative in confab, and monit was constantly starting and stopping it every 60 seconds. But once you step back, you realize confab was actually waiting for the new leader to be elected before it writes the PID file. Which is questionable.

So, I’m asking 3 questions here:

1. Does writing the PID file in confab that late really make sense?
2. Could someone please write some minimal documentation about confab, at least to tell what it is supposed to do?
3. Wouldn’t it be wiser that whenever any of the consul servers is absent, the cluster just becomes unhealthy?

With this 3rd question, I mean that even on a graceful TERM or INT, no consul server should perform a graceful 'leave'. With this different approach, they would properly come back up even when performing a complete graceful restart of the cluster.

This can be done with those extra configs from the “confab” wrapper:

{
"skip_leave_on_interrupt": true,
"leave_on_terminate": false
}

What do you guys think of it?


/Benjamin


Re: AUFS bug in Linux kernel

Benjamin Gandon
 

Very neat!
Thanks a lot Eric.

On 11 Apr 2016, at 17:46, Eric Malm <emalm(a)pivotal.io> wrote:

Hi, Benjamin,

Yes, the BOSH-Lite boxes with kernel 3.19.0-40 through 3.19.0-50 are all susceptible to the AUFS bug. Kernel versions 3.19.0-51 and later will be fine, and I believe the earliest BOSH-Lite Vagrant box with one of those kernel versions is 9000.102.0. The 3.19.0-49 kernel that went into 3192 was a one-off build that Canonical supplied in advance of the release of the official kernel package with the fix (https://launchpad.net/ubuntu/+source/linux-lts-vivid/3.19.0-51.57~14.04.1), and the 'official' package with kernel 3.19.0-49 still has the AUFS bug.

Thanks,
Eric

On Mon, Apr 11, 2016 at 8:36 AM, Benjamin Gandon <benjamin(a)gandon.org> wrote:
Hi,

Sorry for the late follow-up, but would this hit bosh-lite too?
Because after it has run for a while, I’m experiencing similar severe issues with the 53 garden containers I use in Bosh-Lite.

Config :
- Bosh-lite v9000.91.0 (i.e. bosh v250 + warden-cpi v29 + garden-linux v0.331.0) and the kernel is 3.19.0-47.53~14.04.1 (I might have upgraded it)
- Deployment: cf v231 + Diego v0.1434.0 + Garden-linux v0.333.0 + Etcd v36 + cf-mysql v26 + other

Will the linux-image-3.19.0-49-generic fix the issue, as it was done in this 2016-02-08 commit <https://github.com/cloudfoundry/bosh/commit/750c5e7ed70b1d7753500ca725590c1c0eac1262> for stemcell 3192 ?

As a safety measure, I decided to upgrade to kernel 3.19.0-58-generic and I would be happy to get a confirmation that (1) my bosh-lite deployment was hit by the AUFS bug, and that (2) the new kernel I installed will get me off this operational nightmare.

Thanks!


On 28 Jan 2016, at 02:06, Eric Malm <emalm(a)pivotal.io> wrote:

Hi, Mike,

Warden also uses aufs for its containers' overlay filesystems, so we expect the same issue to affect the DEAs on these stemcell versions. I'm not aware of a deliberate attempt to reproduce it on the DEAs, though.

Thanks,
Eric

On Wed, Jan 27, 2016 at 4:08 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:
Thanks Will. Does anyone know if this bug could also impact Warden?

Mike

On Wed, Jan 27, 2016 at 9:50 AM, Will Pragnell <wpragnell(a)pivotal.io> wrote:
A bug with AUFS [1] was introduced in version 3.19.0-40 of the linux kernel. This bug can cause containers to end up with unkillable zombie processes with high CPU usage. This can happen any time a container is supposed to be destroyed.

This affects both Garden-Linux and Warden (and Docker). If you see significant slowdown or increased CPU usage on DEAs or Diego cells, it might well be this. It will probably build up slowly over time, so you may not notice anything for a while depending on the rate of app instance churn on your deployment.

The bad version of the kernel is present in stemcell 3160 and later. I can't recommend using older stemcells because the bad kernel versions also include fixes for several high severity security vulnerabilities (at least [2-5], there may be others I've missed). Were it not for these, rolling back to stemcell 3157 would be the fix.

We're waiting for a fix to make its way into the kernel, and the BOSH team will produce a stemcell with the fix as soon as possible. In the meantime, I'd suggest simply keeping a closer eye than usual on your DEAs and Diego cells.

If this issue occurs, the only solution is to recreate that machine. While we've not had any actual reports of this issue occurring for Cloud Foundry deployments in the wild yet, we're confident that the issue will occur. The Diego team have seen it in testing, and several teams have encountered the issue with their Concourse workers, which also use Garden-Linux.

As always, please get in touch if you have any questions.

Will - Garden PM

[1]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533043
[2]: http://www.ubuntu.com/usn/usn-2857-1/
[3]: http://www.ubuntu.com/usn/usn-2868-1/
[4]: http://www.ubuntu.com/usn/usn-2869-1/
[5]: http://www.ubuntu.com/usn/usn-2871-2/
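
For anyone wanting to check whether a given DEA or Diego cell is running an affected kernel, a quick check on the VM (e.g. via bosh ssh) is:

uname -r
# per the thread above, 3.19.0-40 through 3.19.0-50 carry the AUFS bug
# (apart from the one-off 3.19.0-49 build in stemcell 3192); 3.19.0-51+ are fine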


Remarks about the “confab” wrapper for consul

Benjamin Gandon
 

Hi cf devs,


I’m running a CF deployment with redundancy, and I just experienced my consul servers not being able to elect any leader.
That’s a VERY frustrating situation that keeps the whole CF deployment down, until you get a deeper understanding of consul, and figure out they just need a silly manual 'join' so that they get back together.

But that was definitely not easy to nail down because at first look, I could just see monit restarting the “agent_ctl” every 60 seconds because confab was not writing the damn PID file.


More specifically, the 3 consul servers (i.e. consul_z1/0, consul_z1/1 and consul_z2/0) had properly left one another upon a graceful shutdown. This state was persisted in /var/vcap/store/raft/peers.json being “null” on each one of them, so they would not get back together on restart. A manual 'join' was necessary. But it took me hours to get there because I’m no expert with consul.

And until the 'join' is made, VerifySynced() was negative in confab, and monit was constantly starting and stopping it every 60 seconds. But once you step back, you realize confab was actually waiting for the new leader to be elected before it writes the PID file. Which is questionable.

So, I’m asking 3 questions here:

1. Does writing the PID file in confab that late really make sense?
2. Could someone please write some minimal documentation about confab, at least to tell what it is supposed to do?
3. Wouldn’t it be wiser that whenever any of the consul servers is absent, the cluster just becomes unhealthy?

With this 3rd question, I mean that even on a graceful TERM or INT, no consul server should perform a graceful 'leave'. With this different approach, they would properly come back up even when performing a complete graceful restart of the cluster.

This can be done with those extra configs from the “confab” wrapper:

{
"skip_leave_on_interrupt": true,
"leave_on_terminate": false
}

What do you guys think of it?


/Benjamin


Re: AUFS bug in Linux kernel

Eric Malm <emalm@...>
 

Hi, Benjamin,

Yes, the BOSH-Lite boxes with kernel 3.19.0-40 through 3.19.0-50 are all
susceptible to the AUFS bug. Kernel versions 3.19.0-51 and later will be
fine, and I believe the earliest BOSH-Lite Vagrant box with one of those
kernel versions is 9000.102.0. The 3.19.0-49 kernel that went into 3192 was
a one-off build that Canonical supplied in advance of the release of the
official kernel package with the fix (
https://launchpad.net/ubuntu/+source/linux-lts-vivid/3.19.0-51.57~14.04.1),
and the 'official' package with kernel 3.19.0-49 still has the AUFS bug.

Thanks,
Eric

On Mon, Apr 11, 2016 at 8:36 AM, Benjamin Gandon <benjamin(a)gandon.org>
wrote:

Hi,

Sorry for the late follow-up, but would this hit bosh-lite too?
Because after it has run for a while, I’m experiencing similar severe
issues with the 53 garden containers I use in Bosh-Lite.

Config :
- Bosh-lite v9000.91.0 (i.e. bosh v250 + warden-cpi v29 + garden-linux
v0.331.0) and the kernel is 3.19.0-47.53~14.04.1 (I *might* have upgraded
it)
- Deployment: cf v231 + Diego v0.1434.0 + Garden-linux v0.333.0 + Etcd
v36 + cf-mysql v26 + other

Will the linux-image-3.19.0-49-generic fix the issue, as it was done in
this 2016-02-08 commit
<https://github.com/cloudfoundry/bosh/commit/750c5e7ed70b1d7753500ca725590c1c0eac1262> for
stemcell 3192 ?

As a safety measure, I decided to upgrade to kernel 3.19.0-58-generic and
I would be happy to get a confirmation that (1) my bosh-lite deployment was
hit by the AUFS bug, and that (2) the new kernel I installed will get me
off this operational nightmare.

Thanks!


On 28 Jan 2016, at 02:06, Eric Malm <emalm(a)pivotal.io> wrote:

Hi, Mike,

Warden also uses aufs for its containers' overlay filesystems, so we
expect the same issue to affect the DEAs on these stemcell versions. I'm
not aware of a deliberate attempt to reproduce it on the DEAs, though.

Thanks,
Eric

On Wed, Jan 27, 2016 at 4:08 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Thanks Will. Does anyone know if this bug could also impact Warden?

Mike

On Wed, Jan 27, 2016 at 9:50 AM, Will Pragnell <wpragnell(a)pivotal.io>
wrote:

A bug with AUFS [1] was introduced in version 3.19.0-40 of the linux
kernel. This bug can cause containers to end up with unkillable zombie
processes with high CPU usage. This can happen any time a container is
supposed to be destroyed.

This affects both Garden-Linux and Warden (and Docker). If you see
significant slowdown or increased CPU usage on DEAs or Diego cells, it
might well be this. It will probably build up slowly over time, so you may
not notice anything for a while depending on the rate of app instance churn
on your deployment.

The bad version of the kernel is present in stemcell 3160 and later. I
can't recommend using older stemcells because the bad kernel versions also
include fixes for several high severity security vulnerabilities (at least
[2-5], there may be others I've missed). Were it not for these, rolling
back to stemcell 3157 would be the fix.

We're waiting for a fix to make its way into the kernel, and the BOSH
team will produce a stemcell with the fix as soon as possible. In the
meantime, I'd suggest simply keeping a closer eye than usual on your DEAs
and Diego cells.

If this issue occurs, the only solution is to recreate that machine.
While we've not had any actual reports of this issue occurring for Cloud
Foundry deployments in the wild yet, we're confident that the issue will
occur. The Diego team have seen it in testing, and several teams have
encountered the issue with their Concourse workers, which also use
Garden-Linux.

As always, please get in touch if you have any questions.

Will - Garden PM

[1]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533043
[2]: http://www.ubuntu.com/usn/usn-2857-1/
[3]: http://www.ubuntu.com/usn/usn-2868-1/
[4]: http://www.ubuntu.com/usn/usn-2869-1/
[5]: http://www.ubuntu.com/usn/usn-2871-2/
