postgres out of disk space

Matthias Ender <Matthias.Ender@...>
 

I have a cf-aws-tiny cf-boshrelease deployment, and it's been running well for over 4 months.
We have about 40 apps, with a couple dozen cf pushes each day.
Yesterday pushing apps became spotty and then impossible, with various errors.
Turned out the 100GB disk for the postgres instance on the data node was full.
I increased the disk size and things are running again.
But - what happened there? 100GB and growing seems like an awfully large database for rather modest use.
And I'm worried it'll just happen again in a few months.

thanks,
Matthias
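
For anyone hitting the same wall, a minimal sketch for locating the growth, run on the data node. The psql path, the vcap user, and the ccdb database name are assumptions about a stock postgres job - adjust to your deployment; audit/events tables in the ccdb are a common growth suspect.

# Per-database sizes (paths and names are assumptions)
/var/vcap/packages/postgres/bin/psql -U vcap postgres -c \
  "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database;"

# Ten largest tables in the ccdb
/var/vcap/packages/postgres/bin/psql -U vcap ccdb -c \
  "SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total
     FROM pg_catalog.pg_statio_user_tables
    ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;"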


Instance crashing after running once. Error: "reason"=>"CRASHED", "exit_status"=>0, "exit_description"=>"app instance exited"

Zuba Al
 

I've pushed an app that uses a Redis service (a sample app which simply sends and receives a message through the Redis service). An instance is created at start, and after running successfully once it crashes with the error: "reason"=>"CRASHED", "exit_status"=>0, "exit_description"=>"app instance exited". After some time another instance is created automatically, runs successfully once, and crashes with the logs below. This goes on for some time.

my manifest.yml:

name: RedisApp
no-route: true
memory: 512M
random-route: true
instances: 1
path: target/gs-messaging-redis-0.1.0.jar
services:
- redislite


cf logs RedisApp command output:

2015-09-25T11:58:51.85+0200 [DEA/1] OUT Starting app instance (index 0) with guid aec41933-ef0c-4d5b-8e67-da6729ca3005
2015-09-25T11:58:56.58+0200 [App/0] OUT
2015-09-25T11:58:56.58+0200 [App/0] OUT . ____ _ __ _ _
2015-09-25T11:58:56.58+0200 [App/0] OUT /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
2015-09-25T11:58:56.58+0200 [App/0] OUT ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
2015-09-25T11:58:56.58+0200 [App/0] OUT \\/ ___)| |_)| | | | | || (_| | ) ) ) )
2015-09-25T11:58:56.58+0200 [App/0] OUT ' |____| .__|_| |_|_| |_\__, | / / / /
2015-09-25T11:58:56.58+0200 [App/0] OUT =========|_|==============|___/=/_/_/_/
2015-09-25T11:58:56.58+0200 [App/0] OUT :: Spring Boot :: (v1.2.6.RELEASE)
2015-09-25T11:58:56.69+0200 [App/0] OUT 2015-09-25 09:58:56.690 INFO 29 --- [ main] pertySourceApplicationContextInitializer : Adding 'cloud' PropertySource to ApplicationContext
2015-09-25T11:58:56.78+0200 [App/0] OUT 2015-09-25 09:58:56.786 INFO 29 --- [ main] nfigurationApplicationContextInitializer : Adding cloud service auto-reconfiguration to ApplicationContext
2015-09-25T11:58:56.80+0200 [App/0] OUT 2015-09-25 09:58:56.804 INFO 29 --- [ main] hello.Application : Starting Application on 18venf3o9v7 with PID 29 (/home/vcap/app started by vcap in /home/vcap/app)
2015-09-25T11:58:56.86+0200 [App/0] OUT 2015-09-25 09:58:56.869 INFO 29 --- [ main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext(a)6d2ca421: startup date [Fri Sep 25 09:58:56 UTC 2015]; root of context h
ierarchy
2015-09-25T11:58:57.22+0200 [App/0] OUT 2015-09-25 09:58:57.223 WARN 29 --- [ main] .i.s.PathMatchingResourcePatternResolver : Skipping [/home/vcap/app/.java-buildpack/spring_auto_reconfiguration/spring_auto_reconfiguration-1.10.0_RELEASE.jar] because it does not denote a directory
2015-09-25T11:58:57.78+0200 [App/0] OUT 2015-09-25 09:58:57.780 INFO 29 --- [ main] urceCloudServiceBeanFactoryPostProcessor : Auto-reconfiguring beans of type javax.sql.DataSource
2015-09-25T11:58:57.79+0200 [App/0] OUT 2015-09-25 09:58:57.789 INFO 29 --- [ main] urceCloudServiceBeanFactoryPostProcessor : No beans of type javax.sql.DataSource found. Skipping auto-reconfiguration.
2015-09-25T11:58:57.79+0200 [App/0] OUT 2015-09-25 09:58:57.794 INFO 29 --- [ main] edisCloudServiceBeanFactoryPostProcessor : Auto-reconfiguring beans of type org.springframework.data.redis.connection.RedisConnectionFactory
2015-09-25T11:58:57.90+0200 [App/0] OUT 2015-09-25 09:58:57.905 INFO 29 --- [ main] edisCloudServiceBeanFactoryPostProcessor : Reconfigured bean redisConnectionFactory into singleton service connector org.springframework.data.redis.connection.jedis.JedisConnectionFactory(a)74ca9fd4
2015-09-25T11:58:58.29+0200 [App/0] OUT 2015-09-25 09:58:58.298 INFO 29 --- [ main] o.s.j.e.a.AnnotationMBeanExporter : Registering beans for JMX exposure on startup
2015-09-25T11:58:58.30+0200 [App/0] OUT 2015-09-25 09:58:58.308 INFO 29 --- [ main] o.s.c.support.DefaultLifecycleProcessor : Starting beans in phase 2147483647
2015-09-25T11:58:58.37+0200 [App/0] OUT 2015-09-25 09:58:58.375 INFO 29 --- [ main] hello.Application : Started Application in 2.491 seconds (JVM running for 3.401)
2015-09-25T11:58:58.37+0200 [App/0] OUT 2015-09-25 09:58:58.376 INFO 29 --- [ main] hello.Application : Sending message...
2015-09-25T11:58:58.39+0200 [App/0] OUT 2015-09-25 09:58:58.398 INFO 29 --- [ container-2] hello.Receiver : Received <Hello from Redis!>
2015-09-25T11:58:58.40+0200 [App/0] OUT 2015-09-25 09:58:58.400 INFO 29 --- [ Thread-2] s.c.a.AnnotationConfigApplicationContext : Closing org.springframework.context.annotation.AnnotationConfigApplicationContext(a)6d2ca421: startup date [Fri Sep 25 09:58:56 UTC 2015]; root of context hier
archy
2015-09-25T11:58:58.40+0200 [App/0] OUT 2015-09-25 09:58:58.401 INFO 29 --- [ Thread-2] o.s.c.support.DefaultLifecycleProcessor : Stopping beans in phase 2147483647
2015-09-25T11:58:58.40+0200 [App/0] OUT 2015-09-25 09:58:58.404 INFO 29 --- [ Thread-2] o.s.j.e.a.AnnotationMBeanExporter : Unregistering JMX-exposed beans on shutdown
2015-09-25T11:58:58.44+0200 [App/0] ERR
2015-09-25T11:58:58.49+0200 [API/0] OUT App instance exited with guid aec41933-ef0c-4d5b-8e67-da6729ca3005 payload: {"cc_partition"=>"default", "droplet"=>"aec41933-ef0c-4d5b-8e67-da6729ca3005", "version"=>"8985cc3d-6aa2-4e34-a11c-f64289aeace3", "instance"=>"0147066c534c4d3bb879fffd2c149529", "
index"=>0, "reason"=>"CRASHED", "exit_status"=>0, "exit_description"=>"app instance exited", "crash_timestamp"=>1443175138}


Re: Proposal: Decomposing cf-release and Extracting Deployment Strategies

Mike Youngstrom
 

Sounds good. Thanks for taking the time to discuss this with me.

Mike

On Mon, Sep 21, 2015 at 7:24 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

This forces us to spread all clusterable nodes across 2 deploys and
certain jobs, like CC, use the job_name+index to uniquely identify a node

I believe they're planning on switching to guids for bosh job
identifiers. I saw in another thread you and Dmitriy discussed this. Any
other reasons for having unique job names we should know about?

How would you feel about the interface allowing for specifying
additional releases, jobs, and templates to be colocated on existing jobs,
along with property configuration for these things?

I don't quite follow what you are proposing here. Can you clarify?
What I mean is the tools we build for generating manifests will support
specifying inputs (probably in the form of a YAML file) that declares what
additional releases you want to add to the deployment, what additional jobs
you may want to add, what additional job templates you may want to colocate
with an existing job, and property configuration for those additional jobs
or colocated job templates. A common example is wanting to colocate some
monitoring agent on all the jobs, and providing some credential
configuration so it can pump metrics into some third party service. This
would be for things not already covered by the LAMB architecture.
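
Purely as an illustration of the interface being discussed, an input stub might look something like the sketch below; the file name, schema, and keys are hypothetical, not a committed design:

cat > extensions-stub.yml <<'EOF'
# hypothetical schema: extra releases plus templates colocated on existing jobs
releases:
- name: monitoring-agent          # additional release (hypothetical name)
  version: latest
colocate:
- job: runner_z1                  # existing job to extend
  templates: [monitoring-agent]
properties:
  monitoring-agent:
    api_key: REPLACE_ME           # credentials for the third-party metrics service
EOF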

Something like that would work for me as long as we were still able to
take advantage of the scripts/tooling in cf-deployment to manage the config
and templates we manage in lds-deployment.

Yes, that'd be the plan.

Cheers,
Amit


On Mon, Sep 21, 2015 at 2:41 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Thanks for the response. See comments below:


Sensitive property management as part of manifest generation
(encrypted or acquired from an outside source)

How do you currently get these encrypted or external values into your
manifests? At manifest generation time, would you be able to generate a
stub on the fly from this source, and pass it into the manifest generation
script?
Yes, that would work fine. Just thought I'd call it out as something our
current solution does that we'd have to augment in cf-deployment.


If for some reason we are forced to fork a stock release we'd like to
be able to use that forked release we are building instead of the publicly
available one for manifest generation and release uploads, etc.

Yes, using the stock release will be the default option, but we will
support several other ways of specifying a release, including providing a
URL to a remote tarball, a path to a local release directory, a path to a
local tarball, and maybe a git URL and SHA.
Great!


The job names in each deployment must be unique across the
installation.

Why do the job names need to be unique across deployments?
This is because a single bosh cannot connect to multiple datacenters
which for us represent different availability zones. This forces us to
spread all clusterable nodes across 2 deploys and certain jobs, like CC,
use the job_name+index to uniquely identify a node [0]. Therefore if we
have 2 CCs deployed across 2 AZ we must have one job named
cloud_controller_az1 and the other named cloud_controller_az2. Does that
make sense? I recognize this is mostly the fault of a limitation in Bosh
but until bosh supports connection to multiple vsphere datacenters with a
single director, we will need to account for it in our templating.

[0]
https://github.com/cloudfoundry/cloud_controller_ng/blob/5257a8af6990e71cd1e34ae8978dfe4773b32826/bosh-templates/cloud_controller_worker_ctl.erb#L48

Occasionally we may wish to use some config from a stock release not
currently exposed in a cf-deployment template. I'd like to be sure there
is a way we can add that config, in a not hacky way, without waiting for a
PR to be accepted and subsequent release.

This would be ideal. Currently, a lot of complexity in manifest
generation is around, if you specify a certain value X, then you need to
make sure you specify values Y, Z, etc. in a compatible way. E.g. if you
have 3 etcd instances, then the value for the etcd.machines property needs
to have those 3 IPs. If you specify domain as "mydomain.com", then you
need to specify in other places that the UAA URL is "
https://uaa.mydomain.com". The hope is most of this complexity goes
away with BOSH Links (
https://github.com/cloudfoundry/bosh-notes/blob/master/links.md). My
hope is that, as the complexity goes away, we will have to maintain less
logic and will be able to comfortably expose more, if not all, of the
properties.
Great
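
To make that coupling concrete, a sketch of the kind of stub involved (IPs and domain are placeholders): change the domain or the etcd instance count, and the dependent values must change in lockstep.

cat > coupled-properties-stub.yml <<'EOF'
properties:
  domain: mydomain.com
  uaa:
    url: https://uaa.mydomain.com   # must track the domain above
  etcd:
    machines:                       # must list one IP per etcd instance
    - 10.0.16.10
    - 10.0.16.11
    - 10.0.16.12
EOF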

We have our own internal bosh releases and config that we'll need to
merge in with the things cf-deployment is doing.

How would you feel about the interface allowing for specifying
additional releases, jobs, and templates to be colocated on existing jobs,
along with property configuration for these things?
I don't quite follow what you are proposing here. Can you clarify?


we'd like to augment this with our own release jobs and config that
we know to work with cf-deployment 250's and perhaps tag it as v250.lds

Would a workflow like this work for you: maintain an lds-deployment
repo, which includes cf-deployment as a submodule, and you can version
lds-deployment and update your submodule pointer to cf-deployment as you
see fit? lds-deployment will probably just need the cf-deployment
submodule, and a config file describing the "blessed" versions of the
non-stock releases you wish to add on. I know this is lacking details, but
does something along those lines sound like a reasonable workflow?
Something like that would work for me as long as we were still able to
take advantage of the scripts/tooling in cf-deployment to manage the config
and templates we manage in lds-deployment.
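
As a sketch of that workflow (repo name from this thread; the tag, URL, and file names are illustrative):

git init lds-deployment && cd lds-deployment
git submodule add https://github.com/cloudfoundry/cf-deployment
(cd cf-deployment && git checkout v250)     # pin a "blessed" upstream tag
echo "# pins for our non-stock releases" > blessed-releases.yml
git add -A && git commit -m "cf-deployment v250 + our releases"
git tag v250.lds                            # the versioning scheme described above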

Thanks,
Mike




On Wed, Sep 16, 2015 at 3:06 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Another situation we have that you may want to keep in mind while
developing cf-deployment:

* We are using vsphere and currently we have a cf installation with 2
AZ using 2 separate vsphere "Datacenters" (more details:
https://github.com/cloudfoundry/bosh-notes/issues/7). This means we
have a CF installation that is actually made up of 2 deployments. So, we
need to generate a manifest for az1 and another for az2. The job names in
each deployment must be unique across the installation; e.g.,
cloud_controller_az1 and cloud_controller_az2 would be the cc job names in
each deployment.

Mike

On Wed, Sep 16, 2015 at 3:38 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Here are some of the examples:

* Sensitive property management as part of manifest generation
(encrypted or acquired from an outside source)

* We have our own internal bosh releases and config that we'll need to
merge in with the things cf-deployment is doing. For example, if
cf-deployment tags v250 as including Diego 3333 and etcd 34 with given
templates perhaps we'd like to augment this with our own release jobs and
config that we know to work with cf-deployment 250's and perhaps tag it as
v250.lds and that becomes what we use to generate our manifests and upload
releases.

* Occasionally we may wish to use some config from a stock release not
currently exposed in a cf-deployment template. I'd like to be sure there
is a way we can add that config, in a not hacky way, without waiting for a
PR to be accepted and subsequent release.

* If for some reason we are forced to fork a stock release we'd like
to be able to use that forked release we are building instead of the
publicly available one for manifest generation and release uploads, etc.

Does that help?

Mike



On Tue, Sep 15, 2015 at 9:50 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Thanks for the feedback Mike!

Can you tell us more specifically what sort of extensions you need?
It would be great if cf-deployment provided an interface that could serve
the needs of essentially all operators of CF.

Thanks,
Amit

On Tue, Sep 15, 2015 at 4:02 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

This is great stuff! My organization currently maintains our own
custom ways to generate manifests, include secure properties, and manage
release versions.

We would love to base the next generation of our solution on
cf-deployment. Have you put any thought into how others might customize or
extend cf-deployment? Our needs are very similar to yours just sometimes a
little different.

Perhaps a private fork periodically merged with a known good release
combination (tag) might be appropriate? Or some way to include the same
tools into a wholly private repo?

Mike


On Tue, Sep 8, 2015 at 1:22 PM, Amit Gupta <agupta(a)pivotal.io>
wrote:

Hi all,

The CF OSS Release Integration team (casually referred to as the
"MEGA team") is trying to solve a lot of tightly interrelated problems, and
make many of said problems less interrelated. It is difficult to address
just one issue without touching the others, so the following proposal
addresses several issues, but the most important ones are:

* decompose cf-release into many independently manageable,
independently testable, independently usable releases
* separate manifest generation strategies from the release source,
paving the way for Diego to be part of the standard deployment

This proposal will outline a picture of how manifest generation
will work in a unified manner in development, test, and integration
environments. It will also outline a picture of what each release’s test
pipelines will look like, how they will feed into a common integration
environment, and how feedback from the integration environment will feed
back into the test environments. Finally, it will propose a picture for
what the integration environment will look like, and how we get from the
current integration environment to where we want to be.

For further details, please feel free to view and comment here:


https://docs.google.com/document/d/1Viga_TzUB2nLxN_ILqksmUiILM1hGhq7MBXxgLaUOkY

Thanks,
Amit, CF OSS Release Integration team


Re: Running the app test suite within the CATs, and the admin_buildpack_lifecycle_test is failing

CF Runtime
 

You can also check out the v208 tag of cf-release, then run the
acceptance-tests from src/github.com/cloudfoundry/cf-acceptance-tests

Joseph
CF Release Integration Team
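
As a sketch of that (script and path names reflect the cf-release layout of that era - verify against your checkout):

git clone https://github.com/cloudfoundry/cf-release
cd cf-release
git checkout v208
./update                                    # sync submodules to the tagged SHAs
cd src/github.com/cloudfoundry/cf-acceptance-tests
CONFIG=$PWD/integration_config.json ./bin/test   # CONFIG points at your environment's JSON config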

On Thu, Sep 24, 2015 at 2:53 PM, Christopher Piraino <cpiraino(a)pivotal.io>
wrote:

Jordan,

The Cloud Foundry bosh release comes with an errand called
"acceptance_tests" that contains the version of CATs which that version of
CF was tested with. You can run these by doing "bosh run errand
acceptance_tests".

There are also some manifest properties that you might need to set for the
CATs to run correctly. The list of all possible properties for the
acceptance_tests errand can be found here:
https://github.com/cloudfoundry/cf-release/blob/develop/jobs/acceptance-tests/spec
.


- Chris Piraino

On Thu, Sep 24, 2015 at 11:27 AM, Jordan Collier <jordanicollier(a)gmail.com
wrote:
I was unclear on what I am asking; the real question is as follows:

What is the best way to run the apps test suite within the CATS on an
older version of Cloud Foundry? (For example, I am running these tests on
version 208.)
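
Pulling the advice above together as a sketch - the errand invocation plus the shape of the properties it reads (names taken from the spec linked above; values are placeholders):

bosh run errand acceptance_tests

# Manifest properties the errand reads, e.g.:
#   properties:
#     acceptance_tests:
#       api: api.example.com
#       admin_user: admin
#       admin_password: REPLACE_ME
#       apps_domain: example.com
#       skip_ssl_validation: true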


Re: Running the app test suite within the CATs, and the admin_buildpack_lifecycle_test is failing

Christopher Piraino <cpiraino@...>
 

Jordan,

The Cloud Foundry bosh release comes with an errand called
"acceptance_tests" that contains the version of CATs which that version of
CF was tested with. You can run these by doing "bosh run errand
acceptance_tests".

There are also some manifest properties that you might need to set for the
CATs to run correctly. The list of all possible properties for the
acceptance_tests errand can be found here:
https://github.com/cloudfoundry/cf-release/blob/develop/jobs/acceptance-tests/spec
.


- Chris Piraino

On Thu, Sep 24, 2015 at 11:27 AM, Jordan Collier <jordanicollier(a)gmail.com>
wrote:

I was unclear on what I am asking; the real question is as follows:

What is the best way to run the apps test suite within the CATS on an
older version of Cloud Foundry? (For example, I am running these tests on
version 208.)


Re: Environment variables with special characters not handled correctly?

Jonas Rosland
 

Hi Daniel and Dieu,

After much trial and error I finally got it working. I created a user-provided service and then called it from my application. I've documented the steps for anyone else wanting to know how to work with these variables (clearer documentation with examples, maybe?).

Here's the documentation and example application: https://gist.github.com/jonasrosland/08b5758eaa9098a81cf8

Thanks for all your help!

Best regards,
Jonas Rosland
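
For readers skimming the thread, the approach boils down to something like this sketch (service and key names and the token are placeholders):

cf create-user-provided-service wordpress-creds -p '{"bearer":"<token with $, &, # etc.>"}'
cf bind-service appname wordpress-creds
cf restage appname
# the value arrives unescaped in VCAP_SERVICES, which the app reads instead of an env var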


Re: Environment variables with special characters not handled correctly?

Daniel Mikusa
 

Sorry, sounds like escaping is the only option here for an environment
variable. If you don't want to escape, I think you could create a user
provided service, provide the value through there and bind that to your
app. That'll come into VCAP_SERVICES which, if I read the PT story right,
shouldn't need any extra escaping.

Dan

On Thu, Sep 24, 2015 at 3:10 PM, Jonas Rosland <jonas.rosland(a)emc.com>
wrote:

Hi Dieu and Daniel,

I did set the environment variable like you suggest, Daniel; I should've
shown that in my example. I see now that the app wrongly removes the $
characters and the character after each one, which I didn't notice before. `cf env
appname` shows the correct environment variable, so I guess I will have to
do some escaping of characters in my app?

Best regards,
Jonas Rosland


Re: Environment variables with special characters not handled correctly?

Jonas Rosland
 

Hi Dieu and Daniel,

I did set the environment variable like you suggest, Daniel; I should've shown that in my example. I see now that the app wrongly removes the $ characters and the character after each one, which I didn't notice before. `cf env appname` shows the correct environment variable, so I guess I will have to do some escaping of characters in my app?

Best regards,
Jonas Rosland


Re: Running the app test suite within the CATs, and the admin_buildpack_lifecycle_test is failing

Jordan Collier
 

I was unclear on what I am asking; the real question is as follows:

What is the best way to run the apps test suite within the CATS on an older version of Cloud Foundry? (For example, I am running these tests on version 208.)


Re: Error 400007: `stats_z1/0' is not running after update

Amit Kumar Gupta
 

Okay, please let me know if you are able to fix your security group
settings and whether the original problem gets resolved.

Amit

On Wed, Sep 23, 2015 at 7:03 PM, Guangcai Wang <guangcai.wang(a)gmail.com>
wrote:

That did help. It showed us the real error.

==> metron_agent/metron_agent.stdout.log <==
{"timestamp":1443054247.927488327,"process_id":23472,"source":"metron","log_level":"warn","message":"Failed
to create client: Could not connect to NATS: dial tcp 192.168.110.202:4222:
i/o
timeout","data":null,"file":"/var/vcap/data/compile/metron_agent/loggregator/src/
github.com/cloudfoundry/loggregatorlib/cfcomponent/registrars/collectorregistrar/collector_registrar.go
","line":51,"method":"
github.com/cloudfoundry/loggregatorlib/cfcomponent/registrars/collectorregistrar.(*CollectorRegistrar).Run
"}

I checked the security rule. It seems to have some problems.
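
A quick way to confirm that from the failing VM, as a sketch (IP and port taken from the metron log above; assumes netcat is present on the stemcell):

nc -zv 192.168.110.202 4222   # should succeed once the security group allows NATS traffic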

On Thu, Sep 24, 2015 at 2:47 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

I often take the following approach to debugging issues like this:

* Open two shell sessions to your failing VM using bosh ssh, and switch
to superuser
* In one session, `watch monit summary`. You might see collector going
back and forth between initializing and not monitored, but please report
anything else of interest you see here
* In the other session, `cd /var/vcap/sys/log` and then `watch
--differences=cumulative ls -altr **/*` to see which files are being
written to while the startup processes are thrashing. Then `tail -f FILE_1
FILE_2 ...` listing all the files that were being written to, and seem
relevant to the thrashing process(es) in monit
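
Put together, the two sessions look something like this (job name per this thread; the log file name appears later in the thread):

# session 1
bosh ssh stats_z1 0
sudo -i
watch monit summary

# session 2
bosh ssh stats_z1 0
sudo -i
cd /var/vcap/sys/log
watch --differences=cumulative 'ls -altr **/*'
tail -f collector/collector.log   # plus whichever files the watch shows changing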


On Wed, Sep 23, 2015 at 12:21 AM, Guangcai Wang <guangcai.wang(a)gmail.com>
wrote:

It frequently logs the message below, which doesn't seem helpful.


{"timestamp":1442987404.9433253,"message":"collector.started","log_level":"info","source":"collector","data":{},"thread_id":70132569199380,"fiber_id":70132570371720,"process_id":19392,"file":"/var/vcap/packages/collector/lib/collector/config.rb","lineno":45,"method":"setup_logging"}

The only possible error message from the bosh debug log is
"ntp":{"message":"bad ntp server"}

But I don't think it is related to the failure of the stats_z1 update.

I, [2015-09-23 04:55:59 #2392] [canary_update(stats_z1/0)] INFO --
DirectorJobRunner: Checking if stats_z1/0 has been updated after
63.333333333333336 seconds
D, [2015-09-23 04:55:59 #2392] [canary_update(stats_z1/0)] DEBUG --
DirectorJobRunner: SENT: agent.7d3452bd-679e-4a97-8514-63a373a54ffd
{"method":"get_state","arguments":[],"reply_to":"director.c5b97fc1-b972-47ec-9412-a83ad240823b.473fda64-6ac3-4a53-9ebc-321fc7eabd7a"}
D, [2015-09-23 04:55:59 #2392] [] DEBUG -- DirectorJobRunner: RECEIVED:
director.c5b97fc1-b972-47ec-9412-a83ad240823b.473fda64-6ac3-4a53-9ebc-321fc7eabd7a
{"value":{"properties":{"logging":{"max_log_file_size":""}},"job":{"name":"stats_z1","release":"","template":"fluentd","version":"4c71c87bbf0144428afacd470e2a5e32b91932fc","sha1":"b141c6037d429d732bf3d67f7b79f8d7d80aac5d","blobstore_id":"d8451d63-2e4f-4664-93a8-a77e5419621d","templates":[{"name":"fluentd","version":"4c71c87bbf0144428afacd470e2a5e32b91932fc","sha1":"b141c6037d429d732bf3d67f7b79f8d7d80aac5d","blobstore_id":"d8451d63-2e4f-4664-93a8-a77e5419621d"},{"name":"collector","version":"889b187e2f6adc453c61fd8f706525b60e4b85ed","sha1":"f5ae15a8fa2417bf984513e5c4269f8407a274dc","blobstore_id":"3eeb0166-a75c-49fb-9f28-c29788dbf64d"},{"name":"metron_agent","version":"e6df4c316b71af68dfc4ca476c8d1a4885e82f5b","sha1":"42b6d84ad9368eba0508015d780922a43a86047d","blobstore_id":"e578bfb0-9726-4754-87ae-b54c8940e41a"},{"name":"apaas_collector","version":"8808f0ae627a54706896a784dba47570c92e0c8b","sha1":"b9a63da925b40910445d592c70abcf4d23ffe84d","blobstore_id":"3e6fa71a-07f7-446a-96f4-3caceea02f2f"}]},"packages":{"apaas_collector":{"name":"apaas_collector","version":"f294704d51d4517e4df3d8417a3d7c71699bc04d.1","sha1":"5af77ceb01b7995926dbd4ad7481dcb7c3d94faf","blobstore_id":"fa0e96b9-71a6-4828-416e-dde3427a73a9"},"collector":{"name":"collector","version":"ba47450ce83b8f2249b75c79b38397db249df48b.1","sha1":"0bf8ee0d69b3f21cf1878a43a9616cb7e14f6f25","blobstore_id":"722a5455-f7f7-427d-7e8d-e562552857bc"},"common":{"name":"common","version":"99c756b71550530632e393f5189220f170a69647.1","sha1":"90159de912c9bfc71740324f431ddce1a5fede00","blobstore_id":"37be6f28-c340-4899-7fd3-3517606491bb"},"fluentd-0.12.13":{"name":"fluentd-0.12.13","version":"71d8decbba6c863bff6c325f1f8df621a91eb45f.1","sha1":"2bd32b3d3de59e5dbdd77021417359bb5754b1cf","blobstore_id":"7bc81ac6-7c24-4a94-74d1-bb9930b07751"},"metron_agent":{"name":"metron_agent","version":"997d87534f57cad148d56c5b8362b72e726424e4.1","sha1":"a21404c50562de75000d285a02cd43bf098bfdb9","blobstore_id":"6c7cf72c-9ace-40a1-4632-c27946bf631e"},"ruby-2.1.6":{"name":"ruby-2.1.6","version":"41d0100ffa4b21267bceef055bc84dc37527fa35.1","sha1":"8a9867197682cabf2bc784f71c4d904bc479c898","blobstore_id":"536bc527-3225-43f6-7aad-71f36addec80"}},"configuration_hash":"a73c7d06b0257746e95aaa2ca994c11629cbd324","networks":{"private_cf_subnet":{"cloud_properties":{"name":"random","net_id":"1e1c9aca-0b5a-4a8f-836a-54c18c21c9b9","security_groups":["az1_cf_management_secgroup_bosh_cf_ssh_cf2","az1_cf_management_secgroup_cf_private_cf2","az1_cf_management_secgroup_cf_public_cf2"]},"default":["dns","gateway"],"dns":["192.168.110.8","133.162.193.10","133.162.193.9","192.168.110.10"],"dns_record_name":"0.stats-z1.private-cf-subnet.cf-apaas.microbosh","gateway":"192.168.110.11","ip":"192.168.110.204","netmask":"255.255.255.0"}},"resource_pool":{"cloud_properties":{"instance_type":"S-1"},"name":"small_z1","stemcell":{"name":"bosh-openstack-kvm-ubuntu-trusty-go_agent","version":"2989"}},"deployment":"cf-apaas","index":0,"persistent_disk":0,"persistent_disk_pool":null,"rendered_templates_archive":{"sha1":"0ffd89fa41e02888c9f9b09c6af52ea58265a8ec","blobstore_id":"4bd01ae7-a69a-4fe5-932b-d98137585a3b"},"agent_id":"7d3452bd-679e-4a97-8514-63a373a54ffd","bosh_protocol":"1","job_state":"failing","vm":{"name":"vm-12d45510-096d-4b8b-9547-73ea5fda00c2"},"ntp":{"message":"bad
ntp server"}}}


On Wed, Sep 23, 2015 at 5:13 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Please check the file collector/collector.log, it's in a subdirectory
of the unpacked log tarball.

On Wed, Sep 23, 2015 at 12:01 AM, Guangcai Wang <
guangcai.wang(a)gmail.com> wrote:

Actually, I checked the two log files on the stats_z1 job VM. I did not find
any clues. Attached for reference.

On Wed, Sep 23, 2015 at 4:54 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

If you do "bosh logs stats_z1 0 --job" you will get a tarball of all
the logs for the relevant processes running on the stats_z1/0 VM. You will
likely find some error messages in the collectors stdout or stderr logs.

On Tue, Sep 22, 2015 at 11:30 PM, Guangcai Wang <
guangcai.wang(a)gmail.com> wrote:

It does not help.

I always see the "collector" process bouncing between "running" and
"Does not exist" when I use "monit summary" in a while loop.

Does anyone know how to get the real error when the "collector" process
has not actually failed? Thanks.

On Wed, Sep 23, 2015 at 4:11 PM, Tony <Tonyl(a)fast.au.fujitsu.com>
wrote:

My approach is to log in on the stats vm and sudo, then
run "monit status" and restart the failed processes, or simply restart all
processes by running "monit restart all".

Wait for a while (5-10 minutes at most).
If there is still some failed process, e.g. collector,
then run ps -ef | grep collector
and kill the processes in the list (maybe you need to run kill -9
sometimes).

Then run "monit restart all" again.

Normally, this will fix the issue "Failed: `XXX' is not running after
update".



--
View this message in context:
http://cf-dev.70369.x6.nabble.com/cf-dev-Error-400007-stats-z1-0-is-not-running-after-update-tp1901p1902.html
Sent from the CF Dev mailing list archive at Nabble.com.


Re: Environment variables with special characters not handled correctly?

Dieu Cao <dcao@...>
 

Hi Jonas,

You'll need to escape the special characters like $.
See this tracker story for some background:
https://www.pivotaltracker.com/story/show/76655240

-Dieu

On Thu, Sep 24, 2015 at 11:14 AM, Daniel Mikusa <dmikusa(a)pivotal.io> wrote:

It's possible that your shell is escaping the characters, like the '$'.

Try `cf set-env appname WORDPRESS_BEARER
'1NNKhb5&Nfw$F(wqbqW&9nSeoonwAYz7#j2M1KKY!QU(Wbs(a)8xwjr6Q$hg(IPqcd'`.
Note the single quotes around the value of the environment variable. Or
set the environment variable in a manifest.yml file.

Also, run `cf env <app-name>` to confirm the value is being set correctly.

Thanks,

Dan


On Thu, Sep 24, 2015 at 1:57 PM, Jonas Rosland <jonas.rosland(a)emc.com>
wrote:

Hi all,

I am having an issue with an environment variable containing special
characters that doesn't seem to be picked up correctly by CF.

I run `cf set-env appname WORDPRESS_BEARER
1NNKhb5&Nfw$F(wqbqW&9nSeoonwAYz7#j2M1KKY!QU(Wbs(a)8xwjr6Q$hg(IPqcd`
(obviously not my currently correct key) and then use it in this Ruby app:
https://gist.github.com/jonasrosland/08b5758eaa9098a81cf8

When I check the output the app complains about the API key being
incorrect, when it is, in fact, correct. If I set it manually in the
application it works, but that is of course not a good practice. I've also
verified that the environment variable does get picked up by the
application by adding some logging output to show the API key, but it still
won't work. I'm wondering if this is because of the special characters in
the environment variable?

Thanks in advance,
Jonas Rosland


Re: Environment variables with special characters not handled correctly?

Daniel Mikusa
 

It's possible that your shell is escaping the characters, like the '$'.

Try `cf set-env appname WORDPRESS_BEARER
'1NNKhb5&Nfw$F(wqbqW&9nSeoonwAYz7#j2M1KKY!QU(Wbs(a)8xwjr6Q$hg(IPqcd'`. Note
the single quotes around the value of the environment variable. Or set the
environment variable in a manifest.yml file.

Also, run `cf env <app-name>` to confirm the value is being set correctly.

Thanks,

Dan


On Thu, Sep 24, 2015 at 1:57 PM, Jonas Rosland <jonas.rosland(a)emc.com>
wrote:

Hi all,

I am having an issue with an environment variable containing special
characters that doesn't seem to be picked up correctly by CF.

I run `cf set-env appname WORDPRESS_BEARER
1NNKhb5&Nfw$F(wqbqW&9nSeoonwAYz7#j2M1KKY!QU(Wbs(a)8xwjr6Q$hg(IPqcd`
(obviously not my currently correct key) and then use it in this Ruby app:
https://gist.github.com/jonasrosland/08b5758eaa9098a81cf8

When I check the output the app complains about the API key being
incorrect, when it is, in fact, correct. If I set it manually in the
application it works, but that is of course not a good practice. I've also
verified that the environment variable does get picked up by the
application by adding some logging output to show the API key, but it still
won't work. I'm wondering if this is because of the special characters in
the environment variable?

Thanks in advance,
Jonas Rosland


Environment variables with special characters not handled correctly?

Jonas Rosland
 

Hi all,

I am having an issue with an environment variable containing special characters that doesn't seem to be picked up correctly by CF.

I run `cf set-env appname WORDPRESS_BEARER 1NNKhb5&Nfw$F(wqbqW&9nSeoonwAYz7#j2M1KKY!QU(Wbs(a)8xwjr6Q$hg(IPqcd` (obviously not my currently correct key) and then use it in this Ruby app: https://gist.github.com/jonasrosland/08b5758eaa9098a81cf8

When I check the output the app complains about the API key being incorrect, when it is, in fact, correct. If I set it manually in the application it works, but that is of course not a good practice. I've also verified that the environment variable does get picked up by the application by adding some logging output to show the API key, but it still won't work. I'm wondering if this is because of the special characters in the environment variable?

Thanks in advance,
Jonas Rosland


Jordan Collier email for mailing list

Jordan Collier
 

jordanicollier(a)gmail.com


Running the app test suite within the CATs, and the admin_buildpack_lifecycle_test is failing

Jordan Collier
 

`[2015-09-24 15:06:25.24 (UTC)]> cf logout
Logging out...
OK
• Failure [22.809 seconds]
Admin Buildpacks
/Users/localadmin/github.com/cloudfoundry/src/github.com/cloudfoundry/cf-acceptance-tests/apps/admin_buildpack_lifecycle_test.go:172
when the buildpack fails to detect
/Users/localadmin/github.com/cloudfoundry/src/github.com/cloudfoundry/cf-acceptance-tests/apps/admin_buildpack_lifecycle_test.go:129
fails to stage [It]
/Users/localadmin/github.com/cloudfoundry/src/github.com/cloudfoundry/cf-acceptance-tests/apps/admin_buildpack_lifecycle_test.go:128

Got stuck at:
Creating app CATS-APP-5c7775a6-1753-4e2d-4415-7f6abe01a974 in org CATS-ORG-1-2015_09_24-08h01m29.448s / space CATS-SPACE-1-2015_09_24-08h01m29.448s as CATS-USER-1-2015_09_24-08h01m29.448s...
OK

Creating route cats-app-5c7775a6-1753-4e2d-4415-7f6abe01a974.switchollie.allstate.com...
OK

Binding cats-app-5c7775a6-1753-4e2d-4415-7f6abe01a974.switchollie.allstate.com to CATS-APP-5c7775a6-1753-4e2d-4415-7f6abe01a974...
OK

Uploading CATS-APP-5c7775a6-1753-4e2d-4415-7f6abe01a974...
Uploading app files from: /var/folders/ph/tg82ppzd6kngwm_g2tbzpccc0000gn/T/matching-app824262495
Uploading 132, 1 files
Done uploading
OK

Starting app CATS-APP-5c7775a6-1753-4e2d-4415-7f6abe01a974 in org CATS-ORG-1-2015_09_24-08h01m29.448s / space CATS-SPACE-1-2015_09_24-08h01m29.448s as CATS-USER-1-2015_09_24-08h01m29.448s...
-----> Downloaded app package (4.0K)
Staging failed: An application could not be detected by any available buildpack


FAILED
Server error, status code: 400, error code: 170003, message: An app was not successfully detected by any available buildpack

TIP: use 'cf logs CATS-APP-5c7775a6-1753-4e2d-4415-7f6abe01a974 --recent' for more information

Waiting for:
NoAppDetectedError

/Users/localadmin/github.com/cloudfoundry/src/github.com/cloudfoundry/cf-acceptance-tests/apps/admin_buildpack_lifecycle_test.go:127`

It looks as if it is failing for the correct reason; is there something I am missing?


Re: Security group rules to allow HTTP communication between 2 apps deployed on CF

CF Runtime
 

Containers have a default iptables rule that REJECTs all traffic. If there is
not a security group configured to allow the traffic to the destination,
you'll get a connection refused.

Security groups can only be created and configured by admin users.

Your only option is probably to have one app connect to the other using the
public route bound to that app.

Joseph
CF Release Integration Team
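
For completeness, the admin-side commands look like this sketch (group name, file, CIDR, and port are placeholders):

cat > app-to-app.json <<'EOF'
[{"protocol":"tcp","destination":"10.0.0.0/24","ports":"8080"}]
EOF
cf create-security-group app-to-app app-to-app.json
cf bind-security-group app-to-app my-org my-space
cf restart app1    # new rules apply when the container is recreated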

On Wed, Sep 23, 2015 at 3:54 AM, Denilson Nastacio <dnastacio(a)gmail.com>
wrote:

The message indicates this problem is unrelated to security groups. You
would get something like "host not found" instead of "connection refused".

Which version of CF are you using?
Can you curl a url from app2 at all?

On Wed, Sep 23, 2015, 3:27 AM Naveen Asapu <asapu.naveen(a)gmail.com> wrote:

Hi Matthew Sykes,

Actually, I'm trying to monitor app usage in Bluemix. For that I'm
using cf-abacus, and this command appears in its example steps.

Can you suggest how to monitor app usage using curl and Cloud Foundry?

--
Thanks
Naveen Asapu


Re: DEA/Warden staging error

kyle havlovitz <kylehav@...>
 

OK, after more investigation the problem was that NetworkManager was
running on the machine and was trying to take control of new network
interfaces after they came up, so it caused problems with the
interface that Warden created for the container. With NetworkManager
disabled I can push the app and everything is fine.

Thanks for your help everyone.
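
If disabling NetworkManager outright is too blunt, an alternative sketch is to mark warden's host-side interfaces as unmanaged. The w-* pattern matches the interface naming in the ifconfig output below, but whether your NetworkManager version supports interface-name patterns is an assumption to verify:

cat >> /etc/NetworkManager/NetworkManager.conf <<'EOF'
[keyfile]
unmanaged-devices=interface-name:w-*
EOF
service network-manager restart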

On Wed, Sep 23, 2015 at 10:45 AM, kyle havlovitz <kylehav(a)gmail.com> wrote:

Here's the output from those commands:
https://gist.github.com/MrEnzyme/36592831b1c46d44f007
Soon after running those I noticed that the container loses its IPv4
address shortly after coming up and ifconfig looks like this:

root(a)cf-build:/home/cloud-user/test# ifconfig -a
docker0 Link encap:Ethernet HWaddr 56:84:7a:fe:97:99
inet addr:172.17.42.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
eth0 Link encap:Ethernet HWaddr fa:16:3e:cd:f3:0a
inet addr:172.25.1.52 Bcast:172.25.1.127 Mask:255.255.255.128
inet6 addr: fe80::f816:3eff:fecd:f30a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:515749 errors:0 dropped:0 overruns:0 frame:0
TX packets:295471 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1162366659 (1.1 GB) TX bytes:59056756 (59.0 MB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:45057315 errors:0 dropped:0 overruns:0 frame:0
TX packets:45057315 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:18042315375 (18.0 GB) TX bytes:18042315375 (18.0 GB)
w-190db6c54la-0 Link encap:Ethernet HWaddr 12:dc:ba:da:38:5b
inet6 addr: fe80::10dc:baff:feda:385b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1454 Metric:1
RX packets:12 errors:0 dropped:0 overruns:0 frame:0
TX packets:227 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:872 (872.0 B) TX bytes:35618 (35.6 KB)

Any idea what would be causing that?


On Tue, Sep 22, 2015 at 10:31 PM, Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:

Based on your description, it doesn't sound like warden networking or the
warden iptables chains are your problem. Are you able to share all of your
routes and chains via a gist?

route -n
ifconfig -a
iptables -L -n -v -t filter
iptables -L -n -v -t nat
iptables -L -n -v -t mangle

Any kernel messages that look relevant in the message buffer (dmesg)?

Have you tried doing a network capture to verify the packets are look the
way you expect? Are you sure your host routing rules are good? Do the
warden subnets overlap with any network accessible to the host?

Based on previous notes, it doesn't sound like this is a standard
deployment so it's hard to say what could be impacting you.

On Tue, Sep 22, 2015 at 1:08 PM, Kyle Havlovitz (kyhavlov) <
kyhavlov(a)cisco.com> wrote:

I didn’t; I’m still having this problem. Even adding this lenient
security group didn’t let me get any traffic out of the VM:

[{"name":"allow_all","rules":[{"protocol":"all","destination":"0.0.0.0/0
"},{"protocol":"tcp","destination":"0.0.0.0/0
","ports":"1-65535"},{"protocol":"udp","destination":"0.0.0.0/0
","ports":"1-65535"}]}]

The only way I was able to get traffic out was by manually removing the
reject/drop iptables rules that warden set up, and even with that the
container still lost all connectivity after 30 seconds.

From: CF Runtime <cfruntime(a)gmail.com>
Reply-To: "Discussions about Cloud Foundry projects and the system
overall." <cf-dev(a)lists.cloudfoundry.org>
Date: Tuesday, September 22, 2015 at 12:50 PM
To: "Discussions about Cloud Foundry projects and the system overall." <
cf-dev(a)lists.cloudfoundry.org>
Subject: [cf-dev] Re: Re: Re: Re: Re: Re: Re: Re: DEA/Warden staging
error

Hey Kyle,

Did you make any progress?

Zak & Mikhail
CF Release Integration Team

On Thu, Sep 17, 2015 at 10:28 AM, CF Runtime <cfruntime(a)gmail.com>
wrote:

It certainly could be. By default the containers reject all egress
traffic. CC security groups configure iptables rules that allow traffic
out.

One of the default security groups in the BOSH templates allows access
on port 53. If you have no security groups, the containers will not be able
to make any outgoing requests.

Joseph & Natalie
CF Release Integration Team

On Thu, Sep 17, 2015 at 8:44 AM, Kyle Havlovitz (kyhavlov) <
kyhavlov(a)cisco.com> wrote:

On running git clone inside the container via the warden shell, I get:
"Cloning into 'staticfile-buildpack'...
fatal: unable to access '
https://github.com/cloudfoundry/staticfile-buildpack/': Could not
resolve host: github.com".
So the container can't get to anything outside of it (I also tried
pinging some external IPs to make sure it wasn't a DNS thing). Would this
be caused by cloud controller security group settings?

--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: Removing support for v1 service brokers

Mike Youngstrom
 

My vote is to wait a couple more months. I guess we'll see if anyone else
would like more months as well.

Mike

On Sep 23, 2015 11:52 PM, "Dieu Cao" <dcao(a)pivotal.io> wrote:

Thanks Mike. Totally understandable.


On Wed, Sep 23, 2015 at 9:23 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Thanks Dieu, honestly I was just trying to find an angle to bargain for a
bit more time. :) Three months is generous. But six months would be
glorious. :)

After the CAB call this month we got started converting our brokers over
but our migration is more difficult because we use Service instance
credentials quite a bit and those don't appear to be handled well when
doing "migrate-service-instances". I think we can do 3 months but we'll be
putting our users through a bit of a fire drill.

That said, I'll understand if you stick to 3 months since we should have
started this conversion long ago.

Mike
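
For anyone else planning the same move, the migration command Mike refers to has roughly this shape in the cf CLI of this era (service labels, provider, and plan names are placeholders; confirm with cf migrate-service-instances -h):

cf migrate-service-instances v1-mysql my-provider 100mb v2-mysql 100mb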

On Wed, Sep 23, 2015 at 1:22 AM, Dieu Cao <dcao(a)pivotal.io> wrote:

We've found NATS to be unstable under certain conditions (temporary
network interruptions or network instability), particularly around the
client reconnection logic.
We've seen that it could take anywhere from a few seconds to half an
hour to reconnect properly. We spent a fair amount of time investigating
ways to improve the reconnection logic and have made some improvements but
believe that it's best to work towards not having this dependency.
You can find more about this in the stories in this epic [1].

Mike, in addition to removing the NATS dependency, this will remove the
burden on the team, almost a weekly fight, in terms of maintaining
backwards compatibility for the v1 broker spec any time we work on adding
functionality to the service broker api.
I'll work with the team in the next couple of weeks on specific stories
and I'll link to it here.

[1] https://www.pivotaltracker.com/epic/show/1440790


On Tue, Sep 22, 2015 at 10:07 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Thanks for the announcement.

To be clear, is this announcement to cease support for the old v1
brokers, or is it to eliminate support for the v1 API in the CC? Does the
v1 CC code depend on NATS? None of my custom v1 brokers depend on NATS.

Mike

On Tue, Sep 22, 2015 at 6:01 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

Hello all,

We plan to remove support for v1 service brokers in about 3 months, in
a cf-release following 12/31/2015.
We are working towards removing CF's dependency on NATS and the v1
service brokers are still dependent on NATS.
Please let me know if you have questions/concerns about this timeline.

I'll be working on verifying a set of steps that you can find here [1]
that document how to migrate your service broker from v1 to v2 and what is
required in order to persist user data and will get that posted to the
service broker api docs officially.

-Dieu
CF CAPI PM

[1]
https://docs.google.com/document/d/1Pl1o7mxtn3Iayq2STcMArT1cJsKkvi4Ey1-d3TB_Nhs/edit?usp=sharing




Re: How to deploy a Web application using HTTPs

Juan Antonio Breña Moral <bren at juanantonio.info...>
 

Hi Dieu,

Many thanks for the technical info.

I will take this factor into account and add this restriction during development.

Juan Antonio


Re: Removing support for v1 service brokers

Dieu Cao <dcao@...>
 

Thanks Mike. Totally understandable.

On Wed, Sep 23, 2015 at 9:23 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Thanks Dieu, honestly I was just trying to find an angle to bargain for a
bit more time. :) Three months is generous. But six months would be
glorious. :)

After the CAB call this month we got started converting our brokers over
but our migration is more difficult because we use Service instance
credentials quite a bit and those don't appear to be handled well when
doing "migrate-service-instances". I think we can do 3 months but we'll be
putting our users through a bit of a fire drill.

That said, I'll understand if you stick to 3 months since we should have
started this conversion long ago.

Mike

On Wed, Sep 23, 2015 at 1:22 AM, Dieu Cao <dcao(a)pivotal.io> wrote:

We've found NATS to be unstable under certain conditions (temporary
network interruptions or network instability), particularly around the
client reconnection logic.
We've seen that it could take anywhere from a few seconds to half an hour
to reconnect properly. We spent a fair amount of time investigating ways to
improve the reconnection logic and have made some improvements but believe
that it's best to work towards not having this dependency.
You can find more about this in the stories in this epic [1].

Mike, in addition to removing the NATS dependency, this will remove the
burden on the team, almost a weekly fight, in terms of maintaining
backwards compatibility for the v1 broker spec any time we work on adding
functionality to the service broker api.
I'll work with the team in the next couple of weeks on specific stories
and I'll link to it here.

[1] https://www.pivotaltracker.com/epic/show/1440790


On Tue, Sep 22, 2015 at 10:07 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Thanks for the announcement.

To be clear, is this announcement to cease support for the old v1 brokers,
or is it to eliminate support for the v1 API in the CC? Does the v1 CC
code depend on NATS? None of my custom v1 brokers depend on NATS.

Mike

On Tue, Sep 22, 2015 at 6:01 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

Hello all,

We plan to remove support for v1 service brokers in about 3 months, in
a cf-release following 12/31/2015.
We are working towards removing CF's dependency on NATS and the v1
service brokers are still dependent on NATS.
Please let me know if you have questions/concerns about this timeline.

I'll be working on verifying a set of steps that you can find here [1]
that document how to migrate your service broker from v1 to v2 and what is
required in order to persist user data and will get that posted to the
service broker api docs officially.

-Dieu
CF CAPI PM

[1]
https://docs.google.com/document/d/1Pl1o7mxtn3Iayq2STcMArT1cJsKkvi4Ey1-d3TB_Nhs/edit?usp=sharing


