Re: Update on Mailman 3 launch

Eric Searcy <eric@...>
 

We have an open upstream bug about the issue that is delaying mail (including
messages you post via the web interface sometimes not showing up). I hope this will have
a proper resolution by the end of the week, and in the meantime we will be monitoring for
this and “unsticking” any messages that get stuck, so if you don’t see any error,
your message *will* get posted.
(https://gitlab.com/mailman/mailman/issues/138)
We have a workaround in place for the above issue (hack code to disable connection pooling) and it seems to have addressed this problem until we can get a better fix.

We've also just filed a new bug related to digest delivery mode affecting 3 or 4 list members; however, we haven't yet determined the exact nature of the impact. Hopefully a quick fix will follow as soon as we learn more about the bug.
(https://gitlab.com/mailman/mailman/issues/141)

Eric


Re: Running Docker private images on CF

dharmi
 

Thanks for the details.
I deployed diego-docker-cache-release and can now run private Docker
images.

One note, however: I had to modify *property-overrides.yml* to add the
<IP>:<port> of the *docker-cache/0* job to
*insecure_docker_registry_list* for it to work. Without that, garden-linux
logs:
{"timestamp":"1439701925.514369965","source":"garden-linux","message":"garden-linux.pool.umojd9q7s54.provide-rootfs-failed","log_level":2,"data":{"error":"repository_fetcher:
ProvideRegistry: could not fetch image f93137f1-.. from registry
10.250.21.80:8080: Registry 10.250.21.80:8080 is missing from
-insecureDockerRegistryList
([docker-registry.service.cf.internal:8080])","session":"2.13"}}

I suspect Consul discovery is at fault; if not, please suggest what else it could be.

Another observation, on the Docker registry URI used when running private
Docker images (*not a Diego issue, I guess*): by default (*when I don't set
docker_login_server*), the images appear to be pulled using the V1 API.

$ cf start myapp
Starting app myapp in org myorg / space default as user...
Creating container
Successfully created container
Staging...
Docker daemon running
Staging process started ...
Caching docker image ...
*Logging to https://index.docker.io/v1/ ...*
WARNING: login credentials saved in /root/.dockercfg.
Login Succeeded
Logged in.
Pulling docker image <dockerid>/<image>:latest ...
latest: Pulling from <dockerid>/image
511136ea3c5a: Pulling fs layer
30d39e59ffe2: Pulling fs layer
c90d655b99b2: Pulling fs layer
…..

When I explicitly set the V2 URI, which is *registry.hub.docker.com*
(*correct me if I am wrong*), pulling the image fails.

$ cf start myapp
Starting app myapp in org myorg / space default as user...
Creating container
Successfully created container
Staging...
Docker daemon running
Staging process started ...
Caching docker image ...
*Logging to https://registry.hub.docker.com/ ...*
WARNING: login credentials saved in /root/.dockercfg.
*Login Succeeded*
Logged in.
Pulling docker image <dockerid>/<image>:latest ...
time="2015-08-19T19:59:44Z" level=error msg=*"Error from V2 registry:
Authentication is required."*
Pulling repository <dockerid>/<image>
Error: image <dockerid>/<image>:latest ... not found

Thanks

On Tue, Aug 11, 2015 at 6:45 PM, Eric Malm <emalm(a)pivotal.io> wrote:

Hi, Dharmi,

In order to run private docker images (that is, ones that require
user/password/email authentication with the registry), you'll have to stage
them into the optional diego-docker-cache deployed alongside Diego. The
BOSH release is located at
https://github.com/cloudfoundry-incubator/diego-docker-cache-release. If
you've already deployed Diego using the spiff-based manifest-generation
templates in diego-release, the deployment for this release should be
similar. If you deploy the caching registry release without TLS enabled, or
with TLS enabled but using a self-signed certificate, Diego should then be configured
with the URL "docker-registry.service.cf.internal:8080" supplied in the
diego.garden-linux.insecure_docker_registry_list property, and
diego.stager.insecure_docker_registry set to 'true', as you can see in
https://github.com/cloudfoundry-incubator/diego-docker-cache-release/blob/develop/stubs-for-diego-release/bosh-lite-property-overrides.yml
.

Once that release is deployed, you can follow the instructions at
https://github.com/cloudfoundry-incubator/diego-docker-cache-release#caching-docker-image-with-diego
to stage your image into the cache, which should be as simple as setting
the DIEGO_DOCKER_CACHE env var to 'true' on your app before staging it.
When you start the app, Diego will then instruct Garden to pull the image
from the internal caching registry rather than from the remote registry you
staged it from. This has the added benefit of ensuring that you're always
running exactly the Docker image you staged, rather than something that may
have changed in the remote registry.

Thanks,
Eric, CF Runtime Diego PM

On Tue, Aug 11, 2015 at 9:32 AM, dharmi <dharmi(a)gmail.com> wrote:

We have CF v214 with Diego deployed on AWS.

I am able to successfully create apps from a public Docker repo, as per the
apidocs <http://apidocs.cloudfoundry.org/214/apps/creating_an_app.html>,
but when creating apps from private Docker repos, I see the error below
in 'cf logs' when starting the app.

[API/0] OUT Updated app with guid bcb8f363-xyz
({"route"=>"5af6948b-xyz"})
[API/0] OUT Updated app with guid bcb8f363-xyz ({"state"=>"STARTED"})
[STG/0] OUT Creating container
[STG/0] OUT Successfully created container
[STG/0] OUT Staging...
[STG/0] OUT Staging process started ...
[STG/0] ERR Staging process failed: Exit trace for group:
[STG/0] ERR builder exited with error: failed to fetch metadata from
[:dockerid/go-app] with tag [latest] and insecure registries [] due to
HTTP
code: 404
[STG/0] OUT Exit status 2
[STG/0] ERR Staging Failed: Exited with status 2
[API/0] ERR Failed to stage application: staging failed


cf curl command for reference.

cf curl /v2/apps -X POST -H "Content-Type: application/json" -H
"Authorization: bearer *accessToken*" -d '
{"name": "myapp",
"space_guid": "71b22eba-xyz",
"docker_image": ":dockerid/go-app",
"diego": true,
"docker_credentials_json":
{"docker_login_server": "https://index.docker.io/v1/",
"docker_user": ":dockerid",
"docker_password": ":dockerpwd",
"docker_email": ":email"
}
}'

Looking at the apidocs, the 'Example value' for 'docker_credentials_json'
indicates a Hash value
(#<RspecApiDocumentation::Views::HtmlExample:0x0000000bb883e0>), but
looking
inside the code, we found the below JSON format.

let(:docker_credentials) do
  {
    docker_login_server: login_server,
    docker_user: user,
    docker_password: password,
    docker_email: email
  }
end

Pls correct me if I am missing something.

Thanks,
Dharmi



--
View this message in context:
http://cf-dev.70369.x6.nabble.com/Running-Docker-private-images-on-CF-tp1148.html
Sent from the CF Dev mailing list archive at Nabble.com.
--
“Wise people learn when they can. Fools learn when they must.” - The Duke of
Ellington


Re: I'm getting different x_forwarded_for in my Gorouter access logs depending on what browser/cli-tool I use.

Dieu Cao <dcao@...>
 

What do you use for your edge load balancer?
You could try to set up your edge load balancer to strip x_forwarded_for
sent in the request and set it again before forwarding on.
In that way, I believe it should be consistent, whatever the client.

-Dieu

On Wed, Aug 19, 2015 at 7:38 AM, Simon Johansson <simon(a)simonjohansson.com>
wrote:

Gist of diff:
https://gist.github.com/simonjohansson/847e972f1459ea4cd65e

Gist of debug output:
https://gist.github.com/simonjohansson/3324b08fc42e5ac7105a

Since it's quite hard to read in the email...


Re: Security Question --- Securely wipe data on warden container removal / destruction???

Will Pragnell <wpragnell@...>
 

In the Docker image case, the filesystem layer specific to the container is
also deleted immediately when the container stops running (this is the same
for buildpack based apps on Diego/Garden). Lower layers in the image (i.e.
the pre-existing docker image, as pulled from the registry) are not
currently removed, even if not used in any other running containers.

In the coming weeks, we'll define and implement a strategy to remove unused
images, but the details aren't decided yet.

On 19 August 2015 at 14:57, James Bayer <jbayer(a)pivotal.io> wrote:

warden/DEAs keep container file systems for a configured amount of time,
something like 1 hr, before removing the containers, i believe with standard
removal tools.

diego cells and garden remove the container file system immediately after
a container is stopped by the user or the system. when using docker images, the
container images are cached in the garden graph directory and i'm not quite
sure of their cleanup / garbage collection life cycle.

On Wed, Aug 19, 2015 at 1:08 AM, Chris K <christopherkugler2(a)yahoo.de>
wrote:

Hi,

I have a few questions regarding the way data is removed when an
application is removed and its corresponding warden container is destroyed.
As the Cloud Foundry instance my company is using may be shared with
multiple tenants, this is a critical question for us.
From Cloud Foundry's GitHub repository I gathered the following
information regarding the destruction process:

"When a container is destroyed -- either per user request, or
automatically after being idle -- Warden first kills all unprivileged
processes running inside the container. These processes first receive a
TERM signal followed by a KILL if they haven't exited after a couple of
seconds. When these processes have terminated, the root of the container's
process tree is sent a KILL . Once all resources the container used have
been released, its files are removed and it is considered destroyed."
(Quote: https://github.com/cloudfoundry/warden/tree/master/warden)

According to this quote all files of the file system are removed before
the resources can be used again. But how are they removed? Are they
securely wiped, meaning all blocks are set to zero (or randomized)? And how
is data removed from RAM before it can be assigned to a new warden
container (i.e. a new application)?

In case the data is not being securely wiped, how much access does an
application have towards the available memory? Is it for example possible
to create files of arbitrary size and read / access them?

I'd be thankful for any kind of hints on this topic.

With Regards,
Chris


--
Thank you,

James Bayer


Re: I'm getting different x_forwarded_for in my Gorouter access logs depending on what browser/cli-tool I use.

Simon Johansson <simon@...>
 

Gist of diff:
https://gist.github.com/simonjohansson/847e972f1459ea4cd65e

Gist of debug output:
https://gist.github.com/simonjohansson/3324b08fc42e5ac7105a

Since it's quite hard to read in the email...


I'm getting different x_forwarded_for in my Gorouter access logs depending on what browser/cli-tool I use.

Simon Johansson <simon@...>
 

Hiya!

I'm looking into an issue where x_forwarded_for has different values depending on which client you use to hit the router.
With curl, w3m, and lynx, x_forwarded_for gets set to "sourceIP, a-gateway-ip",
whereas with Firefox, Chrome, Opera, and wget, x_forwarded_for is simply set to "sourceIP".

This is causing confusion, but most of all it makes me tear my hair out, as I cannot figure out what is going on. From what I can see, the issue is not in Gorouter directly but in the Go standard library.

I have made two patches to figure out what is going on

In the gorouter

--- a/proxy/proxy.go
+++ b/proxy/proxy.go
@@ -2,18 +2,18 @@ package proxy

import (
"errors"
+ "fmt"
"io"
)

const (
@@ -117,6 +117,8 @@ func (p *proxy) lookup(request *http.Request) *route.Pool {
}

func (p *proxy) ServeHTTP(responseWriter http.ResponseWriter, request *http.Request) {
+ fmt.Println("In proxy.ServeHTTP")
+ fmt.Println("Request: ", request)
startedAt := time.Now()

accessLog := access_log.AccessLogRecord{
@@ -207,7 +209,9 @@ func (p *proxy) ServeHTTP(responseWriter http.ResponseWriter, request *http.Requ
},
}

+ fmt.Println("X-Forwarded-For before newReverseProxy.ServeHTTP: ", request.Header.Get("X-Forwarded-For"))
p.newReverseProxy(roundTripper, request).ServeHTTP(proxyWriter, request)
+ fmt.Println("X-Forwarded-For after newReverseProxy.ServeHTTP: ", request.Header.Get("X-Forwarded-For"))

accessLog.FinishedAt = time.Now()
accessLog.BodyBytesSent = proxyWriter.Size()


And in golang/src/net/http/httputil/reverseproxy.go
--- a/src/net/http/httputil/reverseproxy.go
+++ b/src/net/http/httputil/reverseproxy.go
@@ -7,6 +7,7 @@
package httputil

import (
+ "fmt"
"io"
"log"
"net"
@@ -101,6 +102,7 @@ var hopHeaders = []string{
}

func (p *ReverseProxy) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
+ fmt.Println("In net/http/httputil/reverseproxy.go: ServeHTTP")
transport := p.Transport
if transport == nil {
transport = http.DefaultTransport
@@ -132,6 +134,7 @@ func (p *ReverseProxy) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
}
}

+ fmt.Println("X-Forwarded-For in req before 'If we aren't the first proxy retain prior': ", req.Header.Get("X-Forwarded-For"))
if clientIP, _, err := net.SplitHostPort(req.RemoteAddr); err == nil {
// If we aren't the first proxy retain prior
// X-Forwarded-For information as a comma+space
@@ -140,7 +143,9 @@ func (p *ReverseProxy) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
clientIP = strings.Join(prior, ", ") + ", " + clientIP
}
outreq.Header.Set("X-Forwarded-For", clientIP)
+ fmt.Println("X-Forwarded-For in outreq: ", outreq.Header.Get("X-Forwarded-For"))
}
+ fmt.Println("X-Forwarded-For in req after 'If we aren't the first proxy retain prior': ", req.Header.Get("X-Forwarded-For"))

res, err := transport.RoundTrip(outreq)
if err != nil {
@@ -158,6 +163,7 @@ func (p *ReverseProxy) ServeHTTP(rw http.ResponseWriter, req *http.Request) {

rw.WriteHeader(res.StatusCode)
p.copyResponse(rw, res.Body)
+ fmt.Println("Done in net/http/httputil/reverseproxy.go: ServeHTTP")
}


So basically debug printing.

This is the difference I see (first example with 2 ips in the headers, second example with 1 ip in the headers), both requests from the same machine to the same Gorouter.

Logs from gorouter
In proxy.ServeHTTP
Request: &{GET / HTTP/1.1 1 1 map[Accept:[*/*] Forwarded:[for=172.21.27.221; proto=http] X-Forwarded-Proto:[http] X-Forwarded-For:[172.21.27.221] True_client_ip:[172.21.27.221] X-Cf-Requestid:[874ed33a-c361-4946-74c3-c18b693ade7e] User-Agent:[curl/7.43.0]] 0xbb2730 0 [] false cf-env.test.cf.springer-sbm.com map[] map[] <nil> map[] 10.230.31.8:43597 / <nil>}
X-Forwarded-For before newReverseProxy.ServeHTTP: 172.21.27.221
In net/http/httputil/reverseproxy.go: ServeHTTP
X-Forwarded-For in req before 'If we aren't the first proxy retain prior': 172.21.27.221
X-Forwarded-For in outreq: 172.21.27.221, 10.230.31.8
X-Forwarded-For in req after 'If we aren't the first proxy retain prior': 172.21.27.221, 10.230.31.8
Done in net/http/httputil/reverseproxy.go: ServeHTTP
X-Forwarded-For after newReverseProxy.ServeHTTP: 172.21.27.221, 10.230.31.8

$ cf logs cf-env
2015-08-19T15:11:36.22+0200 [RTR/0] OUT cf-env.test.cf.springer-sbm.com - [19/08/2015:13:11:36 +0000] "GET / HTTP/1.1" 200 0 4155 "-" "curl/7.43.0" 10.230.31.8:43597 x_forwarded_for:"172.21.27.221, 10.230.31.8" vcap_request_id:c2ce64d5-5951-46fb-7b0a-9ea233c34823 response_time:0.011068200 app_id:9011c83a-9407-4e0a-ae91-0c66ff3d6e92


Logs from gorouter
In proxy.ServeHTTP
Request: &{GET / HTTP/1.1 1 1 map[True_client_ip:[172.21.27.221] X-Cf-Requestid:[d7629a05-cbae-4f13-63d2-43acf5bfe8c9] User-Agent:[Mozilla/5.0 (X11; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0] Accept-Language:[en-US,en;q=0.5] Accept-Encoding:[gzip, deflate] Forwarded:[for=172.21.27.221; proto=http] X-Forwarded-Proto:[http] X-Forwarded-For:[172.21.27.221] Accept:[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8] Cookie:[wt3_eid=%3B846182373400841%7C2141346926300579527%3B532695141032829%7C2142289007300951926%3B929408468507536%7C2142323421600457694%3B987963572816400%7C2142323579400274413%3B595753140200997%7C2142364572200667778%3B754741632944378%7C2142539934000605010] Connection:[keep-alive]] 0xbb2730 0 [] false cf-env.test.cf.springer-sbm.com map[] map[] <nil> map[] 10.230.31.8:43601 / <nil>}
X-Forwarded-For before newReverseProxy.ServeHTTP: 172.21.27.221
In net/http/httputil/reverseproxy.go: ServeHTTP
X-Forwarded-For in req before 'If we aren't the first proxy retain prior': 172.21.27.221
X-Forwarded-For in outreq: 172.21.27.221, 10.230.31.8
X-Forwarded-For in req after 'If we aren't the first proxy retain prior': 172.21.27.221
Done in net/http/httputil/reverseproxy.go: ServeHTTP
X-Forwarded-For after newReverseProxy.ServeHTTP: 172.21.27.221

$ cf logs cf-env
2015-08-19T15:12:16.28+0200 [RTR/0] OUT cf-env.test.cf.springer-sbm.com - [19/08/2015:13:12:16 +0000] "GET / HTTP/1.1" 200 0 4700 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0" 10.230.31.8:43601 x_forwarded_for:"172.21.27.221" vcap_request_id:f807d34a-9191-4a29-5831-cb78eece72e1 response_time:0.008955044 app_id:9011c83a-9407-4e0a-ae91-0c66ff3d6e92

I use Gorouter 212 source and Golang 1.4.2 source.

What confuses me is that here (https://github.com/golang/go/blob/release-branch.go1.4/src/net/http/httputil/reverseproxy.go#L110) we set outreq to req, and here (https://github.com/golang/go/blob/release-branch.go1.4/src/net/http/httputil/reverseproxy.go#L142) we set outreq's X-Forwarded-For header, which also changes req's X-Forwarded-For header when using curl/w3m/lynx but not when using Firefox/Opera/Chrome/wget.

Anyone ever seen anything similar or anything obvious I have missed?
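
One possible reading of the logs above, offered as an assumption to check against the linked Go 1.4 source and not as a confirmed diagnosis: `*outreq = *req` is a shallow copy, so both requests share one header map, and the map is only copied when a hop-by-hop header such as Connection has to be removed. The Firefox request in the log carries Connection:[keep-alive]; the curl request does not. The sketch below imitates that sequence with a shortened hop-header list. forward and newRequest are hypothetical helpers, not Gorouter or stdlib functions.

```go
package main

import (
	"fmt"
	"net/http"
)

// hopHeaders is a shortened stand-in for the list in reverseproxy.go.
var hopHeaders = []string{"Connection", "Keep-Alive", "Upgrade"}

// forward mimics the assumed Go 1.4 sequence: shallow-copy the request,
// copy the header map only if a hop-by-hop header must be removed, then
// append clientIP to X-Forwarded-For on the outgoing request.
func forward(req *http.Request, clientIP string) *http.Request {
	outreq := new(http.Request)
	*outreq = *req // shallow copy: outreq.Header and req.Header are the same map

	copied := false
	for _, h := range hopHeaders {
		if outreq.Header.Get(h) != "" {
			if !copied {
				// Lazy copy: from here on outreq has its own header map.
				outreq.Header = make(http.Header)
				for k, vv := range req.Header {
					for _, v := range vv {
						outreq.Header.Add(k, v)
					}
				}
				copied = true
			}
			outreq.Header.Del(h)
		}
	}

	if prior := outreq.Header.Get("X-Forwarded-For"); prior != "" {
		clientIP = prior + ", " + clientIP
	}
	outreq.Header.Set("X-Forwarded-For", clientIP)
	return outreq
}

// newRequest builds a request that already carries one X-Forwarded-For hop,
// plus any extra headers the simulated client sends.
func newRequest(extra map[string]string) *http.Request {
	req := &http.Request{Header: http.Header{}}
	req.Header.Set("X-Forwarded-For", "172.21.27.221")
	for k, v := range extra {
		req.Header.Set(k, v)
	}
	return req
}

func main() {
	curlLike := newRequest(nil)
	forward(curlLike, "10.230.31.8")
	fmt.Println("no Connection header:  ", curlLike.Header.Get("X-Forwarded-For"))

	browserLike := newRequest(map[string]string{"Connection": "keep-alive"})
	forward(browserLike, "10.230.31.8")
	fmt.Println("with Connection header:", browserLike.Header.Get("X-Forwarded-For"))
}
```

If that reading is right, the first call mutates the caller's request and the second does not, which would match the two log excerpts: the access log is written from req after the proxy call returns.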


Re: Security Question --- Securely wipe data on warden container removal / destruction???

James Bayer
 

warden/DEAs keep container file systems for a configured amount of time,
something like 1 hr, before removing the containers, i believe with standard
removal tools.

diego cells and garden remove the container file system immediately after
a container is stopped by the user or the system. when using docker images, the
container images are cached in the garden graph directory and i'm not quite
sure of their cleanup / garbage collection life cycle.

On Wed, Aug 19, 2015 at 1:08 AM, Chris K <christopherkugler2(a)yahoo.de>
wrote:

Hi,

I have a few questions regarding the way data is removed when an
application is removed and its corresponding warden container is destroyed.
As the Cloud Foundry instance my company is using may be shared with
multiple tenants, this is a critical question for us.
From Cloud Foundry's GitHub repository I gathered the following
information regarding the destruction process:

"When a container is destroyed -- either per user request, or
automatically after being idle -- Warden first kills all unprivileged
processes running inside the container. These processes first receive a
TERM signal followed by a KILL if they haven't exited after a couple of
seconds. When these processes have terminated, the root of the container's
process tree is sent a KILL . Once all resources the container used have
been released, its files are removed and it is considered destroyed."
(Quote: https://github.com/cloudfoundry/warden/tree/master/warden)

According to this quote all files of the file system are removed before
the resources can be used again. But how are they removed? Are they
securely wiped, meaning all blocks are set to zero (or randomized)? And how
is data removed from RAM before it can be assigned to a new warden
container (i.e. a new application)?

In case the data is not being securely wiped, how much access does an
application have towards the available memory? Is it for example possible
to create files of arbitrary size and read / access them?

I'd be thankful for any kind of hints on this topic.

With Regards,
Chris


--
Thank you,

James Bayer


Re: More reliable way to collect user application logs

James Bayer
 

use the syslog nozzle to get system component logs and send them to your
syslog server of choice.

https://github.com/cloudfoundry-community/firehose-to-syslog

On Wed, Aug 19, 2015 at 12:59 AM, ronak banka <ronakbanka.cse(a)gmail.com>
wrote:

same query for the component logs



--
View this message in context:
http://cf-dev.70369.x6.nabble.com/cf-dev-More-reliable-way-to-collect-user-application-logs-tp1214p1273.html
Sent from the CF Dev mailing list archive at Nabble.com.


--
Thank you,

James Bayer


Re: CF integration with logging and monitoring tool

James Bayer
 

what is your logging and metrics tool? does it have an api?

the loggregator nozzles are the go-forward approach to tapping into logs
and metrics in the entire cf system and sending them somewhere.

On Wed, Aug 12, 2015 at 4:14 AM, Swatz bosh <swatzron(a)gmail.com> wrote:

I would like to know the recommended approach for integrating CF with
third-party logging and monitoring tools like Graphite, Nagios, etc. I found
a few articles that say the firehose is the better option, whereas I
remember that pointing the collector (stats_z1/z2) job at such a monitoring
server works well. What steps do I need to follow to integrate such an
application monitoring tool using the firehose? Do I need to write a nozzle
for my monitoring tool, like CloudCredo did for Graphite?
https://github.com/CloudCredo/graphite-nozzle
http://www.cloudcredo.com/how-to-integrate-graphite-with-cloud-foundry/
So if I have to integrate with Nagios, Wily, Splunk, etc., would I need a
nozzle for each of them? If not, what changes do I need to make in my
configuration and in the buildpack (not sure)?

I also found the NOAA client, https://github.com/cloudfoundry/noaa, which I
think consumes all logs from doppler and shows them on the console. How this
NOAA client uses the firehose is not very clear from the documentation.
--
Thank you,

James Bayer


Re: Running Docker private images on CF

James Bayer
 

i don't believe the current docker support includes support for private
docker images on a docker registry that require authentication. that does
not seem to be listed in the diego docker docs [1] yet, but we should make
that explicitly clear.

[1]
https://github.com/cloudfoundry-incubator/diego-design-notes/blob/master/docker-support.md

On Mon, Aug 10, 2015 at 3:34 PM, Dharmendra Sarkar <dharmi(a)gmail.com> wrote:


We have CF v214 with Diego deployed on AWS.

I am able to successfully create apps from a public Docker repo, as per the
apidocs, but when creating apps from private Docker repos, I see the
error below in 'cf logs' when starting the app. I'd appreciate any pointers.

[API/0] OUT Updated app with guid bcb8f363-xyz
({"route"=>"5af6948b-xyz"})
[API/0] OUT Updated app with guid bcb8f363-xyz ({"state"=>"STARTED"})
[STG/0] OUT Creating container
[STG/0] OUT Successfully created container
[STG/0] OUT Staging...
[STG/0] OUT Staging process started ...
[STG/0] ERR Staging process failed: Exit trace for group:
[STG/0] ERR builder exited with error: failed to fetch metadata from
[adobecloud/go-app] with tag [latest] and insecure registries [] due to
HTTP code: 404
[STG/0] OUT Exit status 2
[STG/0] ERR Staging Failed: Exited with status 2
[API/0] ERR Failed to stage application: staging failed


cf curl command for reference.

cf curl /v2/apps -X POST -H "Content-Type: application/json" -H
"Authorization: bearer *accessToken*" -d '
{"name": "myapp",
"space_guid": "71b22eba-xyz",
"docker_image": "adobecloud/go-app",
"diego": true,
"docker_credentials_json":
{"docker_login_server": "https://index.docker.io/v1/",
"docker_user": ":dockerid",
"docker_password": ":dockerpwd",
"docker_email": ":email"
}
}'

Looking at the apidocs, the 'Example value' for 'docker_credentials_json'
indicates a Hash value
(#<RspecApiDocumentation::Views::HtmlExample:0x0000000bb883e0>), but
looking inside the code, I found the JSON format below.

let(:docker_credentials) do
  {
    docker_login_server: login_server,
    docker_user: user,
    docker_password: password,
    docker_email: email
  }
end

Pls correct me if I am missing something.

Thanks,
Dharmi
--
Thank you,

James Bayer


Re: Notifications for service provisioning

James Bayer
 

there is no standard notification mechanism built into cf today for this
kind of information. as an administrator, you could potentially build and
deploy a loggregator firehose nozzle that looks for this type of
information and creates notifications based on that content.

the service lifecycle events are tracked and stored in the cloud controller
database and accessible via the api and cli.

for example:
$ cf curl "/v2/events?q=type:audit.service_instance.create"

see this and similar lifecycle events for services:
http://apidocs.cloudfoundry.org/215/events/list_service_instance_create_events.html
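
until something more proactive exists, polling that events endpoint is one option. here is a minimal Go sketch of the decoding side, assuming the usual v2 response layout (a "resources" array whose items have an "entity" with "type", "actee_name" and "timestamp"). the sample payload is hand-written for illustration and is not real API output, and createdInstances is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// event covers only the fields this sketch needs; the field layout is an
// assumption based on the v2 events API docs linked above.
type event struct {
	Entity struct {
		Type      string `json:"type"`
		ActeeName string `json:"actee_name"`
		Timestamp string `json:"timestamp"`
	} `json:"entity"`
}

// eventPage is one page of a v2 list response.
type eventPage struct {
	Resources []event `json:"resources"`
}

// createdInstances (hypothetical helper) returns the names of the service
// instances that appear in create events on the given page.
func createdInstances(payload []byte) ([]string, error) {
	var page eventPage
	if err := json.Unmarshal(payload, &page); err != nil {
		return nil, err
	}
	var names []string
	for _, e := range page.Resources {
		if e.Entity.Type == "audit.service_instance.create" {
			names = append(names, e.Entity.ActeeName)
		}
	}
	return names, nil
}

// sample is an illustrative, hand-written payload with placeholder values.
const sample = `{"resources":[{"entity":{"type":"audit.service_instance.create","actee_name":"example-db","timestamp":"2015-08-19T00:00:00Z"}}]}`

func main() {
	names, err := createdInstances([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Println("service instance created:", n)
	}
}
```

a real poller would feed the output of the cf curl call above into createdInstances on a schedule and remember the last timestamp it has already reported.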

there have been recent threads on the list about having a more proactive
notification system component, but that is still in the formative stages of
discussion.

On Wed, Aug 12, 2015 at 11:25 AM, Vineet Banga <vineetbanga1(a)gmail.com>
wrote:

Is there any notification mechanism available in CF to listen on service
broker create/update/delete calls? We are implementing multiple services
exposed via Service Broker in the marketplace and would like to take
certain common actions when services are being provisioned.

Vineet


--
Thank you,

James Bayer


Re: Bizarre DEA + Spring Behaviour

James Bayer
 

here is some guidance on how to check for available entropy on a linux host
[1]. i'm not sure if the bosh agent, DEA or diego cell captures this metric
or not, but we should certainly look into it. when you're inside a
container, you can check for available entropy with the "cf ssh" command
that is supported now with diego (or the app itself could log it before
startup). see an example of this command running on pivotal's hosted diego
[2]. values lower than 200 while you're trying to do operations that need
entropy can cause a problem.

[1] https://major.io/2007/07/01/check-available-entropy-in-linux/
[2] $ cf ssh MYAPP
vcap(a)uqj9t0vqu9l:~$ cat /proc/sys/kernel/random/entropy_avail; date;
855
Wed Aug 19 13:32:05 UTC 2015
vcap(a)uqj9t0vqu9l:~$ cat /proc/sys/kernel/random/entropy_avail; date;
866
Wed Aug 19 13:32:07 UTC 2015
vcap(a)uqj9t0vqu9l:~$ cat /proc/sys/kernel/random/entropy_avail; date;
876
Wed Aug 19 13:32:08 UTC 2015
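
for the "app itself could log it before startup" option, a minimal Go sketch might look like the following. it reads the same /proc file as the command above and flags values under 200. the 200 threshold is the rough figure mentioned in this message, not a hard limit, and parseEntropy and isLow are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// entropyPath is the Linux kernel's estimate of available entropy, in bits.
const entropyPath = "/proc/sys/kernel/random/entropy_avail"

// lowWatermark is the rough threshold from the message above (an assumption;
// tune it for your workload).
const lowWatermark = 200

// parseEntropy converts the file contents (for example "855\n") to an int.
func parseEntropy(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// isLow reports whether the entropy estimate is below the watermark.
func isLow(bits int) bool {
	return bits < lowWatermark
}

func main() {
	data, err := os.ReadFile(entropyPath)
	if err != nil {
		// Not on Linux, or /proc is not readable inside this container.
		fmt.Fprintln(os.Stderr, "cannot read entropy_avail:", err)
		return
	}
	bits, err := parseEntropy(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, "unexpected entropy_avail contents:", err)
		return
	}
	fmt.Printf("entropy_avail=%d low=%t\n", bits, isLow(bits))
}
```

logging this once at startup would at least show whether a slow start coincided with a low entropy estimate.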

On Wed, Aug 19, 2015 at 5:06 AM, Daniel Mikusa <dmikusa(a)pivotal.io> wrote:

I've seen this happen to a good number of apps running on PWS, so it's
something you can encounter when running CF on AWS as well.

What usually happens is that the application takes significantly longer to
start, sometimes to the point where it fails to start quickly enough and CF
marks it as crashed. I haven't seen it cause any NPEs though. My
understanding is that the JVM will just block until it gets the entropy it
needs.

Dan

On Wed, Aug 19, 2015 at 3:32 AM, Johannes Hiemer <jvhiemer(a)gmail.com>
wrote:

Daniel,
I have had a problem with the deployment of Spring applications on
Openstack recently as well. I am also not sure, without seeing the logs,
what could be the reason, but did you try:
http://www.evoila.de/vsphere/java-applications-not-starting-on-openstack-based-cloud-foundry-deployment/?lang=en


Regards,
Johannes

On Wed, Aug 19, 2015 at 9:28 AM, Daniel Jones <
daniel.jones(a)engineerbetter.com> wrote:

Thanks for the input, that's a good call. A colleague of mine (who is
currently on vacation) did look at that... Sadly he's not around to ask
what he tested.

On Wed, Aug 19, 2015 at 7:06 AM, Guillaume Berche <bercheg(a)gmail.com>
wrote:

Some other "state" on the DEA host, such as a shortage on /dev/random
(that went away with VM reconstruction but not with a DEA job restart)?

Guillaume.
On 12 Aug 2015 at 14:14, "Daniel Mikusa" <dmikusa(a)pivotal.io> wrote:

It seems like you were pretty thorough. I can't think of anything
that would be different or that could cause symptoms like this, although I
could be overlooking something as well. Without logs / app to try and
replicate I'm not sure I can help much more. Sorry.

Perhaps someone else on the list has some thoughts?

Dan


On Wed, Aug 12, 2015 at 3:25 AM, Daniel Jones <
daniel.jones(a)engineerbetter.com> wrote:

Hi Dan,

Thanks for taking the time to reply.

I didn't include too much in the way of detail, as I was thinking
that there must be a moving part in the equation I'm blind to, in which
case that's a gap in my knowledge that I ought to fill in.

As we did `bosh recreate` on all the VMs, which fixed it, I can't go
back and fetch logs, unfortunately. There's no chance of being able to
create a test case as I'm on client's time, so consider this a thought
exercise :)

The app was Spring Boot 1.2.3, pulling in Spring Boot JDBC and Spring
LDAP. Root FS was cflinuxfs2, and the Java buildpack logged the same for
both. On some failing DEAs there were no other apps, on others there were -
it didn't seem to be a factor. All DEAs had plenty of disk space.

I was wondering if there was a race condition, but I assumed Spring
contexts start single-threadedly. Do you know if that's a correct
assumption?

Do you know if there are any *things* that could have been different
between the DEAs that I didn't account for? Ie another moving part that's
*not* either release, job, stemcell, droplet, root FS, app
environment?

On Tue, Aug 11, 2015 at 12:32 PM, Daniel Mikusa <dmikusa(a)pivotal.io>
wrote:

On Tue, Aug 11, 2015 at 5:15 AM, Daniel Jones <
daniel.jones(a)engineerbetter.com> wrote:

Hi all,

I've witnessed behaviour caused by the combination of a DEA and a
Spring application that I can't explain. If you like a good mystery or you
happen to know a lot about Java proxies and DEA transient state, please
read on!

A particular Spring app

Version of Spring? What parts of Spring are you pulling into the
app?


was crashing only on specific DEAs in a Cloud Foundry.
Ever try bumping up the log level for Spring when you were getting
the problem? If so, did the problem still occur? Were you able to capture
the logs?



All DEAs were from the same CF release (PCF ERT 1.5.2)
All DEAs were up-to-date according to BOSH (ie no outstanding
changes waiting to be applied)
All DEAs were deployed with identical BOSH job config
All Warden containers were using the same root FS
lucid64 or cflinuxfs2? or didn't matter?


The droplet was the same across all DEAs
The droplet version was the same
The droplet tarballs all had the same MD5 checksum
What was the output of the Java build pack when the droplet was
created? or better yet, run `cf files <app> app/.java-buildpack.log` and
include the output.


Warden was providing the exact same env and start command to all
containers
I saw the same behaviour repeat itself across 5 completely separate
Cloud Foundry installations

The crash was Spring not being able to autowire a bean, where it
was referenced by implementation rather than interface (yes, I know, but it
was not my code!).

Any chance you could include logs from the crash? Was there an
exception / stacktrace generated? Alternatively, have you been able to
create a simple test app that replicates the behavior?


There was some Javassist/CGLIB action going on, creating proxies
for the sake of transaction management.

Rebooting the troublesome DEAs did not fix the problem.

Doing a `bosh recreate` did reliably fix the problem.

Alternatively, changing the Spring code to wire by interface also
reliably fixed the problem.

I can't understand why different DEA instances, from the same BOSH
release, with the same config, on the same stemcell, running the same
version of Warden, with the same droplet, and the same root FS, and the
same env, and the same start command, yielded different behaviour. I'm even
further confused as to why a `bosh recreate` changed that behaviour. What
could possibly have changed? Something on ephemeral disk? But what else is
there on ephemeral disk that could have mattered and was likely to have
changed?

How much was on the disk? Was it getting full? How many other apps
were running on that DEA (before vs after)?


Do CGLIB/Javassist have some native dependencies that weren't in
sync between DEAs?

Anyone with a convincing explanation (that does not involve voodoo)
will receive one free beer and a high-five at the next CF Summit!
Wild guess, race condition in the code somewhere?

Dan


--
Regards,

Daniel Jones
EngineerBetter.com

--
Regards,

Daniel Jones
EngineerBetter.com


--
Kind regards

Johannes Hiemer
--
Thank you,

James Bayer


Re: Bizarre DEA + Spring Behaviour

Daniel Mikusa
 

I've seen this happen to a good number of apps running on PWS, so it's
something you can encounter when running CF on AWS as well.

What usually happens is that the application takes significantly longer to
start, sometimes to the point where it fails to start quickly enough and CF
marks it as crashed. I haven't seen it cause any NPEs though. My
understanding is that the JVM will just block until it gets the entropy it
needs.
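
A minimal sketch for checking the entropy theory on a suspect host, assuming a Linux VM that exposes the kernel's estimate at /proc/sys/kernel/random/entropy_avail. The function names and the 200-bit threshold are illustrative assumptions, not values from this thread or from Cloud Foundry.

```python
from pathlib import Path

# Linux publishes its entropy estimate (in bits) in this file.
ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy(text):
    """Parse the entropy estimate from the file's contents."""
    return int(text.strip())


def read_entropy_avail(path=ENTROPY_FILE):
    """Return the current estimate, or None if the file cannot be read
    (for example on a non-Linux host)."""
    try:
        return parse_entropy(path.read_text())
    except (FileNotFoundError, PermissionError):
        return None


def looks_starved(bits, threshold=200):
    """True when an estimate is available and is below the threshold.

    The threshold is an arbitrary assumption chosen for illustration.
    """
    return bits is not None and bits < threshold


if __name__ == "__main__":
    bits = read_entropy_avail()
    if bits is None:
        print("entropy estimate not available on this host")
    else:
        print(f"entropy_avail={bits} starved={looks_starved(bits)}")
```

Running this on a failing DEA and on a healthy one would show whether the two differ. A commonly used workaround for JVM apps is setting the `java.security.egd` system property to `file:/dev/./urandom`; whether that would have helped in this case is unconfirmed.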

Dan

On Wed, Aug 19, 2015 at 3:32 AM, Johannes Hiemer <jvhiemer(a)gmail.com> wrote:

Daniel,
I have had a problem with the deployment of Spring applications on
Openstack recently as well. I am also not sure, without seeing the logs,
what could be the reason, but did you try:
http://www.evoila.de/vsphere/java-applications-not-starting-on-openstack-based-cloud-foundry-deployment/?lang=en


Regards,
Johannes

On Wed, Aug 19, 2015 at 9:28 AM, Daniel Jones <
daniel.jones(a)engineerbetter.com> wrote:

Thanks for the input, that's a good call. A colleague of mine (who is
currently on vacation) did look at that... Sadly he's not around to ask
what he tested.

On Wed, Aug 19, 2015 at 7:06 AM, Guillaume Berche <bercheg(a)gmail.com>
wrote:

Some other "state" on the DEA host, such as a shortage of entropy in /dev/random
(that went away with VM reconstruction but not with a DEA job restart)?

Guillaume.
On 12 Aug 2015 at 14:14, "Daniel Mikusa" <dmikusa(a)pivotal.io> wrote:

It seems like you were pretty thorough. I can't think of anything that
would be different or that could cause symptoms like this, although I could
be overlooking something as well. Without logs / app to try and replicate
I'm not sure I can help much more. Sorry.

Perhaps someone else on the list has some thoughts?

Dan


On Wed, Aug 12, 2015 at 3:25 AM, Daniel Jones <
daniel.jones(a)engineerbetter.com> wrote:

Hi Dan,

Thanks for taking the time to reply.

I didn't include too much in the way of detail, as I was thinking that
there must be a moving part in the equation I'm blind to, in which case
that's a gap in my knowledge that I ought to fill in.

As we did `bosh recreate` on all the VMs, which fixed it, I can't go
back and fetch logs unfortunately. There's no chance of being able to
create a test case as I'm on client's time, so consider this a thought
exercise :)

The app was Spring Boot 1.2.3, pulling in Spring Boot JDBC and Spring
LDAP. Root FS was cflinuxfs2, and the Java buildpack logged the same for
both. On some failing DEAs there were no other apps, on others there were -
it didn't seem to be a factor. All DEAs had plenty of disk space.

I was wondering if there was a race condition, but I assumed Spring
contexts start single-threadedly. Do you know if that's a correct
assumption?

Do you know if there are any *things* that could have been different
between the DEAs that I didn't account for? I.e. another moving part that's
*not* either release, job, stemcell, droplet, root FS, app
environment?

On Tue, Aug 11, 2015 at 12:32 PM, Daniel Mikusa <dmikusa(a)pivotal.io>
wrote:

On Tue, Aug 11, 2015 at 5:15 AM, Daniel Jones <
daniel.jones(a)engineerbetter.com> wrote:

Hi all,

I've witnessed behaviour caused by the combination of a DEA and a
Spring application that I can't explain. If you like a good mystery or you
happen to know a lot about Java proxies and DEA transient state, please
read on!

A particular Spring app

Version of Spring? What parts of Spring are you pulling into the app?


was crashing only on specific DEAs in a Cloud Foundry.
Ever try bumping up the log level for Spring when you were getting
the problem? If so, did the problem still occur? Were you able to capture
the logs?



All DEAs were from the same CF release (PCF ERT 1.5.2)
All DEAs were up-to-date according to BOSH (ie no outstanding
changes waiting to be applied)
All DEAs were deployed with identical BOSH job config
All Warden containers were using the same root FS
lucid64 or cflinuxfs2? or didn't matter?


The droplet was the same across all DEAs
The droplet version was the same
The droplet tarballs all had the same MD5 checksum
What was the output of the Java buildpack when the droplet was
created? Or better yet, run `cf files <app> app/.java-buildpack.log` and
include the output.


Warden was providing the exact same env and start command to all
containers
I saw the same behaviour repeat itself across 5 completely separate
Cloud Foundry installations

The crash was Spring not being able to autowire a bean, where it was
referenced by implementation rather than interface (yes, I know, but it was
not my code!).

Any chance you could include logs from the crash? Was there an
exception / stacktrace generated? Alternatively, have you been able to
create a simple test app that replicates the behavior?


There was some Javassist/CGLIB action going on, creating proxies for
the sake of transaction management.

Rebooting the troublesome DEAs did not fix the problem.

Doing a `bosh recreate` did reliably fix the problem.

Alternatively, changing the Spring code to wire by interface also
reliably fixed the problem.

I can't understand why different DEA instances, from the same BOSH
release, with the same config, on the same stemcell, running the same
version of Warden, with the same droplet, and the same root FS, and the
same env, and the same start command, yielded different behaviour. I'm even
further confused as to why a `bosh recreate` changed that behaviour. What
could possibly have changed? Something on ephemeral disk? But what else is
there on ephemeral disk that could have mattered and was likely to have
changed?

How much was on the disk? Was it getting full? How many other apps
were running on that DEA (before vs after)?


Do CGLIB/Javassist have some native dependencies that weren't in
sync between DEAs?

Anyone with a convincing explanation (that does not involve voodoo)
will receive one free beer and a high-five at the next CF Summit!
Wild guess, race condition in the code somewhere?

Dan


--
Regards,

Daniel Jones
EngineerBetter.com

--
Regards,

Daniel Jones
EngineerBetter.com


--
Kind regards

Johannes Hiemer


Re: Security group rules to allow HTTP communication between 2 apps deployed on CF

Daniel Mikusa
 

On Sat, Aug 8, 2015 at 2:33 AM, Ahmad Ferdous Bin Alam <
ahmadferdous(a)gmail.com> wrote:

Hi,

I have deployed two node.js (express) applications - App1 and App2 - on a
local CF instance. App2 consumes a service exposed (REST API) by App1. When
App2 receives a request, it needs to communicate with App1. It all worked
fine when I tested. Once they were deployed on CF, it didn't work.

It turned out that App2 got the error 'connect ECONNREFUSED'.

How are you trying to connect to App1 from App2? If you access App2's URL,
it should work? i.e. app-2.your-cf-domain.com


I thought it might be a security group rule issue that prevented outbound
traffic to App1. So I added a security group allowing all outgoing traffic.
But it didn't help. Now I think it may have to do with an inbound traffic rule.

For inbound traffic, the restriction is HTTP, HTTPS & WebSockets. I don't
believe there are any further restrictions.


I searched for documentation on how inbound traffic rules can be added
but couldn't find any.

My questions are:
1) Is it possible at all to have 2 apps deployed on CF communicating with
each other over HTTP?
Yes. If you deploy App2 and have it send a request to App1, that should
work as long as you use the URL for App1.


2) Is the security group given below correct? Its purpose is to allow all
outgoing traffic.
This is the group I've used to allow everything. What you've entered looks
OK too.

[
{
"destination": "0.0.0.0-255.255.255.255",
"protocol": "all"
}
]

Don't forget to bind the security group to your space or to the running /
staging groups. Also, I think you need to restart or restage your app so
its container gets recreated with the new rules.

3) Is there any way we can add inbound traffic 'allow' rules?
Shouldn't be necessary.
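
The two rule sets discussed above can be generated and sanity-checked before handing them to the cf CLI. This is a minimal sketch: the helper names are hypothetical, and the checks reflect an assumption about what a well-formed rule needs (a protocol, a destination, and ports for tcp/udp), not the Cloud Controller's actual validation.

```python
import json


def allow_all_rule():
    """The single catch-all rule quoted in the reply above."""
    return {"destination": "0.0.0.0-255.255.255.255", "protocol": "all"}


def tcp_udp_rules(destination="0.0.0.0/0", ports="1-65535"):
    """The tcp + udp pair from the original question."""
    return [
        {"protocol": proto, "destination": destination, "ports": ports}
        for proto in ("tcp", "udp")
    ]


def check_rules(rules):
    """Return a list of problems found by a few basic sanity checks.

    Assumed checks only; the platform may accept or reject more than this.
    """
    problems = []
    for index, rule in enumerate(rules):
        for key in ("protocol", "destination"):
            if key not in rule:
                problems.append(f"rule {index}: missing '{key}'")
        if rule.get("protocol") in ("tcp", "udp") and "ports" not in rule:
            problems.append(f"rule {index}: tcp/udp rule without 'ports'")
    return problems


if __name__ == "__main__":
    rules = tcp_udp_rules()
    print(check_rules(rules))            # an empty list means no problems found
    print(json.dumps(rules, indent=2))   # save this output as the rules file
```

With the JSON saved to a file, the usual sequence is `cf create-security-group <name> <rules-file>`, then `cf bind-running-security-group <name>` and `cf bind-staging-security-group <name>` (or `cf bind-security-group <name> <org> <space>`), followed by `cf restart <app>`. The group and file names are placeholders.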

Dan




Please help.

Additional info:
- I have CF locally installed as a Vagrant devbox (host Ubuntu 14.04). I
used NISE installer: https://github.com/yudai/cf_nise_installer
- I added the following security group to allow all outgoing traffic. I
bound it to both staging and running security groups and finally restarted
the apps so that the rules get applied.
[
{
"protocol":"tcp",
"destination":"0.0.0.0/0",
"ports":"1-65535"
},
{
"protocol":"udp",
"destination":"0.0.0.0/0",
"ports":"1-65535"
}
]


Re: no more stdout in app files since upgrade to 214

ramonskie
 

I use cf CLI 6.12.2 (so the latest).



--
View this message in context: http://cf-dev.70369.x6.nabble.com/no-more-stdout-in-app-files-since-upgrade-to-214-tp1197p1275.html
Sent from the CF Dev mailing list archive at Nabble.com.


Security Question --- Securely wipe data on warden container removal / destruction???

Chris K
 

Hi,

I have a few questions regarding the way data is removed when an application is removed and its corresponding Warden container is destroyed. As the Cloud Foundry instance my company is using may be shared by multiple tenants, it is critical for us to get these questions answered.
From Cloud Foundry's GitHub repository I gathered the following information regarding the destruction process:

"When a container is destroyed -- either per user request, or automatically after being idle -- Warden first kills all unprivileged processes running inside the container. These processes first receive a TERM signal followed by a KILL if they haven't exited after a couple of seconds. When these processes have terminated, the root of the container's process tree is sent a KILL. Once all resources the container used have been released, its files are removed and it is considered destroyed." (Quote: https://github.com/cloudfoundry/warden/tree/master/warden)

According to this quote, all files of the file system are removed before the resources can be used again. But how are they removed? Are they securely wiped, meaning all blocks are set to zero (or randomized)? And how is data removed from RAM before it can be assigned to a new Warden container (i.e. a new application)?

In case the data is not being securely wiped, how much access does an application have to the available memory? Is it, for example, possible to create files of arbitrary size and read / access them?
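
To make "securely wiped" concrete, here is a minimal sketch of overwriting a file with zeros before unlinking it. It says nothing about what Warden actually does on container destruction, the function names are hypothetical, and on copy-on-write or journaling filesystems and on SSDs an in-place overwrite may not reach the blocks that originally held the data.

```python
import os


def zero_fill(path, chunk_size=1024 * 1024):
    """Overwrite the file's current contents with zero bytes, in place.

    Returns the number of bytes overwritten.
    """
    size = os.path.getsize(path)
    zeros = b"\0" * chunk_size
    written = 0
    with open(path, "r+b") as handle:
        while written < size:
            count = min(chunk_size, size - written)
            handle.write(zeros[:count])
            written += count
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the zeros to disk
    return written


def zero_and_remove(path, chunk_size=1024 * 1024):
    """Zero the file, then unlink it. Returns the bytes overwritten."""
    written = zero_fill(path, chunk_size)
    os.remove(path)
    return written
```

By contrast, a plain file removal only drops the directory entry and marks the blocks as free, which is why the question of what happens to those blocks afterwards matters for multi-tenant use.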

I'd be thankful for any kind of hints on this topic.

With Regards,
Chris


Re: More reliable way to collect user application logs

Ronak Banka
 

Same question for the component logs.



--
View this message in context: http://cf-dev.70369.x6.nabble.com/cf-dev-More-reliable-way-to-collect-user-application-logs-tp1214p1273.html


Re: Bizarre DEA + Spring Behaviour

Johannes Hiemer <jvhiemer@...>
 

Go for it and let's see if we can document this issue afterwards with some
logs for other people.

--
Kind regards

Johannes Hiemer


Re: Bizarre DEA + Spring Behaviour

Daniel Jones
 

Ooh, that's interesting. Coupled with what Guillaume suggested, I can
imagine that being a problem. We did get a NullPointerException logged by
some Spring Security component where we couldn't figure out what could
possibly be null, so it's conceivable that some nested call to
java.util.Random failed and returned null.

Sadly I don't have the logs any more, but this narrative is convincing
enough to make me think it might have been the problem :)

--
Regards,

Daniel Jones
EngineerBetter.com


Re: Bizarre DEA + Spring Behaviour

Johannes Hiemer <jvhiemer@...>
 

Daniel,
I have had a problem with the deployment of Spring applications on
Openstack recently as well. I am also not sure, without seeing the logs,
what could be the reason, but did you try:
http://www.evoila.de/vsphere/java-applications-not-starting-on-openstack-based-cloud-foundry-deployment/?lang=en


Regards,
Johannes

--
Kind regards

Johannes Hiemer
