
Re: Addressing buildpack size

Daniel Mikusa
 

On Fri, May 8, 2015 at 3:09 PM, Mike Dalessio <mdalessio(a)pivotal.io> wrote:

Hey Dan,


On Tue, May 5, 2015 at 1:33 PM, Daniel Mikusa <dmikusa(a)pivotal.io> wrote:

I'm happy to see the size of the build packs dropping, but I have to ask
why do we bundle the build packs with a fixed set of binaries?

The build packs themselves are very small, it's the binaries that are
huge. It seems like it would make sense to handle them as separate
concerns.

You've nailed it. Yes, it makes a ton of sense to handle binaries as
separate concerns, and we're heading in that direction.

At one point very recently, we started doing some planning around how we
might cache buildpack assets in a structured way (like a blob store) and
seamlessly have everything Just Work™.

The first step towards separating these concerns was to extract the use of
dependencies out of the (generally upstream) buildpack code and into a
buildpack manifest file. Having done that, the dependencies are now
first-class artifacts that can be managed by operators.
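Making the manifest the single source of truth for dependencies is what
lets operators manage them. As an illustrative sketch (field names follow
the buildpack-packager convention; the checksum value below is a
placeholder, not a real digest), a manifest entry looks roughly like:

```yaml
# Illustrative manifest.yml fragment. An operator can repoint "uri" at an
# internal blob store or on-disk cache instead of the public internet.
language: go
dependencies:
  - name: go
    version: 1.4.2
    uri: https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz
    md5: "<checksum of the tarball>"   # placeholder, not a real digest
    cf_stacks:
      - cflinuxfs2
```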

We stopped there, at least for the time being, as it's not terribly clear
how to jam buildpack asset caching into the current API, CC buildpack
model, and staging process (though, again, the manifest is the best first
step, as it enables us to trap network calls and thus redirect them to a
cache either on disk or over the network).

It's also quite possible that the remaining pain will be further
ameliorated by the proposed Diego feature to attach persistent disk (on
which, presumably, the buildpacks and their assets are cached), which means
we're deferring further work until we've got more user feedback and data.

This sounds cool. Can't wait to see what you guys come up with here. I've
been thinking about the subject a bit, but haven't come up with any great
ideas.

The first thought that came to mind was a transparent network proxy, like
Squid, which would just automatically cache the files as they're accessed.
It's nice and simple; nothing about the build pack would need to change to
take advantage of it. But I'm not sure how that would work in a completely
offline environment, as I'm not sure how you'd seed the cache.

Another thought was for the DEA to provide some additional hints to the
build packs about how they could locate binaries. Perhaps a special
environment variable like CF_BP_REPO=http://repo.system.domain/. The build
pack could then take that and use it to generate URLs to its binary
resources. A variation on that would be to check this repo first, and then
fall back to some global external repo if available (i.e. most recent stuff
is on CF_BP_REPO, older stuff needs Internet access to download). Yet
another variation would be for the CF_BP_REPO to start small and grow as
things are requested. For example, if you request a file that doesn't
exist CF_BP_REPO would try to download it from the Internet, cache it and
stream it back to the app.
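That lookup could be sketched in a few lines of shell. Everything here is
hypothetical: CF_BP_REPO is the imagined variable from the paragraph
above, and the URL layout (re-rooting the file name onto the repo) is an
assumption, not an existing CF feature.

```shell
#!/bin/sh
# Hypothetical sketch: prefer a CF-local repo when CF_BP_REPO is set,
# otherwise fall back to the canonical upstream URL.
upstream="https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz"

resolve_url() {
  if [ -n "$CF_BP_REPO" ]; then
    # Re-root the file name onto the local repo, normalizing any
    # trailing slash on CF_BP_REPO.
    echo "${CF_BP_REPO%/}/$(basename "$1")"
  else
    echo "$1"
  fi
}

resolve_url "$upstream"
```

A build pack using this would hit the operator's repo first and only reach
the public internet when CF_BP_REPO is unset.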

Anyway, I'm just thinking out loud now. Thanks for the update!

Dan






I don't want to come off too harsh, but in addition to the size of the
build packs when bundled with binaries, there are some other disadvantages
to doing things this way.

- Binaries and build packs are updated at different rates. Binaries
are usually updated often, to pick up new runtime versions & security
fixes; build packs generally change at a slower pace, as features or
bug fixes for them are needed. Bundling the two together requires an
operator to update the build packs more often, just to get updated
binaries. It's been my experience that users don't update (or forget to
update) build packs, which means they're likely running with older,
possibly insecure runtimes.

- It's difficult to bundle a set of runtime binaries that suits
everyone's needs; different users will update at different rates and will
want different sets of binaries. If build packs and binaries are packaged
together, users will end up needing to find a specific build pack bundle
that contains the runtime they want, or they will need to build their own
custom bundles. If build packs and binaries are handled separately, there
will be more flexibility in what binaries a build pack has available, as an
operator can manage binaries independently. Wayne's post seems to hit on
this point.

- At some point (I think this has already happened with jruby & java),
build packs are going to start having overlapping sets of binaries. If the
binaries are bundled with the build pack, there's no way that build packs
could ever share binaries.

My personal preference would be to see build packs bundled without
binaries and some other solution, which probably merits a separate thread,
for managing the binaries.

I'm curious to hear what others think or if I've missed something and
bundling build packs and binaries is clearly the way to go.

Dan

PS. If this is something that came up in the PMC, I apologize. I
skimmed the notes, but may have missed it.



On Mon, May 4, 2015 at 2:10 PM, Wayne E. Seguin <
wayneeseguin(a)starkandwayne.com> wrote:

Because of very good compatibility between versions (post 1.X) I would
like to make a motion to do the following:

Split the buildpack:

have the default golang buildpack track the latest golang version

Then handle older versions in one of two ways, either:

a) have a large secondary buildpack for older versions

or

b) have multiple, one for each version of golang, users can specify a
specific URL if they care about specific versions.

This would improve space/time considerations for operations. Personally,
I would prefer b) because it supports older go versions out of the box by
design while still keeping each golang buildpack small.

~Wayne

Wayne E. Seguin <wayneeseguin(a)starkandwayne.com>
CTO ; Stark & Wayne, LLC

On May 4, 2015, at 12:40 , Mike Dalessio <mdalessio(a)pivotal.io> wrote:

Hi Wayne,

On Fri, May 1, 2015 at 1:29 PM, Wayne E. Seguin <
wayneeseguin(a)starkandwayne.com> wrote:

What an incredible step in the right direction, Awesome!!!

Out of curiosity, why is the go buildpack still quite so large?
Thanks for asking this question.

Currently we're including the following binary dependencies in
`go-buildpack`:

```
cache $ ls -lSh *_go*
-rw-r--r-- 1 flavorjones flavorjones 60M 2015-05-04 12:36
https___storage.googleapis.com_golang_go1.4.2.linux-amd64.tar.gz
-rw-r--r-- 1 flavorjones flavorjones 60M 2015-05-04 12:36
https___storage.googleapis.com_golang_go1.4.1.linux-amd64.tar.gz
-rw-r--r-- 1 flavorjones flavorjones 54M 2015-05-04 12:36
https___storage.googleapis.com_golang_go1.2.2.linux-amd64.tar.gz
-rw-r--r-- 1 flavorjones flavorjones 54M 2015-05-04 12:36
http___go.googlecode.com_files_go1.2.1.linux-amd64.tar.gz
-rw-r--r-- 1 flavorjones flavorjones 51M 2015-05-04 12:36
https___storage.googleapis.com_golang_go1.3.3.linux-amd64.tar.gz
-rw-r--r-- 1 flavorjones flavorjones 51M 2015-05-04 12:36
https___storage.googleapis.com_golang_go1.3.2.linux-amd64.tar.gz
-rw-r--r-- 1 flavorjones flavorjones 40M 2015-05-04 12:36
http___go.googlecode.com_files_go1.1.2.linux-amd64.tar.gz
-rw-r--r-- 1 flavorjones flavorjones 40M 2015-05-04 12:36
http___go.googlecode.com_files_go1.1.1.linux-amd64.tar.gz
```

One question we should ask, I think, is: should we still be supporting
golang 1.1 and 1.2? Dropping those versions would cut the size of the
buildpack in (approximately) half.
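To put rough numbers on "approximately half", from the `ls -lSh` listing
above: the cached tarballs total about 410 MB, and the go 1.1.x and 1.2.x
archives account for 188 MB of that, i.e. just under half of the cached
archives.

```shell
# Rough check of the "approximately half" estimate, using the sizes
# from the listing above (values in MB).
total=$((60 + 60 + 54 + 54 + 51 + 51 + 40 + 40))   # all cached go tarballs
dropped=$((40 + 40 + 54 + 54))                     # go 1.1.x and 1.2.x
echo "$total $dropped"
```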





On May 1, 2015, at 11:54 , Mike Dalessio <mdalessio(a)pivotal.io> wrote:

Skinny buildpacks have been cut for go, nodejs, php, python and ruby
buildpacks.

| | current | previous |
|--------+---------+----------|
| go | 442MB | 633MB |
| nodejs | 69MB | 417MB |
| php | 804MB | 1.1GB |
| python | 454MB | 654MB |
| ruby | 365MB | 1.3GB |
|--------+---------+----------|
| total | 2.1GB | 4.1GB |

for an aggregate 51% reduction in size. Details follow.
Next Steps

I recognize that every cloud operator may have a different policy on
what versions of interpreters and libraries they want to support, based on
the specific requirements of their users.

These buildpacks reflect a "bare minimum" policy for a cloud to be
operable, and I do not expect these buildpacks to be adopted as-is by many
operators.

These buildpacks have not yet been added to cf-release, specifically
so that the community can prepare their own buildpacks if necessary.

Over the next few days, the buildpacks core team will ship
documentation and tooling to assist you in packaging specific dependencies
for your instance of CF. I'll start a new thread on this list early next
week to communicate this information.
Call to Action

In the meantime, please think about whether the policy implemented in
these buildpacks ("last two patches (or teenies) on all supported
major.minor releases") is suitable for your users; and if not, think about
what dependencies you'll ideally be supporting.
go-buildpack v1.3.0

Release notes are here
<https://github.com/cloudfoundry/go-buildpack/releases/tag/v1.3.0>.

Size reduced 30% from 633MB
<https://github.com/cloudfoundry/go-buildpack/releases/tag/v1.2.0> to
442MB
<https://github.com/cloudfoundry/go-buildpack/releases/tag/v1.3.0>.

Supports (full manifest here
<https://github.com/cloudfoundry/go-buildpack/blob/v1.3.0/manifest.yml>
):

- golang 1.4.{1,2}
- golang 1.3.{2,3}
- golang 1.2.{1,2}
- golang 1.1.{1,2}

nodejs-buildpack v1.3.0

Full release notes are here
<https://github.com/cloudfoundry/nodejs-buildpack/releases/tag/v1.3.0>.

Size reduced 83% from 417MB
<https://github.com/cloudfoundry/nodejs-buildpack/releases/tag/v1.2.1>
to 69MB
<https://github.com/cloudfoundry/nodejs-buildpack/releases/tag/v1.3.0>.

Supports (full manifest here
<https://github.com/cloudfoundry/nodejs-buildpack/blob/v1.3.0/manifest.yml>
):

- 0.8.{27,28}
- 0.9.{11,12}
- 0.10.{37,38}
- 0.11.{15,16}
- 0.12.{1,2}

php-buildpack v3.2.0

Full release notes are here
<https://github.com/cloudfoundry/php-buildpack/releases/tag/v3.2.0>.

Size reduced 27% from 1.1GB
<https://github.com/cloudfoundry/php-buildpack/releases/tag/v3.1.1> to
803MB
<https://github.com/cloudfoundry/php-buildpack/releases/tag/v3.2.0>.

Supports: (full manifest here
<https://github.com/cloudfoundry/php-buildpack/blob/v3.2.0/manifest.yml>
)

*PHP*:

- 5.6.{6,7}
- 5.5.{22,23}
- 5.4.{38,39}

*HHVM* (lucid64 stack):

- 3.2.0

*HHVM* (cflinuxfs2 stack):

- 3.5.{0,1}
- 3.6.{0,1}

*Apache HTTPD*:

- 2.4.12

*nginx*:

- 1.7.10
- 1.6.2
- 1.5.13

python-buildpack v1.3.0

Full release notes are here
<https://github.com/cloudfoundry/python-buildpack/releases/tag/v1.3.0>.

Size reduced 30% from 654MB
<https://github.com/cloudfoundry/python-buildpack/releases/tag/v1.2.0>
to 454MB
<https://github.com/cloudfoundry/python-buildpack/releases/tag/v1.3.0>.

Supports: (full manifest here
<https://github.com/cloudfoundry/python-buildpack/blob/v1.3.0/manifest.yml>
)

- 2.7.{8,9}
- 3.2.{4,5}
- 3.3.{5,6}
- 3.4.{2,3}

ruby-buildpack v1.4.0

Release notes are here
<https://github.com/cloudfoundry/ruby-buildpack/releases/tag/v1.4.0>.

Size reduced 71% from 1.3GB
<https://github.com/cloudfoundry/ruby-buildpack/releases/tag/v1.3.1>
to 365MB
<https://github.com/cloudfoundry/ruby-buildpack/releases/tag/v1.4.0>.

Supports: (full manifest here
<https://github.com/cloudfoundry/ruby-buildpack/blob/v1.4.0/manifest.yml>
)

*MRI*:

- 2.2.{1,2}
- 2.1.{5,6}
- 2.0.0p645

*JRuby*:

- ruby-1.9.3-jruby-1.7.19
- ruby-2.0.0-jruby-1.7.19
- ruby-2.2.0-jruby-9.0.0.0.pre1


---------- Forwarded message ----------
From: Mike Dalessio <mdalessio(a)pivotal.io>
Date: Wed, Apr 8, 2015 at 11:10 AM
Subject: Addressing buildpack size
To: vcap-dev(a)cloudfoundry.org


Hello vcap-dev!

This email details a proposed change to how Cloud Foundry buildpacks
are packaged, with respect to the ever-increasing number of binary
dependencies being cached within them.

This proposal's permanent residence is here:

https://github.com/cloudfoundry-incubator/buildpack-packager/issues/4

Feel free to comment there or reply to this email.
------------------------------
Buildpack Sizes

Where we are today

Many of you have seen, and possibly been challenged by, the enormous
sizes of some of the buildpacks that are currently shipping with cf-release.

Here's the state of the world right now, as of v205:

php-buildpack: 1.1G
ruby-buildpack: 922M
go-buildpack: 675M
python-buildpack: 654M
nodejs-buildpack: 403M
----------------------
total: 3.7G

These enormous sizes are the result of the current policy of packaging
every-version-of-everything-ever-supported ("EVOEES") within the buildpack.

Most recently, this problem was exacerbated by the fact that buildpacks
now contain binaries for two rootfses.
Why this is a problem

If continued, buildpacks will only continue to increase in size,
leading to longer and longer build and deploy times, longer test times,
slower feedback loops, and therefore less frequent buildpack releases.

Additionally, this also means that we're shipping versions of
interpreters, web servers, and libraries that are deprecated, insecure, or
both. Feedback from CF users has made it clear that many companies view
this as an unnecessary security risk.

This policy is clearly unsustainable.
What we can do about it

There are many things being discussed to ameliorate the impact that
buildpack size is having on the operations of CF.

Notably, Onsi has proposed a change to buildpack caching, to improve
Diego staging times (link to proposal
<https://github.com/pivotal-cf-experimental/diego-dev-notes/blob/master/proposals/better-buildpack-caching.md>
).

However, there is an immediate solution available, which addresses both
the size concerns as well as the security concern: packaging fewer binary
dependencies within the buildpack.
The proposal

I'm proposing that we reduce the binary dependencies in each buildpack
in a very specific way.

Aside on terms I'll use below:

- Versions of the form "1.2.3" are broken down as
MAJOR.MINOR.TEENY. Many language ecosystems use "PATCH" interchangeably
with "TEENY", but we're going to use "TEENY" in this proposal.
- We'll assume that TEENY gets bumped for API/ABI compatible
changes.
- We'll assume that MINOR and MAJOR get bumped when there are
API/ABI *incompatible* changes.

I'd like to move forward soon with the following changes:

1. For language interpreters/compilers, we'll package the two
most-recent TEENY versions on each MAJOR.MINOR release.
2. For all other dependencies, we'll package only the single
most-recent TEENY version on each MAJOR.MINOR release.
3. We will discontinue packaging versions of dependencies that have
been deprecated.
4. We will no longer provide "EVOEES" buildpack releases.
5. We will no longer provide "online" buildpack releases, which
download dependencies from the public internet.
6. We will document the process, and provide tooling, for CF
operators to build their own buildpacks, choosing the dependencies that
their organization wants to support or creating "online" buildpacks at
operators' discretion.

An example for #1 is that we'll go from packaging 34 versions of node v0.10.x
to only packaging two: 0.10.37 and 0.10.38.
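That selection rule can be sketched as a small filter (a hypothetical
helper, not part of any existing buildpack tooling): given a list of
MAJOR.MINOR.TEENY versions, keep only the two most recent TEENY versions
per MAJOR.MINOR line.

```shell
# Hypothetical sketch of the "two most-recent TEENY versions per
# MAJOR.MINOR" policy. Reads versions on stdin, one per line, and
# prints the survivors (order across MAJOR.MINOR lines is unspecified).
keep_latest_two() {
  sort -t. -k1,1n -k2,2n -k3,3n | awk -F. '
    { key = $1 "." $2; a[key, ++n[key]] = $0 }
    END {
      for (k in n)
        for (i = (n[k] > 2 ? n[k] - 1 : 1); i <= n[k]; i++)
          print a[k, i]
    }'
}
```

Feeding all 34 cached node v0.10.x versions through a filter like this
would leave just 0.10.37 and 0.10.38.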

An example for #2 is that we'll go from packaging 3 versions of nginx 1.5
in the PHP buildpack to only packaging one: 1.5.12.

An example for #3 is that we'll discontinue packaging ruby 1.9.3 in the
ruby-buildpack, which reached end-of-life in February 2015.
Outcomes

With these changes, the total buildpack size will be reduced greatly.
As an example, we expect the ruby-buildpack size to go from 922M to 338M.

We also want to set the expectation that, as new interpreter versions
are released, either for new features or (more urgently) for security
fixes, we'll release new buildpacks much more quickly than we do today. My
hope is that we'll be able to do it within 24 hours of a new release.
Planning

These changes will be relatively easy to make, since all the buildpacks
are now using a manifest.yml file to declare what's being packaged. We
expect to be able to complete this work within the next two weeks.

Stories are in the Tracker backlog under the Epic named
"skinny-buildpacks", which you can see here:

https://www.pivotaltracker.com/epic/show/1747328

------------------------------

Please let me know how these changes will impact you and your
organizations, and let me know of any counter-proposals or variations you'd
like to consider.

Thanks,

-mike



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev



_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


Runtime PMC: 2015-05-19 Notes

Eric Malm <emalm@...>
 

Hi, all,

The Runtime PMC met on Tuesday, 2015-05-19. Permanent notes are available
at:

https://github.com/cloudfoundry/pmc-notes/blob/master/Runtime/2015-05-19-runtime.md

and are included below.

Best,
Eric

---

*# Runtime PMC Meeting 2015-05-19*

*## Agenda*

1. Current Backlog and Priorities
2. PMC Lifecycle Activities
3. Open Discussion


*## Attendees*

* Chip Childers, Cloud Foundry Foundation
* Matt Sykes, IBM
* Atul Kshirsagar, GE
* Erik Jasiak, Pivotal
* Sree Tummidi, Pivotal
* Eric Malm, Pivotal
* Shannon Coen, Pivotal
* Will Pragnell, Pivotal
* Marco Nicosia, Pivotal


*## Current Backlog and Priorities*

*### Runtime*

* Shannon filling in for Dieu this week
* support for context-based routing; delivered
* investigating query performance
* addressing outstanding pull requests
* bump to UAA
* issues with loggregator in acceptance environment, blocker to cutting
stabilization release for collector


*### Diego*

* ssh access largely done; currently working on routing ssh traffic to the proxy
* performance breadth: completed 50 cell test, investigating bulk
processing in jobs that do so
* refining CI to improve recording compatible versions of Diego and CF
* processing of PRs from Garden and Lattice are prioritized
* Stories queued up to investigate securing identified gaps in Diego


*### UAA*

* 2.2.6, 2.3.0 releases, notes available
* upgraded Spring versions
* update to JRE expected in v210 of cf-release
* more LDAP work, chaining in identity zone: both LDAP and internal
authentication can work simultaneously
* support for New Relic instrumentation, will appear after v209
* upcoming:
* risk assessment of persistent token storage: understand performance
implications
* starting work on password policy: multi-tenant for default zone and
additional zones
* OAuth client groups: authorization to manage clients
* SAML support
* question from Matt Sykes:
* would like to discuss IBM PR for UAA DB migration strategy with the team


*### Garden*

* investigating management of disk quotas
* replacing C/Bash code with Go to enable instrumentation, security, and
maintainability
* planning to remove default VCAP user in Garden


*### Lattice*

* nearly done with last stories before releasing 0.2.5
* Cisco contributed openstack support
* baking deployment automation into published images on some providers
* improved documentation for how to install lattice on VMs
* next work planned is support for CF-like app lifecycle management
(pushing code in addition to docker)


*### TCP Router*

* building out icebox to reflect inception
* question from Matt Sykes:
* how to incorporate new project into PMC? IBM parties surprised with
announcement at Summit
* Chip: inconsistent policy so far; maybe this belongs alongside gorouter
in Runtime PMC
* working on process for review, discussion of incubating project
* Shannon: first step will be to produce proposal, discuss with community


*### LAMB*

* big rewind project on datadog firehose nozzle: limitation in doppler
about size of messages, dropping messages
* working to resolve those problems: improving number of concurrent reads,
marshaling efficiency
* seeing increases in message loss in Runtime environments: may be other
source of contention, working with them to resolve
* Datadog nozzle work:
* looking at developing a Graphite nozzle from community work
* will investigate community interest in Graphite support
* naming alignment from loggregator to doppler
* instrumentation of statsd for larger message sizes, work to phase out
collector and NATS in CF
* goal is to stream metrics directly to firehose
* question from Matt Sykes: story about protobuf protocol proposal
* best way to support vm tagging in log messages: distinguish between types
of data in log messages
* goal would be to improve the implementation: more generic API for message
data; understand implications of this change


*### Greenhouse*

* Accepted code from HP
* will get support from Microsoft with regard to interest in entire
Microsoft stack


*## PMC Lifecycle Activities*

None to report.

*## Open Discussion*

None to report.


Re: cf-release v209 published

Simon Johansson <simon@...>
 

i wanted to share the great news that the new skinny buildpacks reduced
the size of cf-release from 5.2gb -> 3.5gb!

This is great news, good job buildpack team!

On Thu, May 21, 2015 at 4:40 PM, James Bayer <jbayer(a)pivotal.io> wrote:

skinny buildpacks refer to each buildpack no longer shipping old
unsupported or insecure versions of runtimes. you can still customize the
buildpacks to include older runtimes by building the buildpack yourself
with the admin buildpack feature.


On Thu, May 21, 2015 at 7:34 AM, Long Nguyen <long.nguyen11288(a)gmail.com>
wrote:

Wooot! Thanks James. Is skinny just having latest 2 version of language?


On May 21, 2015 at 1:42:11 AM, James Bayer (jbayer(a)pivotal.io) wrote:

more info is coming soon, we don't have all of the release notes
published yet because dieu and shannon are out of the office. i wanted to
share the great news that the new skinny buildpacks reduced the size of
cf-release from 5.2gb -> 3.5gb!

the "what's in the deploy" file is awaiting approval b/c of mailman
limits.

thanks buildpacks team!

--
Thank you,

James Bayer
_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


--
Thank you,

James Bayer

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


Question about services on Cloud Foundry

Kinjal Doshi
 

Hi,

From the architecture point of view, I understand that there are no
services explicitly associated with CF.

However, the following doc is very confusing:
http://docs.cloudfoundry.org/devguide/services/managed.html

Would be great if someone could explain the meaning of managed services here.

Thanks,
Kinjal


Re: List Reply-To behavior

James Bayer
 

yes, this has affected me

On Fri, May 22, 2015 at 4:33 AM, Daniel Mikusa <dmikusa(a)pivotal.io> wrote:



On Fri, May 22, 2015 at 6:22 AM, Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:

The vcap-dev list used to use a Reply-To header pointing back to the list
such that replying to a post would automatically go back to the list. The
current mailman configuration for cf-dev does not set a Reply-To header and
the default behavior is to reply to the author.

While I understand the pros and cons of setting the Reply-To header, this
new behavior has bitten me several times and I've found myself re-posting a
response to the list instead of just the author.

I'm interested in knowing if anyone else has been bitten by this behavior
and would like a Reply-To header added back...
+1 and +1

Dan



Thanks.

--
Matthew Sykes
matthew.sykes(a)gmail.com

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

--
Thank you,

James Bayer


Re: List Reply-To behavior

Daniel Mikusa
 

On Fri, May 22, 2015 at 6:22 AM, Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:

The vcap-dev list used to use a Reply-To header pointing back to the list
such that replying to a post would automatically go back to the list. The
current mailman configuration for cf-dev does not set a Reply-To header and
the default behavior is to reply to the author.

While I understand the pros and cons of setting the Reply-To header, this
new behavior has bitten me several times and I've found myself re-posting a
response to the list instead of just the author.

I'm interested in knowing if anyone else has been bitten by this behavior
and would like a Reply-To header added back...
+1 and +1

Dan



Thanks.

--
Matthew Sykes
matthew.sykes(a)gmail.com

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


List Reply-To behavior

Matthew Sykes <matthew.sykes@...>
 

The vcap-dev list used to use a Reply-To header pointing back to the list
such that replying to a post would automatically go back to the list. The
current mailman configuration for cf-dev does not set a Reply-To header and
the default behavior is to reply to the author.

While I understand the pros and cons of setting the Reply-To header, this
new behavior has bitten me several times and I've found myself re-posting a
response to the list instead of just the author.

I'm interested in knowing if anyone else has been bitten by this behavior
and would like a Reply-To header added back...

Thanks.

--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: container cannot communicate with the host

Matthew Sykes <matthew.sykes@...>
 

Warden explicitly disables access to the container host. If you move up to
a more recent level of cf-release, that behavior is configurable with the
`allow_host_access` flag. When that flag is true, this line is skipped:

https://github.com/cloudfoundry/warden/blob/4f1e5c049a12199fdd1f29cde15c9a786bd5fac8/warden/root/linux/net.sh#L128

At the level you're at, that rule is always specified so you'd have to
manually change it.

https://github.com/cloudfoundry/warden/blob/17f34e2d7ff1994856a61961210a82e83f24ecac/warden/root/linux/net.sh#L124

On Fri, May 22, 2015 at 3:21 AM, Youzhi Zhu <zhuyouzhi03(a)gmail.com> wrote:

Hi all

I have an app A and a service B, service B is running on the dea
server(ip 10.0.0.254), app A need to connect with service B through tcp, it
works normally in my LAN, but when I push A to cf, it cannot connect to B,
then I execute bin/wsh to get into the container and ping the host ip,
it's unreachable, as below:







root(a)18mkbd9n808:~# ping 10.0.0.254
PING 10.0.0.254 (10.0.0.254) 56(84) bytes of data.
From 10.0.0.254 icmp_seq=1 Destination Port Unreachable
From 10.0.0.254 icmp_seq=2 Destination Port Unreachable
^C
--- 10.0.0.254 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1002ms

But if I ping another host in the LAN, it can be reached!








root(a)18mkbd9n808:~# ping 10.0.0.253
PING 10.0.0.253 (10.0.0.253) 56(84) bytes of data.
64 bytes from 10.0.0.253: icmp_seq=1 ttl=63 time=1.60 ms
64 bytes from 10.0.0.253: icmp_seq=2 ttl=63 time=0.421 ms
^C
--- 10.0.0.253 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.421/1.013/1.606/0.593 ms

It's weird! My cf-release is cf-175 and I have only one dea server. Has
anyone met this situation before? Thanks!

_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev

--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: container cannot communicate with the host

Lev Berman <lev.berman@...>
 

As far as I know, it is so by design - in order to set up a connection to
the same host you need to explicitly tell Warden to allow external traffic:
https://github.com/cloudfoundry/warden/blob/master/warden/README.md#net-handle-out-addressmaskport

In more details:

1) ssh into your VM with DEA
2) find your Warden handle in /var/vcap/data/dea_ng/db/instances.json -
"warden_handle" field for the hash describing your specific application
("application_id" value is the same as cf app --guid)
3) cd into /var/vcap/packages/warden/warden
4) bundle install
5) ./bin/warden --socket /var/vcap/data/warden/warden.sock
6) > net_out --handle <your handle from instances.json> --port <your port
to open>

This is for CF v208, an earlier version of Warden client may have slightly
different API - see command help.

On Fri, May 22, 2015 at 10:21 AM, Youzhi Zhu <zhuyouzhi03(a)gmail.com> wrote:

Hi all

I have an app A and a service B, service B is running on the dea
server(ip 10.0.0.254), app A need to connect with service B through tcp, it
works normally in my LAN, but when I push A to cf, it cannot connect to B,
then I execute bin/wsh to get into the container and ping the host ip,
it's unreachable, as below:







root(a)18mkbd9n808:~# ping 10.0.0.254
PING 10.0.0.254 (10.0.0.254) 56(84) bytes of data.
From 10.0.0.254 icmp_seq=1 Destination Port Unreachable
From 10.0.0.254 icmp_seq=2 Destination Port Unreachable
^C
--- 10.0.0.254 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1002ms

But if I ping another host in the LAN, it can be reached!








root(a)18mkbd9n808:~# ping 10.0.0.253
PING 10.0.0.253 (10.0.0.253) 56(84) bytes of data.
64 bytes from 10.0.0.253: icmp_seq=1 ttl=63 time=1.60 ms
64 bytes from 10.0.0.253: icmp_seq=2 ttl=63 time=0.421 ms
^C
--- 10.0.0.253 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.421/1.013/1.606/0.593 ms

It's weird! My cf-release is cf-175 and I have only one dea server. Has
anyone met this situation before? Thanks!
--
Lev Berman

Altoros - Cloud Foundry deployment, training and integration

Github
*: https://github.com/ldmberman <https://github.com/ldmberman>*


container cannot communicate with the host

Youzhi Zhu
 

Hi all

I have an app A and a service B, service B is running on the dea
server(ip 10.0.0.254), app A need to connect with service B through tcp, it
works normally in my LAN, but when I push A to cf, it cannot connect to B,
then I execute bin/wsh to get into the container and ping the host ip,
it's unreachable, as below:







root(a)18mkbd9n808:~# ping 10.0.0.254
PING 10.0.0.254 (10.0.0.254) 56(84) bytes of data.
From 10.0.0.254 icmp_seq=1 Destination Port Unreachable
From 10.0.0.254 icmp_seq=2 Destination Port Unreachable
^C
--- 10.0.0.254 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1002ms

But if I ping another host in the LAN, it can be reached!








root(a)18mkbd9n808:~# ping 10.0.0.253
PING 10.0.0.253 (10.0.0.253) 56(84) bytes of data.
64 bytes from 10.0.0.253: icmp_seq=1 ttl=63 time=1.60 ms
64 bytes from 10.0.0.253: icmp_seq=2 ttl=63 time=0.421 ms
^C
--- 10.0.0.253 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.421/1.013/1.606/0.593 ms

It's weird! My cf-release is cf-175 and I have only one dea server. Has
anyone met this situation before? Thanks!


Re: Setting up API endpoint failed in Local CF

Balaramaraju JLSP <balaramaraju@...>
 

Hi All,

I found the issue , got the details from deploy.yml

thanks
Balaramaraju

On Fri, May 22, 2015 at 10:44 AM, Balaramaraju JLSP <balaramaraju(a)gmail.com>
wrote:

Hi All,

using https://github.com/yudai/cf_nise_installer I installed a local CF
and was able to start the services with ".\scripts\start.sh"

logs:-

All processes have been started!
-u admin -p c1oudc0w --skip-ssl-validation'ip.io
Download CF CLI from https://github.com/cloudfoundry/cli

but setting up the endpoint is failing:

vagrant(a)vagrant-ubuntu-trusty-64:/vagrant$ cf api --skip-ssl-validation
https://api.vagrant-ubuntu-tip.io-64.io
Setting api endpoint to https://api.vagrant-ubuntu-trusty-64.io...
FAILED
Error performing request: Get
https://api.vagrant-ubuntu-trusty-64.ip.io/v2/info: dial tcp
50.21.180.100:443: i/o timeout

Any help is appreciated.

--
Balaramaraju


--
J L S P Balaramaraju



Re: cf-release v209 published

James Bayer
 

skinny buildpacks refer to each buildpack no longer shipping old
unsupported or insecure versions of runtimes. you can still customize the
buildpacks to include older runtimes by building the buildpack yourself
with the admin buildpack feature.

On Thu, May 21, 2015 at 7:34 AM, Long Nguyen <long.nguyen11288(a)gmail.com>
wrote:

Wooot! Thanks James. Is skinny just having the latest 2 versions of each language?


On May 21, 2015 at 1:42:11 AM, James Bayer (jbayer(a)pivotal.io) wrote:

more info is coming soon, we don't have all of the release notes published
yet because dieu and shannon are out of the office. i wanted to share the
great news that the new skinny buildpacks reduced the size of cf-release
from 5.2gb -> 3.5gb!

the "what's in the deploy" file is awaiting approval b/c of mailman limits.

thanks buildpacks team!

--
Thank you,

James Bayer
_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


--
Thank you,

James Bayer


Re: cf-release v209 published

Long Nguyen
 

Wooot! Thanks James. Is skinny just having the latest 2 versions of each language?

On May 21, 2015 at 1:42:11 AM, James Bayer (jbayer(a)pivotal.io) wrote:

more info is coming soon, we don't have all of the release notes published yet because dieu and shannon are out of the office. i wanted to share the great news that the new skinny buildpacks reduced the size of cf-release from 5.2gb -> 3.5gb!

the "what's in the deploy" file is awaiting approval b/c of mailman limits.

thanks buildpacks team!

--
Thank you,

James Bayer


Re: cf-release v209 published

Wayne E. Seguin
 

That's an incredible savings and excellent to see, thanks Buildpacks team!
:)

On Thu, May 21, 2015 at 1:42 AM, James Bayer <jbayer(a)pivotal.io> wrote:

more info is coming soon, we don't have all of the release notes published
yet because dieu and shannon are out of the office. i wanted to share the
great news that the new skinny buildpacks reduced the size of cf-release
from 5.2gb -> 3.5gb!

the "what's in the deploy" file is awaiting approval b/c of mailman limits.

thanks buildpacks team!

--
Thank you,

James Bayer



cf-release v209 published

James Bayer
 

more info is coming soon, we don't have all of the release notes published
yet because dieu and shannon are out of the office. i wanted to share the
great news that the new skinny buildpacks reduced the size of cf-release
from 5.2gb -> 3.5gb!

the "what's in the deploy" file is awaiting approval b/c of mailman limits.

thanks buildpacks team!

--
Thank you,

James Bayer



Buildpacks PMC - 2015-05-20 Notes

Mike Dalessio
 

Howdy all,

We had a meeting of the Buildpacks PMC today, which was only lightly
attended and we adjourned quickly.

Permanent notes are at:

https://github.com/cloudfoundry/pmc-notes/blob/master/Buildpacks/2015-05-20-buildpacks.md

but I've helpfully also included a snapshot of those notes below.

Happy Wednesday!
-mike

---

*# Buildpacks PMC Meeting 2015-05-20*

*## Agenda*

1. Update on Java Buildpack (Ryan Morgan)
2. Update on core Buildpacks (Mike Dalessio)
3. Open Discussion


*## Attendees*

* Chip Childers, Cloud Foundry Foundation
* Mike Dalessio, Pivotal (PMC lead)
* Matthew Sykes, IBM


*## Update on Java Buildpack (Ryan Morgan)*

* Added support for Wily Introscope.
* Memory calculator rewritten in Go; supports Java memory
configuration at startup time rather than staging time. Some details
on this feature were posted to cf-dev by Chris Frost last week.
* Team currently discussing how to 'pin' buildpack dependency versions
to allow for repeatable offline buildpack creation.
* Team also discussing moving from Jenkins to Concourse for CI.

Mike will follow up with the team to discuss how `buildpack-packager`
might be used to pin cached dependencies.


*## Update on core Buildpacks (Mike Dalessio)*

The [binary buildpack][binary] was added into `cf-release` last week
([PR here][binary-pr]), and moved into the `cloudfoundry` github
org. This is the same idea as what's commonly called a "null"
buildpack, where developers can simply execute a binary at runtime.

The [staticfile buildpack][static] was added into `cf-release` last
week ([PR here][static-pr]), and moved into the `cloudfoundry` github
org. Originally created by Dr. Nic, this buildpack allows a static
website to be published behind nginx, and for nginx to be configured
in a few interesting ways.

At long last, [skinny buildpacks][skinny] made it into `cf-release`
([PRs here][skinny-pr]). There was also some interesting discussion on
the mailing lists, both [old][skinny-thread1] and
[new][skinny-thread2].

Notable near-term goals:

* ability to generate and test CF rootfs-specific binaries; and tooling for
CF operators to do the same
* work more closely with the java-buildpacks team


*## Open Discussion*

Mike: Just want to note that IBM open-sourced their Linux-based ASP.NET/Mono
buildpack this week. Looks awesome!

https://github.com/cloudfoundry-community/asp.net5-buildpack

---

[binary]: https://github.com/cloudfoundry/binary-buildpack
[binary-pr]: https://github.com/cloudfoundry/cf-release/pull/677
[static]: https://github.com/cloudfoundry/staticfile-buildpack
[static-pr]: https://github.com/cloudfoundry/cf-release/pull/668
[skinny]:
https://github.com/cloudfoundry-incubator/buildpack-packager/issues/4
[skinny-pr]:
https://github.com/cloudfoundry/cf-release/pulls?utf8=%E2%9C%93&q=is%3Apr+buildpack+skinny+
[skinny-thread1]:
https://groups.google.com/a/cloudfoundry.org/forum/#!searchin/vcap-dev/addressing$20buildpack/vcap-dev/1HmGK4wU3Rc/lk186OOtdbMJ
[skinny-thread2]:
http://lists.cloudfoundry.org/pipermail/cf-dev/2015-May/000005.html


Re: [vcap-dev] Java OOM debugging

Daniel Mikusa
 

On Thu, May 14, 2015 at 10:23 AM, Daniel Jones <
daniel.jones(a)engineerbetter.com> wrote:

Hi Lari,

Thanks again for your input. Have you seen this problem with versions of
Tomcat before 8.0.20?

David and I think we've narrowed down the issue to a change from using
Tomcat 8.0.18 to 8.0.21. We're running more tests and collaborating with
Pivotal support. We also noticed that non-prod versions of our apps were
taking longer to crash, so it would seem to be activity-related at least.

Since it seems activity-related, have you tried monitoring the number of
threads in the JVM?

While you can cap the number of threads Tomcat uses for processing
requests, I don't believe that you can cap the number of threads it's
possible to create in the JVM. The reason I mention this is because each
thread causes the amount of memory required to go up by the thread stack
size (Xss * <threads> = total thread memory). Perhaps as activity
increases, so does the thread count and that's pushing you over the limit.

Are you setting a custom -Xss value or using the default? From memory, the
default is pretty large. If you're not using a custom one, you might try a
smaller one, like 256k and see if that has any impact on the problem.
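To make the stack arithmetic concrete, here is a back-of-envelope sketch; the thread count and -Xss value are illustrative, not measured from any real app:

```shell
# Each JVM thread reserves roughly -Xss of memory outside the Java heap,
# so total stack space is approximately Xss * thread count.
xss_kb=1024        # -Xss1m, a common 64-bit HotSpot default
threads=300
stack_mb=$(( xss_kb * threads / 1024 ))
echo "~${stack_mb}M reserved for thread stacks"
```

At 300 threads that is already roughly 300M that the heap settings never account for, which is one way activity growth can push a container over its limit.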

Alternatively, you could adjust the memory weightings in the build pack so
that heap consumes a smaller amount of the total memory and there's more
memory available for native / stack and other memory.



Do you know how Tomcat's APR/NIO memory gets allocated?

I'm not sure I follow your question, but in terms of NIO I suspect the JVM
is going to handle memory allocation, not Tomcat. Given that, it should
happen just like any other Java code that uses NIO. APR is unlikely to
be an issue; see my next comment.


Is there a way of telling from pmap whether pages are being used for NIO
buffers or by the APR?

Unless you compile the APR native library and include it with your version
of the build pack, Tomcat's not going to use it. With Tomcat 8, it'll use
NIO by default. You can confirm by looking at the logs when you start your
app; one of the first things it logs will be the protocol handler. This is
from a test app, where it's using NIO.

Ex:

```
[CONTAINER] org.apache.coyote.http11.Http11NioProtocol INFO
Initializing ProtocolHandler ["http-nio-63227"]
```

Dan




I wonder if the other folks that have reported CF out of memory errors
with later versions of Tomcat are seeing slow creeps in native memory
consumption?

On Mon, May 11, 2015 at 2:19 PM, Lari Hotari <Lari(a)hotari.net> wrote:


fyi. Tomcat 8.0.20 might be consuming more memory than 8.0.18:

https://github.com/cloudfoundry/java-buildpack/issues/166#issuecomment-94517568

Other things we’ve tried:

- We set verbose garbage collection to verify there was no
memory size issues within the JVM. There wasn’t.

- We tried setting minimum memory for native, it had no effect.
The container still gets killed

- We tried adjusting the ‘memory heuristics’ so that they added
up to 80 rather than 100. This had the effect of causing a delay in the
container being killed. However it still was killed.

I think adjusting memory heuristics so that they add up to 80 doesn't
make a difference because the values aren't percentages.
The values are proportional weighting values used in the memory
calculation:

https://github.com/grails-samples/java-buildpack/blob/b4abf89/docs/jre-oracle_jre.md#memory-calculation
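A tiny sketch of why rescaling the weights changes nothing; the weight names and values below are illustrative, not the buildpack's actual defaults:

```shell
# Each component gets total * weight / sum(weights). Multiplying every
# weight by the same factor (here 0.8) leaves every share unchanged.
total=2048
h1=$(( total * 70 / (70 + 10 + 5) ))   # weights summing to 85
h2=$(( total * 56 / (56 + 8 + 4) ))    # the same weights scaled by 0.8
echo "heap share: ${h1}M vs ${h2}M"
```

Since the shares are ratios, making the weights "add up to 80 rather than 100" only relabels the same split.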

I found out that the only way to reserve "unused" memory is to set a high
value for the native memory lower bound in the memory_sizes.native setting
of config/open_jdk_jre.yml .
Example:

https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25



This seems like classic memory leak behaviour to me.

In my case it wasn't a classical Java memory leak, since the Java
application wasn't leaking memory. I was able to confirm this by getting
some heap dumps with the HeapDumpServlet (
https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/HeapDumpServlet.groovy)
and analyzing them.

In my case the JVM's RSS memory size is slowly growing. It probably is
some kind of memory leak since one process I've been monitoring now is very
close to the memory limit. The uptime is now almost 3 weeks.

Here is the latest diff of the meminfo report.

https://gist.github.com/lhotari/ee77decc2585f56cf3ad#file-meminfo_diff_example2-txt

From a Java perspective this isn't classical. The JVM heap isn't filling
up. The problem is that RSS size is slowly growing and will eventually
cause the Java process to cross the memory boundary so that the process
gets killed by the Linux kernel cgroups OOM killer.

RSS size might be growing because of many reasons. I have been able to
slow down the growth by doing the various MALLOC_ and JVM parameter tuning
(-XX:MinMetaspaceExpansion=1M -XX:CodeCacheExpansionSize=1M). I'm able to
get a longer uptime, but the problem isn't solved.

Lari



On 15-05-11 06:41 AM, Head-Rapson, David wrote:

Thanks for the continued advice.



We’ve hit on a key discovery after yet another soak test this weekend.

- When we deploy using Tomcat 8.0.18 we don’t see the issue

- When we deploy using Tomcat 8.0.20 (same app version, same CF
space, same services bound, same JBP code version, same JRE version,
running at the same time), we see the crashes occurring after just a couple
of hours.



Ideally we’d go ahead with the memory calculations you mentioned, however
we’re stuck on lucid64 because we’re using Pivotal CF 1.3.x and we’re having
upgrade issues to 1.4.x.

So we’re not able to adjust MALLOC_ARENA_MAX, nor are we able to view RSS
in pmap as you describe



Other things we’ve tried:

- We set verbose garbage collection to verify there was no
memory size issues within the JVM. There wasn’t.

- We tried setting minimum memory for native, it had no effect.
The container still gets killed

- We tried adjusting the ‘memory heuristics’ so that they added
up to 80 rather than 100. This had the effect of causing a delay in the
container being killed. However it still was killed.



This seems like classic memory leak behaviour to me.



*From:* Lari Hotari [mailto:lari.hotari(a)sagire.fi <lari.hotari(a)sagire.fi>]
*On Behalf Of *Lari Hotari
*Sent:* 08 May 2015 16:25
*To:* Daniel Jones; Head-Rapson, David
*Cc:* cf-dev(a)lists.cloudfoundry.org
*Subject:* Re: [Cf-dev] [vcap-dev] Java OOM debugging




For my case, it turned out to be essential to reserve enough memory for
"native" in the JBP. For the 2GB total memory, I set the minimum to 330M.
With that setting I have been able to get over 2 weeks of uptime by now.

I mentioned this in my previous email:

The workaround for that in my case was to add a native key under
memory_sizes in open_jdk_jre.yml and set the minimum to 330M (that is for a
2GB total memory).
see example
https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25
that was how I got the app I'm running on CF to stay within the memory
bounds. I'm sure there is now also a way to get the keys without forking
the buildpack. I could have also adjusted the percentage portions, but I
wanted to set a hard minimum for this case.


I've been trying to get some insight by diffing the reports gathered from
the meminfo servlet https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemoryInfoServlet.groovy


Here is such an example of a diff:

https://gist.github.com/lhotari/ee77decc2585f56cf3ad#file-meminfo_diff_example-txt

meminfo has pmap output included to get the report of the memory map of
the process. I have just noticed that most of the memory has already been
mmap'ed from the OS and it's just growing in RSS size. For example:
< 00000000a7600000 1471488 1469556 1469556 rw--- [ anon ]
> 00000000a7600000 1471744 1470444 1470444 rw--- [ anon ]
The pmap output from lucid64 didn't include the RSS size, so you have to
use cflinuxfs2 for this. It's also better because of other reasons. The
glibc in lucid64 is old and has some bugs around the MALLOC_ARENA_MAX.

I was manually able to estimate the maximum size of the RSS size of what
the Java process will consume by simply picking the large anon-blocks from
the pmap report and calculating those blocks by the allocated virtual size
(VSS).
Based on this calculation, I picked the minimum of 330M for "native" in
open_jdk_jre.yml as I mentioned before.

It looks like these rows are for the Heap size:
< 00000000a7600000 1471488 1469556 1469556 rw--- [ anon ]
> 00000000a7600000 1471744 1470444 1470444 rw--- [ anon ]
It looks like the JVM doesn't fully allocate that block in RSS initially
and most of the growth of RSS size comes from that in my case. In your
case, it might be something different.

I also added a servlet for getting glibc malloc_info statistics in XML
format. I haven't really analysed that information because of time
constraints and because I don't have a pressing problem any more. btw. The
malloc_info XML report is missing some key elements, that has been added in
later glibc versions (
https://github.com/bminor/glibc/commit/4d653a59ffeae0f46f76a40230e2cfa9587b7e7e
).

If killjava.sh never fires and the app crashed with Warden out of memory
errors, then I believe it's the kernel's cgroups OOM killer that has killed
the container processes. I have found this location where Warden oom
notifier gets the OOM notification event:

https://github.com/cloudfoundry/warden/blob/ad18bff/warden/lib/warden/container/features/mem_limit.rb#L70
This is the oom.c source code:
https://github.com/cloudfoundry/warden/blob/ad18bff7dc56acbc55ff10bcc6045ebdf0b20c97/warden/src/oom/oom.c
. It reads the cgroups control files and receives events from the kernel
that way.

I'd suggest that you use pmap for the Java process after it has started
and estimate the maximum RSS size by summing the VSS size of the large
anon blocks (instead of their current RSS), since those are the blocks that
the Java process has reserved for its different memory areas. You should
discard the VSS of the CompressedClassSpaceSize block.
After this calculation, add enough memory to the "native" parameter in the
JBP until the RSS size calculated this way stays under the limit.
That's the only "method" I have come up with by now.
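As a sketch of that estimation, assuming a pmap report whose columns are address, VSS (KB), RSS, and dirty: sum the VSS of the large anonymous mappings. The first sample line is from the report quoted earlier in this thread; the other two are made up for illustration:

```shell
# Sum the virtual size (2nd column, KB) of anon mappings larger than 1MB.
pmap_report='00000000a7600000 1471488 1469556 1469556 rw--- [ anon ]
00007f3c00000000    2048     512     512 rw--- [ anon ]
00007f3c40000000     132     120     120 r-x-- libfoo.so'
vss_kb=$(echo "$pmap_report" | awk '/\[ anon \]/ && $2 > 1024 { t += $2 }
                                    END { print t }')
echo "~$(( vss_kb / 1024 ))M of large anon mappings"
```

The resulting figure is the worst case those mappings can grow into in RSS, which is what you want to budget for in the "native" setting.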

It might be required to have some RSS space allocated for any zip/jar
files read by the Java process. I think that Java uses mmap files for zip
file reading by default and that might go on top of all other limits.
To test this theory, I'd suggest testing by adding
-Dsun.zip.disableMemoryMapping=true system property setting to JAVA_OPTS.
That disables the native mmap for zip/jar file reading. I haven't had time
to test this assumption.

I guess the only way to understand how Java allocates memory is to look
at the source code.
from http://openjdk.java.net/projects/jdk8u/ , the instructions to get
the source code of JDK 8:
hg clone http://hg.openjdk.java.net/jdk8u/jdk8u;cd jdk8u;sh
get_source.sh
This tool is really good for grepping and searching the source code:
http://geoff.greer.fm/ag/
On Ubuntu it's in silversearcher-ag package, "apt-get install
silversearcher-ag" and on MacOSX brew it's "brew install
the_silver_searcher".
This alias is pretty useful:
alias codegrep='ag --color --group --pager less -C 5'
Then you just search for the correct location in code by starting with
the tokens you know about:
codegrep MaxMetaspaceSize
this gives pretty good starting points in looking how the JDK allocates
memory.

So the JDK source code is only a few commands away.

It would be interesting to hear more about this if someone has the time
to dig in to this. This is about how far I got and I hope sharing this
information helps someone continue. :)


Lari
github/twitter: lhotari

On 15-05-08 10:02 AM, Daniel Jones wrote:

Hi Lari et al,



Thanks for your help Lari.



David and I are pairing on this issue, and we're yet to resolve it. We're
in the process of creating a repeatable test case (our most crashy app
makes calls to external services that need mocking), but in the meantime,
here's what we've seen.



Between Java Buildpack commit e89e546 and 17162df, we see apps crashing
with Warden out of memory errors. killjava.sh never fires, and this has led
us to believe that the kernel is shooting a cgroup process in the head
after the cgroup oversteps its memory limit. We cannot find any evidence of
the OOM killer firing in any logs, but we may not be looking in the right
place.



The JBP is setting heap to be 70%, metaspace to be 15% (with max set to
the same as initial), 5% for "stack", 5% for "normalised stack" and 10% for
"native". We do not understand why this adds up to 105%, but haven't looked
into the JBP algorithm yet. Any pointers on what "normalised stack" is
would be much appreciated, as this doesn't appear in the list of heuristics
supplied via app env.



Other team members tried applying the same settings that you suggested -
thanks for this. Apps still crash with these settings, albeit less
frequently.



After reading the blog you linked to (
http://java.dzone.com/articles/java-8-permgen-metaspace) we wondered
whether the increased *reserved *metaspace claimed after metaspace GC
might be causing a problem; however we reused the test code to create a
metaspace leak in a CF app and saw metaspace GCs occur correctly, and
memory usage never grow over MaxMetaspaceSize. This figures, as the
committed metaspace is still less than MaxMetaspaceSize, and the reserved
appears to be whatever RAM is free across the whole DEA.



We noted that an Oracle blog (
https://blogs.oracle.com/poonam/entry/about_g1_garbage_collector_permanent)
mentions that the metaspace size parameters are approximate. We're
currently wondering if native allocations by Tomcat (APR, NIO) are taking
up more container memory, and so when the metaspace fills, it's creeping
slightly over the limit and triggering the kernel's OOM killer.



Any suggestions would be much appreciated. We've tried to resist tweaking
heuristics blindly, but are running out of options as we're struggling to
figure out how the Java process is using *committed* memory. pmap seems
to show virtual memory, and so it's hard to see if things like the
metaspace or NIO ByteBuffers are nabbing too much and trigger the kernel's
OOM killer.



Thanks for all your help,



Daniel Jones & David Head-Rapson



On Wed, Apr 29, 2015 at 8:07 PM, Lari Hotari <Lari(a)hotari.net> wrote:

Hi,

I created a few tools to debug OOM problems since the application I was
responsible for running on CF was failing constantly because of OOM
problems. The problems I had, turned out not to be actual memory leaks in
the Java application.

In the "cf events appname" log I would get entries like this:
2015-xx-xxTxx:xx:xx.00-0400 app.crash appname index: 1,
reason: CRASHED, exit_description: out of memory, exit_status: 255

These types of entries are produced when the container goes over its
memory resource limits. It doesn't mean that there is a memory leak in the
Java application. The container gets killed by the Linux kernel oom killer (
https://github.com/cloudfoundry/warden/blob/master/warden/README.md#limit-handle-mem-value)
based on the resource limits set to the warden container.

The memory limit is specified in number of bytes. It is enforced using
the control group associated with the container. When a container exceeds
this limit, one or more of its processes will be killed by the kernel.
Additionally, the Warden will be notified that an OOM happened and it
subsequently tears down the container.

In my case it never got killed by the killjava.sh script that gets called
in the java-buildpack when an OOM happens in Java.

This is the tool I built to debug the problems:
https://github.com/lhotari/java-buildpack-diagnostics-app
I deployed that app as part of the forked buildpack I'm using.
Please read the readme about its limitations. It worked for me,
but it might not work for you. It's opensource and you can fork it. :)

There is a solution in my toolcase for creating a heapdump and uploading
that to S3:

https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/HeapDumpServlet.groovy
The readme explains how to setup Amazon S3 keys for this:
https://github.com/lhotari/java-buildpack-diagnostics-app#amazon-s3-setup
Once you get a dump, you can then analyse the dump in a java profiler
tool like YourKit.

I also have a solution that forks the java-buildpack, modifies killjava.sh
and adds a script that uploads the heapdump to S3 in the case of OOM:

https://github.com/lhotari/java-buildpack/commit/2d654b80f3bf1a0e0f1bae4f29cb85f56f5f8c46

In java-buildpack-diagnostics-app I have also other tools for getting
Linux operating system specific memory information, for example:


https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemoryInfoServlet.groovy

https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MemorySmapServlet.groovy

https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/MallocInfoServlet.groovy

These tools are handy for looking at details of the Java process RSS
memory usage growth.

There is also a solution for getting ssh shell access inside your
application with tmate.io:

https://github.com/lhotari/java-buildpack-diagnostics-app/blob/master/src/main/groovy/io/github/lhotari/jbpdiagnostics/TmateSshServlet.groovy
(this version is only compatible with the new "cflinuxfs2" stack)

It looks like there are serious problems on CloudFoundry with the memory
sizing calculation. An application that doesn't have an OOM problem will get
killed by the oom killer because the Java process will go over the memory
limits.
I filed this issue:
https://github.com/cloudfoundry/java-buildpack/issues/157 , but that
might not cover everything.

The workaround for that in my case was to add a native key under
memory_sizes in open_jdk_jre.yml and set the minimum to 330M (that is for a
2GB total memory).
see example
https://github.com/grails-samples/java-buildpack/blob/22e0f6a/config/open_jdk_jre.yml#L25
that was how I got the app I'm running on CF to stay within the memory
bounds. I'm sure there is now also a way to get the keys without forking
the buildpack. I could have also adjusted the percentage portions, but I
wanted to set a hard minimum for this case.

It was also required to do some other tuning.

I added this to JAVA_OPTS:
-XX:CompressedClassSpaceSize=256M -XX:InitialCodeCacheSize=64M
-XX:CodeCacheExpansionSize=1M -XX:CodeCacheMinimumFreeSpace=1M
-XX:ReservedCodeCacheSize=200M -XX:MinMetaspaceExpansion=1M
-XX:MaxMetaspaceExpansion=8M -XX:MaxDirectMemorySize=96M
while trying to keep the Java process from growing in RSS memory size.

The memory overhead of a 64 bit Java process on Linux can be reduced by
specifying these environment variables:

stack: cflinuxfs2
.
.
.
env:
  MALLOC_ARENA_MAX: 2
  MALLOC_MMAP_THRESHOLD_: 131072
  MALLOC_TRIM_THRESHOLD_: 131072
  MALLOC_TOP_PAD_: 131072
  MALLOC_MMAP_MAX_: 65536

MALLOC_ARENA_MAX works only on cflinuxfs2 stack (the lucid64 stack has a
buggy version of glibc).

explanation about MALLOC_ARENA_MAX from Heroku:
https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
some measurement data how it reduces memory consumption:
https://devcenter.heroku.com/articles/testing-cedar-14-memory-use

I have created a PR to add this to CF java-buildpack:
https://github.com/cloudfoundry/java-buildpack/pull/160

I also created issue
https://github.com/cloudfoundry/java-buildpack/issues/163 and pull request
https://github.com/cloudfoundry/java-buildpack/pull/159 .

I hope this information helps others struggling with OOM problems in CF.
I'm not saying that this is a ready made solution just for you. YMMV. It
worked for me.

-Lari




On 15-04-29 10:53 AM, Head-Rapson, David wrote:

Hi,

I’m after some guidance on how to profile Java apps in CF, in order
to get to the bottom of memory issues.

We have an app that’s crashing every few hours with OOM error, most
likely it’s a memory leak.

I’d like to profile the JVM and work out what’s eating memory, however
tools like yourkit require connectivity INTO the JVM server (i.e. the
warden container), either via host / port or via SSH.

Since warden containers cannot be connected to on ports other than for
HTTP and cannot be SSHd to, neither of these works for me.



I tried installing a standalone JDK onto the warden container, however as
soon as I ran ‘jmap’ to invoke the dump, warden cleaned up the container –
most likely for memory over-consumption.



I had previously found a hack in the Weblogic buildpack (
https://github.com/pivotal-cf/weblogic-buildpack/blob/master/docs/container-wls-monitoring.md)
for modifying the start script which, when used with
–XX:HeapDumpOnOutOfMemoryError, should copy any heapdump files to a file
share somewhere. I have my own custom buildpack so I could use something
similar.

Has anyone got a better solution than this?



We would love to use newrelic / app dynamics for this however we’re not
allowed. And I’m not 100% certain they could help with this either.



Dave



The information transmitted is intended for the person or entity to which
it is addressed and may contain confidential, privileged or copyrighted
material. If you receive this in error, please contact the sender and
delete the material from any computer. Fidelity only gives information on
products and services and does not give investment advice to retail clients
based on individual circumstances. Any comments or statements made are not
necessarily those of Fidelity. All e-mails may be monitored. FIL
Investments International (Reg. No.1448245), FIL Investment Services (UK)
Limited (Reg. No. 2016555), FIL Pensions Management (Reg. No. 2015142) and
Financial Administration Services Limited (Reg. No. 1629709) are authorised
and regulated in the UK by the Financial Conduct Authority. FIL Life
Insurance Limited (Reg No. 3406905) is authorised in the UK by the
Prudential Regulation Authority and regulated in the UK by the Financial
Conduct Authority and the Prudential Regulation Authority. Registered
offices at Oakhill House, 130 Tonbridge Road, Hildenborough, Tonbridge,
Kent TN11 9DZ.

--
You received this message because you are subscribed to the Google Groups
"Cloud Foundry Developers" group.
To view this discussion on the web visit
https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/DFFA4ADB9F3BC34194429921AB329336408CAB04%40UKFIL7006WIN.intl.intlroot.fid-intl.com
<https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/DFFA4ADB9F3BC34194429921AB329336408CAB04%40UKFIL7006WIN.intl.intlroot.fid-intl.com?utm_medium=email&utm_source=footer>
.
To unsubscribe from this group and stop receiving emails from it, send an
email to vcap-dev+unsubscribe(a)cloudfoundry.org.









--

Regards,



Daniel Jones

EngineerBetter.com





--
Regards,

Daniel Jones
EngineerBetter.com



Re: [vcap-dev] Java OOM debugging

Daniel Mikusa
 

On Thu, May 14, 2015 at 2:59 PM, Lari Hotari <Lari(a)hotari.net> wrote:

On 15-05-14 10:23 AM, Daniel Jones wrote:
Thanks again for your input. Have you seen this problem with versions
of Tomcat before 8.0.20?
I don't have proper data gathered from older than 8.0.20, so I cannot
compare.
I was just wondering when 8.0.20 became available in the JBP; I found
this date:
HEAD https://download.run.pivotal.io/tomcat/tomcat-8.0.20.tar.gz | grep
Last-Modified
Last-Modified: Tue, 03 Mar 2015 11:35:19 GMT


David and I think we've narrowed down the issue to a change from using
Tomcat 8.0.18 to 8.0.21. We're running more tests and collaborating
with Pivotal support. We also noticed that non-prod versions of our
apps were taking longer to crash, so it would seem to be
activity-related at least.

Do you know how Tomcat's APR/NIO memory gets allocated? Is there a way
of telling from pmap whether pages are being used for NIO buffers or
by the APR?
I don't think you can get that info from pmap. The malloc_info XML output
shows better allocation stats, but only stats.
Is Tomcat using the APR library or NIO by default in Cloud Foundry? I'd
assume that NIO isn't used by default.
Sorry for the duplicate. Resending as my first reply didn't go back to the
list.

Since the Connector in server.xml does not specify an implementation, it
should use the NIO connector, which is the default in Tomcat 8. A quick
test on PWS confirmed this for me.


https://github.com/cloudfoundry/java-buildpack/blob/master/resources/tomcat/conf/server.xml#L22
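One quick way to confirm which connector actually started is to grep the
app's startup logs, since Tomcat logs the protocol handler name when it
boots. A sketch, assuming an app named `my-app` (placeholder) and the
default port:

```shell
# Sketch: check which connector Tomcat started for a deployed app.
# "my-app" is a placeholder name; adjust the port if yours differs.
cf logs my-app --recent | grep -i 'ProtocolHandler'
# NIO appears as ["http-nio-8080"], BIO as ["http-bio-8080"],
# and APR as ["http-apr-8080"].
```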

It would be interesting to see if the BIO or APR connectors have similar
issues. BIO would be easy to test: just add
`protocol="org.apache.coyote.http11.Http11Protocol"` to the Connector tag
on line #22.
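As a sketch, the modified Connector element might look like the following.
Only the `protocol` attribute is the actual change; the other attributes
here are illustrative placeholders, so keep whatever the buildpack's
server.xml already sets:

```xml
<!-- Sketch: forcing the blocking BIO connector in Tomcat 8.
     Only protocol is the real change; other attributes are placeholders
     and should match the buildpack's existing server.xml. -->
<Connector port="${http.port}"
           protocol="org.apache.coyote.http11.Http11Protocol"
           connectionTimeout="20000" />
```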

APR would be trickier as you'd need to compile the native library and pull
that into the environment.

Dan



Have you tried the "-Dsun.zip.disableMemoryMapping=true" JVM option to
rule out the possibility that zip/jar file access is causing the
trouble? There have been some bugs in that area of the JVM in the past:

http://javaeesupportpatterns.blogspot.com.es/2011/08/mmap-file-outofmemoryerror-and-pmap.html

That has been fixed
(http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6280693), but doing
a check with the "-Dsun.zip.disableMemoryMapping=true" JVM option would
still be interesting.
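On Cloud Foundry, one way to pass that flag is via the JAVA_OPTS
environment variable, which the Java buildpack picks up. A sketch,
assuming an app named `my-app` (placeholder):

```shell
# Sketch: pass the JVM flag through the Java buildpack's JAVA_OPTS.
# "my-app" is a placeholder app name.
cf set-env my-app JAVA_OPTS '-Dsun.zip.disableMemoryMapping=true'
# Restage so the buildpack applies the new environment.
cf restage my-app
```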
Mainly concerned about this commit:

https://github.com/apache/tomcat/commit/6e5420c67fbad81973d888ad3701a392fac4fc71

Since most commits weren't very interesting in this diff:
https://github.com/apache/tomcat/compare/075bc2d6...c0eb033f?w=1
It might make a difference to jar file access. I'm not saying that this
commit is the problem; it just seemed like a big change.

-Lari
