Re: openstack / CF v234 deployment

Amit Kumar Gupta
 

Hi William,

Yes, this looks like a mixup between domain and system_domain. Both should
be the same value, system.domain.com. The two separate properties exist
for historical reasons; we plan to reduce them back down to one to avoid
this confusion in the future.

Cheers,
Amit

On Fri, Apr 15, 2016 at 11:43 AM, Bean William R <BeanWilliamR(a)johndeere.com>
wrote:
We've ventured through
https://docs.cloudfoundry.org/deploying/openstack/index.html and have CF
deployed on OpenStack.

[root(a)cfinstaller my-bosh]# bosh deployments
...

+------------------+--------------+------------------------------------------------+--------------+
| Name             | Release(s)   | Stemcell(s)                                    | Cloud Config |
+------------------+--------------+------------------------------------------------+--------------+
| cloudfoundry-lab | cf/234+dev.1 | bosh-openstack-kvm-ubuntu-trusty-go_agent/3215 | none         |
+------------------+--------------+------------------------------------------------+--------------+

After a bosh deploy, we are able to log in with the admin credentials, and
cf push a staticfile app:


[root(a)cfinstaller billhello-project]# cf push billhello -m 64M
...
requested state: started
instances: 1/1
usage: 64M x 1 instances
urls: billhello.domain.com
last uploaded: Fri Apr 15 18:30:38 UTC 2016
stack: unknown
buildpack: staticfile 1.3.5

     state     since                    cpu    memory        disk         details
#0   running   2016-04-15 06:30:50 PM   0.0%   3.6M of 64M   5.4M of 1G


However we are not able to use the loggregator service from the cf-cli:

[root(a)cfinstaller billhello-project]# export CF_TRACE=true
[root(a)cfinstaller billhello-project]# cf logs billhello
...
WEBSOCKET REQUEST: [2016-04-15T18:35:22Z]
GET /tail/?app=d0a97a19-4794-4584-82ce-1b2fe596cf78 HTTP/1.1
Host: wss://loggregator.system.domain.com:4443
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: [HIDDEN]
Origin: http://localhost
Authorization: [PRIVATE DATA HIDDEN]


WEBSOCKET RESPONSE: [2016-04-15T18:35:22Z]
HTTP/1.1 404 Not Found
Content-Length: 91
Content-Type: text/plain; charset=utf-8
X-Cf-Routererror: unknown_route
X-Content-Type-Options: nosniff
X-Vcap-Request-Id: c8a05edd-a6b9-48d0-6f05-1c0aaa76f45f
Date: Fri, 15 Apr 2016 18:35:23 GMT

FAILED
Error dialing loggregator server: websocket: bad handshake.
Please ask your Cloud Foundry Operator to check the platform configuration
(loggregator endpoint is wss://loggregator.system.cflab.deere.com:4443).


We've already tried just restarting the services on the
loggregator_trafficcontroller_z1/0 instance, and logging out & back in with
the cf-cli... neither helps.

How does this wss://loggregator.system.domain.com:4443 get added to
nats? From the loggregator_trafficcontroller_z1/0 instance, the only
routes in /var/vcap/jobs/route_registrar/config/registrar_settings.yml are:

routes: [{"name":"doppler","port":8081,"registration_interval":"20s","uris":["doppler.domain.com"]},{"name":"loggregator","port":8080,"registration_interval":"20s","uris":["loggregator.domain.com"]}]

Should there be one for loggregator.system.domain.com? Is this just a
mismatch between system_domain and domain? Any troubleshooting tips?

Thanks,
William Bean


CF CLI v6.17.0 Released Today

Koper, Dies <diesk@...>
 

The CF CLI team just cut 6.17.0. Binaries and link to release notes are available at:

https://github.com/cloudfoundry/cli#downloads

The command reference guide on http://cli.cloudfoundry.org is being updated now.

Built with Golang 1.6.1 which addresses two security vulnerabilities

Golang 1.6.1 has just been released, addressing two vulnerabilities that could affect cf CLI users.
See https://groups.google.com/forum/#!topic/golang-nuts/9eqIHqaWvck for details.

TCP Routing

Various commands have been enhanced to support TCP routes for apps deployed to the Diego runtime.
This feature requires the target CF release to be v234 (CC API v2.53.0) or higher and Diego and the Routing API to be enabled.

App Instance Quotas

Quota related commands have been enhanced to expose app instance quotas.
This feature requires the target CF release to be v214 (CC API v2.33.0) or higher for org quotas and v221 (CC API v2.40.0) or higher for space quotas.
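As an illustration of the version requirements listed above, here is a minimal Python sketch that compares a target's CC API version against the stated minimums. The feature keys and function names are hypothetical labels chosen for this example; only the version numbers come from these release notes. It checks the API version only, not whether Diego or the Routing API is enabled.

```python
# Minimum Cloud Controller API versions quoted in the release notes above.
# The dictionary keys are hypothetical labels, not cf CLI identifiers.
MIN_CC_API = {
    "tcp_routing": (2, 53, 0),
    "org_app_instance_quota": (2, 33, 0),
    "space_app_instance_quota": (2, 40, 0),
}


def parse_version(text):
    """Turn '2.53.0' (or 'v2.53.0') into a comparable tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def supports(feature, cc_api_version):
    """Return True if the given CC API version meets the feature's minimum."""
    return parse_version(cc_api_version) >= MIN_CC_API[feature]
```

For example, supports("tcp_routing", "2.52.0") is False, while supports("org_app_instance_quota", "2.52.0") is True.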

Native build on Mac OS

Prevents a fatal runtime error on certain Mac OS versions and Anti-Virus/Security software. See #783<https://github.com/cloudfoundry/cli/issues/783>,#789<https://github.com/cloudfoundry/cli/issues/789>

New Commands

* cf router-groups lists the router groups available to your targeted Cloud Foundry. Once an admin creates a new shared domain associated with a TCP router group, developers may create TCP routes from this domain.
* cf version shows the cf CLI version. cf --version and cf -v continue to offer the same functionality but are omitted from cf help's GLOBAL OPTIONS section in favor of the new command.

Updated Commands

* create-shared-domain now accepts a router group to create a domain for (http://cli.cloudfoundry.org/en-US/cf/create-shared-domain.html)
* domains now displays the routing type of each domain
* create-route, map-route, unmap-route, delete-route and push now support an additional option to specify a TCP route's port number
* create-route and map-route now support an additional option to request a random port for a TCP route
* routes output now includes the port number and type of route
* create-space help output now doesn't incorrectly indicate there is a default space quota (#774<https://github.com/cloudfoundry/cli/issues/774>)
* create-space now correctly looks for the specified quota in the specified org (#775<https://github.com/cloudfoundry/cli/issues/775>)
* create-service-broker now has an alias, csb
* curl now defaults to performing a POST when the -d option is specified (#788<https://github.com/cloudfoundry/cli/issues/788>)
* curl no longer displays a message that you should be logged in
* buildpacks no longer tries to retrieve information from the CF endpoint after displaying you are not logged in
* map-route and unmap-route now map and unmap routes with paths correctly (#792<https://github.com/cloudfoundry/cli/issues/792>)
* app now reports the correct URL when using routes with paths (#809<https://github.com/cloudfoundry/cli/issues/809>)
* create-app-manifest now doesn't wrap an extra set of quotes around environment variables (#800<https://github.com/cloudfoundry/cli/issues/800>)
* copy-source usage in its help page now correctly reflects that specification of a space also requires specification of the space's org

Updated Global Options

* -h is now accepted also after the command to display help, e.g. cf push myapp -h
* -v when used with a command, e.g. cf apps -v, prints API request diagnostics. This makes enabling trace for a single command much easier, particularly on Windows
* --version and -v when used by themselves still display the CLI version, but are omitted from the cf help listing in favor of the new version command
* --build, -b still display the Golang version the cf CLI was built with, but are omitted from the cf help listing as they are not relevant to most users

Updated Plugins:

* cf willitconnect v1.1.0: https://github.com/gambtho/cf_will_it_connect_plugin
* Diego Enabler v1.1.0: https://github.com/cloudfoundry-incubator/Diego-Enabler
* Usage Report v1.3.0: https://github.com/krujos/usagereport-plugin

Enjoy!

Regards,
Dies Koper
Cloud Foundry CLI PM


openstack / CF v234 deployment

Bean William R
 

We've ventured through https://docs.cloudfoundry.org/deploying/openstack/index.html and have CF deployed on OpenStack.

[root(a)cfinstaller my-bosh]# bosh deployments
...
+------------------+--------------+------------------------------------------------+--------------+
| Name             | Release(s)   | Stemcell(s)                                    | Cloud Config |
+------------------+--------------+------------------------------------------------+--------------+
| cloudfoundry-lab | cf/234+dev.1 | bosh-openstack-kvm-ubuntu-trusty-go_agent/3215 | none         |
+------------------+--------------+------------------------------------------------+--------------+

After a bosh deploy, we are able to log in with the admin credentials, and cf push a staticfile app:


[root(a)cfinstaller billhello-project]# cf push billhello -m 64M
...
requested state: started
instances: 1/1
usage: 64M x 1 instances
urls: billhello.domain.com
last uploaded: Fri Apr 15 18:30:38 UTC 2016
stack: unknown
buildpack: staticfile 1.3.5

     state     since                    cpu    memory        disk         details
#0   running   2016-04-15 06:30:50 PM   0.0%   3.6M of 64M   5.4M of 1G


However we are not able to use the loggregator service from the cf-cli:

[root(a)cfinstaller billhello-project]# export CF_TRACE=true
[root(a)cfinstaller billhello-project]# cf logs billhello
...
WEBSOCKET REQUEST: [2016-04-15T18:35:22Z]
GET /tail/?app=d0a97a19-4794-4584-82ce-1b2fe596cf78 HTTP/1.1
Host: wss://loggregator.system.domain.com:4443
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: [HIDDEN]
Origin: http://localhost
Authorization: [PRIVATE DATA HIDDEN]


WEBSOCKET RESPONSE: [2016-04-15T18:35:22Z]
HTTP/1.1 404 Not Found
Content-Length: 91
Content-Type: text/plain; charset=utf-8
X-Cf-Routererror: unknown_route
X-Content-Type-Options: nosniff
X-Vcap-Request-Id: c8a05edd-a6b9-48d0-6f05-1c0aaa76f45f
Date: Fri, 15 Apr 2016 18:35:23 GMT

FAILED
Error dialing loggregator server: websocket: bad handshake.
Please ask your Cloud Foundry Operator to check the platform configuration (loggregator endpoint is wss://loggregator.system.cflab.deere.com:4443).


We've already tried just restarting the services on the loggregator_trafficcontroller_z1/0 instance, and logging out & back in with the cf-cli... neither helps.

How does this wss://loggregator.system.domain.com:4443 get added to nats? From the loggregator_trafficcontroller_z1/0 instance, the only routes in /var/vcap/jobs/route_registrar/config/registrar_settings.yml are:

routes: [{"name":"doppler","port":8081,"registration_interval":"20s","uris":["doppler.domain.com"]},{"name":"loggregator","port":8080,"registration_interval":"20s","uris":["loggregator.domain.com"]}]

Should there be one for loggregator.system.domain.com? Is this just a mismatch between system_domain and domain? Any troubleshooting tips?
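One way to spot this kind of mismatch is to compare the registered URIs against the domain the CLI is dialing. The following is a minimal Python sketch under that assumption; the function name is hypothetical, and the routes string is simply the value quoted above from registrar_settings.yml.

```python
import json


def routes_outside_domain(routes_json, system_domain):
    """Return the registered URIs that are not hosts under system_domain."""
    suffix = "." + system_domain
    mismatched = []
    for route in json.loads(routes_json):
        for uri in route["uris"]:
            host = uri.strip()
            if not host.endswith(suffix):
                mismatched.append(host)
    return mismatched


# The routes value from registrar_settings.yml, as quoted above.
ROUTES = (
    '[{"name":"doppler","port":8081,"registration_interval":"20s",'
    '"uris":["doppler.domain.com"]},'
    '{"name":"loggregator","port":8080,"registration_interval":"20s",'
    '"uris":["loggregator.domain.com"]}]'
)

# Lists every URI registered outside the domain the CLI is dialing.
print(routes_outside_domain(ROUTES, "system.domain.com"))
```

With the values above, both registered URIs are reported, which matches the 404 unknown_route response for loggregator.system.domain.com.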

Thanks,
William Bean


Re: Remarks about the “confab” wrapper for consul

Benjamin Gandon
 

As an update, it looks like I’m running into the Node health flapping <https://github.com/hashicorp/consul/issues/1212> issue that is more frequent with consul 0.5.x servers compared to 0.6.x servers.

→ Q1: Are you planning to upgrade the consul version used in CF and Diego from 0.5.2 to 0.6.4 in the near future?


Also, people recommend the following settings to mitigate the issue.
  "dns_config": {
    "allow_stale": true,
    "node_ttl": "5s",
    "service_ttl": {
      "*": "5s"
    }
  }
I’ll try those and keep you updated with the results next week. Unfortunately, I’ll have to fork the consul-release <https://github.com/cloudfoundry-incubator/consul-release> because those settings are also hardwired to their default <https://github.com/cloudfoundry-incubator/consul-release/blob/master/src/confab/config/consul_config_definer.go#L13-L35> in confab.
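To make the intent concrete, here is a minimal Python sketch of layering the dns_config settings quoted above over an already generated consul configuration. This is not how confab works today (the settings are hardwired there, as noted above); the merge helper and the "generated" dictionary are hypothetical stand-ins for illustration only.

```python
import json

# Mitigation settings quoted above for the consul health-flapping issue.
DNS_MITIGATION = {
    "dns_config": {
        "allow_stale": True,
        "node_ttl": "5s",
        "service_ttl": {"*": "5s"},
    }
}


def merge(base, overrides):
    """Recursively merge overrides into a copy of base; overrides win."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical stand-in for the configuration that confab generates.
generated = {"server": True, "dns_config": {"allow_stale": False}}

print(json.dumps(merge(generated, DNS_MITIGATION), indent=2, sort_keys=True))
```

The recursive merge keeps any generated keys that the overrides do not mention, so only the DNS settings change.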

→ Q2: Are you planning to update confab so that people can tweak their consul settings directly from the BOSH deployment?


Regarding my previous remark about properly configuring “skip_leave_on_interrupt” and “leave_on_terminate” in confab, I understand that the default value of “true” for “leave_on_terminate” might be necessary to properly scale down a consul cluster with BOSH.

But I saw today that skip_leave_on_interrupt will default to true <https://github.com/hashicorp/consul/blob/master/CHANGELOG.md> for consul servers in the upcoming version 0.7.0. Currently, this config is hard-wired to its default value of “false” in confab.

→ Q3: Are you planning to update this “skip_leave_on_interrupt” config in confab?


/Benjamin

On 14 Apr 2016 at 17:00, Benjamin Gandon <benjamin(a)gandon.org> wrote:

Thank you Amit for your answer.


I ran into the “all-consuls-go-crazy” situation again today, as I do almost every day. As soon as the servers start this membership flapping, the whole cf+diego deployment goes down.

When I restart the consul servers without first deleting the content of the persistent storage, they don’t manage to elect a leader:
https://gist.github.com/bgandon/08707466324be7c9a093a56fd95a64e4

After I delete /var/vcap/store/consul_agent on all 3 consul servers, a consul leader is properly elected, but the cluster quickly starts flapping again, with failure suspicions, missing acks, and timeouts:
https://gist.github.com/bgandon/cab53c22da66b24beff46389ba7f0bdc

At that point, the load of the bosh-lite VM goes up to 280+ and everything becomes very unresponsive.

How can I bring the consul cluster back into a healthy state? I don’t want to reboot the bosh-lite VM and recreate all deployments with cloudchecks anymore.


/Benjamin


On 11 Apr 2016 at 22:40, Amit Gupta <agupta(a)pivotal.io> wrote:

Orchestrating a raft cluster in a way that requires no manual intervention is incredibly difficult. We write the PID file late for a specific reason:

https://www.pivotaltracker.com/story/show/112018069

For dealing with wedged states like the one you encountered, we have some recommendations in the documentation:

https://github.com/cloudfoundry-incubator/consul-release/#disaster-recovery

We have acceptance tests we run in CI that exercise rolling a 3 node cluster, so if you hit a failure it would be useful to get logs if you have any.

Cheers,
Amit

On Mon, Apr 11, 2016 at 9:38 AM, Benjamin Gandon <benjamin(a)gandon.org> wrote:
Actually, doing some further tests, I realize a mere 'join' is definitely not enough.

Instead, you need to restore the raft/peers.json on each one of the 3 consul server nodes:

monit stop consul_agent
echo '["10.244.0.58:8300","10.244.2.54:8300","10.244.0.54:8300"]' > /var/vcap/store/consul_agent/raft/peers.json

And make sure you start them at nearly the same time with “monit start consul_agent”.
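To avoid hand-typing the array, the peers.json content can be generated. A minimal Python sketch, assuming the three server IPs and port 8300 from the command above; the function and constant names are made up for this example.

```python
import json

RAFT_PORT = 8300  # port used in the addresses quoted above


def peers_json(server_ips, port=RAFT_PORT):
    """Render the JSON array to write into raft/peers.json."""
    return json.dumps([f"{ip}:{port}" for ip in server_ips])


# IPs of the three consul servers from the message above.
SERVERS = ["10.244.0.58", "10.244.2.54", "10.244.0.54"]

print(peers_json(SERVERS))
```

The printed line is what would be redirected into /var/vcap/store/consul_agent/raft/peers.json on each server while consul_agent is stopped.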

So this argues strongly for setting skip_leave_on_interrupt=true and leave_on_terminate=false in confab, because losing the peers.json is really something we don't want in our CF deployments!

/Benjamin


On 11 Apr 2016 at 18:15, Benjamin Gandon <benjamin(a)gandon.org> wrote:

Hi cf devs,


I’m running a CF deployment with redundancy, and I just experienced my consul servers not being able to elect any leader.
That’s a VERY frustrating situation that keeps the whole CF deployment down, until you get a deeper understanding of consul, and figure out they just need a silly manual 'join' so that they get back together.

But that was definitely not easy to nail down because at first look, I could just see monit restarting the “agent_ctl” every 60 seconds because confab was not writing the damn PID file.


More specifically, the 3 consul servers (i.e. consul_z1/0, consul_z1/1 and consul_z2/0) had properly left one another upon a graceful shutdown. This state was persisted in /var/vcap/store/raft/peers.json being “null” on each one of them, so they would not get back together on restart. A manual 'join' was necessary. But it took me hours to get there because I’m no expert with consul.

And until the 'join' is made, VerifySynced() was negative in confab, and monit was constantly starting and stopping it every 60 seconds. But once you step back, you realize confab was actually waiting for the new leader to be elected before it writes the PID file. Which is questionable.

So, I’m asking 3 questions here:

1. Does writing the PID file in confab that late really make sense?
2. Could someone please write some minimal documentation about confab, at least to tell what it is supposed to do?
3. Wouldn’t it be wiser for the cluster to become unhealthy whenever any of the consul servers is missing?

With this 3rd question, I mean that even on a graceful TERM or INT, no consul server should perform a graceful 'leave'. With this different approach, they would properly come back up even after a complete graceful restart of the cluster.

This can be done with those extra configs from the “confab” wrapper:

{
"skip_leave_on_interrupt": true,
"leave_on_terminate": false
}

What do you guys think of it?


/Benjamin


Changes to cf-test-helpers

David Sabeti
 

Hi all,

tl;dr We've introduced a breaking change to cf-test-helpers (https://github.com/cloudfoundry-incubator/cf-test-helpers), so be careful if you decide to update to the newest version in your test suite.

The Release Integration team is making changes to cf-test-helpers, with a few different goals. One is to ensure that cf-test-helpers stop leaking credentials. Another is to push assertions from the test helpers up into the tests themselves. As a result, we've had to re-design a few parts of cf-test-helpers, specifically the `cmdRunner`. The class included a good deal of logic around running sub-processes (like the cf cli), including retries and making assertions on process output. In addition to occasionally leaking credentials, this logic was difficult to work with and should probably have been implemented in test cases rather than a helper package. So, we've removed the `cmdRunner` entirely. If your tests use `cmdRunner` (with the constructor `NewCmdRunner`), you'll need to modify your tests when you upgrade cf-test-helpers.

It's likely that this change will not affect anybody too seriously, since most people are using the package function `Run()` to shell out to sub-processes, and that interface has not changed at all. Still, some of you may be using the `cmdRunner` and deserve a heads-up about the change.

If you'd like to update your dependency on cf-test-helpers, and you're using the `cmdRunner` in your tests, please feel free to reach out to the Release Integration team for help in migrating your test code.

David && Dennis
CF Release Integration


Re: Scope Error Insufficient scope for user

Filip Hanik
 

ClientAuthenticationFailure ('Bad credentials'): principal=admin_ui_client

Your password for client admin_ui_client is incorrect.

On Thu, Apr 14, 2016 at 1:32 PM, V Kumar <vikramvilli(a)gmail.com> wrote:

I recently started using the Cloud Foundry admin_ui. When I log in, I get
"Scope Error Insufficient scope for user". I followed all the steps in
https://github.com/cloudfoundry-incubator/admin-ui/blob/534fd698ff504c286531022110b6205cf91cd029/README.md#running-with-bosh-lite-cloudfoundry
I am giving the correct user name and password when logging in. I checked
uaa.log and found that the password does not match. I even tried creating
a new user, but I get the same issue. Please help me with this.

uaa.log
DEBUG --- JdbcTemplate: Executing prepared SQL statement [select
client_id, client_secret, resource_ids, scope, authorized_grant_types,
web_server_redirect_uri, authorities, access_token_validity,
refresh_token_validity, additional_information, autoapprove from
oauth_client_details where client_id = ?]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... DEBUG ---
DaoAuthenticationProvider: Authentication failed: password does not match
stored value
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... INFO ---
Audit: PrincipalAuthenticationFailure ('null'): principal=admin_ui_client,
origin=[10.22.0.82]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... INFO ---
Audit: ClientAuthenticationFailure ('Bad credentials'):
principal=admin_ui_client, origin=[remoteAddress=10.22.0.82,
clientId=admin_ui_client]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... DEBUG ---
BasicAuthenticationFilter: Authentication request for failed:
org.springframework.security.authentication.BadCredentialsException: Bad
credentials


Scope Error Insufficient scope for user

V Kumar
 

I recently started using the Cloud Foundry admin_ui. When I log in, I get "Scope Error Insufficient scope for user". I followed all the steps in https://github.com/cloudfoundry-incubator/admin-ui/blob/534fd698ff504c286531022110b6205cf91cd029/README.md#running-with-bosh-lite-cloudfoundry
I am giving the correct user name and password when logging in. I checked uaa.log and found that the password does not match. I even tried creating a new user, but I get the same issue. Please help me with this.

uaa.log
DEBUG --- JdbcTemplate: Executing prepared SQL statement [select client_id, client_secret, resource_ids, scope, authorized_grant_types, web_server_redirect_uri, authorities, access_token_validity, refresh_token_validity, additional_information, autoapprove from oauth_client_details where client_id = ?]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... DEBUG --- DaoAuthenticationProvider: Authentication failed: password does not match stored value
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... INFO --- Audit: PrincipalAuthenticationFailure ('null'): principal=admin_ui_client, origin=[10.22.0.82]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... INFO --- Audit: ClientAuthenticationFailure ('Bad credentials'): principal=admin_ui_client, origin=[remoteAddress=10.22.0.82, clientId=admin_ui_client]
[2016-04-14 07:26:53.491] uaa - 5784 [http-bio-8080-exec-7] .... DEBUG --- BasicAuthenticationFilter: Authentication request for failed: org.springframework.security.authentication.BadCredentialsException: Bad credentials


Re: AWS / CF v233 deployment

George Dean
 

Hi Sylvain,

Good to hear, let us know if you have any other problems.


Re: How can we customized "404 Not Found"

Mike Youngstrom <youngm@...>
 

We passed the smoke tests by:

* Only returning a 503 if the requested route exists.
* Embedding the old 404 page text in a comment in the returned HTML.
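The two points above could be sketched roughly as follows, in Python. This is an illustration, not the actual implementation: how the app learns whether a route exists is left out, the 404 fallback for unknown routes is an assumption, and OLD_404_TEXT is a placeholder for whatever text the platform's default 404 page contains.

```python
# Placeholder for the platform's default 404 page text (assumed wording).
OLD_404_TEXT = "404 Not Found: Requested route does not exist."


def error_response(route_exists):
    """Return (status, html) for a request that reached the catch-all app."""
    if route_exists:
        status = "503 Service Unavailable"
        heading = "Service temporarily unavailable"
    else:
        status = "404 Not Found"
        heading = "Page not found"
    # The old text sits in an HTML comment, so checks that look for it in
    # the body can still find it while browsers do not display it.
    html = (
        "<html><body>\n"
        f"<!-- {OLD_404_TEXT} -->\n"
        f"<h1>{heading}</h1>\n"
        "</body></html>\n"
    )
    return status, html
```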

Mike

On Thu, Apr 14, 2016 at 9:56 AM, Stefan Mayr <stefan(a)mayr-stefan.de> wrote:

Hi,

On 14.04.2016 at 13:23, James Leavers wrote:

The end of this presentation [1] from the CF Summit has an example of
creating a wildcard route to route 404s to an app, using cf map-route.

[1]

http://berlin2015.cfsummit.com/sites/berlin2015.cfsummit.com/files//pages/files/summit-berlin.pdf

On 14 April 2016 at 11:20:17, Stanley Shen (meteorping(a)gmail.com
<mailto:meteorping(a)gmail.com>) wrote:

Thanks Mike.
Could you show me how I can do things like that, or if there is some
documentation about it?

Are there other ways to do that?

Thanks,
Stanley
This week we did basically the same
- map application with a wildcard domain
- change the error page
- change the status code to 503 Service Unavailable (search engines should
not remove us from their index because of a 404)

We used a staticfile application with a modified nginx.conf to achieve
this. Michal Kuratczyk put a simplified prototype into his github repo (
https://github.com/mkuratczyk/maintenance)

Warning: this broke the smoke tests for Pivotal Cloud Foundry 1.6

Regards,

Stefan Mayr


Re: How can we customized "404 Not Found"

Stefan Mayr
 

Hi,

On 14.04.2016 at 13:23, James Leavers wrote:
The end of this presentation [1] from the CF Summit has an example of
creating a wildcard route to route 404s to an app, using cf map-route.

[1]
http://berlin2015.cfsummit.com/sites/berlin2015.cfsummit.com/files//pages/files/summit-berlin.pdf

On 14 April 2016 at 11:20:17, Stanley Shen (meteorping(a)gmail.com
<mailto:meteorping(a)gmail.com>) wrote:

Thanks Mike.
Could you show me how I can do things like that, or if there is some
documentation about it?

Are there other ways to do that?

Thanks,
Stanley
This week we did basically the same
- map application with a wildcard domain
- change the error page
- change the status code to 503 Service Unavailable (search engines
should not remove us from their index because of a 404)

We used a staticfile application with a modified nginx.conf to achieve
this. Michal Kuratczyk put a simplified prototype into his github repo
(https://github.com/mkuratczyk/maintenance)

Warning: this broke the smoke tests for Pivotal Cloud Foundry 1.6

Regards,

Stefan Mayr


Re: Remarks about the “confab” wrapper for consul

Benjamin Gandon
 

Thank you Amit for your answer.


I ran into the “all-consuls-go-crazy” situation again today, as I do almost every day. As soon as the servers start this membership flapping, the whole cf+diego deployment goes down.

When I restart the consul servers without first deleting the content of the persistent storage, they don’t manage to elect a leader:
https://gist.github.com/bgandon/08707466324be7c9a093a56fd95a64e4

After I delete /var/vcap/store/consul_agent on all 3 consul servers, a consul leader is properly elected, but the cluster quickly starts flapping again, with failure suspicions, missing acks, and timeouts:
https://gist.github.com/bgandon/cab53c22da66b24beff46389ba7f0bdc

At that point, the load of the bosh-lite VM goes up to 280+ and everything becomes very unresponsive.

How can I bring the consul cluster back into a healthy state? I don’t want to reboot the bosh-lite VM and recreate all deployments with cloudchecks anymore.


/Benjamin

On 11 Apr 2016 at 22:40, Amit Gupta <agupta(a)pivotal.io> wrote:

Orchestrating a raft cluster in a way that requires no manual intervention is incredibly difficult. We write the PID file late for a specific reason:

https://www.pivotaltracker.com/story/show/112018069

For dealing with wedged states like the one you encountered, we have some recommendations in the documentation:

https://github.com/cloudfoundry-incubator/consul-release/#disaster-recovery

We have acceptance tests we run in CI that exercise rolling a 3 node cluster, so if you hit a failure it would be useful to get logs if you have any.

Cheers,
Amit

On Mon, Apr 11, 2016 at 9:38 AM, Benjamin Gandon <benjamin(a)gandon.org> wrote:
Actually, doing some further tests, I realize a mere 'join' is definitely not enough.

Instead, you need to restore the raft/peers.json on each one of the 3 consul server nodes:

monit stop consul_agent
echo '["10.244.0.58:8300","10.244.2.54:8300","10.244.0.54:8300"]' > /var/vcap/store/consul_agent/raft/peers.json

And make sure you start them at nearly the same time with “monit start consul_agent”.

So this argues strongly for setting skip_leave_on_interrupt=true and leave_on_terminate=false in confab, because losing the peers.json is really something we don't want in our CF deployments!

/Benjamin


On 11 Apr 2016 at 18:15, Benjamin Gandon <benjamin(a)gandon.org> wrote:

Hi cf devs,


I’m running a CF deployment with redundancy, and I just experienced my consul servers not being able to elect any leader.
That’s a VERY frustrating situation that keeps the whole CF deployment down, until you get a deeper understanding of consul, and figure out they just need a silly manual 'join' so that they get back together.

But that was definitely not easy to nail down because at first look, I could just see monit restarting the “agent_ctl” every 60 seconds because confab was not writing the damn PID file.


More specifically, the 3 consul servers (i.e. consul_z1/0, consul_z1/1 and consul_z2/0) had properly left one another upon a graceful shutdown. This state was persisted in /var/vcap/store/raft/peers.json being “null” on each one of them, so they would not get back together on restart. A manual 'join' was necessary. But it took me hours to get there because I’m no expert with consul.

And until the 'join' is made, VerifySynced() was negative in confab, and monit was constantly starting and stopping it every 60 seconds. But once you step back, you realize confab was actually waiting for the new leader to be elected before it writes the PID file. Which is questionable.

So, I’m asking 3 questions here:

1. Does writing the PID file in confab that late really make sense?
2. Could someone please write some minimal documentation about confab, at least to tell what it is supposed to do?
3. Wouldn’t it be wiser for the cluster to become unhealthy whenever any of the consul servers is missing?

With this 3rd question, I mean that even on a graceful TERM or INT, no consul server should perform a graceful 'leave'. With this different approach, they would properly come back up even after a complete graceful restart of the cluster.

This can be done with those extra configs from the “confab” wrapper:

{
"skip_leave_on_interrupt": true,
"leave_on_terminate": false
}

What do you guys think of it?


/Benjamin


Re: CF Auto-scaling with an external application

Daniel Mikusa
 

On Fri, Apr 8, 2016 at 2:12 PM, Giovanni Napoli <gio.napoli2(a)gmail.com>
wrote:

@Daniel Mikusa

Thank you for your support. I have a few more questions and hope that you
can help me.
If I use the library you linked, I'd like some suggestions about the task
I have to solve:

- Is there a way to collect the data I need in a struct that I could use
to send "the scale command"? I mean, it would be great to have a struct
with fields like "AppName", "CpuUse", "MemoryUse", etc., so I could just
check the App.CpuUse field, for instance, and send a "cf scale" command to
CF to solve resource problems.

- I found this client library for CF; do you think it could help my work?
https://github.com/cloudfoundry/cf-java-client/ I'm asking because I'd
prefer to use Java, by the way. Do you know if there is a way to get a
".jar" of this repo so I can use it as simply as possible? I also found
http://www.ibm.com/developerworks/cloud/library/cl-bluemix-cloudappswithjava/index.html
and a "cloudfoundry-client-lib.jar" online, but I don't know if it would
be good for my problem.

There are two parts to this problem:

1.) You need to get metrics from the platform. I don't think (although
this could have changed, since it's being rewritten at the moment) that
cf-java-client supports this. I know it will allow you to stream logs, but
I'm not sure if you can get metrics as well. If Ben Hale sees this,
perhaps he can comment and confirm.

2.) Once you get metrics, you'd have to manage them and when appropriate
initiate a request to scale the app. You can definitely use the
cf-java-client for this. You can also use it to query information if for
example you have an app guid and need the app name.
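As a rough illustration of the second part, here is a minimal sketch of the struct-and-decide step, written in Python for brevity (the same shape carries over to Java). The AppMetrics name, the thresholds and the instance limits are hypothetical; collecting the metrics is outside this sketch. The only real interface used is the cf CLI form "cf scale APP -i N", and the command is only built here, not executed.

```python
from dataclasses import dataclass


@dataclass
class AppMetrics:
    """Snapshot of one app; the fields follow the question above."""
    app_name: str
    cpu_use: float      # average CPU across instances, in percent
    memory_use: float   # average memory across instances, in percent of quota
    instances: int


# Hypothetical thresholds and limits; tune them for the workload.
SCALE_UP_AT = 80.0
SCALE_DOWN_AT = 20.0
MIN_INSTANCES = 1
MAX_INSTANCES = 10


def desired_instances(m):
    """Add an instance under pressure, remove one when idle, else hold."""
    busiest = max(m.cpu_use, m.memory_use)
    if busiest >= SCALE_UP_AT:
        return min(m.instances + 1, MAX_INSTANCES)
    if busiest <= SCALE_DOWN_AT:
        return max(m.instances - 1, MIN_INSTANCES)
    return m.instances


def scale_command(m):
    """Return the cf CLI argument list to run, or None if nothing changes."""
    target = desired_instances(m)
    if target == m.instances:
        return None
    return ["cf", "scale", m.app_name, "-i", str(target)]
```

Stepping by one instance at a time, with a gap between the two thresholds, is a simple way to avoid scaling up and down on every sample.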

Hope that helps!

Thanks,

Dan


Re: Static IP setup for routers on AWS

Daniel Mikusa
 

On Fri, Apr 8, 2016 at 7:04 AM, Engelke, Johannes <info(a)johannes-engelke.de>
wrote:

Hi Amit,
thanks for your answer. I deployed cloud foundry without using static
IP’s. It is working well.

As far as I understood the uaa config the entire 10.x.x.x network is
allowed to access the UAA Servers anyway, so there is no reason to place
the dedicated static IP's of the routers into the config.
Are you referring to the RemoteIpValve that is configured for UAA?

https://github.com/cloudfoundry/uaa-release/blob/develop/jobs/uaa/templates/tomcat.server.xml.erb#L70-L73

The RemoteIpValve doesn't restrict access to Tomcat / UAA. It controls
how (and if) Tomcat handles the x-forwarded-* headers. In short, it will
only process those headers if it "trusts" them (by trust, it really
means if the regex matches).

My understanding is that the UAA job will take the gorouter IP's and
prepend them to the front of this regex so that it will always match at
least the IP's for the gorouter. If you're using private IP's, it's not
really necessary as the default regex used by Tomcat will match all private
IP's.

If you're using public IP's for some reason, you'd need to configure this
or UAA might not detect the incoming connections as HTTPS and it would very
likely detect the wrong remote IP address (necessary for audit records in
the logs).
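
To make the "trust by regex" idea concrete, here is a small Python sketch of
the behavior described above. It is not Tomcat's or UAA's code: the private-IP
pattern is a simplified approximation of the kind of default Tomcat uses, the
function names are hypothetical, and falling back to "http" for untrusted
peers is an assumption made for the example.

```python
import re

# Simplified pattern for private / loopback / link-local IPv4 ranges
# (an approximation for illustration, not copied from Tomcat).
PRIVATE_IP_PATTERN = (
    r"10\.\d{1,3}\.\d{1,3}\.\d{1,3}"
    r"|192\.168\.\d{1,3}\.\d{1,3}"
    r"|169\.254\.\d{1,3}\.\d{1,3}"
    r"|127\.\d{1,3}\.\d{1,3}\.\d{1,3}"
    r"|172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}"
)


def build_trusted_pattern(router_ips):
    """Prepend the (escaped) router IPs to the default private-IP pattern."""
    escaped = [re.escape(ip) for ip in router_ips]
    return re.compile("|".join(escaped + [PRIVATE_IP_PATTERN]))


def effective_request(peer_ip, headers, trusted):
    """Return (remote_ip, scheme) as a header-aware server might see them.

    Headers are only honored when the directly connected peer is trusted.
    """
    if trusted.fullmatch(peer_ip) is None:
        # Untrusted peer: ignore x-forwarded-* entirely.
        return peer_ip, "http"
    hops = [h.strip() for h in headers.get("x-forwarded-for", "").split(",")
            if h.strip()]
    remote_ip = peer_ip
    # Walk from the closest hop outwards, skipping trusted proxies.
    for hop in reversed(hops):
        remote_ip = hop
        if trusted.fullmatch(hop) is None:
            break
    return remote_ip, headers.get("x-forwarded-proto", "http")


if __name__ == "__main__":
    headers = {"x-forwarded-for": "203.0.113.9", "x-forwarded-proto": "https"}
    # Router on a public IP that is NOT in the trusted pattern:
    print(effective_request("52.1.2.3", headers, build_trusted_pattern([])))
    # Same router once its IP has been prepended to the pattern:
    print(effective_request("52.1.2.3", headers,
                            build_trusted_pattern(["52.1.2.3"])))
```

The first call shows the failure mode mentioned above: the request looks like
plain HTTP coming from the router itself, which is what would end up in the
audit records.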


Do you see any security improvements, if only routers are allowed to
access the UAA?

As long as we're talking about RemoteIpValve (sorry if I'm not following
the conversation completely; I jumped in a little late) and you're using
private IP addresses for your VMs, then I don't see any difference in
behavior.

If you have public IP's assigned to your gorouter VMs then you may see some
issues with how the x-forwarded-for and x-forwarded-proto headers are
processed, which in turn could affect the accuracy of the audit messages in
the logs.

Hope that helps!

Dan



On 08 Apr 2016, at 02:19, Amit Gupta <agupta(a)pivotal.io> wrote:

The UAA needs to know the router IPs to know which IPs to accept inbound
requests from. If you don't care about this, you can try configuring UAA
to allow requests from many IPs, and remove the static IPs from gorouter.
I would be interested to find out the result of this experiment should you
try it out.

Best,
Amit

On Thu, Apr 7, 2016 at 6:28 AM, Engelke, Johannes <
info(a)johannes-engelke.de> wrote:

Hi,
does anybody know why the routers have static IPs in the
cf-infrastructure-aws.yml file?
https://github.com/cloudfoundry/cf-release/blob/master/templates/cf-infrastructure-aws.yml#L173

BOSH assigns the instances to ELBs at deploy time, so there
should be no need to have static addresses here.

If nobody knows a good reason, should we remove them? ;-)

Cheers
Johannes


Re: cf push working

Bharath
 

https://docs.pivotal.io/pivotalcf/concepts/how-applications-are-staged.html

You can see the HTTP requests the CLI makes during a push by exporting the
environment variable CF_TRACE=true

bharath

On Wed, Apr 13, 2016 at 12:21 PM, Ankur Srivastava <ankursri1(a)gmail.com>
wrote:

Hi,
What does cf push do in the background? What steps does cf push
perform?

Regards,
Ankur


Removing 1.4 support from the Go Buildpack

Danny Rosen
 

Hello!

As the [official Go release policy](https://golang.org/doc/devel/release.html)
has deprecated support for Go 1.4, the Buildpacks team would like to propose
the removal of Go 1.4 support from the Go buildpack as well.

If you have any questions or concerns regarding this change we would like
to hear from you! Please respond to this thread or this issue [1] no later
than 4/25, when we plan to schedule work to make the change.

[1] - https://github.com/cloudfoundry/go-buildpack/issues/36


Re: How can we customized "404 Not Found"

James Leavers
 

The end of this presentation [1] from the CF Summit has an example of creating a wildcard route to route 404s to an app, using cf map-route.

[1] http://berlin2015.cfsummit.com/sites/berlin2015.cfsummit.com/files//pages/files/summit-berlin.pdf

On 14 April 2016 at 11:20:17, Stanley Shen (meteorping(a)gmail.com) wrote:

Thanks Mike.
Could you show me how I can do something like that, or whether there is any documentation about it?

Are there other ways to do that?

Thanks,
Stanley


Re: AWS / CF v233 deployment

Sylvain Gibier
 

Hi,

I ended up regenerating the whole stack using v234 instead and it worked,
using the same consul certificates.

Sylvain

On Wed, Apr 13, 2016 at 7:54 PM, Christian Ang <cang(a)pivotal.io> wrote:

Hi Sylvain,

It looks like your problem might be that one or more of the consul
certificates in your cf manifest is not a valid PEM encoded certificate, or
the certificates are missing entirely. Do the consul properties in your cf
manifest look approximately like this (with your own certificates and keys):


https://github.com/cloudfoundry-incubator/consul-release/blob/master/manifests/aws/multi-az-ssl.yml#L122-L261

Also, if you decode your certificates by running `openssl x509 -in
server-ca.crt -text -noout`, do they appear to be valid?

If they are invalid you can try regenerating them using
`scripts/generate-consul-certs` and copying each file's contents into the
appropriate place in your cf manifest's consul properties.
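
If you want a quick pre-check before reaching for openssl, the following
Python sketch tests whether a string is at least PEM-framed and decodes to
something that starts like a DER structure. It uses only the standard
library's `ssl.PEM_cert_to_DER_cert`; the helper name is hypothetical, and
passing this check does not prove the certificate is valid (it does no X.509
parsing, expiry or chain verification), so the openssl command above is still
the real test.

```python
import ssl


def looks_like_pem_certificate(text):
    """Cheap sanity check: PEM header/footer present and base64 decodes.

    Returns False for missing or garbled certificates. A True result only
    means the framing is plausible, not that the certificate is valid.
    """
    try:
        # strip() matters: values pasted into YAML often carry whitespace.
        der = ssl.PEM_cert_to_DER_cert(text.strip())
    except ValueError:
        return False
    # A DER-encoded certificate is an ASN.1 SEQUENCE, which starts with 0x30.
    return len(der) > 0 and der[0] == 0x30


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path) as handle:
            ok = looks_like_pem_certificate(handle.read())
        print(f"{path}: {'plausible PEM' if ok else 'NOT a PEM certificate'}")
```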

Thanks,
Christian and George


Re: How can we customized "404 Not Found"

Stanley Shen <meteorping@...>
 

Thanks Mike.
Could you show me how I can do something like that, or whether there is any documentation about it?

Are there other ways to do that?

Thanks,
Stanley


Re: Doppler/Firehose - Multiline Log Entry

Mike Youngstrom <youngm@...>
 

I'm an idiot. I see what you and Eric are saying now. Put the code in
Dropsonde then let the Executor simply initialize Dropsonde that way.
Works for me.

Thanks,
Mike

On Wed, Apr 13, 2016 at 5:26 PM, Jim CF Campbell <jcampbell(a)pivotal.io>
wrote:

My last 2 cents. It'll be configurable so will only be active in users of
dropsonde that want the functionality such as the Executor.

On Wed, Apr 13, 2016 at 5:21 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

You may want to reference the issue I created on executor.

In that issue I note that I don't think dropsonde is the right place to
do this token replacement because dropsonde doesn't know that the event
originally came through the limited stdout/stderr interface that needs this
functionality. However, executor does. If I'm using the dropsonde API
directly where I can safely put new line characters I don't want dropsonde
looking to replace a character I don't want replaced especially since that
character replacement isn't even needed when using a more rich interface
like dropsonde directly.

That's my 2 cents.

Thanks,
Mike

On Wed, Apr 13, 2016 at 4:34 PM, Jim CF Campbell <jcampbell(a)pivotal.io>
wrote:

We're going to look into it
<https://www.pivotaltracker.com/story/show/117583365>.

On Wed, Apr 13, 2016 at 12:33 PM, Eric Malm <emalm(a)pivotal.io> wrote:

Thanks, Mike. If source-side processing is the right place to do
that \u2028-to-newline substitution, I think that there could also be a
config option on the dropsonde library to have its LogSender perform that
within each message before forwarding it on. The local metron-agent could
also do that processing. I think it's appropriate to push as much of that
log processing as possible to the Loggregator components and libraries:
it's already a bit much that the executor knows anything at all about the
content of the byte-streams that it receives from the stdout and stderr of
a process in the container, so that it can break those streams into the
log-lines that the dropsonde library expects.
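
For readers unfamiliar with what "breaking a byte stream into log lines"
involves, here is a minimal Python sketch of an incremental line splitter. It
is purely illustrative and is not the executor's or dropsonde's actual code;
the class and method names are made up. It shows why a multi-line stack trace
written to stdout ends up as several separate log events: every \n ends one.

```python
class LineSplitter:
    """Turn arbitrary chunks of a byte stream into complete lines."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add a chunk and return every line completed by it."""
        self._buffer += chunk
        # Everything before the last b"\n" is complete; the tail is kept
        # until more data (or flush) arrives.
        *complete, self._buffer = self._buffer.split(b"\n")
        return complete

    def flush(self):
        """Return any trailing partial line, e.g. when the process exits."""
        rest, self._buffer = self._buffer, b""
        return [rest] if rest else []


if __name__ == "__main__":
    splitter = LineSplitter()
    for chunk in (b"Exception: boom\n\tat Foo", b".bar()\nnext"):
        for line in splitter.feed(chunk):
            print("log event:", line)
    for line in splitter.flush():
        print("log event:", line)
```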

Best,
Eric

On Wed, Apr 13, 2016 at 11:00 AM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Thanks for the insight Jim. I still think that the Executor is the
place to fix this since multi line logging isn't a Loggregator limitation
it is a log inject limitation which is owned by the Executor. I'll open an
issue with Diego and see how it goes.

Thanks,
Mike

On Tue, Apr 12, 2016 at 2:51 PM, Jim CF Campbell <jcampbell(a)pivotal.io>
wrote:

That strategy is going to be hard to sell. Diego's Executor takes the
log lines out of Garden and drops them into dropsonde messages. I doubt
they'll think it's a good idea to implement substitution in that
processing. You can certainly ask Eric - he's very aware of the underlying
problem.

After that point, the Loggregator system does its best to touch
messages as little as possible, and to improve performance and reliability,
we have been thinking about future changes that will lower the amount of
touching even further. The next place that log message processing can be done
is either in a nozzle, or the ingester of a log aggregator.

I'd vote for those downstream places - a single configuration and
algorithm instead of distributed across runner VMs.

On Tue, Apr 12, 2016 at 2:15 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I was thinking whoever demarcates and submits the original event to
loggregator. dea_logging_agent and the equivalent in Diego. Doing it at
that point could provide a bit more flexibility. I know this isn't
necessarily the loggregator team's code but I think loggregator team buy
off would be important for those projects to accept such a PR.

Unless you can think of a better place to make that transformation
within the loggregator processing chain?

Mike

On Tue, Apr 12, 2016 at 2:02 PM, Jim CF Campbell <
jcampbell(a)pivotal.io> wrote:

what exactly do you mean by "event creation time"?

On Tue, Apr 12, 2016 at 1:57 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Before I submit the CLI issue let me ask one more question.

Would it be better to replace the newline token with \n at event
creation time instead of asking the cli, splunk, anyone listening on the
firehose, etc. to do so?

The obvious downside is this would probably need to be a global
configuration. However, I know my organization wouldn't have a problem
swapping \u2028 with \n for a deployment. The feature would obviously be
off by default.

Thoughts?

Mike

On Tue, Apr 12, 2016 at 11:24 AM, Mike Youngstrom <
youngm(a)gmail.com> wrote:

Sounds good. I'll submit an issue to start the discussion. I
imagine the first question Dies will ask though is if you would support
something like that. :)

Mike

On Tue, Apr 12, 2016 at 11:12 AM, Jim CF Campbell <
jcampbell(a)pivotal.io> wrote:

cf logs
<https://github.com/cloudfoundry/cli/blob/40eb5be48eaac697c3700d5f1e6f654bae471cec/cf/commands/application/logs.go>
is actually maintained by the CLI team under Dies
<https://www.pivotaltracker.com/n/projects/892938>. You can
talk to them. I'll certainly support you by helping explain the need. I'd
think we want a general solution (token in ENV for instance).



On Tue, Apr 12, 2016 at 11:02 AM, Mike Youngstrom <
youngm(a)gmail.com> wrote:

Jim,

If I submitted a CLI PR to change the cf logs command to
substitute \u2028 with \n could the loggregator team get behind that?

Mike

On Tue, Apr 12, 2016 at 10:20 AM, Jim CF Campbell <
jcampbell(a)pivotal.io> wrote:

Mike,

When you get a bit more desperate ;-) here is a nozzle plugin
<https://github.com/jtuchscherer/nozzle-plugin> for the CLI.
It attaches to the firehose to display everything, but would be easy to
modify to just look at a single app, and sub out the magic token for
newlines.

Jim

On Tue, Apr 12, 2016 at 9:56 AM, Mike Youngstrom <
youngm(a)gmail.com> wrote:

Hi David,

The problem for me is that I'm searching for a solution that
works for development (though that's less of a priority because you can switch
config between dev and CF) and for viewing logs via "cf logs" in addition
to a log aggregator. I had hoped that \u2028 would work for viewing logs
via "cf logs" but it doesn't in bash. I'd need to write a plugin or
something for cf logs and train all my users to use it. Certainly possible
but I'm not that desperate yet. :)

Mike

On Tue, Apr 12, 2016 at 5:58 AM, David Laing <
david(a)davidlaing.com> wrote:

FWIW, the technique is to have your logging solution (eg,
logback, log4j) log a token (eg, \u2028) other than \n to
denote line breaks in your stack traces; and then have your log aggregation
software replace that token with a \n again when processing the log
messages.

If \u2028 doesn't work in your environment; use something
else; eg NEWLINE
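
The round trip described above can be sketched in a few lines of Python. This
is only an illustration of the technique, not code from logback, logstash or
Loggregator; the function names are hypothetical and the token is
configurable, as suggested.

```python
LINE_SEPARATOR = "\u2028"  # Unicode LINE SEPARATOR, the token used in this thread


def encode_for_stdout(message, token=LINE_SEPARATOR):
    """Application side: collapse a multi-line message into one log line."""
    return message.replace("\r\n", "\n").replace("\n", token)


def decode_downstream(event, token=LINE_SEPARATOR):
    """Aggregator side: restore the real newlines for display."""
    return event.replace(token, "\n")


if __name__ == "__main__":
    trace = ("java.lang.IllegalStateException: boom\n"
             "\tat Foo.bar(Foo.java:1)\n"
             "\tat Foo.main(Foo.java:9)")
    wire = encode_for_stdout(trace)
    # The encoded form contains no "\n", so it travels as a single log event.
    print(repr(wire))
    print(decode_downstream(wire))
```

Whatever token you pick, it has to be something that never appears in real
log content, otherwise the downstream replacement will insert spurious line
breaks.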

On Mon, 11 Apr 2016 at 21:12 Mike Youngstrom <
youngm(a)gmail.com> wrote:

Finally got around to testing this. Preliminary testing
shows that "\u2028" doesn't function as a newline
character in bash and causes the Eclipse console to wig out. I don't think
"\u2028" is a viable long-term solution. Hope you make
progress on a metric format available to an app in a container. I too
would like a tracker link to such a feature if there is one.

Thanks,
Mike

On Mon, Mar 14, 2016 at 2:28 PM, Mike Youngstrom <
youngm(a)gmail.com> wrote:

Hi Jim,

So, to be clear, what we're basically doing is using a
Unicode newline character to fool loggregator (which is looking for \n)
into thinking that it isn't a new log event, right? Does \u2028 work as a
new line character when tailing logs in the CLI? Anyone tried this unicode
new line character in various consoles? IDE, xterm, etc? I'm wondering if
developers will need to have different config for development.

Mike

On Mon, Mar 14, 2016 at 12:17 PM, Jim CF Campbell <
jcampbell(a)pivotal.io> wrote:

Hi Mike and Alex,

Two things - for Java, we are working toward defining an
enhanced metric format that will support transport of Multi Lines.

The second is this workaround that David Laing suggested
for Logstash. Think you could use it for Splunk?

With the Java Logback library you can do this by adding
"%replace(%xException){'\n','\u2028'}%nopex" to your logging config [1],
and then use the following logstash conf [2] to replace the token with a
regular newline again so it displays "properly" in Kibana:

# Replace the unicode newline character \u2028 with \n,
# which Kibana will display as a new line.
mutate {
  gsub => [ "[@message]", '\u2028', "
"]
  # ^^^ Seems that passing a string with an actual newline in
  # it is the only way to make gsub work
}

[1] github.com/dpin...ication.yml#L12
<https://github.com/dpinto-pivotal/cf-SpringBootTrader-config/blob/master/application.yml#L12>

[2] github.com/logs...se.conf#L60-L64
<https://github.com/logsearch/logsearch-for-cloudfoundry/blob/master/src/logsearch-config/src/logstash-filters/snippets/firehose.conf#L60-L64>


On Mon, Mar 14, 2016 at 11:11 AM, Mike Youngstrom <
youngm(a)gmail.com> wrote:

I'll let the Loggregator team respond formally. But, in
my conversations with the Loggregator team I think we're basically stuck
not sure what the right thing to do is on the client side. How does the
client trigger in loggregator that this is a multi line log message or what
is the right way for loggregator to detect that the client is trying to
send a multi line log message? Any ideas?

Mike

On Mon, Mar 14, 2016 at 10:25 AM, Aliaksandr Prysmakou <
prysmakou(a)gmail.com> wrote:

Hi guys,
Are there any updates about the "Multiline Log Entry"
issue? How do we correctly deal with stack traces?
Are there links to the tracker to read?
----
Alex Prysmakou / Altoros
Tel: (617) 841-2121 ext. 5161 | Toll free: 855-ALTOROS
Skype: aliaksandr.prysmakou
www.altoros.com | blog.altoros.com |
twitter.com/altoros


--
Jim Campbell | Product Manager | Cloud Foundry |
Pivotal.io | 303.618.0963



Re: Doppler/Firehose - Multiline Log Entry

Jim CF Campbell
 

My last 2 cents. It'll be configurable so will only be active in users of
dropsonde that want the functionality such as the Executor.

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io | 303.618.0963