
Failing to push standalone java app

Rahul Gupta
 

Hi,

I am trying to push a standalone Java app that has a 'public static void main(..)' and uses other dependencies. I tried setting the classpath in the jar's MANIFEST.MF, and also created a new jar that contains the dependent jars in its root, then did a cf push, but that didn't help either - 'cf push -p xxxxxxx.jar' fails while resolving runtime dependencies,

e.g. ERR Exception in thread "main" java.lang.NoClassDefFoundError: com/XXX/client/AbcXyz


Here is the content of MANIFEST.MF:
Manifest-Version: 1.0
Archiver-Version: Plexus Archiver
Built-By: smokingfly
Class-Path: XXX-123.jar AAA.789.jar
Created-By: Apache Maven 3.2.3
Build-Jdk: 1.8.0_40
Main-Class: com.cf.samples.TestClient

TestClient is the class with the main method.

I could not find any documentation that could help me with this. Could someone please help?

Many thanks.


Re: [abacus] Accommodating for plans in a resource config

Jean-Sebastien Delfino
 

On Fri, Dec 11, 2015 at 4:47 PM, Benjamin Cheng <bscheng(a)us.ibm.com> wrote:

Abacus will want to support plans in its resource config (as mentioned in
issue #153 https://github.com/cloudfoundry-incubator/cf-abacus/issues/153)

Starting with a basic approach, there would be a plans property (an array)
added at the top level of a resource config. The current metrics and
measures properties would be moved under that plans property. This will
allow them to be scoped to a plan.

+1 that makes sense to me as different plans may want to use different
measures, metrics, and metering, accumulation and aggregation functions.
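
For concreteness, a rough sketch of what such a plan-scoped resource config could look like, with measures/metrics nested under a top-level plans array (the resource id, property names and formulas below are hypothetical, and the metering functions are shown as strings purely for illustration):

const resourceConfig = {
  resource_id: 'object-storage',
  plans: [
    {
      plan_id: 'basic',
      measures: [{ name: 'storage', unit: 'BYTE' }],
      metrics: [{
        name: 'storage',
        unit: 'GIGABYTE',
        // each plan can carry its own metering/accumulation formulas
        meter: '(m) => m.storage / 1073741824',
        accumulate: '(a, qty) => Math.max(a, qty)'
      }]
    },
    {
      plan_id: 'premium',
      measures: [{ name: 'api_calls', unit: 'CALL' }],
      metrics: [{
        name: 'thousand_api_calls',
        unit: 'THOUSAND_CALLS',
        meter: '(m) => m.api_calls / 1000',
        accumulate: '(a, qty) => (a || 0) + qty'
      }]
    }
  ]
};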


Despite moving metrics and measures under plans, there will be a need for a
common set of measures/metrics for plans to fall back on. This comes into
play in the report, for example, when summary/charge functions run on
aggregated usage across all plans.

Not sure about that. AIUI with that refined design plans can now use
different metrics so usage gets aggregated at the plan level rather than
the resource level (as it wouldn't make sense to aggregate usage from
different plans metered using different metrics). That means that the
aggregation, summary and charge functions only apply to the plan level
rather than the resource level.



In terms of the common section, there's a choice of leaving
measures/metrics at the top level as the common/default, or putting those
under a different property name.

I think there are a couple of things to consider here:
- Defaulting a plan to the common section if there is no formula
defined. This may require the plan to point to the common section, or logic
that would automatically fall back to the common section (and subsequently
to the absolute resource config defaults that are already in place).
- If there's no plan id passed (for example in some of the charge/summary
calls), those calls would need to go to this common section.

Assuming that my above statement that 'aggregation, summary and charge
functions only apply to the plan level' is correct, there's no 'common
section' anymore, so no problem with processing usage in that non-existent
common section anymore :) Makes sense?


Thoughts/Concerns/Suggestions?
- Jean-Sebastien


Re: Organization quota definition-questions

Juan Antonio Breña Moral <bren at juanantonio.info...>
 

Sorry, I didn't reply to some of the questions before.

1. I didn't test it. In my tests, I defined a quota at the org level, but I will test it.
2. I answered with the pseudocode.
3. The space acquires the limits defined in the quota for the organization.

Juan Antonio


Re: Organization quota definition-questions

Juan Antonio Breña Moral <bren at juanantonio.info...>
 

Hi,

You are right. Disk quota is a parameter defined at the app level only.
http://apidocs.cloudfoundry.org/213/apps/creating_an_app.html

When you create a new app, you define a set of parameters, one of which is disk_quota; but when you define an org quota, disk_quota is not defined at that level.
http://apidocs.cloudfoundry.org/213/organization_quota_definitions/creating_a_organization_quota_definition.html

I am not sure if someone from Pivotal could confirm this, but I think the CC API doesn't have that feature at the org/space level.

Anyway, at the moment, using the API, it is possible to do the same task but not in a direct way:

IDEA (sketch in Java; getSpacesFromOrg, getAppsFromSpace and getAppSummary are placeholders for the CC API calls linked below):

// Sum the disk allocated to every app in every space of the org.
List<Space> spaces = getSpacesFromOrg(orgGuid);
long orgUsedDisk = 0;
for (Space space : spaces) {
    List<App> apps = getAppsFromSpace(space.getGuid());
    for (App app : apps) {
        // getAppSummary(appGuid) or getAppStats(appGuid):
        // http://apidocs.cloudfoundry.org/226/apps/get_app_summary.html
        // http://apidocs.cloudfoundry.org/226/apps/get_detailed_stats_for_a_started_app.html
        AppSummary appStat = getAppSummary(app.getGuid());
        orgUsedDisk += appStat.getDiskQuota();
    }
}
System.out.println("Disk quota for current org: " + orgUsedDisk);

Juan Antonio


A hanged etcd used by hm9000 makes an impact on the delayed detection time of crashed application instances

Masumi Ito
 

Hi,

I found that when one of the etcds hung, it delayed the detection of crashed
application instances, resulting in slow recovery. Although this
depended on which etcd VM each hm9000 process was connected to,
it took up to approximately 15 min to recover, which I think is too long
a delay.

Does anyone know how to calculate the time it takes hm9000 to detect a hung etcd VM
and switch to a healthy etcd? I have encountered two different scenarios, as
follows.

1. The hm9000 analyzer was connected to the hung etcd, while the hm9000 listener
was connected to a normal etcd. (About 8 min for the analyzer to
recover; the other hm9000 analyzer took over instead.)
The analyzer itself seemed to hang just after the connected
etcd hung, because "Analyzer completed succesfully" was no longer found in
the log.
After approximately 8 min, the other hm9000 analyzer acquired the
lock and started to work instead. It then identified the crashed instance
and enqueued a start message; the crashed app was relaunched within ten minutes
after the detection.

2. The hm9000 analyzer was connected to a normal etcd, while the hm9000 listener
was connected to the hung etcd. (About 15 min for the listener to
recover; the same hm9000 listener seemed to recover somehow.)
The listener started to fail to sync heartbeats just after the connected
etcd hung. After 15 min, "Save took too long. Not bumping freshness."
showed up in the listener's log, and the analyzer also complained about the
old actual state: "Analyzer failed with error - Error:Actual state is not
fresh" and stopped its analysis. After 10 sec the hm9000 listener
recovered somehow and started to bump freshness periodically; the analyzer
then resumed analyzing the actual and desired state and raised the request
to start the crashed instance.

Regards,
Masumi





Re: Organization quota definition-questions

Ponraj E
 

Hi Juan Antonio,

Thanks for the reply.

The API that you mentioned gives me the memory usage of the org, not the disk quota/usage of the org, which is the info I need. In addition, I have added a couple more questions in my latest reply.

1. Sometimes the sum of the space quota definitions exceeds the org quota definition. Is this a valid use case or a bug?
2. Currently, at the org level there is no API to display the disk quota limit/usage; it exists only at the application level. How do we approach this?
3. Also, at the space level there is a possibility that a space is not associated with any space quota definition. How do we get the total resources available (like memory, services, routes) for such a space?


Regards,
Ponraj


Re: Organization quota definition-questions

Juan Antonio Breña Moral <bren at juanantonio.info...>
 

Good morning,

Yes, it is possible.

If you look at the PWS panel or Bluemix, you can see that information.

Every organization has an OrganizationQuota bound to it, and this definition affects every application deployed in any space bound to that organization.

The REST method used to get the definition is:

http://apidocs.cloudfoundry.org/213/organization_quota_definitions/retrieve_a_particular_organization_quota_definition.html

The method to read the memory used is:

http://apidocs.cloudfoundry.org/222/organizations/retrieving_organization_memory_usage.html

You have an example here:
https://github.com/prosociallearnEU/cf-nodejs-dashboard/blob/master/services/HomeService.js#L69-L79

Remember that the memory used is the active memory. You can have many applications staged but stopped; memory is only added to the counter when you start an application in a space.
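
A minimal sketch of those calls (just to illustrate; CF_API, ORG_GUID and the token are placeholders, the field names are taken from the API docs linked above, and a runtime with a global fetch is assumed):

const CF_API = 'https://api.example.com';   // placeholder
const ORG_GUID = '<org-guid>';               // placeholder
const TOKEN = 'bearer <oauth-token>';        // placeholder

async function printOrgMemory(): Promise<void> {
  const headers = { Authorization: TOKEN };
  // the org resource carries a quota_definition_url pointing at its org quota definition
  const org = await (await fetch(`${CF_API}/v2/organizations/${ORG_GUID}`, { headers })).json();
  const quota = await (await fetch(`${CF_API}${org.entity.quota_definition_url}`, { headers })).json();
  // memory currently in use by the started apps in the org
  const usage = await (await fetch(`${CF_API}/v2/organizations/${ORG_GUID}/memory_usage`, { headers })).json();
  console.log(`Memory used: ${usage.memory_usage_in_mb} MB of ${quota.entity.memory_limit} MB`);
}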

Juan Antonio


Re: about consul_agent's cert

于长江 <yuchangjiang at cmss.chinamobile.com...>
 

it works, thank you~




于长江
15101057694


Original message
From: Gwenn Etourneau <getourneau(a)pivotal.io>
To: Discussions about Cloud Foundry projects and the system overall. <cf-dev(a)lists.cloudfoundry.org>
Sent: Monday, December 14, 2015 12:48
Subject: [cf-dev] Re: about consul_agent's cert


Please read the documentation: http://docs.cloudfoundry.org/deploying/common/consul-security.html

On Mon, Dec 14, 2015 at 11:35 AM, 于长江 yuchangjiang(a)cmss.chinamobile.com wrote:

Hi,
When I deploy cf-release, the consul_agent job fails to start. I found this error in the log on the VM:

== Starting Consul agent...
== Error starting agent: Failed to start Consul server: Failed to parse any CA certificates
--------------------------------------------
Then I found that the configuration in cf's manifest file is not correct; it looks like this:


consul:
  encrypt_keys:
  - CONSUL_ENCRYPT_KEY
  ca_cert: CONSUL_CA_CERT
  server_cert: CONSUL_SERVER_CERT
  server_key: CONSUL_SERVER_KEY
  agent_cert: CONSUL_AGENT_CERT
  agent_key: CONSUL_AGENT_KEY


I have no idea how to complete these fields. Can someone give me an example? Thanks~


于长江
15101057694


Re: Organization quota definition-questions

Ponraj E
 

Hi,



Since the documentation for the quota definitions is quite unclear at the moment, I have more questions regarding the same.

I want to display the resource consumption (memory, disk usage, etc.) at the org and space level.

1. Sometimes the sum of the space quota definitions exceeds the org quota definition. Is this a valid use case or a bug?
2. Currently, at the org level there is no API to display the disk quota limit/usage; it exists only at the application level. How do we approach this?
3. Also, at the space level there is a possibility that a space is not associated with any space quota definition. How do we get the total resources available (like memory, services, routes) for such a space?

Regards,
Ponraj


Organization quota definition-questions

Ponraj E
 

Hi,

Is it possible to get the disk quota at the organization level? As far as I can see, the quota definition API doesn't return the disk quota [upper limit] info.

I want to calculate the used disk quota / total disk quota for an organization.


Regards,
Ponraj


Re: Certificate management for non-Java applications

Daniel Mikusa
 

I think it depends on the language / runtime / library and where it's
looking for the default set of certs. It's easy with Java because it uses
its own cert store and that file is owned by the vcap user. If a language
/ runtime / library is looking at `/etc/ssl/certs`, you can't change that
as a user (at least not from staging / runtime).

The best thing, just like with Java, is if your application provides you with
the facilities to configure its usage of certs. I'd imagine that all
languages / runtimes / libraries provide you with a way to override the
defaults and use your own certs.

Besides that, you might be able to set various environment variables to
point the default cert store to a different location. I'm not aware of a
standard one though, so it's likely going to depend on the specific
language / runtime / library and what it supports.

Dan


On Mon, Dec 14, 2015 at 6:01 PM, john mcteague <john.mcteague(a)gmail.com>
wrote:

Previous threads have focused on adding a trusted CA to the JDK's trust
store at application startup, a pattern that I have employed also.

We are facing increased demand from our non-Java developers to have the
same functionality. Whether it be custom CAs, certs for authentication
(against something like MQ, for example) or for our internal LDAP server
which requires ldaps, we need a way to add user-defined certificates at app
deploy time based on user requirements.

My work with Java buildpacks has resulted in a certificate-as-a-service
style function: declare which cert from a certificate store should be
injected into the app at runtime. What I lack for non-Java runtimes is a
reliable way to get those certs into the correct Linux container directory
either during staging or at app startup.

Have others been able to establish a pattern around this? Without this
ability we go from a polyglot platform to Java only.

Thanks,
John


[Abacus] Tagging usage

KRuelY <kevinyudhiswara@...>
 

Hi,

I'm looking for a way to classify usage. My use case is this:

I have two types of usage:
1. Usage of type A.
2. Usage of type B.

The initial thought on how to do this with the current Abacus is to create two
separate plans (say plan a and plan b). This way, usage of type A would go under
'plan a' and usage of type B would go under 'plan b'.

Due to past design decisions, I am unable to do so, and hence I need to separate usage
under the same plan into two types: type A and type B.

A more detailed use case would be when pricing changes in the middle of the
month. Here, I need to separate my usage into usage of type A, which would use
the old pricing, and usage of type B, which would use the new pricing.

The main reason for splitting them up is that I would like to have two
different buckets for the two types of usage, because keeping them in the same
bucket is not going to work (it will give inaccurate results), due to the following:

Use cases:
1. Price changed on the 15th. Get the accumulated quantity up to the price
change (A), and the accumulated quantity at the end of the month (B). The charge
would be: A * old price + (B - A) * new price. This works if the accumulation
formula is a plain sum, but not if the formula is max / average.

2. Query from the beginning of the month to the middle of the month, and query
from the middle of the month to the end of the month. Querying from x to y is not
supported, and with the accumulating and retracting dataflow model that we have,
this is not possible.

3. One way that would work is to use the flexibility of the meter,
accumulate, and aggregate functions in the resource config to separate usage
of type A from type B (see the sketch right after this list). I don't think this
is a good design by itself, but the point is that we can use the flexibility of
the functions in the resource config to maybe generate a new idea.
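
Here is that sketch of idea 3 (everything below is hypothetical and only meant to show the shape: a 'tag' measure submitted with the usage, and per-type metrics whose meter functions route on it):

const taggedResourceConfig = {
  measures: [
    { name: 'api_calls', unit: 'CALL' },
    { name: 'tag', unit: 'TAG' } // hypothetical: 'A' (old pricing) or 'B' (new pricing)
  ],
  metrics: [
    {
      name: 'type_a_calls',
      unit: 'CALL',
      // only count usage tagged as type A
      meter: '(m) => m.tag === "A" ? m.api_calls : 0'
    },
    {
      name: 'type_b_calls',
      unit: 'CALL',
      // only count usage tagged as type B
      meter: '(m) => m.tag === "B" ? m.api_calls : 0'
    }
  ]
};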

Would tagging a usage be a good idea? For example, under plan we would have:

plan:
  bucket A:
    aggregated_usage:
  bucket B:
    aggregated_usage:

My concern with this is that we would have to add another (key, value) pair to
create another landmark window, and the submitted usage would also need
another field to serve as the 'tag'.

Any thoughts or ideas on how to implement this? Other than creating two separate plans
or tagging, the ideas mentioned above are not really viable.

Thanks!









Certificate management for non-Java applications

john mcteague <john.mcteague@...>
 

Previous threads have focused on adding a trusted CA to the JDK's trust
store at application startup, a pattern that I have employed also.

We are facing increased demand from our non-Java developers to have the
same functionality. Whether it be custom CAs, certs for authentication
(against something like MQ, for example) or for our internal LDAP server
which requires ldaps, we need a way to add user-defined certificates at app
deploy time based on user requirements.

My work with Java buildpacks has resulted in a certificate-as-a-service
style function: declare which cert from a certificate store should be
injected into the app at runtime. What I lack for non-Java runtimes is a
reliable way to get those certs into the correct Linux container directory
either during staging or at app startup.

Have others been able to establish a pattern around this? Without this
ability we go from a polyglot platform to Java only.

Thanks,
John


Re: Web sockets + Cloud Foundry

Matthew Sykes <matthew.sykes@...>
 

James Bayer posted about a fun experiment with web sockets a while back.
Might be a good starting point:

http://www.iamjambay.com/2013/12/send-interactive-commands-to-cloud.html

You need to make sure that the url uses the correct protocol and port for
your CF target. You can reference `doppler_logging_endpoint` from /v2/info
as a template.
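
A rough sketch of that (using the 'ws' npm package; the /apps/<guid>/stream path and the token handling are my assumptions, and the stream frames are protobuf-encoded envelopes that this sketch does not decode):

import WebSocket from 'ws';

async function streamAppLogs(api: string, appGuid: string, token: string): Promise<void> {
  // /v2/info advertises the websocket endpoint, e.g. wss://doppler.example.com:443
  const info = await (await fetch(`${api}/v2/info`)).json();
  const url = `${info.doppler_logging_endpoint}/apps/${appGuid}/stream`;

  const socket = new WebSocket(url, { headers: { Authorization: token } });
  socket.on('open', () => console.log('connected to', url));
  socket.on('message', (data) => console.log('received frame of', (data as Buffer).length, 'bytes'));
  socket.on('error', (err) => console.error('websocket error:', err));
}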

On Mon, Dec 14, 2015 at 4:53 PM, Lakshman Mukkamalla (lmukkama) <
lmukkama(a)cisco.com> wrote:

Hi CF Dev team,
I am just starting to look into how a web-sockets-based app can run in a
Cloud Foundry environment. If you have any reference links/wikis that walk
through how web sockets work here, that would help me. I understand from the
Cloud Foundry docs that there is some level of support for web sockets, but I
will give it a try on a sample app in the meantime.

Thanks.


--
Matthew Sykes
matthew.sykes(a)gmail.com


Exposing more meta-data for admin buildpacks

Jack Cai
 

Now that most buildpacks only support a small set of runtime versions (as
defined in their manifest), it makes good sense to make this information
easily accessible to users. They can of course go to the GitHub project and
open that manifest file, or read the docs, but I think a better way might
be to make that information accessible through the API/CLI. We already have a
set of API/CLI commands for managing admin buildpacks. It seems pretty
straightforward to add that support, e.g., allowing users to issue:

cf buildpack xyz_buildpack

which would return descriptive information on the supported runtime versions
(and the default) in addition to existing information like index, enabled,
etc. This descriptive information could be registered when
"create-buildpack" is issued, either provided explicitly or read from a
certain file in the buildpack package.
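
To make the shape concrete, a sketch of the record such a command could surface (the first five fields already exist on admin buildpacks today; the runtime fields are the proposed, hypothetical additions):

interface AdminBuildpackInfo {
  name: string;
  position: number;
  enabled: boolean;
  locked: boolean;
  filename: string;
  // proposed additions, registered at create-buildpack time or read from the buildpack package
  supported_runtimes?: string[]; // e.g. ['java 1.7.0_+', 'java 1.8.0_+']
  default_runtime?: string;
}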

Does anybody like this idea? Can we add it to the backlog?

Jack


Web sockets + Cloud Foundry

Lakshman Mukkamalla
 

Hi CF Dev team,
I am just starting to look into how a web-sockets-based app can run in a Cloud Foundry environment. If you have any reference links/wikis that walk through how web sockets work here, that would help me. I understand from the Cloud Foundry docs that there is some level of support for web sockets, but I will give it a try on a sample app in the meantime.

Thanks.


Re: Diego docker app launch issue with Diego's v0.1443.0

Anuj Jain <anuj17280@...>
 

Hi Eric - please ignore my last message - I found the issue. I had installed
Diego with a colocated VM as well as the other VMs, which confused the CC, and
the colocated VM did not have all the functionality/jobs. Now I am no longer
getting the 500 message and can run buildpack apps with Diego enabled.

On Mon, Dec 14, 2015 at 5:23 PM, Anuj Jain <anuj17280(a)gmail.com> wrote:

Hi Eric – Thanks for trying to help me to resolve my issues – please check
comments inline:

On Mon, Dec 14, 2015 at 12:02 AM, Eric Malm <emalm(a)pivotal.io> wrote:

Hi, Anuj,

Thanks for the info, and sorry to hear you've run into some difficulties.
It sounds like Cloud Controller is getting a 503 error from the
nsync-listener service on the CC-Bridge. That most likely means it's
encountering some sort of error in communicating with Diego's BBS API. You
mentioned that you had some problems with the database jobs when upgrading
as well. Does BOSH now report that all the VMs in the Diego deployment are
running correctly?

=> All VMs under CF, Diego and the Diego docker cache are showing as running.
=> The database issue was in the manifest; it got resolved once I
changed the log_level value from debug2 to debug.


One next step to try would be to tail logs from the nsync-listener
processes on the CC-Bridge VMs with `tail -f
/var/vcap/sys/log/nsync/nsync_listener.stdout.log`, and from the BBS
processes on the database VMs with `tail -f
/var/vcap/sys/log/bbs/bbs.stdout.log`, then try restarting your app that
targets Diego, and see if there are any errors in the logs. It may also
help to filter the logs to contain only the ones with your app guid, which
you can get from the CF CLI via `cf app APP_NAME --guid`.

=> I could not see any logs for my app on the CC-Bridge - I checked the
stager and nsync job logs.
=> I checked the CC host and ran 'netstat -anp | grep consul' a couple of times,
and found that sometimes it shows an established connection with one consul
server and sometimes not - below is the sample output:

# netstat -anp | grep consul
tcp 0 0 10.5.139.156:8301 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:8400 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:8500 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:53 0.0.0.0:*
LISTEN 4559/consul
udp 0 0 127.0.0.1:53 0.0.0.0:*
4559/consul
udp 0 0 10.5.139.156:8301 0.0.0.0:*
4559/consul
root(a)07c40eae-6a8c-4fb1-996d-e638637b5caa:/var/vcap/bosh_ssh/bosh_eerpetbrz#
netstat -anp | grep consul
tcp 0 0 10.5.139.156:8301 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:8400 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:8500 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:53 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 10.5.139.156:60316 10.5.139.140:8300
ESTABLISHED 4559/consul
udp 0 0 127.0.0.1:53 0.0.0.0:*
4559/consul
udp 0 0 10.5.139.156:8301 0.0.0.0:*
4559/consul

=> After that I also checked the consul agent logs on the CC, which show
EventMemberFailed and EventMemberJoin messages:


========================================================================================
logs:
2015/12/14 09:32:43 [INFO] serf: EventMemberFailed: api-z1-0
10.5.139.156
2015/12/14 09:32:44 [INFO] serf: EventMemberJoin: docker-cache-0
10.5.139.252
2015/12/14 09:32:44 [INFO] serf: EventMemberJoin: api-z1-0 10.5.139.156
2015/12/14 09:32:49 [INFO] serf: EventMemberJoin: ha-proxy-z1-0
10.5.103.103
2015/12/14 09:33:32 [INFO] memberlist: Suspect ha-proxy-z1-1 has
failed, no acks received
2015/12/14 09:33:40 [INFO] serf: EventMemberFailed: ha-proxy-z1-1
10.5.103.104
2015/12/14 09:33:40 [INFO] serf: EventMemberFailed: database-z1-2
10.5.139.194
2015/12/14 09:33:40 [INFO] memberlist: Marking ha-proxy-z1-0 as
failed, suspect timeout reached
2015/12/14 09:33:40 [INFO] serf: EventMemberFailed: ha-proxy-z1-0
10.5.103.103
2015/12/14 09:33:41 [INFO] serf: EventMemberJoin: database-z1-2
10.5.139.194
2015/12/14 09:33:42 [INFO] serf: EventMemberFailed: cell-z1-3
10.5.139.199
2015/12/14 09:33:43 [INFO] serf: EventMemberJoin: cell-z1-3
10.5.139.199
2015/12/14 09:33:43 [INFO] serf: EventMemberFailed: api-worker-z1-0
10.5.139.159
2015/12/14 09:33:44 [INFO] serf: EventMemberJoin: ha-proxy-z1-1
10.5.103.104
2015/12/14 09:33:44 [INFO] memberlist: Marking uaa-z1-1 as failed,
suspect timeout reached
2015/12/14 09:33:44 [INFO] serf: EventMemberFailed: uaa-z1-1
10.5.139.155
2015/12/14 09:33:46 [INFO] serf: EventMemberFailed: cell-z1-1
10.5.139.197
2015/12/14 09:33:46 [INFO] serf: EventMemberJoin: uaa-z1-1 10.5.139.155
2015/12/14 09:33:47 [INFO] memberlist: Marking cc-bridge-z1-0 as
failed, suspect timeout reached
2015/12/14 09:33:47 [INFO] serf: EventMemberFailed: cc-bridge-z1-0
10.5.139.200
2015/12/14 09:33:49 [INFO] serf: EventMemberFailed: database-z1-1
10.5.139.193
2015/12/14 09:33:58 [INFO] serf: EventMemberJoin: api-worker-z1-0
10.5.139.159
2015/12/14 09:33:58 [INFO] serf: EventMemberJoin: database-z1-1
10.5.139.193
2015/12/14 09:33:59 [INFO] serf: EventMemberJoin: cell-z1-1
10.5.139.197
2015/12/14 09:33:59 [INFO] serf: EventMemberJoin: cc-bridge-z1-0
10.5.139.200
2015/12/14 09:34:01 [INFO] serf: EventMemberFailed: database-z1-1
10.5.139.193
2015/12/14 09:34:07 [INFO] serf: EventMemberJoin: database-z1-1
10.5.139.193
2015/12/14 09:34:09 [INFO] memberlist: Marking cell-z1-1 as failed,
suspect timeout reached
2015/12/14 09:34:09 [INFO] serf: EventMemberFailed: cell-z1-1
10.5.139.197
2015/12/14 09:34:20 [INFO] serf: EventMemberJoin: ha-proxy-z1-0
10.5.103.103
2015/12/14 09:34:28 [INFO] serf: EventMemberJoin: cell-z1-1
10.5.139.197
2015/12/14 09:34:38 [INFO] serf: EventMemberFailed: ha-proxy-z1-0
10.5.103.103
2015/12/14 09:34:42 [INFO] serf: EventMemberFailed: ha-proxy-z1-1
10.5.103.104
2015/12/14 09:34:44 [INFO] memberlist: Marking api-z1-0 as failed,
suspect timeout reached
2015/12/14 09:34:44 [INFO] serf: EventMemberFailed: api-z1-0
10.5.139.156
2015/12/14 09:34:48 [INFO] serf: EventMemberJoin: ha-proxy-z1-0
10.5.103.103
2015/12/14 09:34:48 [INFO] memberlist: Marking api-worker-z1-0 as
failed, suspect timeout reached
2015/12/14 09:34:48 [INFO] serf: EventMemberFailed: api-worker-z1-0
10.5.139.159
2015/12/14 09:34:49 [INFO] serf: EventMemberJoin: api-z1-0 10.5.139.156
2015/12/14 09:34:52 [INFO] serf: EventMemberJoin: ha-proxy-z1-1
10.5.103.104
2015/12/14 09:34:58 [INFO] serf: EventMemberJoin: api-worker-z1-0
10.5.139.159

================================================================================



Also, are you able to run a buildpack-based app on the Diego backend, or
do you get the same error as with this Docker-based app?

=> No, I am also not able to run a buildpack-based app on the Diego backend -
I verified that by running enable-diego on one of the apps and then trying to
start it - I got the same 500 error.


Best,
Eric

On Thu, Dec 10, 2015 at 6:45 AM, Anuj Jain <anuj17280(a)gmail.com> wrote:

Hi,

I deployed the latest CF v226 with Diego v0.1443.0 - I was able to
successfully upgrade both deployments and verified that CF is working as
expected. I am currently seeing a problem with Diego while trying to deploy any
docker app - I am getting *'Server error, status code: 500, error code:
170016, message: Runner error: stop app failed: 503'* - below you can
see the last few lines of the CF_TRACE output.

I also noticed that while trying to upgrade to Diego v0.1443.0, it gave
me an error while trying to upgrade the database job - the fix I applied
was to change debug2 to debug in the Diego manifest file (path: properties =>
consul => log_level: debug).


RESPONSE: [2015-12-10T09:35:07-05:00]
HTTP/1.1 500 Internal Server Error
Content-Length: 110
Content-Type: application/json;charset=utf-8
Date: Thu, 10 Dec 2015 14:35:07 GMT
Server: nginx
X-Cf-Requestid: 8328f518-4847-41ec-5836-507d4bb054bb
X-Content-Type-Options: nosniff
X-Vcap-Request-Id:
324d0fc0-2146-48f0-6265-755efb556e23::5c869046-8803-4dac-a620-8ca701f5bd22

{
"code": 170016,
"description": "Runner error: stop app failed: 503",
"error_code": "CF-RunnerError"
}

FAILED
Server error, status code: 500, error code: 170016, message: Runner
error: stop app failed: 503
FAILED
Server error, status code: 500, error code: 170016, message: Runner
error: stop app failed: 503
FAILED
Error: Error executing cli core command
Starting app testing89 in org PAAS / space dev as admin...

FAILED

Server error, status code: 500, error code: 170016, message: Runner
error: stop app failed: 503


- Anuj


Re: Loggregator roadmap for CF Community input

Jim CF Campbell
 

Well I *thought* I had put in a universal link. Here is one:
<https://docs.google.com/spreadsheets/d/1QOCUIlTkhGzVwfRji7Q14vczqkBbFGkiDWrJSKdRLRg/edit?usp=sharing>

I've also responded to all your requests for sharing. Looking forward to
your feedback!

Jim

On Mon, Dec 14, 2015 at 6:08 AM, Voelz, Marco <marco.voelz(a)sap.com> wrote:

Same here, seems like the document is not publicly readable?

Warm regards
Marco




On 14/12/15 06:06, "Noburou TANIGUCHI" <dev(a)nota.m001.jp> wrote:

I'm asked to sign in to Google account to read the Roadmap.
Is this an intentional behavior?

Thanks in advance.


Jim Campbell wrote
Hi cf-dev,

Over the past two months, I've been gathering customer input about the
CF
OSS logging. I've created a first draft of a Loggregator Roadmap
<https://docs.google.com/spreadsheets/d/1QOCUIlTkhGzVwfRji7Q14vczqkBbFGkiDWrJSKdRLRg/edit?usp=sharing>.
I'm looking for feedback from the folks on this list. You can comment on
the doc and/or put your feedback in this thread.

Thanks!

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io |
303.618.0963





-----
I'm not a ...
noburou taniguchi
--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io | 303.618.0963


Re: Loggregator roadmap for CF Community input

Marco Voelz
 

Same here, seems like the document is not publicly readable?

Warm regards
Marco

On 14/12/15 06:06, "Noburou TANIGUCHI" <dev(a)nota.m001.jp> wrote:

I'm asked to sign in to Google account to read the Roadmap.
Is this an intentional behavior?

Thanks in advance.


Jim Campbell wrote
Hi cf-dev,

Over the past two months, I've been gathering customer input about the CF
OSS logging. I've created a first draft of a Loggregator Roadmap
<https://docs.google.com/spreadsheets/d/1QOCUIlTkhGzVwfRji7Q14vczqkBbFGkiDWrJSKdRLRg/edit?usp=sharing>.
I'm looking for feedback from the folks on this list. You can comment on
the doc and/or put your feedback in this thread.

Thanks!

--
Jim Campbell | Product Manager | Cloud Foundry | Pivotal.io | 303.618.0963




-----
I'm not a ...
noburou taniguchi


Re: Diego docker app launch issue with Diego's v0.1443.0

Anuj Jain <anuj17280@...>
 

Hi Eric – Thanks for trying to help me to resolve my issues – please check
comments inline:

On Mon, Dec 14, 2015 at 12:02 AM, Eric Malm <emalm(a)pivotal.io> wrote:

Hi, Anuj,

Thanks for the info, and sorry to hear you've run into some difficulties.
It sounds like Cloud Controller is getting a 503 error from the
nsync-listener service on the CC-Bridge. That most likely means it's
encountering some sort of error in communicating with Diego's BBS API. You
mentioned that you had some problems with the database jobs when upgrading
as well. Does BOSH now report that all the VMs in the Diego deployment are
running correctly?

=> All VMs under CF, Diego and the Diego docker cache are showing as running.
=> The database issue was in the manifest; it got resolved once I changed
the log_level value from debug2 to debug.


One next step to try would be to tail logs from the nsync-listener
processes on the CC-Bridge VMs with `tail -f
/var/vcap/sys/log/nsync/nsync_listener.stdout.log`, and from the BBS
processes on the database VMs with `tail -f
/var/vcap/sys/log/bbs/bbs.stdout.log`, then try restarting your app that
targets Diego, and see if there are any errors in the logs. It may also
help to filter the logs to contain only the ones with your app guid, which
you can get from the CF CLI via `cf app APP_NAME --guid`.

=> I could not see any logs for my app on the CC-Bridge - I checked the
stager and nsync job logs.
=> I checked the CC host and ran 'netstat -anp | grep consul' a couple of times,
and found that sometimes it shows an established connection with one consul
server and sometimes not - below is the sample output:

# netstat -anp | grep consul
tcp 0 0 10.5.139.156:8301 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:8400 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:8500 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:53 0.0.0.0:*
LISTEN 4559/consul
udp 0 0 127.0.0.1:53 0.0.0.0:*
4559/consul
udp 0 0 10.5.139.156:8301 0.0.0.0:*
4559/consul
root(a)07c40eae-6a8c-4fb1-996d-e638637b5caa:/var/vcap/bosh_ssh/bosh_eerpetbrz#
netstat -anp | grep consul
tcp 0 0 10.5.139.156:8301 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:8400 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:8500 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 127.0.0.1:53 0.0.0.0:*
LISTEN 4559/consul
tcp 0 0 10.5.139.156:60316 10.5.139.140:8300
ESTABLISHED 4559/consul
udp 0 0 127.0.0.1:53 0.0.0.0:*
4559/consul
udp 0 0 10.5.139.156:8301 0.0.0.0:*
4559/consul

=> After that I also checked the consul agent logs on the CC, which show
EventMemberFailed and EventMemberJoin messages:

========================================================================================
logs:
2015/12/14 09:32:43 [INFO] serf: EventMemberFailed: api-z1-0
10.5.139.156
2015/12/14 09:32:44 [INFO] serf: EventMemberJoin: docker-cache-0
10.5.139.252
2015/12/14 09:32:44 [INFO] serf: EventMemberJoin: api-z1-0 10.5.139.156
2015/12/14 09:32:49 [INFO] serf: EventMemberJoin: ha-proxy-z1-0
10.5.103.103
2015/12/14 09:33:32 [INFO] memberlist: Suspect ha-proxy-z1-1 has
failed, no acks received
2015/12/14 09:33:40 [INFO] serf: EventMemberFailed: ha-proxy-z1-1
10.5.103.104
2015/12/14 09:33:40 [INFO] serf: EventMemberFailed: database-z1-2
10.5.139.194
2015/12/14 09:33:40 [INFO] memberlist: Marking ha-proxy-z1-0 as failed,
suspect timeout reached
2015/12/14 09:33:40 [INFO] serf: EventMemberFailed: ha-proxy-z1-0
10.5.103.103
2015/12/14 09:33:41 [INFO] serf: EventMemberJoin: database-z1-2
10.5.139.194
2015/12/14 09:33:42 [INFO] serf: EventMemberFailed: cell-z1-3
10.5.139.199
2015/12/14 09:33:43 [INFO] serf: EventMemberJoin: cell-z1-3 10.5.139.199
2015/12/14 09:33:43 [INFO] serf: EventMemberFailed: api-worker-z1-0
10.5.139.159
2015/12/14 09:33:44 [INFO] serf: EventMemberJoin: ha-proxy-z1-1
10.5.103.104
2015/12/14 09:33:44 [INFO] memberlist: Marking uaa-z1-1 as failed,
suspect timeout reached
2015/12/14 09:33:44 [INFO] serf: EventMemberFailed: uaa-z1-1
10.5.139.155
2015/12/14 09:33:46 [INFO] serf: EventMemberFailed: cell-z1-1
10.5.139.197
2015/12/14 09:33:46 [INFO] serf: EventMemberJoin: uaa-z1-1 10.5.139.155
2015/12/14 09:33:47 [INFO] memberlist: Marking cc-bridge-z1-0 as
failed, suspect timeout reached
2015/12/14 09:33:47 [INFO] serf: EventMemberFailed: cc-bridge-z1-0
10.5.139.200
2015/12/14 09:33:49 [INFO] serf: EventMemberFailed: database-z1-1
10.5.139.193
2015/12/14 09:33:58 [INFO] serf: EventMemberJoin: api-worker-z1-0
10.5.139.159
2015/12/14 09:33:58 [INFO] serf: EventMemberJoin: database-z1-1
10.5.139.193
2015/12/14 09:33:59 [INFO] serf: EventMemberJoin: cell-z1-1 10.5.139.197
2015/12/14 09:33:59 [INFO] serf: EventMemberJoin: cc-bridge-z1-0
10.5.139.200
2015/12/14 09:34:01 [INFO] serf: EventMemberFailed: database-z1-1
10.5.139.193
2015/12/14 09:34:07 [INFO] serf: EventMemberJoin: database-z1-1
10.5.139.193
2015/12/14 09:34:09 [INFO] memberlist: Marking cell-z1-1 as failed,
suspect timeout reached
2015/12/14 09:34:09 [INFO] serf: EventMemberFailed: cell-z1-1
10.5.139.197
2015/12/14 09:34:20 [INFO] serf: EventMemberJoin: ha-proxy-z1-0
10.5.103.103
2015/12/14 09:34:28 [INFO] serf: EventMemberJoin: cell-z1-1 10.5.139.197
2015/12/14 09:34:38 [INFO] serf: EventMemberFailed: ha-proxy-z1-0
10.5.103.103
2015/12/14 09:34:42 [INFO] serf: EventMemberFailed: ha-proxy-z1-1
10.5.103.104
2015/12/14 09:34:44 [INFO] memberlist: Marking api-z1-0 as failed,
suspect timeout reached
2015/12/14 09:34:44 [INFO] serf: EventMemberFailed: api-z1-0
10.5.139.156
2015/12/14 09:34:48 [INFO] serf: EventMemberJoin: ha-proxy-z1-0
10.5.103.103
2015/12/14 09:34:48 [INFO] memberlist: Marking api-worker-z1-0 as
failed, suspect timeout reached
2015/12/14 09:34:48 [INFO] serf: EventMemberFailed: api-worker-z1-0
10.5.139.159
2015/12/14 09:34:49 [INFO] serf: EventMemberJoin: api-z1-0 10.5.139.156
2015/12/14 09:34:52 [INFO] serf: EventMemberJoin: ha-proxy-z1-1
10.5.103.104
2015/12/14 09:34:58 [INFO] serf: EventMemberJoin: api-worker-z1-0
10.5.139.159
================================================================================



Also, are you able to run a buildpack-based app on the Diego backend, or
do you get the same error as with this Docker-based app?

=> No, I am also not able to run a buildpack-based app on the Diego backend -
I verified that by running enable-diego on one of the apps and then trying to
start it - I got the same 500 error.


Best,
Eric

On Thu, Dec 10, 2015 at 6:45 AM, Anuj Jain <anuj17280(a)gmail.com> wrote:

Hi,

I deployed the latest CF v226 with Diego v0.1443.0 - I was able to
successfully upgrade both deployments and verified that CF is working as
expected. I am currently seeing a problem with Diego while trying to deploy any
docker app - I am getting *'Server error, status code: 500, error code:
170016, message: Runner error: stop app failed: 503'* - below you can
see the last few lines of the CF_TRACE output.

I also noticed that while trying to upgrade to Diego v0.1443.0, it gave
me an error while trying to upgrade the database job - the fix I applied
was to change debug2 to debug in the Diego manifest file (path: properties =>
consul => log_level: debug).


RESPONSE: [2015-12-10T09:35:07-05:00]
HTTP/1.1 500 Internal Server Error
Content-Length: 110
Content-Type: application/json;charset=utf-8
Date: Thu, 10 Dec 2015 14:35:07 GMT
Server: nginx
X-Cf-Requestid: 8328f518-4847-41ec-5836-507d4bb054bb
X-Content-Type-Options: nosniff
X-Vcap-Request-Id:
324d0fc0-2146-48f0-6265-755efb556e23::5c869046-8803-4dac-a620-8ca701f5bd22

{
"code": 170016,
"description": "Runner error: stop app failed: 503",
"error_code": "CF-RunnerError"
}

FAILED
Server error, status code: 500, error code: 170016, message: Runner
error: stop app failed: 503
FAILED
Server error, status code: 500, error code: 170016, message: Runner
error: stop app failed: 503
FAILED
Error: Error executing cli core command
Starting app testing89 in org PAAS / space dev as admin...

FAILED

Server error, status code: 500, error code: 170016, message: Runner
error: stop app failed: 503


- Anuj
