Re: A hung etcd used by hm9000 delays the detection of crashed application instances
Gwenn Etourneau
Just one question: what about moving to Diego to get rid of HM9000 / DEA?
On Tue, Dec 15, 2015 at 9:44 PM, Masumi Ito <msmi10f(a)gmail.com> wrote:
Hi,
Re: [abacus] Accommodating for plans in a resource config
Benjamin Cheng
> Not sure about that. AIUI with that refined design plans can now use ...

Yes, I agree that it makes sense. We wouldn't want to deal with metrics/measures existing in specific plans mixing and matching with each other unless a real need pops up.
Re: [cf-bosh] PLEASE READ: BOSH-Lite stemcell broken
Dmitriy Kalinin <dkalinin@...>
we believe 3146 addresses the problem.
Sent from my iPhone

On Dec 15, 2015, at 1:52 PM, Aristoteles Neto <dds.neto(a)gmail.com> wrote:
Re: How to estimate reconnection / failover time between gorouter and nats
Christopher Piraino <cpiraino@...>
Hi Masumi,
The sequence/estimation that you describe sounds accurate to us. I think ideally we should configure the NATS reconnection logic to initiate a reconnect before the stale_threshold value is reached. We have put a story in our icebox <https://www.pivotaltracker.com/story/show/110199022> for our PM to prioritize.

We also have some upcoming work around being able to configure the router to not prune routes when NATS is down. See this issue <https://github.com/cloudfoundry/gorouter/issues/102> on the GoRouter for related discussion.

Chris and Shash - CF Routing Team

On Mon, Dec 7, 2015 at 8:28 AM, Masumi Ito <msmi10f(a)gmail.com> wrote:
Hi,
Re: [cf-bosh] PLEASE READ: BOSH-Lite stemcell broken
Aristoteles Neto
I see there is a new version (3146).
Does that address the issues relating to 3126? Or should we stick with 2776 until advised otherwise?

Aristoteles Neto
dds.neto(a)gmail.com

On 1/12/2015, at 8:43, Amit Gupta <agupta(a)pivotal.io> wrote:
Hey all,
[ANNOUNCE] CVE-2015-5350: Garden Nstar vulnerability
Chip Childers <cchilders@...>
CVE-2015-5350: Garden Nstar vulnerability

Severity: High

Vendor: Cloud Foundry Foundation

Versions Affected: Garden versions 0.22.0-0.329.0

Description: A vulnerability has been discovered in the garden-linux nstar executable that allows access to files on the host system. By staging an application on Cloud Foundry using Diego and Garden installations with a malicious custom buildpack, an end user could read files on the host system that the BOSH-created vcap user has permission to read, and then package them into their app droplet.

Affected Cloud Foundry Products and Versions:
- All Garden versions prior to v0.330.0

Mitigation:
- The Cloud Foundry project recommends that Cloud Foundry deployments using Diego and Garden upgrade to Garden Linux release v0.330.0 or higher. Diego release v0.1444.0 includes Garden Linux v0.330.0.

Credit: Julian Friedman, Will Pragnell, Eric Malm

References:
* Garden-Linux-Release <https://github.com/cloudfoundry-incubator/garden-linux-release>
* Diego-Release <https://github.com/cloudfoundry-incubator/diego-release>
Failing to push standalone java app
Rahul Gupta
Hi,

I am trying to push a standalone Java app that has a 'public static void main(..)' and uses other dependencies. I tried setting the classpath in the jar's MANIFEST.MF, and also created a new jar that contains the dependent jars in its root, but neither helped: 'cf push -p xxxxxxx.jar' fails while resolving runtime dependencies, e.g.

ERR Exception in thread "main" java.lang.NoClassDefFoundError: com/XXX/client/AbcXyz

Here is the content of MANIFEST.MF:

Manifest-Version: 1.0
Archiver-Version: Plexus Archiver
Built-By: smokingfly
Class-Path: XXX-123.jar AAA.789.jar
Created-By: Apache Maven 3.2.3
Build-Jdk: 1.8.0_40
Main-Class: com.cf.samples.TestClient

TestClient is the class with the main method. I could not find any documentation that could help me with this. Could someone please help? Many thanks.
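A common fix for this class of failure (assuming a Maven build, as the Created-By header suggests) is to avoid Class-Path resolution entirely and push a single self-contained jar, e.g. built with the maven-shade-plugin. A sketch of the pom.xml fragment; the main class name is taken from the manifest above:

```xml
<!-- pom.xml, under build/plugins: bundle all dependencies into one jar -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- writes Main-Class into the shaded jar's manifest -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.cf.samples.TestClient</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After `mvn package`, pushing the shaded jar from `target/` should find the main class and all dependencies inside the single jar, with no Class-Path lookups at runtime.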
Re: [abacus] Accommodating for plans in a resource config
Jean-Sebastien Delfino
On Fri, Dec 11, 2015 at 4:47 PM, Benjamin Cheng <bscheng(a)us.ibm.com> wrote:
> Abacus will want to support plans in its resource config (as mentioned in ...

+1, that makes sense to me, as different plans may want to use different measures, metrics, and metering, accumulation and aggregation functions.

> Despite moving metrics and measures under plans, there will be a need of a ...

Not sure about that. AIUI, with that refined design plans can now use different metrics, so usage gets aggregated at the plan level rather than the resource level (as it wouldn't make sense to aggregate usage from different plans metered using different metrics). That means that the aggregation, summary and charge functions only apply to the plan level rather than the resource level.

Assuming that my above statement that 'aggregation, summary and charge functions only apply to the plan level' is correct, there's no 'common section' anymore, so no problem with processing usage in that non-existent common section anymore :)

Makes sense? Thoughts/Concerns/Suggestions?

- Jean-Sebastien
Re: Organization quota definition-questions
Juan Antonio Breña Moral <bren at juanantonio.info...>
Sorry, I didn't reply to some of your questions before.

1. I didn't test it. In my tests I defined a quota at the org level, but I will test it.
2. I answered with the pseudocode.
3. The space acquires the limits defined in the quota for the organization.

Juan Antonio
Re: Organization quota definition-questions
Juan Antonio Breña Moral <bren at juanantonio.info...>
Hi,

You are right. disk_quota is a parameter defined at the app level only:
http://apidocs.cloudfoundry.org/213/apps/creating_an_app.html

When you create a new app you define a set of parameters, and one of them is disk_quota, but when you define an org quota, disk_quota is not defined at that level:
http://apidocs.cloudfoundry.org/213/organization_quota_definitions/creating_a_organization_quota_definition.html

I am not sure if someone from Pivotal could confirm this fact, but I think that the CC API doesn't have that feature at the org/space level. Anyway, at the moment, using the API, it is possible to do the same task, just not in a direct way:

IDEA:

spaces = getSpacesFromOrg(org_guid)
long org_used_disk = 0;
for each (space in spaces) {
    apps = getAppsFromSpace(space_guid)
    for each (app in apps) {
        app_stat = getAppSummary(app_guid) or getAppStats(app_guid)
        // http://apidocs.cloudfoundry.org/226/apps/get_app_summary.html
        // http://apidocs.cloudfoundry.org/226/apps/get_detailed_stats_for_a_started_app.html
        org_used_disk += app_stat.getDiskQuota();
    }
}
System.out.println("Disk quota for current org: " + org_used_disk);

Juan Antonio
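Juan Antonio's pseudocode could look like this in Python, as a rough sketch: the per-app `disk_quota` field (in MB) comes from the app summary endpoint linked above, and the summaries are mocked here instead of fetched over HTTP.

```python
# Sum the per-app disk quota (MB) across an org, given app summary payloads
# shaped like /v2/apps/:guid/summary responses. HTTP fetching is omitted;
# the list below is mock data standing in for real API responses.
def org_disk_quota_mb(app_summaries):
    return sum(app["disk_quota"] for app in app_summaries)

apps = [{"disk_quota": 1024}, {"disk_quota": 2048}, {"disk_quota": 512}]
print(org_disk_quota_mb(apps))  # 3584
```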
A hung etcd used by hm9000 delays the detection of crashed application instances
Masumi Ito
Hi,
I found that a hung etcd delayed the detection of crashed application instances, resulting in slow recovery. Although this depended on which etcd VM each hm9000 process was connected to, it took up to about 15 min to recover, which I think is too long. Does anyone know how to calculate the time for hm9000 to detect a hung etcd VM and switch to a healthy etcd? I have encountered two different scenarios:

1. The hm9000 analyzer was connected to the hung etcd, but the hm9000 listener was connected to a normal etcd. (About 8 min for the analyzer to be recovered; the other hm9000 analyzer took over instead.) The analyzer seemed to hang just after the connected etcd hung, because "Analyzer completed succesfully" was not found in the log. After approximately 8 min, the other hm9000 analyzer acquired the lock and started to work instead. It then identified the crashed instance and enqueued a start message, and the crashed app was relaunched within ten min of the detection.

2. The hm9000 analyzer was connected to a normal etcd, but the hm9000 listener was connected to the hung etcd. (About 15 min for the listener to be recovered; the same hm9000 listener seemed to recover somehow.) The listener started to fail to sync heartbeats just after the connected etcd hung. After 15 min, "Save took too long. Not bumping freshness." appeared in the listener's log, and the analyzer also complained about the stale actual state, "Analyzer failed with error - Error:Actual state is not fresh", and stopped analyzing. After 10 sec the hm9000 listener recovered somehow and started to bump freshness periodically; the analyzer then resumed analyzing the actual and desired state and raised the request to start the crashed instance.
Regards,
Masumi

--
View this message in context: http://cf-dev.70369.x6.nabble.com/A-hanged-etcd-used-by-hm9000-makes-an-impact-on-the-delayed-detection-time-of-crashed-application-ins-tp3096.html
Sent from the CF Dev mailing list archive at Nabble.com.
Re: Organization quota definition-questions
Ponraj E
Hi Juan Antonio,
Thanks for the reply. The API that you mentioned gives me the memory usage of the org, not the disk quota/usage of the org, which is the info I need. In addition, I have added a couple more questions in my latest reply:

1. Sometimes the sum of the space quota definitions exceeds the org quota definition. Is this a valid use case or a bug?
2. Currently at the org level there is no API to display the disk quota limit/usage; it exists only at the application level. How do we approach this?
3. Also, at the space level, a space may not be associated with any space quota definition. How do we then get the total resources available (memory, services, routes) for that space?

Regards,
Ponraj
Re: Organization quota definition-questions
Juan Antonio Breña Moral <bren at juanantonio.info...>
Good morning,

Yes, it is possible. If you look at the PWS panel or Bluemix you can see that information. Every organization is bound to an OrganizationQuota, and this definition affects every application deployed in any space bound to that organization.

The REST method used to get the definition is:
http://apidocs.cloudfoundry.org/213/organization_quota_definitions/retrieve_a_particular_organization_quota_definition.html

The method to read the memory used is:
http://apidocs.cloudfoundry.org/222/organizations/retrieving_organization_memory_usage.html

You have an example here:
https://github.com/prosociallearnEU/cf-nodejs-dashboard/blob/master/services/HomeService.js#L69-L79

Remember that the memory used is the active memory. You can have many applications staged but stopped; memory is added to the counter only when you start an application in a space.

Juan Antonio
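Combining the two endpoints above into a "memory used vs. quota" figure could look like this sketch (response bodies are mocked; the field names follow the linked apidocs pages):

```python
# Report org memory use against its quota. `quota` mirrors a
# /v2/quota_definitions/:guid response entity, `usage` mirrors a
# /v2/organizations/:guid/memory_usage response; both mocked here.
def org_memory_report(quota, usage):
    limit = quota["entity"]["memory_limit"]  # MB, from the org quota definition
    used = usage["memory_usage_in_mb"]       # MB, active (started) apps only
    return {"used_mb": used, "limit_mb": limit, "free_mb": limit - used}

quota = {"entity": {"memory_limit": 10240}}
usage = {"memory_usage_in_mb": 2048}
print(org_memory_report(quota, usage))  # {'used_mb': 2048, 'limit_mb': 10240, 'free_mb': 8192}
```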
Re: about consul_agent's cert
于长江 <yuchangjiang at cmss.chinamobile.com...>
It works, thank you!
于长江
15101057694

Original message
From: Gwenn Etourneau <getourneau(a)pivotal.io>
To: Discussions about Cloud Foundry projects and the system overall. <cf-dev(a)lists.cloudfoundry.org>
Sent: Monday, December 14, 2015, 12:48
Subject: [cf-dev] Re: about consul_agent's cert

Please read the documentation: http://docs.cloudfoundry.org/deploying/common/consul-security.html

On Mon, Dec 14, 2015 at 11:35 AM, 于长江 <yuchangjiang(a)cmss.chinamobile.com> wrote:

Hi, when I deployed cf-release, the consul_agent job failed to start, and I found this error log on the VM:

== Starting Consul agent...
== Error starting agent: Failed to start Consul server: Failed to parse any CA certificates

Then I found that the configuration in cf's manifest file is not correct; it looks like this:

consul:
  encrypt_keys:
  - CONSUL_ENCRYPT_KEY
  ca_cert: CONSUL_CA_CERT
  server_cert: CONSUL_SERVER_CERT
  server_key: CONSUL_SERVER_KEY
  agent_cert: CONSUL_AGENT_CERT
  agent_key: CONSUL_AGENT_KEY

I have no idea how to complete these fields. Can someone give me an example? Thanks!

于长江
15101057694
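For reference, these fields take PEM blocks as YAML block literals. A shape-only sketch with placeholder values (the consul-security docs linked above cover generating the real keys and certs):

```yaml
# Shape sketch only: every value below is a placeholder, not a working key.
consul:
  encrypt_keys:
  - "<base64 gossip encryption key>"
  ca_cert: |
    -----BEGIN CERTIFICATE-----
    <CA certificate for your deployment>
    -----END CERTIFICATE-----
  server_cert: |
    -----BEGIN CERTIFICATE-----
    <server certificate signed by that CA>
    -----END CERTIFICATE-----
  server_key: |
    -----BEGIN RSA PRIVATE KEY-----
    <server private key>
    -----END RSA PRIVATE KEY-----
  agent_cert: |
    -----BEGIN CERTIFICATE-----
    <agent certificate signed by that CA>
    -----END CERTIFICATE-----
  agent_key: |
    -----BEGIN RSA PRIVATE KEY-----
    <agent private key>
    -----END RSA PRIVATE KEY-----
```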
Re: Organization quota definition-questions
Ponraj E
Hi,
Since the documentation for quota definitions is quite unclear at the moment, I have more questions regarding the same. I want to display the resource consumption (memory, disk usage, etc.) at the org and space level.

1. Sometimes the sum of the space quota definitions exceeds the org quota definition. Is this a valid use case or a bug?
2. Currently at the org level there is no API to display the disk quota limit/usage; it exists only at the application level. How do we approach this?
3. Also, at the space level, a space may not be associated with any space quota definition. How do we then get the total resources available (memory, services, routes) for that space?

Regards,
Ponraj
Organization quota definition-questions
Ponraj E
Hi,
Is it possible to get the disk quota at the organization level? As far as I can see, the quota definition API doesn't return the disk quota (upper limit) info. I want to calculate used disk quota / total disk quota for an organization.

Regards,
Ponraj
Re: Certificate management for non-Java applications
Daniel Mikusa
I think it depends on the language / runtime / library and where it looks for the default set of certs. It's easy with Java because it uses its own cert store and that file is owned by the vcap user. If a language / runtime / library looks at `/etc/ssl/certs`, you can't change that as a user (at least not from staging / runtime).

The best option, just as with Java, is if your application provides you with the facilities to configure its use of certs. I'd imagine that most languages / runtimes / libraries provide a way to override the defaults and use your own certs.

Besides that, you might be able to set environment variables to point the default cert store to a different location. I'm not aware of a standard one though, so it's likely going to depend on the specific language / runtime / library and what it supports.

Dan

On Mon, Dec 14, 2015 at 6:01 PM, john mcteague <john.mcteague(a)gmail.com> wrote:

> Previous threads have focused on adding a trusted CA to the JDK's trust ...
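One concrete instance of the environment-variable approach Dan mentions: OpenSSL-backed runtimes, including Python's `ssl` module, honor `SSL_CERT_FILE` / `SSL_CERT_DIR` as overrides for the default verify paths. A minimal sketch, using a throwaway file to stand in for a real CA bundle:

```python
import os
import ssl
import tempfile

# Stand-in for a real PEM bundle; in a CF app this could be a file your
# buildpack or start command drops into the app directory.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as f:
    bundle = f.name

os.environ["SSL_CERT_FILE"] = bundle  # must be set before contexts are created
paths = ssl.get_default_verify_paths()
print(paths.cafile)  # now points at our bundle, not /etc/ssl/certs/...
```

Whether a given runtime honors these variables is runtime-specific; this only demonstrates the mechanism for OpenSSL-backed stacks.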
[Abacus] Tagging usage
KRuelY <kevinyudhiswara@...>
Hi,
I'm looking for a way to classify usage. My use case: I have two types of usage, type A and type B. The initial thought with the current Abacus is to create two separate plans (say plan a and plan b), so that usage of type A goes under 'plan a' and usage of type B under 'plan b'. Due to past design decisions I am unable to do so, hence I need to separate usage under the same plan into the two types.

A more detailed use case is pricing changing in the middle of the month. Here I need to separate my usage into type A, which uses the old pricing, and type B, which uses the new pricing. The main reason for splitting them up is that I would like two different buckets for the two usages, because keeping them in the same bucket gives inaccurate results:

1. The price changes on the 15th. Get the quantity of usage from the beginning of the month (A) and the quantity at the end of the month (B). The charge would be: A * old price + (B - A) * new price. This works if the accumulation formula is a plain sum, but not if the formula is max / average.
2. Query from the beginning of the month to the middle, and from the middle to the end. Querying from x to y is not supported, and with the accumulating and retracting dataflow model that we have, this is not possible.
3. One way that would work is to use the flexibility of the meter, accumulate, and aggregate functions in the resource config to separate usage of type A and type B. I don't think this is a good design, but the point is that we can use the flexibility of those functions to maybe generate a new idea.

Would tagging usage be a good idea? For example, under a plan we would have:

plan:
  bucket A:
    aggregated_usage:
  bucket B:
    aggregated_usage:

My concern is that we would have to add another (key, value) pair to create another landmark window, and the submitted usage would need another field to serve as the 'tag'. Any thoughts or ideas on how to implement this? Other than creating two separate plans or tagging, the ideas mentioned above are not really viable. Thanks!

--
View this message in context: http://cf-dev.70369.x6.nabble.com/Abacus-Tagging-usage-tp3089.html
Sent from the CF Dev mailing list archive at Nabble.com.
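Use case 1 above, sketched as code. This assumes plain-sum accumulation; the names are illustrative, not Abacus APIs, and prices are in integer cents to keep the arithmetic exact:

```python
# Charge for a sum-accumulated quantity across a mid-month price change:
# a = quantity accumulated before the change, b = month-end total, so
# (b - a) is the quantity metered after the change.
def prorated_charge_cents(a, b, old_price_cents, new_price_cents):
    return a * old_price_cents + (b - a) * new_price_cents

# 100 units at 5 cents, then 150 more units at 3 cents:
print(prorated_charge_cents(100, 250, 5, 3))  # 950
```

As the post notes, this breaks down when the accumulation function is max or average, since A is then no longer simply "the part charged at the old price".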
Certificate management for non-Java applications
john mcteague <john.mcteague@...>
Previous threads have focused on adding a trusted CA to the JDK's trust store at application startup, a pattern that I have employed also. We are facing increased demand from our non-Java developers for the same functionality. Whether it be custom CAs, certs for authentication (against something like MQ, for example), or our internal LDAP server which requires ldaps, we need a way to add user-defined certificates at app deploy time based on user requirements.

My work with Java buildpacks has resulted in a certificate-as-a-service style function: declare which cert from a certificate store should be injected into the app at runtime. What I lack for non-Java runtimes is a reliable way to get those certs into the correct Linux container directory, either during staging or at app startup.

Have others been able to establish a pattern around this? Without this ability we go from a polyglot platform to Java only.

Thanks,
John
Re: Web sockets + Cloud Foundry
Matthew Sykes <matthew.sykes@...>
James Bayer posted about a fun experiment with web sockets a while back. It might be a good starting point: http://www.iamjambay.com/2013/12/send-interactive-commands-to-cloud.html

You need to make sure that the URL uses the correct protocol and port for your CF target. You can reference `doppler_logging_endpoint` from /v2/info as a template.

On Mon, Dec 14, 2015 at 4:53 PM, Lakshman Mukkamalla (lmukkama) <lmukkama(a)cisco.com> wrote:

> Hi CF Dev team, ...

--
Matthew Sykes
matthew.sykes(a)gmail.com
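A small sketch of deriving a websocket URL from the /v2/info payload. The payload is mocked here, and `/firehose` is just an illustrative path, not necessarily the one your client needs:

```python
# Build a websocket URL from the doppler endpoint advertised by /v2/info.
# `info` stands in for the parsed JSON body of a GET /v2/info response.
def websocket_url(info, path):
    endpoint = info["doppler_logging_endpoint"]  # e.g. "wss://doppler.example.com:443"
    return endpoint.rstrip("/") + path

info = {"doppler_logging_endpoint": "wss://doppler.example.com:443"}
print(websocket_url(info, "/firehose"))  # wss://doppler.example.com:443/firehose
```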