New feature discussion: users can use CF to deploy an app to a specific zone
Liangbiao
Hi,
Currently, a DEA can be assigned to a "zone", and the Cloud Controller can schedule app instances according to zone (https://github.com/cloudfoundry/cloud_controller_ng/blob/965dbc4bdf65df89f382329aef39f86a916b3f05/lib/cloud_controller/dea/pool.rb#L47). So I wonder whether we can push this further: for example, letting the app developer specify which zone to deploy the app to. Regards, Rexxar
|
|
Re: bosh-lite diego "found no compatible cell"
Ted Young
"found no compatible cell" is the error you will get when all diego Cells
toggle quoted messageShow quoted text
have failed to deploy. Start by double checking that the cell is up and running via `bosh vms`. -Ted
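For example (a sketch, assuming the standard BOSH-Lite deployment name cf-warden-diego and a cell_z1/0 job):

    bosh vms cf-warden-diego                          # check that the cell VMs are running
    bosh ssh cell_z1 0                                # if a cell is failing, inspect its rep logs
    tail -n 100 /var/vcap/sys/log/rep/rep.stdout.log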
On Tue, Nov 24, 2015 at 6:43 PM, Eric Malm <emalm(a)pivotal.io> wrote:
Hi, Christian,
|
|
Re: bosh-lite diego "found no compatible cell"
Eric Malm <emalm@...>
Hi, Christian,
Thanks for asking. From what you've described, it sounds like Cloud Controller is able to tell Diego to run the app, but the app instance isn't getting placed on a cell. It's probably worth looking at the logs of the auctioneer, in /var/vcap/sys/log/auctioneer/auctioneer.stdout.log on the brain_z1/0 Diego VM, and the cell rep, in /var/vcap/sys/log/rep/rep.stdout.log on the cell_z1/0 Diego VM (assuming a standard BOSH-Lite deployment). It might even be useful to follow those logs in real-time with `tail -f` while you stop and start the CF app. Also, what versions of CF, Diego, Garden-Linux, and the BOSH-Lite stemcell do you have deployed? Thanks, Eric, CF Runtime Diego PM
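Concretely, that might look like the following (my-app is a placeholder for your app name):

    # on brain_z1/0:
    tail -f /var/vcap/sys/log/auctioneer/auctioneer.stdout.log
    # on cell_z1/0, in a second session:
    tail -f /var/vcap/sys/log/rep/rep.stdout.log
    # meanwhile, from your workstation:
    cf stop my-app && cf start my-app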
On Mon, Nov 23, 2015 at 2:19 PM, Christian Stocker <chregu(a)liip.ch> wrote:
Hi
|
|
Warden stemcell 3126 not usable for CF+Diego deployments to BOSH-Lite
Eric Malm <emalm@...>
Hi, all,
If you've tried using the 3126 Warden stemcell for your CF and Diego deployments to BOSH-Lite, you will likely have discovered that Diego doesn't deploy correctly. As it turns out, a change to the resolvconf configuration in that version of the stemcell prevents the consul agent from providing DNS to CF and Diego components. Consequently, some Diego components are unable even to start correctly, and Cloud Controller will be unable to communicate with the Diego deployment.

The BOSH team is working on fixing the configuration issue in https://www.pivotaltracker.com/n/projects/956238/stories/107958688, and there are more details about the problem available at https://github.com/cloudfoundry/bosh-lite/issues/315. For the meantime, we recommend using the previous BOSH-Lite Warden stemcell version, 2776, instead. I've already amended the BOSH-Lite instructions in the diego-release README to obtain this specific version of the stemcell from bosh.io, rather than the latest version.

Also, please note that this issue affects only the BOSH-Lite Warden stemcell; stemcells for non-BOSH-Lite IaaSes remain compatible with consul-agent-based DNS.

Thanks,
Eric Malm, CF Runtime Diego PM
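For anyone pinning the stemcell by hand, the bosh.io download URL follows the usual versioned pattern (double-check the exact URL against the README):

    bosh upload stemcell https://bosh.io/d/stemcells/bosh-warden-boshlite-ubuntu-trusty-go_agent?v=2776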
|
|
connection draining for TCP Router in cf-routing-release
Shannon Coen
On the CAB call Dr. Nic asked about support in the routing tier for connection draining. I asked him out-of-band to elaborate, then realized this was a topic the community might be interested in. Nic explained that he's looking for a TCP router to route requests from apps on CF to a clustered service, and wants to allow graceful draining of requests before a backend is moved.

When a backend for a route is removed from the routing table, the TCP Router will prevent new requests for the route from being routed to that backend, and will reject requests for the route when all associated backends are removed. The routing table is updated via the Routing API; the TCP router fetches its configuration by subscribing to the API via SSE, as well as via a periodic bulk fetch. When backends are removed for a route, existing connections remain up until closed by either the client or the backend. We don't currently sever open connections after a timeout.

In CF, when Diego removes an app instance it sends a TERM to the process in the container, which has 10s to drain active connections before the container is torn down and all its processes killed. In parallel, the backend is removed from the route, preventing new connections.

Nic: Does the existing behavior described above meet your needs, or would you require a timeout and proactive connection severing by the router? I recall we found this difficult using HAProxy last year, leading us to build the Switchboard proxy for cf-mysql-release. Have you considered Switchboard?

In your use case, could the IPs of your cluster nodes change at any time, or only on a deploy? In either case, you could use the Routing API to configure the router with the node addresses (similar to the way clients must currently register routes via NATS). Would you expect other clients to register routes with the same deployment of the API, or would you isolate it to the deployment of your service? The Routing API, like NATS, doesn't support multi-tenant isolation yet, so multiple clients could potentially add unrelated backends for the same route.

Finally, are you only interested in TCP routing? If so, I imagine you would deploy the routing-release with only the API and TCP router jobs.

Shannon Coen
Product Manager, Cloud Foundry
Pivotal, Inc.
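For reference, registering a backend through the Routing API looks roughly like this (a sketch based on the routing-api README of the time; the host, port, router group GUID, and backend address are placeholders, so verify the endpoint and fields against your deployment):

    # fetch an OAuth token from UAA first, e.g. into $TOKEN
    curl http://ROUTING_API_HOST:3000/routing/v1/tcp_routes/create \
      -X POST \
      -H "Authorization: bearer $TOKEN" \
      -d '[{"router_group_guid": "ROUTER_GROUP_GUID", "port": 5200,
            "backend_ip": "10.1.1.12", "backend_port": 60000, "ttl": 60}]'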
|
|
Re: [abacus] Refactor Aggregated Usage and Aggregated Rated Usage data model
Jean-Sebastien Delfino
Hi all,
Here's an update on this topic and the design discussions Assk, Ben and I had in the last few days. I'll start with a description of the problem we're trying to solve.

Abacus currently computes and stores the aggregated usage at various levels within an org in real time. Each time new usage for resource instances gets submitted, we compute your latest aggregated usage at the org, space, app, resource and plan level, and store that in a new document keyed by the org id and the current time. We effectively write a history of your org's aggregated usage in the Abacus database. That design allows us to efficiently report your latest usage, your usage history, or trigger usage limit alerts in real time, for example, simply because we always have your latest usage for a given time at hand in a single doc, as opposed to having to run complex database queries pulling all your usage data into an aggregation when it's needed.

So, that design is all good until somebody creates a thousand (or even a hundred) apps in the org. With many apps, our aggregated usage (JSON) docs get pretty big as we're keeping track of the aggregated usage for each app, JSON is not very space-efficient at representing all that data (that's a euphemism), and since we're writing a new doc for each new submitted usage, we eventually overload our Couch database with these big JSON docs. Long story short, this discussion is about trying to optimize our data model for aggregated usage to fix that problem.

It's also an example of the typical tension in systems that need to stream a lot of data, compute some aggregates, and make quick decisions based on them: (a) do you proactively compute and store the aggregated values in real time as you're consuming your stream of input data? or (b) do you just write the input data and then run a mix of pseudo-real-time and batch queries over and over on that data to compute the aggregates later? Our current design is along the lines of (a), but we're starting to also poke at ideas from the (b) camp to mitigate some of the issues of the (a) camp.

The initial proposal described by Assk earlier in this thread was to split the single org-level doc containing all the usage aggregations within the org into smaller docs: one doc per app, for example (aka consumer in Abacus, as we support usage from other things than pure apps). That's what he was calling 'normalized' usage, since the exercise of coming up with that new structure would be similar to a 'normalization' of the data in the relational database sense, as opposed to the 'denormalization' we went through to design the structure of our current aggregated usage doc (a JSON hierarchical structure including some data duplication).

Now, while that data 'normalization' would help reduce the size of the docs and the amount of data written to record the history of your org's aggregated usage, in the last few days we've also started to realize that it would on the other hand increase the amount of data we'd have to read, to retrieve all the little docs representing the current aggregated usage and 'join' them into a complete view of the org's aggregated usage before adding new usage to it. Like I said before, a tension between two approaches: (a) writes a lot of data but is cheap on reads; (b) writes the minimum but requires a lot of reads. Nothing's easy or perfect :)

So the next step here is going to be an evaluation of some of the trade-offs between:

a) write all the aggregated usage data for an org in one doc like we do now, but simplify and refactor a bit the JSON format we use to represent it, in an attempt to make that JSON representation much smaller;

b) split the aggregated usage into separate docs, one per app, linked together by a parent doc per org containing their ids, and optimize (with caching, for example) the reads and 'joins' of all the docs forming the aggregated usage for the org;

c) a middle-ground approach where we'll store the aggregated usage per app in separate docs, but maintain the aggregated usage at the upper levels (org, space, resource, plan) in the parent doc linking the app usage docs together, and explore what constraints or limitations that would impose on our ability to trigger real-time usage limit alerts at any org, space, resource, plan, app, etc. level.

This is a rather complex subject, so please feel free to ask questions or send any thoughts here, or in the tracker and Github issues referenced by Assk earlier if that's easier.

Thanks!

- Jean-Sebastien

On Fri, Nov 20, 2015 at 11:09 AM, Saravanakumar A Srinivasan <sasrin(a)us.ibm.com> wrote:

Started to look into two user stories ([1] and [2]) titled "Organize the
|
|
Re: Unable to deploy application
Deepak Arn <arn.deepak1@...>
Hi,
I tried with lower as well as higher limits, but the command just hangs and no packets are received.

    ubuntu(a)test:~$ ping github.com -M dont -s 1400
    PING github.com (192.30.252.131) 1400(1428) bytes of data.
    ^C
    --- github.com ping statistics ---
    282 packets transmitted, 0 received, 100% packet loss, time 281921ms

    ubuntu(a)test:~$ ping github.com -M dont -s 1420
    PING github.com (192.30.252.128) 1420(1448) bytes of data.
    ^C
    --- github.com ping statistics ---
    136 packets transmitted, 0 received, 100% packet loss, time 135493ms

    ubuntu(a)test:~$ ping github.com -M dont -s 1430
    PING github.com (192.30.252.129) 1430(1458) bytes of data.
    ^C
    --- github.com ping statistics ---
    5 packets transmitted, 0 received, 100% packet loss, time 4024ms

    ubuntu(a)test:~$ ping github.com -M dont -s 1434
    PING github.com (192.30.252.129) 1434(1462) bytes of data.
    ^C
    --- github.com ping statistics ---
    10 packets transmitted, 0 received, 100% packet loss, time 9047ms

    ubuntu(a)test:~$ ping github.com -M dont -s 1450
    PING github.com (192.30.252.128) 1450(1478) bytes of data.
    ^C
    --- github.com ping statistics ---
    11 packets transmitted, 0 received, 100% packet loss, time 10027ms

    ubuntu(a)test:~$ ping github.com -M dont -s 1500
    PING github.com (192.30.252.128) 1500(1528) bytes of data.
    ^C
    --- github.com ping statistics ---
    6 packets transmitted, 0 received, 100% packet loss, time 5031ms

    ubuntu(a)test:~$ ping github.com -M dont -s 1462
    PING github.com (192.30.252.128) 1462(1490) bytes of data.
    ^C
    --- github.com ping statistics ---
    6 packets transmitted, 0 received, 100% packet loss, time 5027ms
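For what it's worth, `-M dont` permits fragmentation, so these probes don't isolate an MTU limit on their own; path-MTU probing usually sets the DF bit with `-M do`, where the first size that fails with "Message too long" marks the limit (a sketch, assuming Linux iputils ping):

    ping -c 3 -M do -s 1372 github.com   # 1372 + 28 bytes of headers = 1400-byte packet
    ping -c 3 -M do -s 1472 github.com   # 1472 + 28 = 1500, the standard Ethernet MTU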
|
|
Re: Unable to deploy application
CF Runtime
Hey Deepak,
I found that you provided some more information about your problem on Github: https://github.com/cloudfoundry/cf-release/issues/823

Was there any message from the ping about why packets weren't being received? Have you tried a smaller limit than 1426?

Natalie & Mikhail
OSS Release & Integration
On Fri, Nov 20, 2015 at 9:30 AM, Deepak Arn <arn.deepak1(a)gmail.com> wrote:
Hi,
|
|
Re: REGARDING_api_z1/0_CANARY_UPDATE
CF Runtime
Have you checked the control script logs in the `/var/vcap/sys/log/`
folder? If the jobs are failing to start, that's a good place to look. If you send them to us we can tell you more. Also, what infrastructure are you deploying Cloud Foundry to, and can you send us the manifest you're using to deploy it?

Natalie & Mikhail
OSS Integration & Runtime
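For example, on the failing VM (`api_z1 0` here is a placeholder for whichever job the canary update fails on):

    bosh ssh api_z1 0
    sudo -i
    /var/vcap/bosh/bin/monit summary        # shows which processes monit reports as not running
    tail -n 100 /var/vcap/sys/log/*/*.log   # control script and job logs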
On Thu, Nov 19, 2015 at 1:19 AM, Parthiban A <senjiparthi(a)gmail.com> wrote:
Hello All,
|
|
Re: Staging Error while deploying application on OpenStack
D vidzz
Hi Daniel,
I tried curl -vv https://github.com/cloudfoundry/java-buildpack/ from the instance (on OpenStack) where CF is installed, and that works. Regarding offline buildpacks, CF already has offline buildpacks, and I also added a new buildpack (java-custom); below is the output of the cf buildpacks command:

    buildpack          position   enabled   locked   filename
    java-custom        1          true      false    java-buildpack-master.zip
    java_buildpack     2          true      false    java-buildpack-v3.3.zip
    ruby_buildpack     3          true      false    ruby_buildpack-cached-v1.6.7.zip
    nodejs_buildpack   4          true      false    nodejs_buildpack-cached-v1.5.0.zip

I pushed my app as `cf push Web2291 -b java-custom` and also using the existing buildpack as `cf push Web2291 -b java_buildpack`. Both times it gets stuck; see the log below:

    Updating app Web2291 in org DevBox / space Applications as admin...
    OK
    Uploading Web2291...
    Uploading app files from: C:\Users\umroot\workspaceKeplerJee\Web2291
    Uploading 15.3K, 29 files
    Done uploading
    OK
    Stopping app Web2291 in org DevBox / space Applications as admin...
    OK
    Starting app Web2291 in org DevBox / space Applications as admin...
    -----> Downloaded app package (1.1M)

and in the logs it's the same as before:

    2015-11-24T16:39:25.25-0500 [DEA/0] OUT Got staging request for app with id c4b8522c-5157-4fa4-bb73-814f63603b23
    2015-11-24T16:39:25.29-0500 [STG/0] OUT
    2015-11-24T16:39:25.29-0500 [STG/0] ERR
    2015-11-24T16:39:27.06-0500 [STG/0] OUT -----> Downloaded app package (1.1M)
    2015-11-24T16:46:23.28-0500 [DEA/0] OUT Got staging request for app with id c4b8522c-5157-4fa4-bb73-814f63603b23
    2015-11-24T16:46:23.32-0500 [STG/0] OUT
    2015-11-24T16:46:23.32-0500 [STG/0] ERR
    2015-11-24T16:46:25.66-0500 [STG/0] OUT -----> Downloaded app package (1.1M)
    2015-11-24T16:46:47.06-0500 [DEA/0] OUT Got staging request for app with id c4b8522c-5157-4fa4-bb73-814f63603b23
    2015-11-24T16:46:47.09-0500 [STG/0] OUT
    2015-11-24T16:46:47.09-0500 [STG/0] ERR
    2015-11-24T16:46:48.80-0500 [STG/0] OUT -----> Downloaded app package (1.1M)

Thanks,
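One more thing that may be worth checking: the curl above ran on the CF host, but staging runs on the DEA VM, where connectivity can differ. A sketch, assuming your deployment has a dea_z1/0 job:

    bosh ssh dea_z1 0
    curl -v https://github.com/cloudfoundry/java-buildpack/   # verify the DEA itself can reach GitHub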
|
|
Re: diego: disk filling up over time
Tom Sherrod <tom.sherrod@...>
Hi Eric,
Thank you. I am responding below with what I have available. Unfortunately, when the problem presents, developers are down, so the current resolution is to recreate cells. Looking at one below that is 98% full, an opportunity for additional details may arise soon. Answers below, inline.

- What are the exact errors you're seeing when CF users are trying to make containers? The errors from CF CLI logs or rep/garden logs would be great.

Did not capture detailed logs. "FAILED StagingError" was all that was captured. I've asked to get more information on the next failure, which may be coming up soon; I'm looking at a cell with 98% filled. No issue reported as of yet; of course, there are 8 cells to choose from.

- What's the total amount of disk space available on the volume attached?

    /dev/vda3      22025756  20278880   604964  98%  /var/vcap/data
    tmpfs              1024        16     1008   2%  /var/vcap/data/sys/run
    /dev/loop0       122835      1552   117352   2%  /tmp
    /dev/loop1     20480000  17923904  1914816  91%  /var/vcap/data/garden-linux/btrfs_graph
    cgroup          8216468         0  8216468   0%  /tmp/garden-/cgroup

- How much space is the rep configured to allocate for its executor cache? Is it the default 10GB provided by the rep's job spec in https://github.com/cloudfoundry-incubator/diego-release/blob/v0.1398.0/jobs/rep/spec#L70-L72? How much disk is actually used in /var/vcap/data/executor_cache (based on reporting from `du`, say)?

Default (not listed in the manifest).

    root(a)a0acd863-07e5-4964-8758-fcdf295d119d:/var/vcap/data/executor_cache# du
    42876   .

- How much space have you directed garden-linux to allocate for its btrfs store?

btrfs_store_size_mb: 20000

    root(a)a0acd863-07e5-4964-8758-fcdf295d119d:/var/vcap/packages/btrfs-progs/bin# ./btrfs filesystem usage /var/vcap/data/garden-linux/btrfs_graph
    Overall:
        Device size:          19.53GiB
        Device allocated:     17.79GiB
        Device unallocated:    1.75GiB
        Device missing:           0.00B
        Used:                 16.78GiB
        Free (estimated):      1.83GiB  (min: 976.89MiB)
        Data ratio:                1.00
        Metadata ratio:            2.00
        Global reserve:      320.00MiB  (used: 0.00B)
    Data,single: Size:12.01GiB, Used:11.93GiB
       /dev/loop1     12.01GiB
    Metadata,single: Size:8.00MiB, Used:0.00B
       /dev/loop1      8.00MiB
    Metadata,DUP: Size:2.88GiB, Used:2.43GiB
       /dev/loop1      5.75GiB
    System,single: Size:4.00MiB, Used:0.00B
       /dev/loop1      4.00MiB
    System,DUP: Size:8.00MiB, Used:16.00KiB
       /dev/loop1     16.00MiB
    Unallocated:
       /dev/loop1      1.75GiB
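To catch a cell before it fills up, the same checks can be collected into a single pass (a sketch using the paths from this deployment; run on each cell):

    df -h /var/vcap/data
    du -sh /var/vcap/data/executor_cache
    /var/vcap/packages/btrfs-progs/bin/btrfs filesystem usage \
        /var/vcap/data/garden-linux/btrfs_graph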
|
|
Re: Garden Port Assignment Story
Mike Youngstrom
Yes Will, that summary is essentially correct. But, for even more clarity
let me restate the complete story and the reason I want 92085170 to work across stemcell upgrades. :)

Today, if NATS goes down, after 2 minutes the routers will drop their routing tables and my entire CF deployment goes down. The routers behave this way because of an experience Dieu had [0]. I don't like this; I would prefer that routers not drop their routing tables when they cannot connect to NATS. Therefore, the routing team is adding 'prune_on_config_unavailable'. I plan to set this to false to make my deployment less sensitive to NATS failure.

In doing so I am incurring more risk of misrouted stale routes. I am hoping that 92085170 will help reduce some of that risk. Since one of the times I personally experienced stale-route routing was during a deploy, I hope that Garden will consider a port selection technique that helps ensure uniqueness across stemcell upgrades, something we frequently do as part of a deploy. Consequently, a stateless solution like random assignment or a consistent hash would work across stemcell upgrades.

Thanks,
Mike

[0] https://groups.google.com/a/cloudfoundry.org/d/msg/vcap-dev/yuVYCZkMLG8/7t8FHnFzWEsJ
On Tue, Nov 24, 2015 at 3:44 AM, Will Pragnell <wpragnell(a)pivotal.io> wrote:
Hi Mike,
|
|
SSO Kerberos with Spring
Leumas Yajiv
Hi.
I am trying to integrate application (Java/Spring) authentication with SSO through Kerberos. Has anyone done this before? I am using Tomcat 8 and OpenJDK 7 with java-buildpack version 3.0, and I am using the r170 release of CF. I have gone through this documentation, https://spring.io/blog/2009/09/28/spring-security-kerberos-spnego-extension, but I fail to understand how that will work on CF.

Leuma
|
|
Re: Garden Port Assignment Story
Will Pragnell <wpragnell@...>
Hi Mike,
What I think you're saying is that once the new `prune_on_config_unavailable` property is available in the router, and if it's set to `false`, there's a case when NATS is not reachable from the router in which potentially stale routes will continue to exist until the router can reach NATS again. Is that correct? (Sorry to repeat you back at yourself, just want to make sure I've understood you correctly.)

Will
On 23 November 2015 at 19:02, Mike Youngstrom <youngm(a)gmail.com> wrote:
Hi Will,
|
|
Re: CF-RELEASE v202 UPLOAD ERROR
Parthiban Annadurai <senjiparthi@...>
Okay, let me try it. Thanks.
On 24 November 2015 at 14:02, ronak banka <ronakbanka.cse(a)gmail.com> wrote:
Subnet ranges on which your other components are provisioned.
|
|
Re: CF-RELEASE v202 UPLOAD ERROR
Ronak Banka
Subnet ranges on which your other components are provisioned.
allow_from_entries:
- 192.168.33.0/24

On Tue, Nov 24, 2015 at 5:16 PM, Parthiban Annadurai <senjiparthi(a)gmail.com> wrote:

Hello Ronak,
|
|
Re: CF-RELEASE v202 UPLOAD ERROR
Parthiban Annadurai <senjiparthi@...>
Hello Ronak,
Actually, I had previously given values for ALLOW_FROM_ENTRIES; I changed it to null only after seeing some mailing-list threads. Could you tell me which IP I need to give there, or something else? Thanks.
On 24 November 2015 at 13:23, ronak banka <ronakbanka.cse(a)gmail.com> wrote:
Hi Parthiban,
|
|
Re: CF-RELEASE v202 UPLOAD ERROR
Ronak Banka
Hi Parthiban,
In your manifest, there is a global property block:

nfs_server:
  address: 192.168.33.53
  allow_from_entries:
  - null
  - null
  share: null

The allow_from entries are provided for the cc individual property and not for the actual Debian NFS server; that is a possible reason cc is not able to write to NFS: https://github.com/cloudfoundry/cf-release/blob/master/jobs/debian_nfs_server/spec#L20

Thanks,
Ronak

On Tue, Nov 24, 2015 at 3:42 PM, Parthiban Annadurai <senjiparthi(a)gmail.com> wrote:

Thanks Amit for your faster reply. FYI, I have shared my deployment
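For reference, the corrected global block would presumably look like this (a sketch; substitute your actual subnet for 192.168.33.0/24):

nfs_server:
  address: 192.168.33.53
  allow_from_entries:
  - 192.168.33.0/24    # the subnet where the Cloud Controller VMs run
  share: null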
|
|
Re: CF-RELEASE v202 UPLOAD ERROR
Parthiban Annadurai <senjiparthi@...>
Thanks Amit for your faster reply. FYI, I have shared my deployment manifest too. I have been stuck on this issue for the past couple of weeks. Thanks.
On 24 November 2015 at 12:00, Amit Gupta <agupta(a)pivotal.io> wrote:
Hi Parthiban,
|
|
Re: CF-RELEASE v202 UPLOAD ERROR
Amit Kumar Gupta
Hi Parthiban,
Sorry to hear your deployment is still getting stuck. As Warren points out, based on your log output, it looks like an issue with the NFS configuration. I will ask the CAPI team, who are experts on Cloud Controller and the NFS server, to take a look at your question.

Best,
Amit

On Thu, Nov 19, 2015 at 8:11 PM, Parthiban Annadurai <senjiparthi(a)gmail.com> wrote:

Thanks for your suggestions Warren. I am attaching the Manifest file which
|
|