
Re: Unconference at CF Summit Basel 2018

Dr Nic Williams <drnicwilliams@...>
 

Thanks for putting the time into another unconference.

I'm working on a book about the UAA; hopefully it's done by the conf. Since the UAA is delightfully invisible to most people, I'd love to do a 5-10 minute intro on why it's interesting, how to deploy it, and how to learn more.

Nic

On Mon, Jul 23, 2018 at 9:36 PM Daniel Jones <daniel.jones@...> wrote:
Hi all,

We're pleased to confirm that there'll be an Unconference at Basel again this year at 6pm on Tuesday 9th October.

We're planning on the same rough schedule as last year, so talks interspersed with open space sessions, topped off by something fun at the end. Oh, plus free food and beer.

Things we need from y'all:
  • Talks! We'd love for you folks to present, ideally for <= 10 minutes.
  • Suggestions! Would you like to do another pub quiz? Do you have any other ideas?
  • Sign-ups! If you're coming, please sign up so we know how much food and drink to order.

If you'd like to give a talk, please reply to me, Sara Lenz and Ivana Scott (CC'd).

Regards,
Daniel 'Deejay' Jones - CTO
+44 (0)79 8000 9153
EngineerBetter Ltd - More than Cloud Foundry specialists



--
Dr Nic Williams
Stark & Wayne LLC
+61 437 276 076
twitter @drnic


Unconference at CF Summit Basel 2018

Daniel Jones
 

Hi all,

We're pleased to confirm that there'll be an Unconference at Basel again this year at 6pm on Tuesday 9th October.

We're planning on the same rough schedule as last year, so talks interspersed with open space sessions, topped off by something fun at the end. Oh, plus free food and beer.

Things we need from y'all:
  • Talks! We'd love for you folks to present, ideally for <= 10 minutes.
  • Suggestions! Would you like to do another pub quiz? Do you have any other ideas?
  • Sign-ups! If you're coming, please sign up so we know how much food and drink to order.

If you'd like to give a talk, please reply to me, Sara Lenz and Ivana Scott (CC'd).

Regards,
Daniel 'Deejay' Jones - CTO
+44 (0)79 8000 9153
EngineerBetter Ltd - More than Cloud Foundry specialists


Re: [CAUTION] Re: [cf-dev] Proposed BOSH logging interface

Jesse T. Alford
 

We haven't done anything beyond proposing the interface and implementing the option to respect permissions.

Since the time of this proposal, BPM has implemented a feature that should allow us to run Blackbox in it, mounting the logs directory as read-only. We haven't tried it yet. Assuming it works, this would also reduce our concerns about running blackbox with read access to the entire file system.

Regarding your other feedback about what should go in the tags or structured data, we've not formally taken any of that on board; development of syslog-release is currently paused. I'd suggest putting these things up as issues on the syslog-release repo; awareness of those is more likely to be durable enough to remain visible until such time as there's a team on this.


On Wed, Jul 4, 2018 at 4:32 AM Voelz, Marco <marco.voelz@...> wrote:

Dear Jesse,

 

did anything come out of this proposal? Did you end up picking up this track of work?

 

Warm regards

Marco

 

From: <cf-dev@...> on behalf of Marco Voelz <marco.voelz@...>
Reply-To: "cf-dev@..." <cf-dev@...>
Date: Tuesday, 8. May 2018 at 10:08
To: "cf-dev@..." <cf-dev@...>, Dmitriy Kalinin <dkalinin@...>
Subject: [CAUTION] Re: [cf-dev] Proposed BOSH logging interface

 

Dear Jesse,

 

Thanks for putting this proposal out there. We would be happy to see an automated logfile forwarding mechanism. Here are a couple of comments on your initial points:

* Including the filename in the syslog metadata is very useful and something we'd really like to have. Currently it is something we're working around a bit.

* The appname/tag field should probably contain the release's name as well as a prefix. My proposal here is `<deployment name>.<instance group name>.<job name>`. wdyt?

* We haven't made any particular use of the priority field, so losing control over this field wouldn't matter for our use-cases. Severity is usually something that the actual log message needs to contain, as the logger's severity can only be set on its initial creation, afaik.

* Restricting the depth of recursion seems reasonable. So far, I don't think we're using bosh releases which have more than 1 folder below their /var/vcap/sys/log/<job name>/ folder.

 

Concerning the requirements about permissions on the logfiles you'd want to forward: Did you talk to Dmitriy/the BOSH team about this? With stemcell series 3541.x the permissions on the standard folders below /var/vcap were tightened a bit, so I just wanted to make sure that your assumptions are in line with the upcoming changes in the stemcells.

 

Warm regards

Marco


From: cf-dev@... <cf-dev@...> on behalf of Jesse T. Alford <jalford@...>
Sent: Tuesday, April 3, 2018 12:55:38 AM
To: cf-dev@...
Subject: [cf-dev] Proposed BOSH logging interface

 

Hello! We're the CF Platform Logging team. We maintain `syslog-release` and have been working to improve and regularize platform logging behavior.

 

This is a proposal intended to establish reasonable expectations about what should be logged and what should be forwarded in bosh-deployed cloud systems.

 

Historically, it has been up to each release to provide for their log forwarding, if any. We intend `syslog-release` to provide a consistent interface useful enough to replace all other provisions for the forwarding of logs from bosh jobs.

 

## Proposed Interface

If log forwarding is enabled, some files in `/var/vcap/sys/log` (and its subdirectories, recursively) will have any line written to them forwarded as the MSG portion of an RFC 5424-compliant syslog message. Which files are forwarded is governed first by file extension, and secondarily by file permissions.

 

`syslog-release` attempts to read any file ending in `.log`.

(This allows us to avoid forwarding rotated logs, swapfiles, etc.)

It will forward from such files if either of the following is true:

- it is world-readable

- it is readable to the `vcap` group

 

In particular, this means that logs will not be forwarded from files where:

- user and group are root:root

- user and group are vcap:root or vcap:none

- user and group are vcap:vcap, but it is not group-readable

 

…unless they are world-readable.

 

We think that this interface will allow us to avoid running a log forwarder with elevated permissions, while also allowing jobs to, for instance, write DEBUG or similar logs to a file that is not group-readable, thus improving their security and reducing the load on the logging system while still making them available on the ephemeral disk for debugging purposes.
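As a rough illustration of the selection rules above, here is a minimal Go sketch. It is an assumption-laden illustration, not code from syslog-release or blackbox: the helper name `shouldForward`, the example path, and the interpretation of "readable to the vcap group" as "the file's group is vcap and the group-read bit is set" are all invented for this sketch.

package main

import (
	"fmt"
	"os"
	"os/user"
	"path/filepath"
	"strconv"
	"syscall"
)

// shouldForward applies the rules described above: only files ending in
// ".log" are considered, and a file qualifies if it is world-readable or
// if it belongs to the vcap group and is group-readable. (Linux-only,
// since it relies on syscall.Stat_t.)
func shouldForward(path string, vcapGID uint32) (bool, error) {
	if filepath.Ext(path) != ".log" {
		return false, nil
	}
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	mode := info.Mode().Perm()
	if mode&0004 != 0 { // world-readable
		return true, nil
	}
	stat, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return false, nil
	}
	// group-readable and the file's group is vcap
	return stat.Gid == vcapGID && mode&0040 != 0, nil
}

func main() {
	grp, err := user.LookupGroup("vcap")
	if err != nil {
		fmt.Println("no vcap group on this machine:", err)
		return
	}
	gid, _ := strconv.ParseUint(grp.Gid, 10, 32)
	ok, err := shouldForward("/var/vcap/sys/log/uaa/uaa.log", uint32(gid))
	fmt.Println(ok, err)
}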

 

## Questions

There are a couple of things around this interface we're especially interested in feedback on, in addition to the obvious "will this be a problem for you" overall question.

 

We may have to have a proviso that the depth of this recursion is not unlimited. This depends somewhat on what is inexpensive to implement and maintain, and is an area we'd appreciate feedback on. Is three levels deep from `/var/vcap/sys/log` (i.e. `/var/vcap/sys/log/jobname/processname/*`) enough? Would four be?

 

In the old way of doing things, more control over the PRI information and other syslog fields was available to release authors. Logs forwarded from files currently all come out as PRI 14, which translates to Facility: User, Severity: Info. Additionally, the appname/tag field is set to the name of the directory containing the log file. Is this enough/good info? If we were to include the filename, too, would that be useful? Sufficient?
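For readers less familiar with syslog internals, the fixed PRI value decodes via the standard RFC 5424 relationship (PRI = facility * 8 + severity); a tiny sketch of that arithmetic:

package main

import "fmt"

// Per RFC 5424, PRI = Facility*8 + Severity. Decoding the fixed PRI 14
// mentioned above yields Facility 1 (user-level) and Severity 6 (info).
func main() {
	pri := 14
	facility := pri / 8 // 1 -> user-level messages
	severity := pri % 8 // 6 -> informational
	fmt.Printf("facility=%d severity=%d\n", facility, severity)
}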

 

## Testing with the Proposed Interface

We have recently implemented a feature to help release authors evaluate the proposed interface. If you set `syslog.respect_file_permissions: true`, blackbox will not be run with elevated capabilities, and you'll be able to see what is and isn't forwarded under the proposed interface.


Re: cf-deployment 3.0

Josh Collins
 

Thanks Geoff, Marco, Chip, Jesse, Bernd, and David for sharing your feedback and thoughts. You’ve expressed valid concerns and provided valuable context that I take to heart. I really appreciate the time and effort required for meaningful dialogue about the impacts of the proposed release cadence.


While the RelInt team's primary goal remains supporting the CF Foundation engineering teams and their ability to validate their commits in CI, your points underscore a tension we’re acutely aware of.


We’re trying to meet the needs of both the CFF Contributor and the Operator, and the ‘trick’ is to find a sustainable balance between the two. However, on occasions where we must prioritize one over the other, we’re going to favor the CFF Contributor.


I mentioned this earlier, but it’s worth restating that the RelInt team doesn’t have any plans to provide LTS support, and as Chip and Jesse pointed out, that has traditionally been a value-added service provided by commercial vendors.


In the spirit of iteration, I’d like to propose we proceed with the release cadence I originally outlined and see how it goes.


Again, thank you for providing such valuable feedback.


Cheers,


Josh Collins

_Current_ PM of CF Release Integration


Re: cf-deployment 3.0

Jesse T. Alford
 

Another point: most (certainly not all, but most) CVEs are stemcell, buildpack, or rootfs bumps that can be consumed safely/have minimal integration concerns. Even those that are in more substantive releases, such as routing and UAA, can be bumped ahead fairly easily with a manual edit or an ops file, and are often fairly safe iterations over the releases that came before them, though admittedly it can be hard to tell.

If, somewhere along the spectrum of risk and difficulty I just described, the risk becomes too great for you to feel safe going straight to prod without relint's blessing, I recommend you test them first. If testing these adjustments to your particular environment is too burdensome, well, yes, it does become so, doesn't it?

Maintaining a whole passel of integration environments is a significant engineering and infrastructure burden, but happily it is one you can pay commercial integrators to shoulder for you.


On Wed, Jul 18, 2018 at 2:46 PM David Sabeti <dsabeti@...> wrote:
As the previous project lead for RelInt, I want to speak to Marco's concerns directly. We _definitely_ considered the operator as an important persona during any decision-making; if anything, we were overcommitted to that persona, evidenced by the fact that we became at times an obstacle to CFF dev teams out of fear of making breaking changes for operators.

There's clearly some concern that operators won't be able to keep up with breaking changes. However, one impact of making breaking changes more frequently -- and, even better, on a schedule -- is to reduce the difficulty of adapting to them. To build a bit on what Josh said earlier in his example about cf-networking 2.0, as we pushed off releasing a major version of cf-deployment, more backwards-incompatible updates were stockpiled in the backlog. In the end, cf-deployment 2.0 included **seven** breaking changes instead of merely one or two.

To link this back to Marco's story -- "As an operator of CF, I'd like to consume CVE fixes with as little changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible" -- this is already a problem with cf-deployment. As others have mentioned, there's no back-porting of cf-deployment after major version bumps, so operators already have to accommodate breaking changes in order to get CVE fixes. I understand that the proposal means that this happens more often, but it also means that major version bumps will be more predictable and less risky.[0]

I wasn't sure if it was worth rehashing the days of cf-release or not, but since Jesse broached the subject, I'd give his comments a +1 all around. One of the ways I understood Josh's proposal was as an important course correction. If cf-release was too free-wheeling in making breaking changes, cf-deployment has been too conservative. The proposal for a regular cadence of breaking changes seems like a balance between those two. Similarly, this is a re-balancing with regards to the personas as well: based on experience, the RelInt team has learned that it should be more willing to release breaking changes for operators in order to empower the CFF dev teams.

Sabeti
Also _formerly_ of the RelInt team


[0] Bernd has an interesting point about providing patch updates only to the latest release of cf-deployment, as a way to provide operators with a CVE-fix-only release. Providing such releases is also non-trivial work that I'm not sure the RelInt team would prioritize. Also, RelInt ships minor releases twice per week, so the changesets are typically small. Still, it seems a bit more palatable than any kind of LTS because it assists operators in living up to the "you better run fast" expectation.



On Wed, Jul 18, 2018 at 10:59 AM Krannich, Bernd <bernd.krannich@...> wrote:

I was about to mention that I indeed enjoyed the existing CF model of releases which roughly translated to “you better run fast” for consumers.

 

The thing I found needed some tweaking in the existing model was the approach to including fixes for prio very high CVEs. Often times, in our quest to run fast and keep systems secure as fast as possible, we ended up pulling in a bunch of features which required additional validation and essentially slowed us down in our effort of rolling things out to production.

 

I felt that the better approach to support people that can keep the speed would have been to always provide fixes for prio very high CVEs as cherry-picks based on the latest released version (and then of course also include those fixes into the next “regular” release, too).

 

Based on the comments so far, it sounds like for consumers “you better run fast” will actually be harder with the newly proposed approach. But maybe I’m not fully understanding the concepts, so it would be great to get some more details on the plans.

 

Regards,

Bernd

 

From: <cf-dev@...> on behalf of Chip Childers <cchilders@...>
Reply-To: "cf-dev@..." <cf-dev@...>
Date: Wednesday, 18. July 2018 at 19:38
To: "cf-dev@..." <cf-dev@...>


Subject: Re: [cf-dev] cf-deployment 3.0

 

Food for thought: One of the challenges here is that maintaining patches for past coordinated releases is expensive (both in time and CI costs). In the CF ecosystem, this has traditionally been the responsibility of the downstream commercial distributions.

 

This isn't to say that there isn't a solution that can help all downstream users (including non-commercial users AND the distros), yet not burden the Rel Int team too much. I'm not sure what that solution is though...

 

On Mon, Jul 16, 2018 at 9:47 AM Franks, Geoff <geoff.franks@...> wrote:

I’m going to agree with Marco’s concerns here. Making life harder and less stable for the end users of CF has a real potential to alienate and push away the CF userbase altogether, even if it’s just in appearance (seeing monthly major releases of a product may cause new organizations to hesitate to migrate, until the release process appears more stable).

 

 

From: <cf-dev@...> on behalf of Marco Voelz <marco.voelz@...>
Reply-To: "cf-dev@..." <cf-dev@...>
Date: Monday, July 16, 2018 at 1:34 AM
To: "cf-dev@..." <cf-dev@...>
Subject: [External] Re: [cf-dev] cf-deployment 3.0

 

Dear Josh,

 

Thanks for the context, I wasn't aware of what happened before the release of networking 2.0. To stick with your example, though: From what you are saying I have understood that you would rather have done it this way – please correct me here if I'm wrong:

  • integrate networking release 2.0 into cf-deployment
  • integrate other PRs with breaking changes
  • bump cf-deployment to a new major version, given the above changes
  • merge the CVE fixes only into the new major version of cf-deployment

 

With this process, you would have achieved the following:

  • the development teams are happy, because they shipped as soon as they were ready to
  • operators are grumpy, because they have to bump networking to a new major version and adapt to other breaking changes in order to fix CVEs

 

I'm not saying you have to turn this tradeoff the other way around, but in my opinion this doesn't seem very consumer friendly. 

 

In your team's mission, you have clearly stated that your goal is to enable development teams to maintain a high velocity. I'd like to stress that we shouldn't leave the operators and users out of the picture here. In the end, you're developing for them, not for yourself. 

 

I'm not sure if the consumer/operator persona is a thing for RelInt, but if that's the case, here's something I'd like to hold true for whatever change RelInt makes to their process:

"As an operator of CF, I'd like to consume CVE fixes with as little changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible"

 

Does that sound reasonable?

 

Warm regards

Marco


From: cf-dev@... <cf-dev@...> on behalf of Josh Collins <jcollins@...>
Sent: Friday, July 13, 2018 11:39:30 PM
To: cf-dev@...
Subject: Re: [cf-dev] cf-deployment 3.0

 

Hi Marco,

I'm happy to provide more context on the container networking 2.0 reference.
The container networking team submitted a PR to cf-deployment with changes required for them to ship v2.0. 
RelInt deferred the container networking team's PR for a few weeks due to competing priorities, including multiple CVE fixes.
During the deferral time, a few other PRs were submitted which included breaking changes.
These additional changes took much more time to integrate and validate than anticipated, and in the end the container networking team's 2.0 release was published in cf-d about 5 weeks after it was ready to go.
The introduction of a regular cadence aims to mitigate this type of delay in the future. Had we had one at the time, the networking team would have timed its PR to align and we would have been poised to accept and publish it quickly.
We believe this will help teams confidently plan for, communicate about, and negotiate integrating their releases into cf-deployment.
And hopefully enable the RelInt team to integrate and ship major releases more seamlessly.

This is an evolving process so we'll see how things roll in the coming months and make adjustments where it makes sense to do so. 
I appreciate and welcome any and all feedback along the way.

Thanks very much,

Josh

--

Chip Childers
CTO, Cloud Foundry Foundation
1.267.250.0815


Re: cf-deployment 3.0

David Sabeti
 

As the previous project lead for RelInt, I want to speak to Marco's concerns directly. We _definitely_ considered the operator as an important persona during any decision-making; if anything, we were overcommitted to that persona, evidenced by the fact that we became at times an obstacle to CFF dev teams out of fear of making breaking changes for operators.

There's clearly some concern that operators won't be able to keep up with breaking changes. However, one impact of making breaking changes more frequently -- and, even better, on a schedule -- is to reduce the difficulty of adapting to them. To build a bit on what Josh said earlier in his example about cf-networking 2.0, as we pushed off releasing a major version of cf-deployment, more backwards-incompatible updates were stockpiled in the backlog. In the end, cf-deployment 2.0 included **seven** breaking changes instead of merely one or two.

To link this back to Marco's story -- "As an operator of CF, I'd like to consume CVE fixes with as little changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible" -- this is already a problem with cf-deployment. As others have mentioned, there's no back-porting of cf-deployment after major version bumps, so operators already have to accommodate breaking changes in order to get CVE fixes. I understand that the proposal means that this happens more often, but it also means that major version bumps will be more predictable and less risky.[0]

I wasn't sure if it was worth rehashing the days of cf-release or not, but since Jesse broached the subject, I'd give his comments a +1 all around. One of the ways I understood Josh's proposal was as an important course correction. If cf-release was too free-wheeling in making breaking changes, cf-deployment has been too conservative. The proposal for a regular cadence of breaking changes seems like a balance between those two. Similarly, this is a re-balancing with regards to the personas as well: based on experience, the RelInt team has learned that it should be more willing to release breaking changes for operators in order to empower the CFF dev teams.

Sabeti
Also _formerly_ of the RelInt team


[0] Bernd has an interesting point about providing patch updates only to the latest release of cf-deployment, as a way to provide operators with a CVE-fix-only release. Providing such releases is also non-trivial work that I'm not sure the RelInt team would prioritize. Also, RelInt ships minor releases twice per week, so the changesets are typically small. Still, it seems a bit more palatable than any kind of LTS because it assists operators in living up to the "you better run fast" expectation.



On Wed, Jul 18, 2018 at 10:59 AM Krannich, Bernd <bernd.krannich@...> wrote:

I was about to mention that I indeed enjoyed the existing CF model of releases which roughly translated to “you better run fast” for consumers.

 

The thing I found needed some tweaking in the existing model was the approach to including fixes for prio very high CVEs. Often times, in our quest to run fast and keep systems secure as fast as possible, we ended up pulling in a bunch of features which required additional validation and essentially slowed us down in our effort of rolling things out to production.

 

I felt that the better approach to support people that can keep the speed would have been to always provide fixes for prio very high CVEs as cherry-picks based on the latest released version (and then of course also include those fixes into the next “regular” release, too).

 

Based on the comments so far, it sounds like for consumers “you better run fast” will actually be harder with the newly proposed approach. But maybe I’m not fully understanding the concepts, so it would be great to get some more details on the plans.

 

Regards,

Bernd

 

From: <cf-dev@...> on behalf of Chip Childers <cchilders@...>
Reply-To: "cf-dev@..." <cf-dev@...>
Date: Wednesday, 18. July 2018 at 19:38
To: "cf-dev@..." <cf-dev@...>


Subject: Re: [cf-dev] cf-deployment 3.0

 

Food for thought: One of the challenges here is that maintaining patches for past coordinated releases is expensive (both in time and CI costs). In the CF ecosystem, this has traditionally been the responsibility of the downstream commercial distributions.

 

This isn't to say that there isn't a solution that can help all downstream users (including non-commercial users AND the distros), yet not burden the Rel Int team too much. I'm not sure what that solution is though...

 

On Mon, Jul 16, 2018 at 9:47 AM Franks, Geoff <geoff.franks@...> wrote:

I’m going to agree with Marco’s concerns here. Making life harder and less stable for the end users of CF has a real potential to alienate and push away the CF userbase altogether, even if it’s just in appearance (seeing monthly major releases of a product may cause new organizations to hesitate to migrate, until the release process appears more stable).

 

 

From: <cf-dev@...> on behalf of Marco Voelz <marco.voelz@...>
Reply-To: "cf-dev@..." <cf-dev@...>
Date: Monday, July 16, 2018 at 1:34 AM
To: "cf-dev@..." <cf-dev@...>
Subject: [External] Re: [cf-dev] cf-deployment 3.0

 

Dear Josh,

 

Thanks for the context, I wasn't aware of what happened before the release of networking 2.0. To stick with your example, though: From what you are saying I have understood that you would rather have done it this way – please correct me here if I'm wrong:

  • integrate networking release 2.0 into cf-deployment
  • integrate other PRs with breaking changes
  • bump cf-deployment to a new major version, given the above changes
  • merge the CVE fixes only into the new major version of cf-deployment

 

With this process, you would have achieved the following:

  • the development teams are happy, because they shipped as soon as they were ready to
  • operators are grumpy, because they have to bump networking to a new major version and adapt to other breaking changes in order to fix CVEs

 

I'm not saying you have to turn this tradeoff the other way around, but in my opinion this doesn't seem very consumer friendly. 

 

In your team's mission, you have clearly stated that your goal is to enable development teams to maintain a high velocity. I'd like to stress that we shouldn't leave the operators and users out of the picture here. In the end, you're developing for them, not for yourself. 

 

I'm not sure if the consumer/operator persona is a thing for RelInt, but if that's the case, here's something I'd like to hold true for whatever change RelInt makes to their process:

"As an operator of CF, I'd like to consume CVE fixes with as little changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible"

 

Does that sound reasonable?

 

Warm regards

Marco


From: cf-dev@... <cf-dev@...> on behalf of Josh Collins <jcollins@...>
Sent: Friday, July 13, 2018 11:39:30 PM
To: cf-dev@...
Subject: Re: [cf-dev] cf-deployment 3.0

 

Hi Marco,

I'm happy to provide more context on the container networking 2.0 reference.
The container networking team submitted a PR to cf-deployment with changes required for them to ship v2.0. 
RelInt deferred the container networking team's PR for a few weeks due to competing priorities, including multiple CVE fixes.
During the deferral time, a few other PRs were submitted which included breaking changes.
These additional changes took much more time to integrate and validate than anticipated, and in the end the container networking team's 2.0 release was published in cf-d about 5 weeks after it was ready to go.
The introduction of a regular cadence aims to mitigate this type of delay in the future. Had we had one at the time, the networking team would have timed its PR to align and we would have been poised to accept and publish it quickly.
We believe this will help teams confidently plan for, communicate about, and negotiate integrating their releases into cf-deployment.
And hopefully enable the RelInt team to integrate and ship major releases more seamlessly.

This is an evolving process so we'll see how things roll in the coming months and make adjustments where it makes sense to do so. 
I appreciate and welcome any and all feedback along the way.

Thanks very much,

Josh

--

Chip Childers
CTO, Cloud Foundry Foundation
1.267.250.0815


[High Severity CVE] UAA accepts refresh token as access token on admin endpoints

Dan Jahner
 

CVE-2018-11047: UAA accepts refresh token as access token on admin endpoints


Severity

High

Vendor

Cloud Foundry Foundation

Affected Cloud Foundry Products and Versions

  • You are using uaa versions 4.19 prior to 4.19.2, 4.12 prior to 4.12.4, 4.10 prior to 4.10.2, 4.7 prior to 4.7.6, 4.5 prior to 4.5.7
  • You are using uaa-release versions v60 prior to v60.2, v57 prior to v57.4, v55 prior to v55.2, v52 prior to v52.10, v45 prior to v45.11

Description

Cloud Foundry UAA, versions 4.19 prior to 4.19.2 and 4.12 prior to 4.12.4 and 4.10 prior to 4.10.2 and 4.7 prior to 4.7.6 and 4.5 prior to 4.5.7, incorrectly authorizes requests to admin endpoints by accepting a valid refresh token in lieu of an access token. Refresh tokens by design have a longer expiration time than access tokens, allowing the possessor of a refresh token to authenticate longer than expected. This affects the administrative endpoints of the UAA, e.g. /Users, /Groups, etc. However, if the user has been deleted or had groups removed, or the client was deleted, the refresh token will no longer be valid.

Mitigation

Users of affected versions should apply the following mitigations or upgrades:

  • Releases that have fixed this issue include:
    • uaa versions 4.19.2, 4.12.4, 4.10.2, 4.7.6, 4.5.7
    • uaa-release versions v60.2, v57.4, v55.2, v52.10, v45.11


Re: cf-deployment 3.0

Krannich, Bernd
 

I was about to mention that I indeed enjoyed the existing CF model of releases which roughly translated to “you better run fast” for consumers.

 

The thing I found needed some tweaking in the existing model was the approach to including fixes for prio very high CVEs. Often times, in our quest to run fast and keep systems secure as fast as possible, we ended up pulling in a bunch of features which required additional validation and essentially slowed us down in our effort of rolling things out to production.

 

I felt that the better approach to support people that can keep the speed would have been to always provide fixes for prio very high CVEs as cherry-picks based on the latest released version (and then of course also include those fixes into the next “regular” release, too).

 

Based on the comments so far, it sounds like for consumers “you better run fast” will actually be harder with the newly proposed approach. But maybe I’m not fully understanding the concepts, so it would be great to get some more details on the plans.

 

Regards,

Bernd

 

From: <cf-dev@...> on behalf of Chip Childers <cchilders@...>
Reply-To: "cf-dev@..." <cf-dev@...>
Date: Wednesday, 18. July 2018 at 19:38
To: "cf-dev@..." <cf-dev@...>
Subject: Re: [cf-dev] cf-deployment 3.0

 

Food for thought: One of the challenges here is that maintaining patches for past coordinated releases is expensive (both in time and CI costs). In the CF ecosystem, this has traditionally been the responsibility of the downstream commercial distributions.

 

This isn't to say that there isn't a solution that can help all downstream users (including non-commercial users AND the distros), yet not burden the Rel Int team too much. I'm not sure what that solution is though...

 

On Mon, Jul 16, 2018 at 9:47 AM Franks, Geoff <geoff.franks@...> wrote:

I’m going to agree with Marco’s concerns here. Making life harder and less stable for the end users of CF has a real potential to alienate and push away the CF userbase altogether, even if it’s just in appearance (seeing monthly major releases of a product may cause new organizations to hesitate to migrate, until the release process appears more stable).

 

 

From: <cf-dev@...> on behalf of Marco Voelz <marco.voelz@...>
Reply-To: "cf-dev@..." <cf-dev@...>
Date: Monday, July 16, 2018 at 1:34 AM
To: "cf-dev@..." <cf-dev@...>
Subject: [External] Re: [cf-dev] cf-deployment 3.0

 

Dear Josh,

 

Thanks for the context, I wasn't aware of what happened before the release of networking 2.0. To stick with your example, though: From what you are saying I have understood that you would rather have done it this way – please correct me here if I'm wrong:

  • integrate networking release 2.0 into cf-deployment
  • integrate other PRs with breaking changes
  • bump cf-deployment to a new major version, given the above changes
  • merge the CVE fixes only into the new major version of cf-deployment

 

With this process, you would have achieved the following:

  • the development teams are happy, because they shipped as soon as they were ready to
  • operators are grumpy, because they have to bump networking to a new major version and adapt to other breaking changes in order to fix CVEs

 

I'm not saying you have to turn this tradeoff the other way around, but in my opinion this doesn't seem very consumer friendly. 

 

In your team's mission, you have clearly stated that your goal is to enable development teams to maintain a high velocity. I'd like to stress that we shouldn't leave the operators and users out of the picture here. In the end, you're developing for them, not for yourself. 

 

I'm not sure if the consumer/operator persona is a thing for RelInt, but if that's the case, here's something I'd like to hold true for whatever change RelInt makes to their process:

"As an operator of CF, I'd like to consume CVE fixes with as little changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible"

 

Does that sound reasonable?

 

Warm regards

Marco


From: cf-dev@... <cf-dev@...> on behalf of Josh Collins <jcollins@...>
Sent: Friday, July 13, 2018 11:39:30 PM
To: cf-dev@...
Subject: Re: [cf-dev] cf-deployment 3.0

 

Hi Marco,

I'm happy to provide more context on the container networking 2.0 reference.
The container networking team submitted a PR to cf-deployment with changes required for them to ship v2.0. 
RelInt deferred the container networking team's PR for a few weeks due to competing priorities, including multiple CVE fixes.
During the deferral time, a few other PRs were submitted which included breaking changes.
These additional changes took much more time to integrate and validate than anticipated, and in the end the container networking team's 2.0 release was published in cf-d about 5 weeks after it was ready to go.
The introduction of a regular cadence aims to mitigate this type of delay in the future. Had we had one at the time, the networking team would have timed its PR to align and we would have been poised to accept and publish it quickly.
We believe this will help teams confidently plan for, communicate about, and negotiate integrating their releases into cf-deployment.
And hopefully enable the RelInt team to integrate and ship major releases more seamlessly.

This is an evolving process so we'll see how things roll in the coming months and make adjustments where it makes sense to do so. 
I appreciate and welcome any and all feedback along the way.

Thanks very much,

Josh

--

Chip Childers
CTO, Cloud Foundry Foundation
1.267.250.0815


Re: cf-deployment 3.0

Jesse T. Alford
 

I don't agree with the claim that we didn't introduce major breaking changes in the past - we did. Routinely.

`cf-release` was sem-ver only insofar as every version was a major version. Changes just as dramatic as this were made on some but not all arbitrary major releases.

The major thing cf-d brings here is real semver, so it's _clear_ that some versions are major changes.

The credo remains the same - forward, always.

Chip's point about long-term support/backported fixes is exactly on-point. It's a major support burden, and is one of the principal pieces of work done by commercial distributors.

Jesse Alford
_Formerly of_ CF Release Integration


On Wed, Jul 18, 2018 at 11:38 AM Chip Childers <cchilders@...> wrote:
Food for thought: One of the challenges here is that maintaining patches for past coordinated releases is expensive (both in time and CI costs). In the CF ecosystem, this has traditionally been the responsibility of the downstream commercial distributions.

This isn't to say that there isn't a solution that can help all downstream users (including non-commercial users AND the distros), yet not burden the Rel Int team too much. I'm not sure what that solution is though...

On Mon, Jul 16, 2018 at 9:47 AM Franks, Geoff <geoff.franks@...> wrote:

I’m going to agree with Marco’s concerns here. Making life harder and less stable for the end users of CF has a real potential to alienate and push away the CF userbase altogether, even if it’s just in appearance (seeing monthly major releases of a product may cause new organizations to hesitate to migrate, until the release process appears more stable).

 

 

From: <cf-dev@...> on behalf of Marco Voelz <marco.voelz@...>
Reply-To: "cf-dev@..." <cf-dev@...>
Date: Monday, July 16, 2018 at 1:34 AM
To: "cf-dev@..." <cf-dev@...>
Subject: [External] Re: [cf-dev] cf-deployment 3.0

 

Dear Josh,

 

Thanks for the context, I wasn't aware of what happened before the release of networking 2.0. To stick with your example, though: From what you are saying I have understood that you would rather have done it this way – please correct me here if I'm wrong:

  • integrate networking release 2.0 into cf-deployment
  • integrate other PRs with breaking changes
  • bump cf-deployment to a new major version, given the above changes
  • merge the CVE fixes only into the new major version of cf-deployment

 

With this process, you would have achieved the following:

  • the development teams are happy, because they shipped as soon as they were ready to
  • operators are grumpy, because they have to bump networking to a new major version and adapt to other breaking changes in order to fix CVEs

 

I'm not saying you have to turn this tradeoff the other way around, but in my opinion this doesn't seem very consumer friendly. 

 

In your team's mission, you have clearly stated that your goal is to enable development teams to maintain a high velocity. I'd like to stress that we shouldn't leave the operators and users out of the picture here. In the end, you're developing for them, not for yourself. 

 

I'm not sure if the consumer/operator persona is a thing for RelInt, but if that's the case, here's something I'd like to hold true for whatever change RelInt makes to their process:

"As an operator of CF, I'd like to consume CVE fixes with as little changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible"

 

Does that sound reasonable?

 

Warm regards

Marco


From: cf-dev@... <cf-dev@...> on behalf of Josh Collins <jcollins@...>
Sent: Friday, July 13, 2018 11:39:30 PM
To: cf-dev@...
Subject: Re: [cf-dev] cf-deployment 3.0

 

Hi Marco,

I'm happy to provide more context on the container networking 2.0 reference.
The container networking team submitted a PR to cf-deployment with changes required for them to ship v2.0. 
RelInt deferred the container networking team's PR for a few weeks due to competing priorities, including multiple CVE fixes.
During the deferral time, a few other PRs were submitted which included breaking changes.
These additional changes took much more time to integrate and validate than anticipated, and in the end the container networking team's 2.0 release was published in cf-d about 5 weeks after it was ready to go.
The introduction of a regular cadence aims to mitigate this type of delay in the future. Had we had one at the time, the networking team would have timed its PR to align and we would have been poised to accept and publish it quickly.
We believe this will help teams confidently plan for, communicate about, and negotiate integrating their releases into cf-deployment.
And hopefully enable the RelInt team to integrate and ship major releases more seamlessly.

This is an evolving process so we'll see how things roll in the coming months and make adjustments where it makes sense to do so. 
I appreciate and welcome any and all feedback along the way.

Thanks very much,

Josh

--
Chip Childers
CTO, Cloud Foundry Foundation
1.267.250.0815


Re: cf-deployment 3.0

Chip Childers <cchilders@...>
 

Food for thought: One of the challenges here is that maintaining patches for past coordinated releases is expensive (both in time and CI costs). In the CF ecosystem, this has traditionally been the responsibility of the downstream commercial distributions.

This isn't to say that there isn't a solution that can help all downstream users (including non-commercial users AND the distros), yet not burden the Rel Int team too much. I'm not sure what that solution is though...

On Mon, Jul 16, 2018 at 9:47 AM Franks, Geoff <geoff.franks@...> wrote:

I’m going to agree with Marco’s concerns here. Making life harder and less stable for the end users of CF has a real potential to alienate and push away the CF userbase altogether, even if it’s just in appearance (seeing monthly major releases of a product may cause new organizations to hesitate to migrate, until the release process appears more stable).

 

 

From: <cf-dev@...> on behalf of Marco Voelz <marco.voelz@...>
Reply-To: "cf-dev@..." <cf-dev@...>
Date: Monday, July 16, 2018 at 1:34 AM
To: "cf-dev@..." <cf-dev@...>
Subject: [External] Re: [cf-dev] cf-deployment 3.0

 

Dear Josh,

 

Thanks for the context, I wasn't aware of what happened before the release of networking 2.0. To stick with your example, though: From what you are saying I have understood that you would rather have done it this way – please correct me here if I'm wrong:

  • integrate networking release 2.0 into cf-deployment
  • integrate other PRs with breaking changes
  • bump cf-deployment to a new major version, given the above changes
  • merge the CVE fixes only into the new major version of cf-deployment

 

With this process, you would have achieved the following:

  • the development teams are happy, because they shipped as soon as they were ready to
  • operators are grumpy, because they have to bump networking to a new major version and adapt to other breaking changes in order to fix CVEs

 

I'm not saying you have to turn this tradeoff the other way around, but in my opinion this doesn't seem very consumer friendly. 

 

In your team's mission, you have clearly stated that your goal is to enable development teams to maintain a high velocity. I'd like to stress that we shouldn't leave the operators and users out of the picture here. In the end, you're developing for them, not for yourself. 

 

I'm not sure if the consumer/operator persona is a thing for RelInt, but if that's the case, here's something I'd like to hold true for whatever change RelInt makes to their process:

"As an operator of CF, I'd like to consume CVE fixes with as little changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible"

 

Does that sound reasonable?

 

Warm regards

Marco


From: cf-dev@... <cf-dev@...> on behalf of Josh Collins <jcollins@...>
Sent: Friday, July 13, 2018 11:39:30 PM
To: cf-dev@...
Subject: Re: [cf-dev] cf-deployment 3.0

 

Hi Marco,

I'm happy to provide more context on the container networking 2.0 reference.
The container networking team submitted a PR to cf-deployment with changes required for them to ship v2.0. 
RelInt deferred the container networking team's PR for a few weeks due to competing priorities, including multiple CVE fixes.
During the deferral time, a few other PRs were submitted which included breaking changes.
These additional changes took much more time to integrate and validate than anticipated, and in the end the container networking team's 2.0 release was published in cf-d about 5 weeks after it was ready to go.
The introduction of a regular cadence aims to mitigate this type of delay in the future. Had we had one at the time, the networking team would have timed its PR to align and we would have been poised to accept and publish it quickly.
We believe this will help teams confidently plan for, communicate about, and negotiate integrating their releases into cf-deployment.
And hopefully enable the RelInt team to integrate and ship major releases more seamlessly.

This is an evolving process so we'll see how things roll in the coming months and make adjustments where it makes sense to do so. 
I appreciate and welcome any and all feedback along the way.

Thanks very much,

Josh

--
Chip Childers
CTO, Cloud Foundry Foundation
1.267.250.0815


CF/K8S SIG Calls

Chip Childers <cchilders@...>
 

All,

We held our last CF/K8S SIG call today, which is great news. They served the purpose of getting a bunch of the interesting work that's happening out into the open, and now most of the efforts are either inside a CFF PMC or on their way there. The attendees agreed that the time had come to discontinue the calls (although Julz says that I should use a bat signal if / when needed in the future).

So for those interested, dive into the various projects directly within the Runtime, Extensions and BOSH PMCs. :)

-chip
--
Chip Childers
CTO, Cloud Foundry Foundation
1.267.250.0815


FINAL REMINDER: CAB call for July is Wednesday (tomorrow) 07/18 @ 8a PST or 11a EST

Michael Maximilien
 

FYI...

Zoom soon. Best,

dr.max
ibm ☁ 
silicon valley, ca





On Jul 12, 2018, at 11:16 AM, Michael Maximilien <maxim@...> wrote:

FYI...


Please remember to join the Zoom call [0] Wednesday July 18th at 8a Pacific for QAs, highlights, and two presentations:


1. Project Shield v8 Updates by James Hunt of Stark & Wayne [1] 


2. CF-Extensions Project Service Fabrik Updates by Ashish Jain of SAP  [2] and [3]


Zoom soon. Best,




Re: CF Application Runtime PMC - CF Bits-Service Project Lead Call for Nominations

Simon D Moser
 

Hello all,

IBM is nominating Peter Goetz for the CF Bits Service Project Lead in the Application Runtime PMC.

Peter is a Software Engineer at IBM working both as a core contributor to the Cloud Foundry Bits-Service and on IBM's Cloud Foundry production system.

Prior to joining IBM, Peter worked at Amazon as a technical lead, developing systems to expand Amazon's international business; he holds a Diploma degree in Physics from the University of Stuttgart.

Mit freundlichen Grüßen / Kind regards

Simon Moser

Senior Technical Staff Member / IBM Master Inventor
Bluemix Application Platform Lead Architect
Dept. C727, IBM Research & Development Boeblingen
 
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Schoenaicher Str. 220
71032 Boeblingen
Phone: +49-7031-16-4304
Fax: +49-7031-16-4890
E-Mail: smoser@...
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH / Vorsitzender des
Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht
Stuttgart, HRB 243294
 
**
Great minds discuss ideas; average minds discuss events; small minds discuss people.
Eleanor Roosevelt



From:        "Dieu Cao" <dcao@...>
To:        cf-dev <cf-dev@...>
Date:        16/07/2018 22:19
Subject:        [cf-dev] CF Application Runtime PMC - CF Bits-Service Project Lead Call for Nominations
Sent by:        cf-dev@...




Hello All,

Simon Moser, the Project Lead for the Bits-Service team within the Application Runtime PMC, is rotating into a different role within IBM. We thank him for his time serving as the Bits-Service Project Lead. 

The Bits-Service team, located in Germany, now has an opening for its project lead. Project leads must be nominated by a Cloud Foundry Foundation member.

Please send nominations to me/in reply to this posting by end of day July 23rd, 2018.

If you have any questions about the role/process, please let me know.
These are described in the CFF governance documents. [1]

-Dieu Cao
CF Application Runtime PMC Lead

[1] https://www.cloudfoundry.org/wp-content/uploads/2015/09/CFF_Development_Operations_Policy.pdf




CF Application Runtime PMC - CF Bits-Service Project Lead Call for Nominations

Dieu Cao <dcao@...>
 

Hello All,

Simon Moser, the Project Lead for the Bits-Service team within the Application Runtime PMC, is rotating into a different role within IBM. We thank him for his time serving as the Bits-Service Project Lead. 

The Bits-Service team, located in Germany, now has an opening for its project lead. Project leads must be nominated by a Cloud Foundry Foundation member.

Please send nominations to me/in reply to this posting by end of day July 23rd, 2018.

If you have any questions about the role/process, please let me know.
These are described in the CFF governance documents. [1]

-Dieu Cao
CF Application Runtime PMC Lead


Re: Proposal for weighted routing user experience in Cloud Foundry

Filip Hanik
 

Use case: I want v1-stable to receive 5 times more traffic than each individual upgrade version I deploy


Phase 1: Deploy alpha 1

Proposed (sum MUST add up to 100):
 v1-stable: 83
 v2-alpha1: 17

Suggested (simpler base1-lb)
v1-stable: 5
v2-alpha1: 1

Phase 2: Deploying Alpha 1 and 2

Proposed (sum MUST add up to 100):
 v1-stable: 72
 v2-alpha1: 14
 v2-alpha2: 14

Suggested (simpler base1-lb)
v1-stable: 5
v2-alpha1: 1
v2-alpha2: 1

Why the simpler is better:
When adding v2-alpha2 I don't need to change the load-balancing settings for the versions that are already deployed. The relationship between v1 and v2-alpha1 remains exactly the same.
I also don't need to be doing any math to understand the relationship between the two.

The proposed base1-lb simply removes the need for percentages and calculations. 
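To make the arithmetic behind the two phases concrete, here is a small Go sketch, assuming a simple proportional interpretation of the base-1 weights (share = weight / sum of weights); the function name and the version labels are illustrative, and the output reproduces the percentages above:

package main

import "fmt"

// shares converts base-1 weights into effective traffic percentages:
// share(v) = 100 * weight(v) / sum(weights).
func shares(weights map[string]int) map[string]float64 {
	total := 0
	for _, w := range weights {
		total += w
	}
	out := make(map[string]float64)
	for v, w := range weights {
		out[v] = 100 * float64(w) / float64(total)
	}
	return out
}

func main() {
	// Phase 1: 5:1 -> roughly 83% / 17%
	fmt.Println(shares(map[string]int{"v1-stable": 5, "v2-alpha1": 1}))
	// Phase 2: 5:1:1 -> roughly 72% / 14% / 14%; v1's weight never changed
	fmt.Println(shares(map[string]int{"v1-stable": 5, "v2-alpha1": 1, "v2-alpha2": 1}))
}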






On Fri, Jul 13, 2018 at 8:11 PM Filip Hanik <fhanik@...> wrote:
aaarrgh, there is a bug in my pseudocode

clusterWeight = [v1,v1,v1,v1,v1,v1,v2,v2,v2,v3,v3] should be
clusterWeight = [v1,v1,v1,v1,v1,v1,v2,v2,v2,v3,v4]

Full solution:
Implementation: "Randomized Round Robin" is also super simple [pseudo code follows]

clusterWeight = [v1,v1,v1,v1,v1,v1,v2,v2,v2,v3,v4] //very easy to create based on base1 solution
randomCluster = random(clusterWeight) 
int atomicPointer = 0;
for each request:
  next = atomicPointer.getAndIncrease();
  application = randomCluster[atomicPointer];
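To complement the pseudocode, here is a runnable Go sketch of the same "randomized round robin" idea: expand the base-1 weights into a slot list, shuffle it once, then serve requests by walking the shuffled slots with an atomic counter taken modulo the slot count. The type and function names are invented for illustration, and the loop in main stands in for the router's per-request path.

package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// weightedPicker holds the shuffled slot list and a request counter.
type weightedPicker struct {
	slots   []string
	counter uint64
}

// newWeightedPicker expands base-1 weights into slots and shuffles them once.
func newWeightedPicker(weights map[string]int) *weightedPicker {
	p := &weightedPicker{}
	for version, w := range weights {
		for i := 0; i < w; i++ {
			p.slots = append(p.slots, version)
		}
	}
	rand.Shuffle(len(p.slots), func(i, j int) {
		p.slots[i], p.slots[j] = p.slots[j], p.slots[i]
	})
	return p
}

// next returns the version for the next request: increment the counter
// atomically and index the shuffled slots modulo their length.
func (p *weightedPicker) next() string {
	n := atomic.AddUint64(&p.counter, 1) - 1
	return p.slots[n%uint64(len(p.slots))]
}

func main() {
	picker := newWeightedPicker(map[string]int{"v1": 6, "v2": 3, "v3": 1, "v4": 1})
	counts := map[string]int{}
	for i := 0; i < 11000; i++ {
		counts[picker.next()]++
	}
	fmt.Println(counts) // roughly 6000 / 3000 / 1000 / 1000
}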


On Fri, Jul 13, 2018 at 8:09 PM Filip Hanik <fhanik@...> wrote:
I put a long comment in the doc, maybe comments are good for short notes. here is the spiel

"The sum of weights must add to 100"
I would say this is where being user friendly ends. If I add reviews-v4 I have to go in and rebalance the whole thing just to figure out how to get to 100.

an alternate solution can be much simpler:

What if you just used a single integer that is relative to the whole cluster. let's call it "base1-lb"

reviews-v1: 6
reviews-v2: 3
reviews-v3: 1

there are two ways to think of this

relative to each other:
In this scenario, v1 gets twice as many requests as v2, and six times as many requests as v3

or in terms of a total number of requests (and this is most likely how the code implements it, so that it doesn't have to do a lot of math):
This is saying that for every 10 requests in total, this is how they are distributed.

to add v4
reviews-v1: 6
reviews-v2: 3
reviews-v3: 1
reviews-v4: 1

this is still super simple to look at. v1 gets 6x more than v3/v4, still gets 2x more than v2. I don't have to figure out how to "add up to 100"

and it's not complicated to calculate either. for every 11 requests:
v1 gets 6
v2 gets 3
v3 gets 1
v4 gets 1

Implementation: "Randomized Round Robin" is also super simple [pseudo code follows]

clusterWeight = [v1,v1,v1,v1,v1,v1,v2,v2,v2,v3,v3] //very easy to create based on base1 solution
randomCluster = random(clusterWeight) 
int atomicPointer = 0;
for each request:
  next = atomicPointer.getAndIncrease();
  application = randomCluster[atomicPointer];

and that's it. the router doesn't have to figure out where the next request goes. This is a simple, elegant and easy to understand solution.

Filip








On Fri, Jul 13, 2018 at 3:09 PM Shubha Anjur Tupil <sanjurtupil@...> wrote:

The CF Routing team has received feedback from many users that support for weighted routing would make it easier to accomplish their goals. We have a proposal on the preferred user experience for weighted routing and the considerations we have taken into account.


If you have thoughts on this or have experience working with traffic splitting on other platforms, please share your feedback with us. Feel free to comment on the doc or reply here.


Regards,

CF Routing Team




Re: cf-deployment 3.0

Franks, Geoff
 

I’m going to agree with Marco’s concerns here. Making life harder and less stable for the end users of CF has a real potential to alienate and push away the CF userbase altogether, even if it’s just in appearance (seeing monthly major releases of a product may cause new organizations to hesitate to migrate, until the release process appears more stable).

 

 

From: <cf-dev@...> on behalf of Marco Voelz <marco.voelz@...>
Reply-To: "cf-dev@..." <cf-dev@...>
Date: Monday, July 16, 2018 at 1:34 AM
To: "cf-dev@..." <cf-dev@...>
Subject: [External] Re: [cf-dev] cf-deployment 3.0

 

Dear Josh,

 

Thanks for the context, I wasn't aware of what happened before the release of networking 2.0. To stick with your example, though: From what you are saying I have understood that you would rather have done it this way – please correct me here if I'm wrong:

  • integrate networking release 2.0 into cf-deployment
  • integrate other PRs with breaking changes
  • bump cf-deployment to a new major version, given the above changes
  • merge the CVE fixes only into the new major version of cf-deployment

 

With this process, you would have achieved the following:

  • the development teams are happy, because they shipped as soon as they were ready to
  • operators are grumpy, because they have to bump networking to a new major version and adapt to other breaking changes in order to fix CVEs

 

I'm not saying you have to turn this tradeoff the other way around, but in my opinion this doesn't seem very consumer friendly. 

 

In your team's mission, you have clearly stated that your goal is to enable development teams to maintain a high velocity. I'd like to stress that we shouldn't leave the operators and users out of the picture here. In the end, you're developing for them, not for yourself. 

 

I'm not sure if the consumer/operator persona is a thing for RelInt, but if that's the case, here's something I'd like to hold true for whatever change RelInt makes to their process:

"As an operator of CF, I'd like to consume CVE fixes with as little changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible"

 

Does that sound reasonable?

 

Warm regards

Marco


From: cf-dev@... <cf-dev@...> on behalf of Josh Collins <jcollins@...>
Sent: Friday, July 13, 2018 11:39:30 PM
To: cf-dev@...
Subject: Re: [cf-dev] cf-deployment 3.0

 

Hi Marco,

I'm happy to provide more context on the container networking 2.0 reference.
The container networking team submitted a PR to cf-deployment with changes required for them to ship v2.0. 
RelInt deferred the container networking team's PR for a few weeks due to competing priorities, including multiple CVE fixes.
During the deferral time, a few other PRs were submitted which included breaking changes.
These additional changes took much more time to integrate and validate than anticipated, and in the end the container networking team's 2.0 release was published in cf-d about 5 weeks after it was ready to go.
The introduction of a regular cadence aims to mitigate this type of delay in the future. Had we had one at the time, the networking team would have timed its PR to align and we would have been poised to accept and publish it quickly.
We believe this will help teams confidently plan for, communicate about, and negotiate integrating their releases into cf-deployment.
And hopefully enable the RelInt team to integrate and ship major releases more seamlessly.

This is an evolving process so we'll see how things roll in the coming months and make adjustments where it makes sense to do so. 
I appreciate and welcome any and all feedback along the way.

Thanks very much,

Josh


Re: Deprecate route-sync from CFCR to CFAR

Oleksandr Slynko
 

Hi Arghya,

You have mentioned in Github that you were able to overcome this issue.

For everyone else, here is the context and a bit more information.

History
In the very early CFCR days, we did not support cloud providers and basically could not give access to the applications and the API outside of the cluster. We had HA Proxies to give access to workloads and the API. At that point, several early adopters told us that they would like to try exposing routes in a more dynamic way, à la CFAR, and possibly reuse the existing routing layer. The main point was that we would like to provision multiple clusters with ease and without changes to Cloud Config.
As a result, we created route-sync.

What it does
It solves two problems:
- have a stable and known URL for the API, so we can use it to sign the certificate
- have a way to expose applications

How we solve it now
For the API, we suggest that people wire their load balancers directly and then add the URL to the manifest. For example, see how BBL does it: https://github.com/cloudfoundry/bosh-bootloader/tree/master/plan-patches/cfcr-gcp

Are we diverging further from CFAR?
Yes, the CFCR team is moving further toward "vanilla" Kubernetes. But we expect other teams to provide solutions for both worlds. We don't have deep enough knowledge of CFAR components, and getting this knowledge would slow us down in improving the Kubernetes experience.

We are ready to help anyone understand Kubernetes better and provide a better experience with both runtimes.

Sincerely,
Oleksandr


Re: cf-deployment 3.0

Marco Voelz
 

Dear Josh,


Thanks for the context; I wasn't aware of what happened before the release of networking 2.0. To stick with your example, though: from what you are saying, I understand that you would rather have done it this way (please correct me if I'm wrong):

  • integrate networking release 2.0 into cf-deployment,
  • integrate other PRs with breaking changes,
  • bump cf-deployment to a new major version, given the above changes,
  • merge the CVE fixes only into the new major version of cf-deployment.

With this process, you would have achieved the following:
  • the development teams are happy, because they shipped as soon as they were ready to
  • operators are grumpy, because they have to bump networking to a new major version and adapt to other breaking changes in order to fix CVEs

I'm not saying you have to turn this tradeoff the other way around, but in my opinion this doesn't seem very consumer-friendly. 

In your team's mission, you have clearly stated that your goal is to enable development teams to maintain a high velocity. I'd like to stress that we shouldn't leave the operators and users out of the picture here. In the end, you're developing for them, not for yourself. 

I'm not sure if the consumer/operator persona is a thing for RelInt, but if that's the case, here's something I'd like to hold true for whatever change RelInt makes to their process:
"As an operator of CF, I'd like to consume CVE fixes with as little changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible"

Does that sound reasonable?

Warm regards
Marco


From: cf-dev@... <cf-dev@...> on behalf of Josh Collins <jcollins@...>
Sent: Friday, July 13, 2018 11:39:30 PM
To: cf-dev@...
Subject: Re: [cf-dev] cf-deployment 3.0
 
Hi Marco,

I'm happy to provide more context on the container networking 2.0 reference.
The container networking team submitted a PR to cf-deployment with changes required for them to ship v2.0. 
RelInt deferred the container networking team's PR for a few weeks due to competing priorities, including multiple CVE fixes.
During the deferral time, a few other PRs were submitted which included breaking changes.
These additional changes took much more time to integrate and validate than anticipated, and in the end the container networking team's 2.0 release was published in cf-deployment about five weeks after it was ready to go.
The introduction of a regular cadence aims to mitigate this type of delay in the future. Had we had one at the time, the networking team would have timed its PR to align with it, and we would have been poised to accept and publish it quickly.
We believe this will help teams confidently plan for, communicate about, and negotiate integrating their releases into cf-deployment.
We also hope it will enable the RelInt team to integrate and ship major releases more smoothly.

This is an evolving process so we'll see how things roll in the coming months and make adjustments where it makes sense to do so. 
I appreciate and welcome any and all feedback along the way.

Thanks very much,

Josh


Re: Proposal for weighted routing user experience in Cloud Foundry

Filip Hanik
 

aaarrgh, there is a bug in my pseudocode

clusterWeight = [v1,v1,v1,v1,v1,v1,v2,v2,v2,v3,v3] should be
clusterWeight = [v1,v1,v1,v1,v1,v1,v2,v2,v2,v3,v4]

Full solution:
Implementation: "Randomized Round Robin" is also super simple [pseudo code follows]

clusterWeight = [v1,v1,v1,v1,v1,v1,v2,v2,v2,v3,v4] //very easy to create based on base1 solution
randomCluster = random(clusterWeight)  //shuffle the expanded list once up front
int atomicPointer = 0;
for each request:
  next = atomicPointer.getAndIncrement();
  application = randomCluster[next % randomCluster.length];  //wrap around so the index never runs off the end
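
For anyone who wants to try this out, here is a minimal runnable sketch of the same idea in Go (the names buildWeightedPool and balancer are purely illustrative and not part of gorouter or any CF component):

package main

import (
    "fmt"
    "math/rand"
    "sync/atomic"
)

// buildWeightedPool expands relative integer ("base1") weights into a flat slice,
// e.g. {v1:6, v2:3, v3:1, v4:1} becomes 11 entries, then shuffles it once.
func buildWeightedPool(weights map[string]int) []string {
    var pool []string
    for name, w := range weights {
        for i := 0; i < w; i++ {
            pool = append(pool, name)
        }
    }
    rand.Shuffle(len(pool), func(i, j int) { pool[i], pool[j] = pool[j], pool[i] })
    return pool
}

type balancer struct {
    pool    []string
    counter uint64
}

// next picks the destination for the next request: plain round robin over the
// shuffled pool, so the long-run split matches the configured weights exactly.
func (b *balancer) next() string {
    n := atomic.AddUint64(&b.counter, 1) - 1
    return b.pool[n%uint64(len(b.pool))]
}

func main() {
    b := &balancer{pool: buildWeightedPool(map[string]int{
        "reviews-v1": 6, "reviews-v2": 3, "reviews-v3": 1, "reviews-v4": 1,
    })}
    counts := map[string]int{}
    for i := 0; i < 1100; i++ {
        counts[b.next()]++
    }
    fmt.Println(counts) // exactly 600/300/100/100 over 100 full passes of the pool
}

Shuffling the expanded pool once up front and then walking it round robin keeps the per-request work to a single atomic increment, while the split over every 11 requests is exactly 6/3/1/1.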


On Fri, Jul 13, 2018 at 8:09 PM Filip Hanik <fhanik@...> wrote:
I put a long comment in the doc, but maybe comments are better suited to short notes, so here is the spiel:

"The sum of weights must add to 100"
I would say this is where being user-friendly ends. If I add reviews-v4, I have to go in and rebalance the whole thing just to figure out how to get to 100.

an alternate solution can be much simpler:

What if you just used a single integer that is relative to the whole cluster? Let's call it "base1-lb".

reviews-v1: 6
reviews-v2: 3
reviews-v3: 1

there are two ways to think of this

relative to each other:
In this scenario, v1 gets twice as many requests as v2, and six times as many requests as v3

or in terms of X total requests (and this is most likely how the code would implement it, so that it doesn't have to do a lot of math):
This says that for every 10 requests in total, this is how they are distributed.

to add v4
reviews-v1: 6
reviews-v2: 3
reviews-v3: 1
reviews-v4: 1

this is still super simple to look at: v1 gets 6x as much as v3/v4, and still 2x as much as v2. I don't have to figure out how to "add up to 100".

and it's not complicated to calculate either. For every 11 requests:
v1 gets 6
v2 gets 3
v3 gets 1
v4 gets 1

Implementation: "Randomized Round Robin" is also super simple [pseudo code follows]

clusterWeight = [v1,v1,v1,v1,v1,v1,v2,v2,v2,v3,v3] //very easy to create based on base1 solution
randomCluster = random(clusterWeight) 
int atomicPointer = 0;
for each request:
  next = atomicPointer.getAndIncrement();
  application = randomCluster[next % randomCluster.length];

and that's it. The router doesn't have to do any weight calculations per request to figure out where the next one goes. This is a simple, elegant, easy-to-understand solution.

Filip








On Fri, Jul 13, 2018 at 3:09 PM Shubha Anjur Tupil <sanjurtupil@...> wrote:

The CF Routing team has received feedback from many users that support for weighted routing would make it easier to accomplish their goals. We have a proposal on the preferred user experience for weighted routing and the considerations we have taken into account.


If you have thoughts on this or have experience working with traffic splitting on other platforms, please share your feedback with us. Feel free to comment on the doc or reply here.


Regards,

CF Routing Team




Re: Proposal for weighted routing user experience in Cloud Foundry

Filip Hanik
 

I put a long comment in the doc, but maybe comments are better suited to short notes, so here is the spiel:

"The sum of weights must add to 100"
I would say this is where being user-friendly ends. If I add reviews-v4, I have to go in and rebalance the whole thing just to figure out how to get to 100.

an alternate solution can be much simpler:

What if you just used a single integer that is relative to the whole cluster? Let's call it "base1-lb".

reviews-v1: 6
reviews-v2: 3
reviews-v3: 1

there are two ways to think of this

relative to each other:
In this scenario, v1 gets twice as many requests as v2, and six times as many requests as v3

or in terms of X total requests (and this is most likely how the code would implement it, so that it doesn't have to do a lot of math):
This says that for every 10 requests in total, this is how they are distributed.

to add v4
reviews-v1: 6
reviews-v2: 3
reviews-v3: 1
reviews-v4: 1

this is still super simple to look at: v1 gets 6x as much as v3/v4, and still 2x as much as v2. I don't have to figure out how to "add up to 100".

and it's not complicated to calculate either. For every 11 requests:
v1 gets 6
v2 gets 3
v3 gets 1
v4 gets 1
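
If a UI still wants to display percentages, they fall straight out of the same integers. A throwaway sketch in Go (normalizePercent is just an illustrative name, not anything from the proposal or the routing code):

package main

import "fmt"

// normalizePercent converts relative integer weights into percentages of the
// total, purely for display; routing can keep working with the raw integers.
func normalizePercent(weights map[string]int) map[string]float64 {
    total := 0
    for _, w := range weights {
        total += w
    }
    out := make(map[string]float64, len(weights))
    for name, w := range weights {
        out[name] = 100 * float64(w) / float64(total)
    }
    return out
}

func main() {
    fmt.Println(normalizePercent(map[string]int{
        "reviews-v1": 6, "reviews-v2": 3, "reviews-v3": 1, "reviews-v4": 1,
    }))
    // prints roughly 54.5 / 27.3 / 9.1 / 9.1, i.e. 6/11, 3/11, 1/11, 1/11 of the traffic
}

So the relative-integer form carries exactly the same information, without anyone having to re-derive numbers that sum to 100 every time a version is added or removed.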

Implementation: "Randomized Round Robin" is also super simple [pseudo code follows]

clusterWeight = [v1,v1,v1,v1,v1,v1,v2,v2,v2,v3,v3] //very easy to create based on base1 solution
randomCluster = random(clusterWeight) 
int atomicPointer = 0;
for each request:
  next = atomicPointer.getAndIncrement();
  application = randomCluster[next % randomCluster.length];

and that's it. The router doesn't have to do any weight calculations per request to figure out where the next one goes. This is a simple, elegant, easy-to-understand solution.

Filip








On Fri, Jul 13, 2018 at 3:09 PM Shubha Anjur Tupil <sanjurtupil@...> wrote:

The CF Routing team has received feedback from many users that support for weighted routing would make it easier to accomplish their goals. We have a proposal on the preferred user experience for weighted routing and the considerations we have taken into account.


If you have thoughts on this or have experience working with traffic splitting on other platforms, please share your feedback with us. Feel free to comment on the doc or reply here.


Regards,

CF Routing Team