Rotating cf-deployment certificates


David Sabeti
 

Hey cf-dev,

The Release Integration team has had a few reports from other CF engineering teams that their long-running environments have had their internal TLS certificates expire. Since certificates generated by the BOSH CLI get a one-year expiration date, and it's been about a year since early adopters started using cf-deployment, we suspect that some older environments in the CF community are fast approaching this issue as well. We hope to provide enough of a warning that folks in the community can address this.

Check your certificate expiration dates
This is pretty simple to do. You can copy a certificate -- service_cf_internal_ca is a good one to try -- and paste it into the form on this site: https://www.sslshopper.com/certificate-decoder.html. You'll find the expiration date in the "Valid To" section. If your certificates are going to expire soon, continue to the process below.
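If you would rather not paste a certificate into a website, the same check can be sketched locally with openssl. This is only an illustration under a few assumptions: the function names and the file names in the usage comments are made up, and it expects the certificate to have been written to its own PEM file first.

```shell
# Print the "Not After" date of the first certificate in a PEM file.
cert_not_after() {
  openssl x509 -noout -enddate -in "$1" | cut -d= -f2
}

# Exit 0 if the certificate expires within the given number of days, 1 otherwise.
# Note: this also exits 0 if openssl cannot read the file.
cert_expires_within_days() {
  # -checkend N exits 0 when the certificate is still valid N seconds from now.
  if openssl x509 -noout -checkend "$(( $2 * 86400 ))" -in "$1" >/dev/null; then
    return 1
  fi
  return 0
}

# Usage, assuming a vars-store named deployment-vars.yml (hypothetical name):
#   bosh int deployment-vars.yml --path /service_cf_internal_ca/certificate > ca.pem
#   cert_not_after ca.pem
#   cert_expires_within_days ca.pem 30 && echo "expires within 30 days"
```

The 30-day threshold is arbitrary; pick whatever lead time your team needs to schedule the three deploys described below.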

How to rotate certificates
This is not an easy process, but it's doable. I'll warn you right now that, during the transition, your CF will experience `cf push` downtime, but apps should remain available. Also, if you're deploying with the windows-cell.yml or secure-service-credentials.yml ops-files, the process will be a bit more complicated, so please reach out to the RelInt team for help.
  1. Deploy with concatenated CA certificates
    1. Generate new certs by running:
       bosh int cf-deployment.yml [-o ... ] --vars-store new-vars.yml -v system_domain=$SYSTEM_DOMAIN
    2. For each new CA cert, append the new CA certificate to the existing value of both the `ca` and `certificate` fields of the corresponding CA in your vars-store.
    3. Deploy.
  2. Deploy with new leaf certificates
    1. For each leaf certificate in your vars-store, replace it with the corresponding certificate from new-vars.yml. These leaf certificates are signed by the new CAs.
    2. Deploy. When the api instances roll, users will no longer be able to push apps until you remove the old CA certificates.
  3. Deploy without the old CA certificates.
    1. For each CA certificate in your vars-store, remove the first certificate in the `ca` and `certificate` fields. As a result, your vars-store should contain only the new CA certificates created in step 1.1.
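
The PEM editing involved in the concatenation in step 1 and the removal in step 3 can be sketched in shell. This is an illustration only, not a RelInt-provided tool: the function names and file names are hypothetical, and it assumes you have already copied a CA's old and new certificates out of the two vars-stores into separate PEM files and will put the result back into your vars-store yourself.

```shell
# Step 1: build a bundle with the old CA certificate first and the new one second.
# "awk 1" prints every line followed by a newline, so the two PEM blocks do not
# run together even if the first file lacks a trailing newline.
concat_ca_certs() {
  awk 1 "$1" "$2"
}

# Step 3: drop the first certificate in a bundle, keeping everything from the
# second BEGIN CERTIFICATE line onward.
drop_first_cert() {
  awk '/-----BEGIN CERTIFICATE-----/ { n++ } n >= 2' "$1"
}

# Usage (file names are placeholders):
#   concat_ca_certs old-ca.pem new-ca.pem > both-cas.pem   # value for step 1
#   drop_first_cert both-cas.pem > new-ca-only.pem         # value for step 3
```

Keeping the old certificate first matters here because step 3 removes the first certificate in each field.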
The RelInt team has also worked through a process for rotating certificates that have already expired. If you have any questions or concerns, jump into the #release-integration channel in the Cloud Foundry Slack and feel free to get a hold of the team there.

Thanks!
CF Release Integration



Carlo Alberto Ferraris
 

Just a couple of random notes about this:
- Since we have a lot of certificates in our deployment manifest (not just the CF/Diego ones), we have a step in our deployment process that automatically checks whether any of them is close to its expiration date (or invalid for other reasons). If anybody is interested, we can publish it somewhere.
- It would be nice to have the cert generation scripts prompt for the desired validity of the certificates (to avoid surprises).


Benjamin Gandon
 

Hello Carlo,

I'm definitely interested in your step that checks whether any of the certs are close to their expiration date!
If you can share it on GitHub somewhere, that would be perfect!

Regards,
/Benjamin GANDON (from my iPhone)

On March 6, 2018, at 03:05, Carlo Alberto Ferraris <carlo.ferraris@...> wrote:



Mike Youngstrom <youngm@...>
 

Thanks for the heads up David.  I have questions about the rotation process.

Although all applications may remain up while redeploying, I imagine things like loggregator will stop working mid-deploy when doppler and metron certs no longer match.  Perhaps reps will be unable to drain properly when their certs don't match?  Does that sound correct?

Is the expiration default the same for certificates created by credhub?  Are you aware of any way to increase the default expiration date for credhub or bosh-cli?

Long term, are core teams working towards zero-downtime cert rotation capabilities?  Or do you foresee the need to rotate with some service impact remaining an issue long term?

Thanks,
Mike

On Fri, Mar 2, 2018 at 11:32 AM, David Sabeti <dsabeti@...> wrote:

Iryna Shustava
 

Hey Mike,

> Although all applications may remain up while redeploying, I imagine things like loggregator will stop working mid-deploy when doppler and metron certs no longer match.  Perhaps reps will be unable to drain properly when their certs don't match?  Does that sound correct?

We expect no app routability or log availability downtime during the three-step CA/cert rotation. That is because during step 1 we make all components trust both CAs, the old one and the new one. During step 2, when we roll out new leaf certificates, all components already trust both CAs, so the certificate switch happens without downtime. You will, however, see some cf push downtime, as David mentioned.

> Is the expiration default the same for certificates created by credhub?  Are you aware of any way to increase the default expiration date for credhub or bosh-cli?

The default expiration is the same for CredHub and the BOSH CLI. The BOSH CLI does not allow you to change the certificate expiration period, but CredHub does. You can do so by adding the `duration` property (measured in days) to your certificate or CA variable in the manifest.
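
As a rough sketch of what that could look like with cf-deployment, the snippet below writes a small ops-file that sets `duration` on one CA variable. The ops-file name and the 730-day value are arbitrary choices for illustration, and service_cf_internal_ca is used only because it is the example variable mentioned earlier in the thread. It is not clear from this thread how an already-generated certificate is treated when its options change, so check that before applying something like this to a live CA.

```shell
# Write an ops-file that sets a two-year duration on one CA variable.
# The trailing "?" in the path creates the key if it does not exist yet.
cat > set-ca-duration.yml <<'EOF'
- type: replace
  path: /variables/name=service_cf_internal_ca/options/duration?
  value: 730
EOF

# Usage (deployment name "cf" is an assumption):
#   bosh -d cf deploy cf-deployment.yml -o set-ca-duration.yml [-o ... ]
```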

> Long term, are core teams working towards zero-downtime cert rotation capabilities?  Or do you foresee the need to rotate with some service impact remaining an issue long term?

If you're interested, this doc describes the reasons behind cf push downtime during CA rotation.

Thanks!
Iryna, CF Release Integration Team


On Tue, Mar 6, 2018 at 9:21 AM, Mike Youngstrom <youngm@...> wrote:

Mike Youngstrom <youngm@...>
 

Thanks for the clarification Iryna!  I'll do some more studying and respond if I have further questions.

Mike

On Tue, Mar 6, 2018 at 12:07 PM, Iryna Shustava <ishustava@...> wrote:

Mike Youngstrom <youngm@...>
 

So, after reading through the document you provided, Iryna, and re-reading David's rotation steps, everything now makes sense when using bosh-cli generated certs.

Are there steps to do the same with CredHub-managed certificates?

Thanks,
Mike

On Tue, Mar 6, 2018 at 1:22 PM, Mike Youngstrom <youngm@...> wrote:

Aaron Huber
 

This one-liner will grab all the certs out of the vars files used by the bosh-cli and print out their expiration dates, which is useful for a quick check:

openssl crl2pkcs7 -nocrl -certfile <(sed -n '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' *vars.yml | sed -e 's/^[ \t]*//') | openssl pkcs7 -print_certs -text -noout | sed -e 's/^[ \t]*//' | grep -E "Issuer:|Subject:|Not\ After\ :" | awk '{ if ((NR % 3) == 1) printf("\n*******\n\n"); print; }'

Aaron


Iryna Shustava
 

Hey Mike,

Check out this doc regarding CA rotation with CredHub: https://github.com/pivotal-cf/credhub-release/blob/master/docs/ca-rotation.md.

Cheers,
Iryna


On Tue, Mar 6, 2018 at 2:44 PM, Aaron Huber <aaron.m.huber@...> wrote: