Any tips on migrating an entire CF deployment to a new cloud?


Aaron Huber
 

We're just starting a project to migrate a CF deployment onto a different
cloud platform, including moving to a new blobstore cluster (Swift to Swift),
and wanted to see if there are any tips on the best way to do this.
Obviously we'll test it extensively and our goal is to have minimal or no
downtime for running apps.

At a high level, here is what we are thinking:

1) Deploy a new CF environment on the new cloud with the same configuration
and secrets, just changing cloud properties, IP addresses, etc. Test and
make sure everything works and then shut down all services.
2) Start syncing files from the old blobstore to the new one. We will be
looking into using rclone.
3) When ready to migrate, shut down the cloud controllers, UAA, and BBS/ETCD
on the old environment.
4) Back up the old CC and UAA databases and restore them to the new VMs.
5) Clear /var/vcap/store/etcd on all the new ETCD servers, copy the contents
of that folder from one of the old ETCD servers to one of the new ones, and
start it up to restore the cluster.
6) Run a final sync of the blobstore.
7) Bring up all services on the new environment and ensure all apps are
running.
8) Update load balancer pools to point to the new environment.
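
The two blobstore syncs (steps 2 and 6) could be scripted roughly as follows.
This is only a sketch under assumptions: the remote names old-swift and
new-swift are hypothetical and must match remotes defined in your own rclone
configuration, and the container names are examples that should be replaced
with whatever your Cloud Controller is configured to use. The rclone
subcommand and flags used here (sync, --checksum, --transfers, --dry-run)
are standard rclone options.

```python
import subprocess

# Hypothetical names: replace with the remotes from your own rclone config
# and the containers your Cloud Controller actually uses.
OLD_REMOTE = "old-swift"
NEW_REMOTE = "new-swift"
CONTAINERS = ["cc-buildpacks", "cc-droplets", "cc-packages", "cc-resources"]


def build_sync_command(container, dry_run=True, transfers=8):
    """Return the rclone argv that mirrors one container from old to new."""
    cmd = [
        "rclone", "sync",
        f"{OLD_REMOTE}:{container}",
        f"{NEW_REMOTE}:{container}",
        "--checksum",              # compare checksum and size, not mod-time and size
        "--transfers", str(transfers),
    ]
    if dry_run:
        cmd.append("--dry-run")    # report what would change, copy nothing
    return cmd


def sync_all(dry_run=True, run=subprocess.run):
    """Sync every container in turn; stop at the first failure."""
    for container in CONTAINERS:
        run(build_sync_command(container, dry_run=dry_run), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for name in CONTAINERS:
        print(" ".join(build_sync_command(name)))
```

Note that rclone sync makes the destination match the source, which includes
deleting destination files that no longer exist in the source, so a dry run
before the final sync is worth the extra minute.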

Has anyone had to do this before and have any recommendations on steps I'm
missing or improvements to the process? How long will apps on Diego stay
running with BBS and the cloud controller down during the migration?

Thanks in advance for any advice.

Aaron





--
View this message in context: http://cf-dev.70369.x6.nabble.com/Any-tips-on-migrating-an-entire-CF-deployment-to-a-new-cloud-tp5011.html
Sent from the CF Dev mailing list archive at Nabble.com.


Amit Kumar Gupta
 

You should not copy any of the etcd data (neither Diego's etcd cluster nor
the etcd cluster used by the Loggregator and Routing components). Not only
is it unnecessary, since CF is designed to seamlessly repopulate the data it
puts in etcd from scratch, it would actually cause conflicts in the case of
Diego, since the Garden container IDs will end up changing.

In case you didn't know that BOSH can do this now, there's bosh attach disk
<https://bosh.io/docs/sysadmin-commands.html#disks>, which might be useful
for your exercise.



On Fri, May 27, 2016 at 9:43 PM, aaron_huber <aaron.m.huber(a)intel.com>
wrote:



Felix Friedrich
 

We've migrated once from one vSphere to another. We created a new
bosh with the same network config on the target vSphere and created a
new manifest for this bosh with an instance count of zero for all jobs
at the start. Then we moved the components over one after another by
generating new manifests with an instance count of zero on the source
side and the desired instance count on the destination side. (I forgot
the actual order, but we started with runners zone 1, then zone 2, and
proceeded with all other components.) At a certain point we stopped the
UAAs and API servers in order to migrate the postgres database manually.
All apps had zero downtime during that process; only the API was down
for a few minutes.
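
To make the job-by-job move concrete, here is a minimal sketch of the
instance counts that each pair of manifests would carry after every step.
The job names and counts are hypothetical, and scaling up the destination
before scaling down the source within a step is an assumption; the
description above does not say which side was deployed first.

```python
def migration_steps(instance_counts, order):
    """Yield (job, source_counts, destination_counts) after each job moves.

    instance_counts: desired instances per job, e.g. {"runner_z1": 3}
    order: job names in the order they should be moved
    """
    source = dict(instance_counts)
    destination = {job: 0 for job in instance_counts}
    for job in order:
        destination[job] = instance_counts[job]  # scale up on the new side
        source[job] = 0                          # scale down on the old side
        yield job, dict(source), dict(destination)


if __name__ == "__main__":
    # Hypothetical jobs and counts, for illustration only.
    counts = {"runner_z1": 3, "runner_z2": 3, "router": 2, "api": 2}
    move_order = ["runner_z1", "runner_z2", "router", "api"]
    for job, src, dst in migration_steps(counts, move_order):
        print(f"after moving {job}: source={src} destination={dst}")
```

Each yielded pair corresponds to one source manifest and one destination
manifest to deploy before moving on to the next job.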

This obviously only works if you're able to use the same IP range on the
destination IaaS. As we are using NFS for the blobstore, we did not need
to migrate that, but I guess it should be a similar process to the
postgres migration.

Hope that helps a little,

Felix

On Sat, 28 May 2016, at 07:34, Amit Gupta wrote: