Bosh cck - No option to recreate out of sync vm


William C Penrod
 

I managed to get my bosh vms out of sync. The update had failed at a point that left duplicate vms, with the persistent disk attached to the 'new' vms while bosh was still registering the old vm name. I tried to shut the new vms down and attach the disk back to the old vms, but the jobs had already drained from them, so I turned to bosh cck to save me.

Bosh cck could not restart or recreate the vms because the job and index were incorrect. I did not want to delete the vm references, since I did not want to lose the persistent disk. After digging around, I discovered that I could unmount the persistent disks and shut the vms down. Bosh cck then gave me the option to recreate the vms, which worked great.
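
Roughly, the sequence that ended up working for me looked like this (the unmount and power-off steps were done on the vSphere side, not through the bosh CLI, and the exact cck prompt wording may differ by director version):

bosh vms   # see which instances the director still has on record
(unmount the persistent disks and power off the duplicate vms in vSphere)
bosh cck   # cloud check now reports the instances as missing
(pick the "Recreate VM" resolution when prompted)
bosh vms   # confirm the jobs come back up on the recreated vms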

It seems it would make for a less stressful experience if bosh cck allowed the user to select the recreate option when vms get out of sync. Can this be added for this scenario, or are there other factors I'm missing?


Dmitriy Kalinin
 

We are actually planning to remove the cases where a vm can end up in an "out-of-sync" state. Hopefully that solves this problem.

I'm curious about how you ended up with duplicate vms. Any details?


William C Penrod
 

I had been working on migrating a micro bosh from the bosh_micro plugin to bosh-init. The migration didn't work, so I deleted it and re-attached a backup I had of the persistent disk to the vm of the original micro bosh. Once restored, the micro bosh could see everything about the deployments except the vms, which came up as unknown/unknown. At that point I ran bosh cck and it gave me the option of recreating the vms, which I attempted.

Fail message:
Started applying problem resolutions > missing_vm 191: Recreate VM. Failed: Failed to add disk scsi0:2. (00:02:56)
Started applying problem resolutions > missing_vm 192: Recreate VM. Failed: Failed to add disk scsi0:2. (00:03:22)
Started applying problem resolutions > missing_vm 193: Recreate VM. Failed: Failed to add disk scsi0:2. (00:03:11)
Started applying problem resolutions > missing_vm 194: Recreate VM. Failed: Failed to add disk scsi0:2. (00:02:32)
Started applying problem resolutions > missing_vm 195: Recreate VM. Done (00:01:36)

The task debug log did not have any further details.

I didn't catch the error from the VMware side, but when I started moving things around I realized I had duplicate vms, as the disks were locked to different IDs than I was expecting.

Thanks for your time on this. Removing the case where the out-of-sync state
could happen would be great.


Dmitriy Kalinin
 

I see. One more question about the migration not working. Do you remember what the problem was? Was there an error?


William C Penrod
 

The micro bosh I was migrating was deployed with all of the default user/pass combos. If I tried to set any of those to something other than the defaults, I would end up in the situation where the bosh-init-deployed director vm could see all of the migrated deployment data except the vms, which would come back as unknown/unknown. If I used bosh-init and deployed with all of the user/pass combos set to the default values, the director vm could see everything, including the job indexes.

This last time, I was recovering from using bosh-init to migrate the data with the user/pass combos set to the default values, which worked; I then tried to add custom passwords to the manifest and ran bosh-init deploy to set them. This put the director vm in the state where it could see the migrated deployment details, but the job index was unknown/unknown again. Running bosh cck to recreate them would result in a message about not being able to find a vm deployed from the sc named template.

Are you aware of any issues with bosh-init updating the passwords on a
previously deployed director vm?


Dmitriy Kalinin
 

Currently you cannot change the passwords for mbus without cck-ing the VMs, since the existing vms would no longer be able to communicate over NATS.

We are planning to remove NATS and have simpler point-to-point communication over HTTPS, which would fix this problem.

Btw, if you have other environments to migrate, using the same passwords as before should let you migrate them without hitting this problem.
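
Roughly (the manifest filename here is just an example):

# keep the original NATS/mbus credentials in the migrated manifest
bosh-init deploy bosh.yml
# rotate credentials later only if you are prepared to cck/recreate the vms
bosh cck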
