Bosh VM deployment priority
Dear cf-bosh group,
I am using Bosh to deploy VMs in a vSphere environment. The cloud manifest contains drs_rules of type separate_vms.
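For context, a drs_rules entry of this kind in the vSphere CPI cloud properties generally has the shape sketched below. This is only an illustration: the vm_type, datacenter, cluster, and rule names are placeholders, not the values from this deployment, and the sizing numbers are arbitrary.

```yaml
# Sketch of a cloud config vm_type with a vSphere DRS anti-affinity rule.
# All names below (worker, my-dc, my-vsphere-cluster, separate-workers-rule)
# are hypothetical placeholders.
vm_types:
- name: worker
  cloud_properties:
    cpu: 2
    ram: 4096
    disk: 10240
    datacenters:
    - name: my-dc
      clusters:
      - my-vsphere-cluster:
          drs_rules:
          - name: separate-workers-rule
            type: separate_vms   # asks DRS to keep the VMs on different hosts
```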
When I request 5 VMs in a 5 host environment, those 5 VMs are deployed onto the 5 different hosts. I can see in vCenter that a DRS rule has been created which contains the 5 VMs. Furthermore, if I migrate one of the VMs to a host which already has a VM, the VMs are rebalanced shortly afterwards to satisfy the anti-affinity rule. I also see in vCenter that all 5 VMs have a custom attribute, drs_rule, set to anti-affinity. Good.
However, when I attempt to deploy 6 VMs in a 5 host environment, Bosh deploys the first 5 VMs onto the 5 available hosts. Then the deployment fails for the 6th VM with the following error:
Task 15 | 22:02:15 | Updating instance worker: worker/bdcceaf4-cbc0-4238-87e9-9f6234273b80 (3) (00:01:41)
L Error: Unknown CPI error 'Unknown' with message 'Could not power on VM '<[Vim.VirtualMachine] vm-27888>': DRS cannot find a host to power on or migrate the virtual machine.' in 'create_vm' CPI method (CPI request ID: 'cpi-867860')
That error makes sense. The deployment fails with:
Expected task '15' to succeed but state is 'error'
Exit code 1
At this point I would have expected (perhaps incorrectly) that the 6th VM would remain powered off, but this is not the case. After a few minutes the VM is powered on and scheduled onto a host which already has another VM running on it, violating the anti-affinity rule specified in the cloud manifest. When I look at the 6th VM in vCenter, I see that it does NOT have the custom attribute drs_rule set to anti-affinity. I believe this is what allows vCenter to schedule the VM onto an already occupied host: that VM is not in the anti-affinity group.
1) Does Bosh, by design, prioritize starting the requested number of VMs (in my case 6) over the requested anti-affinity rules (which in my mind would prevent the 6th VM from being powered on)?
2) If "yes" to question 1), is there an option to prevent the 6th VM from being started?
3) If "no" to question 1), is this a bug?