VMware, as of writing, has a nasty bug affecting backups that use CBT (Changed Block Tracking; hint: basically any enterprise backup product worth its salt has CBT enabled): it loses track of the changed blocks when a VMDK reaches any power-of-2 multiple of 128GB (128, 256, 512, 1024, etc.), which may make your backup unrecoverable.
The VMware bug is documented in this KB:
kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2090639
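To make the size boundaries concrete, here is a minimal sketch in plain PowerShell (no PowerCLI needed). It assumes the pattern exactly as described in this post, i.e. boundaries at 128GB doubled each time; the function name `Test-CrossedCbtBoundary` is made up for illustration, and the KB remains the authority on the exact trigger conditions.

```powershell
# Hypothetical helper: returns $true if growing a disk from $OldSizeGB to
# $NewSizeGB reaches one of the boundaries 128, 256, 512, 1024, ... GB.
function Test-CrossedCbtBoundary {
    param(
        [double]$OldSizeGB,
        [double]$NewSizeGB
    )
    $boundary = 128
    while ($boundary -le $NewSizeGB) {
        # A boundary at or below the new size that the old size had not reached
        if ($OldSizeGB -lt $boundary) { return $true }
        $boundary *= 2
    }
    return $false
}

Test-CrossedCbtBoundary -OldSizeGB 100 -NewSizeGB 200   # True  (reached 128)
Test-CrossedCbtBoundary -OldSizeGB 130 -NewSizeGB 200   # False (still between 128 and 256)
Test-CrossedCbtBoundary -OldSizeGB 200 -NewSizeGB 300   # True  (reached 256)
```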
The remedy is to disable and re-enable (reset) CBT on the affected machines. This can be done either with the machine powered off, or with it powered on by running PowerCLI commands and taking a snapshot. We will be doing the latter; no one likes downtime.
Download and install VMware PowerCLI, then run the following command:
Connect-VIServer -Server {VC-Address}
Enter your username and password when prompted. You should see output like the below:
Name Port User
---- ---- ----
vcsa.domain.com 443 username
The following will collect the VMs matching the conditions VMDK >= 128GB and CBT enabled into the array $vms:
[System.Collections.ArrayList]$vms = Get-VM| ?{$_.ExtensionData.Config.Hardware.Device.CapacityInKB -ge 128000000} | ?{$_.ExtensionData.Config.ChangeTrackingEnabled -eq $true}
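If the one-liner is hard to read, the same two conditions can be spelled out. The following is only a sketch: `Test-NeedsCbtReset` is a hypothetical helper name, not a PowerCLI cmdlet, and it uses 134217728 KB (exactly 128GB) as the threshold. The rounder 128000000 used above is slightly lower, so it casts a marginally wider net. The PowerCLI part only runs if `Get-VM` is available and assumes a `Connect-VIServer` session is already open.

```powershell
# Hypothetical helper: decide whether a VM matches both conditions,
# given its disk sizes in KB and its CBT flag.
function Test-NeedsCbtReset {
    param(
        [long[]]$DiskCapacityKB,
        [bool]$CbtEnabled,
        [long]$ThresholdKB = (128GB / 1KB)   # 134217728 KB
    )
    $bigDisks = @($DiskCapacityKB | Where-Object { $_ -ge $ThresholdKB })
    return ($CbtEnabled -and $bigDisks.Count -gt 0)
}

# Wiring it to PowerCLI (skipped when PowerCLI is not loaded).
if (Get-Command Get-VM -ErrorAction SilentlyContinue) {
    $vms = Get-VM | Where-Object {
        $cfg = $_.ExtensionData.Config
        # Keep only virtual disks; other devices have no CapacityInKB
        $sizes = @($cfg.Hardware.Device |
            Where-Object { $_ -is [VMware.Vim.VirtualDisk] } |
            ForEach-Object { $_.CapacityInKB })
        Test-NeedsCbtReset -DiskCapacityKB $sizes -CbtEnabled ([bool]$cfg.ChangeTrackingEnabled)
    }
}
```

Keeping the decision in a small function means it can be checked with plain numbers before pointing it at a live vCenter.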
To view the list of VMs run the following:
echo $vms
You should get a nice list of VMs that match the conditions and likely need CBT reset:
Name PowerState Num CPUs MemoryGB
---- ---------- -------- --------
Machine1.domain... PoweredOn 4 8.000
Machine2.domain... PoweredOn 4 8.000
Machine3.domain... PoweredOn 2 6.000
To reset CBT on these machines while they are live you need to create a VM spec that disables CBT and apply it to the affected machines:
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec; $spec.ChangeTrackingEnabled = $false;
To disable CBT on all affected VMs we then have to apply the $spec to each VM in the $vms array:
foreach($vm in $vms){$vm.ExtensionData.ReconfigVM($spec);$snap=$vm | New-Snapshot -Name 'Disable CBT';$snap | Remove-Snapshot -confirm:$false;}
This will apply the $spec to each affected VM, then take a snapshot and remove it to commit the CBT change.
To check if your command ran successfully run:
get-vm | ?{$_.ExtensionData.Config.ChangeTrackingEnabled -eq $false}
This outputs a list of VMs with CBT disabled – you should see your full list of VMs from above here. If you are using a backup product that forces CBT on, like Veeam, then you can leave it here; Veeam will re-enable CBT and run a full backup next time (because we have lost our CBT history).
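To check only the VMs collected earlier rather than every VM in the inventory, something like the following sketch may help. It assumes `$vms` is still populated and the `Connect-VIServer` session is still open. It re-queries the VMs by name because, as far as I am aware, the `ExtensionData` on an already-fetched VM object is not refreshed automatically; the column name `CbtEnabled` is just a label chosen here.

```powershell
# Assumes $vms holds the VMs gathered earlier and PowerCLI is connected.
# Re-fetch by name so the CBT flag reflects the current configuration.
Get-VM -Name ($vms | ForEach-Object { $_.Name }) |
    Select-Object Name,
        @{ Name = 'CbtEnabled'; Expression = { $_.ExtensionData.Config.ChangeTrackingEnabled } } |
    Sort-Object Name
```

Every row should show `False` after the disable pass. Note that matching by name can return extra rows if several VMs share a name.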
However, if you run a product that doesn’t do this you will need to let your backup run once then run the following command to enable CBT in the spec again and apply to the VMs:
[System.Collections.ArrayList]$vms = Get-VM| ?{$_.ExtensionData.Config.Hardware.Device.CapacityInKB -ge 128000000} | ?{$_.ExtensionData.Config.ChangeTrackingEnabled -eq $false}
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec; $spec.ChangeTrackingEnabled = $true;
foreach($vm in $vms){$vm.ExtensionData.ReconfigVM($spec);$snap=$vm | New-Snapshot -Name 'Enable CBT';$snap | Remove-Snapshot -confirm:$false;}
This is subtly different than the first set of commands; of note are:
.ChangeTrackingEnabled -eq $false
To only pull VMs with CBT disabled into the $vms array.
$spec.ChangeTrackingEnabled = $true;
To enable CBT on machines rather than disable.
This will resolve the problem until your machine crosses another power-of-2 multiple of 128GB, at which point this will need to be run again.
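Since the disable and enable passes differ only in one boolean, they can be wrapped in a single function. This is a sketch rather than a tested tool: `Set-CbtState` is a made-up name, it assumes PowerCLI is loaded and connected, and it uses the same `ReconfigVM`, `New-Snapshot` and `Remove-Snapshot` calls as the one-liners above, with error handling added so one failing VM does not stop the rest.

```powershell
# Hypothetical wrapper; Set-CbtState is not a PowerCLI cmdlet.
# Assumes PowerCLI is loaded and Connect-VIServer has succeeded.
function Set-CbtState {
    param(
        [Parameter(Mandatory = $true)] $VMs,
        [Parameter(Mandatory = $true)] [bool] $Enabled
    )
    $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
    $spec.ChangeTrackingEnabled = $Enabled
    $label = if ($Enabled) { 'Enable CBT' } else { 'Disable CBT' }

    foreach ($vm in $VMs) {
        try {
            $vm.ExtensionData.ReconfigVM($spec)
            # Create and remove a snapshot so the new setting is committed
            $snap = $vm | New-Snapshot -Name $label -ErrorAction Stop
            $snap | Remove-Snapshot -Confirm:$false -ErrorAction Stop
            Write-Host "$($vm.Name): CBT set to $Enabled"
        }
        catch {
            Write-Warning "$($vm.Name): $($_.Exception.Message)"
        }
    }
}

# Set-CbtState -VMs $vms -Enabled $false   # first pass: disable
# Set-CbtState -VMs $vms -Enabled $true    # later, if your backup product does not re-enable CBT
```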
This bug is currently being investigated by VMware and I am keeping an eye on the KB for updates on a hotfix. The PowerShell code has been adapted from: http://www.veeam.com/kb1940
Why not follow @mylesagray on Twitter for more like this!
Hi, I am following your steps and get stuck on the third step, perhaps I am a bit thick, but I am cutting and pasting your script and I get this error:
The term ‘Get-VM ‘ is not recognized as the name of a cmdlet, function, script
file, or operable program. Check the spelling of the name, or if a path was inc
luded, verify that the path is correct and try again.
At line:1 char:45
+ [System.Collections.ArrayList]$vms = Get-VM <<<< | ?{$_.ExtensionData.Config
.Hardware.Device.CapacityInKB -ge 128000000} | ?{$_.ExtensionData.Config.Change
TrackingEnabled -eq $true}
+ CategoryInfo : ObjectNotFound: (Get-VM :String) [], CommandNotF
oundException
+ FullyQualifiedErrorId : CommandNotFoundException
Are there variables I need to change specific to my environment?
Hi Ashley,
You’ve made sure to install PowerCLI as listed above and launched it instead of straight PowerShell?
Myles
Yes that is correct.
Can you try running the below just on its own after you’ve made a successful connection to the vCenter Server with Connect-VIServer:
`Get-VM`
You should get a list of all the VMs in your vCenter?
Correct, that command returns a list of VMs in my vCenter.
Ashley,
Got to the bottom of it, apologies, the error you’re seeing is: The term ‘Get-VM ‘ is not recognized – note the extra whitespace after the Get-VM, I have adjusted my code above by removing the space after the Get-VM command, it should work as expected now.
Myles
Thanks, it works now as expected. Cheers!
If it’s helpful, the version of vCenter I am running is 5.1 1473063 and the host I am running the script against is running ESXi 5.0 441354
Folks,
So, after applying the above script, we don’t need to power cycle the VM, and this will fix the CBT issue temporarily while waiting for VMware to come out with a permanent fix?
Note: I’m using NBU 6.0.2, making use of CBT for backup.
Correct, no power cycle is needed. Your next backup will take a lot longer as it is a full rather than a differential, but it should be okay after that.
Myles,
How do we verify that CBT was recreated after running the above script? Do we just look at the size of it?
Thanks
I got the answer already: the CBT file is deleted once the script is run.
Yep, that’s correct; it is then recreated with zero size.
Myles
I noticed that after running the script below
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.ChangeTrackingEnabled = $false
it changes “scsi0:1.deviceType = “disk”” to “scsi0:1.deviceType = “scsi-hardDisk”” in the vmx file, which basically changes the ‘SCSI controller’ from “LSI Logic SAS” to “LSI Logic Parallel”.
Can we maintain the SCSI controller type? I have other disks using “VMware Paravirtual” and would like to keep all the SCSI controller types as they are now.
Thanks
I’ve never seen that behaviour or that deviceType before. As far as I am aware, scsi-hardDisk doesn’t code for the SCSI controller; that is set with the `scsi*.virtualDev` param in the vmx:
http://faq.sanbarrow.com/index.php?action=artikel&cat=7&id=53&artlang=en
Did you upgrade your ESX instance from a very old version to a newer one or has the machine in question been imported from an OpenStack / Fusion / Workstation / Xen / OtherVirtSolutionHere instance?
More reference on vmx props:
http://sanbarrow.com/vmx/vmx-scsi.html
Myles
Yes, it was upgraded from ESX 4 to 5 and to 5.5
Thanks
The only references to deviceType=”disk” are in relation to IDE controllers.
Where have you found info suggesting deviceType converts from SAS to Parallel?
The SCSI controller should be unaffected by this operation.
From the VM guest properties. The SCSI controller type changed from LSI Logic SAS to Parallel. This guest was not imported from any other platform; it was built in VMware itself.
Thanks
It changes “scsi0:1.deviceType = “disk”” to “scsi0:1.deviceType = “scsi-hardDisk”” in the vmx file
See below
http://faq.sanbarrow.com/index.php?action=artikel&cat=7&id=54&artlang=en&highlight=deviceType
It changed “disk” to “scsi-hardDisk”, and this caused the VM to change from LSI Logic SAS to LSI Logic Parallel. This change is not reversible: once you try to change it back to LSI Logic SAS, the OS drive fails to boot.
Thanks
Sounds like one to take up with VMware support – this isn’t an expected behaviour.
Thanks Myles!
Any time :)