Rebooting XCP VMs the hard way
Latest revision as of 10:57, 30 September 2014
A machine can become unresponsive: services appear down, and sometimes ping still works but ssh does not. If the machine in question is a virtual machine, this guide explains where the virtual power switch is and how to toggle it.
This guide covers only the XCP cluster setup.
Step 1: find the VM
Production machines run on the XCP pool 'piet', and logging in to the pool is done with
ssh root@pool-piet.inst.ipmi.nikhef.nl
All XCP commands start with 'xe'. Online help is available by typing
xe help --all
and
xe help <command>
To find the unresponsive VM (called 'laars' in this example), type
xe vm-list name-label=laars.nikhef.nl
uuid ( RO)           : bc60eb6c-2d1c-4d24-18c5-38c1d525d5ea
     name-label ( RW): laars.nikhef.nl
    power-state ( RO): running
Take note of the uuid; some commands require it to identify the VM.
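Instead of copying the uuid by hand, it can be captured in a shell variable. The following is a minimal sketch: `xe_field` is a hypothetical helper (not an XCP command) that reads xe's default `name ( RO): value` listing on stdin and prints the value of one named field, assuming the output layout shown above.

```shell
# xe_field FIELD
# Hypothetical helper (not part of XCP): read xe's default listing format
# ("   name ( RO): value") on stdin and print the value of FIELD.
xe_field() {
    # Strip optional leading blanks, the field name, the "( RO)"/"( RW)"
    # access marker and the colon; print only the lines that matched.
    sed -n "s/^[[:space:]]*$1[[:space:]]*([^)]*)[[:space:]]*:[[:space:]]*//p"
}
```

Usage, on the pool: `vmuuid=$(xe vm-list name-label=laars.nikhef.nl | xe_field uuid)`.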
Step 2: try the console
Machines are configured with a serial console, and sometimes it is possible to log in even when other services fail.
xe console name-label=laars.nikhef.nl
If this does not help, try shutting down the machine
xe vm-shutdown name-label=laars.nikhef.nl
followed (later) by a vm-start command.
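The shutdown-then-start sequence can be wrapped in a small function. This is a sketch only: `restart_vm` is a hypothetical name, and it uses nothing beyond the `vm-shutdown` and `vm-start` commands described here, selecting the VM by name-label.

```shell
# restart_vm NAME
# Hypothetical helper: cleanly shut down the VM with the given name-label
# and, only if that succeeded, start it again.
restart_vm() {
    if xe vm-shutdown name-label="$1"; then
        xe vm-start name-label="$1"
    else
        echo "vm-shutdown of $1 failed; a forced kill may be needed" >&2
        return 1
    fi
}
```

If the shutdown hangs or fails, the function stops there, which is the point where the low-level steps below take over.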
In some cases the OOM killer has such a stranglehold on the system that not even a shutdown gets through. In that case, the only option is a forced shutdown with a low-level command on the host where the VM is running.
Step 3: find the host of the VM
List the parameters of the VM (use the UUID here):
xe vm-param-list uuid=bc60eb6c-2d1c-4d24-18c5-38c1d525d5ea
Take note of the 'resident-on' value (the uuid of the host) and the dom-id, e.g.
resident-on ( RO): 7e6fe2e5-6d63-4865-8739-b50608a3e37a
     dom-id ( RO): 20
Find the host with the host-list command
xe host-list uuid=7e6fe2e5-6d63-4865-8739-b50608a3e37a
uuid ( RO)                : 7e6fe2e5-6d63-4865-8739-b50608a3e37a
          name-label ( RW): vms-piet-15.inst.ipmi.nikhef.nl
    name-description ( RW): Default install of XenServer
So in this example vms-piet-15.inst.ipmi.nikhef.nl is where we have to log on.
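The same two values can be fetched directly, without reading through the full parameter list, with `xe vm-param-get` (also used in step 5) and its host counterpart `xe host-param-get`. A sketch under that assumption; `locate_vm` is a hypothetical helper name.

```shell
# locate_vm VM_UUID
# Hypothetical helper: print the name-label of the host the VM is resident
# on, followed by the VM's dom-id (the two values needed in step 4).
locate_vm() {
    host_uuid=$(xe vm-param-get uuid="$1" param-name=resident-on) || return 1
    domid=$(xe vm-param-get uuid="$1" param-name=dom-id) || return 1
    host_name=$(xe host-param-get uuid="$host_uuid" param-name=name-label) || return 1
    echo "$host_name $domid"
}
```

For the example VM, `locate_vm bc60eb6c-2d1c-4d24-18c5-38c1d525d5ea` would be expected to print the host name and dom-id seen above.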
Step 4: kill and resurrect the VM
Ssh to root@vms-piet-15.inst.ipmi.nikhef.nl and find the VM:
xn list
erf.nikhef.nl                                           17  8192  2  Running
tbn08.nikhef.nl                                         19  2048  4  Running
Control domain on host: vms-piet-15.inst.ipmi.nikhef.nl  0   744  0  Running
laars.nikhef.nl                                         20  8192  2  Running
bosui.nikhef.nl                                         18  2048  2  Running
gasbel.nikhef.nl                                        15  2048  1  Running
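On a host running many VMs, the dom-id can be picked out of this listing with awk. A minimal sketch, assuming (as in the sample above) one domain per line with the VM name in the first column and the dom-id in the second; `domid_of` is a hypothetical helper that reads the `xn list` output on stdin.

```shell
# domid_of NAME
# Hypothetical helper: read "xn list" output on stdin and print the second
# column (assumed to be the dom-id) of the line whose first column is NAME.
domid_of() {
    awk -v name="$1" '$1 == name { print $2 }'
}
```

Usage on the host: `xn list | domid_of laars.nikhef.nl`. The control domain line starts with "Control", so it can never match a VM name.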
Kill the VM's domain, using its dom-id (20 in this example):
/opt/xensource/debug/destroy_domain -domid 20
To resurrect the VM, start it again with the high-level xe commands (xe vm-start). However, sometimes the virtual block device remains locked, and the VM will not start.
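Because destroy_domain acts on a bare number, a typo can hit the wrong domain. The sketch below wraps the command with a guard the original steps do not have: it refuses anything that is not a number, and refuses dom-id 0, the control domain of the host itself. `kill_domain` and the `DESTROY_DOMAIN` variable are hypothetical names introduced here.

```shell
# Path to the low-level tool used above; can be overridden (e.g. for testing).
DESTROY_DOMAIN=${DESTROY_DOMAIN:-/opt/xensource/debug/destroy_domain}

# kill_domain DOMID
# Hypothetical helper: hard-kill a guest domain by dom-id, refusing
# non-numeric input and dom-id 0 (the control domain).
kill_domain() {
    case "$1" in
        ''|*[!0-9]*) echo "not a dom-id: '$1'" >&2; return 1 ;;
    esac
    if [ "$1" -eq 0 ]; then
        echo "refusing to destroy dom-id 0 (control domain)" >&2
        return 1
    fi
    "$DESTROY_DOMAIN" -domid "$1"
}
```

Afterwards the VM is started again as usual with `xe vm-start`.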
Step 5: forget the virtual disk, and find it again
In case the VM won't restart, the VDIs (virtual disk images) that were attached to the machine need to be reset. The command 'vdi-forget' will make XCP forget all about a VDI, including that it was ever associated with a VM! So note the UUID of the VDI and of the storage repository (SR) before forgetting it.
vmname=tbn05.nikhef.nl
xe vm-disk-list name-label=$vmname
Note the VDI(s), and their SRs. Store this information for later reference.
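# assumes a single VBD; with more, the variables hold several UUIDs
# UUID of the VM
vmuuid=`xe vm-list name-label=$vmname | sed -n 's/^uuid.*: \(.*\)$/\1/p'`
# UUID of its virtual block device (VBD)
vbduuid=`xe vbd-list vm-name-label=$vmname | sed -n 's/^uuid.*: \(.*\)$/\1/p'`
# UUID of the VDI behind that VBD
vdiuuid=`xe vbd-list vm-name-label=$vmname | sed -n 's/.*vdi-uuid.*: \(.*\)$/\1/p'`
# UUID of the SR that holds the VDI; write down $vdiuuid and $sruuid before the next command
sruuid=`xe vdi-list uuid=$vdiuuid | sed -n 's/.*sr-uuid.*: \(.*\)$/\1/p'`
xe vdi-forget uuid=$vdiuuid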
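Since vdi-forget throws away the VM association, it is worth making the "note it first" step impossible to skip. The sketch below, a hypothetical helper `forget_vm_disk` reusing the same sed expressions as above, appends the VDI and SR UUIDs to a record file before forgetting, and refuses to continue unless exactly one VDI was found (for instance, a VM with several disks is left alone).

```shell
# forget_vm_disk VMNAME RECORDFILE
# Hypothetical helper: look up the VDI and SR of the VM's single virtual
# disk, append them to RECORDFILE, and only then run vdi-forget.
forget_vm_disk() {
    # VDI UUID(s) behind the VM's virtual block devices
    vdiuuid=$(xe vbd-list vm-name-label="$1" | sed -n 's/.*vdi-uuid.*: \(.*\)$/\1/p')
    # refuse to continue unless exactly one VDI was found
    n=$(printf '%s\n' "$vdiuuid" | awk 'NF { c++ } END { print c + 0 }')
    if [ "$n" -ne 1 ]; then
        echo "expected exactly one VDI for $1, found $n" >&2
        return 1
    fi
    # SR that holds the VDI
    sruuid=$(xe vdi-list uuid="$vdiuuid" | sed -n 's/.*sr-uuid.*: \(.*\)$/\1/p')
    if [ -z "$sruuid" ]; then
        echo "no SR found for VDI $vdiuuid" >&2
        return 1
    fi
    # record first: after vdi-forget the VM/VDI association is gone
    echo "$1 vdi=$vdiuuid sr=$sruuid" >> "$2" || return 1
    xe vdi-forget uuid="$vdiuuid"
}
```

Usage: `forget_vm_disk tbn05.nikhef.nl /root/forgotten-vdis.txt`, where the file name is only an example.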
Perform a rescan of the SR to make the VDI available again.
xe sr-scan uuid=$sruuid
This step may complain that it cannot deactivate the SR because the SR is shared, but the next step seems to work anyway.
Re-add the VDI to the VM.
vbduuid=`xe vbd-create vdi-uuid=$vdiuuid device=1 vm-uuid=$vmuuid`
The 'device' is one of the allowed VBD device numbers, which can be obtained with
xe vm-param-get param-name=allowed-VBD-devices uuid=$vmuuid
but unless the machine has more than one disk, this is usually just '1'.
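To avoid guessing, the first allowed device can be taken from that parameter. This sketch assumes the value is printed as a semicolon-separated list such as `1; 2; 3`; check the real output on the pool before relying on it. `first_allowed_device` is a hypothetical helper name.

```shell
# first_allowed_device VM_UUID
# Hypothetical helper: print the first entry of the VM's allowed-VBD-devices
# list, assuming entries are separated by ';' (spaces are stripped).
first_allowed_device() {
    xe vm-param-get param-name=allowed-VBD-devices uuid="$1" \
        | tr -d ' ' | cut -d';' -f1
}
```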
It might be necessary to set the bootable flag on the block device.
xe vbd-param-set uuid=$vbduuid bootable=true
Then start the machine and keep your fingers crossed:
xe vm-start uuid=$vmuuid
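The re-attach steps (vbd-create, setting the bootable flag, vm-start) can be collected into one function that stops at the first failing command. A sketch; `reattach_and_start` is a hypothetical name, and unlike the text above it always sets the bootable flag rather than only when needed.

```shell
# reattach_and_start VDI_UUID VM_UUID DEVICE
# Hypothetical helper: create a VBD linking the VDI to the VM, mark it
# bootable and start the VM; returns non-zero at the first failure.
reattach_and_start() {
    # vbd-create prints the UUID of the new VBD
    vbduuid=$(xe vbd-create vdi-uuid="$1" vm-uuid="$2" device="$3") || return 1
    xe vbd-param-set uuid="$vbduuid" bootable=true || return 1
    xe vm-start uuid="$2"
}
```

Usage: `reattach_and_start $vdiuuid $vmuuid 1`.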
Note: there is also a command vdi-unlock, but it is not known whether it works here.