Rebooting XCP VMs the hard way

From PDP/Grid Wiki
Revision as of 16:26, 8 July 2014

A machine can become unresponsive: services appear down, and sometimes ping still works but ssh does not. If the machine in question is a virtual machine, this guide explains where its virtual power switch is and how to toggle it.

This guide covers only the XCP cluster setup.

Step 1: find the VM

Production machines run on the XCP pool 'piet'. Log in to the pool with

ssh root@pool-piet.inst.ipmi.nikhef.nl

All XCP commands start with 'xe'. Online help is available with

xe help --all

and

xe help <command>

To find the unresponsive VM (called 'laars' in this example), type

xe vm-list name-label=laars.nikhef.nl
uuid ( RO)           : bc60eb6c-2d1c-4d24-18c5-38c1d525d5ea
     name-label ( RW): laars.nikhef.nl
    power-state ( RO): running

Take note of the UUID; some commands require it to identify the VM.
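You can capture the UUID in a shell variable instead of copying it by hand. This is a minimal sketch. It assumes that `xe vm-list` with `params=uuid` and `--minimal` prints only the bare UUID on this XCP version, and that it comma-separates the UUIDs when several VMs match. The function name `vm_uuid` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the UUID of the VM with the given name-label.
# ASSUMPTION: 'xe vm-list ... params=uuid --minimal' prints only the UUID(s),
# comma-separated if more than one VM matches.
vm_uuid() {
    name="$1"
    uuid=$(xe vm-list name-label="$name" params=uuid --minimal) || return 1
    case "$uuid" in
        "")  echo "no VM named $name" >&2; return 1 ;;
        *,*) echo "more than one VM named $name: $uuid" >&2; return 1 ;;
    esac
    printf '%s\n' "$uuid"
}
```

Usage: `VM_UUID=$(vm_uuid laars.nikhef.nl)`. The helper refuses to guess when the name is ambiguous, because every later step acts on exactly one VM.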

Step 2: try the console

Machines are configured with a serial console, and sometimes it is possible to log in even when other services fail.

xe console name-label=laars.nikhef.nl

If this does not help, try shutting down the machine:

xe vm-shutdown name-label=laars.nikhef.nl

followed (later) by a vm-start command.
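You can wrap the shutdown-and-start sequence in a small function. This is a sketch under two assumptions: that `xe vm-shutdown` returns only after the VM has halted, and that `xe vm-param-get ... param-name=power-state` then reports `halted`. The function name `vm_cycle` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: clean shutdown of one VM, followed by a start.
# It stops without starting if the shutdown fails or the VM is not halted,
# since that is the situation where the low-level route (steps 3 and 4) is needed.
vm_cycle() {
    name="$1"
    if ! xe vm-shutdown name-label="$name"; then
        echo "vm-shutdown failed for $name" >&2
        return 1
    fi
    uuid=$(xe vm-list name-label="$name" params=uuid --minimal) || return 1
    # ASSUMPTION: a cleanly shut down VM reports power-state 'halted'.
    state=$(xe vm-param-get uuid="$uuid" param-name=power-state) || return 1
    if [ "$state" != "halted" ]; then
        echo "$name is '$state', not halted; not starting it" >&2
        return 1
    fi
    xe vm-start uuid="$uuid"
}
```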

In some cases the OOM killer has such a stranglehold over the system that not even a shutdown gets through. The only remaining option is then a forced shutdown with a low-level command on the host where the VM is running.

Step 3: find the host of the VM

List the parameters of the VM (use the UUID here):

xe vm-param-list uuid=bc60eb6c-2d1c-4d24-18c5-38c1d525d5ea

Take note of the 'resident-on' value (the uuid of the host) and the dom-id, e.g.

resident-on ( RO): 7e6fe2e5-6d63-4865-8739-b50608a3e37a
dom-id ( RO): 20
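You can also fetch both values directly instead of reading through the full parameter list. This is a minimal sketch. It assumes that `xe vm-param-get` with `param-name=resident-on` or `param-name=dom-id` prints just the bare value. The function name `vm_location` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<resident-on host UUID> <dom-id>" for a VM UUID.
vm_location() {
    vm="$1"
    host=$(xe vm-param-get uuid="$vm" param-name=resident-on) || return 1
    domid=$(xe vm-param-get uuid="$vm" param-name=dom-id) || return 1
    printf '%s %s\n' "$host" "$domid"
}
```

Usage: `loc=$(vm_location bc60eb6c-2d1c-4d24-18c5-38c1d525d5ea)`, then `${loc% *}` is the host UUID and `${loc#* }` is the dom-id.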

Find the host with the host-list command

xe host-list uuid=7e6fe2e5-6d63-4865-8739-b50608a3e37a
uuid ( RO)                : 7e6fe2e5-6d63-4865-8739-b50608a3e37a
          name-label ( RW): vms-piet-15.inst.ipmi.nikhef.nl
    name-description ( RW): Default install of XenServer

So in this example, vms-piet-15.inst.ipmi.nikhef.nl is the machine to log on to.
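You can chain the two lookups so that a VM UUID goes straight to the name of the host to ssh to. This is a sketch. It assumes that `xe host-param-get ... param-name=name-label` prints only the host name, and that a VM which is not running shows `<not in database>` as its resident-on value. The function name `vm_host_name` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given a VM UUID, print the name-label of the host
# it is resident on (the machine to log on to in step 4).
vm_host_name() {
    host=$(xe vm-param-get uuid="$1" param-name=resident-on) || return 1
    # ASSUMPTION: a VM that is not running shows '<not in database>' here.
    case "$host" in
        ''|'<not in database>')
            echo "VM $1 is not resident on any host" >&2
            return 1 ;;
    esac
    xe host-param-get uuid="$host" param-name=name-label
}
```

Usage: `ssh root@"$(vm_host_name bc60eb6c-2d1c-4d24-18c5-38c1d525d5ea)"`.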

Step 4: kill and resurrect the VM

Ssh to root@vms-piet-15.inst.ipmi.nikhef.nl and look up the VM's domain:

xn list

erf.nikhef.nl                                17   8192  2         Running 
tbn08.nikhef.nl                              19   2048  4         Running 
Control domain on host: vms-piet-15.inst.ipmi.nikhef.nl  0    744   0          Running
laars.nikhef.nl                              20   8192  2         Running 
bosui.nikhef.nl                              18   2048  2         Running 
gasbel.nikhef.nl                             15   2048  1         Running 
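On a busy host, you can pick the dom-id out of the `xn list` output rather than read it off by eye. This is a sketch that assumes the column layout shown above: the VM name in the first column and the dom-id in the second. The function name `xn_domid` is hypothetical. Cross-check its result against the dom-id found in step 3.

```shell
#!/bin/sh
# Hypothetical helper: print the dom-id of the named VM from 'xn list'.
# ASSUMPTION: the name is column 1 and the dom-id is column 2, as above.
# Exits non-zero if no line matches the name.
xn_domid() {
    xn list | awk -v name="$1" '$1 == name { print $2; found = 1 }
                                END { exit !found }'
}
```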

Kill the VM's domain (dom-id 20 in this example):

/opt/xensource/debug/destroy_domain -domid 20
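`destroy_domain` is given only a number, so a typo can hit the wrong domain, and dom-id 0 is the control domain itself. A cautious wrapper might reject dom-id 0 and any non-numeric argument before calling the tool. This is a sketch: `kill_domid` and the `DESTROY_DOMAIN` override variable are hypothetical, and the default path is the one used above.

```shell
#!/bin/sh
# Hypothetical safety wrapper around the low-level destroy tool used above.
# DESTROY_DOMAIN may be overridden, e.g. DESTROY_DOMAIN=echo for a dry run.
DESTROY_DOMAIN="${DESTROY_DOMAIN:-/opt/xensource/debug/destroy_domain}"

kill_domid() {
    domid="$1"
    # Only plain non-negative integers are accepted.
    case "$domid" in
        ''|*[!0-9]*) echo "not a dom-id: '$domid'" >&2; return 1 ;;
    esac
    if [ "$domid" -eq 0 ]; then
        echo "refusing to destroy dom-id 0 (the control domain)" >&2
        return 1
    fi
    "$DESTROY_DOMAIN" -domid "$domid"
}
```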

To resurrect the VM, start it again with the high-level xe commands (for example vm-start). However, sometimes the virtual block device remains locked and the VM won't start. In that case, remove the virtual disk (VDI) behind the block device with 'vdi-forget' and re-attach it to the same VM (note the UUIDs!).

xe vdi-list | grep -C2 laars

Note the UUID of the VDI, then forget it:
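Grepping `vdi-list` for the VM's name only works if the VDI is named after the VM. A more direct route is to ask which VDIs the VM's disk VBDs point to. The same pass can record each disk's SR and device position, which re-attaching will need.

This is a sketch. It assumes that `xe vbd-list` accepts `vm-uuid=` and `type=Disk` as filters together with `params=uuid --minimal`, and that `vbd-param-get`/`vdi-param-get` return `vdi-uuid`, `userdevice` and `sr-uuid` as bare values. The function name `vm_disks` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each disk VBD of a VM, print one line
#   <vdi-uuid> <sr-uuid> <userdevice>
# so the values can be written down before running 'xe vdi-forget'.
vm_disks() {
    vm="$1"
    vbds=$(xe vbd-list vm-uuid="$vm" type=Disk params=uuid --minimal) || return 1
    # --minimal separates multiple UUIDs with commas.
    for vbd in $(printf '%s\n' "$vbds" | tr ',' ' '); do
        vdi=$(xe vbd-param-get uuid="$vbd" param-name=vdi-uuid) || return 1
        dev=$(xe vbd-param-get uuid="$vbd" param-name=userdevice) || return 1
        sr=$(xe vdi-param-get uuid="$vdi" param-name=sr-uuid) || return 1
        printf '%s %s %s\n' "$vdi" "$sr" "$dev"
    done
}
```

Usage: `vm_disks bc60eb6c-2d1c-4d24-18c5-38c1d525d5ea > /root/laars-disks.txt` (the file name is only an example).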

xe vdi-forget uuid=b93fe358-ec12-420b-b59b-e7de2cfa6dfe

Re-attaching the VDI is left as an exercise for the HOD...
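This is a hedged starting point for the re-attach exercise, and it has not been tried on this pool. It rests on two assumptions:

- An `xe sr-scan` of the SR re-introduces the forgotten VDI under its old UUID. This is plausible for SR types that keep the UUID in the on-disk name, but it is not guaranteed.
- `xe vbd-create` with `vm-uuid`, `vdi-uuid`, `device`, `bootable`, `mode` and `type` plugs the disk back into the VM.

The arguments are the values noted before the forget. The function name `reattach_disk` is hypothetical.

```shell
#!/bin/sh
# Hypothetical, untested sketch of re-attaching a forgotten VDI to its VM.
# Arguments: <vm-uuid> <vdi-uuid> <sr-uuid> <userdevice> [bootable: true|false]
# ASSUMPTION: 'xe sr-scan' re-introduces the VDI with the same UUID.
reattach_disk() {
    vm="$1"; vdi="$2"; sr="$3"; dev="$4"; boot="${5:-false}"
    xe sr-scan uuid="$sr" || return 1
    # Make sure the VDI is known again before creating a VBD for it.
    found=$(xe vdi-list uuid="$vdi" params=uuid --minimal) || return 1
    if [ "$found" != "$vdi" ]; then
        echo "VDI $vdi did not reappear after sr-scan of $sr" >&2
        return 1
    fi
    # Prints the UUID of the new VBD on success.
    xe vbd-create vm-uuid="$vm" vdi-uuid="$vdi" device="$dev" \
        bootable="$boot" mode=RW type=Disk
}
```

The function stops before `vbd-create` if the VDI did not come back, so a wrong assumption about `sr-scan` shows up as an error rather than as a VM with a missing disk.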