GSP Virtualisation with Xen
The Grid Server Park machines (general services used for Nikhef and BiG Grid) are going to be run in a centrally managed and controlled virtualisation environment. After the testing and evaluation period it is likely (but not yet certain) that the Open Source Xen Cloud Platform (XCP, version 1.5beta) will be chosen to run this infrastructure. The aim is to bring all systems under XCP control, managed in a set of clusters.
General information
- http://wiki.xen.org/xenwiki/XCP/XenServer_Feature_Matrix
- http://xen.org/download/xcp/index_1.5.0.html
Hardware
Cluster | qty | system type | VM server hostnames | current master |
Piet | 16 systems | M610: 12 cores, 24 SMT threads, 96 GiB, 2x600GB 10k SAS, dual 8G FC, dual 1GbE + dual 10GbE | vms-piet-*.inst.ipmi.nikhef.nl | |
Generic | 8 systems | PE2950: 8 cores, 24 GiB, 4x500GB 7k2 SATA, dual 1GbE | vms-gen-*.inst.ipmi.nikhef.nl | vms-gen-05 |
BL0 | 5 systems | M610: 8 cores, 32 GiB RAM, 2x300GB 10k SAS, dual 1GbE (+dual 8G FC) | vms-bl0-*.inst.ipmi.nikhef.nl | |
Security | 2 systems | PE2950: 8 cores, 24 GiB, 4x500GB 7k2 SATA, dual 1GbE | vms-sec-*.inst.ipmi.nikhef.nl | |
Networking
Installation network
The management, heartbeat, and live migration traffic runs over the installation network "ve11", and hosts by convention have names like "<host>.inst.ipmi.nikhef.nl" (with DNS master from teugel). The installation network is usually served untagged on its own, or untagged over a hybrid trunk port.
Network configuration
 | IPv4 | IPv6 |
Network | 172.22.64.0/18 | 2001:0610:0120:E022::/64 |
Gateway | 172.22.127.254 | 2001:0610:0120:E022::1/64 |
Broadcast | 172.22.127.255 | |
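The IPv4 values in the table follow from the /18 prefix: the broadcast address is the network address with all 14 host bits set, and the gateway is the address just below it. A minimal sketch of that arithmetic in shell; the function name ipv4_broadcast is hypothetical and not part of any XCP tooling:

```shell
#!/bin/sh
# Hypothetical helper: print the IPv4 broadcast address for a network
# address (dotted quad) and a prefix length.
ipv4_broadcast() {
    net=$1
    prefix=$2
    # Split the dotted quad into four positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $net
    IFS=$old_ifs
    # Pack the octets into one integer and set all host bits.
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    hostmask=$(( (1 << (32 - prefix)) - 1 ))
    bcast=$(( addr | hostmask ))
    echo "$(( (bcast >> 24) & 255 )).$(( (bcast >> 16) & 255 )).$(( (bcast >> 8) & 255 )).$(( bcast & 255 ))"
}

# The installation network from the table above.
ipv4_broadcast 172.22.64.0 18
```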
Server installation
Faking XenServer 6.0 for XenCenter Management
By default, the XCP hypervisor presents itself as a very old XenServer (v1) instance, and XenCenter will refuse to offer some of the more advanced features such as dynamic memory, snapshots, and live migration. This can be fixed by manually editing the XenServer version string in the xapi binary, as described on the Xen wiki. After each new installation and upgrade of the xapi binary:
cd /opt/xensource/bin
/etc/init.d/xapi stop
cp -vp xapi xapi.original
sed -i 's/1\.4\.90/6.0.99/g' xapi
/etc/init.d/xapi start
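The in-place edit is only safe because "1.4.90" and "6.0.99" have the same length, so nothing else in the binary shifts. The following is a sketch of a slightly more defensive variant of the same sed step; the function name patch_version_string is hypothetical, and it assumes GNU sed and grep as found in dom0. Stopping and starting xapi still has to happen around it as shown above.

```shell
#!/bin/sh
# Hypothetical helper: replace OLD with NEW inside FILE, keeping a backup.
# Refuses to run when the strings differ in length (which would shift the
# rest of a binary) or when OLD does not occur in FILE at all.
patch_version_string() {
    file=$1
    old=$2
    new=$3
    if [ "${#old}" -ne "${#new}" ]; then
        echo "refusing: '$old' and '$new' differ in length" >&2
        return 1
    fi
    # Escape the dots so grep and sed match them literally.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    if ! grep -a -q "$old_re" "$file"; then
        echo "refusing: '$old' not found in $file" >&2
        return 2
    fi
    cp -p "$file" "$file.original"
    sed -i "s/$old_re/$new/g" "$file"
}

# Intended use, with xapi stopped:
#   patch_version_string /opt/xensource/bin/xapi 1.4.90 6.0.99
```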
On previous versions (and maybe in the final version of XCP1.5), the following also worked and would be persistent:
echo "6.0.99" > /etc/xensource/xapi_version_override
/etc/init.d/xapi restart
But now XenCenter will complain about the XenTools being out of date. Fix this for Linux guests by installing XenTools and subsequently:
sed -i -e 's/MajorVersion" "1"/MajorVersion" "6"/' /usr/sbin/xe-update-guest-attrs
sed -i -e 's/MinorVersion" "4"/MinorVersion" "0"/' /usr/sbin/xe-update-guest-attrs
sed -i -e 's/MicroVersion" "90"/MicroVersion" "99"/' /usr/sbin/xe-update-guest-attrs
# wait 2 minutes and check XenCenter again (or just execute /usr/sbin/xe-update-guest-attrs)
On each Windows guest with Xen Tools installed, modify HKLM\SOFTWARE\Citrix\XenTools (Server 2003/WinXP), or HKLM\SOFTWARE\Wow6432Node\Citrix\XenTools (Server 2008 64-bit), so that the values match version 6.0.99:
Name | Data (decimal) |
MajorVersion | 6 |
MinorVersion | 0 |
MicroVersion | 99 |
(Only decimal values shown here, Windows Registry will show both decimal and hex.)
Then reboot the Windows guest or restart the 'Citrix Tools for Virtual Machine Service' service. Restarting XenTools on Windows command line (cmd) works like this:
net stop xensvc
net start xensvc
See also:
Management clients
Storage
Where are the VM disk images (VDIs)
Moving VM image files from outside
How to kill your SR:
You cannot move VM images between pools (since a single SR cannot be shared among more than one pool), and you usually cannot mount existing (non-XCP) image repositories on a server inside a pool. One trick is to copy the contents of the images from the outside into a pre-prepared volume inside an SR. For example:
- create a virtual disk with the size (at least) equal to the original disk image
- pick an XCP server and activate the logical volume, e.g.:
lvchange -a y /dev/VG_XenStorage-e5d0e83a-7e70-3d28-31ab-ed98bfb68368/VHD-74e94bbc-b0e5-4e76-b507-12897b9b2625
- copy the data from the remote location into this activated LV, and wait for it to complete (you cannot use the LV in a distributed setup as long as it is active on a single XCP host):
dd if=/vm/mach/images/rooier.img bs=64M | ssh -A davidg@salado "ssh -A root@vms-piet-16.inst.ipmi.nikhef.nl 'dd of=/dev/VG_XenStorage-e5d0e83a-7e70-3d28-31ab-ed98bfb68368/VHD-74e94bbc-b0e5-4e76-b507-12897b9b2625 bs=64M'"
- de-activate the LV on the import host
lvchange -a n /dev/VG_XenStorage-e5d0e83a-7e70-3d28-31ab-ed98bfb68368/VHD-74e94bbc-b0e5-4e76-b507-12897b9b2625
- create a new VM (type "Other" seems to be needed) and use the newly-populated disk image as the disk for the VM
- try it out ...
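Since the LV must not stay active on a single host, it can help to wrap the activate, copy, and de-activate steps so that de-activation happens even when the copy fails. The following is only a sketch of that idea: the function names run and import_stdin_to_lv are hypothetical, the data is expected on standard input (for example as the receiving end of the ssh pipe above), and DRY_RUN=1 merely prints the commands instead of running them.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: activate the LV, fill it from standard input, and
# de-activate it again regardless of whether the copy succeeded.
import_stdin_to_lv() {
    lv=$1
    run lvchange -a y "$lv" || return 1
    run dd of="$lv" bs=64M
    rc=$?
    # De-activate even after a failed copy, so the LV is not left active
    # on this one host.
    run lvchange -a n "$lv" || return 1
    return "$rc"
}

# Intended use on the XCP host, with the image piped in over ssh:
#   import_stdin_to_lv /dev/VG_XenStorage-<sr-uuid>/VHD-<vdi-uuid>
```

Running it once with DRY_RUN=1 shows the three commands in order before anything is written to the volume.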
Connecting with FC
Resizing an FC based SR
On the Compellent, resize the volume which is mapped to the server group.
Then, on each of the XCP hosts,
- find the multipath map that is mapped to this volume, e.g. using xe sr-list name-label=your-SR-name-you-want-to-resize, and do a pvscan to actually find the PV in your dom0 (it is not there by default)
- from the device map name (e.g. "/dev/dm-1"), find the underlying SCSI devices representing each path:
multipath -l
- this gives you the list of sd* devices behind each map. You can use cat /proc/partitions to see what the current (old) size appears to be.
- rescan all the underlying SCSI devices for size:
for i in /sys/block/dm-X/slaves/*/device/rescan ; do echo 1 > "$i" ; done
- check if the SCSI device size is now correct (cat /proc/partitions will do nicely)
- propagate the size changes to the multipath device map with the multipathd command
multipathd -k
- and find the maps & resize
list maps
resize map nameofthemap
- check with cat /proc/partitions whether the mapped device is now the same size as the SCSI devices
- resize the physical volume for LVM
pvresize /dev/dm-1
- check the PV and VG:
pvdisplay -C
vgdisplay -C
Finally, make sure XCP sees the new layout. Click on "rescan" in XenCenter, or use the CLI:
xe sr-scan uuid=<the uuid you noted in the previous step>
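The procedure above checks /proc/partitions by eye several times. As a sketch, the comparison between the multipath map and its underlying SCSI paths could be scripted as follows. The function names part_blocks and same_size are hypothetical, and the device names in the usage comment are examples only.

```shell
#!/bin/sh
# Hypothetical helper: print the size (in 1 KiB blocks) of device NAME,
# reading text in /proc/partitions format from standard input.
part_blocks() {
    awk -v dev="$1" '$4 == dev { print $3 }'
}

# Hypothetical helper: succeed only if all named devices report the same
# size. Returns 1 on a size mismatch and 2 if a device is not listed.
same_size() {
    table=$(cat)
    first=
    for dev in "$@"; do
        size=$(printf '%s\n' "$table" | part_blocks "$dev")
        [ -n "$size" ] || return 2
        # Remember the first size seen and compare the rest against it.
        : "${first:=$size}"
        [ "$size" = "$first" ] || return 1
    done
    return 0
}

# Intended use after "resize map", with example device names:
#   same_size dm-1 sdb sdc < /proc/partitions && echo "sizes agree"
```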
See also:
- http://forums.citrix.com/thread.jspa?threadID=243057
- http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/DM_Multipath/MPIO_admin-troubleshoot.html#online_device_resize
- http://comments.gmane.org/gmane.comp.emulators.xen.user/39615
Connecting with iSCSI
The iSCSI network is ve16, in range 172.21.0.0/16. The Compellent "Sint" is at 172.21.0.34 and .35, and Fault Domain 0 is at .36.
Connecting with NFS
About local disk
Troubleshooting
Dead VM server
Dead VM cluster master
Disable HA if it is enabled. We don't use HA (it's not part of XCP), but it's harmless to try anyway. Log in to any other node in the cluster and type
xe pool-ha-disable
and list all hosts in the pool to find the UUID of a slave host you want to become the new master
xe host-list
and make it happen with
xe pool-designate-new-master host-uuid=UUID
Since the master is dead (that is why we started in the first place), explicitly tell the old slave to take over as master:
xe pool-emergency-transition-to-master
and connect to the slaves again
xe pool-recover-slaves
(from http://blog.carlosgomez.net/2010/01/citrix-xen-server-changing-pool-master.html)
Incinerated VM servers
If the VM server is dead and will never come back, you can 'forget' the host from the CLI (on any server in the pool):
xe host-forget uuid=UUID
but if there are still running VMs assigned to it, it will refuse to forget the server. Remove the VMs first by forcing them off using
xe vm-reset-powerstate uuid=UUID --force
(from http://forums.citrix.com/thread.jspa?threadID=250603)
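When the dead host had many VMs, resetting them one by one gets tedious. The sketch below loops over a comma-separated UUID list, the format that xe prints with --minimal. The function names run and reset_vms are hypothetical, and the resident-on and is-control-domain filters in the usage comment are an assumption about what your xe version accepts, so check the list before resetting anything.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: force the power state of every VM in a
# comma-separated UUID list (the "xe ... --minimal" output format).
reset_vms() {
    list=$1
    # Split the list on commas into positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $list
    IFS=$old_ifs
    for vm in "$@"; do
        run xe vm-reset-powerstate uuid="$vm" --force
    done
}

# Intended use on any live pool member; HOST_UUID is the dead host
# (the vm-list filters are an assumption, verify them first):
#   reset_vms "$(xe vm-list resident-on=HOST_UUID is-control-domain=false --minimal)"
#   xe host-forget uuid=HOST_UUID
```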
VMs on a dead server
You can declare a VM 'shut down' via the CLI on a dead VM server:
xe vm-reset-powerstate uuid=UUID --force