GSP Virtualisation with Xen
The Grid Server Park machines (general services used for Nikhef and BiG Grid) are going to be run with a centrally managed and controlled virtualisation environment. After the testing and evaluation period it is likely (but not yet certain) that the Open Source Xen Cloud Platform (XCP, version 1.5beta) will be chosen to run this infrastructure. The aim is to bring all systems under XCP control, managed as a set of clusters:

General information

Hardware

Cluster  | Qty        | System type                                                                                  | VM server hostnames            | Current master
Piet     | 16 systems | M610: 12 cores, 24 SMT threads, 96 GiB, 2x600GB 10k SAS, dual 8G FC, dual 1GbE + dual 10GbE | vms-piet-*.inst.ipmi.nikhef.nl |
Generic  | 8 systems  | PE2950: 8 cores, 24 GiB, 4x500GB 7k2 SATA, dual 1GbE                                        | vms-gen-*.inst.ipmi.nikhef.nl  | vms-gen-05
BL0      | 5 systems  | M610: 8 cores, 32 GiB RAM, 2x300GB 10k SAS, dual 1GbE (+dual 8G FC)                         | vms-bl0-*.inst.ipmi.nikhef.nl  |
Security | 2 systems  | PE2950: 8 cores, 24 GiB, 4x500GB 7k2 SATA, dual 1GbE                                        | vms-sec-*.inst.ipmi.nikhef.nl  |


Networking

Installation network

The management, heartbeat, and live-migration domains run over the installation network "ve11", and hosts by convention have names like "<host>.inst.ipmi.nikhef.nl" (with the DNS master on teugel). The installation network is usually served untagged on its own, or untagged over a hybrid trunk port.

Network configuration

          | IPv4            | IPv6
Network   | 172.22.64.0/18  | 2001:0610:0120:E022::/64
Gateway   | 172.22.127.254  | 2001:0610:0120:E022::1/64
Broadcast | 172.22.127.255  |

Server installation

Faking XenServer 6.0 for XenCenter Management

By default, the XCP hypervisor will present itself as a very old XenServer (v1) instance, and XenCenter will refuse to enable some of the more advanced features like dynamic memory, snapshots, and live migration. This can be fixed by manually 'editing' the XenServer version string in the xapi program, as described on the Xen wiki. Repeat this after each new installation and upgrade of the xapi binary:

cd /opt/xensource/bin
/etc/init.d/xapi stop
cp -vp xapi xapi.original                  # keep a copy of the unmodified binary
sed -i 's/1\.4\.90/6.0.99/g' xapi          # rewrite the advertised version string in place
/etc/init.d/xapi start
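
To check that the spoofed version is now being advertised, the host's software-version map can be inspected from the xe CLI; a quick sketch (field names as on XenServer/XCP, so double-check on your release):

# each host should now report product_version 6.0.99
xe host-list params=name-label,software-version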

On previous versions (and maybe in the final version of XCP1.5), the following also worked and would be persistent:

echo "6.0.99" > /etc/xensource/xapi_version_override
/etc/init.d/xapi restart

But now XenCenter will complain about the XenTools being out of date. Fix this for Linux guests by installing XenTools and then running:

sed -i /usr/sbin/xe-update-guest-attrs -e 's/MajorVersion" "1"/MajorVersion" "6"/'
sed -i /usr/sbin/xe-update-guest-attrs -e 's/MinorVersion" "4"/MinorVersion" "0"/'
sed -i /usr/sbin/xe-update-guest-attrs -e 's/MicroVersion" "90"/MicroVersion" "99"/'
# wait 2 minutes and check XenCenter again (or just execute /usr/sbin/xe-update-guest-attrs )
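
To verify from dom0 that a guest now reports the new tools version, the VM's PV driver parameters can be queried; a sketch, with the VM UUID as a placeholder:

# should show major 6, minor 0, micro 99 after the guest agent has reported in
xe vm-param-get uuid=<vm-uuid> param-name=PV-drivers-version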

On each Windows guest with Xen Tools installed, modify HKLM\SOFTWARE\Citrix\XenTools (Server 2003/WinXP), or HKLM\SOFTWARE\Wow6432Node\Citrix\XenTools (Server 2008 64-bit):

Name         | Data (decimal)
MajorVersion | 6
MinorVersion | 0
MicroVersion | 99

(Only decimal values are shown here; the Windows Registry will show both decimal and hex.)

Then reboot the Windows guest or restart the 'Citrix Tools for Virtual Machine Service' service. Restarting XenTools from the Windows command line (cmd) works like this:

net stop xensvc
net start xensvc

See also:

Management clients

Storage

Where are the VM disk images (VDIs)

Moving VM image files from outside

How to kill your SR

You cannot move VM images between pools (since a single SR cannot be shared among more than one pool), and you cannot usually mount existing (non-XCP) image repositories on a server inside a pool. One trick is to copy the contents of the images from the outside into a pre-prepared volume inside an SR.

Raw image and LVM

xe vdi-create sm-config:type=raw sr-uuid={SR_UUID} name-label="My Raw LVM VDI" virtual-size={size}GiB type=user
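
For context, a minimal sketch of how such a raw VDI might be created and attached to a VM from the CLI; the SR name, VM name, size, and device position are placeholders, so adapt them before use:

# look up the SR UUID by its name label
SR_UUID=$(xe sr-list name-label="Piet FC storage" --minimal)
# create a 20 GiB raw VDI in that SR
VDI_UUID=$(xe vdi-create sm-config:type=raw sr-uuid=$SR_UUID name-label="My Raw LVM VDI" virtual-size=20GiB type=user)
# attach it to a (halted) VM as its first disk
VM_UUID=$(xe vm-list name-label="my-vm" --minimal)
xe vbd-create vm-uuid=$VM_UUID vdi-uuid=$VDI_UUID device=0 bootable=true mode=RW type=Disk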


How NOT to do it

The default image format in an SR is based on "VHD" disk images, which are distributable across VMs and contain some (compressible) meta-data about sparseness. Overwriting an LV which was supposed to contain a VHD image with a raw disk image will corrupt the SR. So if you do:

  • create a virtual disk with the default (XenCenter) tools in an SR
  • then activate the LV on an XCP server, e.g. with lvchange -a y /dev/VG_XenStorage-e5d0e83a-7e70-3d28-31ab-ed98bfb68368/VHD-74e94bbc-b0e5-4e76-b507-12897b9b2625
  • copy the data from the remote host into this activated LV, and wait for it to complete (you cannot use the LV in a distributed setup as long as it is active on a single XCP host): dd if=/vm/mach/images/rooier.img bs=64M | ssh -A davidg@salado "ssh -A root@vms-piet-16.inst.ipmi.nikhef.nl 'dd of=/dev/VG_XenStorage-e5d0e83a-7e70-3d28-31ab-ed98bfb68368/VHD-74e94bbc-b0e5-4e76-b507-12897b9b2625 bs=64M'"
  • de-activate the LV on the import host: lvchange -a n /dev/VG_XenStorage-e5d0e83a-7e70-3d28-31ab-ed98bfb68368/VHD-74e94bbc-b0e5-4e76-b507-12897b9b2625
  • create a new VM (type "Other" seems to be needed) and use the newly-populated disk image as the disk for the VM
  • try it out ...

it will break the SR and you lose all images on it (since the metadata in the SR's MGT volume gets corrupted). Recover using http://support.citrix.com/article/CTX122001:

  1. Open a console on the master XCP server.
  2. Back up the LVM metadata:
vgcfgbackup
  3. Run the following command to see the LV which is causing trouble and preventing the SR from being scanned:
lvscan
  4. Remove the clone logical volume. Note: make sure the correct Logical Volume is deleted:
lvremove /dev/VG_XenStorage-8d418f1a-107e-472f-5453-b24c88e7428e/VDI_8e4b4263-f9af-45f3-b97e-afa5481ea2a1
  5. Run the following command to scan the SR, or use XenCenter:

xe sr-scan uuid=<UUID of SR for the VM>

You may need to forget about the SR first and then re-attach it (but DO NOT FORMAT the SR on attaching ;-)
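
As a rough sketch only, the forget/re-attach cycle from the CLI could look like this; all UUIDs, the SCSIid, and the lvmohba SR type are placeholders or assumptions for the FC-backed SRs described below, so verify each step against your setup before running it:

# unplug all PBDs for the SR and forget it (this leaves the data on disk untouched)
for pbd in $(xe pbd-list sr-uuid=<sr-uuid> --minimal | tr ',' ' ') ; do xe pbd-unplug uuid=$pbd ; done
xe sr-forget uuid=<sr-uuid>
# re-introduce the existing SR -- do NOT use sr-create here, that would format it
xe sr-introduce uuid=<sr-uuid> type=lvmohba name-label="restored SR" content-type=user
# recreate and plug a PBD on each host, then rescan
xe pbd-create host-uuid=<host-uuid> sr-uuid=<sr-uuid> device-config:SCSIid=<scsi-id>
xe pbd-plug uuid=<new-pbd-uuid>
xe sr-scan uuid=<sr-uuid>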

Connecting with FC

Resizing an FC based SR

On the Compellent, resize the volume which is mapped to the server group.

Then, on each of the XCP hosts,

  • find the multipath map that is mapped to this volume, e.g. using xe sr-list name-label=your-SR-name-you-want-to-resize, and do a pvscan to actually find the PV in your dom0 (it is not there by default)
  • from the device map name (e.g. "/dev/dm-1"), find the underlying SCSI devices representing each path:
multipath -l
will give you the list of sd* devices. You can use cat /proc/partitions to see what the current (old) size appears to be.
  • rescan all the underlying SCSI devices for size:
for i in /sys/block/dm-X/slaves/*/device/rescan ; do echo 1 > $i ; done
  • check if the SCSI device size is now correct (cat /proc/partitions will do nicely)
  • propagate the size changes to the multipath device map with the multipathd command
multipathd -k
and find the maps & resize
list maps
resize map nameofthemap
  • check with cat /proc/partitions whether the mapped device is now the same size as the SCSI devices
  • resize the physical volume for LVM
pvresize /dev/dm-1
  • check the PV and VG:
pvdisplay -C
vgdisplay -C

Finally, make sure XCP sees the new layout. Click on "rescan" in XenCenter, or use the CLI:

xe sr-scan uuid=<the uuid you noted in the previous step>
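
For convenience, the per-host part of the above can be strung together roughly as follows; this is a sketch that assumes the multipath map is dm-1 and uses a placeholder map name, so adapt it before use:

# rescan each SCSI path behind the multipath map for its new size
for i in /sys/block/dm-1/slaves/*/device/rescan ; do echo 1 > $i ; done
cat /proc/partitions            # the sd* paths should now show the new size
# resize the multipath map itself (one-shot form of the interactive multipathd console)
multipathd -k"resize map <nameofthemap>"
# grow the LVM physical volume and verify
pvresize /dev/dm-1
pvdisplay -C
vgdisplay -C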


See also:

Connecting with iSCSI

The iSCSI network is ve16, in range 172.21.0.0/16. The Compellent "Sint" is at 172.21.0.34 and .35, and Fault Domain 0 is at .36.
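
As an illustration only (not a procedure in use here yet), attaching an iSCSI LUN as an SR from the CLI would look roughly like this; the target IQN and SCSIid are placeholders to be taken from the Compellent:

# probe the target for LUNs; without a SCSIid this lists the candidates
xe sr-probe type=lvmoiscsi device-config:target=172.21.0.34 device-config:targetIQN=<target-iqn>
# create a shared SR on the chosen LUN -- NB: sr-create initialises (formats) the LUN
xe sr-create type=lvmoiscsi shared=true name-label="iSCSI SR on Sint" device-config:target=172.21.0.34 device-config:targetIQN=<target-iqn> device-config:SCSIid=<scsi-id>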

Connecting with NFS

About local disk

Troubleshooting

Dead VM server

Dead VM cluster master

Disable HA if it is enabled. We don't use HA (it's not part of XCP), but it's harmless to try anyway. Log in to any other node in the cluster and type

xe pool-ha-disable

and list all hosts in the pool to find the UUID of a slave host you want to become the new master

xe host-list

and make it happen with

xe pool-designate-new-master host-uuid=UUID

Now, since the master is dead (that's why we started this in the first place), we explicitly tell the slave to start acting as the master:

xe pool-emergency-transition-to-master

and connect to the slaves again

xe pool-recover-slaves

(from http://blog.carlosgomez.net/2010/01/citrix-xen-server-changing-pool-master.html)
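
Put together, and hedged as a sketch rather than a tested runbook, the sequence above looks like this (the slave UUID is a placeholder):

xe pool-ha-disable                                    # harmless if HA was never enabled
xe host-list                                          # note the UUID of the slave to promote
xe pool-designate-new-master host-uuid=<slave-uuid>   # may fail while the old master is unreachable
xe pool-emergency-transition-to-master                # run on the designated slave: force it to become master
xe pool-recover-slaves                                # run on the new master: re-point the remaining slaves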

Incinerated VM servers

If the VM server is dead and will never come back, you can 'forget' the host from the CLI (on any server in the pool):

xe host-forget uuid=UUID

but if there are still running VMs assigned to it, it will refuse to forget the server. Clear the VMs first by forcing their power state off using

xe vm-reset-powerstate uuid=UUID --force

(from http://forums.citrix.com/thread.jspa?threadID=250603)
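
A short sketch of the whole sequence; the UUIDs are placeholders, and the resident-on filter is an assumption for finding the affected VMs:

# list the VMs the pool still thinks are running on the dead host
xe vm-list resident-on=<host-uuid> params=uuid,name-label
# mark each of those VMs as halted, then forget the host
xe vm-reset-powerstate uuid=<vm-uuid> --force
xe host-forget uuid=<host-uuid>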

VMs on a dead server

You can declare a VM that was running on a dead VM server as 'shut down' via the CLI:

xe vm-reset-powerstate uuid=UUID --force

(from http://forums.citrix.com/thread.jspa?threadID=250603)