GSP Virtualisation with Xen

From PDP/Grid Wiki
Revision as of 09:33, 8 March 2012

The Grid Server Park machines (general services used for Nikhef and BiG Grid) are going to be run with a centrally managed and controlled virtualisation environment. After the testing and evaluation period it is likely (but not yet certain) that the Open Source Xen Cloud Platform (XCP, version 1.5beta) will be chosen to run this infrastructure. The aim is to bring all systems under XCP control, managed in a set of clusters:

General information

Hardware

Cluster  | Qty | System type                                                                                  | VM server hostnames            | Current master
Piet     | 16  | M610: 12 cores, 24 SMT threads, 96 GiB, 2x600GB 10k SAS, dual 8G FC, dual 1GbE + dual 10GbE | vms-piet-*.inst.ipmi.nikhef.nl |
Generic  | 8   | PE2950: 8 cores, 24 GiB, 4x500GB 7k2 SATA, dual 1GbE                                         | vms-gen-*.inst.ipmi.nikhef.nl  | vms-gen-05
BL0      | 5   | M610: 8 cores, 32 GiB RAM, 2x300GB 10k SAS, dual 1GbE (+dual 8G FC)                          | vms-bl0-*.inst.ipmi.nikhef.nl  |
Security | 2   | PE2950: 8 cores, 24 GiB, 4x500GB 7k2 SATA, dual 1GbE                                         | vms-sec-*.inst.ipmi.nikhef.nl  |


Networking

Installation network

The management, heartbeat, and live migration traffic runs over the installation network "ve11", and hosts by convention have names like "<host>.inst.ipmi.nikhef.nl" (with the DNS master on teugel). The installation network is usually served untagged on its own, or untagged over a hybrid trunk port.

Network configuration

          | IPv4           | IPv6
Network   | 172.22.64.0/18 | 2001:0610:0120:E022::/64
Gateway   | 172.22.127.254 | 2001:0610:0120:E022::1/64
Broadcast | 172.22.127.255 |

Server installation

Faking XenServer 6.0 for XenCenter Management

By default, the XCP hypervisor will present itself as a very old XenServer (v1) instance, and XenCenter will refuse to enable some of the more advanced features such as dynamic memory, snapshots, and live migration. This can be fixed by manually 'editing' the XenServer version string in the xapi program, as described on the Xen wiki. Repeat this after each new installation and after every upgrade of the xapi binary:

cd /opt/xensource/bin
/etc/init.d/xapi stop
cp -vp xapi xapi.original
sed -i 's/1\.4\.90/6.0.99/g' xapi
/etc/init.d/xapi start
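The sed expression above simply rewrites the embedded version string wherever it occurs; its effect can be checked harmlessly on a plain string first (a demonstration only, not run against the real binary):

```shell
# Demonstrate the version-string rewrite used above on sample text
# instead of the real xapi binary: 1.4.90 becomes 6.0.99.
echo 'xapi reports version 1.4.90' | sed 's/1\.4\.90/6.0.99/g'
# prints: xapi reports version 6.0.99
```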

On previous versions (and maybe in the final version of XCP1.5), the following also worked and would be persistent:

echo "6.0.99" > /etc/xensource/xapi_version_override
/etc/init.d/xapi restart

But now XenCenter will complain about the XenTools being out of date. Fix this for Linux guests by installing XenTools and then running:

sed -i /usr/sbin/xe-update-guest-attrs -e 's/MajorVersion" "1"/MajorVersion" "6"/'
sed -i /usr/sbin/xe-update-guest-attrs -e 's/MinorVersion" "4"/MinorVersion" "0"/'
sed -i /usr/sbin/xe-update-guest-attrs -e 's/MicroVersion" "90"/MicroVersion" "99"/'
# wait 2 minutes and check XenCenter again (or just execute /usr/sbin/xe-update-guest-attrs )

On each Windows guest with Xen Tools installed, modify HKLM\SOFTWARE\Citrix\XenTools (Server 2003/WinXP), or HKLM\SOFTWARE\Wow6432Node\Citrix\XenTools (Server2008 64-bit)

Name         | Data (decimal)
MajorVersion | 6
MinorVersion | 0
MicroVersion | 99

(Only decimal values are shown here; the Windows Registry will show both decimal and hexadecimal.)

Then reboot the Windows guest, or restart the 'Citrix Tools for Virtual Machine Service' service. Restarting XenTools from the Windows command line (cmd) works like this:

net stop xensvc
net start xensvc

Management clients

Storage

Where are the VM disk images (VDIs)

Connecting with FC

Resizing an FC based SR

  • on the Compellent, resize the volume which is mapped to the server group
  • on each of the XCP hosts, find the multipath map that is mapped to this volume, e.g. using xe sr-list name-label=your-SR-name-you-want-to-resize, and do a pvscan to actually find the PV in your dom0 (it is not there by default)
  • from the device map name (e.g. "/dev/dm-1"), find the underlying SCSI devices representing each path:
multipath -l
will give you the list of sd* devices. You can use cat /proc/partitions to see what the current (old) size appears to be.
  • rescan all the underlying SCSI devices for size:
for i in /sys/block/dm-X/slaves/*/device/rescan ; do echo 1 > $i ; done
  • check if the SCSI device size is now correct (cat /proc/partitions will do nicely)
  • propagate the size changes to the multipath device map with the multipathd command
multipathd -k
and find the maps & resize
list maps
resize map nameofthemap
  • check with cat /proc/partitions whether the mapped device is now the same size as the SCSI devices
  • resize the physical volume for LVM
pvresize /dev/dm-1
  • check the PV and VG:
pvdisplay -C
vgdisplay -C
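The rescan-and-resize steps above can be collected into one script. This is a sketch only: dm-1 and mpatha are placeholders for the device and map name you found earlier with multipath -l for your SR, and it assumes the paths have already been identified.

```shell
#!/bin/bash
# Sketch of the FC SR resize steps above, wrapped in one function.
# DM and MAP are placeholders: take the real names from "multipath -l"
# for your SR. With RUN=echo the commands are only printed (dry run).
resize_fc_sr() {
    local DM="${DM:-dm-1}" MAP="${MAP:-mpatha}"
    local i

    # 1. Rescan every SCSI path underneath the multipath map for its new size.
    for i in /sys/block/$DM/slaves/*/device/rescan ; do
        $RUN sh -c "echo 1 > $i"
    done

    # 2. Propagate the new size to the multipath device map.
    $RUN multipathd -k"resize map $MAP"

    # 3. Grow the LVM physical volume to fill the resized map.
    $RUN pvresize /dev/$DM

    # 4. Inspect the result.
    $RUN pvdisplay -C
}

# Dry run first; drop RUN=echo to execute for real on the dom0.
RUN=echo resize_fc_sr
```

In between steps, cat /proc/partitions can be used as described above to confirm each size change actually arrived.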


See also:

  • http://forums.citrix.com/thread.jspa?threadID=243057
  • http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/DM_Multipath/MPIO_admin-troubleshoot.html#online_device_resize
  • http://comments.gmane.org/gmane.comp.emulators.xen.user/39615

Connecting with iSCSI

The iSCSI network is ve16, in range 172.21.0.0/16. The Compellent "Sint" is at 172.21.0.34 and .35, and Fault Domain 0 is at .36.
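A typical way to reach such targets from a dom0 is open-iscsi's iscsiadm. The portal address below comes from the text above; the rest is a generic sketch, not a confirmed local procedure:

```shell
# Sketch: discover and log in to the iSCSI targets on ve16 from a dom0,
# using open-iscsi's iscsiadm. The portal (Fault Domain 0) address comes
# from the text above; 3260 is the standard iSCSI port.
# With RUN=echo the commands are only printed (dry run).
iscsi_connect() {
    local PORTAL="${PORTAL:-172.21.0.36}"
    $RUN iscsiadm -m discovery -t sendtargets -p "$PORTAL:3260"
    $RUN iscsiadm -m node --login
}

RUN=echo iscsi_connect   # dry run; drop RUN=echo to execute
```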

Connecting with NFS

About local disk

Troubleshooting

Dead VM server

Dead VM cluster master

Disable HA if it is enabled. We don't use HA (it's not part of XCP), but it's harmless to try anyway. Log in to any other node in the cluster and type

xe pool-ha-disable

and list all hosts in the pool to find the UUID of a slave host you want to become the new master

xe host-list

and make it happen with

xe pool-designate-new-master host-uuid=UUID

Since the master is dead (which is why we started this in the first place), we explicitly tell the old slave to start acting as master:

xe pool-emergency-transition-to-master

and connect to the slaves again

xe pool-recover-slaves

(from http://blog.carlosgomez.net/2010/01/citrix-xen-server-changing-pool-master.html)
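The whole recovery sequence above can be collected into one sketch, to be run on the slave that should take over. NEW_MASTER_UUID is a placeholder for the UUID picked from the xe host-list output:

```shell
#!/bin/bash
# Consolidated sketch of the dead-master recovery steps above.
# NEW_MASTER_UUID is a placeholder for the chosen slave's UUID.
# With XE="echo xe" the commands are only printed (dry run).
promote_new_master() {
    local XE="${XE:-xe}"
    local UUID="${NEW_MASTER_UUID:-UUID}"

    $XE pool-ha-disable                       # harmless if HA was never enabled
    $XE pool-designate-new-master host-uuid=$UUID
    $XE pool-emergency-transition-to-master   # old master is dead: force the role
    $XE pool-recover-slaves                   # reattach the remaining slaves
}

XE="echo xe" promote_new_master   # dry run; use XE=xe on the real slave
```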

Incinerated VM servers

If the VM server is dead and will never come back, you can 'forget' the host from the CLI (on any server in the pool):

xe host-forget uuid=UUID

but if there are still running VMs assigned to it, it will refuse to forget the server. Remove the VMs first by forcing them off using

xe vm-reset-powerstate uuid=UUID --force

(from http://forums.citrix.com/thread.jspa?threadID=250603)
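The two steps above, in the required order (VMs first, then the host), as one sketch; HOST_UUID and VM_UUID are placeholders:

```shell
#!/bin/bash
# Sketch of removing a permanently dead host, per the steps above:
# force off any VM still recorded on it, then forget the host itself.
# HOST_UUID and VM_UUID are placeholders; XE="echo xe" is a dry run.
forget_dead_host() {
    local XE="${XE:-xe}"
    $XE vm-reset-powerstate uuid="${VM_UUID:-UUID}" --force
    $XE host-forget uuid="${HOST_UUID:-UUID}"
}

XE="echo xe" forget_dead_host   # dry run; use XE=xe for real
```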

VMs on a dead server

You can declare a VM 'shut down' via the CLI on a dead VM server:

xe vm-reset-powerstate uuid=UUID --force

(from http://forums.citrix.com/thread.jspa?threadID=250603)