GSP Virtualisation with Xen
The Grid Server Park machines (general services used for Nikhef and BiG Grid) are run in a centrally managed and controlled virtualisation environment. After the testing and evaluation period, the Open Source ''Xen Cloud Platform'' (XCP, version 1.5beta, now 1.6 final) was chosen to run this infrastructure. The aim is to bring all systems under XCP control, managed in a set of four clusters: GSP "Piet", Nikhef's own "NDPF BL0", the EUGridPMA and Security cluster "SEC", and the test/verification cluster "GEN", which uses the older Generics 2008A systems.
= General information =

* http://wiki.xen.org/xenwiki/XCP/XenServer_Feature_Matrix
* http://xen.org/download/xcp/index_1.5.0.html
* http://downloads.xen.org/XCP/61809c/ (for XCP 1.6)
= Hardware =

{| border="1"
! Cluster !! qty !! system type !! VM server hostnames !! current master
|-
| Piet || 16 systems || M610: 12 cores, 24 SMT threads, 96 GiB, 2x600GB 10k SAS, dual 8G FC, dual 1GbE + dual 10GbE || vms-piet-*.inst.ipmi.nikhef.nl || vms-piet-16
|-
| Generic || 6 systems || PE2950: 8 cores, 24 GiB, 4x500GB 7k2 SATA, dual 1GbE || vms-gen-*.inst.ipmi.nikhef.nl || vms-gen-05
|-
| BL0 || 5 systems || M610: 8 cores, 32 GiB RAM, 2x300GB 10k SAS, dual 1GbE (+dual 8G FC) || vms-bl0-*.inst.ipmi.nikhef.nl || vms-bl0s5
|-
| Security || 2 systems || PE2950: 8 cores, 24 GiB, 4x500GB 7k2 SATA, dual 1GbE || vms-sec-*.inst.ipmi.nikhef.nl || vms-sec-01
|}
= Upgrade notes XCP 1.6 =

The move to XCP 1.6 has eased a lot of things. For one, you can now live-migrate VDIs between the local storage SRs of the VM hosts, and XenCenter works without the version hack. Also good: upgrading from 1.5 works fine, and can be done through PXE and an XCP XML answer file.

A few hints:
* Get the latest XenCenter 6.1 (download the XenServer 6.1 install CD ISO and extract the [http://stal.nikhef.nl/mirror/XenServer/XenServer6.1.0/client_install/XenCenter.msi XenCenter.MSI] from it), or use the XE CLI
* ALWAYS upgrade the master first, WITHOUT putting it in maintenance mode
* The upgrade will reset any OS-level tuning (login, ssh, iptables, pam_ldap).
** You need to re-apply these as per below, e.g. with [http://stal.nikhef.nl/cfg/XCP16-config.sh XCP16-config.sh].
** The built-in repo with the Xen packages is now called "xcp" and no longer "citrix" (the script above takes that into account)
** Not all tweaks for 1.5beta will still be needed. Try the new Open vSwitch - it may work nicely and do much better with 802.1q vlans (performance tests still to be done)
* Unpack the whole XCP-1.6-61809c.iso file and put it up on a [http://stal.nikhef.nl/mirror/XenServer/XCP16/install/ web site]
* Create an upgrade answer file (XML) and put it on a web site as well, e.g. [http://stal.nikhef.nl/cfg/XCP16up.xml XCP16up.xml] (a sketch of such a file is shown below, after these hints)
* Install PXELINUX, and make sure the mboot.c32 and pxelinux.0 files belong together. Create a PXE config file (e.g. <tt>pxelinux.cfg/XCP16-install.cfg</tt>, referenced by the symlinks below) as

 default xenserver
 label xenserver
 kernel mboot.c32
 append XCP16/xen.gz dom0_max_vcpus=2 dom0_mem=752M com1=115200,8n1 console=com1,vga --- XCP16/vmlinuz xencons=hvc console=hvc0 console=tty0 answerfile=http://194.171.97.240/cfg/XCP16up.xml install --- XCP16/install.img

* On Dell iDRAC systems you can force PXE boot on the next boot by doing the following TWICE (that's an iDRAC bug):
 ipmitool -H <hostname>.ipmi.nikhef.nl -U root chassis bootdev pxe
 ipmitool -H <hostname>.ipmi.nikhef.nl -U root chassis bootdev pxe

* point the PXE config symlink for the host at the installer (the symlink is named after the host's IP address in hex, as pxelinux expects; <tt>hexaddrbyname</tt> is the local helper that prints that name) and boot to install/upgrade
 ln -sf XCP16-install.cfg `hexaddrbyname vms-sec-01.inst.ipmi.nikhef.nl`
* afterwards, reset the boot device back to disk
 ipmitool -H <hostname>.ipmi.nikhef.nl -U root chassis bootparam set bootflag disk
* and also reset the PXE config symlink back to local boot
 ln -sf localboot.cfg `hexaddrbyname vms-sec-01.inst.ipmi.nikhef.nl`
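
For reference, the upgrade answer file mentioned in the hints above could look roughly like this. This is a sketch based on the XenServer 6.0 answerfile format (the [http://stal.nikhef.nl/cfg/XCP16up.xml XCP16up.xml] linked above is the authoritative example); the existing-installation element names the disk holding the current XCP installation (sda here is an assumption) and the source element points at the web site with the unpacked ISO tree:

 <?xml version="1.0"?>
 <installation mode="upgrade">
   <existing-installation>sda</existing-installation>
   <source type="url">http://stal.nikhef.nl/mirror/XenServer/XCP16/install/</source>
 </installation>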


The following is no longer needed:
* NO need to change the version number of xapi
* NO need to apply the security fix (obviously)

== On the xapi version hack ==

If you have applied the xapi version hack to 1.5beta (setting the xapi version to 6.0.99), you may run into trouble after the upgrade, as the new master claims to be older than the slaves (and thus cannot control them or migrate VMs to them). Not being able to migrate VMs is a bummer, since then you cannot do a hitless rolling upgrade. So we need to fix it: download the [http://stal.nikhef.nl/cfg/XCP16up.sh post-install script from stal] (it writes out the two utility scripts) or follow the manual process. Either way, the solution is:

* COPY the original (XCP 1.6) xapi binary from <tt>/opt/xensource/bin/xapi</tt> to a safe place
* edit the version number to be 6.0.99 (as if it were a hacked XCP 1.5beta), running from <tt>/opt/xensource/bin</tt>:
 /etc/init.d/xapi stop
 sed -i 's/1\.6\.10/6.0.99/g' xapi
 /etc/init.d/xapi start
 echo "Switched FROM original XCP16 1.6.10 TO XCP15-hack version 6.0.99"
* now you can migrate to and from the slaves again. Upgrade the slaves, and then switch back by copying the saved xapi binary back and restarting the service.

You can restart the service while VMs are running. You may also be able to switch back after upgrading just the master, and then migrate VMs from the target slaves to this master.
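
Switching a host back is then just the reverse (a sketch; <tt>/root/xapi-xcp16.orig</tt> stands for wherever you saved the original binary):

 /etc/init.d/xapi stop
 cp /root/xapi-xcp16.orig /opt/xensource/bin/xapi
 /etc/init.d/xapi start
 echo "Switched back FROM XCP15-hack version 6.0.99 TO original XCP16 1.6.10"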

== Useful pages ==

* http://lists.xen.org/archives/html/xen-api/2012-10/msg00120.html on openvswitch
* http://lists.xen.org/archives/html/xen-api/2013-01/msg00051.html on performance issues over NFS with 1.6 (dismal NFS writes due to sync mounting)
* http://docs.vmd.citrix.com/XenServer/6.0.0/1.0/en_gb/installation.html#pxe_boot_install PXE boot documentation for XenServer 6
* http://www.citrix.com/go/products/xenserver/download.html XenServer download page from Citrix (for the XenCenter client)
* http://lists.xen.org/archives/html/xen-api/2013-01/threads.html#00058 XCP 1.5beta to 1.6 rolling upgrade hints
= Networking =

Then connect to the server in XenCenter, and in the Properties enable multipathing (for servers that can do DM multipathing). If necessary, attach the iSCSI network to management over vlan 16 (using the "iSCSI (over management)" network defined for the pool as a secondary management network).
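
For reference, multipathing can presumably also be enabled from the CLI (a sketch based on the XenServer documentation, not verified on XCP; the host should be disabled and evacuated first):

 xe host-disable uuid=<host-uuid>
 xe host-param-set uuid=<host-uuid> other-config:multipathing=true
 xe host-param-set uuid=<host-uuid> other-config:multipathhandle=dmp
 xe host-enable uuid=<host-uuid>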
== Applying critical patches to XCP (1.5beta only) ==

Some key patches need to be applied to new server installations, in particular for CVE-2012-0217. For these no updated XCP packages are available (yet), but fortunately the patches published for XenServer 6.0 ('''not''' 6.0.2) fit perfectly into an XCP 1.5 (1.4.90) configuration, updating xapi-core to release 0.2-3299 (from xapi-core-0.2-3293.i686.rpm). The actual xapi-core RPM can easily be extracted from the xsupdate file at [http://support.citrix.com/article/CTX133165 http://support.citrix.com/article/CTX133165]. To apply this patch to a host, the script "apply.sh" (bringing the host to XS60E013) is available inside Nikhef at http://stal.nikhef.nl/mirror/XenServer/XCP15/patch-XS60E013/. Download it to the newly installed node and execute it:
 wget -q http://stal.nikhef.nl/mirror/XenServer/XCP15/patch-XS60E013/apply.sh
 sh apply.sh

In order to ensure consistency for live migration, please make sure that all hosts in the pool have ''the same network configuration'', and that all data networks are offered to all hosts!

== Faking XenServer 6.0 for XenCenter Management (1.5beta only) ==

By default, the XCP hypervisor will present itself as a very-old-XenServer (v1) instance, and XenCenter will refuse to do some of the more advanced features like dynamic memory, snapshots, and live migration. This can be fixed by manually 'editing' the XenServer version string in the xapi program, as described on the Xen wiki. After '''each new installation and upgrade''' of the xapi binary:
 sudo aii-shellfe --configure ''host.name''
# provision the VM (and start it if so desired)
 aii-provision-vm -n ''host.name'' [--start] [--autoboot] [--sr=''name-of-SR''] [--vmtemplate=''name-of-template-pcre-match'']
: where the default template name is <tt>CentOS 5.*64-bit</tt>, and the default SR is taken from the pool default config.
# start the VM on your preferred machine - and assign it a preferred VM server if you want autoboot to work.
; -cdbxml=s ($cdbxmldir) : directory where the XMLCDB profiles are stored, defaults to <tt>/project/quattor/conf/build/xml</tt>
; -n|name=s ($VMname) : name of the VM, defaults to <tt>example.nikhef.nl</tt> so ''please change this''
; --vmtemplate=s ($vmtplname) : Perl regular expression to match the name of the template from which VMs are derived, e.g. <code>CentOS 6.*64-bit</code>.
; --sr=s ($srname) : name of the SR on which to place the disk images and snapshots. Must be a shared SR.
; -a|autoboot ($autoboot) : make the VM start on VM host boot
; --start : start the VM after provisioning on any host in the pool
; --user=s ($xapi_user) : name of the management user, defaults to <tt>root</tt>
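
As an illustration, a typical invocation could look like this (the VM name, SR name and template pattern are hypothetical examples):

 aii-provision-vm -n myvm.nikhef.nl --start --autoboot --sr='Piet shared storage' --vmtemplate='CentOS 6.*64-bit'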

== Re-installation of an existing VM ==

To re-install an existing VM, set the bootloader to "eliloader":
 xe vm-param-set uuid=7dff5ebc-f348-718e-72d7-850670897469 PV-bootloader=eliloader

The re-installation pulls the installer from an installation repository of the OS distribution. Use the following command to find out which location is configured:
 xe vm-param-list uuid=c2f112e8-4a1f-0072-8d5a-99e91d132bac | grep other-config
 other-config (MRW): last_shutdown_time: 20121107T16:07:02Z; last_shutdown_action: Restart; last_shutdown_initiator: internal; last_shutdown_reason: rebooted; install-repository: http://stal.nikhef.nl/centos/5.7/os/x86_64/; install-methods: cdrom,nfs,http,ftp; auto_poweron: true

To change the location of the installation repository, execute:
 xe vm-param-set uuid=c2f112e8-4a1f-0072-8d5a-99e91d132bac other-config:install-repository=http://stal.nikhef.nl/centos/5.8/os/x86_64/
== XAPI XML-RPC management ==

= Troubleshooting =

== Moving from HVM to PV ==

See [http://support.citrix.com/article/CTX121875 http://support.citrix.com/article/CTX121875].

The page suggests manually creating an initrd file with the Xen drivers (and without the SCSI drivers). This needs some explaining.

The normal process when installing a kernel is that the initrd is populated with whatever is relevant for the ''current'' context of the system, so a system that is not yet running paravirtualized will normally not install an initrd with Xen drivers. Also, the Grub menu settings are odd, containing a 'kernel' line that points to a xen.gz file and 'module' lines for the real kernel and initrd. These must be edited to look like ordinary entries, and the default entry number must be set to 0 to use the Xen kernel at the next boot.
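
As a sketch of what this could look like on a CentOS 5 PV guest (the kernel version, partition and LVM root device below are illustrative examples only; CTX121875 remains the authoritative recipe), the initrd is rebuilt with the Xen PV drivers and without the SCSI modules:

 mkinitrd --omit-scsi-modules --with=xennet --with=xenblk --preload=xenblk \
     /boot/initrd-2.6.18-308.el5xen-pv.img 2.6.18-308.el5xen

and the corresponding Grub entry, edited to look like an ordinary one (and made the default, entry 0), would be something like:

 title CentOS 5 (Xen PV)
         root (hd0,0)
         kernel /vmlinuz-2.6.18-308.el5xen ro root=/dev/VolGroup00/LogVol00 console=xvc0
         initrd /initrd-2.6.18-308.el5xen-pv.img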

Manually creating an initrd is a maintenance nightmare, but once the system is running paravirtualized, subsequent kernel upgrades will generate the correct initrd automatically, so this is a one-time action only.
== Wrong VM type - or resetting the boot loader ==

See
* http://blog.403labs.com/post/1546501840/paravirtulization-with-citrix-xenserver-5-5-and-ubuntu

=== Booting from a DVD ISO (e.g. rescue mode) ===

In XenCenter it is sometimes not possible to choose any boot device other than the hard disk. To change this, set the following parameters on the command line:
 xe vm-param-set uuid=''<uuid of VM>'' HVM-boot-policy="BIOS order"
 xe vm-param-set uuid=''<uuid of VM>'' HVM-boot-params:order=dc

Now the boot parameters can be edited again in XenCenter.

== SR corruption ==

If you suffer from errors like "<tt>Error: Deleting disk '' from SR 'SRNAME' - Error in Metadata volume operation for SR.</tt>", and the log on the pool master shows things like
 [23889] 2012-07-16 12:06:49.549273 Exception getting metadata length.Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error
 [23889] 2012-07-16 12:06:49.549392 Exception getting metadata with params{'vdi_uuid': '81fac4d4-d273-4b3e-81b9-7960f615b20e'}. Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error
 [23889] 2012-07-16 12:06:49.549517 Error deleting vdi 81fac4d4-d273-4b3e-81b9-7960f615b20e from the metadata. Error: VDI delete operation failed for parameters: /dev/VG_XenStorage-790dc413-883f-f49e-a0e8-8410778145aa/MGT, 81fac4d4-d273-4b3e-81b9-7960f615b20e. Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error
 [23889] 2012-07-16 12:06:49.587649 Raising exception [181, Error in Metadata volume operation for SR. [opterr=VDI delete operation failed for parameters: /dev/VG_XenStorage-790dc413-883f-f49e-a0e8-8410778145aa/MGT, 81fac4d4-d273-4b3e-81b9-7960f615b20e. Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error]]

then the MGT management volume on the SR may have become corrupted. Why this happens is unclear, but once it does, this error will prevent you from removing nameless and label-less volumes, and the volumes appear to eat disk space from the SR (they don't, but the management volume says so).

What worked is the following:
* try removing the offending volume. The log on the pool master will show the base name of the offending VG after the <tt>lvchange -ay</tt> command line log. Use the basename only, like "/dev/VG_XenStorage-''UUID-OF-SR''/"
* with ls, check that the MGT volume is available (it should be):
 ls -l /dev/VG_XenStorage-''UUID-OF-SR''/MGT
* rescan the SR first (triggering the error)
 xe sr-scan uuid=''UUID-OF-SR''
* move the MGT volume out of the way ('forgetting' its content for a while). This ''can'' be done with the VDIs on it attached to running VMs
 lvrename /dev/VG_XenStorage-''UUID-OF-SR''/MGT /dev/VG_XenStorage-''UUID-OF-SR''/oldMGT
* re-scan the SR, e.g. using the GUI or CLI
 xe sr-scan uuid=''UUID-OF-SR''
* rescan it again, for as long as the used space is not equal to the sum of the VDIs on it
* see if the LVs are actually gone (do this on the pool master):
 lvdisplay -C | grep ''UUID-OF-SR''
* remove stale VDIs by uuid (if they are still there) or via the GUI
* rescan it again, for as long as the used space is not equal to the sum of the VDIs on it
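
Condensed into a single session on the pool master, the recovery above looks roughly like this (a sketch; substitute the UUID of the affected SR, the one below is the example from the log):

 SR_UUID=790dc413-883f-f49e-a0e8-8410778145aa
 VG=/dev/VG_XenStorage-$SR_UUID
 ls -l $VG/MGT                       # the MGT volume should still exist
 xe sr-scan uuid=$SR_UUID            # triggers the metadata error once more
 lvrename $VG/MGT $VG/oldMGT         # move the corrupt management volume aside
 xe sr-scan uuid=$SR_UUID            # re-scan; repeat until used space matches the sum of the VDIs
 lvdisplay -C | grep $SR_UUID        # check whether stale LVs remain; remove leftover VDIs by uuid or via the GUI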

See also:
* http://forums.citrix.com/thread.jspa?threadID=295713&start=15&tstart=0
* http://forums.citrix.com/thread.jspa?threadID=306400
* http://support.citrix.com/article/CTX131660
== Multipathing not complete ==

See also:
* http://www.gossamer-threads.com/lists/xen/api/245286?do=post_view_threaded

== Structure of the xsupdate file of XenServer ==

The "xsupdate" file with patches and updates that you download for XenServer (e.g. from the Citrix support pages) is a zip file, which contains a signed set of RPMs in what is essentially a shar archive. Use this recipe to extract the RPMs from an xsupdate file, i.e. to convert an xsupdate file to RPMs (a condensed sketch follows after the list):

* get or download the ZIP file with the patch.
* extract the ZIP file, and locate the ".xsupdate" file.
* use PGP or GPG to extract the binary contents in shar format from it:
 gpg --output shar.sh --verify XS60E013.xsupdate
* extract the binary contents from the created shar-ish archive, thereby creating the binary contents of the package in /tmp/tmp.XXXXXX
 sh shar.sh unpack
* locate the binary RPMs to be updated in /tmp/tmp.XXXXXX; these should all be installed via rpm -Uvh (use rpm -Uvh --test first)
* look at install.sh for any additional magic to be done to the nodes. Using install.sh itself will likely not work, since it checks the installed software version (which for us is XCP, not XenServer)
* in var/patches/applied in the unpacked directory is a set of uuid-named files. These should be copied onto the nodes into /var/patches/applied
* restart xapi on the nodes afterwards
* the /var/patches/applied files will result in the "updates" field in XenCenter being updated. This window will also show the reboot-required status if the patches/applied metadata so indicates
* reboot nodes if needed (rotating and moving VMs as needed)
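
Condensed into a shell session, the recipe above could look roughly like this (a sketch only, using XS60E013 as the example patch; inspect install.sh before installing anything, and repeat the copy and xapi restart on every node):

 unzip XS60E013.zip                          # the downloaded patch ZIP (name is an example)
 gpg --output shar.sh --verify XS60E013.xsupdate
 sh shar.sh unpack                           # unpacks the payload into /tmp/tmp.XXXXXX
 cd /tmp/tmp.XXXXXX                          # substitute the actual directory name
 rpm -Uvh --test *.rpm && rpm -Uvh *.rpm     # install the updated RPMs
 cp var/patches/applied/* /var/patches/applied/
 /etc/init.d/xapi stop; /etc/init.d/xapi start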

== Connecting to the graphical console of a VM via VNC ==

When a VM becomes unreachable, but is still running, as a last resort one could try to connect to the graphical console that every machine still has. It's nicely tucked away in the innards of XenServer, but the following little script should help. It's available on pool-piet.inst.ipmi.nikhef.nl as connect-to-vm-console.sh.

 #!/bin/sh
 
 # locate the host with the graphical console for the given vm,
 # find the vnc port and print the commands to connect a vnc viewer
 
 vm="$1"
 
 vmuuid=`xe vm-list name-label="$vm" --minimal`
 
 if [ -z "$vmuuid" ]; then
     echo "Could not find VM named $vm" >&2
     exit 1
 fi
 
 # the host on which the VM is currently resident
 residenton=`xe vm-param-get uuid=$vmuuid param-name=resident-on`
 residenthost=`xe host-list uuid=$residenton params=name-label --minimal`
 
 # ask that host for the RFB (VNC) port of the VM's console
 vncport=`ssh $residenthost xn console-list "$vm" | awk '$1 == "RFB" { print $2 }'`
 
 if [ -z "$vncport" ]; then
     echo "Could not find VNC console for $vm" >&2
     exit 1
 fi
 
 echo "Issue the following command from your workstation:
 
 ssh -L 5901:localhost:$vncport -f -N root@$residenthost
 vncviewer localhost:5901
 "