GSP Virtualisation with Xen

The Grid Server Park machines (general services used for Nikhef and BiG Grid) are run with a centrally managed and controlled virtualisation environment. After the testing and evaluation period the Open Source ''Xen Cloud Platform'' (XCP, version 1.5beta, now 1.6 final) was chosen to run this infrastructure. The aim is to bring all systems under XCP control, managed in a set of four clusters: GSP "Piet", Nikhef's own "NDPF BL0", the EUGridPMA and Security cluster "SEC", and the test/verification cluster using the older Generics 2008A systems, "GEN".

= General information =

* http://wiki.xen.org/xenwiki/XCP/XenServer_Feature_Matrix
* http://xen.org/download/xcp/index_1.5.0.html
* http://downloads.xen.org/XCP/61809c/ (for XCP1.6)

= Hardware =

{| class="wikitable"
! Cluster !! qty !! system type !! VM server hostnames !! current master
|-
| Piet || 16 systems || M610: 12 cores, 24 SMT threads, 96 GiB, 2x600GB 10k SAS, dual 8G FC, dual 1GbE + dual 10GbE || vms-piet-*.inst.ipmi.nikhef.nl || vms-piet-16
|-
| Generic || 6 systems || PE2950: 8 cores, 24 GiB, 4x500GB 7k2 SATA, dual 1GbE || vms-gen-*.inst.ipmi.nikhef.nl || vms-gen-05
|-
| BL0 || 5 systems || M610: 8 cores, 32 GiB RAM, 2x300GB 10k SAS, dual 1GbE (+dual 8G FC) || vms-bl0-*.inst.ipmi.nikhef.nl || vms-bl0s5
|-
| Security || 2 systems || PE2950: 8 cores, 24 GiB, 4x500GB 7k2 SATA, dual 1GbE || vms-sec-*.inst.ipmi.nikhef.nl || vms-sec-01
|}

= Upgrade notes XCP1.6 =

The move to XCP 1.6 has eased a lot of things. For one, you can now live-migrate VDIs between the local storage SRs of the VM hosts, and XenCenter works without the version hack. Also good: upgrading from 1.5 works OK, and can be done through PXE and the XCP XML configuration file.

A few hints:
* Get the latest XenCenter 6.1 (download the XenServer 6.1 install CD ISO and extract the [http://stal.nikhef.nl/mirror/XenServer/XenServer6.1.0/client_install/XenCenter.msi XenCenter.MSI] from it or the XE CLI)
* ALWAYS upgrade the master first, WITHOUT putting it in maintenance mode
* The upgrade will re-set any OS level tuning (login, ssh, iptables, pam_ldap).
** You need to re-apply these as per below, e.g. with [http://stal.nikhef.nl/cfg/XCP16-config.sh XCP16-config.sh].
** The built-in repo with Xen stuff is now called "xcp" and no longer "citrix" (the script above takes that into account)
** Not all tweaks for 1.5beta will be needed. Try the new Open vSwitch - it may work nicely and do much better with 802.1q vlans (performance tests still to be done)
* Unpack all of the XCP-1.6-61809c.iso file and put it up on a [http://stal.nikhef.nl/mirror/XenServer/XCP16/install/ web site]
* Create an upgrade XML file and put it on a web site as well, e.g. [http://stal.nikhef.nl/cfg/XCP16up.xml XCP16up.xml] (a sketch of such an answer file is shown after this list)
* Install PXE linux, and make sure the mboot.c32 and pxelinux.0 file belong together. Create a PXE config file as
 default xenserver
 label xenserver
         kernel mboot.c32
         append XCP16/xen.gz dom0_max_vcpus=2 dom0_mem=752M com1=115200,8n1 console=com1,vga --- XCP16/vmlinuz xencons=hvc console=hvc0 console=tty0  answerfile=http://194.171.97.240/cfg/XCP16up.xml install --- XCP16/install.img
* on Dell iDRAC systems you can force PXE boot on next boot, by doing the following TWICE (that's an iDRAC bug):
 ipmitool -H <hostname>.ipmi.nikhef.nl -U root chassis bootdev pxe
 ipmitool -H <hostname>.ipmi.nikhef.nl -U root chassis bootdev pxe
* boot to install/upgrade
 ln -sf XCP16-install.cfg `hexaddrbyname vms-sec-01.inst.ipmi.nikhef.nl`
* reset the boot device back to disk via the obvious counterpart
 ipmitool -H <hostname>.ipmi.nikhef.nl -U root chassis bootparam set bootflag disk
* also reset the PXE configuration
 ln -sf localboot.cfg `hexaddrbyname vms-sec-01.inst.ipmi.nikhef.nl`
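
The answer file itself is not reproduced here. As an illustration only, an upgrade answer file following the XenServer/XCP 6.x unattended-install schema looks roughly like this (a sketch with placeholder values, not the contents of the real XCP16up.xml; see the PXE boot documentation under "Useful pages" for the authoritative format):

 <?xml version="1.0"?>
 <installation mode="upgrade">
   <existing-installation>sda</existing-installation>
   <source type="url">http://stal.nikhef.nl/mirror/XenServer/XCP16/install/</source>
 </installation>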
 +
The following is no longer needed:
* NO need to change the version number of xapi
* NO need to apply the security fix (obviously)

== On the xapi version hack ==

If you have applied the xapi version hack to 1.5beta (setting the xapi version to 6.0.99), you may run into limited trouble after the upgrade, as the new master claims to be older than the slaves (and thus cannot control them or migrate VMs to them). Not being able to migrate VMs is a bummer, since then you cannot do a hitless rolling upgrade. So we need to fix it: download the [http://stal.nikhef.nl/cfg/XCP16up.sh post-install script from stal] (it writes out the two utility scripts) or do the manual process. Anyway, the solution is:

* COPY the original (XCP16) xapi file from <tt>/opt/xensource/bin/xapi</tt> to a safe place
* edit the version number to be 6.0.99 (like it was on a hacked XCP15beta):
 /etc/init.d/xapi stop
 sed -i 's/1\.6\.10/6.0.99/g' xapi
 /etc/init.d/xapi start
 echo "Switched FROM original XCP16 1.6.10 TO XCP15-hack version 6.0.99"
* now you can migrate to and from the slaves again. Upgrade the slaves, and then switch back by copying the saved xapi binary and restarting the service.

You can restart the service while VMs are running. You may also be able to switch back after upgrading just the master and migrate VMs from the target slaves to this master.

== Useful pages ==

* http://lists.xen.org/archives/html/xen-api/2012-10/msg00120.html on openvswitch
* http://lists.xen.org/archives/html/xen-api/2013-01/msg00051.html on performance issues over NFS with 1.6 (dismal NFS writes due to sync mounting)
* http://docs.vmd.citrix.com/XenServer/6.0.0/1.0/en_gb/installation.html#pxe_boot_install PXE boot documentation for XenServer 6
* http://www.citrix.com/go/products/xenserver/download.html XenServer download page from Citrix (for the XenCenter client)
* http://lists.xen.org/archives/html/xen-api/2013-01/threads.html#00058 XCP1.5beta to 1.6 rolling upgrade hints
  
 
= Networking =

= Server installation =

The key settings when installing a server are:
* keyboard: US
* no device drivers
* install on sda (make sure it's a RAID-1), '''and ''enable'' thin provisioning on the local SR'''
* no supplemental packs
* skip media verification
* set the Dom0 password for the pool (they are different per pool)
* pick the management interface (untagged ve11, on the GSP Piet this is eth2)
* static IP configuration, see teugel for hostname. IP: 172.22.64.XX, Subnet: 255.255.192.0, Gateway: 172.22.127.254
* hostname: put the right one "vms-<pool>-<seq>.inst.ipmi.nikhef.nl", DNS: 194.171.97.225, 194.171.97.224, 192.16.199.17
* TZ: Europe/Amsterdam
* Use NTP: 194.171.97.240, 172.22.64.2, 194.171.97.224
And install XCP.
 
To configure the server for Nikhef/NDPF, with all the performance and configuration tweaks below, all commands are collected in a single script; download it from stal and run it on a fresh Dom0:
 wget http://stal.nikhef.nl/cfg/XCP15-config.sh
 sh XCP15-config.sh

Then connect to the server in XenCenter, and in the Properties enable multipathing (for servers that can do DM). If necessary, attach the iSCSI network to management over vlan 16 (using the "iSCSI (over management)" network defined for the pool as a secondary management network).
 
== Applying critical patches to XCP (1.5beta only) ==

Some key patches need to be applied to new server installations, in particular for CVE-2012-0217. For these no updated XCP packages are available (yet), but fortunately the patches publicly published for XenServer 6.0 ('''not''' 6.0.2) fit perfectly into a XCP1.5 (1.4.90) configuration, updating xapi-core to release 0.2-3299 (from xapi-core-0.2-3293.i686.rpm). The actual xapi-core RPM can easily be extracted from the xsupdate file at [http://support.citrix.com/article/CTX133165 http://support.citrix.com/article/CTX133165]. To apply this patch to a host, the script "apply.sh" to get to XS60E013 is available inside Nikhef at http://stal.nikhef.nl/mirror/XenServer/XCP15/patch-XS60E013/. Download it to the newly installed node and execute apply.sh:
 wget -q http://stal.nikhef.nl/mirror/XenServer/XCP15/patch-XS60E013/apply.sh
 sh apply.sh
 +
 
 +
== VM Host network configuration ==
 +
 
 +
By default, VM hosts should be attached to the network in the following way:
 +
* first high-speed ethernet (eth0, or eth2 on blades with a dual 10G mezzanine board): untagged ve11 installnet, and tagged ve16 iSCSI
 +
* second interface (eth1 or eth3): tagged-only, the data networks: 2,3,4,6,7,8,9,14,66.
 +
: additional networks can be added for hosts that process sFlow data (ve12), or p4ctb private data (ve17)
 +
 
 +
Only if no secondary network interface is avaialble, should data traffic be directed over the management link (and then, only tagged). Having data traffic on the installnet may interfere with heartbeat and live migration functions.
 +
In order to ensure consistency for live migraton, please make sure that all hosts in the pool have ''the same network configuration', and that all data networks are offered to all hosts!
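
As an example, a tagged data network could be added consistently to all hosts in a pool from the CLI roughly as follows. This is a sketch only: the name label, device and VLAN tag are illustrative, and <tt>xe pool-vlan-create</tt> is assumed to behave as in XenServer 6.x (run it on the pool master):

 # create the pool-wide network object (example name label)
 net=$(xe network-create name-label="ve14 data network")
 # find the untagged PIF of the data interface on this host (eth1 here, eth3 on blades)
 pif=$(xe pif-list device=eth1 VLAN=-1 host-name-label=$(hostname) --minimal)
 # create the tagged VLAN on that device on every host in the pool
 xe pool-vlan-create pif-uuid=$pif network-uuid=$net vlan=14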
 +
 
 +
== Faking XenServer 6.0 for XenCenter Management (1.5beta only) ==
  
 
By default, the XCP hypervisor will present itself as a very-old-XenServer (v1) instance, and XenCenter will refure to do some of the more advanced features like dynamic memory, snapshots, and live migration. This can be fixed by manually 'editing' the XenServer version string in the xapi program, as describe don the Xen wiki. After '''each new installation and upgrade''' of the xapi binary:
 
By default, the XCP hypervisor will present itself as a very-old-XenServer (v1) instance, and XenCenter will refure to do some of the more advanced features like dynamic memory, snapshots, and live migration. This can be fixed by manually 'editing' the XenServer version string in the xapi program, as describe don the Xen wiki. After '''each new installation and upgrade''' of the xapi binary:
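
The hack itself boils down to rewriting the version string inside the xapi binary and restarting the service; a sketch for 1.5beta, analogous to the XCP16 procedure above and assuming its xapi reports version 1.4.90 (as noted in the patch section above):

 /etc/init.d/xapi stop
 sed -i 's/1\.4\.90/6.0.99/g' /opt/xensource/bin/xapi
 /etc/init.d/xapi start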

But now XenCenter will complain about the XenTools being out of date. Fix this for Linux guests by installing XenTools and subsequently:
  
 rpm -Uvh /mnt/Linux/xe-guest-utilities*x86_64.rpm
 sed -i /usr/sbin/xe-update-guest-attrs -e 's/MajorVersion" "1"/MajorVersion" "6"/'
 sed -i /usr/sbin/xe-update-guest-attrs -e 's/MinorVersion" "4"/MinorVersion" "0"/'
 sed -i /usr/sbin/xe-update-guest-attrs -e 's/MicroVersion" "90"/MicroVersion" "99"/'
 /usr/sbin/xe-update-guest-attrs
 # wait 2 minutes and check XenCenter again (or just execute /usr/sbin/xe-update-guest-attrs )
  
* http://wiki.xen.org/wiki/XenCenterXCP
  
= Client installation and configuration =

== Creating template-based PV guests using existing disk images ==

By default, the VM templates will install a new machine, and the PV templates in particular will want to retrieve an install image from HTTP, NFS or CD-ROM. This may not be what you want if you have an existing image. The difference is in the PV-bootloader which is set after creating the VM configuration. By default, it's the 'eliloader', but to just start the VM it should be 'pygrub'. To create a VM based on an imported (raw or VDI) disk image, use a template ''but do not start it''. Instead first modify the bootloader.

To obtain the current PV boot loader:
 xe vm-param-get uuid=7dff5ebc-f348-718e-72d7-850670897469 param-name=PV-bootloader
will return <tt>eliloader</tt>, and to set it to boot the system:
 xe vm-param-set uuid=7dff5ebc-f348-718e-72d7-850670897469 PV-bootloader=pygrub
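
Putting this together, a VM can be instantiated from a template without starting it and then switched to pygrub; a sketch (template and VM names are examples only):

 # instantiate the template, but do not start the VM yet
 vmuuid=$(xe vm-install template="CentOS 5.7 (64-bit)" new-name-label=imported.example.nikhef.nl)
 # the template sets eliloader; switch to pygrub so the existing image is booted rather than re-installed
 xe vm-param-set uuid=$vmuuid PV-bootloader=pygrub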

== Creating Quattor-managed VMs ==

Quattor-managed VMs are created in almost the standard way using the [http://www.nikhef.nl/grid/ndpf/files/tmp/aii-provision-vm aii-provision-vm] tool (use [http://www.nikhef.nl/grid/ndpf/files/tmp/aii-provision-vm.newXML aii-provision-vm.newXML] if you have the new gibberish that panc v9+ now produces), except that the final "--install" command given via aii-shellfe has been replaced by this VM provisioning script. Do the following:

# create a machine in the usual way, run <tt>pushxprof -f ''facility'' ''host.name''</tt>. '''Please keep in mind that the hardware and system configuration are taken as is from the template. Make sure to configure enough RAM (at least 1024MB for EL5), and a moderate amount of disk (enough to host the partitions, but not an arbitrary value like 120 GB please)'''
# configure the DHCP settings, needed for kickstart (but not needed for the bare VM):
 sudo aii-shellfe --configure ''host.name''
# provision the VM (and start it if so desired)
 aii-provision-vm -n ''host.name'' [--start] [--autoboot] [--sr=''name-of-SR''] [--vmtemplate=''name-of-template-pcre-match'']
: where the default template name is <tt>CentOS 5.*64-bit</tt>, and the default SR is taken from the pool default config.
# start the VM on your preferred machine - and assign it a preferred VM server if you want autoboot to work.

The configuration is in <tt>$HOME/.xapirc</tt> on stal in the home of ndpfmgr. You can override all settings except the network map.
Other (command-line) arguments:
; -c|config=s : file name of alternate configuration file
; -h|pool=s ($xapi_host) : URL of the pool master to contact (use https please)
; -p=s ($xapi_pw) : password of the admin on the host - do ''not'' use this option please
; -cdbxml=s ($cdbxmldir) : directory where the XMLCDB profiles are stored, defaults to <tt>/project/quattor/conf/build/xml</tt>
; -n|name=s ($VMname) : name of the VM, defaults to <tt>example.nikhef.nl</tt> so ''please change this''
; --vmtemplate=s ($vmtplname) : Perl regular expression to match the name of the template from which VMs are derived, e.g. <code>CentOS 6.*64-bit</code>.
; --sr=s ($srname) : name of the SR on which to place the disk images and snapshots. Must be a shared SR.
; -a|autoboot ($autoboot) : make the VM start on VM host boot
; --start : start the VM after provisioning on any host in the pool
; --user=s ($xapi_user) : name of the management user, defaults to <tt>root</tt>

== Re-installation of an existing VM ==

To re-install an existing VM, set the bootloader to "eliloader":
 xe vm-param-set uuid=7dff5ebc-f348-718e-72d7-850670897469 PV-bootloader=eliloader

The re-installation uses an ISO from the OS distribution. Use the following command to find out which location is used:
 xe vm-param-list uuid=c2f112e8-4a1f-0072-8d5a-99e91d132bac | grep other-config
 other-config (MRW): last_shutdown_time: 20121107T16:07:02Z; last_shutdown_action: Restart; last_shutdown_initiator: internal; last_shutdown_reason: rebooted; install-repository: http://stal.nikhef.nl/centos/5.7/os/x86_64/; install-methods: cdrom,nfs,http,ftp; auto_poweron: true

To change the location from which the ISO is downloaded, execute:
 xe vm-param-set uuid=c2f112e8-4a1f-0072-8d5a-99e91d132bac other-config:install-repository=http://stal.nikhef.nl/centos/5.8/os/x86_64/

== XAPI XML-RPC management ==

An example script in Perl is located on stal in bin/xapi-test.pl. It uses RPC::XML::Client to communicate, and lists the VM templates and connects to the various pools. In the end, it should be fairly simple to convert aii-shellfe to actually instantiate a new VM automatically if you run "--install" :-)
If you want to look at a ''really dirty'' piece of Perl, have a look at the [http://www.nikhef.nl/grid/ndpf/files/tmp/xapi-createVM.pl PoC code].

Some API doc and examples in Python are also available from the Xen site:
* http://community.citrix.com/display/xs/XenServer+install.py
* http://docs.vmd.citrix.com/XenServer/6.0.0/1.0/en_gb/api/
* http://pastie.org/pastes/993571/reply
  
 
= Storage =

== Managing VM disk images (VDIs) ==

VM disk images come in two flavours: the default is VHD (even over LVs), or raw (over LVs) for legacy/import stuff. The Virtual Disk Images (VDIs) are bound to a specific Storage Repository (SR), but you can move them using the GUI, and you can copy them using the CLI. Note that copy is not found in the XenCenter GUI for some reason. Copying from RO to RW SRs is possible, and a good way to import/migrate VM disk images between pools. Or export your file based old VM disk images over NFS, then attach the NFS SR over the installnet to the Dom0 and copy from there. The target SR can be specified by uuid:
 xe vdi-copy uuid=95cc590d-56b7-4fb5-ad44-8d78fb2921cf sr-uuid=e5d0e83a-7e70-3d28-31ab-ed98bfb68368

Notes:
* by default we want them stored on shared FC/iSCSI storage, which can even (in R/O mode) be used to exchange VM images between pools. Only use NFS for transient storage, and not for any mission-critical stuff please. And 'salado' is not the proper host to put them on, since it only has 100Mbps to the installnet.
* copy ''from'' NFS/local disk ''to'' FC is quick (30 sec for 8GB), but the reverse is very slow (10 min for 8 GByte)!

== Moving VM image files from outside - import existing Xen (PV) VMs from CentOS or RHEL 5 ==

Note: you cannot move VM images between pools - since a single Storage Repository (SR) cannot be shared amongst more than one pool - and you cannot usually mount current (non-XCP) image repositories on a server inside a pool. One trick to play is to actually copy the contents of the images from the outside to a pre-prepared volume inside an SR; that pre-prepared VDI should be of the same size (or larger), and be of type 'raw'. Into your fresh placeholder VDI, you can then copy the raw disk image from your RHEL5/CentOS5 Xen VM.

==== Raw image and LVM ====

# create a RAW disk image within the SR. You can get the SR UUID from the management console or from the sr-list CLI command. Then create the raw diskimage LV on the master XCP server:
 xe vdi-create sm-config:type=raw sr-uuid={SR_UUID} name-label="My Raw LVM VDI" virtual-size={size}GiB type=user
# rescan the contents of the SR on the management console. You should see it now.
# plug the new VDI on a control domain (preferably the pool master) where you can see it
## find the control domain with <tt>xe vm-list</tt>, and note the UUID
## create an autodetect VBD, and note the UUID you'll get
 xe vbd-create vm-uuid=''UUID-of-control-domain'' device=autodetect vdi-uuid=''UUID-of-your-new-VDI''
: plug the VBD into the control domain
 xe vbd-plug uuid=''UUID-of-your-VBD-out-of-the-previous-step''
: write the stuff to /dev/xvda (look at the size to find the proper one)
# copy the contents of the raw image into the LV - use a broker host if the old and new server cannot see each other:
 [salado]# ssh -A root@keerder "dd if=/vm/mach/images/rooier.img" | ssh root@pool-piet.inst.ipmi.nikhef.nl "dd of=/dev/xvdXX"
# unplug the device and remove the VBD (but NOT the VDI!)
 xe vbd-unplug uuid=''UUID-of-your-VBD-out-of-the-previous-3b''
 xe vbd-destroy uuid=''UUID-of-your-VBD-out-of-the-previous-3b''
# create a VM that has this VDI as its local disk, and start it.

See
* http://forums.citrix.com/thread.jspa?threadID=283458

The live migration of this VM still has to be tested - you may need to copy/clone the disk image to get it into VHD format. To upgrade all LVMs on an SR, use the <tt>xe-lvm-upgrade</tt> tool, as described in the "Upgrading LVM storage from XenServer 5.0 or earlier" section of the 5.5+ documentation. Copies taken of a raw VDI will become VHD VDIs by default.

See
* http://docs.vmd.citrix.com/XenServer/5.5.0/1.0/en_gb/installation.html
* http://docs.vmd.citrix.com/XenServer/5.5.0/1.0/en_gb/reference.html#upgrading_to_lvhd
* http://www.ndchost.com/wiki/server-administration/netcat-over-ssh (to speed up transfers using netcat/nc)
  
 
== Connecting with FC ==

=== Multipath configuration ===

See
* http://support.citrix.com/article/CTX118791
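
Multipathing can also be switched on per host from the CLI instead of through the XenCenter Properties dialog; the following is a sketch of the commonly documented procedure (the host uuid is a placeholder, and the default device-mapper handler is assumed):

 # take the host out of service while changing the setting
 xe host-disable uuid=<host-uuid>
 # enable multipathing with the device-mapper handler
 xe host-param-set uuid=<host-uuid> other-config:multipathing=true
 xe host-param-set uuid=<host-uuid> other-config:multipathhandle=dmp
 # put the host back in service
 xe host-enable uuid=<host-uuid>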
  
 
=== Resizing an FC based SR ===

On the Compellent, resize the volume which is mapped to the server group.

Then, on '''each of the XCP hosts''',
* find the multipath map that is mapped to this volume, e.g. using <tt>xe sr-list name-label=''your-SR-name-you-want-to-resize''</tt>, and do a pvscan to actually find the PV in your dom0 (it is not there by default)
* from the device map name (e.g. "/dev/dm-1"), find the underlying SCSI devices representing each path:
  multipath -l
: will give you the list of sd* devices. You can use <tt>cat /proc/partitions</tt> to see what the current (old) size appears to be.
* rescan all the underlying SCSI devices for size:
  for i in /sys/block/dm-X/slaves/*/device/rescan ; do echo 1 > $i ; done
* check if the SCSI device size is now correct (cat /proc/partitions will do nicely)
* propagate the size changes to the multipath device map with the multipathd command
  multipathd -k
: and find the maps & resize
  list maps
  resize map ''nameofthemap''
* check with cat /proc/partitions whether the mapped device is now the same size as the SCSI devices
* resize the physical volume for LVM
  pvresize /dev/dm-1
* check the PV and VG:
  pvdisplay -C
  vgdisplay -C

Finally, make sure XCP sees the new layout. Click on "rescan" in XenCenter, or use the CLI:
  xe sr-scan uuid=<the uuid you noted in the previous step>
  
== Connecting with iSCSI ==

The [[NDPF_System_Locations#Network_Connection_in_the_NDPF|iSCSI network is ve16]], in range 172.21.0.0/16. The Compellent "Sint" is at 172.21.0.34 and .35, and the Fault Domain 0 (the one to use) is at 172.21.0.36.

Attaching an iSCSI SR is far less stable than using FC, so if you have an FC path available, use that! Otherwise:
* create a volume on the Compellent
* create a server group on the Compellent if not already there
* on the XenServer host, click "Add new storage" -> "Software iSCSI" and name the SR
* as the target host, ping 172.21.0.36 (sint.iscsi.ipmi.nikhef.nl), port 3260
* click "Discover IQNs"
* now, on the Compellent, create a server and try to identify the proper iSCSI initiator. It may not be there yet, in which case you can retry on a per-XenServer basis.
* click "Discover LUNs". If there are no LUNs, go back to the Compellent and try again. If needed, click on Discover IQNs again and retry
If you never find the target, you can log in on the VM host and try
 iscsiadm -m node -l
to log in, but only if the iSCSI initiator name has been set in <tt>/etc/iscsi/initiatorname.iscsi</tt>. Sometimes XCP does not do that, and you just need to try again.
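
If the initiator name is indeed missing, it can be checked and set by hand along these lines before retrying the SR attach (the IQN is a made-up example; normally XCP generates one itself):

 # show the current initiator name (the file may be absent or empty)
 cat /etc/iscsi/initiatorname.iscsi
 # if missing, set one by hand, then retry the SR attach / iscsiadm login
 echo "InitiatorName=iqn.2012-01.nl.nikhef.example:vms-sec-01" > /etc/iscsi/initiatorname.iscsi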

See also:
* http://forums.citrix.com/message.jspa?messageID=1436337
* http://groups.google.com/group/open-iscsi/msg/38f948648ccfe68e
  
 
== Connecting with NFS ==

Create an NFS server in the installnet (or in the iSCSI net if you really need to), and export it to all servers in the XCP pool. It should be mountable by all, and mounting will be done as root (so put "no_root_squash" in the export options). The NFS SR is a shared repository that can be used for live migration, but it will obviously not have multipath support.
NFS is also nice for providing ISO libraries for install images, and can be used effectively for importing 'external' legacy VMs that are file-based (not LVM).
For production, it is not of much use. But it's simple and effective.
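
Such an export can also be attached from the CLI; a sketch assuming the standard <tt>nfs</tt> and <tt>iso</tt> SR types (server name and export paths are placeholders):

 # a shared NFS SR for VHD disk images
 xe sr-create type=nfs content-type=user shared=true name-label="NFS scratch SR" \
    device-config:server=nfsserver.inst.ipmi.nikhef.nl device-config:serverpath=/export/xcp
 # an ISO library on the same server, for install images
 xe sr-create type=iso content-type=iso shared=true name-label="NFS ISO library" \
    device-config:location=nfsserver.inst.ipmi.nikhef.nl:/export/iso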
  
 
== About local disk ==

The recommended configuration for VM hosts will create the local disk with a file system and write sparse VDIs in VHD format to it. Copying images ''to'' local disk is slow, and VMs using local VDIs cannot be migrated. Also remember that you cannot copy VDIs between local storage.
Moving stuff back to shared storage SRs is quick and easy.

== Coalescing disk images ==

At some point, VM images with a lot of snapshots on FC or iSCSI storage may start taking up a lot of space. This can be 'coalesced' when the VM is suspended or shut down using
 xe host-call-plugin host-uuid=<host-UUID> plugin=coalesce-leaf fn=leaf-coalesce args:vm_uuid=<VM-UUID>

See:
* http://support.citrix.com/article/CTX123400

== Other considerations and hints ==

See also:
* http://www.markround.com/archives/63-Citrix-XenServer-5.6-Review.html (Citrix XenServer 5.6 Review, storage section)
  
 
= Troubleshooting =

== Moving from HVM to PV ==

See [http://support.citrix.com/article/CTX121875 http://support.citrix.com/article/CTX121875].

The page suggests manually creating an initrd file with the Xen drivers (and minus the SCSI drivers). This needs some explaining.

The normal process when installing a kernel is that the initrd is populated with whatever is relevant for the ''current'' context of the system. So a system that is not running paravirtualized will normally not install an initrd with Xen drivers. Also, the Grub menu settings are weird, containing a 'kernel' that points to a xen.gz file and 'modules' for the real kernel and initrd. These must be edited to look like ordinary entries. The default entry number must be set to 0, to use the Xen kernel at next boot.

Manually creating an initrd is a maintenance nightmare, but once the system is running paravirtualized, subsequent kernel upgrades will generate the correct initrd automatically, so this is a one-time action only.
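
On an EL5 guest this one-time action amounts to something like the following (a sketch with an example kernel version; the CTX article above gives the authoritative recipe):

 # build a Xen-capable initrd without the SCSI modules (example EL5 xen kernel version)
 mkinitrd --omit-scsi-modules --with=xennet --with=xenblk \
     /boot/initrd-2.6.18-308.el5xen-no-scsi.img 2.6.18-308.el5xen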

== Wrong VM type - or resetting the boot loader ==

=== Set the VM image type to PV ===

The HVM configuration of a VM is obtained (via the CLI) using
  xe vm-param-list uuid=
where the uuid is obtained using <tt>xe vm-list</tt>. The HVM configuration looks like:
               HVM-boot-policy ( RW): BIOS order
               HVM-boot-params (MRW): order: cn
         HVM-shadow-multiplier ( RW): 1.000
                     PV-kernel ( RW):
                    PV-ramdisk ( RW):
                       PV-args ( RW):
                PV-legacy-args ( RW):
                 PV-bootloader ( RW):
            PV-bootloader-args ( RW):

but we want to clear the HVM settings and make it boot PV via pygrub. Try
 xe vm-param-set uuid=<vm uuid> HVM-boot-policy="" ''(clear the HVM boot mode)''
 xe vm-param-set uuid=<vm uuid> PV-bootloader=pygrub ''(set pygrub as the boot loader)''
 xe vm-param-set uuid=<vm uuid> PV-args="text utf8" ''(set the display arguments)''

Followed by
 xe vm-disk-list uuid=<vm uuid> ''(this is to discover the UUID of the interface of the virtual disk)''
 xe vbd-param-set uuid=<vbd uuid> bootable=true ''(this sets the disk device as bootable)''

making sure to pick the boot device (VBD):
 Disk 0 VBD:
 uuid ( RO)            : 2cc81283-5a83-90c8-deff-68fef2af9529
     vm-name-label ( RO): rooier.nikhef.nl
         userdevice ( RW): 0

See also
* http://itproctology.blogspot.com/2009/06/pv-enable-hvm-on-xenserver.html

=== Editing the grub menu of a guest ===

 xe-edit-bootloader -n "vm-name" -p 1 -f /grub/grub.cfg

See
* http://blog.403labs.com/post/1546501840/paravirtulization-with-citrix-xenserver-5-5-and-ubuntu

=== Booting from a DVD ISO (e.g. rescue mode) ===

In XenCenter it is sometimes not possible to choose any other boot device than the Hard Disk. To change this, edit the following parameters on the command line:
 xe vm-param-set uuid=''<uuid of VM>'' HVM-boot-policy="BIOS order"
 xe vm-param-set uuid=''<uuid of VM>'' HVM-boot-params:order=dc

Now the boot parameters can be edited again in XenCenter.

== SR corruption ==

If you suffer from errors like "<tt>Error: Deleting disk '' from SR 'SRNAME' - Error in Metadata volume operation for SR.</tt>", and the log on the pool master shows things like
 [23889] 2012-07-16 12:06:49.549273 Exception getting metadata length.Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error
 [23889] 2012-07-16 12:06:49.549392 Exception getting metadata with params{'vdi_uuid': '81fac4d4-d273-4b3e-81b9-7960f615b20e'}. Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error
 [23889] 2012-07-16 12:06:49.549517 Error deleting vdi 81fac4d4-d273-4b3e-81b9-7960f615b20e from the metadata. Error: VDI delete operation failed for parameters: /dev/VG_XenStorage-790dc413-883f-f49e-a0e8-8410778145aa/MGT, 81fac4d4-d273-4b3e-81b9-7960f615b20e. Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error
 [23889] 2012-07-16 12:06:49.587649 Raising exception [181, Error in Metadata volume operation for SR. [opterr=VDI delete operation failed for parameters: /dev/VG_XenStorage-790dc413-883f-f49e-a0e8-8410778145aa/MGT, 81fac4d4-d273-4b3e-81b9-7960f615b20e. Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error]]

the MGT management volume on the SR may have become corrupted. Why this would happen is unclear, but once it happens this error will prevent you from removing nameless and label-less volumes, and appear to eat disk space from the repo (they don't, but the management volume says so).

What worked is the following:
* try removing the offending volume. The log on the pool master will show the base name of the offending VG after the lvchange -ay command line log. Use the basename only, like "/dev/VG_XenStorage-''UUID-OF-SR''/"
* with ls, look if the MGT volume is available (it should be):
  ls -l /dev/VG_XenStorage-''UUID-OF-SR''/MGT
* rescan the SR first (triggering the error)
 xe sr-scan uuid=''UUID-OF-SR''
* move the MGT volume out of the way ('forgetting' the content for a while). This ''can'' be done with the VDIs on it being attached to running VMs
 lvrename /dev/VG_XenStorage-''UUID-OF-SR''/MGT /dev/VG_XenStorage-''UUID-OF-SR''/oldMGT
* re-scan the SR, e.g. using the GUI or CLI
 xe sr-scan uuid=''UUID-OF-SR''
* rescan it again, for as long as the used space is not equal to the sum of the VDIs on it
* see if the LVs are actually gone (do this on the pool master):
 lvdisplay -C | grep ''UUID-OF-SR''
* remove stale VDIs by uuid (if they are still there) or via the GUI
* rescan it again, for as long as the used space is not equal to the sum of the VDIs on it

See also:
* http://forums.citrix.com/thread.jspa?threadID=295713&start=15&tstart=0
* http://forums.citrix.com/thread.jspa?threadID=306400
* http://support.citrix.com/article/CTX131660

== Multipathing not complete ==

In case not all paths are found (there should be four out of four for each server and each volume), you can try to rescan the devices on the host.

* '''put the host in maintenance mode'''
* find the SCSI hosts corresponding to the FC cards ("QME2572")
 cat /sys/class/scsi_host/host*/model_name
 echo /sys/class/scsi_host/host*/model_name
* rescan the bus for those hosts
 echo "- - -" > /sys/class/scsi_host/host''X''/scan
* restart the multipathd
 /etc/init.d/multipathd restart
* check pathing
 cat /proc/partitions | sort -k 3 -n
* put the host in production again
  
 
== Dead VM server ==

(from http://forums.citrix.com/thread.jspa?threadID=250603)

== Killing obnoxious VMs ==

If a VM refuses to die, you can kill it via the Dom0 on the host on which it runs. Log in to the host, and
 /opt/xensource/debug/xenops list_domains
 /opt/xensource/debug/xenops destroy_domain -domid ''<DOMID>''
 /opt/xensource/debug/xenops hard_shutdown_domain -domid ''<DOMID>'' -halt

and clean up any state afterwards if needed with
 xe vm-reset-powerstate uuid=''UUID'' --force

== Slow network throughput ==

This is not yet done - the max throughput for a plain VM is now around 1.1 Gbps, as seen earlier at SARA. And this is after having set the network to bridge mode (from openvswitch) on all Dom0s and rebooting all boxes:
 xe-switch-network-backend bridge

Then, configure
 sysctl -w net.core.rmem_max=134217728 # BDP
 sysctl -w net.core.wmem_max=134217728 # BDP
 sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728 " # _ _ BDP
 sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728 " # _ _ BDP
 sysctl -w net.core.netdev_max_backlog=300000

It also helped significantly to increase the txqueuelen on the '''vif''' interface to the VM:
 ifconfig vif3.0 txqueuelen 300000
almost doubles the network throughput. Of course the txqueuelen was already set to 300000 on the physical interfaces. There is no way to make this permanent yet, so a cronjob doing
 ifconfig | grep -P '^vif\d+\.\d+' |  awk '{system("ifconfig "$1" txqueuelen 300000")}'
may help there.

Trond Eirik Haavarstein's blog post (see below) also had this nice script to be executed on each dom0, which got another 100 Mbps:

 #! /bin/sh
 # switch off TX/RX checksum offload on all virtual and physical interfaces
 echo Setting checksum off on VIFs
 VIFLIST=`xe vif-list | grep "uuid ( RO) " | awk '{print $5}'`
 for VIF in $VIFLIST
 do
     echo Setting ethtool-tx=off and ethtool-rx=off on $VIF
     xe vif-param-set uuid=$VIF other-config:ethtool-tx="off"
     xe vif-param-set uuid=$VIF other-config:ethtool-rx="off"
 done
 echo Setting checksum off on PIFs
 PIFLIST=`xe pif-list | grep "uuid ( RO) " | awk '{print $5}'`
 for PIF in $PIFLIST
 do
     echo Setting ethtool-tx=off and ethtool-rx=off on $PIF
     xe pif-param-set uuid=$PIF other-config:ethtool-tx="off"
     xe pif-param-set uuid=$PIF other-config:ethtool-rx="off"
 done

Some references:
* http://wiki.xen.org/xenwiki/Network_Throughput_Guide.html
* http://www.xenappblog.com/2010/citrix-xenserver-slow-network-performance by Trond Eirik Haavarstein
* http://djlab.com/2011/05/dropped-vif-tx-packets-on-xenserver/ (on the txqueue length on vifs)
* http://forums.citrix.com/thread.jspa?threadID=299817
* http://forums.citrix.com/thread.jspa?threadID=298429
* http://forums.citrix.com/thread.jspa?threadID=295021

== SR meta-data corrupted ==

For example, newly created LVs have an empty string as name and description, and you see input/output errors like "DATE TIME Error: Save settings - Error in Metadata volume operation for SR."

See:
* http://virtualdesktopninja.com/VDINinja/2012/xenserver-metadata-corrupt-workaround/

== How NOT to copy your LVMs from RHEL5 ==

The default image format in an SR is based on "VHD" disk images, which are distributable across VMs and contain some (compressible) meta-data about sparseness. Overwriting an LV which was ''supposed to contain'' a VHD image with a raw disk image will make the SR corrupt. So '''if''' you do:
* create a virtual disk with the default (XenCenter) tools in an SR
* then activate the LV on an XCP server, e.g. with <tt>lvchange -a y /dev/VG_XenStorage-e5d0e83a-7e70-3d28-31ab-ed98bfb68368/VHD-74e94bbc-b0e5-4e76-b507-12897b9b2625</tt>
* copy the data from remote into this activated LV, and wait for it to complete (you '''cannot''' use the LV in a distributed setup as long as it is active on a single XCP host): <tt>dd if=/vm/mach/images/rooier.img bs=64M | ssh -A davidg@salado "ssh -A root@vms-piet-16.inst.ipmi.nikhef.nl 'dd of=/dev/VG_XenStorage-e5d0e83a-7e70-3d28-31ab-ed98bfb68368/VHD-74e94bbc-b0e5-4e76-b507-12897b9b2625 bs=64M'"</tt>
* de-activate the LV on the import host <tt>lvchange -a n /dev/VG_XenStorage-e5d0e83a-7e70-3d28-31ab-ed98bfb68368/VHD-74e94bbc-b0e5-4e76-b507-12897b9b2625</tt>
* create a new VM (type "Other" seems to be needed) and use the newly-populated disk image as the disk for the VM
* try it out ...
'''it will break the SR''' and you lose all images on it (since the MGT metadata gets corrupted).
Recover using http://support.citrix.com/article/CTX122001:
# open a console on the master XCP server
# Back up the LVM metadata
 vgcfgbackup
# Run the following command to see the LV which is causing trouble and that is causing the SR not to be scanned.
 lvscan
# Remove the clone logical volume. Note: Make sure the correct Logical Volume is deleted.
 lvremove /dev/VG_XenStorage-8d418f1a-107e-472f-5453-b24c88e7428e/VDI_8e4b4263-f9af-45f3-b97e-afa5481ea2a1
# Run the following command to scan the SR, or use XenCenter
 xe sr-scan uuid=<UUID of SR for the VM>

You may need to forget about the SR first and then re-attach it (but DO NOT FORMAT the SR on attaching ;-)

== Disaster Recovery scenarios ==

* http://support.citrix.com/servlet/KbServlet/download/17141-102-671564/XenServer_Pool_Replication_-_Disaster_Recovery.pdf (XenServer Pool Replication: Disaster Recovery, Citrix)

== Autostart and restart on boot ==

Through the CLI you can have XCP do it for you:

Find the pool uuid (from the GUI or with <tt>xe pool-list</tt>), and then
 xe pool-param-set uuid=beb34025-c4c3-020b-7144-848030179faa other-config:auto_poweron=true

and for each VM you want to autostart (find UUIDs again from the GUI or the xe vm-list command):
 xe vm-param-set uuid=[uuid-vm] other-config:auto_poweron=true

See also:
* http://www.virtues.it/2011/10/xenserver6-vm-auto-start-feature/

PS:
The "ha-always-run" option is deprecated, and XCP1.5 and XenServer 6 don't offer autostart. We can script it for hosts with an affinity, by starting all VMs whose affinity is the local host (of course apart from the control domain). This should run some time after boot, e.g. in /rc.local, or it will continue to restart servers we shut down on purpose. And then: servers that ''should not'' autostart should not have an affinity ;-) A sketch of such a script is shown below.
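
A minimal sketch of such an affinity-based autostart script (not the script actually deployed; it assumes the host name-label equals the hostname, and would run on each VM host some time after boot):

 #!/bin/sh
 # start all halted (non-control-domain) VMs whose affinity points at this host
 HOSTUUID=`xe host-list name-label=$(hostname) --minimal`
 for vm in `xe vm-list power-state=halted is-control-domain=false --minimal | tr ',' ' '`
 do
     aff=`xe vm-param-get uuid=$vm param-name=affinity 2>/dev/null`
     if [ "$aff" = "$HOSTUUID" ]; then
         echo "autostarting $vm"
         xe vm-start uuid=$vm    # the affinity makes it start on this host
     fi
 done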

== Updating XCP ==

Apart from a major upgrade by re-install, upgrading XCP using the CentOS repos may be dangerous. At least, the following RPMs ''must be excluded'' from the update:
 rpm -qa --queryformat "%{NAME}\t%{VENDOR}\n" | grep -v CentOS | awk 'BEGIN {a=""} {split($0,array,"\t"); a=a" "array[1]"*" } END {print "exclude="a"\n"}'
and the resulting exclude line added to the CentOS base Yum configuration file. In particular lvm2 must never be upgraded.

See also:
* http://www.gossamer-threads.com/lists/xen/api/245286?do=post_view_threaded

== Structure of the xsupdate file of XenServer ==

The "xsupdate" file with patches and updates that you download for XenServer is a zip file, which contains a signed set of RPMs in what is essentially a shar archive. Use this recipe to extract the RPMs from the xsupdate file, i.e. convert an xsupdate file to RPM:

* get or download the ZIP file with the patch.
* extract the ZIP file, and locate the ".xsupdate" file.
* use PGP or GPG to extract the binary contents in shar format from it:
 gpg --output shar.sh --verify XS60E013.xsupdate
* extract the binary contents from the created shar-ish archive, thereby creating in /tmp/tmp.XXXXXX the binary contents of the package
 sh shar.sh unpack
* locate the binary RPMs to be updated in /tmp/tmp.XXXXXX; these should all be installed via rpm -Uvh (use rpm -Uvh --test first)
* look at install.sh for any additional magic to be done to the nodes. Using install.sh itself will likely not work since it checks the installed software version (which for us is XCP and not XenServer)
* in var/patches/applied in the unpacked directory is a set of uuid-named files. These should be copied onto the nodes in /var/patches/applied
* restart xapi on the nodes afterwards (see the sketch after this list)
* the /var/patches/applied files will result in the "updates" field in XenCenter being updated. This window will also show the reboot-required status if the patches/applied meta-data so indicates
* reboot nodes if needed (rotating and moving VMs as needed)
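
The copy-and-restart step for each node might look like this (a sketch; node name and patch uuid are placeholders):

 # register the patch as applied on a node and restart xapi there
 scp var/patches/applied/<patch-uuid> root@<node>.inst.ipmi.nikhef.nl:/var/patches/applied/
 ssh root@<node>.inst.ipmi.nikhef.nl /etc/init.d/xapi restart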

== Connecting to the graphical console of a VM via VNC ==

When a VM becomes unreachable, but is still running, as a last resort one could try to connect to the graphical console that every machine still has. It's nicely tucked away in the innards of XenServer, but the following little script should help. It's available on pool-piet.inst.ipmi.nikhef.nl as connect-to-vm-console.sh.

 #!/bin/sh
 
 # locate the host with the graphical console for the given vm
 # find the vnc port and connect the vnc viewer
 
 vm="$1"
 
 vmuuid=`xe vm-list name-label="$1" --minimal`
 
 if [ -z "$vmuuid" ]; then
     echo "Could not find VM named $vm" >&2
     exit 1
 fi
 
 residenton=`xe vm-param-get uuid=$vmuuid param-name=resident-on`
 
 residenthost=`xe host-list uuid=$residenton params=name-label --minimal`
 
 vncport=`ssh $residenthost xn console-list $vm | awk '$1 == "RFB" { print $2 }'`
 
 if [ -z "$vncport" ]; then
     echo "Could not find VNC console for $vm" >&2
     exit 1
 fi
 
 echo "Issue the following command from your workstation:
 
 ssh -L 5901:localhost:$vncport -f -N root@$residenthost
 vncviewer localhost:5901
 "