Xen 3.2, CentOS 5.1 and NAT HOWTO


This HOWTO documents exploratory work on using Xen to manage ad-hoc virtual clusters. It brings together two completely independent ideas:

  • a way to automate the handling of IP addresses, DHCP and DNS in a NAT setup
  • a method of generating paravirtualized machines in a dynamic way

Neither of these was entirely straightforward.

An additional (soft) constraint was to stay as close as possible to a supportable configuration (aka the Little Red Riding Hood Constraint).

Xen 3.2

The plan was to use paravirtualized CentOS >= 4 as DomUs, in both i386 and x86_64 flavours. If other operating systems are to be used, in particular anything non-Linux, full virtualization is required. This possibility has not been investigated here.

The base system (Dom0) originally was CentOS 5.1, which ships with Xen 3.1; however, the first trials showed that the combination of an x86_64 Dom0 and an i386 DomU led to kernel panics in the DomUs during the installation. While the root cause for this was not discovered, an upgrade to Xen 3.2 solved the problem.

Update: Red Hat released a patch that may fix this issue.

Virtual machines

A virtual machine description is a snippet of Python code that describes the components of the machine. Here is an example:

name = "ce"
bootloader = "/usr/bin/pygrub"
vcpus=1
on_reboot = 'restart'
on_crash = 'destroy'
memory = "1024"
disk = [ 'phy:VolGroup00/ce,xvda,w' ]
vif = [ 'mac=00:16:3E:C2:AA:77' ]
name
Each VM should have a unique name, for the sake of Xen's administration.
bootloader
An alternative to handing Xen a straight kernel/initrd to boot is to use a bootloader script that does something smart. For instance, the pypxeboot bootloader mimics the pxeboot behaviour by asking the DHCP server which kernel to download, while the pygrub bootloader (included in Xen) behaves much like GRUB. This mimicry is necessary because the machines, being paravirtualized, do not have a real BIOS underneath.
vcpus
The number of CPU cores to assign to the machine.
on_reboot/crash
What to do in certain situations. Although 'destroy' sounds rather spectacular, it is only the internal machine description that is dropped, including the memory contents; the disk contents persist, so it is more like a power-down.
memory
The amount of RAM (in MB) to assign to the machine.
disk
The assignment of block devices; in this case a mapping is made from a physical block device to an internal name, with write access (w). The alternative is to map (image) files, which has a little more overhead because of the file system.
vif
The virtual network devices. The number of options here is endless, but a 'real' machine only cares about its MAC address, and that is all we need. Some care goes into keeping these unique. The first three bytes are fixed; this is the IEEE Organizationally Unique Identifier (OUI) handed out to Xensource, Inc.

The Automated generation of Xen VMs takes care of most of the hassle of creating these descriptions.
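
Keeping the MAC addresses unique is easy to script. The following is a minimal Python sketch, assuming uniqueness is checked against /etc/ethers; the function name and the use of the random module are illustrative and not part of any existing tooling.

import random

def new_xen_mac(ethers="/etc/ethers"):
    """Return a random MAC in the 00:16:3E (XenSource OUI) range
    that does not yet appear in /etc/ethers."""
    try:
        taken = set(line.split()[0].upper() for line in open(ethers) if line.strip())
    except IOError:
        taken = set()          # no /etc/ethers yet, nothing is taken
    while True:
        mac = "00:16:3E:%02X:%02X:%02X" % tuple(random.randint(0, 255) for _ in range(3))
        if mac not in taken:
            return mac

print(new_xen_mac())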

NAT

The default configuration for networking with Xen DomUs is bridging the network interface to the Dom0. This will make a DomU network interface appear on the LAN just like any other machine.

The following text explains how you can change the configuration to masquerade all the network interfaces of the DomUs behind the Dom0. This will turn the Dom0 into a NAT box, which may be useful if you want to build a cluster that doesn't expose itself to the network as much.

The IP addresses behind the NAT box are assigned dynamically with DHCP from the private 10.x.x.x range. A lightweight DHCP/DNS server called dnsmasq is used to manage the assignments. Configuration files for dnsmasq can be placed in /etc/dnsmasq.d/ and must have the extension .conf.

This is the full text of /etc/dnsmasq.d/xencluster.conf.

dhcp-range=testbed,10.0.0.1,static,255.0.0.0,infinite
read-ethers
leasefile-ro
except-interface=eth0

Here's the configuration explained.

dhcp-range
every configuration must have at least one range of IP addresses to hand out, even if only static assignments are being done. The 'static' keyword replaces the end address of the range and means that only static addresses will be given. The lease time is infinite.
read-ethers
this directive tells dnsmasq to read the file /etc/ethers to find out which IP addresses belong to which MAC addresses. This file will play a role later on.
leasefile-ro
don't bother keeping a lease database; useful when only static assignments are handed out.
except-interface
tells dnsmasq to ignore DHCP requests originating from the physical interface of the machine (so as not to conflict with other DHCP servers on the net).

With this configuration dnsmasq will respond to DHCP requests coming from the virtual interfaces, by handing out addresses as listed in /etc/ethers. Here's a snippet of what this file looks like.

00:16:3E:0D:B0:96 10.0.5.1
00:16:3E:96:71:4E 10.0.10.1
00:16:3E:CF:26:36 10.0.1.1
00:16:3E:0C:65:7F 10.0.11.1
00:16:3E:96:CD:76 10.0.3.1
00:16:3E:19:39:B9 10.0.12.1
00:16:3E:C2:AA:77 10.0.2.1
00:16:3E:48:88:75 10.0.6.1

It is up to the manager of the virtual machines to assign unique MAC addresses. Looking at the example description above, you can see that the machine called 'ce' will get IP address 10.0.2.1.

Dnsmasq also acts as a caching DNS server, and in particular it looks at /etc/hosts to resolve hostnames on its local network. So a line in /etc/hosts like

10.0.2.1 ce.testbed ce

will make the 'ce' virtual machine find its own DNS name ce.testbed.

Automating hosts and ethers

Keeping the VM descriptions, /etc/ethers and /etc/hosts in sync is an arduous task. Thankfully this can be fully automated with some scripting.

Network interfaces in Xen are generated on the fly. The line

vif = [ 'mac=00:16:3E:C2:AA:77' ]

will make eth0 appear with said MAC address. In the Dom0, a virtual interface is created as its counterpart. The two are logically connected by an imaginary crossover cable.

On the DomU side, you get

eth0      Link encap:Ethernet  HWaddr 00:16:3E:C2:AA:77  
          inet addr:10.0.2.1  Bcast:10.255.255.255  Mask:255.0.0.0

And in the Dom0, you see

vif5.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          inet addr:10.0.2.128  Bcast:0.0.0.0  Mask:255.255.255.255

Two scripts handle the management of these interfaces. One is the network-script, which is run just once when the xend daemon starts; the other is the vif-script, which is called each time a virtual interface needs to be created or torn down. The file names of the scripts to use are defined in the Xen daemon configuration file, /etc/xen/xend-config.sxp.

Install the new scripts network-nat-dns and vif-nat-dns, and the replacement vif-common.sh, in /etc/xen/scripts/.

Edit /etc/xen/xend-config.sxp and replace the lines

(network-script network-bridge)
(vif-script vif-bridge)

by

(network-script network-nat-dns)
(vif-script     vif-nat-dns)

and restart the xend daemon

/etc/init.d/xend restart

What vif-nat-dns does

In short, here's what the magic vif script does when an interface is brought up.

  1. derive the hostname for the interface from the virtual machine
  2. find the hostname and IP address in /etc/hosts (if it is not already there, think up a new IP address and add a line to /etc/hosts; this is done only once ever for any machine)
  3. write the mac address and IP address to /etc/ethers
  4. kick the dnsmasq daemon to force re-reading hosts and ethers
  5. bring up the virtual interface with the proper addressing and routing

When it's brought down again, the following steps ensue:

  1. bring down the virtual interface
  2. remove the MAC/IP entry from /etc/ethers (but not from /etc/hosts!)
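
The actual vif-nat-dns is a hook script installed in /etc/xen/scripts/; the Python sketch below merely illustrates the bookkeeping of steps 2-4 above, assuming the hostname has already been derived. The domain name, file locations and the dnsmasq pid file path are assumptions made for the sake of the example.

import os
import signal

HOSTS = "/etc/hosts"
ETHERS = "/etc/ethers"
DOMAIN = "testbed"                     # assumed local domain
DNSMASQ_PID = "/var/run/dnsmasq.pid"   # location may differ

def lookup_ip(hostname):
    """Return the IP already recorded for hostname in /etc/hosts, or None."""
    fqdn = "%s.%s" % (hostname, DOMAIN)
    for line in open(HOSTS):
        fields = line.split()
        if len(fields) > 1 and fqdn in fields[1:]:
            return fields[0]
    return None

def next_free_ip():
    """Pick the next unused 10.x1.x2.1 address; the machine number is the
    two middle octets, the interface number the last one (see the
    addressing scheme below)."""
    used = set()
    for line in open(HOSTS):
        fields = line.split()
        if fields and fields[0].startswith("10."):
            octets = fields[0].split(".")
            used.add(int(octets[1]) * 256 + int(octets[2]))
    n = 1
    while n in used:
        n += 1
    return "10.%d.%d.1" % (n // 256, n % 256)

def register(hostname, mac):
    """Steps 2-4: make sure /etc/hosts and /etc/ethers know this machine,
    then kick dnsmasq so it re-reads both files."""
    ip = lookup_ip(hostname)
    if ip is None:                     # first time this machine is seen
        ip = next_free_ip()
        open(HOSTS, "a").write("%s %s.%s %s\n" % (ip, hostname, DOMAIN, hostname))
    open(ETHERS, "a").write("%s %s\n" % (mac, ip))
    os.kill(int(open(DNSMASQ_PID).read()), signal.SIGHUP)
    return ip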

Addressing scheme

The IP range is divided as follows:

10.x1.x2.y

where x1 and x2 are taken together as a two-byte number to uniquely identify the virtual machine, and y is the interface number of that machine. Values 0 < y < 128 are used on the VM side; their virtual counterparts in the Dom0 use the same address with the 8th (high) bit of the last octet set. This scheme allows an addressable range of 65536 machines, with 127 possible interfaces per machine.
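
As an illustration, here is a literal reading of the scheme in Python; the exact offset the real scripts use for the Dom0 side may differ slightly (compare the vif5.0 example above).

def vm_addresses(machine, iface):
    """Return (DomU address, Dom0 counterpart) for machine number 0-65535
    and interface number 1-127, reading the scheme above literally."""
    x1, x2 = machine // 256, machine % 256
    domu = "10.%d.%d.%d" % (x1, x2, iface)
    dom0 = "10.%d.%d.%d" % (x1, x2, iface | 0x80)   # same, with the 8th bit set
    return domu, dom0

print(vm_addresses(2, 1))   # ('10.0.2.1', '10.0.2.129') -- the 'ce' machine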