
From PDP/Grid Wiki
Revision as of 22:26, 5 July 2011

Diagram of the agile test bed

Agile testbed

The state of the testbed is going to change, as we are planning to integrate several more machines and change the overall setup of systems and services. See Testbed_Update_Plan.

The Agile testbed is a setup of easy-come easy-go virtual machines for quickly trying out new and experimental software.

It is used in the context of the P4 activity of the VL-e project, the SA3 activity in the EGEE III project and Application support in the BiG Grid project.

The test bed is hosted at the Nikhef data processing facility, and managed by Dennis van Dok, Jan Just Keijser, Mischa Sallé and Willem van Engen.

A new setup using OpenNebula is work in progress.


Hardware

The testbed currently consists of four physical machines: bleek, toom, kudde and span.

name   type                   #cores  mem   OS          disk                                          remarks
bleek  Intel 5150 @ 2.66GHz   4       8GB   CentOS4-64  Software raid1 2×500GB disks                  High Availability, dual power supply
toom   Intel E5440 @ 2.83GHz  8       16GB  CentOS5-64  Hardware raid1 2×715GB disks
kudde  Intel E5440 @ 2.83GHz  8       16GB  CentOS5-64  Hardware raid1 2×715GB disks
span   Intel E5440 @ 2.83GHz  8       24GB  CentOS5-64  Hardware raid10 on 4×470GB disks (950GB net)  DHCP,DNS,NFS,LDAP


Network

The network between these machines is somewhat unusual: they all live in the same VLAN (194.171.96.16/28), but each also has an extra alias interface in the 10.198.0.0/16 range. Xen DomUs with addresses in that range get connectivity to the other DomUs in the same VLAN without NAT, and connectivity to the outside through SNAT. Here is an example of the iptables POSTROUTING chain on span:

Chain POSTROUTING (policy ACCEPT 58M packets, 3693M bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  any    any     10.198.0.0/16        194.171.96.16/28    
  436 63986 ACCEPT     all  --  any    any     10.198.0.0/16        10.198.0.0/16       
    1   190 SNAT       all  --  any    any     10.198.0.0/16        anywhere            to:194.171.96.28

So all traffic from a DomU on span will appear to have come from span to the outside.

Note that DomUs that have interfaces in the public address range do not need SNAT at all; they simply connect to the host's Xen bridge.

There is a separate network attached to each machine to allow IPMI management and Serial-over-LAN.
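
The POSTROUTING listing above could be produced by rules along the following lines. This is only a sketch reconstructed from the listing, not the actual configuration on span; it assumes the rules live in the nat table (where SNAT is valid), and the variable and function names are made up for illustration. The script only prints the commands, so it is safe to run anywhere.

```shell
#!/bin/sh
# Sketch: print iptables commands that would recreate the POSTROUTING
# chain shown above. Nothing is applied by this script.

PRIVATE_NET="10.198.0.0/16"     # DomU range, from the text
PUBLIC_VLAN="194.171.96.16/28"  # shared VLAN, from the text
SNAT_ADDR="194.171.96.28"       # address in the "to:" field of the listing

print_nat_rules() {
    # Traffic to the public VLAN and between private DomUs is not translated.
    echo "iptables -t nat -A POSTROUTING -s $PRIVATE_NET -d $PUBLIC_VLAN -j ACCEPT"
    echo "iptables -t nat -A POSTROUTING -s $PRIVATE_NET -d $PRIVATE_NET -j ACCEPT"
    # Everything else from the private range leaves with the SNAT address.
    echo "iptables -t nat -A POSTROUTING -s $PRIVATE_NET -j SNAT --to-source $SNAT_ADDR"
}

print_nat_rules
```

The order matters: the two ACCEPT rules must come before the SNAT rule, otherwise traffic inside the testbed would be rewritten as well.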

Software Installation

The central machine in the network is span. It runs:

  • dnsmasq for DNS and DHCP based on /etc/hosts and /etc/ethers
  • NFS server for the home directories and ssh and pem host keys

The other Xen machines, toom and kudde, run Xen 3.1. On these machines the creation and destruction of virtual machines is best left to the generate-machine and destroy-machine scripts, part of the nl.vl-e.poc.ctb.mktestbed software package.


Operational procedures

The testbed is not too tightly managed, but here's an attempt to keep our sanity.

Logging of changes

All changes need to be communicated by e-mail to CTB-changelog@nikhef.nl.

(This replaces the earlier CTB Changelog.)

Adding a new machine

  • edit
/etc/hosts
/etc/ethers

to add the new machine and its hardware address.

  • Restart dnsmasq
/etc/init.d/dnsmasq restart
  • on span.nikhef.nl, run
/usr/local/bin/keygen <hostname>

to pre-generate ssh keys.

  • on span, run
/var/local/hostkeys/generate-knownhosts.sh
  • on all machines, do
cp /var/local/hostkeys/ssh_known_hosts /etc/ssh/ssh_known_hosts
  • (optional) generate or request an X509 host certificate. For local machines in the .testbed domain, Dutchgrid certificates won't be issued, but a testbed-wide CA is in use; ask Dennis. The certificate and key are stored in
/var/local/hostkeys/pem/<hostname>/hostcert.pem
/var/local/hostkeys/pem/<hostname>/hostkey.pem
  • place a 'firstboot' script on span in
/var/local/xen/firstboot/<hostname>

(It will be downloaded and run on the first boot after installation of the machine.)

  • run generate-machine on the Dom0 of choice.
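
The first step of the list above (adding the machine to /etc/hosts and /etc/ethers) can be sketched as a small shell helper. This is an illustration only: the function name add_host_entries and the example values in the comment are hypothetical, and the file paths are passed as arguments so the helper can be tried on scratch copies first.

```shell
#!/bin/sh
# add_host_entries HOSTS_FILE ETHERS_FILE IP MAC NAME
# Appends "IP NAME" to the hosts file and "MAC NAME" to the ethers file.
# Refuses to add a name that is already listed in the hosts file.
add_host_entries() {
    hosts_file=$1; ethers_file=$2; ip=$3; mac=$4; name=$5

    # Look for NAME as an exact hostname field (fields 2..NF) in the hosts file.
    if awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                         END { exit !found }' "$hosts_file" 2>/dev/null; then
        echo "$name is already present in $hosts_file" >&2
        return 1
    fi

    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    printf '%s\t%s\n' "$mac" "$name" >> "$ethers_file"
}

# Hypothetical usage on span, followed by the dnsmasq restart from the list:
#   add_host_entries /etc/hosts /etc/ethers <ip> <mac> <hostname> \
#       && /etc/init.d/dnsmasq restart
```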

Automatic configuration of machines

The default kickstart scripts for testbed VMs will download a 'firstboot' script at the end of the install cycle, based on the name they've been given by DHCP. Look in span.nikhef.nl:/usr/local/mktestbed/firstboot for the files that are used, but be aware that these are managed with git (gitosis on span).

Configuration of LDAP authentication

Fedora Core 14

The machine fc14.testbed is configured for LDAP authentication against ldap.nikhef.nl. Some notes:

  • /etc/nslcd.conf:
uri ldaps://ldap.nikhef.nl ldaps://hooimijt.nikhef.nl
base dc=farmnet,dc=nikhef,dc=nl
ssl on
tls_cacertdir /etc/openldap/cacerts
  • /etc/openldap/cacerts is symlinked to /etc/grid-security/certificates.

Debian 'Squeeze'

Debian is a bit different: the nslcd daemon is linked against GnuTLS instead of OpenSSL. Apparently due to a bug, one cannot simply point it to a directory of certificates. Debian provides a script to collect all the certificates in one big file. Here is the short procedure:

mkdir /usr/share/ca-certificates/igtf
for i in /etc/grid-security/certificates/*.0 ; do ln -s "$i" "/usr/share/ca-certificates/igtf/$(basename "$i").crt"; done
update-ca-certificates

The resulting file can be used in /etc/nslcd.conf:

uid nslcd
gid nslcd
base dc=farmnet,dc=nikhef,dc=nl
ldap_version 3
ssl on
uri ldaps://ldap.nikhef.nl ldaps://hooimijt.nikhef.nl
tls_cacertfile /etc/ssl/certs/ca-certificates.crt
timelimit 120
bind_timelimit 120
nss_initgroups_ignoreusers root
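
A quick way to see whether name lookups work after such a configuration is to query the name service switch with getent. The sketch below is an assumption-laden check, not part of the documented setup: it presumes that /etc/nsswitch.conf lists ldap for passwd (not shown on this page), and check_nss_user is a hypothetical helper name.

```shell
#!/bin/sh
# check_nss_user USER
# Succeeds if the name service switch (files, ldap, ...) can resolve USER.
check_nss_user() {
    if getent passwd "$1" >/dev/null; then
        echo "ok: $1 resolves"
    else
        echo "FAIL: $1 not found" >&2
        return 1
    fi
}

# Usage: pick an account that exists only in LDAP, not in /etc/passwd, e.g.
#   check_nss_user <ldap-username>
```

A local account such as root resolves from /etc/passwd regardless of LDAP, so only an LDAP-only account says anything about nslcd.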

Still TODO: the pam_ldap stuff...it doesn't work.