Managing the security training sites
These are the quick notes about how to set up and run a bunch of virtual grid sites for training purposes.
Overview
The virtual machines for these sites are managed with Xen Cloud Platform (XCP) on blade 0, partition b. To manage them, log in as root@pool-bl0b.inst.ipmi.nikhef.nl or use a client tool such as XenCenter.
The sites live on VLAN 41, which is only available on bl0b. Only one host has an interface to the outside: melkstal.nikhef.nl (in the Open/Experimental network). This host is the gateway for all training participants and site administrators, and it also acts as a NAT box. Participants do not log in to melkstal itself. Instead, port forwarding is set up so that an ssh connection to a specific port on melkstal lands the user in the root account of a machine in the virtual domain, authenticated with their ssh public key.
On the inside of VLAN 41, the network addressing is divided up by virtual site. To connect, make sure your ssh key is loaded in your ssh agent; the -A option in the commands below forwards the agent to the site.
IP range | domain name | login method | details |
---|---|---|---|
10.1.0.0/16 | darknet | ssh root@melkstal.nikhef.nl | management systems and example site |
10.1.1.0/24 | frogstar | ssh -p 2201 -A root@melkstal.nikhef.nl | |
10.1.2.0/24 | traal | ssh -p 2202 -A root@melkstal.nikhef.nl | |
10.1.3.0/24 | krikkit | ssh -p 2203 -A root@melkstal.nikhef.nl | |
10.1.4.0/24 | megadodo | ssh -p 2204 -A root@melkstal.nikhef.nl | |
10.1.5.0/24 | magrathea | ssh -p 2205 -A root@melkstal.nikhef.nl | |
10.1.6.0/24 | vogsphere | ssh -p 2206 -A root@melkstal.nikhef.nl | |
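Typing the port numbers gets tedious. A minimal sketch, assuming you want ssh aliases of your own choosing (the Host names below are not configured anywhere on melkstal, and gen_ssh_config is a hypothetical helper), that prints ~/.ssh/config stanzas equivalent to the commands in the table:

```bash
# Hypothetical helper: print ~/.ssh/config stanzas for the training sites,
# derived from the table above. The Host aliases are our own choice.
gen_ssh_config() {
  local sites=(frogstar traal krikkit megadodo magrathea vogsphere)
  local i
  # darknet: plain ssh on the default port, no agent forwarding (as in the table)
  printf 'Host darknet\n    HostName melkstal.nikhef.nl\n    User root\n\n'
  for i in "${!sites[@]}"; do
    # the Nth training site (counting from 1) is forwarded from port 2200+N
    printf 'Host %s\n    HostName melkstal.nikhef.nl\n    Port %d\n    User root\n    ForwardAgent yes\n\n' \
      "${sites[$i]}" $((2201 + i))
  done
}
```

For example, after gen_ssh_config >> ~/.ssh/config the command ssh frogstar does the same as ssh -p 2201 -A root@melkstal.nikhef.nl.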
Each site runs a number of machines to represent what is typical for a Grid site:
machine name | machine type | metapackage |
---|---|---|
ui | User interface | emi-ui |
wms | Workload management system and site BDII (currently not functioning) | emi-wms, emi-lb |
ce | CREAM Compute Element | emi-cream-ce |
headnode | batch system head node (HTCondor) | condor |
wn | Worker node | emi-wn |
jeep | general purpose machine | none |
There is one management host that helps install and configure all other machines: cobbler.darknet. It runs Cobbler, which provides DHCP, DNS and kickstart files for installing systems, and SaltStack, which manages the state of each system.
Installing and re-installing machines
Installation of new machines is done on the XCP master node. Log in as
root@pool-bl0b.inst.ipmi.nikhef.nl
The home directory contains a script that creates basic machine definitions from a template and gives each machine a new network interface with a generated MAC address.
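The script itself is not reproduced here. As an illustration only, a sketch of how such a MAC address could be generated: random bytes, with the locally administered bit set and the multicast bit cleared in the first octet. The example address 22:05:e5:52:19:cc used below has the same property. The function name gen_mac is hypothetical.

```bash
# Hypothetical helper (not the script in root's home directory):
# print a random, locally administered, unicast MAC address.
gen_mac() {
  local -a b
  # six random bytes as two-digit hex values
  read -r -a b < <(od -An -N6 -tx1 /dev/urandom)
  # first octet: set the locally administered bit (0x02),
  # clear the multicast bit (0x01)
  local first=$(( (16#${b[0]} | 0x02) & 0xfe ))
  printf '%02x:%s:%s:%s:%s:%s\n' "$first" "${b[1]}" "${b[2]}" "${b[3]}" "${b[4]}" "${b[5]}"
}
```

A value like this could be passed as the mac= parameter of xe vif-create; the actual script may do this differently.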
Next, take the list of machines and their MAC addresses (extracted with another script) to cobbler.darknet and define the systems in cobbler with the cobbler-add-machine.sh script, which reads "MAC hostname" pairs on standard input. For example:
echo 22:05:e5:52:19:cc wms.darknet | ./cobbler-add-machine.sh
Right now, the script only adds machines consecutively to the darknet site.
It is also possible (but more tedious) to add machines via cobbler's web interface.
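The internals of cobbler-add-machine.sh are not shown on this page. A dry-run sketch, assuming the same "MAC hostname" input, of how such lines could be turned into cobbler commands; the profile name is a placeholder you must supply, cobbler_add_cmds is a hypothetical name, and the consecutive darknet address assignment of the real script is left out:

```bash
# Sketch, NOT the real cobbler-add-machine.sh: read "MAC hostname" lines on
# stdin and print the cobbler commands instead of running them.
# $1 is a placeholder cobbler profile name.
cobbler_add_cmds() {
  local profile=$1 mac host
  if [ -z "$profile" ]; then
    echo "usage: cobbler_add_cmds PROFILE < list" >&2
    return 1
  fi
  while read -r mac host; do
    [ -z "$mac" ] && continue                  # skip blank lines
    if ! [[ $mac =~ ^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$ ]] || [ -z "$host" ]; then
      echo "skipping malformed line: $mac $host" >&2
      continue
    fi
    echo "cobbler system add --name=$host --profile=$profile --mac=$mac --hostname=$host"
  done
  # regenerate DHCP/DNS configuration once all systems are defined
  echo "cobbler sync"
}
```

Review the printed commands before piping them to sh.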
Once the machines are defined in cobbler it is time to start them. This is again done on the XCP node with the command
xe vm-start vm=wms.darknet
This will install a basic system, using cobbler for DHCP and for downloading the kickstart file.
As part of the basic installation, the package salt-minion will be installed with cobbler.darknet as the master. Once the installation is done, accept the key on cobbler with
salt-key -a <hostname>
Once the link between master and minion is established, run the highstate on the minion to apply all the state modules defined for it.
cobbler# salt 'machine.domain' state.highstate
Be patient as this command can take a long time to complete.
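The steps for a new installation can be strung together. A minimal sketch, assuming it runs as root on cobbler.darknet, that root there can ssh to the XCP master (not stated on this page), and that the salt master keeps pending keys in the default /etc/salt/pki/master/minions_pre directory. With DRY_RUN=1 the commands are only printed; maybe_run and provision are hypothetical names.

```bash
# Hypothetical helper: print the command when DRY_RUN is set, else run it.
maybe_run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch of the new-installation steps for one machine (assumptions above).
provision() {
  local host=$1
  # 1. boot the VM on the XCP master; it installs via cobbler DHCP + kickstart
  maybe_run ssh root@pool-bl0b.inst.ipmi.nikhef.nl xe vm-start vm="$host"
  # 2. wait until the new salt-minion has submitted its key (pending keys
  #    live in minions_pre); skipped in dry-run mode
  if [ -z "${DRY_RUN:-}" ]; then
    until [ -e "/etc/salt/pki/master/minions_pre/$host" ]; do
      sleep 60
    done
  fi
  # 3. accept the key without prompting (-y), then apply all state modules
  maybe_run salt-key -y -a "$host"
  maybe_run salt "$host" state.highstate
}
```

For example, DRY_RUN=1 provision wms.darknet shows what would be run. Do not use it for re-installations: the old key is kept, so no pending key ever appears and the wait loop would not end.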
re-installation
Sometimes it is necessary to start from scratch with a machine. This is easier than a new installation, because the definition already exists in XCP and cobbler; only the bootloader needs to be reset. The script Media:Vm-reinstall1.sh does just that. It also reboots the machine, so the installation starts right away.
After the re-installation it is still necessary to run state.highstate to bring the machine back into proper working condition (this also installs the ssh public keys; without them, logging in becomes quite difficult!).
cobbler# salt 'machine.domain' state.highstate
The previous ssh and salt keys are preserved during the installation, so it should not be necessary to re-initialize them. If the minion does not reconnect, check the following:
- Does the minion ping from the salt master?
cobbler# salt 'ce.vogsphere' test.ping
- If so, everything is ok.
- If not, check the logfile of the master to see whether the minion presented a different key:
/var/log/salt/master
- Log on to the console of the VM:
pool-bl0b# xe console name-label=ce.vogsphere
- Restart the minion and check the minion logfile. If the key is rejected because it has changed, the minion will stop again.
- In that case, delete the old minion key on cobbler:
salt-key -d ce.vogsphere
- Then restart the minion once more and accept its new key on cobbler:
salt-key -a ce.vogsphere
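The log check in the list above can be scripted. A sketch, assuming (check this against your Salt version) that the master logs a line containing "public keys did not match" when a minion presents a changed key; key_mismatch is a hypothetical name.

```bash
# Sketch: does the salt master log report a key mismatch for this minion?
# ASSUMPTION: the master logs "... public keys did not match ..." for such
# minions; verify the exact wording for the installed Salt version.
key_mismatch() {
  local minion=$1 log=${2:-/var/log/salt/master}
  # -F: treat the minion id literally (its dots are not regex wildcards)
  grep -F "$minion" "$log" | grep -q "public keys did not match"
}
```

For example: if key_mismatch ce.vogsphere; then salt-key -y -d ce.vogsphere; fi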
More troubleshooting
A machine may not be entirely configured after running the highstate. This is especially likely for the CEs, which run CREAM. A subsequent run may not invoke YAIM again; to force it, remove
/etc/siteinfo/site-info.def
so that it is recreated by the next highstate run, which then triggers YAIM.
Since not all dependencies between the state items have been worked out yet, it may be necessary to run highstate a few times. This can also be done from the minion itself:
salt-call state.highstate
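Because several runs may be needed, a small retry wrapper can help. A sketch with a hypothetical retry function; it relies on salt-call's --retcode-passthrough option (available in Salt versions that support it) to make salt-call exit non-zero when a state fails.

```bash
# Sketch: run a command until it succeeds, at most $1 times.
retry() {
  local max=$1 n=1
  shift
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    echo "attempt $n of $max: $*" >&2
  done
}

# usage on a minion:
#   retry 3 salt-call state.highstate --retcode-passthrough
```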
Darknet CA
The test sites need certificates, so a simple local CA is set up on cobbler.darknet in /srv/ca.
New certificates can be generated with the gen-host-cert.sh script. This automatically places the cert and key in /srv/salt/host_keys.
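gen-host-cert.sh is not reproduced here. To show what such a script generally involves, a sketch of a minimal CA signing a host certificate with plain openssl; the subjects, key size and lifetime are placeholders, and make_ca and gen_host_cert are hypothetical names.

```bash
# Sketch only, NOT gen-host-cert.sh: a minimal CA with plain openssl.
make_ca() {
  local dir=$1
  mkdir -p "$dir"
  # self-signed CA certificate with an unencrypted key (placeholder subject)
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/O=Example/CN=Example Training CA" \
    -keyout "$dir/ca.key" -out "$dir/ca.pem" 2>/dev/null
}

gen_host_cert() {
  local ca=$1 host=$2 out=$3
  mkdir -p "$out"
  # new host key and certificate signing request
  openssl req -new -newkey rsa:2048 -nodes -subj "/O=Example/CN=$host" \
    -keyout "$out/$host.key" -out "$out/$host.csr" 2>/dev/null || return 1
  # sign the request with the CA; -CAcreateserial creates the serial file
  openssl x509 -req -in "$out/$host.csr" -CA "$ca/ca.pem" -CAkey "$ca/ca.key" \
    -CAcreateserial -days 365 -out "$out/$host.pem" 2>/dev/null || return 1
  rm -f "$out/$host.csr"
  chmod 600 "$out/$host.key"
}
```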
Issuing CRLs can be done by calling
./make-crl /var/www/html/7140638d.r0
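The CRL file name follows the grid convention of "<hash>.r0", where the hash is OpenSSL's hash of the CA subject name. A sketch with a hypothetical crl_name function; whether 7140638d is the current (-subject_hash) or the old-style (-subject_hash_old) hash of the darknet CA is not stated here, so check both.

```bash
# Sketch: "<hash>.r0" name for a CA's CRL, derived from the CA certificate.
crl_name() {
  local ca_cert=$1 hash
  hash=$(openssl x509 -in "$ca_cert" -noout -subject_hash) || return 1
  echo "$hash.r0"
}
```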
Saltstack setup
After systems are installed with Cobbler, saltstack takes over. The machine cobbler.darknet is the salt master. Test the connection to the minions by running a ping test.
salt '*' test.ping
All machines should report in. If not, log in to the machine and check the salt minion log
/var/log/salt/minion
The saltstack description of the system is kept in a collection of YAML files under /srv, in two trees:
/srv/pillar
/srv/salt
The pillar contains only static data. The salt tree contains references to the various machine types and modules, as well as files that are copied to the minions, such as ssh keys.
git repository
The salt tree is maintained with git. Clone it with
git clone ssh://root@cobbler.darknet/root/salt
and check out the 'useyaim' branch. Whenever this branch gets pushed, the repository under /srv will be automatically updated (via a post-receive hook).
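The actual post-receive hook on cobbler is not shown here. A sketch of one common way such a hook works, assuming /root/salt is a bare repository and /srv is used as its work tree (both are parameters here); deploy_pushed_branch is a hypothetical name.

```bash
# Sketch of a post-receive hook: check out the pushed branch into a work tree.
deploy_pushed_branch() {
  local git_dir=$1 work_tree=$2 branch=${3:-useyaim}
  local oldrev newrev ref
  # git feeds one "oldrev newrev refname" line per updated ref on stdin
  while read -r oldrev newrev ref; do
    if [ "$ref" = "refs/heads/$branch" ]; then
      git --git-dir="$git_dir" --work-tree="$work_tree" checkout -q -f "$branch"
    fi
  done
}

# in hooks/post-receive (assumed paths):  deploy_pushed_branch /root/salt /srv
```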
Adding users
Adding administrators for one of the sites mentioned above is done by adding their ssh public keys to the corresponding file /srv/salt/ssh_keys/<domain>.pub and running the salt state command:
salt '*.<domain>' state.highstate
Adding ordinary users for a site is done by adding their details in the pillar, much like the examples already given in /srv/pillar/users/frogstar.sls:
users:
  dent:
    fullname: Arthur Dent
    shell: /bin/bash
    home: /home/dent
    uid: 604
    groups:
      - users
Take care to keep the uids unique. After adding the user, put the ssh public key in /srv/salt/ssh_keys/dent.pub. Then running
salt '*.frogstar' state.highstate
will update all frogstar machines to include the new user.
All these users will be present on all of the training sites.
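Since uids must stay unique, a quick check before running highstate can help. A sketch with a hypothetical dup_uids function, assuming the layout of the example above ("uid: 604" on its own line); a real YAML parser would be more robust.

```bash
# Sketch: print uid values that occur more than once in the user pillars.
dup_uids() {
  local dir=${1:-/srv/pillar/users}
  # collect every "uid: N" line, keep N, and report repeated values
  grep -h -E '^[[:space:]]*uid:' "$dir"/*.sls | awk '{print $2}' | sort | uniq -d
}
```

No output means no duplicates were found.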
list of current state files
Warning: this list is always outdated.
file | description |
---|---|
pillar/top.sls | sets environments (just 'base') |
pillar/versions.sls | selects tomcat5 or tomcat6, based on OS |
pillar/vos.sls | static list of VOs to support, including pool account settings and vomses |
pillar/users/darknet.sls | (admin) users in the darknet domain |
pillar/users/frogstar.sls | (admin) users in the frogstar domain |
salt/top.sls | top file, defines what's what |
salt/condor/init.sls | settings common to all types of condor machines |
salt/condor/worker.sls | settings for a condor worker node (WN) |
salt/condor/submitter.sls | settings for a condor submitter (CE) |
salt/condor/manager.sls | settings for the condor manager (LRMS) |
salt/condor/config.local/manager | the literal configuration files |
salt/condor/config.local/worker | |
salt/condor/config.local/submitter | |
salt/users/frogstar.sls | adds the ssh keys for the users to root's authorized_keys |
salt/users/init.sls | creates users according to the pillar |
salt/users/poolaccounts.sls | creates pool accounts according to the vos pillar |
salt/grid/bdii/init.sls | setup of a (top level) BDII |
salt/grid/yaim/init.sls | downloads a templated version of site-info.def |
salt/grid/yaim/siteinfo/users.conf | the siteinfo files |
salt/grid/yaim/siteinfo/wn-list.conf | |
salt/grid/yaim/siteinfo/vo.d | |
salt/grid/yaim/siteinfo/vo.d/tutor | |
salt/grid/yaim/siteinfo/vo.d/pvier | |
salt/grid/yaim/siteinfo/vo.d/dteam | |
salt/grid/yaim/siteinfo/groups.conf | |
salt/grid/yaim/siteinfo/site-info.def | |
salt/grid/repositories/emi3.sls | yum repositories for EMI-3 (for WMS) |
salt/grid/repositories/init.sls | yum repositories for UMD-3 |
salt/grid/wn/init.sls | setup of WN |
salt/grid/gridmapdir.sls | populates the gridmapdir according to pillar data |
salt/grid/trustanchors.sls | yum repositories for the certificate (trust anchor) packages, plus extra CAs |
salt/grid/ui/init.sls | setup of UI |
salt/grid/ui/vo-glite_wms.conf | templated files for the UI |
salt/grid/ui/vo-glite_wmsui.conf | |
salt/grid/ui/glite_wmsui_cmd_var.conf | |
salt/grid/cream/init.sls | CREAM setup (with condor batch system) |
salt/grid/cream/blah.config | templated version of blah.config |
salt/grid/hostcert/init.sls | installs host certificates |
salt/grid/wms/init.sls | configures a WMS |
salt/network-gateway/init.sls | just for melkstal |
salt/sysctl.sls | adds sysctl settings to improve ZMQ performance |
salt/ca | CA files for the DARKNET CA and the e-infra tutor CA |
salt/timezone.sls | sets the timezone to wherever |
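Because the list above goes stale, it may be easier to regenerate it from the live trees. A sketch with a hypothetical list_state_files function, assuming the pillar and salt trees live directly under /srv as described earlier:

```bash
# Sketch: list all files in the pillar and salt trees, relative to /srv.
list_state_files() {
  local root=${1:-/srv}
  # skip any git metadata that may live inside the trees
  (cd "$root" && find pillar salt -type f ! -path '*/.git/*' | sort)
}
```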