Managing the security training sites

From PDP/Grid Wiki
Revision as of 19:16, 18 May 2014 by Msalle@nikhef.nl (talk | contribs) (Fix XCP hostname)

These are the quick notes about how to set up and run a bunch of virtual grid sites for training purposes.

Overview

The virtual machines for these sites are managed with Xen Cloud Platform (XCP) on blade 0, partition b. To manage them, log in as root@pool-bl0b.inst.ipmi.nikhef.nl or use a client tool such as XenCenter.

The sites live on vlan 41, which is only available on bl0b. Only one host has an interface to the outside: melkstal.nikhef.nl (in the Open/Experimental network). This host serves as the gateway for all training participants and site administrators, and it also acts as a NAT box. Participants do not log in to melkstal itself; port forwarding has been set up so that connecting with ssh to a site-specific port on melkstal lands the user on the root account of a machine in that site's virtual domain, authenticated by their ssh public key.

On the inside of vlan 41, the network addressing is divided up by virtual site. To connect, make sure your ssh keys are loaded in your keyring:

IP range      domain name   login method                              details
10.1.0.0/16   darknet       ssh root@melkstal.nikhef.nl               management systems and example site
10.1.1.0/16   frogstar      ssh -p 2201 -A root@melkstal.nikhef.nl
10.1.2.0/16   traal         ssh -p 2202 -A root@melkstal.nikhef.nl
10.1.3.0/16   krikkit       ssh -p 2203 -A root@melkstal.nikhef.nl
10.1.4.0/16   megadodo      ssh -p 2204 -A root@melkstal.nikhef.nl
10.1.5.0/16   magrathea     ssh -p 2205 -A root@melkstal.nikhef.nl
10.1.6.0/16   vogsphere     ssh -p 2206 -A root@melkstal.nikhef.nl
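
The port assignments in the table can be wrapped in a small helper so that nobody has to remember which port belongs to which site. The following is a minimal sketch; the function name site_login_cmd is hypothetical and not part of the existing setup.

```shell
# Print the ssh command for a training site, following the table above.
# site_login_cmd is a hypothetical helper name, not an existing script.
site_login_cmd() {
    case "$1" in
        darknet)   echo "ssh root@melkstal.nikhef.nl"; return 0 ;;
        frogstar)  port=2201 ;;
        traal)     port=2202 ;;
        krikkit)   port=2203 ;;
        megadodo)  port=2204 ;;
        magrathea) port=2205 ;;
        vogsphere) port=2206 ;;
        *) echo "unknown site: $1" >&2; return 1 ;;
    esac
    echo "ssh -p $port -A root@melkstal.nikhef.nl"
}

# Usage: print the login command for krikkit
#   site_login_cmd krikkit
```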

Each site runs a number of machines to represent what is typical for a Grid site:

machine name   machine type                                                          metapackage
ui             User interface                                                        emi-ui
wms            Workload management system and site BDII (currently not functioning)  emi-wms, emi-lb
ce             CREAM Compute Element                                                 emi-cream-ce
headnode       batch system head node (HTCondor)                                     condor
wn             Worker node                                                           emi-wn
jeep           general purpose machine                                               none

There is one management host to help install and configure all other machines: cobbler.darknet. This system runs cobbler to help systems install with DHCP, DNS and kickstart files. It also runs saltstack to manage state on each system.

Installing and re-installing machines

Installation of new machines is done on the XCP master node. Log in as

root@pool-bl0b.inst.ipmi.nikhef.nl

In the home directory you will find a script that creates basic machine definitions from a template. It gives each machine a new interface with a generated MAC address.

The next step is to take the list of machines and their MAC addresses (obtained with another script) to cobbler.darknet and define the systems in cobbler. There is a script for that. For example:

echo 22:05:e5:52:19:cc wms.darknet | ./cobbler-add-machine.sh

Right now, the script only adds machines consecutively to the darknet site.
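
When several machines have to be registered at once, the single-machine example can be repeated in a loop. The sketch below assumes a list file with one "MAC hostname" pair per line and feeds each line to the script separately, since it is not documented here whether the script accepts several lines in one call. The function names and the ADD_CMD override are hypothetical.

```shell
# Check that a line looks like "<MAC> <hostname>",
# e.g. "22:05:e5:52:19:cc wms.darknet".
valid_machine_line() {
    printf '%s\n' "$1" |
        grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2} [A-Za-z0-9.-]+$'
}

# Feed each valid line of the list file ($1) to the add script, one line
# per call, as in the single-machine example. ADD_CMD (hypothetical) can be
# overridden to rehearse the loop without touching cobbler.
add_machines() {
    list=$1
    add_cmd=${ADD_CMD:-./cobbler-add-machine.sh}
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        if valid_machine_line "$line"; then
            printf '%s\n' "$line" | $add_cmd
        else
            echo "skipping malformed line: $line" >&2
        fi
    done < "$list"
}

# Usage on cobbler.darknet (machines.txt is a hypothetical file name):
#   add_machines machines.txt
# Dry run that only echoes the accepted lines:
#   ADD_CMD=cat add_machines machines.txt
```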

It is also possible (but more tedious) to add machines via cobbler's web interface.

Once the machines are defined in cobbler it is time to start them. This is again done on the XCP node with the command

xe vm-start vm=wms.darknet

This will install a basic system, using cobbler for DHCP and for downloading the kickstart file.

As part of the basic installation, the package salt-minion will be installed with cobbler.darknet as the master. Once the installation is done, accept the key on cobbler with

salt-key -a <hostname>

Once the link between master and minion is established, run the high state on the minion to implement all the state modules defined for it.

cobbler# salt 'machine.domain' state.highstate

Be patient as this command can take a long time to complete.
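
The steps on cobbler.darknet (wait for the new minion's key, accept it, run the high state) could be chained roughly as follows. This is only a sketch: the function names, the polling defaults, and the SALT_KEY and SALT overrides are assumptions, and a freshly accepted minion may need a few moments before it responds to the highstate call.

```shell
# Commands are overridable so the flow can be rehearsed without a salt master.
SALT_KEY=${SALT_KEY:-salt-key}
SALT=${SALT:-salt}

# Succeeds once the minion's key is listed among the unaccepted keys.
key_pending() {
    $SALT_KEY -l unaccepted | grep -qxF "$1"
}

# Poll for the key, accept it, then apply the high state.
# $1 = minion name, $2 = number of polls (default 60),
# $3 = seconds between polls (default 30)
bootstrap_minion() {
    minion=$1 tries=${2:-60} pause=${3:-30}
    while ! key_pending "$minion"; do
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] || { echo "no key from $minion yet" >&2; return 1; }
        sleep "$pause"
    done
    $SALT_KEY -y -a "$minion" && $SALT "$minion" state.highstate
}

# Usage on cobbler.darknet, after 'xe vm-start vm=wms.darknet' on the XCP node:
#   bootstrap_minion wms.darknet
```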

Re-installation

It may be necessary to start from scratch with a machine. This is easier than a new installation, as the definition already exists in XCP and cobbler. The only thing that needs to be reset is the bootloader. The script Vm-reinstall1.sh does just that. It also reboots the machine, so it will go into installing right away.

After the re-installation it is still necessary to run the state.highstate command to make sure the machine is in proper working condition (and ssh public keys are installed, otherwise logging in becomes quite difficult!).

cobbler# salt 'machine.domain' state.highstate

The previous ssh and salt keys are preserved during the installation, so it should not be necessary to re-initialize. Should this fail, check the following:

  • Does the minion respond to a ping from the salt master?
cobbler# salt 'ce.vogsphere' test.ping
  • If so, everything is OK.
  • If not, check the logfile of the master to see whether a different key was presented:
/var/log/salt/master
  • Log on to the console of the VM:
pool-bl0b# xe console name-label=ce.vogsphere
  • Restart the minion and check the minion logfile. If the key is rejected because it changed, the minion will halt again.
  • In that case, remove the old minion key on cobbler, then restart the minion and accept its new key with salt-key -a as for a fresh installation:
salt-key -d ce.vogsphere
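
The first checks in the list above can be scripted on the salt master. The sketch below assumes that test.ping prints "True" for a responding minion in salt's default output format; the function names and the SALT and MASTER_LOG overrides are hypothetical.

```shell
SALT=${SALT:-salt}
MASTER_LOG=${MASTER_LOG:-/var/log/salt/master}

# Succeeds if the minion answers test.ping with True.
minion_responds() {
    $SALT "$1" test.ping 2>/dev/null | grep -q 'True'
}

# Report whether the minion responds; if not, show recent master log lines
# that mention it and print the next manual step.
diagnose_minion() {
    minion=$1
    if minion_responds "$minion"; then
        echo "$minion: responds to test.ping, nothing to do"
        return 0
    fi
    echo "$minion: no response; recent master log lines mentioning it:"
    grep -F "$minion" "$MASTER_LOG" 2>/dev/null | tail -n 5
    echo "next: restart the minion on its console; if its key is rejected, run: salt-key -d $minion"
    return 1
}

# Usage on cobbler.darknet:
#   diagnose_minion ce.vogsphere
```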

More troubleshooting

A machine may not be entirely configured after running the high state. This is especially true of the CEs, which run CREAM. You may find that a subsequent run does not invoke YAIM again; to force it to run, remove

/etc/siteinfo/site-info.def

The file will then be installed again on the next run, which triggers YAIM.

Because not all dependencies between the state items have been worked out, it may be necessary to run highstate a few times. This can also be done from the minion, by invoking

salt-call state.highstate
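
Repeating the run by hand can be replaced by a small retry loop. This is a generic sketch, not part of the existing setup: the retry function is hypothetical, and it assumes the installed salt-call supports --retcode-passthrough so that a failed state results in a non-zero exit code.

```shell
# Run a command until it succeeds, at most $1 times.
retry() {
    attempts=$1; shift
    n=1
    while ! "$@"; do
        [ "$n" -lt "$attempts" ] || { echo "giving up after $n attempts" >&2; return 1; }
        n=$((n + 1))
    done
    echo "succeeded on attempt $n"
}

# Usage on a minion (assumes --retcode-passthrough is available):
#   retry 3 salt-call --retcode-passthrough state.highstate
```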

Darknet CA

The test sites need certificates, so a simple local CA is set up on cobbler.darknet in /srv/ca.

New certificates can be generated with the gen-host-cert.sh script. This automatically places the cert and key in /srv/salt/host_keys.

Issuing CRLs can be done by calling

./make-crl /var/www/html/7140638d.r0

Saltstack setup

After systems are installed with Cobbler, saltstack takes over. The machine cobbler.darknet is the salt master. Test the connection to the minions by running a ping test.

salt '*' test.ping

All machines should report in. If not, log in to the machine and check the salt minion log

/var/log/salt/minion
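
With many minions it is easy to overlook one that did not report in. The sketch below compares a saved list of expected minion names with saved test.ping output. It assumes salt's default output format (a "name:" line followed by an indented "True"); the function names and file names are hypothetical.

```shell
# Print the names of minions that answered True, given test.ping output
# on stdin. Assumes a "name:" line followed by an indented "True".
responders() {
    awk '/^[^ ].*:$/ { name = substr($0, 1, length($0) - 1) }
         /^ +True$/  { print name }'
}

# Print the expected minions (one name per line in file $1) that are
# missing from the responders in the saved test.ping output (file $2).
missing_minions() {
    resp=$(mktemp)
    responders < "$2" > "$resp"
    grep -vxF -f "$resp" "$1"
    rm -f "$resp"
}

# Usage on cobbler.darknet:
#   salt-key -l accepted | tail -n +2 > expected.txt
#   salt '*' test.ping > ping.txt
#   missing_minions expected.txt ping.txt
```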

The saltstack description of the system is kept in a collection of YAML files under

/srv
/srv/pillar
/srv/salt

The pillar is just static data. The salt tree contains references to various machine types and modules, as well as files that are copied over to the minions, such as ssh keys.

Git repository

The salt tree is maintained with git. Clone the repository with

git clone ssh://root@cobbler.darknet/root/salt

and check out the 'useyaim' branch. Whenever this branch is pushed, the repository under /srv is updated automatically (via a post-receive hook).


Adding users

Adding administrators for one of the sites mentioned above is done by adding their ssh public keys to the corresponding file /srv/salt/ssh_keys/<domain>.pub and running the salt state command:

salt '*.<domain>' state.highstate
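
To avoid duplicate entries when a key is added more than once, the append can be made idempotent. A minimal sketch, assuming one public key per file; add_admin_key and alice.pub are hypothetical names.

```shell
# Append the public key in file $1 to the site key file $2,
# unless that exact key line is already present.
add_admin_key() {
    key=$(cat "$1") || return 1
    if [ -f "$2" ] && grep -qxF "$key" "$2"; then
        echo "key already present in $2"
    else
        printf '%s\n' "$key" >> "$2"
        echo "key added to $2"
    fi
}

# Usage on cobbler.darknet for the frogstar site:
#   add_admin_key alice.pub /srv/salt/ssh_keys/frogstar.pub
#   salt '*.frogstar' state.highstate
```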

Adding ordinary users for a site is done by adding their details in the pillar, much like the examples already given in /srv/pillar/users/frogstar.sls:

users:
  dent:
    fullname: Arthur Dent
    shell: /bin/bash
    home: /home/dent
    uid: 604
    groups:
      - users

Take care to keep the uids unique. After adding the user, put the ssh public key in /srv/salt/ssh_keys/dent.pub. Again running

salt '*.frogstar' state.highstate

will update all frogstar machines to include the new user.

All these users will be present on all of the training sites.
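
Since the uids must stay unique, a quick check before running the high state can catch clashes. The sketch below simply looks for repeated "uid:" values in the given pillar files; duplicate_uids is a hypothetical helper name.

```shell
# Print every uid value that occurs more than once in the given pillar files.
duplicate_uids() {
    grep -hE '^[[:space:]]*uid:[[:space:]]*[0-9]+' "$@" |
        sed -E 's/^[[:space:]]*uid:[[:space:]]*([0-9]+).*/\1/' |
        sort -n | uniq -d
}

# Usage on cobbler.darknet (no output means no duplicates were found):
#   duplicate_uids /srv/pillar/users/*.sls
```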

List of current state files

Warning: this list is always outdated.

pillar/top.sls                                            + set environments (just 'base')
pillar/versions.sls                                       + selects tomcat5 or tomcat6, based on OS
pillar/vos.sls                                            + static list of VOs to support, including pool account settings and vomses
pillar/users/darknet.sls                                  + (admin) users in the darknet domain
pillar/users/frogstar.sls                                 + (admin) users in the frogstar domain
salt/top.sls                                              + top file, defines what's what
salt/condor/init.sls                                      + settings common to all types of condor machines
salt/condor/worker.sls                                    + settings for a condor worker node (WN)
salt/condor/submitter.sls                                 + settings for a condor submitter (CE)
salt/condor/manager.sls                                   + settings for the condor manager (LRMS)
salt/condor/config.local/manager                          + the literal configuration files
salt/condor/config.local/worker                           +
salt/condor/config.local/submitter                        +
salt/users/frogstar.sls                                   + adds the ssh keys for the users to root's authorized_keys
salt/users/init.sls                                       + create users according to the pillar
salt/users/poolaccounts.sls                               + create poolaccounts according to the vos pillar
salt/grid/bdii/init.sls                                   + setup of a (top level) BDII
salt/grid/yaim/init.sls                                   + downloads a templated version of site-info.def
salt/grid/yaim/siteinfo/users.conf                        + the siteinfo files
salt/grid/yaim/siteinfo/wn-list.conf                      +
salt/grid/yaim/siteinfo/vo.d                              +
salt/grid/yaim/siteinfo/vo.d/tutor                        +
salt/grid/yaim/siteinfo/vo.d/pvier                        +
salt/grid/yaim/siteinfo/vo.d/dteam                        +
salt/grid/yaim/siteinfo/groups.conf                       +
salt/grid/yaim/siteinfo/site-info.def                     +
salt/grid/repositories/emi3.sls                           + yum repositories for EMI-3 (for WMS)
salt/grid/repositories/init.sls                           + yum repositories UMD-3
salt/grid/wn/init.sls                                     + setup of WN
salt/grid/gridmapdir.sls                                  + populate the gridmapdir according to pillar data
salt/grid/trustanchors.sls                                + yum repositories of certificate repositories, plus extra CAs
salt/grid/ui/init.sls                                     + setup of UI
salt/grid/ui/vo-glite_wms.conf                            + templated files for the UI
salt/grid/ui/vo-glite_wmsui.conf                          +
salt/grid/ui/glite_wmsui_cmd_var.conf                     +
salt/grid/cream/init.sls                                  + CREAM setup (with condor batch system)
salt/grid/cream/blah.config                               + templated version of blah.config
salt/grid/hostcert/init.sls                               + install host certificates
salt/grid/wms/init.sls                                    + configure a WMS
salt/network-gateway/init.sls                             + just for melkstal
salt/sysctl.sls                                           + adds sysctl settings to improve ZMQ performance
salt/ca                                                   + CA files for the DARKNET CA and the e-infra tutor CA
salt/timezone.sls                                         + Sets the timezone to wherever.