VO-specific software and modules
Introduction
VOs sometimes need to deploy their own software at a particular site. The Software Group Manager (SGM) VOMS role exists for this purpose. Not all VOs have this role, and not every site supports it for every VO that does. If you are a Software Group Manager for a particular VO, the VOMS role will have been assigned to you and you can generate an SGM-specific proxy.
This HOWTO explains how Software Group Managers can deploy their own software and add their own module to the modules available to the module command.
VOMS Role
The HEP VOs have an SGM role, usually of the form
/Role=lcgadmin
However, in this HOWTO the VO vlemed was chosen as the example. This VO has a role
/Role=sgm
available, which gives users who possess that role the right to install software in the VO-specific software area. To generate an SGM-proxy use
$ voms-proxy-init --voms vlemed:/vlemed/Role=sgm
You can view your current VOMS roles using
$ voms-proxy-info -all | sed 's/^/ /'
 subject   : /O=dutchgrid/O=users/O=nikhef/CN=Jan Just Keijser/CN=proxy
 issuer    : /O=dutchgrid/O=users/O=nikhef/CN=Jan Just Keijser
 identity  : /O=dutchgrid/O=users/O=nikhef/CN=Jan Just Keijser
 type      : proxy
 strength  : 1024 bits
 path      : /tmp/x509up_u7651
 timeleft  : 11:59:33
 === VO vlemed extension information ===
 VO        : vlemed
 subject   : /O=dutchgrid/O=users/O=nikhef/CN=Jan Just Keijser
 issuer    : /O=dutchgrid/O=hosts/OU=sara.nl/CN=voms.grid.sara.nl
 attribute : /vlemed/Role=sgm/Capability=NULL
 attribute : /vlemed/Role=NULL/Capability=NULL
 timeleft  : 11:59:33
 uri       : voms.grid.sara.nl:30003
VO-specific software area
The VO-specific software area is denoted using the environment variable
VO_<VO>_SW_DIR
e.g. for vlemed it is VO_VLEMED_SW_DIR.
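The naming rule can be sketched as a small shell helper. This is only an illustration: the function names vo_sw_dir_var and vo_sw_dir are made up for this example, and the rule that characters such as dots and dashes in a VO name become underscores is an assumption about the usual convention (vo.example.org is a hypothetical VO name); check the environment on the worker node if in doubt.

```shell
# Build the name of the software-area variable for a VO.
# Assumption: the VO name is uppercased and every character that is not
# a letter, digit or underscore is replaced by an underscore.
vo_sw_dir_var() {
    printf 'VO_%s_SW_DIR\n' \
        "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr -c 'A-Z0-9_' '_')"
}

# Print the directory that variable points to on this node
# (prints an empty line if the variable is not set).
vo_sw_dir() {
    eval "printf '%s\n' \"\${$(vo_sw_dir_var "$1"):-}\""
}

# Example:
#   vo_sw_dir_var vlemed          prints VO_VLEMED_SW_DIR
#   vo_sw_dir_var vo.example.org  prints VO_VO_EXAMPLE_ORG_SW_DIR
```

The name is sanitised before it reaches eval, so only letters, digits and underscores are ever evaluated.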
At Nikhef the $VO_VLEMED_SW_DIR has the following permissions:
$ cd $VO_VLEMED_SW_DIR
$ ls -ald .
drwxrwsr-t 4 root vlemedsm 4096 Dec 11 16:17 .
This means the directory is writable only for members of the Unix group vlemedsm. When submitting a job to Nikhef using the SGM-proxy a special pool-account is chosen:
$ id vlemsm00
uid=70960(vlemsm00) gid=2058(vlemedsm) groups=2024(vlemed),2058(vlemedsm)
So the SGM-proxy causes a mapping to a pool-account with access rights to install software in $VO_VLEMED_SW_DIR.
Just to verify: a "regular" vlemed pool-account does not have these permissions:
$ id vlemed00
uid=53200(vlemed00) gid=2024(vlemed) groups=2024(vlemed)
Note: Checking this before proceeding is good practice, as site misconfigurations at this level occur quite frequently.
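One way to automate this check is to run a small test at the start of an SGM job, before anything is downloaded. The following is a minimal sketch; the function name check_sw_dir is hypothetical and the checks are deliberately simple.

```shell
# Return 0 if the given directory exists and the current account may
# create files in it; otherwise print a diagnostic on stderr and return 1.
check_sw_dir() {
    if [ -z "$1" ]; then
        echo "software area variable is empty or unset" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "$1 is not a directory" >&2
        return 1
    fi
    if [ ! -w "$1" ]; then
        echo "$1 is not writable for $(id -un)" >&2
        return 1
    fi
    return 0
}

# Typical use at the top of an installation job:
#   id
#   check_sw_dir "$VO_VLEMED_SW_DIR" || exit 1
```

Printing the output of id alongside the check makes it easy to see afterwards which pool-account the job was mapped to.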
Building the software
Before deploying new software we need to build and package it first on a local system:
- Notes on building
  - The target platform is a RHEL5-compatible 64-bit system. The easiest approach is to build the software on such a system (e.g. ui.grid.sara.nl).
  - Do not hardcode paths into the software: VO_<VO>_SW_DIR points to different directories on different clusters. Try to make your software relocatable using environment variables; most software allows for this.
- Notes on packaging
  - In this HOWTO all software is packaged as a single tarball.
  - The installation job then only has to download and extract the tarball in the right location.
For this HOWTO we will package a fictitious version 5.0 of the fsl package. The following directory structure has been set up for this package:
<local-dir>/fsl-5.0
<local-dir>/fsl-5.0/bin
<local-dir>/fsl-5.0/bin/fsl
where fsl-5.0/bin/fsl is a dummy script.
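To illustrate what "relocatable" means for such a script, here is a sketch of how a wrapper could find its installation prefix at run time instead of relying on a hardcoded path. The function name fsl_prefix is hypothetical, and the actual dummy script used in this HOWTO is not shown, so treat this as one possible approach only.

```shell
# Print the installation prefix: $FSLDIR when it is set, otherwise the
# parent of the directory that contains the script path given as $1.
fsl_prefix() {
    if [ -n "${FSLDIR:-}" ]; then
        printf '%s\n' "$FSLDIR"
    else
        # Resolve <prefix>/bin/.. to an absolute path.
        (cd "$(dirname "$1")/.." && pwd)
    fi
}

# Inside fsl-5.0/bin/fsl this could be used as:
#   prefix=$(fsl_prefix "$0")
#   echo "running fsl from $prefix"
```

With this approach the same tarball works unchanged on clusters where the software area lives in different directories.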
Packaging this software is very easy:
$ cd <local-dir>
$ tar czvf ~/my-fsl-5.0.tar.gz fsl-5.0
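Before uploading, it can be worth checking that the tarball unpacks into a single top-level directory and contains no absolute or parent-relative paths, since it will be extracted directly in the shared software area. A minimal sketch, with hypothetical function names:

```shell
# Print the distinct top-level entries of a gzipped tarball, one per line.
tar_toplevel() {
    tar tzf "$1" | sed -e 's|^\./||' -e 's|/.*$||' -e '/^$/d' | sort -u
}

# Return 0 if no member name is absolute or contains a ".." component.
tar_is_relocatable() {
    ! tar tzf "$1" | grep -E -q '^/|(^|/)\.\.(/|$)'
}

# Example:
#   tar_toplevel ~/my-fsl-5.0.tar.gz        should print only: fsl-5.0
#   tar_is_relocatable ~/my-fsl-5.0.tar.gz || echo "suspicious paths in tarball"
```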
We then upload this tarball to a public webserver (or gridftp server):
$ scp ~/my-fsl-5.0.tar.gz ~/public_html
Deploying the software
To deploy the software we use this JDL file:
Executable = "deploy.sh";
Stdoutput = "stdout";
StdError = "stderr";
InputSandbox = {"deploy.sh"};
OutputSandbox = {"stdout","stderr"};
with this deploy.sh script:
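#!/bin/bash

# Set a sane umask, just in case
umask 0022

# Stop if the software area is unset or unreachable
# (a bare "cd" would silently end up in the home directory)
cd "$VO_VLEMED_SW_DIR" || exit 3

wget http://www.nikhef.nl/~janjust/my-fsl-5.0.tar.gz || exit 1
tar xzf my-fsl-5.0.tar.gz || exit 2

# List the directory afterwards for inspection
ls -l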
Run this job with an SGM-proxy like any other job at the cluster where you want to install it, e.g.
$ glite-wms-job-submit -d janjust.sgm -r gazon.nikhef.nl:2119/jobmanager-pbs-short deploy.jdl
Wait for completion and check the stdout and stderr files for any errors.
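The umask in deploy.sh matters because regular pool-accounts are not members of the SGM Unix group (see the id output earlier), so they reach the installed files through the "other" permission bits. As a sketch, the installation job could finish with a check like the one below; the function name list_not_world_readable is hypothetical.

```shell
# Print every entry under directory $1 that regular users cannot use:
# anything lacking the world-read bit, and directories lacking the
# world-execute (search) bit. No output means the tree looks fine.
list_not_world_readable() {
    find "$1" \( ! -perm -004 -o \( -type d ! -perm -001 \) \) -print
}

# Example, at the end of the installation job:
#   list_not_world_readable "$VO_VLEMED_SW_DIR/fsl-5.0"
```

Any path printed by this check can be fixed from a follow-up SGM job with chmod.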
Adding a module
All VO-specific modules need to be installed in
VO_<VO>_SW_DIR/modules
Only modulefiles installed in this directory will be automagically picked up by the worker node login scripts.
Here is the listing of a sample modulefile:
#%Module1.0#####################################################################
##
## fsl 5.0 modulefile
##
proc ModulesHelp { } {
    global fslversion

    puts stderr "\tSet up the environment for FSL"
    puts stderr "\n\tVersion $fslversion\n"
}

module-whatis "sets FSL environment"

# for Tcl script use only
set fslversion 5.0
set fsldir "$env(VO_VLEMED_SW_DIR)/fsl-5.0"

prepend-path PATH "$fsldir/bin"
setenv FSLDIR "$fsldir"
which adds support for a (fictitious) version 5.0 of the package fsl.
Note especially the
$env(VO_VLEMED_SW_DIR)
in this modulefile: all software needs to be installed relative to this directory (and thus be relocatable!). The $env(...) syntax is the Tcl way of reading an environment variable inside a modulefile.
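When several packages follow the same <name>-<version> layout, a skeleton modulefile can be generated with a short shell function. This is only a sketch: write_modulefile is a hypothetical helper, it assumes the package lives in <software area>/<name>-<version> with a bin subdirectory, and package-specific lines such as the setenv FSLDIR above still have to be added by hand.

```shell
# Write a minimal modulefile to <moddir>/<name>/<version>.
# Arguments: modules directory, package name, version, and the name of
# the software-area variable (e.g. VO_VLEMED_SW_DIR).
# Note: plain sh has no local variables, so these names are global.
write_modulefile() {
    moddir=$1 name=$2 version=$3 swvar=$4
    mkdir -p "$moddir/$name" || return 1
    # \$ keeps the Tcl variable references literal in the generated file.
    cat > "$moddir/$name/$version" <<EOF
#%Module1.0
##
## $name $version modulefile
##
module-whatis "sets $name $version environment"

set prefix "\$env($swvar)/$name-$version"

prepend-path PATH "\$prefix/bin"
EOF
}

# Example:
#   write_modulefile modules fsl 5.0 VO_VLEMED_SW_DIR
```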
It is by far easiest to develop and test new modulefiles on a local system. Once you are sure the modulefile works, you can either include it in the software tarball or create a separate tarball, e.g. mypackage-X.Y-module.tar.gz.
For this HOWTO we will do the latter:
$ cd <local-dir>
$ ls
fsl-5.0
$ mkdir -p modules/fsl
$ cp .../my-new-module-file modules/fsl/5.0
Before packaging it we test it first:
$ export MODULEPATH=$PWD/modules
$ module avail
$ module load fsl/5.0
When finished we package it:
$ tar czf ~/my-fsl-5.0-module.tar.gz modules
$ cp ~/my-fsl-5.0-module.tar.gz ~/public_html
And we deploy it in exactly the same manner as the actual software (see above).
Use the module
In order to use our shiny new module we launch a normal job. Here is a listing of a sample job which checks the MODULEPATH variable and a few other modules-related things:
#!/bin/bash -l

id
ls -l $VO_VLEMED_SW_DIR
echo "MODULEPATH=$MODULEPATH"

echo "## Listing available modules:"
module avail 2>&1

echo "## Loading fsl"
module load fsl 2>&1
echo "## Which modules are now loaded:"
module list 2>&1
module unload fsl 2>&1

echo "## Loading MY fsl module"
module load fsl/5.0 2>&1
echo "## Which modules are now loaded:"
module list 2>&1

echo "## which fsl:"
which fsl
The output of this job when run at the Nikhef cluster is:
uid=53207(vlemed07) gid=2024(vlemed) groups=2024(vlemed)
total 16
drwxr-sr-x 3 vlemsm05 vlemedsm 4096 Dec 11 16:17 fsl-5.0
drwxr-sr-x 4 vlemsm05 vlemedsm 4096 Dec 11 16:22 modules
MODULEPATH=/opt/vl-e/modules/Modules/versions:/opt/vl-e/modules/Modules/$MODULE_VERSION/modulefiles:/etc/opt/vl-e/modulefiles::/data/esia/vlemed/modules
## Listing available modules:

---------------------- /opt/vl-e/modules/Modules/versions ----------------------
3.2.6

----------------- /opt/vl-e/modules/Modules/3.2.6/modulefiles ------------------
dot         module-cvs  module-info modules     null        use.own

-------------------------- /etc/opt/vl-e/modulefiles ---------------------------
fsl/4.0              javagat/1.7.1        pl/5.6.64
fsl/4.0.4            lam/7.1              r/2.6
fsl/4.1              lam/7.1.4            r/2.6.2
fsl/4.1.4            mcr/7.11             r/2.9
gat/1.8              mesa3d/6.4           r/2.9.2
gat/1.8.2            mesa3d/6.4.2         rmpi/0.5
graphviz/2.18        mpitb/2.1            srb/3.4
gt/4.0               mpitb/2.1.73         srb/3.4.2
gt/4.0.8             mricro/1.39          vlet/1.0
ibis/1.4             mricro/1.39.3        vlet/1.0.2
itk/3.14             octave/2.1           vtk/4.4
itk/3.14.0           octave/2.1.73        vtk/4.4.2
itk/3.4              openmpi/1.3.2        vtk/5.4
itk/3.4.0            openrdf-sesame/2.0   vtk/5.4.0
java/1.6             openrdf-sesame/2.0.1 weka/3.4
javagat/1.7          pl/5.6               weka/3.4.12

-------------------------- /data/esia/vlemed/modules ---------------------------
fsl/5.0       mypackage/5.0
## Loading fsl
## Which modules are now loaded:
Currently Loaded Modulefiles:
  1) fsl/4.1.4
## Loading MY fsl module
## Which modules are now loaded:
Currently Loaded Modulefiles:
  1) fsl/5.0
## which fsl:
/data/esia/vlemed/fsl-5.0/bin/fsl
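The first thing to look for in this output is whether the VO modules directory appears in MODULEPATH at all. A job can test this itself with a small helper such as the sketch below. The function name in_modulepath is hypothetical, and the comparison is a plain string match, so a trailing slash or a symlinked path will not be recognised.

```shell
# Return 0 if directory $1 is one of the colon-separated entries of
# $MODULEPATH (or of $2 when a second argument is given), 1 otherwise.
# Empty entries, as produced by "::", are ignored.
in_modulepath() {
    [ -n "$1" ] || return 1
    case ":${2-${MODULEPATH:-}}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   in_modulepath "$VO_VLEMED_SW_DIR/modules" || echo "VO modules not in MODULEPATH"
```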
Notes
- When adding a new version (e.g. 5.0) of an existing system-wide package (e.g. fsl), the new version does not become the default, as can be seen in the script output above, where module load fsl loads fsl/4.1.4. To use the new version you have to specify the version number explicitly.
- Completely new packages are picked up automatically:
$ module load mypackage
$ module list
Currently Loaded Modulefiles:
  1) fsl/4.1.4       2) mypackage/5.0
--
- launch normal job
- check MODULEPATH
$ echo $MODULEPATH
[SNIP]:/etc/opt/vl-e/modulefiles::/data/esia/vlemed/modules
- check available modules
$ module avail
[SNIP]
------------------------------------ /data/esia/vlemed/modules -------------------------------------
fsl/5.0       mypackage/5.0
- use it:
$ module load fsl
This does not pick up the vlemed version:
$ module list
Currently Loaded Modulefiles:
  1) fsl/4.1.4
To load the vlemed version, use
$ module load fsl/5.0
- new packages are picked up automatically:
$ module load mypackage
$ module list
Currently Loaded Modulefiles:
  1) fsl/4.1.4       2) mypackage/5.0