VO-specific software and modules

From PDP/Grid Wiki
Revision as of 13:26, 7 September 2011 by Janjust@nikhef.nl (talk | contribs) (→‎Deploying the software)

Introduction

VOs sometimes need to deploy their own software at a particular site. The Software Group Manager (SGM) VOMS role exists for this purpose. Not all VOs have this role, and not every site supports it for every VO that does. If you are a Software Group Manager for a particular VO, the VOMS role will have been assigned to you and you can generate an SGM-specific proxy.

This HOWTO explains how Software Group Managers can deploy their own software and add their own module to the modules available to the module command.

VOMS Role

The HEP VOs have an SGM role, usually of the form

 /Role=lcgadmin

However, in this HOWTO the VO vlemed was chosen as the example. This VO has a role

/Role=sgm

available, which gives users who possess that role the right to install software in the VO-specific software area. To generate an SGM-proxy use

$ voms-proxy-init --voms vlemed:/vlemed/Role=sgm

You can view your current VOMS roles using

$ voms-proxy-info -all | sed 's/^/ /'
subject   : /O=dutchgrid/O=users/O=nikhef/CN=Jan Just Keijser/CN=proxy
issuer    : /O=dutchgrid/O=users/O=nikhef/CN=Jan Just Keijser
identity  : /O=dutchgrid/O=users/O=nikhef/CN=Jan Just Keijser
type      : proxy
strength  : 1024 bits
path      : /tmp/x509up_u7651
timeleft  : 11:59:33
=== VO vlemed extension information ===
VO        : vlemed
subject   : /O=dutchgrid/O=users/O=nikhef/CN=Jan Just Keijser
issuer    : /O=dutchgrid/O=hosts/OU=sara.nl/CN=voms.grid.sara.nl
attribute : /vlemed/Role=sgm/Capability=NULL
attribute : /vlemed/Role=NULL/Capability=NULL
timeleft  : 11:59:33
uri       : voms.grid.sara.nl:30003
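
In a script, the role check can be automated by filtering the voms-proxy-info output for the attribute line. The following is a minimal sketch under the assumption that the attribute lines look like the ones in the listing above; the helper name has_voms_role is made up for this example and is not part of the VOMS clients.

```shell
# has_voms_role is a hypothetical helper, not part of the VOMS clients.
# It reads a `voms-proxy-info -all` listing on stdin and succeeds if an
# attribute line of the form "/<vo>/Role=<role>" is present.
has_voms_role() {
    vo="$1"
    role="$2"
    grep -Eq "^[[:space:]]*attribute[[:space:]]*:[[:space:]]*/${vo}/Role=${role}(/|\$)"
}

# Only query the proxy if the VOMS client tools are installed.
if command -v voms-proxy-info >/dev/null 2>&1; then
    if voms-proxy-info -all 2>/dev/null | has_voms_role vlemed sgm; then
        echo "SGM role present"
    else
        echo "no SGM role in current proxy" >&2
    fi
fi
```

Reading the listing from stdin keeps the helper usable on saved output as well as on a live proxy.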

VO-specific software area

The VO-specific software area is denoted using the environment variable

VO_<VO>_SW_DIR

e.g. for vlemed it is VO_VLEMED_SW_DIR.
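
The naming rule can be sketched as a small shell function. The name vo_sw_dir_var is hypothetical, and the handling of characters other than letters and digits (replaced by underscores) is an assumption that should be checked against the site configuration for VO names containing dots or hyphens.

```shell
# vo_sw_dir_var is a hypothetical helper: it prints the name of the software
# area variable for a VO, e.g. vlemed -> VO_VLEMED_SW_DIR.
# Assumption: characters other than letters and digits become underscores.
vo_sw_dir_var() {
    upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr -c '[:alnum:]' '_')
    printf 'VO_%s_SW_DIR\n' "$upper"
}

# Look up the value of that variable in the current environment, if set.
var=$(vo_sw_dir_var vlemed)
dir=$(printenv "$var" || true)
echo "$var=${dir:-<not set>}"
```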

At Nikhef the $VO_VLEMED_SW_DIR has the following permissions:

$ cd $VO_VLEMED_SW_DIR
$ ls -ald .
drwxrwsr-t 4 root vlemedsm 4096 Dec 11 16:17 .

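This means the directory is writable only by root and by members of the Unix group vlemedsm. The setgid bit (the s in the group permissions) makes new files and subdirectories inherit the vlemedsm group, and the sticky bit (the trailing t) prevents accounts from deleting files owned by others. When submitting a job to Nikhef using the SGM-proxy a special pool-account is chosen: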

$ id vlemsm00
uid=70960(vlemsm00) gid=2058(vlemedsm) groups=2024(vlemed),2058(vlemedsm)

So the SGM-proxy causes a mapping to a pool-account with access rights to install software in $VO_VLEMED_SW_DIR.

Just to verify: a "regular" vlemed pool-account does not have these permissions:

$ id vlemed00
uid=53200(vlemed00) gid=2024(vlemed) groups=2024(vlemed)

Note: Checking this before proceeding is good practice, as site misconfigurations at this level occur quite frequently.
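
Such a check could be run as a small test job before the real installation. The sketch below is one possible form; check_sw_dir is a hypothetical name, and the test only confirms that the mapped account can write to the directory, not that the rest of the site setup is correct.

```shell
# check_sw_dir is a hypothetical pre-flight check for an installation job.
# It succeeds only if the given path is an existing directory that the
# current account may write to.
check_sw_dir() {
    dir="$1"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "software area '$dir' does not exist" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "software area '$dir' is not writable for $(id -un)" >&2
        return 2
    fi
    return 0
}

# Show the account mapping, then test the software area.
# VO_VLEMED_SW_DIR is normally only set on the grid worker nodes.
id
if check_sw_dir "${VO_VLEMED_SW_DIR:-}"; then
    echo "OK: can install into $VO_VLEMED_SW_DIR"
fi
```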

Building the software

Before deploying new software we need to build and package it first on a local system:

  • Notes on building
    • The target platform is a RHEL5 64-bit compatible system. The easiest approach is to build the software on such a system (e.g. ui.grid.sara.nl).
    • Do not hardcode paths into the software: VO_<VO>_SW_DIR points to different directories on different clusters. Try to ensure that your software is relocatable using environment variables; most software allows for this.
  • Notes on packaging
    • In this HOWTO all software is packaged as a single tarball
    • The installation job then only has to download and extract the tarball in the right location.
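
As an illustration of the relocatability note, a launcher script can work out its installation prefix at run time instead of relying on a compiled-in path. This is only a sketch: resolve_prefix is a hypothetical helper, and the use of FSLDIR follows the variable set by the sample modulefile later in this HOWTO.

```shell
# resolve_prefix is a hypothetical helper for a launcher script in <prefix>/bin.
# It prints the installation prefix: FSLDIR if that is set, otherwise the
# parent of the directory that contains the given script.
resolve_prefix() {
    if [ -n "${FSLDIR:-}" ]; then
        printf '%s\n' "$FSLDIR"
    else
        ( cd "$(dirname -- "$1")/.." && pwd )
    fi
}

prefix=$(resolve_prefix "$0")
echo "using installation prefix: $prefix"
```

The environment variable takes precedence so that a modulefile can redirect the software without touching the script.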

For this HOWTO we will package a fictitious version 5.0 of the fsl package. The following directory structure has been set up for this package:

<local-dir>/fsl-5.0
<local-dir>/fsl-5.0/bin
<local-dir>/fsl-5.0/bin/fsl

where fsl-5.0/bin/fsl is a dummy script.

Packaging this software is very easy:

$ cd <local-dir>
$ tar czvf ~/my-fsl-5.0.tar.gz fsl-5.0
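
Because the installation job extracts the tarball directly in the software area, it can be worth confirming that everything in it sits under a single top-level directory. A minimal sketch, with tarball_top_level as a hypothetical helper name:

```shell
# tarball_top_level is a hypothetical helper: it prints the distinct
# top-level entries of a gzipped tarball, one per line.
tarball_top_level() {
    tar tzf "$1" | sed -e 's|^\./||' -e 's|/.*$||' | sort -u
}

# Check the tarball from this HOWTO if it exists on this machine.
tarball="${HOME:-}/my-fsl-5.0.tar.gz"
if [ -f "$tarball" ]; then
    tarball_top_level "$tarball"
fi
```

For the tarball built above the listing should consist of the single directory that was packaged.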

We then upload this tarball to a public webserver (or gridftp server):

$ scp ~/my-fsl-5.0.tar.gz ~/public_html

Deploying the software

To deploy the software we use this JDL file:

Executable = "deploy.sh";
StdOutput = "stdout";
StdError = "stderr";
InputSandbox = {"deploy.sh"};
OutputSandbox = {"stdout","stderr"};

with this deploy.sh script:

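#!/bin/bash
# Set a sane umask, just in case
umask 0022

# Abort if the software area is unset or missing; a bare "cd" would
# otherwise silently install into the home directory
cd "${VO_VLEMED_SW_DIR:?VO_VLEMED_SW_DIR is not set}" || exit 3
# Download and unpack the tarball
wget http://www.nikhef.nl/~janjust/my-fsl-5.0.tar.gz || exit 1
tar xzf my-fsl-5.0.tar.gz || exit 2
# Make sure the permissions are right
chmod -R u+rw,g+rw,o+r-w fsl-5.0
# List the directory afterwards for inspection
ls -l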

Run this job with an SGM-proxy like any other job at the cluster where you want to install it, e.g.

$ glite-wms-job-submit -d janjust.sgm -r gazon.nikhef.nl:2119/jobmanager-pbs-short deploy.jdl

Wait for completion and check the stdout and stderr files for any errors.
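
Since each grid submission takes a while, the extract and chmod steps can first be tried locally against a scratch directory. The sketch below mirrors those steps of deploy.sh for a tarball that is already on local disk; install_tarball is a hypothetical helper and the scratch directory stands in for the real software area.

```shell
# install_tarball is a hypothetical helper that mirrors the extract and chmod
# steps of deploy.sh. The body runs in a subshell so that umask and cd do not
# affect the caller.
# Usage: install_tarball <absolute-tarball-path> <target-dir> <top-level-dir>
install_tarball() (
    umask 0022
    cd "$2" || exit 1
    tar xzf "$1" || exit 2
    chmod -R u+rw,g+rw,o+r-w "$3" || exit 3
    ls -l
)

# Dry run against a scratch directory instead of $VO_VLEMED_SW_DIR.
tarball="${HOME:-}/my-fsl-5.0.tar.gz"
if [ -f "$tarball" ]; then
    scratch=$(mktemp -d)
    install_tarball "$tarball" "$scratch" fsl-5.0 && echo "dry run OK in $scratch"
fi
```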

Adding a module

All VO-specific modules need to be installed in

VO_<VO>_SW_DIR/modules

Only modulefiles installed in this directory will be automagically picked up by the worker node login scripts.

Here is the listing of a sample modulefile

#%Module1.0#####################################################################
##
## fsl 5.0 modulefile
##

proc ModulesHelp { } {
        global fslversion

        puts stderr "\tSet up the environment for FSL"
        puts stderr "\n\tVersion $fslversion\n"
}

module-whatis   "sets FSL environment"

# for Tcl script use only
set     fslversion      5.0

set fsldir      "$env(VO_VLEMED_SW_DIR)/fsl-5.0"

prepend-path    PATH    "$fsldir/bin"
setenv          FSLDIR  "$fsldir"

which adds support for a (fictitious) version 5.0 of the package fsl.

Note especially the

$env(VO_VLEMED_SW_DIR)

in this modulefile: all software needs to be installed relative to this directory (and must therefore be relocatable!). $env(...) is the Tcl syntax that modulefiles use to read an environment variable.
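
When several packages follow the same layout, a modulefile of this shape can be generated from a template. The following is a sketch only: write_modulefile is a hypothetical generator, it assumes the <package>-<version> directory layout used in this HOWTO, and it assumes the package wants a <PACKAGE>DIR variable in the way FSL uses FSLDIR.

```shell
# write_modulefile is a hypothetical generator, not part of Environment Modules.
# Usage: write_modulefile <package> <version> <name-of-VO-software-variable>
# It prints a minimal modulefile that locates the package relative to the
# VO software area, following the pattern of the sample modulefile.
write_modulefile() {
    pkg="$1"
    ver="$2"
    swvar="$3"
    # Assumption: the package wants <PACKAGE>DIR set, as FSL does with FSLDIR.
    dirvar="$(printf '%s' "$pkg" | tr '[:lower:]' '[:upper:]')DIR"
    cat <<EOF
#%Module1.0
##
## ${pkg} ${ver} modulefile
##

module-whatis   "sets ${pkg} environment"

set     prefix  "\$env(${swvar})/${pkg}-${ver}"

prepend-path    PATH    "\$prefix/bin"
setenv          ${dirvar}  "\$prefix"
EOF
}

write_modulefile fsl 5.0 VO_VLEMED_SW_DIR
```

Redirecting the output to a file named after the version, inside a directory named after the package, gives the layout that the module command expects.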

It is by far the easiest to develop and test new modulefiles on a local system. After making sure that the modulefile works you can then include it in the software tarball or create a new tarball, e.g. mypackage-X.Y-module.tar.gz.

For this HOWTO we will do the latter:

$ cd <local-dir>
$ ls 
fsl-5.0
$ mkdir -p modules/fsl
$ cp .../my-new-module-file modules/fsl/5.0

Before packaging it we test it first:

$ export MODULEPATH=$PWD/modules
$ module avail
$ module load fsl/5.0
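
The same test can be scripted by searching the module avail listing for the new entry. A minimal sketch, assuming the column layout of the listings shown in this HOWTO; module_listed is a hypothetical helper name.

```shell
# module_listed is a hypothetical helper: it reads a `module avail` listing on
# stdin and succeeds if the exact name (e.g. fsl/5.0) appears as an entry.
module_listed() {
    # Put every whitespace-separated word on its own line, then match exactly.
    tr -s ' \t' '\n\n' | grep -Fxq -- "$1"
}

# `module avail` writes to stderr, hence the 2>&1. The check is skipped when
# the module command is not available in this shell.
if type module >/dev/null 2>&1; then
    if module avail 2>&1 | module_listed fsl/5.0; then
        echo "fsl/5.0 is visible"
    else
        echo "fsl/5.0 not found in MODULEPATH=${MODULEPATH:-}" >&2
    fi
fi
```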

When finished we package it:

$ tar czf ~/my-fsl-5.0-module.tar.gz modules
$ cp ~/my-fsl-5.0-module.tar.gz ~/public_html

And we deploy it in exactly the same manner as the actual software (see above).

Using the module

In order to use our shiny new module we launch a normal job. Here is a listing of a sample job which checks the MODULEPATH variable and a few other modules-related things:

#!/bin/bash -l

id
ls -l $VO_VLEMED_SW_DIR
echo "MODULEPATH=$MODULEPATH"
echo "## Listing available modules:"
module avail 2>&1
echo "## Loading fsl"
module load fsl 2>&1
echo "## Which modules are now loaded:"
module list 2>&1

module unload fsl 2>&1
echo "## Loading MY fsl module"
module load fsl/5.0 2>&1
echo "## Which modules are now loaded:"
module list 2>&1

echo "## which fsl:"
which fsl

The output of this job when run at the Nikhef cluster is:

uid=53207(vlemed07) gid=2024(vlemed) groups=2024(vlemed)
total 16
drwxr-sr-x 3 vlemsm05 vlemedsm 4096 Dec 11 16:17 fsl-5.0
drwxr-sr-x 4 vlemsm05 vlemedsm 4096 Dec 11 16:22 modules
MODULEPATH=/opt/vl-e/modules/Modules/versions:/opt/vl-e/modules/Modules/$MODULE_VERSION/modulefiles:
           /etc/opt/vl-e/modulefiles::/data/esia/vlemed/modules
## Listing available modules:
---------------------- /opt/vl-e/modules/Modules/versions ----------------------
3.2.6

----------------- /opt/vl-e/modules/Modules/3.2.6/modulefiles ------------------
dot         module-cvs  module-info modules     null        use.own

-------------------------- /etc/opt/vl-e/modulefiles ---------------------------
fsl/4.0              javagat/1.7.1        pl/5.6.64
fsl/4.0.4            lam/7.1              r/2.6
fsl/4.1              lam/7.1.4            r/2.6.2
fsl/4.1.4            mcr/7.11             r/2.9
gat/1.8              mesa3d/6.4           r/2.9.2
gat/1.8.2            mesa3d/6.4.2         rmpi/0.5
graphviz/2.18        mpitb/2.1            srb/3.4
gt/4.0               mpitb/2.1.73         srb/3.4.2
gt/4.0.8             mricro/1.39          vlet/1.0
ibis/1.4             mricro/1.39.3        vlet/1.0.2
itk/3.14             octave/2.1           vtk/4.4
itk/3.14.0           octave/2.1.73        vtk/4.4.2
itk/3.4              openmpi/1.3.2        vtk/5.4
itk/3.4.0            openrdf-sesame/2.0   vtk/5.4.0
java/1.6             openrdf-sesame/2.0.1 weka/3.4
javagat/1.7          pl/5.6               weka/3.4.12

-------------------------- /data/esia/vlemed/modules ---------------------------
fsl/5.0       mypackage/5.0
## Loading fsl
## Which modules are now loaded:
Currently Loaded Modulefiles:
  1) fsl/4.1.4
## Loading MY fsl module
## Which modules are now loaded:
Currently Loaded Modulefiles:
  1) fsl/5.0
## which fsl:
/data/esia/vlemed/fsl-5.0/bin/fsl

Notes

  • When adding a new version (e.g. 5.0) of an existing system-wide package (e.g. fsl) this version does not become the default, as can be seen in the job output above. To use the new version you have to explicitly specify the version number.
  • Completely new packages are picked up automatically:
$ module load mypackage
$ module list
Currently Loaded Modulefiles:
  1) fsl/4.1.4       2) mypackage/5.0
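
Because the default version may not be the one you installed, a job can verify which version actually ended up loaded before it starts real work. The sketch below parses module list output of the form shown above; loaded_version and require_version are hypothetical helper names.

```shell
# loaded_version is a hypothetical helper: it reads a `module list` listing on
# stdin and prints the loaded version(s) of the given package, one per line.
loaded_version() {
    tr -s ' \t' '\n\n' | sed -n "s|^$1/||p"
}

# require_version (also hypothetical) succeeds only if exactly the wanted
# version of the package appears in the listing on stdin.
require_version() {
    [ "$(loaded_version "$1")" = "$2" ]
}

# Skipped when the module command is not available in this shell.
if type module >/dev/null 2>&1; then
    module load fsl/5.0 2>&1
    if module list 2>&1 | require_version fsl 5.0; then
        echo "fsl/5.0 is loaded"
    else
        echo "fsl/5.0 is not the loaded version" >&2
    fi
fi
```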