LCMAPS Tracking GroupID plugin
Tracking Group IDs are added to batch jobs so that their processes can be tracked even if they escape the process tree.
Batch systems that use this feature are:
- Sun Grid Engine (SGE, now known as the Oracle Grid Engine)
- Condor-C batch system
Other batch systems are known to have the feature, but it doesn't seem to be used in (known) Grid deployments:
What's a Tracking Group ID?
A tracking Group ID is a group ID issued by the batch system and attached, as a secondary Group ID, to the first process of a user's batch job. When a process spawns a child process, the set of secondary Group IDs is inherited by the process image copy created by the fork() system call. All stray processes can therefore be rounded up by the secondary Group ID that each of the user's processes carries.
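To make the inheritance and the round-up a bit more concrete, here is a minimal Python sketch. It is only an illustration and not code from any batch system or from the plug-in. The function names (groups_after_fork, groups_from_status, pids_with_gid) are made up for this example, and the round-up part assumes a Linux /proc file system, where /proc/<pid>/status carries a "Groups:" line.

```python
import os


def groups_after_fork():
    """Fork once; return (parent_groups, child_groups) as sorted lists."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report its secondary Group IDs through the pipe, then exit.
        try:
            os.close(read_fd)
            report = " ".join(str(g) for g in sorted(os.getgroups()))
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)
    # Parent: read the child's report and reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child = [int(g) for g in pipe.read().split()]
    os.waitpid(pid, 0)
    return sorted(os.getgroups()), child


def groups_from_status(status_text):
    """Extract the GID set from the text of a Linux /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(g) for g in line.split(":", 1)[1].split()}
    return set()


def pids_with_gid(tracking_gid, proc_root="/proc"):
    """List PIDs whose secondary Group IDs include tracking_gid (Linux only)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                if tracking_gid in groups_from_status(fh.read()):
                    found.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return sorted(found)
```

groups_after_fork() should return two identical lists, since the child is a copy of the parent. pids_with_gid(1337) would then list every process still carrying the (hypothetical) tracking Group ID 1337, no matter where it sits in the process tree.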
Process tree example
Here is an example process tree on a Worker Node of a PBS/Torque based cluster. For illustration purposes all unrelated processes are removed from the tree:
init-+
     ├─pbs_mom
     │   ├─bash
     │   │   └─1337.stro.n /var/spool/pbs/mom_priv/jobs/1337.stro.nikhef.nl.SC
     │   │       └─jobwrapper /opt/lcg/libexec/jobwrapper ./CREAM31337_jobWrapper.sh
     │   │           └─CREAM31337_ -l ./CREAM31337_jobWrapper.sh
     │   │               └─glexec /bin/bash payload.sh
     │   │                   ├─payload.sh
The problem is that gLExec ignores the tracking Group ID issued by the batch system. gLExec will use LCMAPS to decide to which Unix account the payload needs to be mapped. The Tracking Group ID plug-in will make LCMAPS aware of the tracking Group IDs and preserve them when LCMAPS is building the Unix account mapping resolution.
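The idea of preserving the tracking Group IDs can be sketched roughly as follows. This is not the plug-in's source code, just a small Python illustration. The function name preserve_tracking_gids is hypothetical, and treating the minimum and maximum as inclusive bounds is my assumption here; check the man page for the real behaviour.

```python
def preserve_tracking_gids(current_groups, mapped_groups,
                           gid_min=1000, gid_max=2000):
    """Return the mapped secondary GIDs plus the tracking GIDs to keep.

    current_groups: secondary GIDs of the calling process (set by the
                    batch system before the mapping takes place).
    mapped_groups:  secondary GIDs chosen by the account mapping.
    gid_min/max:    tracking GID range, assumed inclusive in this sketch.
    """
    merged = list(mapped_groups)
    for gid in current_groups:
        # Keep only GIDs that fall inside the configured tracking range.
        if gid_min <= gid <= gid_max and gid not in merged:
            merged.append(gid)
    return merged
```

For example, preserve_tracking_gids([100, 1337, 5000], [2001, 2002]) keeps 1337 because it lies within 1000..2000, and drops 100 and 5000.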
The plug-in will be officially distributed through the EMI-1 release.
Temporary Distribution (April 9th, 2011)
Binary from Etics (from Volatile repository!) in tarball and RPM form: http://etics-repository.cern.ch/repository/download/volatile/salle/emi/emi.sac.lcmaps-plugins-tracking-groupid/0.0.1/sl5_x86_64_gcc412EPEL
At the moment I recommend the binary tarball from Etics, as the RPM is built with the FHS compliant "/usr" prefix, but it doesn't seem to be relocatable with rpm -i --prefix [...].
A man page and documentation are packaged with it.
The LCMAPS configuration file for my gLExec was changed to add this snippet in the top part:
trackinggid = "lcmaps_tracking_groupid.mod" "--tracking-groupid-min 1000" "--tracking-groupid-max 2000"
and at the bottom the flow of plugins to execute was altered to the following:
pluginexecpolicy:
verify_proxy -> good
good -> trackinggid
trackinggid -> posix_enf
As you can see, I've placed the trackinggid plugin after the good plugin and before the posix_enf plugin. Here, good is a placeholder: replace it with your regular sequence of plugins to make it work. If you have multiple such policy blocks, place trackinggid right before each instance of posix_enf.
gLExec version 0.8.1-1 checks, as a safety measure, whether all newly mapped UserIDs, (primary) GroupIDs and secondary GroupIDs are resolvable to user names and group names. You will not be hindered by this detail, but do note that other sites might experience problems. We're still discussing our options, but my take on it is that the UID and primary GID can perfectly well be checked, whereas a failure of the secondary GID check is a false positive and should be regarded as a pedantic check that hinders Tracking GID deployments.
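To show what "resolvable" means here, a small Python sketch of such a check follows. It is not gLExec's implementation, only an illustration of the kind of lookup involved. The function name unresolvable_gids and the lookup parameter are made up for this example. A tracking Group ID typically shows up in the result when it has no entry in the group database.

```python
import grp


def unresolvable_gids(gids, lookup=grp.getgrgid):
    """Return the GIDs that cannot be resolved to a group name.

    lookup defaults to grp.getgrgid, which raises KeyError when the
    group database has no entry for the given GID.
    """
    missing = []
    for gid in gids:
        try:
            lookup(gid)
        except KeyError:
            missing.append(gid)
    return missing
```

On a Worker Node you could call unresolvable_gids(os.getgroups()) (after importing os) from within a batch job to see which secondary Group IDs would trip a check like this.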
I might have made some typos and style mistakes in the documentation and man pages. These will be fixed in a next release.