Difference between revisions of "LCMAPS Tracking GroupID plugin"
Latest revision as of 16:04, 24 May 2012
The LCMAPS Tracking Group ID plugin preserves the Tracking Group IDs issued by the batch system during a gLExec execution in a Multi User Pilot Job. Tracking Group IDs are added to batch jobs so that their processes can be tracked regardless of whether they escape the process tree.
Batch systems that use this feature are:
- Sun Grid Engine (SGE, now known as the Oracle Grid Engine)
- Condor-C batch system
Other batch systems are known to have the feature, but it doesn't seem to be used in (known) Grid deployments:
- LSF
- Torque/PBS
What's a Tracking Group ID?
A Tracking Group ID is a group ID issued by the batch system and attached to the first process of a user's batch job. When a process spawns a child process with the fork() system call, the child's copy of the process image also inherits the set of secondary Group IDs. All stray processes can therefore be rounded up by the secondary Group ID that each of the user's processes owns.
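The inheritance described above can be observed directly. The following is a minimal Python sketch, not part of the plugin: it forks a child, lets the child report its own secondary Group IDs through a pipe, and returns both lists. The function name child_groups_after_fork is made up for this illustration, and the sketch only runs on Unix-like systems.

```python
import os


def child_groups_after_fork():
    """Fork once and return (parent_groups, child_groups) as sorted lists."""
    parent_groups = sorted(os.getgroups())
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the secondary Group IDs it inherited from the parent.
        os.close(read_fd)
        data = ",".join(str(gid) for gid in sorted(os.getgroups()))
        os.write(write_fd, data.encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: read the child's report and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        raw = reader.read()
    os.waitpid(pid, 0)
    child_groups = [int(gid) for gid in raw.split(",") if gid]
    return parent_groups, child_groups


if __name__ == "__main__":
    parent, child = child_groups_after_fork()
    print("parent:", parent)
    print("child: ", child)
```

Both lists should be identical, which is why a Tracking Group ID attached to the first process of a job ends up on every descendant unless something explicitly resets the group list.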
Process tree example
Here is an example process tree on a Worker Node of a PBS/Torque based cluster. For illustration purposes all unrelated processes have been removed from the tree:
init-
    \_ pbs_mom
    |    \_ bash
    |    |    \_ 1337.stro.n /var/spool/pbs/mom_priv/jobs/1337.stro.nikhef.nl.SC
    |    |        \_ jobwrapper /opt/lcg/libexec/jobwrapper ./CREAM31337_jobWrapper.sh
    |    |            \_ CREAM31337_ -l ./CREAM31337_jobWrapper.sh
    |    |                \_ glexec /bin/bash payload.sh
    |    |                    \_ payload.sh
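To illustrate how processes such as the ones in this tree could be rounded up by their Tracking Group ID, here is a rough Python sketch. It assumes a Linux Worker Node, where /proc/<pid>/status contains a "Groups:" line listing the secondary Group IDs of a process. The function names parse_groups and pids_with_gid are invented for this example; batch systems have their own mechanism for this.

```python
import os


def parse_groups(status_text):
    """Return the set of secondary GIDs found in /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(gid) for gid in line.split(":", 1)[1].split()}
    return set()


def pids_with_gid(tracking_gid, proc_root="/proc"):
    """List the PIDs whose secondary groups include tracking_gid (Linux only)."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                if tracking_gid in parse_groups(handle.read()):
                    matches.append(int(entry))
        except OSError:
            # The process has exited or is not readable; skip it.
            continue
    return sorted(matches)


if __name__ == "__main__":
    # 1337 is an arbitrary example value, not a real Tracking Group ID.
    print(pids_with_gid(1337))
```

Because the lookup goes by group membership and not by parent PID, it also finds processes that have detached themselves from the process tree.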
gLExec's involvement
The problem is that gLExec ignores the tracking Group ID issued by the batch system. gLExec will use LCMAPS to decide to which Unix account the payload needs to be mapped. The Tracking Group ID plug-in will make LCMAPS aware of the tracking Group IDs and preserve them when LCMAPS is building the Unix account mapping resolution.
Distribution
The plugin is officially distributed via EMI.
A man page and documentation are packaged with it.
LCMAPS configuration
The LCMAPS configuration file for my gLExec was changed to add this snippet in the top part:
trackinggid = "lcmaps_tracking_groupid.mod"
              "--tracking-groupid-min 1000"
              "--tracking-groupid-max 2000"
and at the bottom the flow of plugins to execute was altered to the following:
pluginexecpolicy:
verify_proxy -> good
good -> trackinggid
trackinggid -> posix_enf
As you can see, I've placed trackinggid after the good plugin and before the posix_enf plugin. Replace the good plugin with your regular sequence of plugins to make it work. If you have multiple of these blocks, place 'trackinggid' right before each instance of posix_enf.
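To make the effect of the two range options more concrete, here is a small Python sketch of the selection they imply. This is not the plugin's code (the plugin is an LCMAPS module), and it assumes that the minimum and maximum are inclusive bounds, which this page does not state. The function names tracking_gids and merge_secondary_gids are hypothetical.

```python
def tracking_gids(process_gids, gid_min=1000, gid_max=2000):
    """Pick the GIDs of the calling process that fall in the tracking range.

    Assumption: both bounds are inclusive.
    """
    return sorted(gid for gid in set(process_gids) if gid_min <= gid <= gid_max)


def merge_secondary_gids(mapped_gids, process_gids, gid_min=1000, gid_max=2000):
    """Combine the mapped secondary GIDs with the preserved tracking GIDs."""
    merged = set(mapped_gids) | set(tracking_gids(process_gids, gid_min, gid_max))
    return sorted(merged)


if __name__ == "__main__":
    # Example values only: the pilot holds GIDs 100 and 1337,
    # and the payload account is mapped to secondary GIDs 300 and 301.
    print(merge_secondary_gids([300, 301], [100, 1337]))
```

In this sketch the pilot's own GID 100 is dropped because it lies outside the configured range, while 1337 survives the identity switch alongside the newly mapped groups.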
CAVEAT 1
As a safety measure, gLExec version 0.8.1-1 checks whether all newly mapped UserIDs, (primary) GroupIDs and secondary GroupIDs are resolvable to user names and group names. You will not be hindered by this detail, but do note that other sites might experience problems. We're still discussing our options, but my take on it is that the UID and primary GID can perfectly well be checked, whereas a failure of the secondary GID check is a false positive and should be regarded as a pedantic check that hinders Tracking GID deployments.
gLExec version 0.8.10 solves this problem by relaxing the check on the mapped (target) secondary Group IDs.
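The difference between the strict and the relaxed check can be sketched as follows. This is an illustration in Python using the standard pwd and grp modules, not gLExec's actual implementation. The function name mapping_is_acceptable and the strict_secondary switch are made up for this example, and the lookup functions are parameters so the sketch can be tried without real accounts.

```python
import grp
import pwd


def _resolves(lookup, numeric_id):
    """Return True if the lookup function knows a name for numeric_id."""
    try:
        lookup(numeric_id)
        return True
    except KeyError:
        return False


def mapping_is_acceptable(uid, primary_gid, secondary_gids,
                          strict_secondary=False,
                          user_lookup=pwd.getpwuid,
                          group_lookup=grp.getgrgid):
    """Check that a mapped identity resolves to names.

    The UID and primary GID are always checked. Secondary GIDs are only
    checked when strict_secondary is True, mimicking the older behaviour.
    """
    if not _resolves(user_lookup, uid):
        return False
    if not _resolves(group_lookup, primary_gid):
        return False
    if strict_secondary:
        return all(_resolves(group_lookup, gid) for gid in secondary_gids)
    return True
```

With a nameless Tracking GID among the secondary groups, the strict variant rejects the mapping while the relaxed variant accepts it.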
CAVEAT 2
I might have made some typos and style mistakes in the documentation and man pages. These will be fixed in the next release.
CAVEAT 3
We have not tested this on an AFS site. We know that AFS issues secondary GIDs itself. We have neither tested nor assessed (yet) what the potential security implications of preserving the AFS-issued secondary GIDs could be with respect to accessing the AFS token from the MUPJ's payload process.
