LCMAPS Tracking GroupID plugin

From PDP/Grid Wiki

Latest revision as of 17:04, 24 May 2012

The LCMAPS Tracking Group ID plugin preserves the Tracking Group IDs issued by the batch system during a gLExec execution in a Multi User Pilot Job (MUPJ). Batch systems add Tracking Group IDs to batch jobs so that a job's processes can be tracked even if they escape the process tree.

Batch systems that use this feature are:

  • Sun Grid Engine (SGE, now known as the Oracle Grid Engine)
  • Condor-C batch system

Other batch systems are known to support the feature, but it does not appear to be used in known Grid deployments:

  • LSF
  • Torque/PBS

What's a Tracking Group ID?

A Tracking Group ID is a group ID that the batch system issues and attaches, as a secondary group ID, to the first process of a user's batch job. When a process spawns a child process, the child created by the fork() system call inherits the parent's set of secondary group IDs. All stray processes can therefore be rounded up by the secondary group ID that each of the user's processes carries.
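The inheritance described above can be observed directly. The following is a minimal sketch, not part of the plugin: it forks a child and compares the supplementary group IDs the child reports with those of the parent. The function name `groups_seen_by_child` is made up for this illustration, and the sketch assumes a Unix-like system where `os.fork()` is available.

```python
import json
import os


def groups_seen_by_child():
    """Fork a child and return the supplementary group IDs it reports."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report our own supplementary groups, then exit immediately.
        os.close(read_fd)
        with os.fdopen(write_fd, "w") as out:
            json.dump(sorted(os.getgroups()), out)
        os._exit(0)
    # Parent: read the child's report and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as src:
        child_groups = json.load(src)
    os.waitpid(pid, 0)
    return child_groups


if __name__ == "__main__":
    print("parent:", sorted(os.getgroups()))
    print("child: ", groups_seen_by_child())
```

Both lists should be identical, which is the property a batch system relies on when it tags a job with a Tracking Group ID.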

Process tree example

Here is an example process tree on a Worker Node of a PBS/Torque-based cluster. For illustration purposes all unrelated processes have been removed from the tree:

init-
    \_ pbs_mom
    |    \_ bash
    |    |    \_ 1337.stro.n /var/spool/pbs/mom_priv/jobs/1337.stro.nikhef.nl.SC
    |    |        \_ jobwrapper /opt/lcg/libexec/jobwrapper ./CREAM31337_jobWrapper.sh
    |    |            \_ CREAM31337_ -l ./CREAM31337_jobWrapper.sh
    |    |                \_ glexec /bin/bash payload.sh
    |    |                    \_ payload.sh
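If a process in a tree like this one detaches from its parent, it can still be found through its secondary group IDs. The sketch below shows one way to do that on Linux by reading the `Groups:` line of `/proc/<pid>/status`. It is an illustration only, not how any particular batch system implements its tracking, and the function names are hypothetical.

```python
import os


def parse_groups(status_text):
    """Return the supplementary GIDs on the 'Groups:' line of a status file."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(field) for field in line.split()[1:]}
    return set()


def pids_with_group(tracking_gid, proc_root="/proc"):
    """List PIDs whose supplementary groups include tracking_gid (Linux only)."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                if tracking_gid in parse_groups(handle.read()):
                    matches.append(int(entry))
        except OSError:
            # The process exited, or its status file is not readable.
            continue
    return sorted(matches)
```

Calling `pids_with_group(1337)` would return every process that still carries the (made-up) tracking GID 1337, regardless of its position in the process tree.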

gLExec's involvement

The problem is that gLExec ignores the Tracking Group ID issued by the batch system. gLExec uses LCMAPS to decide to which Unix account the payload needs to be mapped. The Tracking Group ID plugin makes LCMAPS aware of the Tracking Group IDs and preserves them while LCMAPS builds the Unix account mapping.
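Conceptually, preserving the Tracking Group IDs means carrying them over from the calling process into the secondary group list of the mapped account. The sketch below illustrates that idea only; it is not the plugin's actual code. It assumes that Tracking Group IDs are recognised by falling within a numeric range (as the `--tracking-groupid-min` and `--tracking-groupid-max` options in the configuration section suggest) and that the bounds are inclusive. The function name and the example GIDs are hypothetical.

```python
def preserve_tracking_gids(current_gids, mapped_gids, gid_min, gid_max):
    """Append the caller's tracking GIDs to the mapped secondary GID list.

    Assumption: a GID counts as a tracking GID when gid_min <= gid <= gid_max.
    """
    result = list(mapped_gids)
    for gid in current_gids:
        if gid_min <= gid <= gid_max and gid not in result:
            result.append(gid)
    return result


if __name__ == "__main__":
    pilot_gids = [100, 1337, 5000]   # 1337 was issued by the batch system
    target_gids = [2001, 2002]       # secondary GIDs of the mapped account
    print(preserve_tracking_gids(pilot_gids, target_gids, 1000, 2000))
```

Only the GID inside the range is carried over; the pilot's other secondary groups are left behind, so the payload does not gain any of the pilot's ordinary group memberships.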

Distribution

The plugin is officially distributed via EMI.

A man page and documentation are packaged with it.

LCMAPS configuration

I changed the LCMAPS configuration file for my gLExec by adding this snippet to the top part:

trackinggid = "lcmaps_tracking_groupid.mod"
              "--tracking-groupid-min 1000"
              "--tracking-groupid-max 2000"

and at the bottom the flow of plugins to execute was altered to the following:

pluginexecpolicy:
verify_proxy -> good
good -> trackinggid
trackinggid -> posix_enf

As you can see, I've placed trackinggid after the good plugin and before the posix_enf plugin. To make this work in your own setup, replace the good plugin with your regular sequence of plugins. If you have multiple such blocks, place 'trackinggid' right before each instance of posix_enf.
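The placement rule can be checked mechanically. The helper below is a hypothetical sketch, not an LCMAPS tool: it only understands the simple `a -> b` rule form shown above and reports every rule that reaches posix_enf from a plugin other than trackinggid.

```python
def rules_missing_tracking(policy_text, tracker="trackinggid", enforcer="posix_enf"):
    """Return the rules that lead to the enforcer without passing the tracker.

    Assumption: every rule is on one line and has the plain form 'a -> b'.
    """
    offenders = []
    for line in policy_text.splitlines():
        if "->" not in line:
            continue  # policy name or blank line
        source, target = (part.strip() for part in line.split("->", 1))
        if target == enforcer and source != tracker:
            offenders.append(line.strip())
    return offenders


if __name__ == "__main__":
    policy = """pluginexecpolicy:
verify_proxy -> good
good -> trackinggid
trackinggid -> posix_enf
"""
    print(rules_missing_tracking(policy))
```

An empty result means that every path into posix_enf goes through trackinggid, at least for the rule form this sketch understands.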

CAVEAT 1

As a safety measure, gLExec version 0.8.1-1 checks whether all newly mapped UserIDs, (primary) GroupIDs and secondary GroupIDs resolve to user names and group names. You will not be hindered by this detail, but do note that other sites might experience problems. We're still discussing our options, but my take on it is that the UID and primary GID can perfectly well be checked, whereas a failure of the secondary GID check is a false positive and should be regarded as a pedantic check that hinders Tracking GID deployments.

gLExec version 0.8.10 solves this problem by relaxing the check on the mapped (target) secondary Group IDs.
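To see why a resolvability check trips over Tracking Group IDs, consider the sketch below. It is an illustration of the kind of check described in this caveat, not gLExec's actual code. It looks up each GID in the group database and reports the ones that have no group name, which is typically the case for GIDs a batch system hands out on the fly. The function name is hypothetical, and the `lookup` parameter exists only so the behaviour can be demonstrated without depending on the local group database.

```python
import grp


def unresolvable_gids(gids, lookup=grp.getgrgid):
    """Return the GIDs for which the group database has no entry."""
    missing = []
    for gid in gids:
        try:
            lookup(gid)
        except KeyError:
            # grp.getgrgid raises KeyError when the GID has no group entry.
            missing.append(gid)
    return missing


if __name__ == "__main__":
    # GID 1337 stands in for a tracking GID without a group entry.
    print(unresolvable_gids([0, 1337]))
```

A strict check would refuse the mapping as soon as this list is non-empty, while a relaxed check would tolerate unresolvable entries among the secondary GIDs.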

CAVEAT 2

I might have made some typos and style mistakes in the documentation and man pages. These will be fixed in a future release.

CAVEAT 3

We have not tested this on an AFS site. We know that AFS issues secondary GIDs itself. We have neither tested nor assessed (yet) the potential security implications of preserving the AFS-issued secondary GIDs with respect to accessing the AFS token from the MUPJ's payload process.