[[Image:MUPJ-CE-WN-gLExec.png|thumb|upright|400px|Multi User Pilot Job with CE & WN]]

gLExec is a program that acts as a light-weight 'gatekeeper'. It takes Grid credentials as input, applies the local site policy to '''authenticate''' and '''authorize''' those credentials, switches to a new execution '''sandbox''', and executes the given command under the switched identity. gLExec can also act as a light-weight control point that only returns a binary ''yes''/''no'' verdict; this is called the logging-only mode.

gLExec makes the required mapping between the grid world and the Unix notion of users and groups, and can enforce that mapping by changing the uid and gids of running processes. It uses LCAS for access control and LCMAPS as the mapping engine. For a service running under a 'generic' uid, such as a web-services container, it provides a way to escape from that container uid; externally managed services run on a site's edge can use it in the same way. In a late-binding (pilot job) scenario, it allows the identity of the workload owner to be set at the instant the job starts executing. Through the LCMAPS SCAS client, a central mapping and authorization service (SCAS, or any interoperable SAML2-XACML2 service) can be used.

Local services, in particular computing services offered on Unix and Unix-like platforms, use a native representation of the user and group concepts that differs from the grid one. In the Unix domain these are expressed as numeric identifiers: each user is assigned a user identifier (uid) and one or more group identifiers (gids). At any one time a single gid is the 'primary' gid (pgid) of a particular process; this pgid is initially used for group-level process (and batch system) accounting. The uid and gid representation is local to each administrative domain.

The description, design and caveats are described [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec-chep2007-limited.pdf in the paper presented to the CHEP 2007 conference].
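To give an idea of the call pattern, the sketch below shows how a process holding another user's credentials might invoke gLExec. The paths, the proxy file name and the payload script are only illustrative, and the location of the gLExec binary differs per distribution; see [[Proxy file handling in gLExec]] for what the proxy-related environment variables do.

 # proxy of the user on whose behalf the payload should run (illustrative path)
 export GLEXEC_CLIENT_CERT=/tmp/payload-proxy.pem
 export GLEXEC_SOURCE_PROXY=/tmp/payload-proxy.pem
 # gLExec authenticates and authorizes these credentials against the local
 # site policy (LCAS/LCMAPS), switches uid/gid and runs the command
 /usr/sbin/glexec /bin/sh /path/to/payload.sh
 echo "gLExec returned exit code $?"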
= Current gLExec version =

The latest stable versions released are:
* gLite-3.2: 0.8.1
* EMI-1: 0.8.10
* EMI-2: 0.9.6
* EMI-3: 0.9.11

The latest version available is 0.9.11, released in EMI-3 and UMD-3. The latest OSG release, [https://www.opensciencegrid.org/bin/view/Documentation/Release3/WebHome OSG-3], ships gLExec 0.9.9.

= User information =

* [[Proxy file handling in gLExec]]: explains what the '''environment variables''' do with '''proxy''' files.
* [[GLExec TransientPilotJobs]]: describes how you may go about managing a '''target''' workload's '''directory''' in '''Pilot Job Frameworks'''.
* [[GLExec Environment Wrap and Unwrap scripts]]: describes how you can '''preserve''' '''environment''' variables between the calling side of gLExec and the user-switched side of gLExec, for example to carry the environment of a Pilot Job Framework through gLExec into the Pilot Job payload.

== Documentation ==

* [[Exit codes of gLExec]]
* [[Man pages of gLExec]]
* EMI-2 and EMI-3 information:
** [http://www.nikhef.nl/grid/lcaslcmaps/EMI2_docs/glexec_userguide.pdf EMI-2 User Guide PDF]

= Sysadmin information =

== Deployment: Installation and setups ==

* gLExec on the Worker Nodes or Computing Element:
** [[Using generic per-node pool accounts or a shared map database]]
** [[GLExec Argus Quick Installation Guide]]
** [[Using the SCAS]]
** [[Batch System Interoperability]]
** [[LCMAPS Tracking GroupID plugin]]
* [[Deployment Scenarios in EGEE and OSG]]
* [[Secure installation considerations]]
* [[Debugging hints]]
* [[GLExec Epilogue Functionality]] (version 0.9 and up)
* To help you master gLExec's security:
** [[Need to Know's]]: explains the role of '''LD_LIBRARY_PATH''' in combination with '''setuid''' programs.
** [https://www.nikhef.nl/pub/projects/grid/gridwiki/images/a/ab/Argus-SCAS-note-20100602.pdf Argus and SCAS note, d.d. June 2nd, 2010]: a quick guide on how to decide between SCAS and Argus as the central service used with gLExec.
* [[FAQs and misconceptions about gLExec]]

See also the [[#Background information|Background information]].

== To help you adapt or rebuild gLExec ==

* [[Building gLExec and its gLite dependencies from SVN source]]: how to build gLExec and all its gLite dependencies directly from source.
* [[Building gLExec from src rpm]]: how to build gLExec from a source RPM.

== Documentation ==

* [[Man pages of gLExec]]
* [[Service Reference Card for gLExec]]
* [[Papers about gLExec]]
* EMI-2 and EMI-3 information:
** [http://www.nikhef.nl/grid/lcaslcmaps/EMI2_docs/glexec_funcdesc.pdf EMI-2 Functional Description PDF]
** [http://www.nikhef.nl/grid/lcaslcmaps/EMI2_docs/glexec_servrefcard.pdf EMI-2 Service Reference Card PDF]
** [http://www.nikhef.nl/grid/lcaslcmaps/EMI2_docs/glexec_sysadminguide.pdf EMI-2 System Administrator's Guide PDF]

See also the [[#Background information|Background information]].

= Test plans/reports =

* [[EMI-1 gLExec release test report]]: the report describing the tests performed for the software certification of the components released with gLExec in EMI-1.
* [[EMI-2 gLExec release test report]]: the report describing the tests performed for the software certification of the components released with gLExec in EMI-2.

= Background information =

* [https://twiki.cern.ch/twiki/bin/view/LCG/GlexecDeployment LCG Deployment of gLExec on the Worker Node]
* [https://wlcg-tf.hep.ac.uk/wiki/Multi_User_Pilot_Jobs Multi User Pilot Jobs]

= Batch system interoperability =

When used on a worker node (in a late-binding pilot job scenario), gLExec tries hard to be neutral to its OS environment. In particular, gLExec will not break the process tree, and it accumulates the CPU and system usage times of the child processes it spawns. This is particularly important in the gLExec-on-WN scenario, where the entire process tree (pilot job and target user processes) should be managed as a whole by the node-local batch system daemon.

You are encouraged to verify OS and batch system interoperability. To do that, you have two options:

* Comprehensive testing: Ulrich Schwickerath has defined a series of (partially CERN-specific) tests to verify that gLExec does not break the batch system setup of a site. He has extensively documented his efforts on the Wiki at [https://twiki.cern.ch/twiki/bin/view/FIOgroup/FsLSFGridglExec https://twiki.cern.ch/twiki/bin/view/FIOgroup/FsLSFGridglExec]. Note that the Local Tools section there is CERN-specific: if you use other tools to clean up the user's work area (such as the $tmpdir facility of PBSPro and Torque), or use the [http://www.nikhef.nl/grid/sysutils/prune_users/ PruneUserproc utility] to remove stray processes, you are not affected by this.
* Basic OS and [http://www.nikhef.nl/grid/lcaslcmaps/glexec/osinterop batch-system] testing can be done even without installing gLExec, by compiling a simple C program with one hard-coded uid (a sketch is given below this list). This is the fastest way to test, but it only verifies that your batch system reacts correctly, not that your other grid-aware system scripts will work as you expect.
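A minimal sketch of such a probe and of how it might be used follows; the uid, file names and the root-owned setuid installation are illustrative, so adapt them to an unused scratch account on your own worker node.

 cat > setuid-test.c <<'EOF'
 /* minimal batch-system probe: switch to one hard-coded uid and keep
    running, so that accounting and job deletion can be checked */
 #include <stdio.h>
 #include <sys/types.h>
 #include <unistd.h>
 int main(void) {
     uid_t target = 12345;            /* unused local test uid, adjust */
     if (setuid(target) != 0) {       /* drop to the target identity   */
         perror("setuid");
         return 1;
     }
     execlp("sleep", "sleep", "600", (char *)NULL);
     perror("execlp");
     return 1;
 }
 EOF
 gcc -o setuid-test setuid-test.c
 # like gLExec itself, the probe must be root-owned and setuid to switch uid (run as root):
 chown root:root setuid-test && chmod 4711 setuid-test
 # submit it through the batch system, then check that job deletion still kills
 # the switched process and that CPU and wall-clock accounting remain correct

Because the probe exercises the same uid switch that gLExec performs, it shows whether your batch system keeps control of (and correctly accounts for) processes running under the switched identity, without any grid middleware involved.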
The following batch systems are known to be compatible with gLExec-on-the-Worker-Node:

* Torque, all versions
* OpenPBS, all versions
* Platform LSF, all versions
* BQS, all versions
* Condor, all versions

If you notice any anomalies after testing, e.g. a job that will not die, please notify the developers at grid dash mw dash security at nikhef dot nl.

= Deploying gLExec on the worker node =

== Using generic per-node pool accounts or a shared map database ==

The preferred way to deploy gLExec on the worker node is to use (VO-agnostic) generic pool accounts that are local to each worker node. This way you can be sure that a gLExec'ed job does not "escape" from the node, and it limits the number of pool accounts needed. For this configuration, you (a sketch of such a setup is given below the list):

* create at least as many pool accounts as you have job slots on a WN
* assign a worker-node-local gridmapdir (suggestion: <tt>/var/local/gridmapdir</tt>)
* create the local pool accounts with a local home directory (suggestion: account names <tt>wnpool00</tt> etc., and home directories in a local file system that has enough space, e.g. <tt>/var/local/home/wnpool00</tt>, etc.)
* configure the lcmaps.db configuration used by gLExec to refer to this gridmapdir
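A sketch of what that node-local setup might look like; the account names, uid range and paths follow the suggestions above but are otherwise arbitrary.

 # run as root on each worker node; one pool account per job slot (here: 8 slots)
 mkdir -p /var/local/gridmapdir /var/local/home
 for i in $(seq -f '%02g' 0 7); do
     useradd --uid 90${i} --create-home --home-dir /var/local/home/wnpool${i} \
             --comment "gLExec WN pool account" wnpool${i}
     # the gridmapdir needs one (empty) file per available pool account
     touch /var/local/gridmapdir/wnpool${i}
 done

The lcmaps.db used by gLExec then has to point its account-mapping policy at this gridmapdir, as described in the detailed installation procedure referenced below.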
 
  
Note that the /var/run/glexec directory is used to maintain the mapping between the target and the originator account, for easy back-mapping of running jobs. This information is of course also logged to syslog(3).

If you prefer shared pool accounts, you can use a shared atomic state database (implemented as an NFS directory) to host the gridmapdir. All operations on the gridmapdir are atomic, even over NFS, and it scales very well (remember that NFS is still the file sharing mechanism of choice for many large installations).

Detailed documentation is given at [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec-install-procedure.html http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec-install-procedure.html].

== Using the SCAS ==

If you prefer to use LCMAPS with the SCAS service, add the [http://etics-repository.cern.ch:8080/repository/download/registered/org.glite/org.glite.security.lcmaps-plugins-scas-client/0.2.8/ scas-client plugin] to the set of RPMs, and configure the SCAS client. You would add to <tt>/opt/glite/etc/lcmaps/lcmaps-glexec.db</tt>:

 scasclient = "lcmaps_scas_client.mod"
              " -capath /etc/grid-security/certificates/"
              " -endpoint https://graszaad.nikhef.nl:8443"
              " -resourcetype wn"
              " -actiontype execute-now"

and the following policy execution flow at the end:

 # policies
 glexec_get_account:
 verify_proxy -> scasclient
 scasclient -> posix_enf
 
 
 
= Using gLExec in a pilot job framework =
 
 
 
When you use gLExec with transient directories and input sandboxes, it is important that you create a writable directory for your target job, and that you do this in a safe and portable way. We provide a proof-of-principle implementation of how to create such a directory, and clean up after yourself, here:
 
 
 
* [https://ndpfsvn.nikhef.nl/cgi-bin/viewvc.cgi/pdpsoft/trunk/grid-mw-security/glexec/util/mkgltempdir/ https://ndpfsvn.nikhef.nl/cgi-bin/viewvc.cgi/pdpsoft/trunk/grid-mw-security/glexec/util/mkgltempdir/]
 
 
 
See also the more extensive text on [[GLExec TransientPilotJobs]].
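The essence of the approach, as a rough sketch: the paths and the proxy file name are illustrative, and the mkgltempdir utility linked above handles the corner cases this sketch glosses over.

 # pilot side: create a traversable parent directory owned by the pilot account
 PARENT=$(mktemp -d /tmp/pilot.XXXXXX)
 chmod 755 "$PARENT"
 # let the target user (selected via the payload proxy) create its own private
 # work directory in there, by running mkdir through gLExec
 export GLEXEC_CLIENT_CERT=/tmp/payload-proxy.pem
 /usr/sbin/glexec /bin/mkdir -m 0700 "$PARENT/payload"
 # ... run the payload with "$PARENT/payload" as its working area ...
 # the pilot cannot remove the target user's files itself, so clean up via gLExec
 /usr/sbin/glexec /bin/rm -rf "$PARENT/payload"
 rmdir "$PARENT"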
 
 
 
== Exit Codes ==
 
 
 
The exit codes that gLExec itself returns are:

201 - client error, which includes:
* no proxy is provided
* wrong proxy permissions
* the target location is not accessible
* the binary to execute does not exist
* the mapped user has no rights to execute the binary, or GLEXEC_CLIENT_CERT is not set

202 - system error, which includes:
* glexec.conf is not present or is malformed
* LCAS or LCMAPS initialization failure (this can be provoked, for testing, by moving the LCAS/LCMAPS db files aside)

203 - authorization error, which includes:
* the user is not white-listed
* a local LCAS authorization failure
* the user is banned by the SCAS server
* an LCMAPS failure on the SCAS server
* the SCAS server is not running
* the network cable is unplugged on the SCAS server host

204 - the exit code of the called application overlaps with the previous ones:
* the application called by gLExec itself exited with code 201, 202, 203 or 204
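Since a payload can in principle also exit with a code in the 201-204 range (which gLExec then reports as 204), a pilot wrapper typically records gLExec's own codes explicitly. A minimal sketch, with an illustrative payload path:

 /usr/sbin/glexec /bin/sh /path/to/payload.sh
 rc=$?
 case "$rc" in
     201) echo "gLExec client error (proxy, permissions, target or binary problem)" ;;
     202) echo "gLExec system error (glexec.conf or LCAS/LCMAPS initialization)" ;;
     203) echo "gLExec authorization error (denied locally or by SCAS/Argus)" ;;
     204) echo "payload exited with a code in the 201-204 range" ;;
     *)   echo "payload exit code: $rc" ;;
 esac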
 
 
 
= Deployment scenarios in EGEE and OSG =
 
 
 
The way gLExec is installed depends on the chosen scenario and on the way authorization is done in your infrastructure. Have a look at these installation and deployment guides for more information:
 
 
 
* [https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/GlexecInstall gLExec installations in Open Science Grid]
 
* YAIM supported installation in EGEE, both [https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#GLEXEC_wn YAIM site-info.def variables] and a specific section for [https://twiki.cern.ch/twiki/bin/view/EGEE/YAIM_glexec_wn gLExec on worker nodes] installed with YAIM
 
* Installing gLExec on the worker node (setuid) manually is described [https://twiki.cern.ch/twiki/bin/view/EGEE/GlexecOnWNConfig here].
 
 
 
= Pilot Job How To's =
 
 
 
The gLExec how-to's most relevant for pilot job frameworks:

* [[GLExec TransientPilotJobs]] describes how you may go about managing a target workload's transient area.
 
 
 
 
 
= Manual and documentation =
 
 
 
* [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec.1.html glexec(1) manual page]
* [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec.conf.5.html glexec.conf(5) manual page]
 
