(107 intermediate revisions by 3 users not shown) |
Line 1: |
Line 1: |
− | gLExec is a program that acts as a light-weight 'gatekeeper'. gLExec takes Grid credentials as input. By taking the local site policy into account it '''authenticates''' and '''authorizes''' the credentials. For extra safety gLExec is capable of creating a new execution sandbox based on the Grid credentials. Besides the ''yes''/''no'' control point functionality in the '''logging-only''' mode it can create identity specific sandboxes in the '''identity-switching''' mode. | + | [[Image:MUPJ-CE-WN-gLExec.png|thumb|upright|400px|Multi User Pilot Job with CE & WN]] gLExec is a program that acts as a light-weight 'gatekeeper'. gLExec takes Grid credentials as input. gLExec takes the local site policy into account to '''authenticate''' and '''authorize''' the credentials. gLExec will switch to a new execution '''sandbox''' and execute the given command as the switched identity. gLExec can also function as a light-weight control point that returns only a binary ''yes''/''no'' result; this is called the logging-only mode. |
| | | |
− | = Installation/setup, Functionality test, How To = | + | == Current gLExec version == |
| | | |
− | * gLExec on the Worker Nodes
| + | The latest stable versions released are: |
− | ** Using generic per-node pool accounts or a shared map database | + | * gLite-3.2: 0.8.1 |
− | ** Using the SCAS | + | * EMI-1: 0.8.10 |
− | ** [[Batch System Interoperability]] | + | * EMI-2: 0.9.6 |
− | * Deployment Scenarios (EGEE vs. OSG) | + | * EMI-3: 0.9.11 |
| | | |
− | * [[FAQs and misconceptions about gLExec]]
| + | Latest version available: 0.9.11, released in EMI-3 and UMD-3. Latest OSG release [https://www.opensciencegrid.org/bin/view/Documentation/Release3/WebHome OSG-3] is 0.9.9. |
− | * Debugging hints
| |
− | * How To's
| |
| | | |
− | = Online documentation (on other sites) = | + | == User information == |
| | | |
− | * Man pages | + | * [[Proxy file handling in gLExec]] describes what all the '''environment variables''' do with '''proxy''' files. |
− | * Service Reference Card | + | * [[GLExec TransientPilotJobs]] describes how you may go about managing a '''target''' workload's '''directory''' in '''Pilot Job Frameworks'''. |
| + | * [[GLExec Environment Wrap and Unwrap scripts]] describes how you can '''preserve''' the '''environment''' variables between the calling process of gLExec and the user switched side of gLExec. For example: to preserve the environment variables from a Pilot Job Framework, through gLExec and into Pilot Job Payload. |
| | | |
| + | === Documentation === |
| | | |
| + | * [[Exit codes of gLExec]] |
| + | * [[Man pages of gLExec]] |
| | | |
− | gLExec is a program to make the required mapping between the grid world and the Unix notion of users and groups, and has the capacity to enforce that mapping by modifying the uid and gids of running processes. It uses LCAS and LCMAPS for access control and the mapping engine. It can both act as a light-weight 'gatekeeper' replacement, and even be used on the worker node in late-binding (pilot job) scenarios. Through the LCMAPS SCAS client a central mapping and authorization service (SCAS, or any interoperable SAML2XACML2 service) can be used.
| + | * EMI-2 and EMI-3 information: |
| + | ** [http://www.nikhef.nl/grid/lcaslcmaps/EMI2_docs/glexec_userguide.pdf EMI-2 User Guide PDF] |
| | | |
− | The description, design and caveats are described [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec-chep2007-limited.pdf in the paper to the CHEP conference].
| + | == Sysadmin information == |
| | | |
− | Local services, in particular computing services offered on Unix [5] and Unix-like platforms, use a different native representation of the user and group concepts. In the Unix domain, these are expressed as (numeric) identifiers, where each user is assigned a user identifier (uid) and one or more group identifiers (gid). At any one time, a single gid will be the 'primary' gid (pgid) of a particular process. This pgid is initially used for group-level process (and batch system) accounting. The uid and gid representation is local to each administrative domain.
| + | === Deployment: Installation and setups === |
| | | |
| + | * gLExec on the Worker Nodes or Computing Element |
| + | ** [[Using generic per-node pool accounts or a shared map database]] |
| + | ** [[GLExec Argus Quick Installation Guide]] |
| + | ** [[Using the SCAS]] |
| + | ** [[Batch System Interoperability]] |
| + | ** [[LCMAPS Tracking GroupID plugin]] |
| + | * [[Deployment Scenarios in EGEE and OSG]] |
| + | * [[Secure installation considerations]] |
| + | * [[Debugging hints]] |
| + | * [[GLExec Epilogue Functionality]] (version 0.9 and up) |
| | | |
| + | * To help you master gLExec's security: |
| + | ** [[Need to Know's]]: Explains how '''LD_LIBRARY_PATH''' behaves in combination with '''setuid''' programs. |
| + | ** [https://www.nikhef.nl/pub/projects/grid/gridwiki/images/a/ab/Argus-SCAS-note-20100602.pdf Argus and SCAS note dated June 2nd, 2010]: quick guide on how to choose between SCAS and Argus as the central service with gLExec. |
| | | |
− | = Deploying gLExec on the worker node =
| + | * [[FAQs and misconceptions about gLExec]] |
− | | |
− | == Using generic per-node pool accounts or a shared map database ==
| |
− | | |
− | The preferred way to deploy gLExec on the worker node is by using (VO-agnostic) generic pool accounts that are local to each worker node. This way, you can be sure that a gLExec'ed job does not "escape" from the node, and it limits the number of pool accounts needed. For this configuration, you
| |
− | | |
− | * create at least as many pool accounts as you have job slots on a WN
| |
− | * assign a worker node local gridmapdir (suggestion: <tt>/var/local/gridmapdir</tt>)
| |
− | * create local pool accounts with a local home directory (suggestion: account names <tt>wnpool00</tt> etc, and home directories in a local file system that has enough space, e.g., /var/local/home/poolwn00, etc.)
| |
− | * configure the lcmaps.db configuration used by glexec to refer to this gridmapdir
| |
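The steps above can be sketched as a dry-run shell function that only prints the commands an administrator might run. This is a minimal sketch, not the documented installation procedure: the account prefix <tt>wnpool</tt>, the two-digit numbering, the <tt>useradd</tt> options and the convention of one empty file per pool account in the gridmapdir are assumptions for illustration and should be checked against the detailed installation documentation.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands for N node-local pool accounts.
# Assumed names: prefix "wnpool", gridmapdir and home base as suggested above.
GRIDMAPDIR=/var/local/gridmapdir
HOMEBASE=/var/local/home

print_pool_account_cmds() {
    slots=$1
    echo "mkdir -p $GRIDMAPDIR $HOMEBASE"
    i=0
    while [ "$i" -lt "$slots" ]; do
        name=$(printf 'wnpool%02d' "$i")
        # One local account with a local home directory per job slot.
        echo "useradd -m -d $HOMEBASE/$name $name"
        # Assumption: the gridmapdir holds one empty file per pool account.
        echo "touch $GRIDMAPDIR/$name"
        i=$((i + 1))
    done
}

# Example: a worker node with two job slots.
print_pool_account_cmds 2
```

Printing the commands instead of executing them keeps the sketch safe to try on any machine; the output can be reviewed and adapted before it is run as root.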
− | | |
− | Note that the /var/run/glexec directory is used to maintain the mapping between the target and the originator account for easy back-mapping for running jobs. This information is of course also logged to syslog(3).
| |
− | | |
− | If you like shared pool accounts, you can use a shared atomic state database (implemented as an NFS directory) to host the gridmapdir. All operations on the gridmapdir are atomic, even over NFS, and it scales really well (remember that NFS is still the file sharing mechanism of choice for many large installations)
| |
− | | |
− | Detailed documentation is given at [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec-install-procedure.html http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec-install-procedure.html].
| |
− | | |
− | == Using the SCAS ==
| |
− | | |
− | If you prefer to use LCMAPS with the SCAS service, add the [http://etics-repository.cern.ch:8080/repository/download/registered/org.glite/org.glite.security.lcmaps-plugins-scas-client/0.2.8/ scas-client plugin] to the set of RPMs, and configure the SCAS client. You would add to <tt>/opt/glite/etc/lcmaps/lcmaps-glexec.db</tt>:
| |
− | | |
− | scasclient = "lcmaps_scas_client.mod"
| |
− | " -capath /etc/grid-security/certificates/"
| |
− | " -endpoint https://graszaad.nikhef.nl:8443"
| |
− | " -resourcetype wn"
| |
− | " -actiontype execute-now"
| |
− | | |
− | and the following policy execution flow at the end:
| |
− | | |
− | # policies
| |
− | glexec_get_account:
| |
− | verify_proxy -> scasclient
| |
− | scasclient -> posix_enf
| |
− | | |
− | | |
− | == Deployment scenarios in EGEE and OSG ==
| |
− | | |
− | The way gLExec is installed depends on the chosen scenario and on the way authorization is done in your infrastructure. Have a look at these installation and deployment guides for more information:
| |
− | | |
− | * [https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/GlexecInstall gLExec installations in Open Science Grid]
| |
− | * YAIM supported installation in EGEE, both [https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#GLEXEC_wn YAIM site-info.def variables] and a specific section for [https://twiki.cern.ch/twiki/bin/view/EGEE/YAIM_glexec_wn gLExec on worker nodes] installed with YAIM
| |
− | * Installing gLExec on the worker node (setuid) manually is described [https://twiki.cern.ch/twiki/bin/view/EGEE/GlexecOnWNConfig here].
| |
− | | |
− | | |
− | == Secure installation considerations ==
| |
− | | |
− | To prevent a wrong installation of gLExec, which could lead to easy exploitation of the computer system, an outside source must be able to verify the installation. Consider the use of tripwire, rpm --verify <rpm package name> or a similar tool.
| |
− | At the moment the packages that we produce do not have the setuid bit set on the root-owned executable. This means that an admin needs to run YAIM or the chmod command manually to enable the setuid bit. Because the deployment needs this post-installation change to the executable, rpm --verify (and the Debian package equivalent) will inherently fail: it verifies not only the hash of the binary but also the file permissions.
| |
− | | |
− | It is pointless for gLExec itself to test whether its binary is, for example, world-writable. A failing test would send a strong signal to a potential attacker to rewrite the binary. On Linux and most Unix systems the setuid-root bit is stripped when the image is rewritten, making it a harmless executable at best. That outcome is still undesirable, and it could not be avoided if gLExec provided such a self-test.
| |
− | | |
− | == Service Reference Cards ==
| |
− | | |
− | The list of Service Reference Cards (https://twiki.cern.ch/twiki/bin/view/EGEE/ServiceReferenceCards) now has an entry for gLExec: https://twiki.cern.ch/twiki/bin/view/EGEE/GLExec
| |
| |
− | | |
− | = Using gLExec in a pilot job framework =
| |
− | | |
− | When you use glexec with transient directories and input sandboxes, it's important that you create a writable directory for your target job, and that you do this in a safe and portable way. We provide a proof-of-principle implementation of how to create such a directory, and clean up after yourself, here:
| |
− | | |
− | * [https://ndpfsvn.nikhef.nl/cgi-bin/viewvc.cgi/pdpsoft/trunk/grid-mw-security/glexec/util/mkgltempdir/ https://ndpfsvn.nikhef.nl/cgi-bin/viewvc.cgi/pdpsoft/trunk/grid-mw-security/glexec/util/mkgltempdir/] | |
− | | |
− | See also the more extensive text on [[GLExec TransientPilotJobs]].
| |
− | | |
− | == Exit Codes ==
| |
− | | |
− | The exit codes that glexec returns:
| |
− | | |
− | 201 - client error, which includes:
| |
− | * no proxy is provided
| |
− | * wrong proxy permissions
| |
− | * target location is not accessible
| |
− | * the binary to execute does not exist
| |
− | * the mapped user has no rights to execute the binary when GLEXEC_CLIENT_CERT is not set
| |
− | | |
− | 202 - system error
| |
− | * glexec.conf is not present or malformed
| |
− | * lcas or lcmaps initialization failure (can be reproduced by moving the lcas/lcmaps db files)
| |
− | | |
− | 203 - authorization error
| |
− | * user is not whitelisted
| |
− | * local lcas authorization failure
| |
− | * user banned by the SCAS server
| |
− | * lcmaps failure on the scas server
| |
− | * SCAS server not running
| |
− | * network cable unplugged on the SCAS server host.
| |
− | | |
− | 204 - exit code of the called application overlaps with the previous ones
| |
− | * application called by glexec exits with code 201, 202, 203 or 204
| |
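The list above can be turned into a small helper for wrapper scripts. This is a minimal sketch: the function name <tt>glexec_exit_class</tt> is hypothetical, and treating every other status as the exit code of the called application is an assumption inferred from the 204 entry, not a statement taken from the man page.

```shell
#!/bin/sh
# Map a gLExec exit status to the category listed above.
# Function name is hypothetical; it is not part of gLExec.
glexec_exit_class() {
    case "$1" in
        201) echo "client error" ;;
        202) echo "system error" ;;
        203) echo "authorization error" ;;
        204) echo "application exit code overlapped with 201-204" ;;
        # Assumption: any other status comes from the called application.
        *)   echo "exit code of the called application" ;;
    esac
}

# Usage, with the paths from the test script on this page:
#   /opt/glite/sbin/glexec /usr/bin/id -a
#   echo "gLExec: $(glexec_exit_class $?)"
glexec_exit_class 203
```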
− | | |
− | == Need to Know's ==
| |
− | | |
− | The gLExec executable can be installed in two ways: with and without the setuid (file system) bit on root. With the setuid bit enabled on root, gLExec is effectively executed with root privileges. Without the setuid or setgid bits on root the gLExec executable is like any other regular executable.
| |
− | | |
− | The safety features of gLExec are implemented with great care to avoid misuse and exploitation by anybody who executes it. As gLExec is typically installed with a setuid bit on root, this effectively means that anybody on the system is able to execute something with root privileges for a brief moment of time to perform the user switch.
| |
− | | |
− | A couple of safety features that are built into the gLExec tool are:
| |
− | | |
− | * The LD_LIBRARY_PATH, LD_RUN_PATH and other LD_* environment variables are removed from the process environment by the Operating System before the first line of gLExec code is executed by a Unix and Linux system. Only the /etc/ld.so.conf{.d/}, RPATH settings and other system specific paths are used and resolved. This statement holds for '''any''' setuid or setgid executable.
| |
− | | |
− | * The rest of the environment is stripped off by gLExec. There are a couple of environment settings that can easily lead to a root exploit in the standard library of a Unix and Linux system. Only the GLEXEC_* environment variables are kept. There is an option in the glexec.conf file to preserve more variables, but these must be selected with great care and setup by each System Administrator on all their machines.
| |
− | | |
− | * If the target user is authorized and a mapping and Unix process identity switch take place, HOME and X509_USER_PROXY are rewritten. Their values will contain the paths that are relevant for the target user account.
| |
− | | |
− | * The target user process has the Unix identity as mapped by LCMAPS. This could be from a separate set of pool accounts, or the regular set of pool accounts as given by the same user credentials from an LCG-CE or CREAM-CE. It could be a poolaccount defined locally on the machine. The only assumption that holds is that the target user account has the privileges that are appointed to them by the local site administrator.
| |
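The environment handling described above can be illustrated with a filter that shows which variables would survive a "keep only GLEXEC_*" rule. This is a sketch for illustration, not gLExec's own code: the function name <tt>keep_glexec_vars</tt> is hypothetical, the example variable values are made up, and the sketch ignores both the rewritten HOME and X509_USER_PROXY and any extra variables preserved through glexec.conf.

```shell
#!/bin/sh
# Illustration only: print the NAME=value lines that would survive a
# "keep only GLEXEC_*" filter. This is not how gLExec is implemented.
keep_glexec_vars() {
    grep '^GLEXEC_[A-Za-z0-9_]*='
}

# Example input with made-up values; only the GLEXEC_ line is printed.
printf '%s\n' \
    'LD_LIBRARY_PATH=/tmp/example/lib' \
    'GLEXEC_CLIENT_CERT=/tmp/example/proxy' \
    'PATH=/usr/bin' | keep_glexec_vars
```

Running <tt>env | keep_glexec_vars</tt> in a pilot job gives a rough idea of which variables a payload could still rely on; values that contain newlines are not handled by this line-based filter.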
− | | |
− | == How To's ==
| |
− | | |
− | To help you master the obstacles of gLExec's security we offer some interesting How To material:
| |
− | | |
− | * [[GLExec TransientPilotJobs]] describes how you may go about managing a target workload's transient area.
| |
− | * [[GLExec Environment Wrap and Unwrap scripts]] describes how you can preserve the environment variables between the calling process of gLExec and the user switched side of gLExec. For example: to preserve the environment variables from a Pilot Job Framework, through gLExec and into Pilot Job Payload.
| |
− | | |
− | | |
− | | |
− | | |
− | = Manual and documentation =
| |
− | | |
− | * http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec.1.html
| |
− | * http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec.conf.5.html
| |
− | | |
− | | |
− | = Debugging hints and answers to FAQ =
| |
− | | |
− | Here are some useful things to check and mention when contacting us for help:
| |
− | | |
− | == 1. Check the gLExec version: ==
| |
− | | |
− | The latest released version is gLExec version: 0.6.8-3
| |
− | | |
− | /opt/glite/sbin/glexec -v
| |
− | | |
− | == 2. Check the file permissions of the gLExec executable. ==
| |
− | | |
− | For all run-modes of gLExec, the gLExec executable must be executable for all users:
| |
− | 0111, 0555, 0755; owned by root:glexec or root:root.
| |
− | | |
− | For running gLExec in setuid mode, the gLExec executable must be executable for all users and have at least the setuid bit set for the user:
| |
− | 4111, 4555, 4755; owned by root:glexec or root:root.
| |
− | | |
− | For running gLExec in setuid mode, the gLExec executable can also be setgid (this solves a bug in the file log at the first run of gLExec):
| |
− | 6111, 6555, 6755; owned by root:glexec or root:root.
| |
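The permission modes listed above can be checked with a small classifier. This is a minimal sketch: the function name <tt>glexec_mode_class</tt> is hypothetical, and the suggested way to obtain the mode, <tt>stat -c '%a'</tt>, assumes GNU coreutils. Ownership (root:glexec or root:root) is not checked here.

```shell
#!/bin/sh
# Classify an octal permission mode against the modes listed above.
# Function name is hypothetical; it is not part of gLExec.
glexec_mode_class() {
    case "$1" in
        6111|6555|6755) echo "setuid+setgid" ;;
        4111|4555|4755) echo "setuid" ;;
        111|555|755|0111|0555|0755) echo "executable, no setuid" ;;
        *) echo "unexpected" ;;
    esac
}

# Usage on a real installation (GNU stat assumed):
#   glexec_mode_class "$(stat -c '%a' /opt/glite/sbin/glexec)"
glexec_mode_class 4755
```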
− | | |
− | == 3. Execute with exported GLEXEC_CLIENT_CERT and exported X509_USER_PROXY, with the full path ==
| |
− | export GLEXEC_CLIENT_CERT=`pwd`/mkproxy-x509-voms
| |
− | export X509_USER_PROXY=`pwd`/mkproxy-x509-voms
| |
− | | |
− | | |
− | == 4. Is the user account that tries to use gLExec whitelisted? ==
| |
− | | |
− | Method 1.: the calling account is a member of the 'glexec' primary or secondary group.
| |
− |
| |
− | Method 2.: the account or the pool is whitelisted in the glexec.conf. See the glexec.conf man page for more details on the whitelist options: [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec.conf.5.html#DESCRIPTION man 5 gLExec.conf]
| |
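Method 1 can be checked from the calling account itself. The sketch below is an illustration under stated assumptions: the function name <tt>in_glexec_group</tt> is hypothetical, and it only tests group membership; it does not read the whitelist options in glexec.conf.

```shell
#!/bin/sh
# Method 1 check: is "glexec" among a space-separated list of group names?
# Function name is hypothetical; it is not part of gLExec.
in_glexec_group() {
    for g in $1; do
        [ "$g" = "glexec" ] && return 0
    done
    return 1
}

# "id -Gn" prints the primary and secondary group names of the caller.
if in_glexec_group "$(id -Gn)"; then
    echo "calling account is in the glexec group"
else
    echo "not in the glexec group; check the whitelist in glexec.conf"
fi
```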
− | | |
− | == Example test script for gLExec ==
| |
− | | |
− | #!/bin/sh
| |
− |
| |
− | export GLEXEC_CLIENT_CERT=`pwd`/mkproxy-x509-voms
| |
− | export X509_USER_PROXY=`pwd`/mkproxy-x509-voms
| |
− |
| |
− | /opt/glite/sbin/glexec /usr/bin/id -a ; echo $?
| |
− | | |
− | | |
− | = FAQs and misconceptions about gLExec =
| |
− | | |
− | | |
− | '''Question''' : ''Is gLExec similar to [http://www.gratisoft.us/sudo/readme.html sudo]?''
| |
− | | |
− | '''Answer''' : No. Sudo is generically meant to execute a program or script with root-privileges. gLExec will not perform any task besides the actual user switch with root-privileges. All operational tasks within gLExec are performed with the privileges of either the calling (system/Unix) user (e.g. reading the proxy) or the mapped user (writing the proxy and executing the command).
| |
− | | |
− | | |
− | '''Question''' : ''Is gLExec like Apache's [http://httpd.apache.org/docs/2.0/suexec.html suexec]?''
| |
− | | |
− | '''Answer''' : No. gLExec does implement all the safety checks of suexec, but suexec lacks the advanced Grid credential authentication, authorization and account mapping features that we've built in. For example: gLExec's execution can be restricted to a limited list of accounts from a whitelist, it uses [[LCAS]] as a pluggable authorization framework and it uses [[LCMAPS]] as a pluggable framework to perform the local account mapping. The [[LCMAPS]] layer can also be extended to use [[SCAS]], [https://twiki.cern.ch/twiki/bin/view/EGEE/AuthorizationFramework Argus] or [https://www.racf.bnl.gov/Facility/GUMS/1.4/ GUMS].
| |
− | | |
− | | |
− | '''Statement''' : ''Is my Batch System able to handle identity switching during a job run?''
| |
− | | |
− | '''Response''' : Processes like the '''pbs_mom''' run with root-privileges and thus have all the privileges to manage all job types. Please have a look at the [https://www.nikhef.nl/pub/projects/grid/gridwiki/index.php/GLExec#Batch_system_interoperability Batch System Interoperability] experiences with different types of Batch Systems and a non-gLExec [http://www.nikhef.nl/grid/lcaslcmaps/glexec/osinterop testing tool].
| |
− |
| |
− | | |
− | '''Question''' : ''gLExec runs with elevated privileges, isn't it dangerous to offer gLExec on my Worker Nodes?''
| |
− | | |
− | '''Answer''' : Security measures built from the ground up prevent any use of the elevated privileges. Both the user process calling gLExec and the executed command with the target identity are unable to use gLExec's privileges. Multiple built-in security measures prevent a target user from being mapped to a root account or root group.
| |
− | | |
− | | |
− | '''Question''' : ''Who controls the gLExec run mode i.e. choice to run in '''Logging-only''' mode or '''setuid''' mode?''
| |
− | | |
− | '''Answer''' : The site is in full control of this choice. The system administrator will need to install the right configuration settings for the mapping process to function properly and will need to install gLExec with the required setuid bit enabled on the binary and with root ownership.
| |
− | | |
| | | |
− | '''Question''' : ''Can everybody on my system call gLExec?''
| + | See also the [[#Background information|Background information]] |
| | | |
− | '''Answer''' : No. To start using gLExec the Unix account that calls it must be whitelisted. There are a few options for whitelisting:
| + | === To help you adapt or rebuild gLExec === |
| + | * [[Building gLExec and its gLite dependencies from SVN source]] How to build gLExec and all its gLite dependencies directly from source. |
| + | * [[Building gLExec from src rpm]] How to build gLExec from a source RPM. |
| | | |
− | # Per account white listing: In the [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec.conf.5.html glexec.conf] file write '''user_white_list = okoeroo'''
| + | === Documentation === |
− | # Per pool of account white listing: In the [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec.conf.5.html glexec.conf] file write '''user_white_list = .atlpilot'''
| |
− | # By letting the calling account be a member of the special Unix group 'glexec'.
| |
− | # You could whitelist every account by using the wildcard '''*'''. There are good motivations why you want to do this and it should not blindly be regarded as a security risk. Please read ahead in the motivation section for details about this.
| |
| | | |
− | Note: Even the 'root' account itself needs to be whitelisted to be able to work with gLExec.
| + | * [[Man pages of gLExec]] |
− | For more information see the [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec.1.html glexec] and [http://www.nikhef.nl/grid/lcaslcmaps/glexec/glexec.conf.5.html glexec.conf] man pages for the user_white_list option.
| + | * [[Service Reference Card for gLExec]] |
| + | * [[Papers about gLExec]] |
| | | |
| + | See also the [[#Background information|Background information]] |
| | | |
− | '''Question''' : ''Isn't gLExec a risk to my infrastructure?''
| + | * EMI-2 and EMI-3 information: |
| + | ** [http://www.nikhef.nl/grid/lcaslcmaps/EMI2_docs/glexec_funcdesc.pdf EMI-2 Functional Description PDF] |
| + | ** [http://www.nikhef.nl/grid/lcaslcmaps/EMI2_docs/glexec_servrefcard.pdf EMI-2 Service Reference Card PDF] |
| + | ** [http://www.nikhef.nl/grid/lcaslcmaps/EMI2_docs/glexec_sysadminguide.pdf EMI-2 System Administrator's Guide PDF] |
| | | |
− | '''Answer''' : Taking the use case of Multi User Pilot Job Frameworks as an example; the Pilot Job frameworks have moved the front door of your site from the CE to the WN. gLExec on the Worker Nodes gives back control to the '''Sites''' which they have on their '''CEs''' and regular jobs. With the identity switching feature enabled it can give the '''VOs''' the opportunity to not be regarded as '''one''' user i.e. when one user in the VO goes rogue the entire VO is suspected and might be disallowed as a whole from a site.
| + | == Test plans/reports == |
| | | |
| + | * [[EMI-1 gLExec release test report]]: This report describes the tests performed for the software certification of the components released with gLExec. |
| + | * [[EMI-2 gLExec release test report]]: This report describes the tests performed for the software certification of the components released with gLExec. |
| | | |
− | == Motivations == | + | == Background information == |
| | | |
− | We've invited multiple vulnerability assessment teams to look at gLExec. They've assessed the code in a lot of detail and written multiple reports about the quality of the code and the vulnerabilities that were in it. The overall conclusions were that we've done a very good job over time in creating a very secure tool that does exactly what we advertise. We've built upon our experiences in the LCG-CE, gridFTPd and other security related tools that we've made over time. We have a strong drive to make gLExec even safer than it already is without compromising the usability of the tool.
| + | * [https://twiki.cern.ch/twiki/bin/view/LCG/GlexecDeployment LCG Deployment of gLExec on the Worker Node] |
| + | * [https://wlcg-tf.hep.ac.uk/wiki/Multi_User_Pilot_Jobs Multi User Pilot Jobs] |