Difference between revisions of "GLExec Epilogue Functionality"

From PDP/Grid Wiki
Latest revision as of 14:33, 22 May 2012

Starting from version 0.9 gLExec can optionally run an epilogue executable after the payload has finished.

General

In linger mode, gLExec can optionally run a trusted executable, intended to clean up the payload environment. Whether it runs is controlled by the gLExec option epilogue. The option should point to the absolute path of a trusted executable: it must not be possible for anyone except the root user (or the epilogue_user and/or members of the epilogue_group, when set) to change the executable. It will run as uid/gid 0,0 (unless epilogue_user and/or epilogue_group are set). If it does not finish within the configured epilogue_timeout, it will be sent a SIGTERM. For proper functioning it is advised that gLExec does the userswitch (instead of LCMAPS).

If the epilogue fails for whatever reason, gLExec will return either a 202 exit code (internal gLExec error) or potentially a 204 (e.g. when the epilogue itself returned an exit code in the 201-204 range).
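
A caller that wraps gLExec may want to tell these two codes apart from ordinary payload exit codes. The following is a minimal sketch, not part of gLExec: the function name classify_glexec_rc and the labels it prints are made up for illustration, and only the meanings of 202 and 204 are taken from the paragraph above.

```shell
#!/bin/sh
# classify_glexec_rc is a hypothetical helper; it is not part of gLExec.
# Only the meanings of 202 and 204 are taken from the text above.
classify_glexec_rc() {
    case "$1" in
        202) echo "internal-error" ;;   # internal gLExec error, e.g. a failed epilogue
        204) echo "reserved-range" ;;   # e.g. the epilogue returned a 201-204 code
        *)   echo "other" ;;            # anything else is not interpreted here
    esac
}

# Usage sketch (the gLExec invocation itself is left out):
#   <run gLExec>; classify_glexec_rc $?
```

Because a 202 can also have causes unrelated to the epilogue, a wrapper like this can only flag the code for further inspection of the logs.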

Logging

The epilogue runs with stdin, stdout and stderr all attached to /dev/null. No special logging functionality is implemented; logging is left to the developer of the epilogue code.
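
Since anything written to stdout or stderr is discarded, an epilogue has to send its messages elsewhere. One possibility is the standard logger utility, as in the sketch below. The tag, the syslog priority and the function names are illustrative choices, not gLExec defaults; the environment variables are the GLEXEC_EPILOG_ variables that gLExec sets for the epilogue.

```shell
#!/bin/sh
# Illustrative values, not gLExec defaults.
LOG_TAG="glexec-epilogue"
LOG_PRIO="daemon.info"

# Build a one-line "key=value" summary of the job being cleaned up.
# Variables that are unset are reported as "unknown".
epilogue_summary() {
    printf 'glexec_pid=%s target_pid=%s target_uid=%s target_rc=%s' \
        "${GLEXEC_EPILOG_GLEXEC_PID:-unknown}" \
        "${GLEXEC_EPILOG_TARGET_PID:-unknown}" \
        "${GLEXEC_EPILOG_TARGET_UID:-unknown}" \
        "${GLEXEC_EPILOG_TARGET_RC:-unknown}"
}

# Send one message to syslog using the tag and priority defined above.
log_syslog() {
    logger -t "$LOG_TAG" -p "$LOG_PRIO" "$*"
}

# Usage inside an epilogue:
#   log_syslog "$(epilogue_summary)"
```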

Configuration options

The epilogue can be configured using the following glexec.conf settings:

epilogue: When set, the name of the trusted binary or script to run. Needs to be an absolute canonical path.
epilogue_user: When set, the epilogue will be run with this user identity. In addition, this user is allowed to have write permission for the epilogue executable (i.e. is trusted). This option can only be used when gLExec does the userswitch. It can be useful if the script is located on an NFS with root squash. Default: root.
epilogue_group: When set, the epilogue will be run with this group identity. In addition, members of this group are allowed to have write permission for the epilogue executable (i.e. are trusted). When unset, the executable will be run with GID 0 and no group will be trusted. This option can only be used when gLExec does the userswitch. It can be useful if the script is located on an NFS with root squash.
epilogue_timeout: The epilogue executable will run for at most this timeout in seconds, before being sent a SIGTERM (and SIGKILL). Default: 300 seconds.

Runtime Environment

The epilogue runs with the same cleaned environment as gLExec sets up for the payload, with a number of additional variables, all starting with GLEXEC_EPILOG_. Any variables starting with GLEXEC_EPILOG_ that were set before gLExec was invoked are cleared before the epilogue is run.

Variable                    Description
GLEXEC_EPILOG_ARGV<N>       argv of payload
GLEXEC_EPILOG_GLEXEC_USER   calling user's username
GLEXEC_EPILOG_GLEXEC_GROUP  calling user's primary groupname
GLEXEC_EPILOG_GLEXEC_UID    calling user's uid
GLEXEC_EPILOG_GLEXEC_GID    calling user's primary gid
GLEXEC_EPILOG_GLEXEC_SGIDS  calling user's secondary gids, colon separated
GLEXEC_EPILOG_TARGET_USER   target user's username
GLEXEC_EPILOG_TARGET_GROUP  target user's primary groupname
GLEXEC_EPILOG_TARGET_UID    target user's uid
GLEXEC_EPILOG_TARGET_GID    target user's primary gid
GLEXEC_EPILOG_TARGET_SGIDS  target user's secondary gids, colon separated
GLEXEC_EPILOG_GLEXEC_PID    lingering gLExec process ID
GLEXEC_EPILOG_GLEXEC_SID    lingering gLExec session ID
GLEXEC_EPILOG_GLEXEC_PGID   lingering gLExec process group
GLEXEC_EPILOG_TARGET_PID    payload process ID
GLEXEC_EPILOG_TARGET_PGID   payload process group
GLEXEC_EPILOG_TARGET_RC     payload exit code
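
The secondary gid variables hold several values in one colon-separated string, so an epilogue usually has to split them before use. A minimal sketch of how this could be done in plain sh follows; split_gids and has_gid are hypothetical helper names, not something gLExec provides.

```shell
#!/bin/sh
# Hypothetical helpers for the colon-separated *_SGIDS variables.

# Print every gid of the colon-separated list $1 on its own line.
# An empty list produces no output.
split_gids() {
    [ -n "$1" ] || return 0
    printf '%s\n' "$1" | tr ':' '\n'
}

# Succeed if gid $2 occurs as a complete element of the list $1.
# Wrapping both sides in colons prevents partial matches (20 vs 200).
has_gid() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage inside an epilogue:
#   for gid in $(split_gids "$GLEXEC_EPILOG_TARGET_SGIDS"); do ...; done
```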

Security

  • To prevent tampering with the epilogue binary or script, the permissions must be such that only the root user, and optionally the epilogue user, has write access to the file or to any of its path components (i.e. it is "trusted-root").
  • gLExec becomes immune to signals from any user but root.
  • It is important to note that an epilogue should be written with utmost care:
    • it will (normally) be run by the root user
    • it is triggered automatically
    • blindly killing all processes of the payload user can kill good processes
    • ...
  • Logging should be done in a secure way, e.g. to either syslog or a trusted file location.
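
The "trusted-root" requirement from the first item can be checked by hand before deploying an epilogue. The sketch below is an approximation of that rule and not the check gLExec itself performs: it only accepts a single trusted owner, treats any group-writable entry as untrusted (so it does not model epilogue_group), and requires an absolute path. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; an approximation of the "trusted-root" rule,
# not the check that gLExec itself performs.

# Succeed if $1 exists, is owned by user $2 and is neither group- nor
# world-writable.
is_trusted_entry() {
    [ -e "$1" ] || return 1
    # find prints the entry only if it violates one of the conditions;
    # -H makes it look at the target when $1 is a symbolic link.
    bad=$(find -H "$1" -prune \( ! -user "$2" -o -perm -020 -o -perm -002 \) -print)
    [ -z "$bad" ]
}

# Succeed if the absolute path $1 and every directory above it, up to
# and including /, pass is_trusted_entry for user $2.
is_trusted_path() {
    case "$1" in
        /*) ;;
        *)  return 1 ;;   # relative paths are rejected
    esac
    entry=$1
    while :; do
        is_trusted_entry "$entry" "$2" || return 1
        [ "$entry" = "/" ] && return 0
        entry=$(dirname "$entry")
    done
}

# Usage (the path is an example only):
#   is_trusted_path /usr/local/sbin/my_epilogue root || echo "not trusted-root"
```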

Example usage

Providing a catch-all example script is not possible, as this depends heavily on site details. Sites might want to have a look at Nikhef's reaper script (http://www.nikhef.nl/grid/sysutils/prune_users/), intended to clean up daemonized processes after a grid job has finished.

Below is another example script that can be used, after the necessary changes, to clean up remaining subprocesses after the payload has finished. It assumes that gLExec uses process groups (the default behaviour).

#!/bin/sh
# Example epilogue script for use with gLExec 0.9.
#
# DISCLAIMER: THIS SCRIPT IS INTENDED AS EXAMPLE CODE ONLY
# DO NOT USE WITHOUT PROPERLY UNDERSTANDING WHAT IT IS DOING AND MAKING ALL NECESSARY CHANGES.
#
# Nikhef and/or the author do NOT accept any liability for any damage that might occur when using this script.
#
# Author: Mischa Salle <msalle@nikhef.nl>
#

# Location of logfile
LOGFILE=/var/log/glexec/epilog.log

# Some useful variables
LOGDIR=$(dirname $LOGFILE)
PROG=$(basename $0)
FMT="%b %e %H:%M:%S $(hostname) $PROG[$$]: "
EPIL_UNAME=$(whoami)
EPIL_UID=$(id -u "$EPIL_UNAME")

# Create the log directory and file if they do not exist and make sure that at
# least the logfile has the right permissions
[ ! -d $LOGDIR ] && { mkdir -p -m 0700 $LOGDIR || exit 1; }
[ ! -f $LOGFILE ] && { touch $LOGFILE || exit 1; }
chmod 0600 $LOGFILE || exit 1

# Log general information
echo $(date +"$FMT") "Running as uid $EPIL_UID ($EPIL_UNAME) for" \
    "glexec $GLEXEC_EPILOG_GLEXEC_PID," \
    "payload $GLEXEC_EPILOG_TARGET_PID" \
    "uid $GLEXEC_EPILOG_TARGET_UID ($GLEXEC_EPILOG_TARGET_USER)" >> $LOGFILE

# Get process IDs of all processes in the payload process group running as the
# target user. Note: this is no guarantee to catch all processes as the payload
# could have started a new process group.
pids=$(ps -o pid=,pgid= -u "$GLEXEC_EPILOG_TARGET_UID" | awk -v pgid="$GLEXEC_EPILOG_TARGET_PGID" '$2 == pgid { print $1 }')
if [ -n "$pids" ]; then
    echo $(date +"$FMT") "Sending SIGKILL to PIDs: `echo $pids`" >> $LOGFILE
    kill_output="$(kill -9 $pids 2>&1)"
    if [ $? -ne 0 ];then
        # Log error but do not fail epilogue on it
        echo $(date +"$FMT") "$kill_output" >> $LOGFILE
    fi
fi

# 1 second grace period
sleep 1

# Log remaining process IDs of processes running as the payload user: these
# might have come from this or other jobs
rem_pids=$(ps -opid= -u $GLEXEC_EPILOG_TARGET_UID)
if [ -n "$rem_pids" ];then
    echo $(date +"$FMT") "Remaining processes from uid" \
        "$GLEXEC_EPILOG_TARGET_UID:" $rem_pids >> $LOGFILE
else
    echo $(date +"$FMT") "No remaining processes from uid" \
        "$GLEXEC_EPILOG_TARGET_UID" >> $LOGFILE
fi

# All done
exit 0