GLExec Epilogue Functionality


Starting from version 0.9 gLExec can optionally run an epilogue executable after the payload has finished.


In linger mode, gLExec can optionally run a trusted executable intended to clean up the payload environment. This is enabled by the glexec option epilogue, which must point to the absolute path of a trusted executable: it must not be possible for anyone except the root user (or the epilogue_user and/or members of the epilogue_group, when set) to change the executable. The epilogue runs with uid/gid 0,0 unless epilogue_user and/or epilogue_group are set. If it does not finish within the configured epilogue_timeout, it is sent a SIGTERM. For proper functioning, it is advised to let gLExec do the user switch (instead of LCMAPS).

If the epilogue fails for whatever reason, gLExec returns either a 202 exit code (internal gLExec error) or potentially a 204 (e.g. when the epilogue itself returned an exit code in the 201-204 range).
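
As an illustration of how a calling script might act on these codes, the following shell sketch maps the two exit codes mentioned above to log-friendly descriptions. The function name describe_glexec_rc is hypothetical and not part of gLExec. Only the meanings stated above for 202 and 204 are encoded; any other value is reported as not epilogue-specific.

```shell
#!/bin/sh
# Hypothetical helper, not part of gLExec: describe a gLExec exit code
# using only the meanings documented on this page.
describe_glexec_rc() {
    case "$1" in
        202) echo "internal gLExec error (possibly a failed epilogue)" ;;
        204) echo "exit code in the 201-204 range (possibly returned by the epilogue)" ;;
        *)   echo "exit code $1 (not epilogue-specific)" ;;
    esac
}
```

A caller could run describe_glexec_rc $? directly after invoking gLExec and write the result to its own log. Neither code proves that the epilogue was the cause, which is why the messages say "possibly".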


The epilogue runs with stdin, stdout and stderr all attached to /dev/null. No special logging functionality is implemented; logging is left to the developer of the epilogue code.

Configuration options

The epilogue can be configured using the following glexec.conf settings:

epilogue When set, the trusted binary or script to run. Must be an absolute, canonical path.
epilogue_user When set, the epilogue will be run with this user identity. In addition, this user is allowed to have write permission for the epilogue executable (i.e. is trusted). This option can only be used when gLExec does the user switch. It can be useful if the script is located on an NFS file system with root squash. Default: root.
epilogue_group When set, the epilogue will be run with this group identity. In addition, members of this group are allowed to have write permission for the epilogue executable (i.e. are trusted). When unset, the executable will be run with GID 0 and no group will be trusted. This option can only be used when gLExec does the user switch. It can be useful if the script is located on an NFS file system with root squash.
epilogue_timeout The epilogue executable may run for at most this many seconds before being sent a SIGTERM (and SIGKILL). Default: 300 seconds.
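
Because the epilogue is sent a SIGTERM once epilogue_timeout expires, a longer-running epilogue may want to notice the signal and stop between steps rather than be cut off in the middle of one. The following is a minimal sketch, assuming the epilogue is a POSIX shell script. The names EPILOGUE_TERMINATED, handle_term and run_steps are illustrative and are not defined by gLExec.

```shell
#!/bin/sh
# Illustrative names only; gLExec does not define any of these.
EPILOGUE_TERMINATED=0

# Record that SIGTERM arrived; the main flow checks this flag between steps.
handle_term() {
    EPILOGUE_TERMINATED=1
}
trap handle_term TERM

# Run the given step functions in order, skipping the remaining ones once
# SIGTERM has been seen. Returns 0 only if no SIGTERM arrived.
run_steps() {
    for step in "$@"; do
        if [ "$EPILOGUE_TERMINATED" -eq 1 ]; then
            return 1
        fi
        "$step"
    done
    [ "$EPILOGUE_TERMINATED" -eq 0 ]
}
```

An epilogue would define its cleanup stages as shell functions and call, for example, run_steps kill_leftovers remove_scratch (both hypothetical function names). The shell only runs a trap handler after the current foreground command has returned, so each step should be short enough to complete before the SIGKILL.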

Runtime Environment

The epilogue runs with the same cleaned environment that gLExec sets up for the payload, plus a number of additional variables, all starting with GLEXEC_EPILOG_. Any variables starting with GLEXEC_EPILOG_ that were already set when gLExec was invoked are cleared before the epilogue is run.

GLEXEC_EPILOG_ARGV<N> argv of payload
GLEXEC_EPILOG_GLEXEC_USER calling user's username
GLEXEC_EPILOG_GLEXEC_GROUP calling user's primary groupname
GLEXEC_EPILOG_GLEXEC_UID calling user's uid
GLEXEC_EPILOG_GLEXEC_GID calling user's primary gid
GLEXEC_EPILOG_GLEXEC_SGIDS calling user's secondary gids, colon separated
GLEXEC_EPILOG_TARGET_USER target user's username
GLEXEC_EPILOG_TARGET_GROUP target user's primary groupname
GLEXEC_EPILOG_TARGET_UID target user's uid
GLEXEC_EPILOG_TARGET_GID target user's primary gid
GLEXEC_EPILOG_TARGET_SGIDS target user's secondary gids, colon separated
GLEXEC_EPILOG_GLEXEC_PID lingering gLExec process ID
GLEXEC_EPILOG_GLEXEC_SID lingering gLExec session ID
GLEXEC_EPILOG_GLEXEC_PGID lingering gLExec process group
GLEXEC_EPILOG_TARGET_PGID payload process group
GLEXEC_EPILOG_TARGET_RC payload exit code
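
The variables above can be read from an epilogue written in shell along the following lines. This is a sketch under stated assumptions: it assumes that the <N> in GLEXEC_EPILOG_ARGV<N> starts at 0 and is contiguous, which should be verified against the installed gLExec version. The function names payload_argv and split_sgids are hypothetical.

```shell
#!/bin/sh
# Print the payload's argv, one argument per line, from GLEXEC_EPILOG_ARGV<N>.
# ASSUMPTION: N starts at 0 and is contiguous; verify for your gLExec version.
payload_argv() {
    i=0
    while :; do
        # ${VAR+x} expands to "x" only when VAR is set (even if empty).
        eval "is_set=\${GLEXEC_EPILOG_ARGV${i}+x}"
        if [ -z "$is_set" ]; then
            break
        fi
        eval "printf '%s\n' \"\${GLEXEC_EPILOG_ARGV${i}}\""
        i=$((i + 1))
    done
}

# Print a colon-separated list of secondary GIDs ($1) one GID per line,
# e.g. the value of GLEXEC_EPILOG_TARGET_SGIDS. Prints nothing for an
# empty list.
split_sgids() {
    if [ -z "$1" ]; then
        return 0
    fi
    printf '%s\n' "$1" | tr ':' '\n'
}
```

The eval is limited to a variable name built from a fixed prefix and an integer counter, so the payload's own arguments are never evaluated as shell code.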


  • In order to prevent tampering with the epilogue binary or script, the permissions must be such that only the root user, and optionally the epilogue user, has write access to the file or to any component of its path (it is "trusted-root").
  • gLExec becomes immune to signals from any user but root.
  • It is important to note that an epilogue should be written with the utmost care:
    • it will (normally) be run by the root user
    • it is triggered automatically
    • blindly killing all processes of the payload user can kill good processes
    • ...
  • Logging should be done in a secure way, e.g. to either syslog or to a trusted file location.
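
The "trusted-root" requirement can be approximated with a check run before deployment, as in the sketch below. It is not the check gLExec itself performs, which remains authoritative. The function names component_is_trusted and path_is_trusted are hypothetical, and only a single trusted user is modelled (root by default, or the uid of the epilogue user); a trusted epilogue_group is not handled.

```shell
#!/bin/sh
# Hypothetical pre-deployment check, not part of gLExec.

# component_is_trusted PATH [UID]: succeed if PATH is owned by UID
# (default 0) and is not writable by group or others.
component_is_trusted() {
    trusted_uid=${2:-0}
    # "ls -ldn" prints e.g. "-rwxr-xr-x 1 0 0 ...": mode, links, numeric uid.
    info=$(ls -ldn -- "$1" 2>/dev/null) || return 1
    mode=${info%% *}
    owner=$(printf '%s\n' "$info" | awk '{ print $3 }')
    case "$mode" in
        # 6th character is the group write bit, 9th the others write bit.
        ?????w*|????????w*) return 1 ;;
    esac
    [ "$owner" = "$trusted_uid" ]
}

# path_is_trusted PATH [UID]: succeed if PATH is absolute and PATH and
# every parent directory up to / pass component_is_trusted.
path_is_trusted() {
    p=$1
    case "$p" in
        /*) ;;
        *) return 1 ;;
    esac
    while :; do
        component_is_trusted "$p" "${2:-0}" || return 1
        if [ "$p" = "/" ]; then
            break
        fi
        p=$(dirname "$p")
    done
    return 0
}
```

Calling path_is_trusted with the configured epilogue path returns non-zero as soon as one path component is owned by someone else or is writable by group or others. Symbolic links are reported as untrusted because their mode string is always lrwxrwxrwx, which fits the requirement that the configured path be canonical.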

Example usage

Providing a catch-all example script is not possible as this heavily depends on site details. Sites might want to have a look at Nikhef's reaper script, intended to clean up daemonized processes after a grid job has finished.

The following example script can be used, after the necessary changes, to clean up remaining subprocesses after the payload has finished. It assumes that gLExec uses process groups (default behaviour).

#!/bin/sh
# Example epilogue script for use with gLExec 0.9.
# Nikhef and/or the author do NOT accept any liability for any damage that might occur when using this script.
# Author: Mischa Salle <>

# Location of logfile
# NOTE: the original value is not preserved on this page. The path below is an
# assumed placeholder; point it at a trusted location writable only by root.
LOGFILE=/var/log/glexec/epilogue.log

# Some useful variables
LOGDIR=$(dirname "$LOGFILE")
PROG=$(basename "$0")
FMT="%b %e %H:%M:%S $(hostname) $PROG[$$]: "
EPIL_UID=$(id -u)
EPIL_UNAME=$(id -un)

# Create log directory and file if they do not exist and make sure at least the
# logfile has the right permissions
[ ! -d "$LOGDIR" ] && { mkdir -p -m 0700 "$LOGDIR" || exit 1; }
[ ! -f "$LOGFILE" ] && { touch "$LOGFILE" || exit 1; }
chmod 0600 "$LOGFILE" || exit 1

# Append a timestamped, syslog-style line to the logfile
log() {
    echo "$(date +"$FMT")$*" >> "$LOGFILE"
}

# Log general information
# NOTE: the tail of this message is missing from the source; logging the target
# identity is an assumed completion.
log "Running as uid $EPIL_UID ($EPIL_UNAME) for" \
    "target uid $GLEXEC_EPILOG_TARGET_UID ($GLEXEC_EPILOG_TARGET_USER)"

# Get process IDs of all processes in the payload process group running as the
# target user. Note: this is no guarantee to catch all processes as the payload
# could have started a new process group.
# The PGID is compared as a whole field, so PGID 123 does not also match 4123,
# and an empty PGID matches nothing.
pids=$(ps -o pid=,pgid= -u "$GLEXEC_EPILOG_TARGET_UID" |
       awk -v pgid="$GLEXEC_EPILOG_TARGET_PGID" '$2 == pgid { print $1 }')
if [ -n "$pids" ]; then
    log "Sending SIGKILL to PIDs:" $pids
    kill_output="$(kill -9 $pids 2>&1)"
    if [ $? -ne 0 ]; then
        # Log error but do not fail epilogue on it
        log "$kill_output"
    fi
fi

# 1 second gracetime
sleep 1

# Log remaining process IDs of processes running as the payload user: these
# might have come from this or other jobs
rem_pids=$(ps -o pid= -u "$GLEXEC_EPILOG_TARGET_UID")
if [ -n "$rem_pids" ]; then
    log "Remaining processes from uid $GLEXEC_EPILOG_TARGET_UID:" $rem_pids
else
    log "No remaining processes from uid $GLEXEC_EPILOG_TARGET_UID"
fi

# All done
exit 0