Various local tools

From PDP/Grid Wiki
Revision as of 13:58, 15 November 2013 by Ronalds@nikhef.nl

This article presents some local tools for use with the Torque resource manager. These tools work with Torque version 2.3.8. Other versions may require modifications.


when_idle

Purpose: execute a command when a node has drained.

Description: this script is executed from the epilogue script. At the end of each job, the script checks whether the following conditions are met:

  • The node is offline and the offline comment matches a certain pattern (default: when_idle)
  • The node is idle (contains no running batch jobs)
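
The two conditions can be sketched as a small shell function. This is an illustration only, not the actual when_idle code: the function name node_is_drained is hypothetical, and the sketch assumes that the output of "pbsnodes <node>" contains indented "state = ...", "jobs = ..." and "note = ..." lines, that the offline comment is stored in the node note, and that the pattern is a plain substring.

```shell
# Sketch under the assumptions above: read "pbsnodes <node>" style text on
# stdin and succeed only if the node is offline, tagged, and has no jobs.
node_is_drained() {
    pattern=${1:-when_idle}   # default tag taken from the description
    awk -v pat="$pattern" '
        # state may be a comma-separated list, e.g. "job-exclusive,offline"
        $1 == "state" && $2 == "=" { if ($0 ~ /offline/) offline = 1 }
        # the offline comment is assumed to appear as the node note
        $1 == "note"  && $2 == "=" { if (index($0, pat) > 0) tagged = 1 }
        # a non-empty jobs line means batch jobs are still assigned
        $1 == "jobs"  && $2 == "=" && NF > 2 { busy = 1 }
        END { exit !(offline && tagged && !busy) }
    '
}
```

A possible call from an epilogue would be: pbsnodes "$(hostname)" | node_is_drained when_idle && echo "node has drained". Whether the short or the fully qualified host name is needed depends on how the nodes are named in the Torque server.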

If the node is offline and idle, a script provided by the administrator is executed, unless its md5sum is already present in an archive. This ensures that the script gets executed only once. After execution of the administrator's script, its md5sum is added to the archive and a message is written to syslog. After a certain delay (15s), the node's status is cleared (i.e., the node is no longer offline).
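
The run-once behaviour can be illustrated with the following sketch. It is not the actual when_idle code: the function name run_once, the archive location passed as second argument and the syslog tag are hypothetical, and the archive is assumed to be a plain text file with one checksum per line.

```shell
# Sketch: run an administrator script at most once per distinct content.
# Usage: run_once <script> <archive-file>   (both names are hypothetical)
run_once() {
    script=$1
    archive=$2
    sum=$(md5sum "$script" | awk '{print $1}')
    [ -n "$sum" ] || return 1          # script missing or unreadable
    # Checksum already archived: this exact script has been run before.
    if [ -f "$archive" ] && grep -qxF "$sum" "$archive"; then
        return 0
    fi
    sh "$script"
    status=$?
    # Record the checksum only after execution, as in the description.
    echo "$sum" >> "$archive"
    logger -t when_idle "executed $script (md5 $sum), exit status $status" 2>/dev/null || true
    return $status
}
```

Because the decision is based on the checksum of the script's content, changing any byte of the script, for instance a date in a comment line, makes it eligible to run again.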

When the node boots into run level 3, a check determines whether the node is offline with the tag "when_idle". If so, the node state is cleared after a delay (600s) and a message is sent to syslog. This handles the situation in which the administrator's script reboots the node before the node's state could be cleared, which would otherwise keep the node offline.
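
A minimal sketch of such a boot-time check follows. The function name clear_offline_after_boot and its arguments are hypothetical, and the sketch assumes that "pbsnodes <node>" prints "state = ..." and "note = ..." lines and that the calling host is allowed to run "pbsnodes -c" (which clears the offline state) against the Torque server.

```shell
# Sketch: clear a leftover "when_idle" offline state after a reboot.
# Usage: clear_offline_after_boot <node> [pattern] [delay-seconds]
clear_offline_after_boot() {
    node=$1
    pattern=${2:-when_idle}
    delay=${3:-600}                    # delay from the description
    # Succeeds only if the node is offline and its note carries the tag.
    if pbsnodes "$node" | awk -v pat="$pattern" '
            $1 == "state" && $0 ~ /offline/    { offline = 1 }
            $1 == "note" && index($0, pat) > 0 { tagged = 1 }
            END { exit !(offline && tagged) }'
    then
        sleep "$delay"
        pbsnodes -c "$node" || return 1   # -c clears the offline state
        logger -t when_idle "cleared offline state of $node after boot" 2>/dev/null || true
    fi
}
```

Nodes that are offline for any other reason are left untouched, because the note must contain the tag before the state is cleared.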

Example: reboot the worker node once all batch jobs have finished, for instance to load a new kernel.

# 2013-09-04
# Substitute <DATE> to force a different checksum
# Put the action here
reboot


prune_userprocs


mom-taskset