DANS Job Scripts
The DANS Job Scripts
Currently there are 5 scripts for the DANS workflow, divided over its three phases. There are also 2 job monitoring scripts, which are used in both phase 2 and phase 3.
- Phase 1: Data Upload
- gen-tar-list
- upload-tar
- Phase 2: Data Compress
- compress-tar
- Phase 3: Data Verification
- verify-tar
- compare-checksums
- Job monitoring
- job-status
- job-info
Each job that is submitted by the 'compress-tar' and 'verify-tar' scripts is registered in the DANS job directory.
These scripts, as well as the layout of the DANS job directory, are described on this page.
gen-tar-list script
Before an archive can be uploaded to the grid, a listing of all entries needs to be made. This listing is split across multiple tarballs (.tar files) so that each tarball is at least 8 GB in size. The gen-tar-list script processes a full directory listing and splits it into separate '$ARCHIVE-nnnn.tar.lst' files, where 'nnnn' is a counter starting at 1. The gen-tar-list script takes a single argument
$ ./gen-tar-list ${ARCHIVE}-files.txt
but if no argument is specified then the name of the current archive is determined from the directory in which the 'gen-tar-list' script itself is located. Thus, if a copy of or symlink to the gen-tar-list script is in the directory
$HOME/dans/soundbites
then the archive name is deduced as 'soundbites'.
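The two behaviours described above (deducing the archive name from a directory and splitting a listing into numbered '.tar.lst' files) can be sketched as follows. This is a minimal illustration, not the actual gen-tar-list code: the function names are hypothetical, and the listing format of one "<size-in-bytes> <path>" entry per line is an assumption, since this page does not specify the format.

```shell
#!/bin/bash
# Hypothetical sketch, NOT the real gen-tar-list implementation.

# Deduce the archive name from a directory path,
# e.g. $HOME/dans/soundbites -> soundbites.
archive_name_from_dir() {
    basename "$1"
}

# Split a listing into <archive>-nnnn.tar.lst files.
# ASSUMPTION: every line of the listing is "<size-in-bytes> <path>".
# A new list is started once the current one has reached min_bytes.
split_listing() {
    local listing=$1 archive=$2 min_bytes=$3
    awk -v archive="$archive" -v min="$min_bytes" '
        BEGIN { n = 1; total = 0 }
        {
            out = sprintf("%s-%04d.tar.lst", archive, n)
            size = $1
            sub(/^[0-9]+ +/, "")      # drop the size column, keep the path
            print > out
            total += size
            if (total >= min) { close(out); n++; total = 0 }
        }
    ' "$listing"
}
```

Inside a script, the directory to pass to archive_name_from_dir could be obtained with "$(cd "$(dirname "$0")" && pwd)". For the 8 GB threshold mentioned above, min_bytes would be 8589934592 if GB is read as 1024^3 bytes.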
upload-tar script
job-status script
After jobs have been submitted to the grid you can track the status of these jobs using the 'job-status' script. It will scan the DANS job directory for all active jobs and will query the status of each of them. If a job has finished the 'job-status' script will retrieve the output automatically and will also record the job logging info in its corresponding DANS job directory. If there are no active jobs then the 'job-status' script performs no actions.
$ ./job-status -h
./job-status - check the status of all DANS 'RACM' jobs in /home/janjust/dans/gridjobs
Usage: ./job-status [-q|--quiet] [-d|--debug] [-k|--keepgoing] [--jobdir dir]
Where:
  --keepgoing   tells ./job-status to keep going after an error
  --jobdir dir  overrules the default value of the JOBDIR variable
The flag '-q' or '--quiet' suppresses a lot of output. The flag '-d' or '--debug' produces a lot of extra output. You can combine '-q' and '-d'.
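The scan for active jobs can be pictured with the sketch below. It is an illustration only, not the job-status source. It relies on the empty 'Status=...' marker files described in the job directory section of this page, and it assumes that every state other than 'Cleared' and 'Aborted' counts as active (so a 'Done' job, whose output still has to be retrieved, is listed). The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: list the jobs that still need attention.
# ASSUMPTION: each job directory holds one empty marker file 'Status=<state>'.
list_active_jobs() {
    local jobdir=${1:-$HOME/dans/gridjobs} dir marker state
    for dir in "$jobdir"/*/; do
        [ -d "$dir" ] || continue              # glob did not match anything
        for marker in "$dir"Status=*; do
            [ -e "$marker" ] || continue       # no status marker in this directory
            state=${marker##*Status=}
            case $state in
                Cleared|Aborted) ;;            # finished, nothing left to query
                *) printf '%s %s\n' "$(basename "$dir")" "$state" ;;
            esac
        done
    done
}
```

Each output line is a DANS jobid followed by its recorded state; a real status check would then query the grid for every job listed.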
job-info script
The 'job-info' script can be used to display the status of all jobs in the DANS job directory. It will only display information and will not query active jobs. This script is intended for troubleshooting purposes mostly.
$ ./job-info -h
./job-info - print info on all DANS 'RACM' jobs in /home/janjust/dans/gridjobs
Usage: ./job-info [-q|--quiet] [-d|--debug] [-k|--keepgoing] [--jobdir dir]
Where:
  --keepgoing   tells ./job-info to keep going after an error
  --jobdir dir  overrules the default value of the JOBDIR variable
The flag '-q' or '--quiet' suppresses a lot of output. The flag '-d' or '--debug' produces a lot of extra output. You can combine '-q' and '-d'.
The DANS job directory
The default location of the DANS job directory is $HOME/dans/gridjobs. Underneath this directory you will find one directory per job, named after the DANS jobid. These jobids are currently 5-digit numbers, e.g. 00128. In each of the job directories the following information is recorded:
- the job status
- the job's JDL file
- the jobid as seen by the grid WMS
For jobs that have completed the following files are also stored:
- the log of the 'glite-wms-job-output' command which was used to retrieve the job output
- the entire job log that was retrieved using the 'glite-wms-job-logging-info' command
- a directory 'output' where the output files of that job are stored.
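As an illustration of the 5-digit naming scheme, the sketch below computes the next free jobid from the directories that already exist. How the DANS scripts actually allocate jobids is not described on this page, so sequential numbering and the function name are assumptions.

```shell
#!/bin/bash
# Hypothetical helper: next 5-digit jobid, ASSUMING ids are handed out sequentially.
next_jobid() {
    local jobdir=$1 dir name max=0
    for dir in "$jobdir"/[0-9][0-9][0-9][0-9][0-9]/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # 10# forces base 10; a bare 00128 would be rejected as invalid octal.
        if (( 10#$name > max )); then
            max=$((10#$name))
        fi
    done
    printf '%05d\n' $((max + 1))
}
```

The '10#' prefix matters in bash arithmetic: numbers with leading zeros are otherwise read as octal, which fails for ids containing the digits 8 or 9.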
An example: gridjob 00128
The DANS grid job #00128 was run on June 12th 2012. It was a job to verify the MD5 checksums of an archive which was uploaded previously. The job completed successfully. This can all be seen by looking at the contents of the job directory:
$HOME/dans/gridjobs:
  Status=Cleared
  jdl
  job-get-output.log
  job-logging-info.log
  jobid
  output/adler32sums.txt
  output/md5sums.tar.gz
  output/stderror
  output/stdout
The first entry, 'Status=Cleared', is actually an empty file and it reflects the state of the job. The 'Status=Cleared' means that the job has completed and that its output was downloaded ("cleared") from the WMS. The following entries are possible for the 'Status=...' file:
- Status=Submitted : means the job has been submitted but has not been scheduled yet
- Status=Scheduled : means the job has been accepted by a grid site and is scheduled for execution
- Status=Running, Status=ReallyRunning : means the job is now actively running. The difference between 'Running' and 'ReallyRunning' is mostly a historic artefact.
- Status=Done : means the job has completed; the script that was run might have returned an error, but the WMS and batch system now consider the job 'done'.
- Status=Cleared : means the job has completed and that its output was downloaded ("cleared") from the WMS.
- Status=Aborted : means the job did not run successfully and was aborted by the grid WMS and/or batch system.
Normally a job directory contains only a single 'Status=...' entry.
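Reading and updating such a marker file could look like the sketch below. It only illustrates the "one empty 'Status=...' file per job directory" convention; the function names are hypothetical and this is not taken from the DANS scripts.

```shell
#!/bin/bash
# Hypothetical sketch of the 'Status=<state>' marker convention.

# Print the state recorded in a job directory; fail if there is no marker.
get_status() {
    local dir=$1 marker
    for marker in "$dir"/Status=*; do
        [ -e "$marker" ] && { echo "${marker##*Status=}"; return 0; }
    done
    return 1
}

# Record a new state: remove any old marker first so that only one remains.
set_status() {
    local dir=$1 state=$2
    rm -f "$dir"/Status=*
    : > "$dir/Status=$state"       # create the empty marker file
}
```

For example, `set_status "$HOME/dans/gridjobs/00128" Cleared` would leave 'Status=Cleared' as the only status entry in that directory.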
The file 'jdl' is the Job Description Language file that was used during the submission of the job. The contents of this file are:
Executable = "check-archive.sh";
Arguments = "RACM 1156 1167";
Stdoutput = "stdout";
StdError = "stderror";
InputSandbox = { "check-archive.sh", "adler32sum", "md5deep" };
OutputSandbox = { "stdout", "stderror", "adler32sums.txt", "md5sums.tar.gz" };
Requirements = other.GlueCEPolicyMaxCPUTime >= 300;
The JDL file shows us that this was a 'check-archive' job to verify the .tar.gz files from the archive RACM, numbered RACM-1156.tar.gz up to and including RACM-1167.tar.gz. The input sandbox lists the 'check-archive.sh' script itself and two binary programs that the script needs during execution. The first binary is used to calculate the ADLER32 checksum of the .tar.gz file itself; the second is the command used to calculate the MD5 checksums of an entire directory tree. The requested output files are 'stdout', 'stderror', 'adler32sums.txt' and 'md5sums.tar.gz'. These are the files we expect in the job output directory.
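A JDL file of this shape can be produced from a template. The sketch below is a hedged illustration: the function name and its three parameters are hypothetical, and only the attribute values are taken from the example above. It is not the code that the DANS scripts use to write their JDL files.

```shell
#!/bin/bash
# Hypothetical sketch: print a check-archive JDL for tarballs <first>..<last>
# of <archive>. Attribute values are copied from the job 00128 example.
make_check_jdl() {
    local archive=$1 first=$2 last=$3
    cat <<EOF
Executable = "check-archive.sh";
Arguments = "$archive $first $last";
Stdoutput = "stdout";
StdError = "stderror";
InputSandbox = { "check-archive.sh", "adler32sum", "md5deep" };
OutputSandbox = { "stdout", "stderror", "adler32sums.txt", "md5sums.tar.gz" };
Requirements = other.GlueCEPolicyMaxCPUTime >= 300;
EOF
}
```

Running `make_check_jdl RACM 1156 1167 > jdl` reproduces the contents of the file shown above.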
The 'job-get-output.log' and 'job-logging-info.log' files are the log files from the commands 'glite-wms-job-output' and 'glite-wms-job-logging-info'. These commands are called by the 'job-status' script after the job has finished running. These log files are normally not needed but they contain a lot of debugging and troubleshooting information about the job itself. For example, information on when and where the job was run can be found in the 'job-logging-info.log' file.
The 'jobid' file contains the jobid as recorded by the 'glite-wms-job-submit' command. It is only needed while the job is running, as the jobid is discarded by the WMS after the job has been "cleared" using the 'glite-wms-job-output' command.
The directory 'output' contains the expected output files. The file 'adler32sums.txt' is the list of ADLER32 checksums of all .tar.gz files that have been verified. The 'md5sums.tar.gz' file is a tarball containing the MD5 checksum files of each .tar.gz file that has been verified. The reason that the RACM-*.tar.md5sum files are again stored in a single .tar.gz file is that it simplifies the JDL file used for submission: the output sandbox then needs only one fixed file name instead of one entry per tarball.
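Bundling the per-tarball checksum files into a single output file could be done along these lines. This is a minimal sketch with a hypothetical function name; how check-archive.sh really assembles 'md5sums.tar.gz' is not shown on this page.

```shell
#!/bin/bash
# Hypothetical sketch: pack all <archive>-*.tar.md5sum files found in <dir>
# into one md5sums.tar.gz, so the OutputSandbox can name a single fixed file.
bundle_md5sums() {
    local dir=$1 archive=$2
    ( cd "$dir" && tar -czf md5sums.tar.gz "$archive"-*.tar.md5sum )
}
```

After the job output has been retrieved, `tar -xzf output/md5sums.tar.gz` unpacks the individual checksum files again.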