Job collections

From BiGGrid Wiki

When you have a large number of jobs to submit, we strongly advise you to use job collections or parametric jobs. This reduces the overhead of job submission and the load on the WMS (Workload Management System), the software module that handles job submissions.

A job collection is a set of mutually independent jobs that is submitted, monitored and controlled as a single request. Job collections are particularly useful when the sub-jobs have common input files. The WMS allows sub-jobs to share and inherit sandboxes, so only a single copy of each file is transferred even when it is used by many sub-jobs.

GOOD PRACTICE: Because jobs can fail, putting an enormous number of jobs into one collection makes it tedious to manage those that failed. At the same time, we have seen some subsystems choke on 1000+ jobs. So start small: begin with 50 jobs and scale up gradually.
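
One way to follow this advice is to generate the JDL in batches instead of writing it by hand. The following is a minimal sketch, not an official tool: it assumes a hypothetical worker script run_task.sh that takes a task number as its argument, and it reuses the attribute layout of the example collection on this page. The function name make_collection_jdl and the nodeN, outN and errN names are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch: write a job-collection JDL for tasks FIRST..LAST to stdout.
# run_task.sh and the nodeN/outN/errN names are hypothetical placeholders.
make_collection_jdl() {
  first=$1
  last=$2
  echo '['
  echo '  Type = "collection";'
  echo '  InputSandbox = {"run_task.sh"};'
  echo '  nodes = {'
  i=$first
  while [ "$i" -le "$last" ]; do
    echo '    ['
    echo '      JobType = "Normal";'
    echo "      NodeName = \"node$i\";"
    echo '      Executable = "/bin/sh";'
    echo "      Arguments = \"run_task.sh $i\";"
    echo '      InputSandbox = {root.InputSandbox};'
    echo "      StdOutput = \"out$i\";"
    echo "      StdError = \"err$i\";"
    echo "      OutputSandbox = {\"out$i\",\"err$i\"};"
    echo '      ShallowRetryCount = 1;'
    # Every node block except the last is followed by a comma.
    if [ "$i" -lt "$last" ]; then echo '    ],'; else echo '    ]'; fi
    i=$((i + 1))
  done
  echo '  };'
  echo ']'
}

# Example: split 200 tasks into four collections of 50 nodes each.
batch=50
start=1
while [ "$start" -le 200 ]; do
  end=$((start + batch - 1))
  make_collection_jdl "$start" "$end" > "collection_${start}_${end}.jdl"
  start=$((end + 1))
done
```

Each generated file can then be submitted as its own collection, so a failure in one batch only affects the 50 jobs in that batch.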


Example Job Collection

$ cat collection_ex.jdl
[
  Type = "collection";
  InputSandbox = {
    "input_common1.txt",
    "input_common2.txt"
  };
  nodes = {
    [
      JobType = "Normal";
      NodeName = "node1";
      Executable = "/bin/sh";
      Arguments = "script_node1.sh";
      InputSandbox = {
        "script_node1.sh",
        root.InputSandbox[0]
      };
      StdOutput = "myoutput1";
      StdError  = "myerror1";
      OutputSandbox = {"myoutput1","myerror1"};
      ShallowRetryCount = 1;
    ],[
      JobType = "Normal";
      NodeName = "node2";
      Executable = "/bin/sh";
      InputSandbox = {
        "script_node2.sh",
        root.InputSandbox[1]
      };
      Arguments = "script_node2.sh";
      StdOutput = "myoutput2";
      StdError  = "myerror2";
      OutputSandbox = {"myoutput2","myerror2"};
      ShallowRetryCount = 1;
    ],[
      JobType = "Normal";
      NodeName = "node3";
      Executable = "/bin/cat";
      InputSandbox = {root.InputSandbox};
      Arguments = "*.txt";
      StdOutput = "myoutput3";
      StdError  = "myerror3";
      OutputSandbox = {"myoutput3","myerror3"};
      ShallowRetryCount = 1;
    ]
  };
]

In this example, three jobs are run. The collection defines a common InputSandbox, from which node1 and node2 each inherit one file. The third job, node3, inherits the full InputSandbox and passes both files as arguments to the Unix cat command. The shell scripts print some information and then print the contents of every .txt file present in the working directory. Before submitting, create the two common input files with content of your choice:

$ echo "first input" > input_common1.txt
$ echo "2nd input" > input_common2.txt
$ cat script_node1.sh
#!/bin/sh

echo "Current date is `date`"
echo "Dumping now input files"
echo "**********************"
cat *.txt

$ cat script_node2.sh
#!/bin/sh

echo "Running machine is `hostname`"
ls -l
echo "Dumping now input files"
echo "**********************"
cat *.txt
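
Because both scripts only read the .txt files in their working directory, they can be tried locally before anything is submitted. The sketch below is a hypothetical helper (dry_run is not part of the gLite tools): it copies a script and the sandbox files that node would receive into a scratch directory and runs the script there with /bin/sh, roughly as the Executable and Arguments attributes in the JDL describe. It is only an approximation of the environment on a worker node.

```shell
#!/bin/sh
# Sketch: run one node's script locally in a scratch directory that holds
# only the script and the sandbox files listed after it.
# dry_run is a hypothetical helper, not part of the gLite tools.
dry_run() {
  script=$1
  shift
  scratch=$(mktemp -d) || return 1
  cp "$script" "$@" "$scratch"/ || return 1
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$scratch" && /bin/sh "./$(basename "$script")" )
  status=$?
  rm -rf "$scratch"
  return "$status"
}

# Example: node1 receives script_node1.sh and input_common1.txt.
# dry_run script_node1.sh input_common1.txt
```

A script that fails here will also fail on the grid, so this catches simple mistakes without waiting for a job to be scheduled.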

Submit and monitor the job. It is helpful to save the job identifier in a file (jobId in this case) with the -o option, so that later commands can read it with -i.

$ glite-wms-job-submit -d $USER -o jobId collection_ex.jdl

Connecting to the service https://wms.grid.sara.nl:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://wms.grid.sara.nl:9000/06zotEhQcQavkMOrrengnw

The job identifier has been saved in the following file:
/home/mgjansen/WMProxy_ex/jobId

==========================================================================

$ glite-wms-job-status -i jobId

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://wms.grid.sara.nl:9000/06zotEhQcQavkMOrrengnw
Current Status:     Running
Status Reason:      unavailable
Destination:        dagman
Submitted:          Thu Jul 20 10:12:14 2006 CEST
*************************************************************

- Nodes information:
    Status info for the Job : https://wms.grid.sara.nl:9000/Oya0mK9e41_8gFsjXi_FHQ
    Node Name:          node1
    Current Status:     Ready
    Destination:        celisa.grid.sara.nl:2119/jobmanager-lcgpbs-long
    Submitted:          Thu Jul 20 10:12:14 2006 CEST
*************************************************************
    Status info for the Job : https://wms.grid.sara.nl:9000/SYh5ArEgzExv-30lk0A7eA
    Node Name:          node2
    Current Status:     Waiting
    Destination:        ce.gina.sara.nl:2119/jobmanager-lcgpbs-short
    Submitted:          Thu Jul 20 10:12:14 2006 CEST
*************************************************************
    Status info for the Job : https://glite-rb3.ct.infn.it:9000/rjIEqDoIxLAP12yWrXszZw
    Node Name:          node3
    Current Status:     Ready
    Destination:        gb-ce-ams.els.sara.nl:2119/jobmanager-lcgpbs-short
    Submitted:          Thu Jul 20 10:12:14 2006 CEST
*************************************************************
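
With many nodes this output becomes long. The sketch below condenses it to one line per node. It assumes the "Node Name:" and "Current Status:" layout shown in the output above, and summarize_nodes is a hypothetical helper name; if your version of the tool formats its output differently, the patterns need adjusting.

```shell
#!/bin/sh
# Sketch: print "<node name> <status>" for each node found in
# glite-wms-job-status output read from standard input.
# Assumes the "Node Name:" / "Current Status:" layout shown above.
summarize_nodes() {
  awk '
    /Node Name:/ { name = $3 }
    /Current Status:/ {
      # The collection-level status comes before any node name; skip it.
      if (name != "") {
        status = $0
        sub(/^.*Current Status:[ \t]*/, "", status)
        print name, status
        name = ""
      }
    }
  '
}

# Example:
# glite-wms-job-status -i jobId | summarize_nodes
```

For the output above this would list node1, node2 and node3 with the statuses Ready, Waiting and Ready.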

Note that the node names are the ones specified in the JDL file. When all the jobs have finished, download and verify the job output. The following example uses --dir to create a new directory for the output:

$ glite-wms-job-output --dir ./myOp -i jobId
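
A quick first check of the downloaded output is to look for nodes that wrote anything to standard error. The following is a minimal sketch: it relies only on the StdError names from collection_ex.jdl (myerror1, myerror2, myerror3) and makes no assumption about how the directories below ./myOp are laid out, so it searches recursively. The function name nonempty_stderr is a placeholder.

```shell
#!/bin/sh
# Sketch: list non-empty stderr files below an output directory.
# The myerror* pattern matches the StdError names used in collection_ex.jdl.
nonempty_stderr() {
  find "$1" -type f -name 'myerror*' -size +0 | sort
}

# Example:
# nonempty_stderr ./myOp
```

An empty result means no node wrote to standard error; it does not by itself prove that every job produced the expected output, so the myoutput files still need to be inspected.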
