Job collections
When you have a large number of jobs to submit, we strongly advise you to use job collections or parametric jobs. This reduces the overhead of job submission and the load on the WMS (Workload Management System), the software component that handles job submissions.
A job collection is a set of mutually independent jobs that is submitted, monitored and controlled as a single request. Job collections are especially useful when the sub-jobs share common input files: the WMS allows sub-jobs to share and inherit sandboxes, so only a single copy of each file is transferred, even when it is used in many sub-jobs.
GOOD PRACTICE: Because some jobs may fail, putting an enormous number of jobs into one collection makes managing the failed ones a tedious activity. We have also seen some subsystems choke on collections of 1000+ jobs. So start small: begin with 50 jobs and scale up gradually.
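Writing 50 (or later 500) node descriptions by hand is error-prone, so it can help to generate the collection JDL with a small script and only change the node count when scaling up. The sketch below is a hypothetical helper: make_collection is not part of gLite, and it assumes that every node runs the same script (shipped once in the top-level InputSandbox and inherited via root.InputSandbox[0]) with its node number as argument. Adapt Executable, Arguments and the sandboxes to your own jobs.

```shell
#!/bin/sh
# make_collection N SCRIPT: print a collection JDL with N "Normal" nodes to stdout.
# Hypothetical helper, not part of gLite. Assumption: every node runs the same
# SCRIPT via /bin/sh with its node number as argument, and inherits SCRIPT from
# the top-level InputSandbox (root.InputSandbox[0]).
make_collection() {
    n=$1
    script=$2
    printf '[\n  Type = "collection";\n'
    printf '  InputSandbox = { "%s" };\n' "$script"
    printf '  nodes = {\n'
    i=1
    while [ "$i" -le "$n" ]; do
        printf '    [\n'
        printf '      JobType = "Normal";\n'
        printf '      NodeName = "node%d";\n' "$i"
        printf '      Executable = "/bin/sh";\n'
        printf '      Arguments = "%s %d";\n' "$script" "$i"
        printf '      InputSandbox = { root.InputSandbox[0] };\n'
        printf '      StdOutput = "myoutput%d";\n' "$i"
        printf '      StdError = "myerror%d";\n' "$i"
        printf '      OutputSandbox = { "myoutput%d", "myerror%d" };\n' "$i" "$i"
        printf '      ShallowRetryCount = 1;\n'
        # Node descriptions are comma-separated; no comma after the last one.
        if [ "$i" -lt "$n" ]; then
            printf '    ],\n'
        else
            printf '    ]\n'
        fi
        i=$((i + 1))
    done
    printf '  };\n]\n'
}
```

For example, make_collection 50 my_job.sh > collection_50.jdl (my_job.sh being your own, assumed, job script) produces a 50-node collection that you submit the same way as collection_ex.jdl. Once that runs cleanly, increase the count.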
Example Job Collection
$ cat collection_ex.jdl
[
  Type = "collection";
  InputSandbox = { "input_common1.txt", "input_common2.txt" };
  nodes = {
    [
      JobType = "Normal";
      NodeName = "node1";
      Executable = "/bin/sh";
      Arguments = "script_node1.sh";
      InputSandbox = { "script_node1.sh", root.InputSandbox[0] };
      StdOutput = "myoutput1";
      StdError = "myerror1";
      OutputSandbox = { "myoutput1", "myerror1" };
      ShallowRetryCount = 1;
    ],
    [
      JobType = "Normal";
      NodeName = "node2";
      Executable = "/bin/sh";
      InputSandbox = { "script_node2.sh", root.InputSandbox[1] };
      Arguments = "script_node2.sh";
      StdOutput = "myoutput2";
      StdError = "myerror2";
      OutputSandbox = { "myoutput2", "myerror2" };
      ShallowRetryCount = 1;
    ],
    [
      JobType = "Normal";
      NodeName = "node3";
      Executable = "/bin/cat";
      InputSandbox = { root.InputSandbox };
      Arguments = "*.txt";
      StdOutput = "myoutput3";
      StdError = "myerror3";
      OutputSandbox = { "myoutput3", "myerror3" };
      ShallowRetryCount = 1;
    ]
  };
]
In this example, three jobs are run. All share a common InputSandbox, from which node1 and node2 each inherit one file. The third job, node3, inherits the full InputSandbox and passes both .txt files as arguments to the unix cat command, so its output contains the contents of both files. The two shell scripts just print some information and the contents of every .txt file present in the working directory. Before submitting, create the two common input files with content of your choice, and the scripts script_node1.sh and script_node2.sh:
$ echo "first input" > input_common1.txt
$ echo "2nd input" > input_common2.txt
$ cat script_node1.sh
#!/bin/sh
echo "Current date is `date`"
echo "Dumping now input files"
echo "**********************"
cat *.txt
$ cat script_node2.sh
#!/bin/sh
echo "Running machine is `hostname`"
ls -l
echo "Dumping now input files"
echo "**********************"
cat *.txt
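Debugging a failed job on the grid is slower than debugging it locally, so it can pay off to try each node script on your own machine first. The sketch below is a hypothetical helper (dry_run_node is not part of gLite): it copies a script plus the input files that node would inherit into a fresh temporary directory and runs it there with /bin/sh, roughly imitating how the sandbox files end up in the job's working directory. It is only an approximation; the host, environment and installed software on a worker node will differ.

```shell
#!/bin/sh
# dry_run_node SCRIPT [INPUT...]: run SCRIPT locally in an empty temporary
# directory that contains only SCRIPT and the given input files.
# Hypothetical helper; it only approximates a worker node's working directory.
dry_run_node() {
    script=$1
    shift
    # Remaining arguments are the files this node would inherit from the sandbox.
    workdir=$(mktemp -d)
    cp "$script" "$@" "$workdir"/
    rc=0
    # Run in a subshell so the caller's current directory is left unchanged.
    ( cd "$workdir" && /bin/sh "$(basename "$script")" ) || rc=$?
    rm -rf "$workdir"
    return "$rc"
}
```

Mirroring the JDL above, dry_run_node script_node1.sh input_common1.txt and dry_run_node script_node2.sh input_common2.txt run the two scripts with the same inputs their nodes receive.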
Submit the collection and monitor it. It is helpful to save the job identifier in a file (jobId in this case) with the -o option:
$ glite-wms-job-submit -d $USER -o jobId collection_ex.jdl

Connecting to the service https://wms.grid.sara.nl:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://wms.grid.sara.nl:9000/06zotEhQcQavkMOrrengnw

The job identifier has been saved in the following file:
/home/mgjansen/WMProxy_ex/jobId

==========================================================================

$ glite-wms-job-status -i jobId

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://wms.grid.sara.nl:9000/06zotEhQcQavkMOrrengnw
Current Status: Running
Status Reason: unavailable
Destination: dagman
Submitted: Thu Jul 20 10:12:14 2006 CEST
*************************************************************

- Nodes information:
    Status info for the Job : https://wms.grid.sara.nl:9000/Oya0mK9e41_8gFsjXi_FHQ
    Node Name: node1
    Current Status: Ready
    Destination: celisa.grid.sara.nl:2119/jobmanager-lcgpbs-long
    Submitted: Thu Jul 20 10:12:14 2006 CEST
*************************************************************

    Status info for the Job : https://wms.grid.sara.nl:9000/SYh5ArEgzExv-30lk0A7eA
    Node Name: node2
    Current Status: Waiting
    Destination: ce.gina.sara.nl:2119/jobmanager-lcgpbs-short
    Submitted: Thu Jul 20 10:12:14 2006 CEST
*************************************************************

    Status info for the Job : https://glite-rb3.ct.infn.it:9000/rjIEqDoIxLAP12yWrXszZw
    Node Name: node3
    Current Status: Ready
    Destination: gb-ce-ams.els.sara.nl:2119/jobmanager-lcgpbs-short
    Submitted: Thu Jul 20 10:12:14 2006 CEST
*************************************************************
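With 50 or more nodes, reading the full status listing becomes tedious. The sketch below condenses it into a count per state. summarize_status is a hypothetical helper, not a gLite command; it relies only on the "Current Status:" lines in the listing format shown above. Keep in mind that the first such line belongs to the collection itself, the others to its nodes.

```shell
#!/bin/sh
# summarize_status: read the text printed by "glite-wms-job-status -i jobId"
# on stdin and print how many jobs are in each state.
# Hypothetical helper; note that the collection's own status line is counted too.
summarize_status() {
    grep 'Current Status:' \
        | sed 's/.*Current Status:[[:space:]]*//' \
        | sort \
        | uniq -c
}
```

Usage: glite-wms-job-status -i jobId | summarize_status. For the listing above this would report 2 jobs Ready, 1 Waiting, and 1 Running (the collection itself).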
Note that the node names are the ones specified in the JDL file. When all the jobs have finished, download and verify the job output. The following example uses --dir to store the output in a new directory:
$ glite-wms-job-output --dir ./myOp -i jobId
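To verify the output of many nodes at once, a small scripted check can flag the nodes that produced nothing or wrote to their error stream. The sketch below is a hypothetical helper (check_outputs is not part of gLite). It assumes the StdOutput/StdError names used in collection_ex.jdl (myoutput1..3, myerror1..3) and uses find, so it makes no assumption about how glite-wms-job-output lays out the per-node subdirectories under ./myOp.

```shell
#!/bin/sh
# check_outputs DIR: flag empty StdOutput files and non-empty StdError files
# anywhere below DIR. Hypothetical helper; file name patterns are assumed to
# match the StdOutput/StdError names in the JDL (myoutput*, myerror*).
check_outputs() {
    dir=$1
    if [ -z "$(find "$dir" -type f -name 'myoutput*')" ]; then
        echo "NO OUTPUT FILES FOUND under $dir"
    fi
    # Each node should have written something to its StdOutput file.
    find "$dir" -type f -name 'myoutput*' | sort | while IFS= read -r f; do
        if [ ! -s "$f" ]; then
            echo "EMPTY OUTPUT: $f"
        fi
    done
    # A non-empty StdError file usually means that node reported a problem.
    find "$dir" -type f -name 'myerror*' | sort | while IFS= read -r f; do
        if [ -s "$f" ]; then
            echo "NON-EMPTY ERROR: $f"
        fi
    done
}
```

Running check_outputs ./myOp prints nothing when every node wrote output and no node wrote to stderr.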
Related
- Using the Grid/ToPoS: helps you manage the completion of many jobs
- Parametric jobs: a different way to group the jobs you want to submit