Difference between revisions of "Stoomboot"

From Atlas Wiki

Revision as of 13:32, 8 December 2008

What is stoomboot

Stoomboot is a batch farm for local use at NIKHEF. It is in principle open to all NIKHEF users, but a login account does not automatically give access to Stoomboot. Contact helpdesk@nikhef.nl to gain access.

Hardware

Stoomboot consists of 16 nodes (stbc-01 through stbc-16), each equipped with dual quad-core Intel Xeon E5335 2.0 GHz processors and 16 GB of memory. The total number of cores is 128.

Software & disk access

All Stoomboot nodes run Scientific Linux 4.7. All NFS-mountable disks at NIKHEF are visible (/project/* and /data/*). Stoomboot does not run AFS, so no AFS directories, including /afs/cern.ch, are visible.

How to use stoomboot

Submitting batch jobs

Stoomboot is a batch-only facility. Jobs are submitted with the PBS qsub command:

unix> qsub test.sh
9714.allier.nikhef.nl

The argument passed to qsub is a script that will be executed in your home directory. The returned string is the job identifier and can be used to look up the status of the job, or to manipulate it later.

The output of the job appears in files named <jobname>.o<number>, e.g. test.sh.o9714 for the example above. The following default settings apply when you submit a batch job:

  • Job runs in home directory ($HOME)
  • Job starts with a clean shell (environment variables from the shell from which you submit are not transferred to the batch job). E.g. if you need the ATLAS software setup, it should be done in the submitted script
  • Job output (stdout) is sent to a file in the directory from which the job was submitted. Job stderr output is sent to a separate file. E.g. for the example above, file test.sh.o9714 contains stdout and file test.sh.e9714 contains stderr. If there is no stdout or stderr, an empty file is created
  • A mail is sent to you if the output files cannot be created
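Because the job starts with a clean shell, the submitted script has to do its own setup. The following is a minimal sketch of what a test.sh could look like, not an official template. The function name run_job, the variable SETUP_SCRIPT, and the path $HOME/setup_env.sh are hypothetical placeholders and must be replaced by whatever setup your software actually needs.

```shell
#!/bin/sh
# test.sh: minimal batch job sketch (illustrative only)

run_job() {
    # The job starts with a clean shell, so any software setup belongs here.
    # SETUP_SCRIPT is a hypothetical placeholder for your own setup file.
    setup="${SETUP_SCRIPT:-$HOME/setup_env.sh}"
    if [ -r "$setup" ]; then
        . "$setup"
    fi

    # Regular output ends up in the stdout file (test.sh.o<number>).
    echo "job running in $(pwd)"

    # Error output ends up in the stderr file (test.sh.e<number>).
    echo "example message on stderr" >&2
}

run_job
```

With the default settings described above, the first message would appear in the .o file and the second in the .e file of the job.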

Here is a list of frequently desired changes to the default behavior and the corresponding qsub options:

  • Merge stdout and stderr in a single file. Add option -j oe to qsub command (single file *.o* is written)
  • Choose batch queue. Right now there are two queues: test (30 min) and qlong (48 h). Add option -q <queuename> to qsub command
  • Choose different output file for stdout. Add option -o <filename> to qsub command
  • Pass all environment variables of submitting shell to batch job (with exception of $PATH). Add option -V to qsub command
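The options above can be combined in a single qsub call. The sketch below assembles such a command line from the options listed. The values qlong, test.log and test.sh are only illustrative, and the qsub call itself is guarded so that the snippet does nothing harmful on a machine without PBS.

```shell
#!/bin/sh
# Illustrative values; replace with your own queue, log file and script.
QUEUE=qlong
LOGFILE=test.log
JOB_SCRIPT=test.sh

# -j oe : merge stdout and stderr into a single file
# -q    : choose the batch queue
# -o    : choose the file that receives stdout
# -V    : pass the environment of the submitting shell to the job
QSUB_CMD="qsub -j oe -q $QUEUE -o $LOGFILE -V $JOB_SCRIPT"

# Print the command first so it can be checked before submitting.
echo "$QSUB_CMD"

# Submit only where qsub is installed and the job script exists.
if command -v qsub >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    qsub -j oe -q "$QUEUE" -o "$LOGFILE" -V "$JOB_SCRIPT"
fi
```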

Examining the status of your jobs

The qstat command shows the status of all your jobs. Status code 'C' indicates completed, 'R' indicates running and 'Q' indicates queued.

unix> qstat
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
9714.allier         test.sh          verkerke        00:00:00 C test

The qstat command only shows your own jobs, not those of other users. Completed jobs are listed with status 'C' only if they completed less than 10 minutes ago. The output of jobs that completed longer ago is kept, but those jobs are no longer listed in the status overview.
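To pick out the status code of one job in a script, the qstat listing can be filtered with awk. This is a sketch that assumes the column layout shown in the example above (status code in the fifth whitespace-separated column) and a job name without spaces. The helper name job_state is hypothetical and not part of PBS.

```shell
#!/bin/sh
# job_state JOBID
# Reads qstat-style output on stdin and prints the status code (C, R or Q)
# of the line whose first column equals JOBID. Prints nothing if the job
# is not listed, e.g. because it completed more than 10 minutes ago.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}
```

On Stoomboot this would be used as qstat | job_state 9714.allier, where the job identifier is written as it appears in the first column of the listing.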