Difference between revisions of "User:Wvengen@nikhef.nl/FSL on the Grid"
Latest revision as of 10:03, 31 December 2012
FSL is a comprehensive library of analysis tools for fMRI, MRI and DTI brain imaging data. It is being used by medical users of DutchGrid and packaged as part of the PoC distribution of VL-e.
Version 4.0 of FSL and later support cluster computing directly using the Sun Grid Engine (SGE).
This page contains some notes on an attempt to get FSL running on cluster systems other than SGE, and possibly eventually on a grid infrastructure. Because some of the SGE qsub options that are used do not translate directly to other systems, getting there still requires some effort.
Phase 1: Using FSL with Torque
Torque PBS is a widely used batch system (one could say an alternative to Sun Grid Engine). Enabling the use of FSL with Torque would be a second step in helping users unlock the power of local clusters. Both systems provide the command qsub to submit jobs to a cluster. While the behaviour of both (mostly) follows the Open Group Base Specifications, FSL requires some options that are not part of that specification. So the batch system needs to be detected, and the qsub invocation adapted accordingly.
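As an illustration of the detection step, here is a minimal sketch in shell. It is a heuristic and not the method used by fsl_sub or the attached script: it assumes that an SGE installation sets SGE_ROOT and provides qconf, and that a Torque installation provides pbsnodes. The function name detect_batch_system is hypothetical.

```shell
#!/bin/sh
# Heuristic batch system detection (an assumption for illustration,
# not taken from fsl_sub). Prints one of: sge, torque, none.
detect_batch_system() {
    # SGE normally exports SGE_ROOT and ships the qconf admin tool.
    if [ -n "${SGE_ROOT:-}" ] && command -v qconf >/dev/null 2>&1; then
        echo sge
    # Torque (and other PBS variants) ship pbsnodes.
    elif command -v pbsnodes >/dev/null 2>&1; then
        echo torque
    else
        echo none
    fi
}
```

A wrapper could then branch on the result, for example with `case "$(detect_batch_system)" in ... esac`, and fall back to running the command locally when no batch system is found.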
fsl_sub command-line options
- -T <minutes> Estimated job length in minutes, used to auto-set queue name
- SGE: system administrator must provide mapping of times to queue names
- Torque: either the same as SGE, or autodetected, since qstat -Q -f gives resources_max.cput as the maximum CPU time for each queue
- -q <queuename> Queue to submit to
- Open Group compliant
- -a <arch-name> Architecture
- SGE: available
- Torque: not available
- -p <job-priority> Lower priority
- Open Group compliant
- -M <email-address> Who to email
- Open Group compliant
- -j <jid> Place a hold on this task until job jid has completed
- SGE: use argument -hold_jid <jid>, with <jid> either a job id or job name
- Torque: use argument -W depend=afterok:<jid>, where <jid> must be a job id
- -t <filename> Specify a task file of commands to execute in parallel
- SGE: array job using -t 1-<#tasks> and task file as command argument
- Torque: array job using -t 1-<#tasks>
- -N <jobname> Specify a jobname as it will appear on queue
- Open Group compliant
- -l <logdirname> Where to output logfiles
- Open Group compliant with the -o <path_name> and -e <path_name> options, though here a directory name is assumed instead of a file, for which the behaviour is not specified
- SGE: allows using a directory, and creates the standard filenames in there
- Torque: does not allow a directory according to the manual page
- -m <mailoptions> Change the SGE mail options, see qsub for details
- Open Group compliant
- -F Use flags embedded in scripts to set SGE queuing options
- Corresponds to the Open Group -C directive_prefix option with the default prefix, which is already the default behaviour. This flag to fsl_sub disables the 'automatic' batch options
- -v verbose mode
- fsl_sub option to log what it's doing; no qsub effects
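The Torque queue autodetection mentioned for -T can be sketched as follows. This is only an illustration: the function name pick_queue is hypothetical, and it assumes that qstat -Q -f prints a "Queue: <name>" line per queue followed by "resources_max.cput = HH:MM:SS". It picks the queue with the smallest CPU time limit that still fits the requested number of minutes; queues without a cput limit are ignored.

```shell
#!/bin/sh
# Read `qstat -Q -f` style output on stdin and print the queue with the
# smallest resources_max.cput that is at least $1 minutes.
# Returns non-zero if no queue is large enough.
pick_queue() {
    awk -v want="$1" '
        $1 == "Queue:" { queue = $2 }
        $1 == "resources_max.cput" {
            # Convert [[HH:]MM:]SS to minutes.
            n = split($3, t, ":")
            secs = 0
            for (i = 1; i <= n; i++) secs = secs * 60 + t[i]
            mins = secs / 60
            if (mins >= want && (best == "" || mins < bestmins)) {
                best = queue
                bestmins = mins
            }
        }
        END { if (best != "") print best; else exit 1 }
    '
}
```

On a Torque system this would be used as `qstat -Q -f | pick_queue 90`. Because the function only reads standard input, it can be tried out on saved qstat output without access to a cluster.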
A start was made on a script that works with both SGE and Torque, but it is far from finished. One issue is that job dependency references are handled differently in SGE and Torque. File:Fsl sub.sh
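The difference in dependency handling can be made concrete with a small sketch that maps a job reference to the qsub arguments listed above. The function name hold_args is hypothetical and this is not taken from the attached script; it only covers the simple case of a single job reference.

```shell
#!/bin/sh
# Print the qsub arguments that hold a job until job $2 has completed.
# $1 is the batch system (sge or torque), $2 the job reference.
hold_args() {
    system=$1
    jid=$2
    case "$system" in
        # SGE accepts either a job id or a job name here.
        sge)    printf '%s\n' "-hold_jid $jid" ;;
        # Torque needs a job id; a job name would have to be resolved first.
        torque) printf '%s\n' "-W depend=afterok:$jid" ;;
        *)      echo "unknown batch system: $system" >&2; return 1 ;;
    esac
}
```

Since fsl_sub callers may pass a job name, a Torque version would need an extra lookup step from name to id before calling something like this; that step is left out of the sketch.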
Phase 2: Using FSL with gqsub
gqsub would be a convenient way to use the Sun Grid Engine interface while operating on the grid. This would make it possible to run FSL on the large distributed computing grid. File management is the main obstacle here, since the grid generally does not have a shared directory.
Links
- FSL and distribution packaging
- RPM building and the PoC
- Other FSL grid users
- Using FSL on the Grid (without cluster engine)
- On SGE and Torque array jobs