User:Wvengen@nikhef.nl/FSL on the Grid

From PDP/Grid Wiki
Revision as of 11:03, 31 December 2012 by Wvengen@nikhef.nl (talk | contribs) (→‎Phase 1: Using FSL with Torque: add script)

FSL is a comprehensive library of analysis tools for fMRI, MRI and DTI brain imaging data. It is being used by medical users of DutchGrid and packaged as part of the PoC distribution of VL-e.

FSL version 4.0 and later support cluster computing directly through the Sun Grid Engine (SGE).

This page contains some notes on an attempt to get FSL running on batch systems other than SGE, and possibly, in the end, on a grid infrastructure. Since FSL uses some SGE qsub options that have no direct equivalent on other systems, getting there still requires some effort.

Phase 1: Using FSL with Torque

Torque PBS is a widely used batch system (one could say an alternative to Sun Grid Engine). Enabling the use of FSL with Torque would be a second step towards letting users unlock the power of local clusters. Both systems provide the command qsub to submit jobs to a cluster. While both (mostly) behave according to the Open Group Base Specifications, some of the options that FSL requires are not part of that specification. So the batch system needs to be detected, and the qsub invocation adapted accordingly.
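The detection step could be sketched as below. This is only an illustration and not taken from FSL or from the attached script: the function name detect_batch_system is made up here, and the checks rest on two assumptions, namely that an SGE environment exports SGE_ROOT and that a Torque installation puts the pbsnodes command on the PATH (other PBS variants ship pbsnodes too, so the result is a guess rather than proof).

```shell
# Guess which batch system provides qsub on this host.
# Assumption: SGE environments export SGE_ROOT.
# Assumption: Torque installations provide the pbsnodes command.
# detect_batch_system is a hypothetical helper name.
detect_batch_system() {
    if [ -n "${SGE_ROOT:-}" ]; then
        echo sge
    elif command -v pbsnodes >/dev/null 2>&1; then
        echo torque
    else
        echo unknown
    fi
}
```

A wrapper around qsub could then branch on the printed value, for example with case "$(detect_batch_system)" in sge) ... ;; torque) ... ;; esac.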

fsl_sub command-line options

  • -T <minutes> Estimated job length in minutes, used to auto-set queue name
    • SGE: system administrator must provide mapping of times to queue names
    • Torque: either the same as SGE, or autodetected, since qstat -Q -f reports resources_max.cput as the maximum CPU time for each queue
  • -q <queuename> Queue to submit to
    • Open Group compliant
  • -a <arch-name> Architecture
    • SGE: available
    • Torque: not available
  • -p <job-priority> Lower priority
    • Open Group compliant
  • -M <email-address> Who to email
    • Open Group compliant
  • -j <jid> Place a hold on this task until job jid has completed
    • SGE: use argument -hold_jid <jid>, with <jid> either a job id or job name
    • Torque: use argument -W depend=afterok:<jid>, where <jid> must be a job id
  • -t <filename> Specify a task file of commands to execute in parallel
    • SGE: array job using -t 1-<#tasks> and task file as command argument
    • Torque: array job using -t 1-<#tasks>
  • -N <jobname> Specify a jobname as it will appear on queue
    • Open Group compliant
  • -l <logdirname> Where to output logfiles
    • Open Group compliant with the -o <path_name> and -e <path_name> options, though fsl_sub assumes a directory name here instead of a file name, and the behaviour in that case is not specified
    • SGE: allows using a directory, and creates the standard filenames in there
    • Torque: does not allow a directory according to the manual page
  • -m <mailoptions> Change the SGE mail options, see qsub for details
    • Open Group compliant
  • -F Use flags embedded in scripts to set SGE queuing options
    • Corresponds to the Open Group -C directive_prefix mechanism with its default prefix, which qsub applies by default. This flag to fsl_sub disables the 'automatic' batch options
  • -v verbose mode
    • fsl_sub option to log what it's doing; no qsub effects
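The queue autodetection mentioned for -T on Torque could look roughly like the sketch below. It reads qstat -Q -f output on standard input and prints the queue with the smallest resources_max.cput that still covers the requested number of minutes. The function name pick_queue is hypothetical, and the sketch assumes that each queue block starts with a "Queue: <name>" line and that the limit is written as HH:MM:SS. Queues without a CPU time limit are ignored.

```shell
# Pick the tightest queue whose CPU time limit covers the estimate.
# $1: estimated job length in minutes
# stdin: output of `qstat -Q -f`
# pick_queue is a hypothetical helper name.
pick_queue() {
    awk -v want="$1" '
        # Remember the queue the following attributes belong to.
        $1 == "Queue:" { queue = $2 }
        # Attribute lines look like: resources_max.cput = HH:MM:SS
        $1 == "resources_max.cput" {
            if (split($3, t, ":") != 3) next   # skip unexpected formats
            mins = t[1] * 60 + t[2]            # round down to whole minutes
            if (mins >= want && (best == "" || mins < bestmins)) {
                best = queue
                bestmins = mins
            }
        }
        # Fail when no queue with a large enough limit was found.
        END { if (best != "") print best; else exit 1 }
    '
}
```

It would be used as qstat -Q -f | pick_queue 120, with the printed name passed on to qsub -q. A non-zero exit status means no queue with a sufficient limit was found, in which case a mapping provided by the system administrator is still needed.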

A start was made on a script that works with both SGE and Torque, but it is far from finished. One issue is that job dependency references are handled differently in SGE and Torque. File:Fsl sub.sh
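The dependency difference can be shown with a small sketch that translates the fsl_sub -j option into the matching qsub arguments listed above. The function name hold_args and the batch system labels sge and torque are made up for this example and are not taken from the attached script.

```shell
# Print the qsub arguments that hold a job until <jid> has finished.
# $1: batch system label, either "sge" or "torque" (labels are assumptions)
# $2: job id (SGE also accepts a job name; Torque needs a job id)
# hold_args is a hypothetical helper name.
hold_args() {
    case "$1" in
        sge)    printf '%s\n' "-hold_jid $2" ;;
        torque) printf '%s\n' "-W depend=afterok:$2" ;;
        *)      echo "unsupported batch system: $1" >&2
                return 1 ;;
    esac
}
```

Because the output is meant to be word-split into separate arguments, it would be used unquoted, as in qsub $(hold_args torque 1234) job.sh. Note that afterok only releases the held job if the earlier job exited successfully, and that a job name passed through -j would first have to be resolved to a job id on Torque.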

Phase 2: Using FSL with gqsub

gqsub would be a convenient way to use the Sun Grid Engine interface while operating on the grid. This would make it possible to run FSL on the large distributed computing grid. File management is the main obstacle here, since the grid in general has no shared directory.

Links