Using GANGA with AMAAthena
Introduction
This guide gives step-by-step instructions for running AMAAthena within GANGA on a NIKHEF desktop (e.g. ribble.nikhef.nl). AMAAthena is an Athena package providing a framework for modular analysis. GANGA is an official ATLAS grid utility for distributed data analysis.
Preparation
Please follow the AMAAthena guide to set up CMT and check out the AMAAthena package.
Starting GANGA
Type the following commands in the directory PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/cmt, in a clean shell environment (i.e. with no Athena or CMT environment set up).
% source /project/atlas/nikhef/dq2/dq2_setup.sh.NIKHEF
% export DPNS_HOST=tbn18.nikhef.nl
% export LFC_HOST=lfc-atlas.grid.sara.nl
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef
HelloWorld jobs
Try the following commands in the Ganga shell to get your hands dirty :)
In [n]: j = Job()
In [n]: j.backend=PBS()
In [n]: j.submit()
In [n]: jobs
In [n]: j = j.copy()
In [n]: j.backend=LCG()
In [n]: j.submit()
In [n]: jobs
GANGA magic functions for cmtsetup
Inside GANGA, the complex CMT setup can be handled with two magic functions.
The following example shows how to set up the CMT environment for Athena 14.2.0 in 32-bit mode.
In [n]: config.Athena.CMTHOME = '/path/to/your/cmthome'
In [n]: cmtsetup 14.2.0,32
In [n]: setup
Running AMAAthena in GANGA
The example below assumes:
- Users have the following Athena job option files in the run directory of the AMAAthena package:
  - AMAAthena_jobOptions.py
  - Trigger_jobOptions.py
- The same run directory also holds the configuration files used below:
  - exampleaod.conf
  - reader.conf
Creating new GANGA job
In [n]: j = Job()
Setting application
In [n]: j.application = AMAAthena()
In [n]: j.application.option_file += [ File('../run/AMAAthena_jobOptions.py'), File('../run/Trigger_jobOptions.py') ]
In [n]: j.application.driver_config.config_file = File('../run/exampleaod.conf')
In [n]: j.application.driver_config.include_file += [ File('../run/reader.conf') ]
In [n]: j.application.prepare()
Setting input data
- StagerDataset
When using the StagerDataset, the AMAAthena job will use the Athena FileStager to copy dataset files from grid storage.
In [n]: j.inputdata = StagerDataset()
In [n]: j.inputdata.dataset += [ 'fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10' ]
- DQ2Dataset
When using the DQ2Dataset, GANGA will handle the file access externally from Athena.
In [n]: j.inputdata = DQ2Dataset()
In [n]: j.inputdata.dataset += [ 'fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10' ]
In [n]: j.inputdata.type = 'DQ2_DOWNLOAD'
Setting job splitter (optional)
The examples below ask each subjob to process at most 2 files.
- using StagerJobSplitter with StagerDataset
In [n]: j.splitter = StagerJobSplitter()
In [n]: j.splitter.numfiles = 2
- using DQ2JobSplitter with DQ2Dataset for jobs running on LCG
In [n]: j.splitter = DQ2JobSplitter()
In [n]: j.splitter.numfiles = 2
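To make the effect of numfiles concrete, the following plain-Python sketch shows the kind of grouping it implies. It does not use GANGA: chunk_files and the file names are hypothetical, and the real splitters may order or group the files differently.

```python
def chunk_files(files, numfiles):
    """Group files into consecutive chunks of at most numfiles entries."""
    if numfiles < 1:
        raise ValueError("numfiles must be at least 1")
    return [files[i:i + numfiles] for i in range(0, len(files), numfiles)]


# Hypothetical file names, for illustration only.
example_files = [
    "AOD_1.root",
    "AOD_2.root",
    "AOD_3.root",
    "AOD_4.root",
    "AOD_5.root",
]

# With numfiles = 2, five files give three subjobs; the last one gets
# the single remaining file.
subjob_inputs = chunk_files(example_files, 2)
for index, chunk in enumerate(subjob_inputs):
    print("subjob", index, "->", chunk)
```

The number of subjobs therefore grows with the size of the dataset: a smaller numfiles gives more, shorter subjobs.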
Setting computing backend
- using Stoomboot cluster
In [n]: j.backend = PBS()
For a long-running job, please also set
In [n]: j.backend.queue = 'qlong'
to avoid exceeding the walltime limit of the default PBS queue.
- using LCG
In [n]: j.backend = LCG()
StagerDataset is not yet supported for jobs on LCG. Please use DQ2Dataset instead:
In [n]: j.inputdata = DQ2Dataset()
In [n]: j.inputdata.dataset = []
Starting from Ganga 5.0.7, jobs submitted to the LCG backend require users to set one of the following:
- a computing cloud together with the DQ2JobSplitter, which lets Ganga decide how to distribute the jobs within that cloud:
In [n]: j.backend.requirements.cloud = 'NL'
In [n]: j.splitter = DQ2JobSplitter()
- a specific computing element, which forces the job to run there (use this only if you know what you are doing):
In [n]: j.backend.CE = 'gazon.nikhef.nl:2119/jobmanager-pbs-atlas'
Submitting job
In [n]: j.submit()
After job submission
Checking job status
GANGA automatically polls the status of your jobs and updates the local repository accordingly. A notification is shown whenever a job's status changes.
In addition, you can get a job summary table by:
In [n]: jobs
or a summary table for subjobs:
In [n]: j.subjobs
Result and output merging
For the moment, each completed (sub-)job returns a ROOT summary file. The file is stored in the summary sub-directory of the job's output directory.
For jobs using StagerJobSplitter, the RootMerger is automatically attached to the job, so that the summary ROOT files from the sub-jobs are merged once the whole job is completed.
For jobs using DQ2Dataset, the merging can be done manually once the whole job is completed. For example, assume each sub-job produces a ROOT summary file called summary/summary_mySample_confFile_exampleaod.conf_nEvts_1000.root. To merge them, one can do:
In [n]: merger = RootMerger()
In [n]: merger.files += ['summary/summary_mySample_confFile_exampleaod.conf_nEvts_1000.root']
In [n]: merger.overwrite = True
In [n]: merger.ignorefailed = True
In [n]: merger.merge(j)
The merged ROOT file has the same name and is created in the job's outputdir.
Killing and removing jobs
You can kill a job by calling
In [n]: j.kill()
or remove a job by
In [n]: j.remove()
Advanced usage
Restricting max. number of events
In [n]: j.application.max_events = '1000'
Running on more than one dataset
The StagerDataset supports wildcard specification in the dataset name. For example, if you want to run on all FDR2 Muon stream datasets, you can set the inputdata like the following:
In [n]: j.inputdata.dataset += ['fdr08_run2*physics_Muon*']
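As a rough illustration of which names such a pattern selects, the sketch below applies shell-style wildcard matching from the Python standard library. It runs outside GANGA and assumes that '*' matches any run of characters; the actual matching is done by GANGA and the dataset catalogue and may differ. The helper match_datasets and the second and third dataset names are hypothetical.

```python
from fnmatch import fnmatchcase


def match_datasets(pattern, names):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


known_datasets = [
    "fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10",
    # The two names below are made up for illustration.
    "fdr08_run2.0052280.physics_Egamma.merge.AOD.o3_f8_m10",
    "fdr08_run1.0003070.physics_Muon.merge.AOD.o1_f8_m10",
]

# Only names that start with fdr08_run2 and contain physics_Muon are kept.
selected = match_datasets("fdr08_run2*physics_Muon*", known_datasets)
for name in selected:
    print(name)
```

A broad pattern can select many datasets, so it is worth checking what it matches before submitting.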
Dealing with failed sub-jobs
It is quite possible that some sub-jobs fail. In this case, GANGA reports the whole job as failed. There is no need to resubmit the whole job; you can resubmit just the failed sub-jobs. Assuming you have a failed job, j:
In [n]: j.subjobs.select(status='failed').resubmit()
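The selection step can be pictured with a small stand-alone sketch. It uses a hypothetical SubJob stand-in rather than GANGA objects, and select_by_status is an illustrative helper, not part of the GANGA API; the status strings are the ones mentioned in this guide.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a GANGA sub-job: only an id and a status.
SubJob = namedtuple("SubJob", ["id", "status"])


def select_by_status(subjobs, status):
    """Return the sub-jobs whose status equals the requested one."""
    return [sj for sj in subjobs if sj.status == status]


subjobs = [
    SubJob(0, "completed"),
    SubJob(1, "failed"),
    SubJob(2, "completed"),
    SubJob(3, "failed"),
]

# Count sub-jobs per status, then pick out the failed ones.
summary = Counter(sj.status for sj in subjobs)
failed = select_by_status(subjobs, "failed")

print("status summary:", dict(summary))
print("sub-jobs to resubmit:", [sj.id for sj in failed])
```

Only the selected sub-jobs are touched; the completed ones keep their output.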
Failing jobs manually
Some unexpected issues in a job may leave Ganga unable to update the job status to failed as it should. In this case, you can force the job into the failed state manually:
In [n]: j.fail(force=True)
This stops Ganga from polling the status of a problematic job that may already be gone from the backend system.
Basic troubleshooting
GANGA tries to bring the stdout/stderr back to the client side even when the job has failed remotely on the Grid. For failed jobs, you can inspect them as follows:
In [n]: j.peek('stdout','less')
In [n]: j.peek('stderr','cat')
or
In [n]: j.peek('stdout.gz','zcat')
In [n]: j.peek('stderr.gz','zcat')
for the LCG jobs.
More information
Known issues/ToDo items
- StagerDataset not supported for jobs on LCG
--Hclee 16:17, 13 Aug 2008 (MET DST)