Using GANGA with AMAAthena
Introduction
This guide provides step-by-step instructions for running AMAAthena through GANGA. Users will run GANGA on a NIKHEF desktop (e.g. elel22.nikhef.nl) and submit AMAAthena jobs to Stoomboot (a PBS cluster) and to the LCG.
AMAAthena is an Athena package providing a framework for modular analysis. GANGA is an official tool for ATLAS distributed data analysis.
Preparation
Please follow https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/PhysicsAnalysisTools?topic=AMAMainPage to set up CMT and check out the AMAAthena package.
Starting GANGA session
Type the following commands in the directory PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/cmt, in a clean shell environment (i.e. with no Athena or CMT environment set up).
% source /project/atlas/nikhef/dq2/dq2_setup.sh.NIKHEF
% export DPNS_HOST=tbn18.nikhef.nl
% export LFC_HOST=lfc-atlas.grid.sara.nl
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef
Every time you start with a clean shell, you need to set up Ganga again with the lines given above.
The last command loads a system-wide ATLAS-specific configuration for your Ganga session. You can override the system-wide configuration by providing a ~/.gangarc file. The template of the ~/.gangarc file can be generated by:
% ganga -g
If you see the following prompt:
*** Welcome to Ganga ***
Version: Ganga-5-1-1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.

In [1]:
you are in a GANGA session. The GANGA session is an IPython shell with GANGA-specific extensions (modules), so you can program (in Python only, of course) inside the GANGA session.
Leaving GANGA session
To quit from a GANGA session, just press CTRL-D.
Getting familiar with GANGA
My first Grid job running a HelloWorld shell script
Now go to your project directory
cd /project/atlas/Users/yourusernamehere
and create 'myscript.sh'
#!/bin/sh
echo 'myscript.sh running...'
echo "----------------------"
/bin/hostname
echo "HELLO PLANET!"
echo "----------------------"
and the file 'gangaScript.py'. Do not forget to adapt the path below to your own directory structure:
j = Job()
j.application=Executable()
j.application.exe=File('/project/atlas/Users/yourusernamehere/myscript.sh')
j.backend=LCG()
j.submit()
This Ganga job means the following:
* Line 1 defines the job
* Line 2 sets it as an Executable
* Line 3 tells which file to run
* Line 4 tells where the job should run
* Line 5 submits the job
The important point here is that we have chosen LCG() as the backend, i.e. the script will be executed on the grid. Now start Ganga again and submit the job to the LCG grid:
In[n]: execfile("./gangaScript.py")
The status of the job can be monitored with
In[n]: jobs
After the job is submitted, GANGA is responsible for monitoring it while it is still running, and for downloading the output files (e.g. stdout/stderr) to the local machine when the job has finished.
When your job is completed, the job's output is automatically fetched from the Grid and stored in your gangadir directory. The exact output location can be found by:
In[n]: j.outputdir
Out[n]: /project/atlas/Users/yourusernamehere/gangadir/workspace/yourusernamehere/LocalAMGA/0/output
if 0 was the job ID. This was our first grid-job submitted via ganga!
Working with historical jobs
GANGA internally archives your previously submitted jobs (historical jobs) in the local job repository (i.e. gangadir), so you don't have to do the bookkeeping yourself. You can freely leave and re-enter GANGA and still have your historical jobs ready for future work.
To work with a historical job, first get the job instance from the repository. Start by listing your jobs:
In [n]: jobs
Out[1]: Job slice: jobs (12 jobs)
--------------
# fqid status name subjobs application backend backend.actualCE
# 17 submitted 1000 Executable LCG
# 18 submitted 2000 Executable LCG
# 20 completed 10 Executable LCG
# 28 submitted Executable LCG
# 29 submitted test_lcg Executable LCG
The table above lists the historical jobs in your GANGA repository indexed by fqid. For example, if you are interested in the job with id 29, you can get the job instance by
In [n]: j = jobs(29)
then you are all set to work with the job.
Please note that you CANNOT change the attributes of a historical job.
More GANGA jobs to run on different platforms
Now try the following commands in the Ganga shell to get your hands dirty :) Try to find out where the second job runs.
In [n]: j = Job()
In [n]: j.backend=Local()
In [n]: j.submit()
In [n]: jobs
In [n]: j = j.copy()
In [n]: j.backend=PBS()
In [n]: j.submit()
In [n]: jobs
Running AMAAthena in GANGA
The example below assumes:
- Users have the following Athena job option files in the run directory of the AMAAthena package:
  - AMAAthena_jobOptions.py
  - Trigger_jobOptions.py
  You can find them in the share directory of the AMAAthena package.
- Users have the following AMA driver configuration files in the run directory of the AMAAthena package:
  - exampleaod.conf
  - reader.conf
  You can find them in the Config directory of the AMAAthena package. After copying them into the run directory, modify exampleaod.conf by replacing
  include_file = Config/reader.conf
  with
  include_file = reader.conf
- Analysis is performed on dataset: fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10
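The include_file change can also be scripted rather than edited by hand. The following is a minimal sketch in plain Python, run outside the GANGA session; the function names localize_include and localize_include_in_file are hypothetical helpers introduced here for illustration, and the sketch assumes the setting appears on its own line in the form "include_file = Config/<name>".

```python
import re
from pathlib import Path


def localize_include(conf_text):
    """Rewrite 'include_file = Config/<name>' lines to 'include_file = <name>'.

    Hypothetical helper: only lines whose key is include_file are touched.
    """
    pattern = re.compile(r"^(\s*include_file\s*=\s*)Config/", flags=re.MULTILINE)
    return pattern.sub(r"\1", conf_text)


def localize_include_in_file(path):
    """Apply localize_include to a configuration file in place."""
    conf = Path(path)
    conf.write_text(localize_include(conf.read_text()))
```

For example, localize_include_in_file('../run/exampleaod.conf') would apply the change to the copy in the run directory, assuming that relative path is valid from where the script is started.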
GANGA magic functions for cmtsetup
Inside GANGA, one could deal with the complex CMT setup with two magic functions.
The following example shows how to set up the CMT environment for Athena 14.2.20 in 32 bit mode.
In [n]: config.Athena.CMTHOME = '/path/to/your/cmthome'
In [n]: cmtsetup 14.2.20,32
In [n]: setup
GANGA magic function resolving python conflict at CERN
This is specific for solving the python conflict between the LCG UI and the ATLAS release at CERN. If you plan to run jobs on lxplus (with the Local backend) or lxbatch (with the LSF backend), please apply the following magic function before j.submit() to resolve the issue:
In [n]: fixpython
Creating a new GANGA job
In [n]: j = Job()
Setting application
From the AMAAthena/cmt directory, start ganga and do:
In [n]: j.application = AMAAthena()
In [n]: j.application.option_file += [ File('../run/AMAAthena_jobOptions.py'), File('../run/Trigger_jobOptions.py') ]
In [n]: j.application.driver_config.config_file = File('../run/exampleaod.conf')
In [n]: j.application.driver_config.include_file += [ File('../run/reader.conf') ]
Starting from Ganga 5.1.7, you can set the AMA flags as follows:
In [n]: j.application.driver_flags = 'MuonSample=1 HasTopQuarks=1 DoDiLepton=1'
Finally, prepare the application tarball to be shipped to the grid worker node:
In [n]: j.application.prepare()
Setting input data
- StagerDataset
Please use StagerDataset with LSF and PBS backends for local jobs. When using the StagerDataset, the AMAAthena job will use the Athena FileStager to copy dataset files from a grid storage.
In [n]: j.inputdata = StagerDataset()
In [n]: j.inputdata.dataset += [ 'fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10' ]
To run over files in a local directory instead:
In [n]: j.inputdata = StagerDataset()
In [n]: j.inputdata.type = 'LOCAL'
In [n]: j.inputdata.dataset = ['/project/atlas/data/fdr2']
All the files with the name '*.root*' in this directory (and sub-directories) will be included.
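To preview which files such a directory would contribute, the same name pattern can be applied by hand. This is a rough sketch in plain Python, not the StagerDataset implementation; find_root_files is a hypothetical helper, and it assumes the pattern '*.root*' is matched against file names only, with shell-style wildcard semantics.

```python
import fnmatch
import os


def find_root_files(top):
    """Return all files under `top` (recursively) whose name matches '*.root*'.

    Hypothetical helper for previewing a local input directory.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if fnmatch.fnmatch(name, "*.root*"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling find_root_files('/project/atlas/data/fdr2') would list the candidate files for the example above, including names with a suffix after ".root".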
- DQ2Dataset
In [n]: j.inputdata = DQ2Dataset()
In [n]: j.inputdata.dataset += [ 'fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10' ]
In [n]: j.inputdata.type = 'DQ2_LOCAL'
DQ2_DOWNLOAD copies files from the SE to the local disk of the WN and then runs over them, whereas DQ2_LOCAL runs directly over the files on the SE. Starting from Ganga 5.1.4, a further type, FILE_STAGER, is supported; it uses the FileStager package of Athena.
Setting job splitter (optional)
The examples below ask each subjob to process at most 2 files.
- using StagerJobSplitter with StagerDataset
In [n]: j.splitter = StagerJobSplitter()
In [n]: j.splitter.numfiles = 2
- using DQ2JobSplitter with DQ2Dataset for jobs running on LCG
In [n]: j.splitter = DQ2JobSplitter()
In [n]: j.splitter.numfiles = 2
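The effect of numfiles can be pictured as cutting the input file list into groups of at most that many files, one group per subjob. The sketch below is plain Python and only illustrates this idea; it is not how either splitter is implemented (a real splitter may also take file location into account), and split_into_subjobs is a hypothetical name.

```python
def split_into_subjobs(files, numfiles):
    """Group `files` into consecutive chunks of at most `numfiles` entries.

    Illustration only: each chunk stands for the input of one subjob.
    """
    if numfiles < 1:
        raise ValueError("numfiles must be at least 1")
    return [files[i:i + numfiles] for i in range(0, len(files), numfiles)]
```

With five input files and numfiles = 2, this grouping gives three subjobs: two with 2 files each and one with the remaining file.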
Setting computing backend
- using Stoomboot (PBS cluster) at NIKHEF
In [n]: j.backend = PBS()
For a long running job, please also do
In [n]: j.backend.queue = 'qlong'
to avoid running over the walltime limitation of the default PBS queue.
- Make sure the grid environment is set up automatically every time you start a new shell (e.g. in ~/.profile), as the FileStager needs LCG tools such as lcg-cp:
. /global/ices/lcg/current/etc/profile.d/grid_env.sh
- When submitting the job with Ganga, make sure you are working on a machine which has the qsub command available, e.g. ribble (aka login) or elel22.
- It is possible to change the LCG site where data is copied from by the FileStager by setting config.DQ2.DQ2_LOCAL_SITE_ID = 'SARA-MATRIX_MCDISK'.
- using lxbatch (LSF cluster) at CERN
In [n]: j.backend = LSF()
For a long running job, please also specify a queue name appropriate for the job's walltime. For example:
In [n]: j.backend.queue = '8nh'
- using the world-wide grid (WLCG/EGEE)
In [n]: j.backend = LCG()
StagerDataset is not yet supported for jobs on LCG. Please use DQ2Dataset instead. For example:
In [n]: j.inputdata = DQ2Dataset()
In [n]: j.inputdata.dataset = []
In [n]: j.inputdata.type = 'FILE_STAGER'
Starting from Ganga 5.0.7, jobs submitted to LCG backend require users to specify one of the following requirements:
- In [n]: j.backend.requirements.cloud = 'NL'
  In [n]: j.splitter = DQ2JobSplitter()
  meaning that Ganga distributes the jobs within a particular computing cloud.
- In [n]: j.backend.CE = 'gazon.nikhef.nl:2119/jobmanager-pbs-atlas'
  meaning that the job should run on one particular computing element (for users who know what they are doing).
Submitting job
In [n]: j.submit()
After job submission
Checking job status
GANGA automatically polls the up-to-date status of your jobs and updates the local repository accordingly. A notification is shown to the user when the status of a job changes.
In addition, you can get a job summary table by:
In [n]: jobs
or a summary table for subjobs:
In [n]: j.subjobs
Result and output merging
For the moment, a completed (sub-)job returns a ROOT summary file. The file is stored in the summary sub-directory of the job's output directory.
For jobs using StagerJobSplitter, the RootMerger is automatically attached to the job, so that when the whole job is completed, the summary ROOT files from the sub-jobs are merged together.
For jobs using DQ2Dataset, the merging can be done manually when the whole job is completed. For example, assume each sub-job produces a ROOT summary file called summary/summary_mySample_confFile_exampleaod.conf_nEvts_1000.root. To merge them, one can do:
In [n]: merger = RootMerger()
In [n]: merger.files += ['summary/summary_mySample_confFile_exampleaod.conf_nEvts_1000.root']
In [n]: merger.overwrite = True
In [n]: merger.ignorefailed = True
In [n]: merger.merge(j)
The merged root file has the same name and it will be created in the job's outputdir.
Killing and removing jobs
You can kill a job by calling
In [n]: j.kill()
or remove a job by
In [n]: j.remove()
Advanced usage
Changing the default gangadir
For each job, Ganga keeps the associated files (e.g. the job's inputs, outputs, metadata, etc.) in gangadir. This may take up a lot of space (or disk quota) if you have many jobs in Ganga. You may want Ganga to keep those files in another directory where more space is available. To do so, open the ~/.gangarc file and change the directory as follows:
gangadir = /project/atlas/Users/yourusernamehere/gangadir
Restricting max. number of events
In [n]: j.application.max_events = '1000'
Running on more than one dataset
The StagerDataset supports wildcard specification in the dataset name. For example, if you want to run on all FDR2 Muon stream datasets, you can set the inputdata like the following:
In [n]: j.inputdata.dataset += ['fdr08_run2*physics_Muon*']
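To get a feeling for which dataset names such a pattern selects, it can be tried against a list of names in plain Python. This sketch assumes shell-style wildcard semantics, which is an assumption made here and not a statement about how StagerDataset resolves the pattern; match_datasets is a hypothetical helper, and the second and third names in the usage note are made up for illustration.

```python
import fnmatch


def match_datasets(pattern, names):
    """Return the dataset names matching a shell-style wildcard pattern.

    Hypothetical helper; matching is case-sensitive.
    """
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]
```

Given the names 'fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10', 'fdr08_run2.0052280.physics_Egamma.merge.AOD.o3_f8_m10' and 'fdr08_run1.0003070.physics_Muon.merge.AOD.o1_f8_m10', the pattern 'fdr08_run2*physics_Muon*' selects only the first one under this assumption.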
Dealing with failed sub-jobs
It is quite possible to have some failed sub-jobs. In this case, GANGA reports the whole job as failed. There is no need to resubmit the whole job; you can resubmit just the failed sub-jobs. Assuming you have a failed job, j:
In [n]: j.subjobs.select(status='failed').resubmit()
Failing jobs manually
Some unexpected issues in a job may leave Ganga unable to update the job status to failed as it should. In this case, you can force the job into the failed state manually:
In [n]: j.force_status("failed")
This stops Ganga from continuing to poll the status of a problematic job which may already be gone from the backend system.
Basic troubleshooting
GANGA tries to bring stdout/stderr back to the client side even when the job has failed remotely on the Grid. So for failed jobs, you can inspect them as follows for troubleshooting:
In [n]: j.peek('stdout','less')
In [n]: j.peek('stderr','cat')
or
In [n]: j.peek('stdout.gz','zcat')
In [n]: j.peek('stderr.gz','zcat')
for the LCG jobs.
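The compressed logs can also be read from a script, which is convenient when searching many sub-jobs for an error message. The following is a minimal sketch for a standalone Python 3 script run outside the GANGA session; read_gzipped_log is a hypothetical helper, and it assumes the job's output directory (the value of j.outputdir) contains a file named stdout.gz or stderr.gz, as the peek calls above suggest.

```python
import gzip
import os


def read_gzipped_log(outputdir, name="stdout.gz"):
    """Return the decompressed text of a gzipped log file in `outputdir`.

    Hypothetical helper; undecodable bytes are replaced rather than raising.
    """
    path = os.path.join(outputdir, name)
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.read()
```

For instance, checking whether "ERROR" occurs in read_gzipped_log(outputdir, "stderr.gz") gives a quick first filter before opening the full log.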
More information
Known issues/ToDo items
- StagerDataset not supported for jobs on LCG
--Hclee 16:17, 13 Aug 2008 (MET DST)