Ganga with AMAAthena


Latest revision as of 11:31, 8 February 2010

Introduction

This page will describe how to run AMAAthena jobs with Ganga on different computing platforms (local desktop, Stoomboot, lxbatch, Grid).

Preparation

  • Make sure you can run AMAAthena standalone on local desktop. Here are instructions about doing it at NIKHEF: Using Athena at NIKHEF
  • Make sure you manage to submit HelloWorld jobs to different computing platforms. Here are instructions: Ganga: basic usage

Starting Ganga

Before starting Ganga, set CMT environment properly. Here is the example commands presuming that you have the setup scripts for CMT in $HOME/cmthome directory.

% source $HOME/cmthome/setup.sh -tag=15.6.1,32
% source $TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/cmt/setup.sh

Then start Ganga in $TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/run directory.

% cd $TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/run
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef

Tutorial templates for quick start

There are ready-to-go Ganga scripts prepared for this tutorial. Follow the instructions below to copy them into your AMAAthena run directory:

% cd $TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/run
% cp /project/atlas/nikhef/ganga/tutorial_2010/job_options/* .
% cp /project/atlas/nikhef/ganga/tutorial_2010/ama_config/* .
% cp /project/atlas/nikhef/ganga/tutorial_2010/ganga_scripts/* .
  • data_D3PD_simple2.conf: simple AMA configuration file for converting AOD into D3PD and dumping NTuples
  • data_D3PD_filler_v4.conf: part of the AMA configuration file included by data_D3PD_simple2.conf
  • ama_d3pd_maker.local.gpi: Ganga script for creating a ready-to-submit job to run on local desktop
  • ama_d3pd_maker.pbs.gpi: Ganga script for creating a ready-to-submit job to run on Stoomboot
  • ama_d3pd_maker.lcg.gpi: Ganga script for creating a ready-to-submit job to run on the grid through gLite WMS
  • ama_d3pd_maker.panda.gpi: Ganga script for creating a ready-to-submit job to run on the grid through Panda

Apart from the files mentioned above, a few more files are prepared so that one can submit the jobs right away. They are listed below. In general, these are the files a user prepares by hand, as described in the Application pre-configuration section below.

  • AMAAthena_jobOptions_new.py: top-level AMAAthena job option file without AutoConfig/RecExCommon
  • AMAAthena_jobOptions_AUTO.py: top-level AMAAthena job option file with AutoConfig/RecExCommon
  • data_D3PD_simple2.py: user-level AMAAthena job option file converted from the AMA configuration file
  • rundef.py: run definition job option file of AMAAthena

With all those files ready in the run directory, you can just load one of the Ganga scripts and submit an analysis job right away.

For example:

In [n]: execfile('ama_d3pd_maker.lcg.gpi')
In [n]: j.submit()

will create and submit an LCG job to generate D3PD files using AMAAthena.

The rest of this wiki gives detailed explanations of what is done within those scripts.

Ganga jobs by yourself

Ganga job creation

The first step is to create a new (empty) job in Ganga:

In [n]: j = Job()

and you can set the job's name:

In [n]: j.name = 'my_ama_job'

Application configuration

Pre-configuration

AMAAthena is an Athena "Algorithm", so you can just use the Athena application plugin in Ganga to run AMAAthena. However, a few steps have to be done before setting up the Athena application object in Ganga:

  1. copy the top-level job option file of AMAAthena to your working directory:
    • with AutoConfig/RecExCommon
      % get_files -jo AMAAthena_jobOptions_AUTO.py
    • without AutoConfig/RecExCommon
      % get_files -jo AMAAthena_jobOptions_new.py
  2. convert the user-level AMA configuration file into an Athena job option file. For example, if you have a configuration file called data_D3PD_simple2.conf, do:
    % AMAConfigfileConverter data_D3PD_simple2.conf data_D3PD_simple2.py
  3. create an AMA runtime definition job option file called rundef.py and edit it as in the following example:
    SampleName = 'data09_900GeV_00140541_MuonswBeam'
    ConfigFile = 'data_D3PD_simple2.py'
    FlagList = ''
    EvtMax = -1
    AMAAthenaFlags = ['DATA', 'TRIG']
    

    The variables in rundef.py are explained below:

    • SampleName: the user-defined sample name. This name will be used in composing the AMA summary output files.
    • ConfigFile: the job option file name converted from the user-level configuration file (the output of step 2)
    • FlagList: legacy AMA flags
    • EvtMax: the maximum number of events to be processed in the job
    • AMAAthenaFlags: the additional AMA job option files to be included by the top-level AMA job option file. This is ignored when using AutoConfig/RecExCommon.
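Since rundef.py is plain Python read in by the job options, you can sanity-check your own file before submitting by executing it and verifying that the expected variables are all defined. A minimal standalone sketch (the variable names are the ones documented above; the checker itself is illustrative and not part of AMAAthena):

```python
# Sketch: sanity-check a run definition file before submission.  The
# required variable names are the ones documented above; this
# standalone checker is illustrative and not part of AMAAthena itself.
REQUIRED = ['SampleName', 'ConfigFile', 'FlagList', 'EvtMax', 'AMAAthenaFlags']

def check_rundef(path):
    """Execute a run definition file and list any missing variables."""
    scope = {}
    with open(path) as f:
        exec(f.read(), scope)
    return [name for name in REQUIRED if name not in scope]

# Write an example file (named rundef_example.py so a real rundef.py
# in the run directory is not overwritten) and check it.
with open('rundef_example.py', 'w') as f:
    f.write("SampleName = 'data09_900GeV_00140541_MuonswBeam'\n"
            "ConfigFile = 'data_D3PD_simple2.py'\n"
            "FlagList = ''\n"
            "EvtMax = -1\n"
            "AMAAthenaFlags = ['DATA', 'TRIG']\n")
print(check_rundef('rundef_example.py'))  # [] means nothing is missing
```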

Configuration

Once you have the above steps done, you can proceed in Ganga to set up the Athena application:

In [n]: j.application = Athena()
In [n]: j.application.max_events = -1
In [n]: j.application.option_file += [ File('rundef.py'), File('AMAAthena_jobOptions_new.py'), File('data_D3PD_simple2.py') ]
In [n]: j.application.prepare()

The j.application.prepare() method automatically detects the input/output files by virtually running through the job option files given above. As the outputs are controlled internally by AMA, it is suggested to always add the following two lines right after j.application.prepare() to avoid possible confusion (e.g. with Panda):

In [n]: j.application.atlas_run_config['output']['outHist'] = False
In [n]: j.application.atlas_run_config['output']['alloutputs'] = []

Optional configurations

Override default DBRelease

By default, the job picks up the DBRelease shipped with the Athena release that you are using to run the job. In some cases you may want to override it, for example when you encounter the following error:

T_AthenaPoolCnv     ERROR poolToObject: caught error: 
FID "74981861-8AD2-DE11-95BD-001CC466D3D3" is not existing in the catalog 
( POOL : "PersistencySvc::UserDatabase::connectForRead" from "PersistencySvc" )

For local jobs, you can simply set the following line to force the job to load a specific DBRelease version from a given area, presuming that the DBRelease area is on a shared file system:

In [n]: j.application.atlas_environment = ['ATLAS_DB_AREA=/data/atlas/offline/db', 'DBRELEASE_OVERRIDE=7.8.1']

For grid jobs you cannot do that, as you do not know the path on the remote machine in advance. Instead, one needs to do:

In [n]: j.application.atlas_dbrelease   = 'ddo.000001.Atlas.Ideal.DBRelease.v070801:DBRelease-7.8.1.tar.gz'
In [n]: j.application.atlas_environment =['DBRELEASE_OVERRIDE=7.8.1']

where j.application.atlas_dbrelease tells the job to download the DBRelease tarball "DBRelease-7.8.1.tar.gz" from the ATLAS dataset "ddo.000001.Atlas.Ideal.DBRelease.v070801", while j.application.atlas_environment forces the Athena job to use that version of the DBRelease instead of the default one.
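The atlas_dbrelease value packs two pieces of information into one colon-separated string: the DQ2 dataset and the tarball file inside it. As an illustration of the expected format (Ganga does this parsing internally; the helper below is not a Ganga function):

```python
# Sketch: split an atlas_dbrelease value of the form
# '<DQ2 dataset>:<tarball file>' into its two parts.
# Illustrative only; Ganga parses this string internally.
def parse_dbrelease(spec):
    dataset, sep, tarball = spec.partition(':')
    if not sep or not dataset or not tarball:
        raise ValueError("expected '<dataset>:<tarball>', got %r" % spec)
    return dataset, tarball

ds, tb = parse_dbrelease(
    'ddo.000001.Atlas.Ideal.DBRelease.v070801:DBRelease-7.8.1.tar.gz')
print(ds)  # ddo.000001.Atlas.Ideal.DBRelease.v070801
print(tb)  # DBRelease-7.8.1.tar.gz
```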

InputDataset configuration

It is encouraged to enable FileStager with your analysis job, as it has proved to be more efficient in the majority of cases. To do so, there are two InputDataset plugins in Ganga that can be used, depending on where the job will run.

StagerDataset for local jobs

Presuming that you want to run over the dataset "data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268" located at "NIKHEF-ELPROD_DATADISK", set the inputdata attribute of the Ganga job object as follows:

In [n]: config.DQ2.DQ2_LOCAL_SITE_ID = 'NIKHEF-ELPROD_DATADISK'
In [n]: j.inputdata = StagerDataset()
In [n]: j.inputdata.type = 'DQ2'
In [n]: j.inputdata.dataset += [ 'data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268' ]

Remarks

  • Use StagerDataset only with Local, LSF and PBS backend plugins for local jobs.
  • StagerDataset is restricted to copying files from the grid storage close to the computing node. You need to find the local location of the dataset in terms of the DDM site name and set it with config.DQ2.DQ2_LOCAL_SITE_ID

You can also use StagerDataset to access dataset files already existing on a local disk. The following example assumes that the dataset files are already sitting in the directory /data/atlas3/users/hclee/data/data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268

In [n]: j.inputdata = StagerDataset()
In [n]: j.inputdata.type = 'LOCAL'
In [n]: j.inputdata.dataset = ['/data/atlas3/users/hclee/data/data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268']
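If you want to check beforehand which files such a local dataset directory actually contains, a standalone sketch is below. Note that the '*.root*' wildcard is an assumption about typical AOD file names, not a documented StagerDataset rule:

```python
# Sketch: list ROOT files under a local dataset directory, the kind
# of files a 'LOCAL'-type StagerDataset would be pointed at.
# The '*.root*' pattern is an assumption, not a StagerDataset rule.
import glob
import os

def list_root_files(dataset_dir):
    """Return the sorted ROOT files directly inside dataset_dir."""
    return sorted(glob.glob(os.path.join(dataset_dir, '*.root*')))

# Hypothetical directory layout, created here only for illustration:
os.makedirs('demo_dataset', exist_ok=True)
for name in ('AOD.f170._0001.pool.root.1', 'AOD.f170._0002.pool.root.1'):
    open(os.path.join('demo_dataset', name), 'w').close()
print(list_root_files('demo_dataset'))
```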

DQ2Dataset for grid jobs

Presuming you want to run on the dataset data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268 on the grid, set the InputDataset object as follows in Ganga:

In [n]: j.inputdata = DQ2Dataset()
In [n]: j.inputdata.dataset += [ 'data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268' ]
In [n]: j.inputdata.type = 'FILE_STAGER'

Remarks

  • Always use DQ2Dataset with Panda and LCG backends.

Splitter configuration

The examples below ask each subjob to process at most 2 files.

StagerJobSplitter for StagerDataset

In [n]: j.splitter = StagerJobSplitter()
In [n]: j.splitter.numfiles = 2

DQ2JobSplitter for DQ2Dataset

In [n]: j.splitter = DQ2JobSplitter()
In [n]: j.splitter.numfiles = 2
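Both splitters partition the input file list into groups of at most numfiles files, one group per subjob. The bookkeeping can be sketched as follows (illustrative; the real splitting is done inside Ganga):

```python
# Sketch: split a file list into subjob-sized chunks, as a splitter
# with numfiles = 2 would.  Illustrative; Ganga implements this itself.
def split_files(files, numfiles):
    """Return consecutive chunks of at most numfiles entries each."""
    return [files[i:i + numfiles] for i in range(0, len(files), numfiles)]

files = ['AOD._0001.root', 'AOD._0002.root', 'AOD._0003.root',
         'AOD._0004.root', 'AOD._0005.root']
chunks = split_files(files, 2)
print(len(chunks))  # 3 subjobs: two files, two files, one file
```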

Backend (platform) configuration

You should be able to switch to a different computing platform ("backend" in Ganga terminology) by simply changing the backend attribute of a job object. The available backends are:

  • Local: for running jobs locally right on your desktop
  • PBS: for running jobs on a PBS-based computer cluster (e.g. the Stoomboot)
  • LSF: for running jobs on a LSF-based computer cluster (e.g. lxbatch@CERN)
  • LCG: for running jobs on the grid (EGEE sites), jobs are brokered by gLite Workload Management System (WMS)
  • Panda: for running jobs on the grid (EGEE, OSG, NorduGrid sites), jobs are brokered by Panda

For example, to submit jobs to the grid through Panda:

In [n]: j.backend = Panda()

Local

Ask the job to be executed locally right on the desktop. This is the default backend of a newly created Ganga job.

In [n]: j.backend = Local()

PBS

Ask the job to be submitted to the "qlong" queue of Stoomboot.

In [n]: j.backend = PBS()
In [n]: j.backend.queue = 'qlong'

LSF

Ask the job to be submitted to the "1nh" (1 hour) queue on lxbatch@CERN. You need to run it from lxplus@CERN.

In [n]: j.backend = LSF()
In [n]: j.backend.queue = '1nh'

LCG

Ask the job to be submitted to an EGEE site where the dataset given above is available and with a queue supporting jobs of up to 12 hours (720 minutes).

In [n]: j.backend = LCG()
In [n]: j.backend.requirements.cloud = 'ALL'
In [n]: j.backend.requirements.walltime = 720

Panda

Ask the job to be submitted to Panda and then brokered to whatever site in the "US" cloud is able to process it.

In [n]: j.backend = Panda()
In [n]: j.backend.libds = ''
In [n]: j.backend.requirements.cloud = 'US'

Job submission

This is as simple as you can imagine:

In [n]: j.submit()

Job management

Job management in Ganga is application independent; therefore you are referred to Basic job management where the basic job management functions are explained.

Next step

For more details about different Athena use cases, you can refer to the following twiki pages:

  • Full GangaAtlas tutorial: the up-to-date official tutorial wiki for GangaAtlas
  • GangaAtlas FAQ: the Q&A page with questions collected by the global user analysis support team

You are also encouraged to subscribe to the atlas-dist-analysis-help forum, where you can post GangaAtlas-related issues and ask for support from experts and/or other users. Take it as the HelpDesk of the global user analysis support.