Difference between revisions of "Using GANGA with AMAAthena"

From Atlas Wiki
 
(191 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 +
<center>
 +
<h3>This page is out-dated!!</h3>
 +
Please refer to the new tutorial page: [[Ganga_with_AMAAthena | Ganga: running AMAAthena]]
 +
</center>
 +
 
== Introduction ==
 
== Introduction ==
This document gives step-by-step instructions for running AMAAthena within GANGA on a NIKHEF desktop (e.g. ribble). AMAAthena is an Athena package providing ... developed at NIKHEF. GANGA is an official ATLAS grid utility for distributed data analysis.
+
This guide provides step-by-step instructions for running AMAAthena through GANGA. Users will run GANGA on a NIKHEF desktop (e.g. <tt>elel22.nikhef.nl</tt>) and submit AMAAthena jobs to Stoomboot (a PBS cluster) and to the LCG.
 +
 
 +
AMAAthena is an Athena package providing a framework for modular analysis. GANGA is an official tool for ATLAS distributed data analysis.
 +
 
 +
== Preparation ==
 +
Please follow 
 +
https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/PhysicsAnalysisTools?topic=AMAMainPage
 +
to set up <tt>CMT</tt> and check out the <tt>AMAAthena</tt> package.
 +
 
 +
== Starting GANGA session ==
 +
Type the following commands within the directory <tt>PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/cmt</tt> in a clean shell environment (i.e. one with no <tt>Athena</tt> or <tt>CMT</tt> environment set up).
 +
 
 +
<ul>
 +
<li>'''For NIKHEF users'''
 +
<pre>
 +
% source /project/atlas/nikhef/dq2/dq2_setup.sh.NIKHEF
 +
% export DPNS_HOST=tbn18.nikhef.nl
 +
% export LFC_HOST=lfc-atlas.grid.sara.nl
 +
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh
 +
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef
 +
</pre>
 +
 
 +
Every time you start with a clean shell, you will need to set up Ganga with the commands given above.
 +
</li>
 +
 
 +
<li>'''For CERN lxplus users'''
 +
<pre>
 +
% source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh
 +
% ganga
 +
</pre>
 +
 
 +
More detail for CERN users can be found here: http://ganga.web.cern.ch/ganga/user/index.php
 +
 
 +
</li>
 +
</ul>
 +
 
 +
The last command loads a system-wide ATLAS-specific configuration for your Ganga session. You can override the system-wide configuration by providing a <tt>~/.gangarc</tt> file. The template of the <tt>~/.gangarc</tt> file can be generated by:
 +
 
 +
<pre>
 +
% ganga -g
 +
</pre>
 +
 
 +
If you see the following prompt:
 +
 
 +
<pre>
 +
*** Welcome to Ganga ***
 +
Version: Ganga-5-1-1
 +
Documentation and support: http://cern.ch/ganga
 +
Type help() or help('index') for online help.
 +
 
 +
This is free software (GPL), and you are welcome to redistribute it
 +
under certain conditions; type license() for details.
 +
 
 +
In [1]:
 +
</pre>
 +
 
 +
you are already in a GANGA session. The GANGA session is an [http://ipython.scipy.org/moin/ IPython] shell with GANGA-specific extensions (modules), so you can program in Python inside the GANGA session.
 +
 
 +
== Leaving GANGA session ==
 +
To quit from a GANGA session, just press CTRL-D.
 +
 
 +
== Getting familiar with GANGA ==
 +
=== My first Grid job running a HelloWorld shell script ===
 +
 
 +
Now go to your project directory
 +
<pre>
 +
cd /project/atlas/Users/yourusernamehere
 +
</pre>
 +
and create 'myscript.sh'
 +
<pre>
 +
#!/bin/sh
 +
echo 'myscript.sh running...'
 +
echo "----------------------"
 +
/bin/hostname
 +
echo "HELLO PLANET!"
 +
echo "----------------------"
 +
</pre>
 +
and the file 'gangaScript.py'. Do not forget to adapt the path below to your own directory structure:
 +
<pre>
 +
j = Job()
 +
j.application=Executable()
 +
j.application.exe=File('/project/atlas/Users/yourusernamehere/myscript.sh')
 +
j.backend=LCG()
 +
j.submit()
 +
</pre>
 +
 
 +
The lines of this Ganga script do the following:
 +
  * Line 1 defines the job
 +
  * Line 2 sets it as an Executable
 +
  * Line 3 tells Ganga which file to run
 +
  * Line 4 tells Ganga where the job should run
 +
  * Line 5 submits the job
 +
The important point here is that we have chosen LCG() as the backend, i.e. the script will be executed on the grid.
 +
Now start ganga again and submit the job to the LCG-grid
 +
<pre>
 +
In[n]: execfile("./gangaScript.py")
 +
</pre>
 +
 
 +
The status of the job can be monitored with
 +
<pre>
 +
In[n]: jobs
 +
</pre>
 +
 
 +
After the job is submitted, GANGA is responsible for monitoring it while it is still running, and for downloading the output files (e.g. stdout/stderr) to the local machine when the job has finished.
 +
 
 +
When your job is <tt>completed</tt>, the job's output is automatically fetched from the Grid and stored in your <tt>gangadir</tt> directory. The exact output location can be found by:
 +
<pre>
 +
In[n]: j.outputdir
 +
Out[n]: /project/atlas/Users/yourusernamehere/gangadir/workspace/yourusernamehere/LocalAMGA/0/output
 +
</pre>
 +
 
 +
if 0 was the job ID. This was our first grid-job submitted via ganga!
 +
 
 +
=== Working with historical jobs ===
 +
GANGA internally archives your previously submitted jobs (historical jobs) in the local job repository (i.e. <tt>gangadir</tt>) so that you don't have to do the bookkeeping yourself. You can freely leave and re-enter GANGA and still have your historical jobs ready for future work.
 +
 
 +
The first step in working with a historical job is to get the job instance from the repository, as follows:
 +
 
 +
<pre>
 +
In [n]: jobs
 +
Out[1]:
 +
Job slice:  jobs (12 jobs)
 +
--------------
 +
# fqid      status        name  subjobs      application          backend  backend.actualCE                                               
 +
#  17  submitted                  1000      Executable              LCG                                               
 +
#  18  submitted                  2000      Executable              LCG                                                                                   
 +
#  20  completed                    10      Executable              LCG
 +
#  28  submitted                            Executable              LCG
 +
#  29  submitted    test_lcg                Executable              LCG                                             
 +
</pre>
 +
 
 +
The table above lists the historical jobs in your GANGA repository indexed by <tt>fqid</tt>. For example, if you are interested in the job with id <tt>29</tt>, you can get the job instance by
 +
 
 +
<pre>
 +
In [n]: j = jobs(29)
 +
</pre>
 +
 
 +
then you are all set to work with the job.
 +
 
 +
Please note that you <span style="color:#800000">'''CANNOT'''</span> change the attributes of a historical job.
 +
 
 +
=== More GANGA jobs to run on different platforms ===
 +
Now try the following commands in the Ganga shell to get your hands dirty :)
 +
Try to find where the second job runs.
 +
 
 +
<pre>
 +
In [n]: j = Job()
 +
In [n]: j.backend=Local()
 +
In [n]: j.submit()
 +
In [n]: jobs
 +
 
 +
In [n]: j = j.copy()
 +
In [n]: j.backend=PBS()
 +
In [n]: j.submit()
 +
In [n]: jobs
 +
</pre>
 +
 
 +
== Running AMAAthena in GANGA ==
  
The examples below assume that:
+
The example below assumes:
 
<ol>
 
<ol>
 
<li/>Users have the following Athena job option files in the <tt>run</tt> directory of the AMAAthena package
 
<li/>Users have the following Athena job option files in the <tt>run</tt> directory of the AMAAthena package
Line 9: Line 171:
 
   <li/><tt>Trigger_jobOptions.py</tt>
 
   <li/><tt>Trigger_jobOptions.py</tt>
 
   </ul>
 
   </ul>
 +
 +
You can find them from the <tt>share</tt> directory of the AMAAthena package.
  
 
<li/>Users have the following AMA driver configuration files in the <tt>run</tt> directory of the AMAAthena package
 
<li/>Users have the following AMA driver configuration files in the <tt>run</tt> directory of the AMAAthena package
Line 16: Line 180:
 
   </ul>
 
   </ul>
  
<li/>Analysis is performed on an ATLAS dataset: <tt>fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10</tt>
+
You can find them from the <tt>Config</tt> directory of the AMAAthena package.
</ol>
+
 
 +
<li/>After copying them into the <tt>run</tt> directory, modify the <tt>exampleaod.conf</tt> by replacing
 +
 
 +
<pre>
 +
include_file = Config/reader.conf
 +
</pre>
  
== Preparation ==
+
with
<ol>
 
<li>follow the CMT instructions to setup your CMTHOME directory
 
<li>checkout the AMAAthena package from CVS
 
<li>make sure you will start GANGA with a clear environment without any Athena and CMT setup
 
</ol>
 
  
== Starting GANGA ==
 
Typing the following commands within the directory: <tt>PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/cmt</tt>
 
 
<pre>
 
<pre>
% source /project/atlas/nikhef/dq2/dq2_setup.sh.NIKHEF
+
include_file = reader.conf
% export DPNS_HOST=tbn18.nikhef.nl
 
% export LFC_HOST=lfc-atlas.grid.sara.nl
 
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh
 
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef
 
 
</pre>
 
</pre>
  
== GANGA magic functions for cmtsetup ==
+
<li/>Analysis is performed on dataset: <tt>fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10</tt>
 +
</ol>
 +
 
 +
=== GANGA magic functions for cmtsetup ===
 
Inside GANGA, one could deal with the complex CMT setup with two magic functions.
 
Inside GANGA, one could deal with the complex CMT setup with two magic functions.
  
The following example shows how to set up the CMT environment for Athena 14.2.0 in 32 bit mode.
+
The following example shows how to set up the CMT environment for Athena 14.2.20 in 32 bit mode.
  
 
<pre>
 
<pre>
In [n]: config.Athena.CMTHOME = '/your/cmthome'
+
In [n]: config.Athena.CMTHOME = '/path/to/your/cmthome'
In [n]: cmtsetup 14.2.0,32
+
In [n]: cmtsetup 14.2.20,32
 
In [n]: setup
 
In [n]: setup
 
</pre>
 
</pre>
  
== Running AMAAthena in GANGA ==
+
=== GANGA magic function resolving python conflict at CERN ===
 +
This is specific for solving the python conflict between the LCG UI and the ATLAS release at CERN. If you plan to run jobs on lxplus (with the <tt>Local</tt> backend) or lxbatch (with the <tt>LSF</tt> backend), please apply the following magic function before <tt>j.submit()</tt> to resolve the issue:
 +
 
 +
<pre>
 +
In [n]: fixpython
 +
</pre>
  
=== Creating new GANGA job ===
+
=== Creating a new GANGA job ===
 
<pre>
 
<pre>
 
In [n]: j = Job()
 
In [n]: j = Job()
 
</pre>
 
</pre>
  
=== Setting up AMAAthena application ===
+
=== Setting application ===
 +
From the <tt>AMAAthena/cmt</tt> directory, start ganga and do:
 +
 
 
<pre>
 
<pre>
 
In [n]: j.application = AMAAthena()
 
In [n]: j.application = AMAAthena()
In [n]: j.application.option_files += [ File('../run/AMAAthena_jobOptions.py'), File('../run/Trigger_jobOptions.py') ]
+
In [n]: j.application.option_file += [ File('../run/AMAAthena_jobOptions.py'), File('../run/Trigger_jobOptions.py') ]
 
In [n]: j.application.driver_config.config_file = File('../run/exampleaod.conf')
 
In [n]: j.application.driver_config.config_file = File('../run/exampleaod.conf')
 
In [n]: j.application.driver_config.include_file += [ File('../run/reader.conf') ]
 
In [n]: j.application.driver_config.include_file += [ File('../run/reader.conf') ]
In [n]: j.application.max_events = '1000'
+
</pre>
 +
 
 +
Starting from Ganga 5.1.7, you can set the AMA flags as follows:
 +
 
 +
<pre>
 +
In [n]: j.application.driver_flags = 'MuonSample=1 HasTopQuarks=1 DoDiLepton=1'
 +
</pre>
 +
 
 +
Finally, prepare the application tarball to be shipped to the grid worker node:
 +
 
 +
<pre>
 +
In [n]: j.application.athena_compile=True
 
In [n]: j.application.prepare()
 
In [n]: j.application.prepare()
 
</pre>
 
</pre>
  
=== Setting input datasets ===
+
The line <tt>j.application.athena_compile=True</tt> requires the source code to be compiled on the worker node.  If you want to skip the compilation and run the binary code directly on the worker node, set it to <tt>False</tt>.
<ol>
+
 
<li/>'''StagerDataset'''
+
[[Image:32px-Nuvola apps important.svg.png|16px|Be careful]] Running pre-compiled binary code requires you to compile the source code in <tt>UserArea</tt> before submitting jobs.
 +
 
 +
=== Setting input data ===
 +
<ul>
 +
<li><tt>'''StagerDataset''' </tt></li>
 +
 
 +
<span style="color:#800000">NB: Please use <tt>'''StagerDataset'''</tt> with '''LSF''' and '''PBS''' backends for local jobs.</span>
  
 
When using the <tt>'''StagerDataset'''</tt>, the AMAAthena job will use the Athena <tt>'''FileStager'''</tt> to copy dataset files from a grid storage.
 
When using the <tt>'''StagerDataset'''</tt>, the AMAAthena job will use the Athena <tt>'''FileStager'''</tt> to copy dataset files from a grid storage.
Line 75: Line 260:
 
</pre>
 
</pre>
  
<li/>'''DQ2Dataset'''
+
You can also use <tt>'''StagerDataset'''</tt> to access the dataset files already existing on local disk. The following example assumes that you have dataset files already sitting in the directory <tt>/project/atlas/data/fdr2</tt>
  
When using the '''DQ2Dataset''', GANGA will handle the dataset file access externally from Athena.
+
<pre>
 +
In [n]: j.inputdata = StagerDataset()
 +
In [n]: j.inputdata.type = 'LOCAL'
 +
In [n]: j.inputdata.dataset = ['/project/atlas/data/fdr2']
 +
</pre>
 +
 
 +
All the files with the name <tt>'*.root*'</tt> in this directory (and sub-directories) will be included.
 +
 
 +
<li><tt>'''DQ2Dataset'''</tt></li>
 +
 
 +
<span style="color:#800000">NB: Please use <tt>'''DQ2Dataset'''</tt> for grid jobs.</span>
 +
 
 +
When using the <tt>'''DQ2Dataset'''</tt>, GANGA will handle the file access externally from Athena.
  
 
<pre>
 
<pre>
 
In [n]: j.inputdata = DQ2Dataset()
 
In [n]: j.inputdata = DQ2Dataset()
 
In [n]: j.inputdata.dataset += [ 'fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10' ]
 
In [n]: j.inputdata.dataset += [ 'fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10' ]
In [n]: j.inputdata.type = 'DQ2_DOWNLOAD'
 
 
</pre>
 
</pre>
</ol>
+
 
 +
The <tt>'''DQ2Dataset'''</tt> supports several types of data access mode. By default, it uses <tt>'''DQ2_LOCAL'''</tt> mode to read events by POSIX I/O via a local protocol.  One can switch to use <tt>'''FILE_STAGER'''</tt> from Ganga 5.1.4 using the following command:
 +
 
 +
<pre>
 +
In [n]: j.inputdata.type = 'FILE_STAGER'
 +
</pre>
 +
 
 +
</ul>
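When <tt>StagerDataset</tt> is pointed at a local directory, the text above states that every file named <tt>'*.root*'</tt> in that directory and its sub-directories is included. The following is a minimal stand-alone sketch for previewing that selection before submitting a job. It only mimics the stated rule with the Python standard library and is not Ganga's own implementation; the function name <tt>find_root_files</tt> is hypothetical.

```python
import fnmatch
import os


def find_root_files(top):
    """Return sorted paths of files under `top` whose names match '*.root*'.

    Assumption: this mirrors the selection rule described in the text,
    it is not the code Ganga itself runs.
    """
    matches = []
    # os.walk visits `top` and every sub-directory below it.
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if fnmatch.fnmatch(name, "*.root*"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling <tt>find_root_files('/project/atlas/data/fdr2')</tt> outside Ganga would list the candidate input files for the local example above.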
  
 
=== Setting job splitter (optional) ===
 
=== Setting job splitter (optional) ===
 
The examples below ask each subjob to process at most 2 files.
 
The examples below ask each subjob to process at most 2 files.
  
<ol>
+
<ul>
<li/>using '''StagerJobSplitter''' with '''StagerDataset'''
+
<li/>using <tt>'''StagerJobSplitter'''</tt> with <tt>'''StagerDataset'''</tt>
  
 
<pre>
 
<pre>
Line 97: Line 300:
 
</pre>
 
</pre>
  
<li/>using '''DQ2JobSplitter''' with '''DQ2Dataset'''
+
<li/>using <tt>'''DQ2JobSplitter'''</tt> with <tt>'''DQ2Dataset'''</tt> for jobs running on LCG
  
 
<pre>
 
<pre>
Line 104: Line 307:
 
</pre>
 
</pre>
  
</ol>
+
</ul>
  
 
=== Setting computing backend ===
 
=== Setting computing backend ===
 +
 +
<ul>
 +
<li/>using Stoomboot (PBS cluster) at NIKHEF
 +
 +
<pre>
 +
In [n]: j.backend = PBS()
 +
</pre>
 +
 +
For a long running job, please also do
 +
 +
<pre>
 +
In [n]: j.backend.queue = 'qlong'
 +
</pre>
 +
 +
to avoid running over the walltime limitation of the default PBS queue.
  
 
<ol>
 
<ol>
<li/>using '''Stoomboot cluster'''
+
<li>
 +
[[Image:32px-Nuvola apps important.svg.png|16px|Be careful]] Make sure the grid environment is set up automatically every time you start a new shell (eg. in <tt>~/.profile</tt>) - as the <tt>FileStager</tt> needs the LCG tools like <tt>lcg-cp</tt>:
 +
<pre>
 +
. /global/ices/lcg/current/etc/profile.d/grid_env.sh
 +
</pre>
 +
</li>
 +
<li>
 +
[[Image:32px-Nuvola apps important.svg.png|16px|Be careful]] When submitting the job with Ganga, make sure you are working on a machine which has the <tt>qsub</tt> command available, e.g. <tt>ribble</tt> (aka <tt>login</tt>) or <tt>elel22</tt>.
 +
</li>
 +
</li>
 +
<li>
 +
[[Image:32px-Nuvola apps important.svg.png|16px|Be careful]] It is possible to change the LCG site where data is copied from by the FileStager by setting <tt>config.DQ2.DQ2_LOCAL_SITE_ID = 'SARA-MATRIX_MCDISK'</tt>.
 +
</li>
 +
</ol>
 +
 
 +
<li/>using lxbatch (LSF cluster) at CERN
  
 
<pre>
 
<pre>
In [n]: j.backend = PBS()
+
In [n]: j.backend = LSF()
 +
</pre>
 +
 
 +
For a long-running job, please also specify a queue that matches the job's expected walltime. For example:
 +
 
 +
<pre>
 +
In [n]: j.backend.queue = '8nh'
 
</pre>
 
</pre>
  
<li/>using '''LCG'''
+
<li/>using the world-wide grid (WLCG/EGEE)
 +
 
 
<pre>
 
<pre>
 
In [n]: j.backend = LCG()
 
In [n]: j.backend = LCG()
 
</pre>
 
</pre>
  
 +
[[Image:32px-Nuvola apps important.svg.png|16px|Be careful]] <tt>'''StagerDataset'''</tt> is not yet supported for jobs on LCG. Please use <tt>'''DQ2Dataset'''</tt> instead. For example:
 +
 +
<ol>
 +
<li>
 +
<pre>
 +
In [n]: j.inputdata = DQ2Dataset()
 +
In [n]: j.inputdata.dataset = []
 +
In [n]: j.inputdata.type = 'FILE_STAGER'
 +
</pre>
 +
</li>
 +
</ol>
 +
 +
[[Image:32px-Nuvola apps important.svg.png|16px|Be careful]] Starting from Ganga 5.0.7, jobs submitted to LCG backend require users to specify one of the following requirements:
 +
 +
<ol>
 +
<li>
 +
<pre>
 +
In [n]: j.backend.requirements.cloud = 'NL'
 +
In [n]: j.splitter = DQ2JobSplitter()
 +
</pre>
 +
meaning that <b><i>let Ganga distribute the jobs within a particular computing cloud.</i></b>
 +
</li>
 +
 +
<li>
 +
<pre>
 +
In [n]: j.backend.CE = 'gazon.nikhef.nl:2119/jobmanager-pbs-atlas'
 +
</pre>
 +
meaning that <b><i>I want the job to be run on a particular computing element (I know what I am doing now!!).</i></b>
 +
</li>
 
</ol>
 
</ol>
 +
 +
</ul>
  
 
=== Submitting job ===
 
=== Submitting job ===
Line 129: Line 400:
 
== After job submission ==
 
== After job submission ==
  
== Working in progress ==
+
=== Checking job status ===
 +
GANGA automatically polls the status of your jobs and updates the local repository accordingly. A notification is shown when the status of a job changes.
 +
 
 +
In addition, you can get a job summary table by:
 +
 
 +
<pre>
 +
In [n]: jobs
 +
</pre>
 +
 
 +
or a summary table for subjobs:
 +
 
 +
<pre>
 +
In [n]: j.subjobs
 +
</pre>
 +
 
 +
=== Result and output merging ===
 +
For the moment, a completed (sub-)job returns a ROOT summary file. The file is stored in the <tt>summary</tt> sub-directory of the job's output directory.
 +
 
 +
For jobs using <tt>'''StagerJobSplitter'''</tt>, the <tt>'''RootMerger'''</tt> is automatically attached with the job so that when the whole job is <tt>completed</tt>, the summary root files from sub-jobs are merged together.
 +
 
 +
For jobs using <tt>'''DQ2Dataset'''</tt>, the merging can be done manually once the whole job is <tt>completed</tt>. For example, assume that each sub-job produces a ROOT summary file called <tt>summary/summary_mySample_confFile_exampleaod.conf_nEvts_1000.root</tt>. To merge them, one can do:
 +
 
 +
<pre>
 +
In [n]: merger = RootMerger()
 +
In [n]: merger.files += ['summary/summary_mySample_confFile_exampleaod.conf_nEvts_1000.root']
 +
In [n]: merger.overwrite = True
 +
In [n]: merger.ignorefailed = True
 +
In [n]: merger.merge(j)
 +
</pre>
 +
 
 +
The merged root file has the same name and it will be created in the job's outputdir.
 +
 
 +
=== Killing and removing jobs ===
 +
You can kill a job by calling
 +
 
 +
<pre>
 +
In [n]: j.kill()
 +
</pre>
 +
 
 +
or remove a job by
 +
 
 +
<pre>
 +
In [n]: j.remove()
 +
</pre>
 +
 
 +
== Helper scripts ==
 +
 
 +
== Advanced usage ==
 +
=== Changing the default <tt>gangadir</tt> ===
 +
For each job, Ganga maintains the associated files (e.g. the job's inputs, outputs, metadata, etc.) in <tt>gangadir</tt>. This may take up considerable space (or disk quota) if you have many jobs in Ganga. You may want Ganga to keep those files in another directory where more space is available. To do so, open the <tt>~/.gangarc</tt> file and change the directory as follows:
 +
 
 +
<pre>
 +
gangadir = /project/atlas/Users/yourusernamehere/gangadir
 +
</pre>
 +
 
 +
=== Restricting max. number of events ===
 +
<pre>
 +
In [n]: j.application.max_events = '1000'
 +
</pre>
 +
 
 +
=== Running on more than one dataset ===
 +
The '''<tt>StagerDataset</tt>''' supports wildcard specification in the dataset name. For example, if you want to run on all FDR2 Muon stream datasets, you can set the inputdata like the following:
 +
 
 +
<pre>
 +
In [n]: j.inputdata.dataset += ['fdr08_run2*physics_Muon*']
 +
</pre>
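Before submitting with a wildcard, it can be useful to check which dataset names a pattern would select. The sketch below assumes the wildcard behaves like a shell-style glob, which the text does not state explicitly, and uses Python's <tt>fnmatch</tt> module rather than any Ganga or DQ2 call. The helper name <tt>match_datasets</tt> and the second dataset name in the list are hypothetical and serve only as an illustration.

```python
import fnmatch


def match_datasets(pattern, names):
    """Return the names that match a shell-style wildcard pattern.

    Assumption: StagerDataset wildcards follow shell glob rules.
    """
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


candidates = [
    "fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10",
    # Hypothetical name, invented to show a non-matching entry.
    "fdr08_run2.0052280.physics_Other.merge.AOD.o3_f8_m10",
]

selected = match_datasets("fdr08_run2*physics_Muon*", candidates)
print(selected)
```

Only the first entry is selected, because the second does not contain <tt>physics_Muon</tt>.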
 +
 
 +
=== Dealing with failed sub-jobs ===
 +
It is quite possible that some sub-jobs fail. In this case, GANGA reports the whole job as failed. There is no need to resubmit the whole job; you can resubmit just the failed sub-jobs. Assuming you have a failed job, <tt>j</tt>:
 +
 
 +
<pre>
 +
In [n]: j.subjobs.select(status='failed').resubmit()
 +
</pre>
 +
 
 +
=== Failing jobs manually ===
 +
Some unexpected issues may leave Ganga unable to update the job status to <tt>failed</tt> as it should. In this case, you can force the job into the <tt>failed</tt> state manually:
 +
 
 +
<pre>
 +
In [n]: j.force_status("failed", force=True)
 +
</pre>
 +
 
 +
This stops Ganga from polling the status of a problematic job that may no longer exist on the backend system.
 +
 
 +
=== Basic troubleshooting ===
 +
GANGA tries to bring the stdout/stderr back to the client side even when the job has failed remotely on the Grid. For failed jobs, you can therefore inspect them as follows for troubleshooting:
 +
 
 +
<pre>
 +
In [n]: j.peek('stdout','less')
 +
In [n]: j.peek('stderr','cat')
 +
</pre>
 +
 
 +
or
 +
 
 +
<pre>
 +
In [n]: j.peek('stdout.gz','zcat')
 +
In [n]: j.peek('stderr.gz','zcat')
 +
</pre>
 +
 
 +
for the LCG jobs.
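The compressed output of an LCG job can also be inspected outside a Ganga session. The following is a minimal sketch that reads the last lines of a gzip-compressed text file with the Python standard library. The function name <tt>tail_gzip_text</tt> is hypothetical, and the path you pass in (for example a <tt>stdout.gz</tt> inside the job's output directory) is up to you.

```python
import gzip


def tail_gzip_text(path, max_lines=20):
    """Return the last `max_lines` lines of a gzip-compressed text file."""
    if max_lines <= 0:
        return []
    # "rt" opens the compressed file in text mode.
    with gzip.open(path, "rt") as handle:
        lines = handle.read().splitlines()
    return lines[-max_lines:]
```

This reads the whole file into memory, which is acceptable for typical job logs but is a deliberate simplification.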
 +
 
 +
 
 +
 
 +
=== Distributed analysis user support ===
 +
You can send distributed analysis issues to a single support point: [[hn-atlas-dist-analysis-help@cern.ch]]
 +
 
 +
== Update on new features ==
 +
=== Ganga release > 5.4.0 ===
 +
==== uploading AMA outputs on grid storage ====
 +
By default, the AMA outputs (histograms, ntuples) in the <tt>summary</tt> directory will be shipped back to the user with the job (i.e. files are packed in the job's outputsandbox).  This approach assumes that the output files are not very big (e.g. < 10 MB).
 +
 
 +
However, in the case of ntuple dumping, the outputs can grow rapidly, so the outputsandbox mechanism may not be a suitable way to deliver output data (e.g. one can easily fill up the disk on the WMS in no time).  A better approach is to store the output files on a grid storage and retrieve them later using the DQ2 client.  This gives several benefits:
 +
 
 
<ul>
 
<ul>
<li/>supporting StagerDataset for jobs on the grid (LCG/NG)
+
  <li>Output data are stored on the Grid, meaning that if this is an intermediate output of the analysis, you can save local space by assigning it as input to the following grid jobs.</li>
 +
  <li>Output data are managed by DDM. It gives the existing dataset management features to those outputs.</li>
 +
  <li> ... </li>
 
</ul>
 
</ul>
  
== Advanced usage ==
+
To use this feature, simply set
 +
 
 +
<pre>
 +
In [n]: j.outputdata = DQ2Dataset()
 +
</pre>
 +
 
 +
before job submission.
 +
 
 +
Once the job is completed, you will see that <tt>j.outputdata</tt> is filled up with dataset information of the output files. You can trigger the output downloading immediately in Ganga using:
 +
 
 +
<pre>
 +
In [n]: j.outputdata.retrieve()
 +
</pre>
 +
 
 +
or using a separate <tt>dq2-get</tt> command to get it later.
 +
 
 +
==== PANDA backend support ====
 +
Job submission to PANDA is required for running grid jobs on US sites. During the STEP09 test, most of the EGEE and NorduGrid sites also deployed PANDA queues, so you can also run your jobs on EGEE and NorduGrid sites through PANDA.
 +
 
 +
<span style="color:#800000">To submit AMAAthena jobs to PANDA, you also need <tt>AMAAthena > 00-01-34</tt> where a so-called <tt>CleanupAlg</tt> is introduced to pack the <tt>summary</tt> directory into a single tarball.</span>
 +
 
 +
Please also note that <span style="color:#800000">PANDA always stores job's output on a grid storage </span> so you have to retrieve it later on using either <tt>j.outputdata.retrieve()</tt> in Ganga or <tt>dq2-get</tt> command.
 +
 
 +
To use PANDA backend, one just needs to set the backend object to <tt>Panda()</tt>.  For example,
 +
 
 +
<pre>
 +
In [n]: j.backend = Panda()
 +
</pre>
 +
 
 +
By default, the job is assigned to the <tt>'US'</tt> cloud.  You can change this as in the following example:
 +
 
 +
<pre>
 +
In [n]: j.backend.requirements.cloud = 'NL'
 +
</pre>
 +
 
 +
It is recommended that you leave <tt>j.backend.site</tt> set to <tt>'AUTO'</tt>; however, if you know what you are doing, it can be set to a specific PANDA site. The site names can be found here: http://panda.cern.ch:25980/server/pandamon/query?overview=showSiteStatusTable
  
 
== More information ==
 
== More information ==
 +
<ul>
 +
<li/>[http://ganga.web.cern.ch/ganga/user/html/GangaIntroduction/ GANGA workbook ]
 +
<li/>[https://twiki.cern.ch/twiki/bin/view/Atlas/DistributedAnalysisUsingGanga GANGA tutorials for ATLAS users ]
 +
<li/>[http://ddm-build.cern.ch/ddm/build/testing/doc/guides/dq2-clientapi-cli/html/user/enduser.html The users' guide of the DQ2 enduser tools ]
 +
</ul>
 +
 +
--[[User:Hclee|Hclee]] 16:17, 13 Aug 2008 (MET DST)

Latest revision as of 13:53, 13 January 2010

This page is out-dated!!

Please refer to the new tutorial page: Ganga: running AMAAthena

Introduction

This guide provides step-by-step instructions for running AMAAthena through GANGA. Users will run GANGA on a NIKHEF desktop (e.g. elel22.nikhef.nl) and submit AMAAthena jobs to Stoomboot (a PBS cluster) and to the LCG.

AMAAthena is an Athena package providing a framework for modular analysis. GANGA is an official tool for ATLAS distributed data analysis.

Preparation

Please follow https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/PhysicsAnalysisTools?topic=AMAMainPage to set up CMT and check out the AMAAthena package.

Starting GANGA session

Type the following commands within the directory PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/cmt in a clean shell environment (i.e. one with no Athena or CMT environment set up).

  • For NIKHEF users
    % source /project/atlas/nikhef/dq2/dq2_setup.sh.NIKHEF
    % export DPNS_HOST=tbn18.nikhef.nl
    % export LFC_HOST=lfc-atlas.grid.sara.nl
    % source /project/atlas/nikhef/ganga/etc/setup.[c]sh
    % ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef
    

    Every time you start with a clean shell, you will need to set up Ganga with the commands given above.

  • For CERN lxplus users
    % source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh
    % ganga
    

    More detail for CERN users can be found here: http://ganga.web.cern.ch/ganga/user/index.php

The last command loads a system-wide ATLAS-specific configuration for your Ganga session. You can override the system-wide configuration by providing a ~/.gangarc file. The template of the ~/.gangarc file can be generated by:

% ganga -g
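The NIKHEF setup above depends on two exported variables and on a shell without any CMT setup. The sketch below is a hypothetical stand-alone check that could be run before starting Ganga; it is not part of Ganga. The function name check_nikhef_environment is invented for illustration, and treating CMTPATH or CMTCONFIG as a sign of an existing CMT setup is an assumption.

```python
import os

# Values taken from the NIKHEF setup commands above.
REQUIRED = {
    "DPNS_HOST": "tbn18.nikhef.nl",
    "LFC_HOST": "lfc-atlas.grid.sara.nl",
}
# Assumption: these variables indicate that CMT was already set up.
CMT_MARKERS = ("CMTPATH", "CMTCONFIG")


def check_nikhef_environment(environ):
    """Return a list of human-readable problems found in `environ`."""
    problems = []
    for name, expected in sorted(REQUIRED.items()):
        if environ.get(name) != expected:
            problems.append("%s should be set to %s" % (name, expected))
    for name in CMT_MARKERS:
        if name in environ:
            problems.append("%s is set, so the shell is not clean" % name)
    return problems


if __name__ == "__main__":
    for problem in check_nikhef_environment(os.environ):
        print(problem)
```

An empty result means the shell matches the setup described here; it does not verify that the sourced setup scripts themselves succeeded.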

If you see the following prompt:

*** Welcome to Ganga ***
Version: Ganga-5-1-1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.

In [1]:

you are already in a GANGA session. The GANGA session is an IPython shell with GANGA-specific extensions (modules), so you can program in Python inside the GANGA session.

Leaving GANGA session

To quit from a GANGA session, just press CTRL-D.

Getting familiar with GANGA

My first Grid job running a HelloWorld shell script

Now go to your project directory

cd /project/atlas/Users/yourusernamehere

and create 'myscript.sh'

#!/bin/sh
echo 'myscript.sh running...'
echo "----------------------"
/bin/hostname
echo "HELLO PLANET!"
echo "----------------------"
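If you prefer to create the script from Python rather than an editor, the following minimal sketch writes the same content to myscript.sh and marks it executable so that it can also be tried locally. The helper name write_hello_script is hypothetical, and whether Ganga requires the executable bit is not stated in this guide.

```python
import os
import stat

# Same content as the myscript.sh shown above.
SCRIPT_BODY = """#!/bin/sh
echo 'myscript.sh running...'
echo "----------------------"
/bin/hostname
echo "HELLO PLANET!"
echo "----------------------"
"""


def write_hello_script(directory):
    """Write myscript.sh into `directory`, make it executable, return its path."""
    path = os.path.join(directory, "myscript.sh")
    with open(path, "w") as handle:
        handle.write(SCRIPT_BODY)
    # Add the owner-execute bit on top of the existing permissions.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
    return path
```

For example, write_hello_script('/project/atlas/Users/yourusernamehere') would create the file in the project directory used in this section.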

and the file 'gangaScript.py'. Do not forget to adapt the path below to your own directory structure:

j = Job()
j.application=Executable()
j.application.exe=File('/project/atlas/Users/yourusernamehere/myscript.sh')
j.backend=LCG()
j.submit()

The lines of this Ganga script do the following:

  * Line 1 defines the job
  * Line 2 sets it as an Executable
  * Line 3 tells Ganga which file to run
  * Line 4 tells Ganga where the job should run
  * Line 5 submits the job

The important point here is that we have chosen LCG() as the backend, i.e. the script will be executed on the grid. Now start ganga again and submit the job to the LCG grid:

In[n]: execfile("./gangaScript.py")

The status of the job can be monitored with

In[n]: jobs

After the job is submitted, GANGA is responsible for monitoring it while it is still running, and for downloading the output files (e.g. stdout/stderr) to the local machine when the job has finished.

When your job is completed, the job's output is automatically fetched from the Grid and stored in your gangadir directory. The exact output location can be found by:

In[n]: j.outputdir
Out[n]: /project/atlas/Users/yourusernamehere/gangadir/workspace/yourusernamehere/LocalAMGA/0/output

if 0 was the job ID. This was our first grid-job submitted via ganga!

Working with historical jobs

GANGA internally archives your previously submitted jobs (historical jobs) in the local job repository (i.e. gangadir) so that you don't have to do the bookkeeping yourself. You can freely leave and re-enter GANGA and still have your historical jobs ready for future work.

The first step in working with a historical job is to get the job instance from the repository, as follows:

In [n]: jobs
Out[1]: 
Job slice:  jobs (12 jobs)
--------------
# fqid      status        name   subjobs      application          backend  backend.actualCE                                                 
#   17   submitted                  1000       Executable              LCG                                                 
#   18   submitted                  2000       Executable              LCG                                                                                     
#   20   completed                    10       Executable              LCG
#   28   submitted                             Executable              LCG
#   29   submitted    test_lcg                 Executable              LCG                                               

The table above lists the historical jobs in your GANGA repository indexed by fqid. For example, if you are interested in the job with id 29, you can get the job instance by

In [n]: j = jobs(29)

then you are all set to work with the job.

Please note that you CANNOT change the attributes of a historical job.

More GANGA jobs to run on different platforms

Now try the following commands in the Ganga shell to get your hands dirty :) Try to find where the second job runs.

In [n]: j = Job()
In [n]: j.backend=Local()
In [n]: j.submit()
In [n]: jobs

In [n]: j = j.copy()
In [n]: j.backend=PBS()
In [n]: j.submit()
In [n]: jobs

Running AMAAthena in GANGA

The example below assumes:

  1. Users have the following Athena job option files in the run directory of the AMAAthena package
    • AMAAthena_jobOptions.py
    • Trigger_jobOptions.py

    You can find them from the share directory of the AMAAthena package.

  2. Users have the following AMA driver configuration files in the run directory of the AMAAthena package
    • exampleaod.conf
    • reader.conf

    You can find them from the Config directory of the AMAAthena package.

  3. After copying them into the run directory, modify the exampleaod.conf by replacing
    include_file = Config/reader.conf
    

    with

    include_file = reader.conf
    
  4. Analysis is performed on dataset: fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10
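
Step 3 above can also be done with a few lines of Python instead of a text editor. The sketch below is a minimal illustration, assuming the include line has exactly the form shown in step 3; the function name localize_include is hypothetical and is not part of AMAAthena or GANGA.

```python
import re

# Hypothetical helper: strips the leading 'Config/' from include_file lines,
# e.g. 'include_file = Config/reader.conf' -> 'include_file = reader.conf'.
_INCLUDE = re.compile(r'^([ \t]*include_file[ \t]*=[ \t]*)Config/', re.MULTILINE)

def localize_include(conf_text):
    """Return conf_text with 'Config/' removed from every include_file line."""
    # Group 1 keeps the 'include_file = ' part untouched; only the
    # directory prefix is dropped. Other lines are left as they are.
    return _INCLUDE.sub(r'\1', conf_text)
```

To apply it, read the text of exampleaod.conf in the run directory, pass it through the function, and write the result back to the same file.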

GANGA magic functions for cmtsetup

Inside GANGA, you can handle the complex CMT setup with two magic functions.

The following example shows how to set up the CMT environment for Athena 14.2.20 in 32-bit mode.

In [n]: config.Athena.CMTHOME = '/path/to/your/cmthome'
In [n]: cmtsetup 14.2.20,32
In [n]: setup

GANGA magic function resolving python conflict at CERN

This magic function resolves the Python conflict between the LCG UI and the ATLAS release at CERN. If you plan to run jobs on lxplus (with the Local backend) or lxbatch (with the LSF backend), please apply it before j.submit():

In [n]: fixpython

Creating a new GANGA job

In [n]: j = Job()

Setting application

From the AMAAthena/cmt directory, start ganga and do:

In [n]: j.application = AMAAthena()
In [n]: j.application.option_file += [ File('../run/AMAAthena_jobOptions.py'), File('../run/Trigger_jobOptions.py') ]
In [n]: j.application.driver_config.config_file = File('../run/exampleaod.conf')
In [n]: j.application.driver_config.include_file += [ File('../run/reader.conf') ]

Starting from Ganga 5.1.7, you can set the AMA flags as follows:

In [n]: j.application.driver_flags = 'MuonSample=1 HasTopQuarks=1 DoDiLepton=1'

Finally, prepare the application tarball to be shipped to the grid worker node:

In [n]: j.application.athena_compile=True
In [n]: j.application.prepare()

The line j.application.athena_compile=True requires the source code to be compiled on the worker node. If you want to skip the compilation and run the binary code directly on the worker node, set it to False.

Be careful: running pre-compiled binary code requires you to compile the source code in UserArea before submitting jobs.

Setting input data

  • StagerDataset
  • NB: Please use StagerDataset with LSF and PBS backends for local jobs. When using the StagerDataset, the AMAAthena job will use the Athena FileStager to copy dataset files from a grid storage.
    In [n]: j.inputdata = StagerDataset()
    In [n]: j.inputdata.dataset += [ 'fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10' ]
    

    You can also use StagerDataset to access dataset files that already exist on local disk. The following example assumes that the dataset files are in the directory /project/atlas/data/fdr2:

    In [n]: j.inputdata = StagerDataset()
    In [n]: j.inputdata.type = 'LOCAL'
    In [n]: j.inputdata.dataset = ['/project/atlas/data/fdr2']
    

    All the files with the name '*.root*' in this directory (and sub-directories) will be included.

  • DQ2Dataset
  • NB: Please use DQ2Dataset for grid jobs. When using the DQ2Dataset, GANGA will handle the file access externally from Athena.
    In [n]: j.inputdata = DQ2Dataset()
    In [n]: j.inputdata.dataset += [ 'fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10' ]
    

    The DQ2Dataset supports several data access modes. By default, it uses DQ2_LOCAL mode to read events by POSIX I/O via a local protocol. Starting from Ganga 5.1.4, you can switch to FILE_STAGER with the following command:

    In [n]: j.inputdata.type = 'FILE_STAGER'
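
For the LOCAL StagerDataset example above, it can help to preview which files the '*.root*' rule would pick up before you submit. The following is a minimal sketch in plain Python, assuming simple shell-style matching on the file name; it approximates the rule stated above and is not GANGA's own code. The helper name find_root_files is hypothetical.

```python
import fnmatch
import os

def find_root_files(top):
    """Return sorted paths under `top` whose file name matches '*.root*'."""
    matches = []
    # os.walk descends into sub-directories as well, mirroring the
    # "this directory (and sub-directories)" rule described in the text.
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in fnmatch.filter(filenames, '*.root*'):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling find_root_files('/project/atlas/data/fdr2') on the desktop gives a quick idea of how many input files the job would see.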
    

Setting job splitter (optional)

The examples below ask each subjob to process at most 2 files.

  • using StagerJobSplitter with StagerDataset
    In [n]: j.splitter = StagerJobSplitter()
    In [n]: j.splitter.numfiles = 2
    
  • using DQ2JobSplitter with DQ2Dataset for jobs running on LCG
    In [n]: j.splitter = DQ2JobSplitter()
    In [n]: j.splitter.numfiles = 2
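
To see what numfiles = 2 means for the number of subjobs, the sketch below groups a file list into chunks of at most numfiles entries. It only illustrates the arithmetic; the real splitters may group files differently (for example according to where the files are located), and split_files is a hypothetical name.

```python
def split_files(files, numfiles):
    """Group `files` into consecutive chunks of at most `numfiles` entries."""
    if numfiles < 1:
        raise ValueError('numfiles must be at least 1')
    # Each chunk stands for the input of one subjob; the last chunk
    # may be shorter when len(files) is not a multiple of numfiles.
    return [files[i:i + numfiles] for i in range(0, len(files), numfiles)]
```

With 5 input files and numfiles = 2 this gives 3 chunks, so you would expect about 3 subjobs.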
    

Setting computing backend

  • using Stoomboot (PBS cluster) at NIKHEF
    In [n]: j.backend = PBS()
    

    For a long running job, please also do

    In [n]: j.backend.queue = 'qlong'
    

    to avoid exceeding the walltime limit of the default PBS queue.

    1. Be careful: make sure the grid environment is set up automatically every time you start a new shell (e.g. in ~/.profile), as the FileStager needs LCG tools like lcg-cp:
      . /global/ices/lcg/current/etc/profile.d/grid_env.sh
      
    2. Be careful: when submitting the job with Ganga, make sure you are working on a machine which has the qsub command available, e.g. ribble (aka login) or elel22.
    3. Be careful: it is possible to change the LCG site from which the FileStager copies data by setting config.DQ2.DQ2_LOCAL_SITE_ID = 'SARA-MATRIX_MCDISK'.
  • using lxbatch (LSF cluster) at CERN
    In [n]: j.backend = LSF()
    

    For a long running job, please also specify a queue that matches the job's walltime. For example:

    In [n]: j.backend.queue = '8nh'
    
  • using the world-wide grid (WLCG/EGEE)
    In [n]: j.backend = LCG()
    

    Be careful: StagerDataset is not yet supported for jobs on LCG. Please use DQ2Dataset instead. For example:

    1. In [n]: j.inputdata = DQ2Dataset()
      In [n]: j.inputdata.dataset = []
      In [n]: j.inputdata.type = 'FILE_STAGER'
      

    Be careful: starting from Ganga 5.0.7, jobs submitted to the LCG backend require users to specify one of the following requirements:

    1. In [n]: j.backend.requirements.cloud = 'NL'
      In [n]: j.splitter = DQ2JobSplitter()
      

      This lets Ganga distribute the jobs within a particular computing cloud.

    2. In [n]: j.backend.CE = 'gazon.nikhef.nl:2119/jobmanager-pbs-atlas'
      

      This runs the job on a particular computing element (for users who know what they are doing!).

Submitting job

In [n]: j.submit()

After job submission

Checking job status

GANGA automatically polls the status of your jobs and updates the local repository accordingly. A notification is shown when a job's status changes.

In addition, you can get a job summary table by:

In [n]: jobs

or a summary table for subjobs:

In [n]: j.subjobs

Result and output merging

For the moment, each completed (sub-)job returns a ROOT summary file. The file is stored in the summary sub-directory of the job's output directory.

For jobs using StagerJobSplitter, a RootMerger is automatically attached to the job so that when the whole job is completed, the ROOT summary files from the sub-jobs are merged together.

For jobs using DQ2Dataset, the merging can be done manually once the whole job is completed. For example, assume each sub-job produces a ROOT summary file called summary/summary_mySample_confFile_exampleaod.conf_nEvts_1000.root. To merge them, do:

In [n]: merger = RootMerger()
In [n]: merger.files += ['summary/summary_mySample_confFile_exampleaod.conf_nEvts_1000.root']
In [n]: merger.overwrite = True
In [n]: merger.ignorefailed = True
In [n]: merger.merge(j)

The merged ROOT file has the same name and is created in the job's outputdir.

Killing and removing jobs

You can kill a job by calling

In [n]: j.kill()

or remove a job by

In [n]: j.remove()

Helper scripts

Advanced usage

Changing the default gangadir

For each job, Ganga keeps the associated files (e.g. the job's inputs, outputs, metadata, etc.) in gangadir. This may take a lot of space (or disk quota) if you have many jobs in Ganga. You may want Ganga to keep those files in another directory where more space is available. To do so, open the ~/.gangarc file and change the directory as follows:

gangadir = /project/atlas/Users/yourusernamehere/gangadir

Restricting max. number of events

In [n]: j.application.max_events = '1000'

Running on more than one dataset

The StagerDataset supports wildcard specification in the dataset name. For example, if you want to run on all FDR2 Muon stream datasets, you can set the inputdata like the following:

In [n]: j.inputdata.dataset += ['fdr08_run2*physics_Muon*']
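
To get a feeling for what such a pattern selects, you can try it on a list of dataset names with Python's fnmatch module. This is a sketch assuming shell-style wildcard semantics; GANGA resolves the pattern through its own dataset lookup, which may behave differently. The helper name match_datasets is hypothetical, and the Egamma dataset name below is made up for illustration.

```python
import fnmatch

def match_datasets(names, pattern):
    """Return the dataset names that match a shell-style wildcard pattern."""
    # fnmatchcase compares case-sensitively on every platform.
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]

candidates = [
    'fdr08_run2.0052280.physics_Muon.merge.AOD.o3_f8_m10',    # from this page
    'fdr08_run2.0052280.physics_Egamma.merge.AOD.o3_f8_m10',  # made-up name
]
selected = match_datasets(candidates, 'fdr08_run2*physics_Muon*')
```

Here selected contains only the Muon stream dataset.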

Dealing with failed sub-jobs

It is quite possible to have some failed sub-jobs. In this case, GANGA reports the whole job as failed. There is no need to resubmit the whole job; you can resubmit just the failed subjobs. Assuming you have a failed job, j:

In [n]: j.subjobs.select(status='failed').resubmit()

Failing jobs manually

Some unexpected issues in the job may leave Ganga unable to update the job status to failed as it should. In this case, you can force the job into the failed state manually:

In [n]: j.force_status("failed", force=True)

This stops Ganga from polling the status of a problematic job that may already be gone from the backend system.

Basic troubleshooting

GANGA tries to bring stdout/stderr back to the client side even when the job has failed remotely on the Grid. For failed jobs, you can inspect them as follows:

In [n]: j.peek('stdout','less')
In [n]: j.peek('stderr','cat')

or

In [n]: j.peek('stdout.gz','zcat')
In [n]: j.peek('stderr.gz','zcat')

for the LCG jobs.
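
When the gzipped log is long, paging through it by hand gets tedious. The sketch below scans a gzipped log and returns only the lines that contain certain keywords. It is a minimal example in plain Python, assuming the file (e.g. stdout.gz) is available in the job's output directory; the helper name grep_gz and the default keywords 'ERROR' and 'FATAL' are assumptions, so adjust them to whatever your job actually prints.

```python
import gzip

def grep_gz(path, keywords=('ERROR', 'FATAL')):
    """Return (line_number, line) pairs from a gzipped log containing a keyword."""
    hits = []
    # 'rt' opens the compressed file as text; undecodable bytes are
    # replaced instead of raising, since job logs can contain anything.
    with gzip.open(path, 'rt', encoding='utf-8', errors='replace') as log:
        for number, line in enumerate(log, start=1):
            if any(word in line for word in keywords):
                hits.append((number, line.rstrip('\n')))
    return hits
```

The line numbers in the result tell you where to look when you open the full log afterwards.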


Distributed analysis user support

You can send distributed analysis issues to a single support point: hn-atlas-dist-analysis-help@cern.ch

Update on new features

Ganga release > 5.4.0

uploading AMA outputs on grid storage

By default, the AMA outputs (histograms, ntuples) in the summary directory will be shipped back to the user with the job (i.e. files are packed in the job's outputsandbox). This approach assumes that the output files are not very big (e.g. < 10 MB).
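
To check whether your outputs fit this assumption, you can measure the summary directory produced by a small local test run. A minimal sketch in plain Python; the helper names are hypothetical, and the 10 MB default simply mirrors the example figure given above rather than a limit enforced by GANGA.

```python
import os

def dir_size_mb(top):
    """Return the total size in MB of all files under `top`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total / (1024.0 * 1024.0)

def fits_sandbox(top, limit_mb=10.0):
    """True if the directory is smaller than limit_mb (assumed threshold)."""
    return dir_size_mb(top) < limit_mb
```

If fits_sandbox returns False for a typical subjob, the outputsandbox is probably not the right way to bring the data back.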

However, in the case of ntuple dumping, the outputs can grow rapidly, so the outputsandbox mechanism may not be a suitable way to deliver output data (e.g. one can easily fill up the disk on the WMS in no time). A better approach is to store the output files on grid storage and retrieve them later using the DQ2 client. This gives several benefits:

  • Output data are stored on the Grid, meaning that if this is an intermediate output of the analysis, you can save local space by assigning it as input to the following grid jobs.
  • Output data are managed by DDM. It gives the existing dataset management features to those outputs.
  • ...

To use this feature, simply set

In [n]: j.outputdata = DQ2Dataset()

before job submission.

Once the job is completed, you will see that j.outputdata is filled with the dataset information of the output files. You can start downloading the output immediately in Ganga using:

In [n]: j.outputdata.retrieve()

or use a separate dq2-get command to get it later.

PANDA backend support

Job submission to PANDA is required for running grid jobs on US sites. During the STEP09 test, most EGEE and NorduGrid sites also deployed PANDA queues, so you can also run your jobs on EGEE and NorduGrid sites through PANDA.

To submit AMAAthena jobs to PANDA, you also need AMAAthena > 00-01-34, which introduces a so-called CleanupAlg to pack the summary directory into a single tarball.

Please also note that PANDA always stores the job's output on grid storage, so you have to retrieve it later using either j.outputdata.retrieve() in Ganga or the dq2-get command.

To use the PANDA backend, just set the backend object to Panda(). For example:

In [n]: j.backend = Panda()

By default, the job is assigned to the 'US' cloud. You can change it as in the following example:

In [n]: j.backend.requirements.cloud = 'NL'

It is recommended that you leave j.backend.site set to 'AUTO'; however, if you know what you want to do, it can be set to a specific PANDA site. The site names can be found here: http://panda.cern.ch:25980/server/pandamon/query?overview=showSiteStatusTable

More information

--Hclee 16:17, 13 Aug 2008 (MET DST)