Ganga with AMAAthena


Introduction

This page describes how to run AMAAthena jobs with Ganga on different computing platforms (Stoomboot, lxbatch, Grid).

Preparation

  • Make sure you can run AMAAthena standalone on your local desktop. Instructions for doing this at NIKHEF: Using Athena at NIKHEF
  • Make sure you can submit HelloWorld jobs to the different computing platforms. Instructions: Ganga: basic usage

Starting Ganga

Before starting Ganga, set up the CMT environment properly. The example commands below assume that the CMT setup scripts are in the $HOME/cmthome directory.

% source $HOME/cmthome/setup.sh -tag=15.6.1,32
% source $TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/cmt/setup.sh

Then start Ganga in the $TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/run directory.

% cd $TestArea/PhysicsAnalysis/AnalysisCommon/AMA/AMAAthena/run
% source /project/atlas/nikhef/ganga/etc/setup.[c]sh
% ganga --config-path=/project/atlas/nikhef/ganga/config/Atlas.ini.nikhef

Templates for quick start

There are ready-to-go Ganga scripts made for this tutorial:

  • ama_d3pd_maker.local.gpi:
  • ama_d3pd_maker.pbs.gpi:
  • ama_d3pd_maker.lcg.gpi:
  • ama_d3pd_maker.panda.gpi:

You can execute one of them to generate a new Ganga job, for example:

In [n]: execfile('ama_d3pd_maker.lcg.gpi')
In [n]: j.submit()

The details of those scripts are explained in the following sections of this page.

Creating Ganga jobs by yourself

Empty Ganga job creation

To create a new job in Ganga, do

In [n]: j = Job()

and you can set the job's name with

In [n]: j.name = 'my_ama_job'

Application configuration

AMAAthena is an Athena "Algorithm", so you can use the Athena application object in Ganga to configure your AMAAthena job. However, a few steps are needed before setting up the Athena application object in Ganga:

  1. copy the top-level job option file of AMAAthena to your working directory:
    • with AutoConfiguration
      % get_files -jo AMAAthena_jobOptions_AUTO.py
    • without AutoConfiguration
      % get_files -jo AMAAthena_jobOptions_new.py
  2. convert the user-level AMA configuration file into an Athena job option file. For example, if you have a configuration file called data09_wjet_muon_sel.conf:
    % AMAConfigfileConverter data09_wjet_muon_sel.conf data09_wjet_muon_sel.py
  3. create a runtime definition job option file called rundef.py and edit it as in the following example:
    SampleName = 'data09_900GeV_00140541_MuonswBeam'
    ConfigFile = 'data09_wjet_muon_sel.py'
    FlagList = ''
    EvtMax = -1
    AMAAthenaFlags = ['DATA', 'TRIG']
    

    The variables in rundef.py are explained below:

    • SampleName: the user defined sample name. This name will be used in composing the AMA summary output files.
    • ConfigFile: the job option file name converted from the user-level configuration file (the output of step 2)
    • FlagList:
    • EvtMax: the maximum number of events to be processed in the job
    • AMAAthenaFlags: the additional AMA job option files to be included by the main AMA job option file. This is ignored if using AutoConfiguration.
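If several samples need their own rundef.py, the file can also be generated rather than edited by hand. The following is a minimal sketch in plain Python; the helper name render_rundef and the RUNDEF_SETTINGS list are hypothetical (they are not part of AMAAthena or Ganga), and the values are simply those of the example above.

```python
# Hypothetical helper, not part of AMAAthena or Ganga: renders the text of
# a rundef.py file from a list of (name, value) pairs.

def render_rundef(settings):
    """Return rundef.py content with one 'name = value' line per setting."""
    # repr() yields valid Python literals for strings, integers and lists.
    return "".join("%s = %r\n" % (name, value) for name, value in settings)


# Values copied from the rundef.py example on this page.
RUNDEF_SETTINGS = [
    ("SampleName", "data09_900GeV_00140541_MuonswBeam"),
    ("ConfigFile", "data09_wjet_muon_sel.py"),
    ("FlagList", ""),
    ("EvtMax", -1),
    ("AMAAthenaFlags", ["DATA", "TRIG"]),
]

rundef_text = render_rundef(RUNDEF_SETTINGS)
```

Writing rundef_text to a file named rundef.py in the working directory should give the same content as the hand-edited example.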

Once the above steps are done, you can set up the Athena application in Ganga:

In [n]: j.application = Athena()
In [n]: j.application.max_events = -1
In [n]: j.application.option_file += [ File('rundef.py'), File('AMAAthena_jobOptions_AUTO.py') ]
In [n]: j.application.prepare()
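Before setting up the application it can be convenient to confirm that the job option files from the steps above are really in the working directory. The sketch below is plain Python and does not use any Ganga API; the function name missing_option_files is hypothetical, and it assumes the files are expected in the directory from which Ganga was started.

```python
import os


def missing_option_files(names, directory="."):
    """Return the entries of `names` that do not exist as files in `directory`."""
    return [name for name in names
            if not os.path.isfile(os.path.join(directory, name))]


# File names used in the example above.
OPTION_FILES = ["rundef.py", "AMAAthena_jobOptions_AUTO.py"]
```

Calling missing_option_files(OPTION_FILES) in the run directory returns an empty list when both files are in place.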

InputDataset configuration

Enabling FileStager for your analysis job is encouraged, as it has proved to be more efficient in the majority of cases. To do so, there are two InputDataset objects to use in Ganga, depending on where you submit your jobs.

  • StagerDataset for local jobs
  • Presuming you have a dataset "data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268" located at "NIKHEF-ELPROD_DATADISK", you can set the inputdata attribute of the Ganga job object as follows:
    In [n]: config.DQ2.DQ2_LOCAL_SITE_ID = 'NIKHEF-ELPROD_DATADISK'
    In [n]: j.inputdata = StagerDataset()
    In [n]: j.inputdata.type = 'DQ2'
    In [n]: j.inputdata.dataset += [ 'data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268' ]
    

    Remarks

    • Always use StagerDataset with LSF and PBS backends for local jobs.
    • StagerDataset is intended for copying files from local storage. You need to find the DDM site name of the local storage that holds the dataset and set it in Ganga through config.DQ2.DQ2_LOCAL_SITE_ID.

    You can also use StagerDataset to access dataset files that already exist on local disk. The following example assumes that the dataset files are in the directory /data/atlas3/users/hclee/data/data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268

    In [n]: j.inputdata = StagerDataset()
    In [n]: j.inputdata.type = 'LOCAL'
    In [n]: j.inputdata.dataset = ['/data/atlas3/users/hclee/data/data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268']
    


  • DQ2Dataset for grid jobs
  • Presuming you want to run on the dataset data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268 on the grid, you can set the InputDataset object in Ganga as follows:
    In [n]: j.inputdata = DQ2Dataset()
    In [n]: j.inputdata.dataset += [ 'data09_900GeV.00140541.physics_MuonswBeam.merge.AOD.f170_m268' ]
    In [n]: j.inputdata.type = 'FILE_STAGER'
    

    Remarks

    • Always use DQ2Dataset with Panda and LCG backends.

Splitter configuration

The examples below make each subjob process at most 2 files.

  • using StagerJobSplitter with StagerDataset
    In [n]: j.splitter = StagerJobSplitter()
    In [n]: j.splitter.numfiles = 2
    
  • using DQ2JobSplitter with DQ2Dataset for jobs running on LCG
    In [n]: j.splitter = DQ2JobSplitter()
    In [n]: j.splitter.numfiles = 2
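To illustrate what a limit of 2 files per subjob means, here is a minimal sketch in plain Python. It is not the implementation of StagerJobSplitter or DQ2JobSplitter, which may take other factors into account when grouping files; the function name split_files and the file names are hypothetical.

```python
def split_files(files, numfiles):
    """Group `files` into consecutive chunks of at most `numfiles` entries."""
    if numfiles < 1:
        raise ValueError("numfiles must be at least 1")
    return [files[i:i + numfiles] for i in range(0, len(files), numfiles)]


# Hypothetical file names, for illustration only.
example_files = ["file_%d.root" % n for n in range(1, 6)]

# Five files with numfiles = 2 give three groups: two full ones and a remainder.
subjob_inputs = split_files(example_files, 2)
```

With this naive grouping, a dataset of five files would lead to three subjobs, the last one processing a single file.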
    

Backend (platform) configuration

You should be able to switch to a different computing platform (a Backend in Ganga terminology) by simply changing the backend attribute of a job object. The available backends are:

  • Local: for running jobs locally right on your desktop
  • PBS: for running jobs on a PBS-based computer cluster (e.g. Stoomboot)
  • LSF: for running jobs on an LSF-based computer cluster (e.g. lxbatch@CERN)
  • LCG: for running jobs on the grid (EGEE sites), jobs are brokered by gLite Workload Management System (WMS)
  • Panda: for running jobs on the grid (EGEE, OSG, NorduGrid sites), jobs are brokered by Panda

For example, to submit jobs to the grid through Panda:

In [n]: j.backend = Panda()
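When switching backends, remember that the InputDataset object has to match, as stated in the remarks of the InputDataset configuration section. The lookup below is a small sketch of that pairing in plain Python; the table and the function dataset_class_name are hypothetical helpers, and treating the Local backend as a "local job" is an assumption.

```python
# Pairing taken from the remarks on this page; the helper itself is
# hypothetical and not part of Ganga.
DATASET_FOR_BACKEND = {
    "Local": "StagerDataset",  # assumption: Local counts as a local job
    "PBS": "StagerDataset",
    "LSF": "StagerDataset",
    "LCG": "DQ2Dataset",
    "Panda": "DQ2Dataset",
}


def dataset_class_name(backend_name):
    """Return the name of the InputDataset class to use with a backend."""
    try:
        return DATASET_FOR_BACKEND[backend_name]
    except KeyError:
        raise ValueError("unknown backend: %r" % (backend_name,))
```

For instance, dataset_class_name("Panda") gives "DQ2Dataset", so a job moved from Stoomboot to Panda also needs its inputdata changed from StagerDataset to DQ2Dataset.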

Local

PBS (example for Stoomboot)

LSF (example for lxbatch@CERN)

LCG

Panda