Using ganga at NIKHEF
Setting up ganga
You need an AFS ticket to run Ganga. You also need a grid certificate, and you need to set up the grid tools, as described in the DQ2 at Nikhef wiki. If you set up the GRID tools according to Martijn's Wiki, COMMENT OUT THE LINE: source /project/atlas/nikhef/dq2/dq2_setup.csh.NIKHEF. If you set up the GRID tools in some other way, make sure the grid tools environment is not loaded: GANGA AND THE GRID TOOLS ENVIRONMENT CLASH! Apparently there is a mismatch between the grid tools environment and the Athena environment. If you still want the line available, you can put it in an alias (see the sketch below). Then set up Athena at NIKHEF as described in the athena 12.0.6 Wiki.
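For example, a csh alias (the name dq2setup here is just an illustration) lets you load the DQ2 environment by hand only when you actually need it:
alias dq2setup 'source /project/atlas/nikhef/dq2/dq2_setup.csh.NIKHEF'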
To set up Ganga, add the following lines to your .cshrc:
setenv GANGA_CONFIG_PATH GangaAtlas/Atlas.ini
#for the local installation
#set path = (/public/public_linux/Ganga/install/4.3.0/bin/ $path)
#for the newest version installed on afs
set path = (/afs/cern.ch/sw/ganga/install/4.3.1/bin/ $path)
setenv LFC_HOST 'lfc03.nikhef.nl'
setenv LCG_CATALOG_TYPE lfc
or, if you are working in an sh-based shell (such as bash):
export GANGA_CONFIG_PATH=GangaAtlas/Atlas.ini
#for the local installation
#PATH=/public/public_linux/Ganga/install/4.3.0/bin/:${PATH}
#for the newest version installed on afs
PATH=/afs/cern.ch/sw/ganga/install/4.3.1/bin/:${PATH}
export LFC_HOST='lfc03.nikhef.nl'
export LCG_CATALOG_TYPE=lfc
The first time Ganga runs, it will ask whether to create a configuration file, $HOME/.gangarc.
Answer yes, and edit the configuration file as follows:
- In the section labelled [LCG], uncomment the line:
VirtualOrganisation = atlas
and add the line:
DefaultSE = tbn18.nikhef.nl
- In the section labelled [Athena], uncomment the line:
# local path to base paths of dist-kits (lxplus example)
ATLAS_SOFTWARE = /data/atlas/offline/
- In the section labelled [ROOT], uncomment and edit the lines:
location = /data/atlas/offline/12.0.6/sw/lcg/external/root/
version = 5.10.00e
arch = slc3_ia32_gcc323
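After these edits, the relevant sections of $HOME/.gangarc should look roughly as follows (a sketch: the other options in the file are left untouched):
[LCG]
VirtualOrganisation = atlas
DefaultSE = tbn18.nikhef.nl

[Athena]
# local path to base paths of dist-kits (lxplus example)
ATLAS_SOFTWARE = /data/atlas/offline/

[ROOT]
location = /data/atlas/offline/12.0.6/sw/lcg/external/root/
version = 5.10.00e
arch = slc3_ia32_gcc323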
Running ganga
You can start the Ganga CLI by typing ganga on the command line. This starts a Python interface where you can start defining your jobs. There are a few commands you can use to get around in Ganga:
- jobs: lists all the jobs that are defined in Ganga.
- jobs[id]: gives you an individual job, where id is listed in the second column of the jobs output.
One thing you can do with a job is view its status:
jobs[1].status()
This can be 'new', 'submitted', 'running' or 'completed'. Once the job is
completed, you can view its output (which is stored by default in
$HOME/gangadir/workspace/Local/<jobid>/output) by typing:
In [25]: jobs[0].peek()
Or look at a specific output file by typing:
In [25]: jobs[0].peek('stderr','less')
where stderr is the name of the file you want to view, and less the program
to view it with. You can kill a job using the kill() method, and remove it
from the jobs list with the remove() method. The most important command
by far is help(). This starts the interactive help program of Ganga. After
typing it, you get a help> prompt. Typing index gives you a list of all possible
help subjects. The explanations are rather brief, but they do help you to find
the methods of the built-in classes of Ganga and its plugins. For instance, the ATLAS
plugin defines classes like DQ2Dataset. For more info on DQ2Dataset, you
type DQ2Dataset at the help> prompt.
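Putting this together, a short illustrative session (the job index and the In [...] numbers are arbitrary):
In [26]: jobs[2].kill()
In [27]: jobs[2].remove()
In [28]: help()
help> index
help> DQ2Dataset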
Running a simple Job
This little piece of code runs a Hello World Job on the LCG grid:
In [0] : j=Job()
In [1] : j.application=Executable(exe='/bin/echo',args=['Hello World'])
In [2] : j.backend=LCG()
In [3] : j.submit()
The application that is run here is a UNIX executable. LCG() is another predefined class that takes care of a lot of the details of submitting to the grid. After the job is finished, you can type:
In [4] : j.peek('stdout','cat')
which will output the expected "Hello World". You can also put these lines in a script my_script.py, and at the ganga prompt type:
In [4]: execfile(’my_script.py’)
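For reference, my_script.py would then simply contain the commands shown above:
# my_script.py: submit a Hello World job to the LCG grid
j = Job()
j.application = Executable(exe='/bin/echo', args=['Hello World'])
j.backend = LCG()
j.submit()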
Running an ATHENA job
Running an Athena job and storing the output files in a DQ2 dataset requires a bit more work, but it is still not hard. The following script defines an Athena job, splits it so that there is one subjob (and hence one output file) per input file, runs Athena with the TopView LocalOverride jobOptions, and stores the output on the grid in a DQ2 dataset called testing_Ganga_V9.
#Define the ATHENA job
j = Job()
j.name='TopView Standard Job, Ganga 4.3.1'
j.application=Athena()
j.application.prepare(athena_compile=True)
j.application.option_file='/project/atlas/users/fkoetsve/TestArea1206/PhysicsAnalysis/TopPhys/TopView/TopView-00-12-12-02/run/LocalOverride_Nikhef_BASIC.py'
#j.application.max_events='20'
j.splitter=AthenaSplitterJob()
j.splitter.numsubjobs=396
#The merger can be used to merge all the output files into one. See the ganga ATLAS Twiki for details
#j.merger=AthenaOutputMerger()
#Define the inputdata
j.inputdata=DQ2Dataset()
j.inputdata.dataset="trig1_misal1_mc12.005201.Mcatnlo_jim_top_pt200.recon.AOD.v12000601"
#To send the job to complete and incomplete dataset locations, uncomment either the next line or the line after that
#j.inputdata.min_num_files=100
#j.inputdata.match_ce_all=True
j.inputdata.type='DQ2_LOCAL'
#Define the outputdata
#j.outputdata=ATLASOutputDataset()
j.outputdata=DQ2OutputDataset()
j.outputdata.datasetname='testing_Ganga_V9'
j.outputdata.outputdata=['TopViewAANtuple.root']
#j.outputdata.location='NIKHEF'
#j.outputsandbox=['TopViewAANtuple.root']
#Submit
j.backend=LCG()
j.backend.CE='ce-fzk.gridka.de:2119/jobmanager-pbspro-atlas'
#j.inputsandbox=['my_extra_file']
j.application.exclude_from_user_area = []
j.submit()
Explanation of the terms:
- j.application=Athena(): defines the job to be an Athena job. It packs the local installation of Athena packages and sends them with the job. The groupArea tag of the Athena setup, used e.g. for TopView, does not work (yet). Instead, all the packages defined in the groupArea tag must be installed locally and packed with the job.
- j.splitter=AthenaSplitterJob(): to get one output file per input file, as must be done to keep the naming of files consistent when going from AOD to NTuple, you need the job to be split into as many subjobs as there are input files. You need this splitter plugin to do that, and you set j.splitter.numsubjobs to the number of input files. You can get this number by typing:
d=DQ2Dataset(dataset='datasetname')
d.list_locations_num_files()
which gives the number of files of a given dataset at each location.
- j.merger: can be used to merge all the output files into one.
- j.inputdata=DQ2Dataset(): tells the job to get the files from the DQ2 file catalogue.
- j.inputdata.match_ce_all=True: if there is no location with a complete copy of the dataset, this attribute sends the job to a random location.
- j.inputdata.min_num_files=100: instead of sending the job to a random location, this first checks that a given minimum number of files is present at that location.
- j.outputdata=DQ2OutputDataset(): tells the job to store the output data on the grid and register it in the DQ2 registry.
- j.outputdata.outputdata=['ntuple.root']: gives a list of filenames that must be stored in the output dataset. Wildcards are not supported. If the job is split, the output files are numbered automatically.
- j.backend.CE: allows you to specify which Computing Element the job should be sent to. The syntax is <server>:<port>/jobmanager-<service>-<queue>
- j.application.exclude_from_user_area = []: allows you to exclude packages that you have installed locally from inclusion in the input sandbox (the tar file containing all the files that are sent with your job to the CE).
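Once a split job like the one above has been submitted, you can inspect the individual subjobs from the Ganga prompt. A brief sketch, assuming the job is jobs[0] and that subjobs behave like entries in the jobs list:
In [10]: j = jobs[0]
In [11]: j.subjobs                           # list the subjobs created by the splitter
In [12]: j.subjobs[0].status()               # check the status of the first subjob
In [13]: j.subjobs[0].peek('stderr','less')  # inspect its output files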