Using ganga at NIKHEF


This page is under construction. The full text of this can be found here

Setting up ganga

You need an afs ticket to run ganga. You also need a grid certificate, and you need to set up the grid, as described in the DQ2 at Nikhef wiki. At the same time, assuming you set up the GRID tools according to Martijn's Wiki, COMMENT OUT THE LINE: source /project/atlas/nikhef/dq2/dq2_setup.csh.NIKHEF. If you set up the GRID tools in some other way, make sure the grid tools environment is not loaded: GANGA AND THE GRID TOOLS ENVIRONMENT CLASH! Apparently there is a mismatch between the grid tools environment and the Athena environment. You can put the line in an alias instead, if you wish. Then set up ATHENA at NIKHEF as described in the athena 12.0.6 Wiki.

To setup ganga, add the two following lines to the .cshrc:

setenv GANGA_CONFIG_PATH GangaAtlas/Atlas.ini
#for the local installation
#set path = (/public/public_linux/Ganga/install/4.3.0/bin/ $path)
#for the newest version installed on afs
set path = (/afs/cern.ch/sw/ganga/install/4.3.1/bin/ $path)
setenv LFC_HOST 'lfc03.nikhef.nl'
setenv LCG_CATALOG_TYPE lfc

or if you are working in a sh based shell (such as bash):

export GANGA_CONFIG_PATH=GangaAtlas/Atlas.ini
#for the local installation
#PATH=/public/public_linux/Ganga/install/4.3.0/bin/:${PATH}
#for the newest version installed on afs
PATH=/afs/cern.ch/sw/ganga/install/4.3.1/bin/:${PATH}
export LFC_HOST='lfc03.nikhef.nl'
export LCG_CATALOG_TYPE=lfc
The first time ganga runs, it will ask to create a configuration file $HOME/.gangarc. Answer yes, and edit the config file as follows:

  1. In the section labelled [LCG] uncomment the line:

    VirtualOrganisation = atlas

    and add the line

    DefaultSE = tbn18.nikhef.nl

  2. In the section labeled [Athena] uncomment the line:

    # local path to base paths of dist-kits (lxplus example)
    ATLAS_SOFTWARE = /data/atlas/offline/

  3. In the section labelled [ROOT] uncomment and edit the lines:

    location = /data/atlas/offline/12.0.6/sw/lcg/external/root/
    version = 5.10.00e
    arch = slc3_ia32_gcc323
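
Taken together, the relevant parts of $HOME/.gangarc should then look roughly like this (paths and version numbers as above; adapt them if your installation differs):

[LCG]
VirtualOrganisation = atlas
DefaultSE = tbn18.nikhef.nl

[Athena]
# local path to base paths of dist-kits (lxplus example)
ATLAS_SOFTWARE = /data/atlas/offline/

[ROOT]
location = /data/atlas/offline/12.0.6/sw/lcg/external/root/
version = 5.10.00e
arch = slc3_ia32_gcc323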


Running ganga

You can start the ganga CLI by typing ganga on the commandline. This starts a python interface, where you can start defining your jobs. There are a few commands you can use to get around in ganga:

  • jobs: Lists all the jobs that are defined in ganga. You can get to an individual job by typing:
  • jobs[id]: where the id is listed in the second column of the jobs output.
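
For example, a short interactive session could look like this (the job id is just illustrative):

In [10]: jobs        # prints the table of all jobs known to ganga
In [11]: jobs[1]     # shows the definition of the job with id 1 in that table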

One thing you can do with a job is view its status:

jobs[1].status()

This can be 'new', 'submitted', 'running' or 'completed'. Once the job is completed, you can view its output (which is stored by default in $HOME/gangadir/workspace/Local/<jobid>/output) by typing:

In [25]: jobs[0].peek()

Or look at a specific output file by typing:

In [25]: jobs[0].peek('stderr','less')

where stderr is the name of the file you want to view, and less the program to view it with. You can kill a job using the kill() method, and remove it from the jobs list with the remove() method. The most important command by far is help(). This starts the interactive help program of ganga. After typing it, you get a help> prompt. Typing index gives you a list of all possible help subjects. The explanations are rather brief, but they do help you to find methods of the built-in classes of Ganga and its plugins. For instance, the atlas plugin defines classes like DQ2Dataset. For more info on DQ2Dataset you type DQ2Dataset at the help> prompt.
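
As a sketch, a help session looks like this: type index for the list of subjects, DQ2Dataset for the description of that class, and quit (or just an empty line) to get back to the ganga prompt:

In [26]: help()
help> index
help> DQ2Dataset
help> quit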


Running a simple Job

This little piece of code runs a Hello World Job on the LCG grid:

In [0] : j=Job()
In [1] : j.application=Executable(exe='/bin/echo',args=['Hello World'])
In [2] : j.backend=LCG()
In [3] : j.submit()

The application that is run here is a UNIX executable. LCG() is another predefined class that takes care of a lot of details of submitting to the grid. After it is finished, you can type:

In [4] : j.peek('stdout','cat')

This will output the expected "Hello World". You can also put these lines in a script my_script.py, and at the ganga prompt type:

In [4]: execfile('my_script.py')
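
The script itself simply contains the same statements as above, without the In [..] prompts:

#my_script.py: the Hello World job from above
j=Job()
j.application=Executable(exe='/bin/echo',args=['Hello World'])
j.backend=LCG()
j.submit()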

Running an ATHENA job

Running an athena job and storing the output files in a dq2 dataset requires a bit more work, but it is still not hard. The following script defines an Athena job, splits it so that there is one subjob (and hence one outputfile) per inputfile, runs athena with the TopView LocalOverride jobOptions, and stores the output on the grid in a DQ2 dataset called testing_Ganga_V9.

#Define the ATHENA job
j = Job()
j.name='TopView Standard Job, Ganga 4.3.1'
j.application=Athena()
j.application.prepare(athena_compile=True)
j.application.option_file='/project/atlas/users/fkoetsve/TestArea1206/PhysicsAnalysis/TopPhys/TopView/TopView-00-12-12-02/run/LocalOverride_Nikhef_BASIC.py'
#j.application.max_events='20'
j.splitter=AthenaSplitterJob()
j.splitter.numsubjobs=396
#The merger can be used to merge all the output files into one. See the ganga ATLAS Twiki for details
#j.merger=AthenaOutputMerger()
#Define the inputdata
j.inputdata=DQ2Dataset()
j.inputdata.dataset="trig1_misal1_mc12.005201.Mcatnlo_jim_top_pt200.recon.AOD.v12000601"
#To send the job to complete or incomplete dataset locations, uncomment either the next line or the line after that
#j.inputdata.min_num_files=100
#j.inputdata.match_ce_all=True
j.inputdata.type='DQ2_LOCAL'
#Define the outputdata
#j.outputdata=ATLASOutputDataset()
j.outputdata=DQ2OutputDataset()
j.outputdata.datasetname='testing_Ganga_V9'
j.outputdata.outputdata=['TopViewAANtuple.root']
#j.outputdata.location='NIKHEF'
#j.outputsandbox=['TopViewAANtuple.root']
#Submit
j.backend=LCG()
j.backend.CE='ce-fzk.gridka.de:2119/jobmanager-pbspro-atlas'
#j.inputsandbox=['my_extra_file']
j.application.exclude_from_user_area = []
j.submit()

Explanation of the terms:

  • j.application=Athena(): Defines the job to be an Athena job. Packs the local installation of athena packages, and sends them with the job. The groupArea tag of the athena setup, used e.g. for TopView, does not work (yet). Instead, all the packages defined in the groupArea tag must be installed locally and packed with the job.
  • j.splitter=AthenaSplitterJob(): To get one outputfile per inputfile, as must be done to keep the naming of files consistent when going from AOD to NTuple, you need the job to be split into as many subjobs as there are inputfiles. You need this splitter plugin to do that, and set j.splitter.numsubjobs to the number of inputfiles. You can get this number by typing d=DQ2Dataset(dataset='datasetname') followed by d.list_locations_num_files(), which gives the number of files of the dataset at each location (see the sketch after this list).
  • j.merger: can be used to merge all the outputfiles into one
  • j.inputdata=DQ2Dataset(): tells the job to get the files from the DQ2 file catalogue.
  • j.inputdata.match_ce_all=True: If there is no location with a complete copy of the dataset, this attribute sends the job to a random location
  • j.inputdata.min_num_files=100: instead of sending the job to a random location, this first checks that a given minimum number of files is present at that location.
  • j.outputdata=DQ2OutputDataset(): tells the job to store the output data on the grid, and register it in the DQ2 registry.
  • j.outputdata.outputdata=['ntuple.root']: gives a list of filenames that must be stored in the output dataset. Wildcards are not supported. If the job is split, the outputfiles are numbered automatically.
  • j.backend.CE: allows you to specify which Computing Element the job should be sent to. The syntax is <server>:<port>/jobmanager-<service>-<queue>.
  • j.application.exclude_from_user_area = []: allows you to exclude packages that you have installed locally from inclusion in the input sandbox (the tar file containing all the files that are sent with your job to the CE).
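
As a sketch of the splitter bookkeeping described in the list above (the location to use, and hence the number of files, has to be read off from the list_locations_num_files() output):

#Check how many files of the dataset are present at each location
d=DQ2Dataset(dataset='trig1_misal1_mc12.005201.Mcatnlo_jim_top_pt200.recon.AOD.v12000601')
d.list_locations_num_files()
#Split the job into one subjob per inputfile at the chosen location
j.splitter=AthenaSplitterJob()
j.splitter.numsubjobs=396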