Using ganga at NIKHEF

From Atlas Wiki
Revision as of 10:32, 15 June 2007 by Fkoetsve

Setting up ganga

You need an AFS ticket to run ganga. You also need a grid certificate, and you need to set up the grid tools as described in the DQ2 at Nikhef wiki. However, if you set up the grid tools according to Martijn’s wiki, COMMENT OUT THE LINE:

source /project/atlas/nikhef/dq2/dq2_setup.csh.NIKHEF

If you set up the grid tools in some other way, make sure the grid tools environment is not loaded. GANGA AND GRID TOOLS ENVIRONMENT CLASH! Apparently the cause is a mismatch between the grid tools environment and the Athena environment. If you still want the grid tools in other sessions, you can put the source line in an alias and run it only when you need it. Then set up ATHENA at NIKHEF as described in the athena 12.0.6 wiki.



To set up ganga, add the following lines to your .cshrc:

setenv GANGA_CONFIG_PATH GangaAtlas/Atlas.ini
#for the local installation
#set path = (/public/public_linux/Ganga/install/4.3.0/bin/ $path)
#for the newest version installed on afs
set path = (/afs/cern.ch/sw/ganga/install/4.3.2/bin/ $path)
setenv LFC_HOST 'lfc03.nikhef.nl'
setenv LCG_CATALOG_TYPE lfc

or, if you are working in an sh-based shell (such as bash):

export GANGA_CONFIG_PATH=GangaAtlas/Atlas.ini
#for the local installation
#PATH=/public/public_linux/Ganga/install/4.3.0/bin/:${PATH}
#for the newest version installed on afs
PATH=/afs/cern.ch/sw/ganga/install/4.3.1/bin/:${PATH}
export LFC_HOST='lfc03.nikhef.nl'
export LCG_CATALOG_TYPE=lfc
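Before starting ganga it can help to confirm that these variables actually ended up in your environment (a typographic quote pasted from a web page, for example, becomes part of the value). The following is a minimal sketch, not part of ganga; the expected values are copied from the snippets above, and the `GANGA_BIN` directory is an assumption you should adjust to the release you put on your PATH.

```python
import os

# Values copied from the .cshrc / bash snippets above.
REQUIRED_ENV = {
    "GANGA_CONFIG_PATH": "GangaAtlas/Atlas.ini",
    "LFC_HOST": "lfc03.nikhef.nl",
    "LCG_CATALOG_TYPE": "lfc",
}
# Assumption: the afs installation used in the csh snippet; change the
# version number if you put a different release on your PATH.
GANGA_BIN = "/afs/cern.ch/sw/ganga/install/4.3.2/bin"


def check_ganga_env(env=None):
    """Return a list of problems found in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    problems = []
    for name, expected in REQUIRED_ENV.items():
        value = env.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif value != expected:
            # Catches e.g. typographic quotes that ended up inside the value.
            problems.append(f"{name}={value!r}, expected {expected!r}")
    # The snippets add the directory with a trailing slash, so compare without it.
    path_entries = [p.rstrip("/") for p in env.get("PATH", "").split(os.pathsep)]
    if GANGA_BIN not in path_entries:
        problems.append(f"{GANGA_BIN} is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_ganga_env():
        print("WARNING:", problem)
```

Run it with a plain python interpreter in the shell where you intend to start ganga; no output means all checks passed.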

The first time ganga runs, it will ask whether to create a configuration file $HOME/.gangarc. Answer yes, and edit the config file as follows:

  1. In the section labelled [LCG] uncomment the line:

    VirtualOrganisation = atlas

    and add the line DefaultSE = tbn18\.nikhef\.nl

  2. In the section labelled [Athena] uncomment and edit the line:

    # local path to base paths of dist-kits (lxplus example)
    ATLAS_SOFTWARE = /data/atlas/offline/

  3. In the section labelled [ROOT] uncomment and edit the lines:

    location = /data/atlas/offline/12.0.6/sw/lcg/external/root/
    version = 5.10.00e
    arch = slc3_ia32_gcc323
  4. Until ganga 4.3.2 is released, a workaround is needed to get ganga working with large input sandboxes. In the section [LCG], add the lines:

    ConfigVO = /user/fkoetsve/rb106_edg_wl_ui.conf
    BoundSandboxLimit = 52428800


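To double-check the edited file, a small sketch using Python's standard configparser can compare the settings from the steps above with what is actually in $HOME/.gangarc. This assumes the file is in plain INI layout (sections in square brackets, `key = value` lines); it is a hypothetical helper, not a ganga tool. Note that configparser returns the DefaultSE value verbatim, backslashes included.

```python
import configparser
import os

# (section, option) -> value, as set in the numbered steps above.
# DefaultSE keeps its backslashes: configparser returns the text verbatim.
EXPECTED = {
    ("LCG", "VirtualOrganisation"): "atlas",
    ("LCG", "DefaultSE"): r"tbn18\.nikhef\.nl",
    ("Athena", "ATLAS_SOFTWARE"): "/data/atlas/offline/",
    ("ROOT", "location"): "/data/atlas/offline/12.0.6/sw/lcg/external/root/",
    ("ROOT", "version"): "5.10.00e",
}


def check_gangarc(text):
    """Return a list of mismatches between a .gangarc text and EXPECTED."""
    # interpolation=None: do not treat '%' specially; strict=False: tolerate
    # duplicate options that hand-edited files sometimes contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    problems = []
    for (section, option), expected in EXPECTED.items():
        # fallback=None covers both a missing section and a missing option.
        actual = parser.get(section, option, fallback=None)
        if actual != expected:
            problems.append(f"[{section}] {option} = {actual!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    path = os.path.expanduser("~/.gangarc")
    if os.path.exists(path):
        with open(path) as f:
            for problem in check_gangarc(f.read()):
                print("MISMATCH:", problem)
```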
Running ganga

You can start the ganga CLI by typing ganga on the command line. This starts a Python interface, where you can start defining your jobs. There are a few commands you can use to get around in ganga:

  • jobs: lists all the jobs that are defined in ganga. You can get to an individual job by typing:
  • jobs[id]: where the id is listed in the second column of the jobs output.

One thing you can do with a job is view its status:

jobs[1].status

This can be 'new', 'submitted', 'running' or 'completed'. Once the job is completed, you can view its output (which is stored by default in $HOME/gangadir/workspace/Local/<jobid>/output) by typing:

In [25]: jobs[0].peek()

Or look at a specific output file by typing:

In [25]: jobs[0].peek('stderr','less')

where stderr is the name of the file you want to view, and less the program to view it with. You can kill a job using the kill() method, and remove it from the jobs list with the remove() method. The most important command by far is help(), which starts the interactive help program of ganga. After typing it, you get a help> prompt; typing index gives you a list of all possible help subjects. The explanations are rather brief, but they do help you find the methods of the built-in classes of Ganga and its plugins. For instance, the ATLAS plugin defines classes like DQ2Dataset; for more info on DQ2Dataset, type DQ2Dataset at the help> prompt.
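Because the output files end up in the workspace directory mentioned above, you can also read them without starting ganga, for example from a plain Python session or a batch script. A minimal sketch, assuming the default location $HOME/gangadir/workspace/Local/<jobid>/output described on this page; for split jobs the subjob outputs live in subdirectories whose exact layout I have not checked, so the helper only handles the simple case.

```python
import os


def job_output_dir(job_id, home=None):
    """Default output directory of a ganga job, per the layout described above."""
    home = os.path.expanduser("~") if home is None else home
    return os.path.join(home, "gangadir", "workspace", "Local", str(job_id), "output")


def read_job_output(job_id, filename="stdout", home=None):
    """Return the text of one output file, roughly what peek(filename) shows."""
    path = os.path.join(job_output_dir(job_id, home), filename)
    # errors="replace" keeps binary junk in logs from raising UnicodeDecodeError.
    with open(path, errors="replace") as f:
        return f.read()
```

For example, print(read_job_output(0, 'stderr')) shows the same file as jobs[0].peek('stderr','cat').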


Running a simple Job

This little piece of code runs a Hello World Job on the LCG grid:

In [0] : j=Job()
In [1] : j.application=Executable(exe='/bin/echo',args=['Hello World'])
In [2] : j.backend=LCG()
In [3] : j.submit()

The application that is run here is a UNIX executable. LCG() is another predefined class that takes care of a lot of the details of submitting to the grid. After the job has finished, you can type:

In[4] : j.peek('stdout','cat')

This will output the expected "Hello World". You can also put these lines in a script, say my_script.py, and at the ganga prompt type:

In [4]: execfile('my_script.py')

Running an ATHENA job

Running an Athena job that stores its output files in a DQ2 dataset requires a bit more work, but it is still not hard. The following script defines an Athena job, splits it so that there is one subjob (and hence one output file) per input file, runs athena with the TopView LocalOverride jobOptions, and stores the output on the grid in a DQ2 dataset called testing_Ganga_V9.

#Define the ATHENA job
j = Job()
j.name='TopView Standard Job, Ganga 4.3.1'
j.application=Athena()
j.application.prepare(athena_compile=True)
j.application.option_file='/project/atlas/users/fkoetsve/TestArea1206/PhysicsAnalysis/TopPhys/TopView/TopView-00-12-12-02/run/LocalOverride_Nikhef_BASIC.py'
#j.application.max_events='20'
j.splitter=AthenaSplitterJob()
j.splitter.numsubjobs=396
#The merger can be used to merge all the output files into one. See the ganga ATLAS Twiki for details
#j.merger=AthenaOutputMerger()
#Define the inputdata
j.inputdata=DQ2Dataset()
j.inputdata.dataset="trig1_misal1_mc12.005201.Mcatnlo_jim_top_pt200.recon.AOD.v12000601"
#To send the job to both complete and incomplete dataset locations, uncomment either the next line or the line after that
#j.inputdata.min_num_files=100
#j.inputdata.match_ce_all=True
j.inputdata.type='DQ2_LOCAL'
#Define the outputdata
#j.outputdata=ATLASOutputDataset()
j.outputdata=DQ2OutputDataset()
j.outputdata.datasetname='testing_Ganga_V9'
j.outputdata.outputdata=['TopViewAANtuple.root']
#j.outputdata.location='NIKHEF'
#j.outputsandbox=['TopViewAANtuple.root']
#Submit
j.backend=LCG()
j.backend.CE='ce-fzk.gridka.de:2119/jobmanager-pbspro-atlas'
#j.inputsandbox=['my_extra_file']
j.application.exclude_from_user_area = []
j.submit()

Explanation of the terms:

  • j.application=Athena(): defines the job to be an Athena job. It packs the local installation of Athena packages and sends them with the job. The groupArea tag of the Athena setup, used e.g. for TopView, does not work (yet). Instead, all the packages defined in the groupArea tag must be installed locally and packed with the job.
  • j.splitter=AthenaSplitterJob(): to get one output file per input file, as must be done to keep the naming of files consistent when going from AOD to NTuple, the job must be split into as many subjobs as there are input files. This splitter plugin does that; set j.splitter.numsubjobs to the number of input files. You can get this number by typing: d=DQ2Dataset(dataset='datasetname')
    d.list_locations_num_files() which gives the number of files of the given dataset at each location.
  • j.merger: can be used to merge all the output files into one.
  • j.inputdata=DQ2Dataset(): tells the job to get the files from the DQ2 file catalogue.
  • j.inputdata.match_ce_all=True: if there is no location with a complete copy of the dataset, this attribute sends the job to a random location.
  • j.inputdata.min_num_files=100: instead of sending the job to a random location, this first checks that a given minimum number of files is present at that location.
  • j.outputdata=DQ2OutputDataset(): tells the job to store the output data on the grid and register it in the DQ2 registry.
  • j.outputdata.outputdata=['ntuple.root']: gives a list of file names that must be stored in the output dataset. Wildcards are not supported. If the job is split, the output files are numbered automatically.
  • j.backend.CE: allows you to specify which Computing Element the job should be sent to. The syntax is <server>:<port>/jobmanager-<service>-<queue>.
  • j.application.exclude_from_user_area = []: allows you to exclude packages that you have installed locally from the input sandbox (the tar file containing all the files that are sent with your job to the CE).
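j.backend.CE takes a plain string, so a typo in it is only noticed at submission time. The sketch below splits and rebuilds a CE string following the <server>:<port>/jobmanager-<service>-<queue> syntax described in the list above. It is a hypothetical helper, not part of ganga, and it assumes that the service name contains no hyphen (true for the pbspro and lcglsf examples on this page) while the queue name may.

```python
import re
from typing import NamedTuple


class CE(NamedTuple):
    server: str
    port: int
    service: str
    queue: str

    def __str__(self):
        # Rebuild the string in the form expected by j.backend.CE.
        return f"{self.server}:{self.port}/jobmanager-{self.service}-{self.queue}"


# server: anything without ':' or '/'; service: no '-'; queue: the rest.
_CE_RE = re.compile(
    r"^(?P<server>[^:/]+):(?P<port>\d+)/jobmanager-(?P<service>[^-]+)-(?P<queue>.+)$"
)


def parse_ce(text):
    """Split a CE string into its parts, raising ValueError if it is malformed."""
    m = _CE_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a <server>:<port>/jobmanager-<service>-<queue> string: {text!r}")
    return CE(m["server"], int(m["port"]), m["service"], m["queue"])
```

For example, parse_ce('ce106.cern.ch:2119/jobmanager-lcglsf-grid_2nh') gives server ce106.cern.ch, port 2119, service lcglsf and queue grid_2nh.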

After submitting your job you can type jobs at the ganga command line, which will show something like:

# id status name subjobs application backend CE
# 41 completed TopView Standard Job 3 Athena LCG ce106.cern.ch:2119/jobmanager-lcglsf-grid_2nh
# 42 completed Athena LCG ce106.cern.ch:2119/jobmanager-lcglsf-grid_2nh

Here you can see all the jobs, their status, their name, the type of job, and the CE at which each job is running. If you want more information, type jobs[41] at the command line; you will get the complete configuration of the job, including the parameters that were set by default and that you may know nothing about. This is very helpful when debugging ganga. When the status changes to completed (ganga reports the change of status of any job as soon as you issue a new command), you can see the text output, just like before, by typing:

jobs[41].peek('stdout','cat')

If the job completed successfully, you can retrieve the output data by typing:

jobs[41].outputdata.retrieve()

The output data is then stored in the directory ${HOME}/gangadir/workspace/Local/<job id>/output. As the output files can be large, it is wise to change the location of this directory by creating a symbolic link called gangadir in your home directory, pointing to somewhere where large amounts of data can be stored (temporarily).
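The symbolic-link trick can be scripted. A minimal sketch, assuming ganga is not running while you do this and that moving the existing gangadir wholesale is acceptable; the function name and the target directory are hypothetical. shutil.move copies and then deletes when the target is on another file system, so existing jobs are kept.

```python
import os
import shutil


def relocate_gangadir(target_dir, home=None):
    """Move ~/gangadir into target_dir and leave a symlink behind.

    Returns the real path the ~/gangadir link points to.
    """
    home = os.path.expanduser("~") if home is None else home
    link = os.path.join(home, "gangadir")
    target = os.path.join(target_dir, "gangadir")
    if os.path.islink(link):
        return os.path.realpath(link)  # already relocated, nothing to do
    if os.path.exists(target):
        raise FileExistsError(f"{target} already exists; not overwriting it")
    os.makedirs(target_dir, exist_ok=True)
    if os.path.isdir(link):
        shutil.move(link, target)  # keep the existing jobs and their output
    else:
        os.makedirs(target)  # no gangadir yet: start with an empty one
    os.symlink(target, link)
    return os.path.realpath(link)
```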


Using ganga for running TopView

The current version of TopView used by the Top group and by us is TopView-00-12-13. As the groupArea tag does not work on the Grid, you need to check out all the appropriate tags. A tar file with all the correct packages can be found here. N.B. This will cause the lcg_cp problem listed below; use the workaround. This version has another small problem: some parameter file cannot be found when ganga is used to submit a job. This has been fixed in later versions of TopView, but no collection of tags has been posted for those versions yet. Hence a fix is made in two files. From the TopView directory type:

cp /user/fkoetsve/GangaScripts/Analysis_topOptions.py ./share/
cp /user/fkoetsve/GangaScripts/ParamBTagger_module.py python/

In the GangaScripts area there is also a SubmitScript.py, which you run by typing from the run directory:

ganga /user/fkoetsve/GangaScripts/SubmitScript.py --dataset=<datasetname> --number_of_files=<nfiles> --simstyle=<simstyle>

You can find the number of files by typing dq2_ls -f <datasetname>; this option is required. The simstyle can be fullsim, fastsim or streamingtest, and is also required. This script has not been tested yet. For testing ganga, I now use the script GangaScripts/TopViewGangaSubmission_Override.py, which runs over the dataset trig1_misal1_mc12.005201.Mcatnlo_jim_top_pt200.recon.AOD.v12000601.
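The contents of SubmitScript.py are not shown here, so the following is only a hypothetical sketch of how its three required options could be parsed and validated with argparse; it is not the actual script. Following the splitter explanation above (one subjob per input file), number_of_files would be the natural value for j.splitter.numsubjobs. How ganga passes the extra command-line arguments on to sys.argv is an assumption I have not checked.

```python
import argparse

SIMSTYLES = ("fullsim", "fastsim", "streamingtest")


def parse_submit_args(argv=None):
    """Parse the --dataset, --number_of_files and --simstyle options (all required)."""
    parser = argparse.ArgumentParser(description="Sketch of SubmitScript.py options")
    parser.add_argument("--dataset", required=True, help="DQ2 dataset name")
    parser.add_argument(
        "--number_of_files", type=int, required=True,
        help="number of files in the dataset, from dq2_ls -f <datasetname>",
    )
    parser.add_argument("--simstyle", choices=SIMSTYLES, required=True)
    args = parser.parse_args(argv)
    if args.number_of_files < 1:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--number_of_files must be at least 1")
    return args
```

Inside a ganga script, the parsed value could then be used as j.splitter.numsubjobs = args.number_of_files.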

Possible problems (and possible solutions)

These are some problems that I encountered, plus their solutions.

(60, 'SSL certificate problem, verify that the CA cert is OK') This means that the CA certificates used by ganga are wrong (typically out of date). The directory where your certificates are located is stored in the variable X509_CERT_DIR. Send a request to grid.support@nikhef.nl to update the certificates, or download them yourself and change the value of X509_CERT_DIR.

[Errno 28] No space left on device: Ganga writes to several places: /tmp, but also ${HOME}/gangadir/workspace/Local. Cleanup, especially after failed jobs, is not always very tidy, so you might need to clean up some files manually at regular intervals. If you want to be able to store bigger files, the easiest way to change the gangadir location is to make a symbolic link called gangadir in your home directory, pointing to whatever location you want.
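To decide what to clean up, it helps to know which job directories take the most space. A minimal read-only sketch, assuming the workspace layout named above (one subdirectory per job id under gangadir/workspace/Local); it deletes nothing. Removing a job with jobs[id].remove() inside ganga is preferable to deleting its directory by hand, since ganga then also drops it from its jobs list.

```python
import os


def dir_size(path):
    """Total size in bytes of the regular files below path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def largest_jobs(workspace=None, top=10):
    """Return up to `top` (size_in_bytes, job_dir_name) pairs, largest first."""
    if workspace is None:
        workspace = os.path.join(os.path.expanduser("~"), "gangadir", "workspace", "Local")
    sizes = []
    for entry in os.listdir(workspace):
        path = os.path.join(workspace, entry)
        if os.path.isdir(path):
            sizes.append((dir_size(path), entry))
    sizes.sort(reverse=True)
    return sizes[:top]
```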

File "/global/ices/lcg/glite3.0.12/edg/bin/UIutils.py", line 377, in errMsg print message( info.logFile , message ) repeated over many lines. The solution is the same as for the previous problem, No space left on device.

<bound method Job.peek of <Ganga.GPIDev.Lib.Job.Job.Job object at 0xb7015f6c>> You forgot the () after the method, in this case peek. This also happens e.g. with remove().

LCMAPS credential mapping NOT successful This means the input sandbox (all the files you are sending with your job to the remote machine) is too large (> 3 MB). Ganga then tries to store that sandbox on a Storage Element, which by default is some machine at CERN that you don’t have permission to use. Add the following line to the [LCG] section of $HOME/.gangarc:

DefaultSE = tbn18\.nikhef\.nl

The . needs to be escaped using the \, because the file is read into Python.

lcg_cp: Transport endpoint is not connected I think this has to do with an overly large sandbox as well, caused by too many packages checked out in the TestArea. Exclude packages from the input sandbox using: j.application.exclude_from_user_area=["package1","package2"]. This turns out to be a bug in ganga, which should be solved in version 4.3.2. For now there is a workaround using a temporarily enlarged buffer size at one of the resource brokers. Add these lines to the [LCG] part of .gangarc:

ConfigVO = /user/fkoetsve/rb106_edg_wl_ui.conf
BoundSandboxLimit = 52428800
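To see in advance whether your TestArea is likely to run into these limits, you can build a gzipped tar of it in memory and measure its size. This is a rough estimate under stated assumptions: ganga's actual sandbox may contain different files and compression, the "3 MB" limit is taken here to mean MiB, and excluding packages by top-level directory name is a simplification (how ganga matches the names in exclude_from_user_area is not something I have checked). BoundSandboxLimit = 52428800 bytes is 50 MiB.

```python
import io
import os
import tarfile

# Approximate limits quoted on this page: the "> 3 MB" sandbox limit (assumed
# here to mean MiB) and BoundSandboxLimit from the workaround (52428800 = 50 MiB).
SANDBOX_LIMIT = 3 * 1024 * 1024
BOUND_SANDBOX_LIMIT = 52428800


def estimate_sandbox_size(user_area, exclude=()):
    """Size in bytes of a gzipped tar of user_area, skipping excluded top-level dirs."""
    excluded = set(exclude)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for entry in sorted(os.listdir(user_area)):
            if entry in excluded:
                continue  # mimics leaving a package out of the sandbox
            tar.add(os.path.join(user_area, entry), arcname=entry)
    return len(buf.getvalue())


def check_sandbox(user_area, exclude=()):
    """Return (size, exceeds_default_limit, exceeds_bound_limit)."""
    size = estimate_sandbox_size(user_area, exclude)
    return size, size > SANDBOX_LIMIT, size > BOUND_SANDBOX_LIMIT
```

For example, check_sandbox('/project/atlas/users/<you>/TestArea1206', exclude=['package1']) tells you whether leaving out package1 brings the estimate below the default limit.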

More Info

The ganga project homepage
Ganga FAQ
Ganga 4.3.0 tutorial (no tutorial for higher versions available yet; check the hypernews forum for extra features)

If you find any problems with this document, please contact me by clicking here