Difference between revisions of "NIKHEF-ELPROD LOCALGROUPDISK"
Revision as of 08:34, 29 September 2012
What is the local group grid storage at NIKHEF?
The grid storage for the local group has been set up at NIKHEF. In DQ2 terminology, the site name is "NIKHEF-ELPROD_LOCALGROUPDISK". It has 20 terabytes of disk space and has already been used by a few groups to host D3PD Ntuples.
Unlike other grid storage spaces (which are centrally managed by the ATLAS central grid operation), we manage the usage of the local group disk ourselves. We decide ourselves how to share the space among sub-groups (TOP, SUSY, HIGGS, etc.) at NIKHEF.
How much space is left and who is using it?
The following plot shows the disk usage over the past 30 days. The distance between the green and the red curves can be read as the free space remaining on the disk.
More accounting information can be found on this page.
Lists of datasets on the local group disk can be found on this page. From the "User" column you know who replicated/created the datasets.
How to use it?
Hereafter are instructions for:
- replicating grid datasets to NIKHEF-ELPROD_LOCALGROUPDISK
- accessing files on NIKHEF-ELPROD_LOCALGROUPDISK with ROOT
The instructions are based on CVMFS, so they apply to both stoomboot and the NIKHEF desktop machines.
moving datasets to it
If the datasets already exist on the grid, you can simply request their replication to NIKHEF-ELPROD_LOCALGROUPDISK through the DaTRI interface.
Placing dataset replication requests in DaTRI has some requirements:
- You have a valid grid certificate loaded in your browser. If you obtained the certificate via Terena, it should have been loaded into your browser when you downloaded it from the Terena website. If not, there are instructions to convert your certificate from the PEM format (the one you use for voms-proxy-init) to the PKCS#12 format and load it into your browser.
- You have to register yourself in the DaTRI service. You can do it from here. If you are not sure whether you have done this before, check your registration status here.
Once you have everything set up, you can request data replications through the DaTRI interface.
The "Request Parameters" are mandatory. The "Data Pattern" accepts the wildcard symbol ("*") and the container symbol ("/" at the end of the name). For the "Destination Sites", choose the "NL" cloud and then "NIKHEF-ELPROD_LOCALGROUPDISK".
For the "Control Parameters", simply enter "data analysis" as the "justification".
NOTE: All DaTRI requests need to be approved before data transfers take place. If the dataset size of a request is smaller than 500 gigabytes, the request is approved automatically as long as your grid certificate is recognized as a member of the NL group in ATLAS (i.e. you are able to do voms-proxy-init -voms atlas:/atlas/nl). Requests with a dataset size larger than 500 gigabytes have to be approved manually by the disk managers. For the moment, the managers are Hurng-Chun Lee and Daniel Geerts. Although the managers are notified of every request requiring manual approval, it would be appreciated if you could also contact them directly for a quicker reaction.
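The approval rule in the note can be summarised as a small decision function. This is only a sketch of the rule as stated here, not of how DaTRI is implemented. The function name approval_mode and its arguments are hypothetical, the size is taken as a whole number of gigabytes, and the handling of a request of exactly 500 gigabytes or from a user outside the NL group (both treated as manual) is an assumption, since the note does not cover those cases.

```shell
#!/bin/sh
# Sketch of the approval rule described in the note (hypothetical helper).
# $1 = dataset size in whole gigabytes
# $2 = "yes" if "voms-proxy-init -voms atlas:/atlas/nl" works for you, else "no"
approval_mode() {
    size_gb=$1
    nl_member=$2
    if [ "$size_gb" -lt 500 ] && [ "$nl_member" = "yes" ]; then
        echo automatic
    else
        # Assumption: anything not explicitly described as automatic is manual.
        echo manual
    fi
}

approval_mode 120 yes   # small request by an NL group member
approval_mode 800 yes   # large request, needs the disk managers
```

For a request that comes out as manual, contacting the disk managers directly speeds things up.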
accessing datasets/files on it
Assume that the dataset data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00 exists on NIKHEF-ELPROD_LOCALGROUPDISK and that we want to access the files belonging to this dataset from ROOT. The steps are as follows:
- setup environment using CVMFS
% source /project/atlas/nikhef/cvmfs/setup.sh
% setupATLAS
% localSetupDQ2Client
% localSetupROOT
- start grid proxy
% voms-proxy-init -voms atlas
- resolve file paths given the dataset name
% dq2-ls -f -p -L NIKHEF-ELPROD_LOCALGROUPDISK data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00 | grep 'srm' | sed 's|srm://tbn18.nikhef.nl|rfio://|g'
This command lists the files belonging to the dataset replica at "NIKHEF-ELPROD_LOCALGROUPDISK" and replaces the prefix of the file paths with the "rfio" protocol. The output is simply a list of files. If you prefer to use a PoolFileCatalog.xml, you can use the following command to generate the XML file with the proper file paths:
% dq2-ls -f -p -L NIKHEF-ELPROD_LOCALGROUPDISK -P -R "srm://tbn18.nikhef.nl^rfio://" data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00
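The prefix substitution used in the first dq2-ls command can be tried without grid access. The sketch below wraps the same grep and sed stages in a function, here called srm_to_rfio (a hypothetical name), and feeds it one sample line. The sample SRM URL is an assumption, built by prepending the SRM prefix to the rfio path of the example dataset; real dq2-ls output may contain additional lines or columns.

```shell
#!/bin/sh
# Keep only the lines containing "srm" and replace the SRM prefix with the
# rfio protocol, exactly as in the dq2-ls pipeline above.
srm_to_rfio() {
    grep 'srm' | sed 's|srm://tbn18.nikhef.nl|rfio://|g'
}

# Assumed sample input: one SRM URL for a file of the example dataset.
sample='srm://tbn18.nikhef.nl/dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000006.root.1'

# Prints the path with the rfio:// prefix instead of the SRM host.
printf '%s\n' "$sample" | srm_to_rfio
```

The result starts with rfio:///dpm/nikhef.nl/ because the substitution removes only the host name and leaves the leading slash of the path in place.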
- open files in ROOT
You should use TFile::Open() instead of new TFile() to open files using the rfio protocol. The following example opens up a file on the NIKHEF-ELPROD_LOCALGROUPDISK directly over the network.
root  TFile *f = TFile::Open("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000006.root.1")
Warning in <TClass::TClass>: no dictionary for class AttributeListLayout is available
Warning in <TClass::TClass>: no dictionary for class pair<string,string> is available
root  cout << physics->GetEntries() << endl;
11677
For an analysis running over multiple files, using a TChain is more convenient. The example below shows how to read multiple files on NIKHEF-ELPROD_LOCALGROUPDISK via a TChain object.
root  TChain *c = new TChain("physics");
root  c->AddFile("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000006.root.1")
(Int_t)1
root  c->AddFile("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000001.root.1")
(Int_t)1
root  cout << c->GetEntries() << endl;
39391
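For datasets with many files, typing one AddFile call per file becomes tedious. One possible shortcut, sketched below, is to let the shell write a small ROOT macro from a list of rfio paths. The function name make_chain_macro, the macro function make_chain and the output file name are all hypothetical; the generated macro uses only the TChain constructor and AddFile calls shown above.

```shell
#!/bin/sh
# Write a ROOT macro that chains every rfio path read from stdin.
# $1 = tree name (e.g. physics), $2 = output macro file (hypothetical name)
make_chain_macro() {
    tree=$1
    macro=$2
    {
        printf 'TChain *make_chain() {\n'
        printf '  TChain *c = new TChain("%s");\n' "$tree"
        while IFS= read -r path; do
            # Skip empty lines so that blank input adds no file.
            [ -n "$path" ] || continue
            printf '  c->AddFile("%s");\n' "$path"
        done
        printf '  return c;\n}\n'
    } > "$macro"
}
```

As a usage example, the file list produced by the dq2-ls, grep and sed command above can be piped into make_chain_macro physics make_chain.C. In ROOT, loading the macro with .L make_chain.C and calling TChain *c = make_chain(); should then give a chain over all files of the dataset.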
Important remark: reading events from files on NIKHEF-ELPROD_LOCALGROUPDISK means reading data over the network. It is important to enable the TTreeCache when reading events from the TTree object in the file. Studies have demonstrated that setting the cache size to 10-100 megabytes considerably improves data access performance and saves a lot of network bandwidth. To enable the TTreeCache with a 10-megabyte cache, one can do:
root c->SetCacheSize(10000000);
root c->AddBranchToCache("*",kTRUE);
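SetCacheSize takes the cache size in bytes, and the example above uses 10000000 for 10 megabytes, i.e. 1 megabyte counted as 1000000 bytes. To try other values in the recommended 10-100 megabyte range, the conversion can be sketched as follows. The helper names cache_bytes and cache_statement are hypothetical, and the chain variable is assumed to be called c as above.

```shell
#!/bin/sh
# Convert a cache size in megabytes to bytes, with 1 MB = 1000000 bytes
# so that 10 MB gives the 10000000 used in the example above.
cache_bytes() {
    echo $(( $1 * 1000000 ))
}

# Print the ROOT statement for a cache size given in megabytes.
cache_statement() {
    printf 'c->SetCacheSize(%s);\n' "$(cache_bytes "$1")"
}

cache_statement 10    # prints c->SetCacheSize(10000000);
cache_statement 100   # prints c->SetCacheSize(100000000);
```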