Localgroupdisk nikhef

What is the local group grid storage at NIKHEF?

The grid storage for the local group has been set up at NIKHEF. In DQ2 terminology, the site is called "NIKHEF-ELPROD_LOCALGROUPDISK". It has 20 terabytes of disk space and has already been used by a few groups to host D3PD Ntuples.

Unlike other grid storage spaces, which are centrally managed by the ATLAS grid operations team, we manage the usage of the local group disk ourselves. We decide how to share the space among the sub-groups (TOP, SUSY, HIGGS, etc.) at NIKHEF.

How much space is left and who is using it?

The following plot shows the disk usage over the past 30 days. The distance between the green and the red curves can be read as the free space remaining on the disk.

http://bourricot.cern.ch/dq2/media/fig/NIKHEF-ELPROD_LOCALGROUPDISK_30.png

More accounting information can be found on this [page].

Lists of datasets on the local group disk can be found on this [page]. The "User" column shows who replicated/created the datasets.

How to use it?

moving datasets to it

If the datasets already exist on the grid, you can simply request their replication to the NIKHEF-ELPROD_LOCALGROUPDISK using the [DaTRI interface].

Placing dataset replication requests in DaTRI has some requirements:

  • You have a valid grid certificate loaded in the browser. When you applied for a Dutch grid certificate issued by TERENA, you should already have the certificate properly loaded into the browser. If not, follow this instruction.
  • You have to register yourself in the DaTRI service. You can do this from [here]. If you are not sure whether this has been done before, check your registration status [here].

Once you have everything set up, you can request data replication via the [DaTRI interface].

The "Request Parameters" are mandatory. The "Data Pattern" accepts the wildcard symbol ("*") and the container symbol ("/" at the end of the name). For the "Destination Sites", choose the "NL" cloud and "NIKHEF-ELPROD_LOCALGROUPDISK".

For the "Control Parameters", simply put "data analysis" into the "justification" field.
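
As an illustration (the exact form layout may differ slightly, and the dataset name below is just the example container used later on this page), a request could look like:

    Data Pattern:       data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405/
    Destination Sites:  NL -> NIKHEF-ELPROD_LOCALGROUPDISK
    Justification:      data analysis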

NOTE: All DaTRI requests need to be approved before data transfers take place. If the dataset size of a request is smaller than 500 gigabytes, the request is approved automatically as long as your grid certificate is recognized as a member of the NL group in ATLAS (i.e. you are able to do voms-proxy-init -voms atlas:/atlas/nl). Requests with a dataset size larger than 500 gigabytes have to be approved manually by the disk managers. For the moment, the managers are Hurng-Chun Lee (Hurng-Chun.Lee@cern.ch) and Daniel Geerts (dgeerts@nikhef.nl). Although the managers are notified of every request requiring manual approval, it would be appreciated if you also contacted them directly for a quicker reaction.
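
If you want to check up front whether your certificate is recognized in the NL group, you can create a proxy with the NL attribute and inspect it. The voms-proxy-init command is the one mentioned above; voms-proxy-info is the standard VOMS client command for inspecting a proxy (a quick sketch, assuming the grid client tools are available in your environment):

    % voms-proxy-init -voms atlas:/atlas/nl
    % voms-proxy-info -all | grep '/atlas/nl'

If the second command prints an attribute containing /atlas/nl, your certificate is registered in the NL group and small requests will be approved automatically.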

creating your own datasets on it

accessing datasets/files on it

Assume the dataset data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00 exists on the NIKHEF-ELPROD_LOCALGROUPDISK and we want to access the files belonging to this dataset from ROOT. One does the following steps (a small macro combining them is sketched after this list):

  • setup environment using CVMFS
    % source /project/atlas/nikhef/cvmfs/setup.sh
    % setupATLAS
    % localSetupDQ2Client
    % localSetupROOT
    
  • resolve file paths given the dataset name
    % dq2-ls -f -p -L NIKHEF-ELPROD_LOCALGROUPDISK data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00 | grep 'srm' | sed 's|srm://tbn18.nikhef.nl|rfio://|g'
    

    What the command does is list the files belonging to the dataset replica at "NIKHEF-ELPROD_LOCALGROUPDISK" and substitute the path prefix with the "rfio" protocol. The output is simply a list of file paths. If you prefer to use the PoolFileCatalog.xml, one could use the following command:

    % dq2-ls -f -p -L NIKHEF-ELPROD_LOCALGROUPDISK -P -R "srm://tbn18.nikhef.nl^rfio://" data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00
    
  • open the file in ROOT

    One should use TFile::Open() instead of new TFile() to open files via the rfio protocol. The following example opens a file at the NIKHEF-ELPROD_LOCALGROUPDISK directly over the network.
    root [0] TFile *f = TFile::Open("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000006.root.1")
    Warning in <TClass::TClass>: no dictionary for class AttributeListLayout is available
    Warning in <TClass::TClass>: no dictionary for class pair<string,string> is available
    
    root [1] cout << physics->GetEntries() << endl;
    11677
    

    For an analysis running over multiple files, using TChain is more convenient. The example below shows how to read multiple files on NIKHEF-ELPROD_LOCALGROUPDISK via a TChain object.

    root [2] TChain *c = new TChain("physics");
    
    root [3] c->AddFile("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000006.root.1")
    (Int_t)1
    
    root [4] c->AddFile("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000001.root.1")
    (Int_t)1
    
    root [5] cout << c->GetEntries() << endl;
    39391
    
  • Remark: reading events from files on NIKHEF-ELPROD_LOCALGROUPDISK means reading data over the network. It is important to enable the [TTreeCache] when you are reading events from a TTree. Previous studies have shown that setting the cache size to 10-100 megabytes greatly improves data access performance and saves a lot of network bandwidth. To enable the TTreeCache with a 10 megabyte cache, one can do:
    root [6] c->SetCacheSize(10000000);
    
    root [7] c->AddBranchToCache("*",kTRUE);
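
Putting the pieces together, the block below is a minimal sketch (not an official recipe) of a ROOT macro that combines the steps above: it builds a TChain from the two example files on NIKHEF-ELPROD_LOCALGROUPDISK, enables a 10 megabyte TTreeCache and loops over the events. The macro name localgroupdisk_chain.C is an arbitrary choice, the empty event loop is meant to be replaced by your own analysis code, and the file paths are the ones resolved with dq2-ls above.

    // localgroupdisk_chain.C -- build a TChain over files on the local group
    // disk, enable the TTreeCache and loop over the events.
    #include <iostream>
    #include "TChain.h"

    void localgroupdisk_chain()
    {
       TChain *c = new TChain("physics");

       // rfio paths obtained from the dq2-ls command shown above
       c->AddFile("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000006.root.1");
       c->AddFile("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000001.root.1");

       // enable a 10 megabyte read cache and cache all branches
       c->SetCacheSize(10000000);
       c->AddBranchToCache("*", kTRUE);

       const Long64_t nEntries = c->GetEntries();
       std::cout << "total entries: " << nEntries << std::endl;

       // simple event loop; put your own analysis code inside
       for (Long64_t i = 0; i < nEntries; ++i) {
          c->GetEntry(i);
       }
    }

After the CVMFS setup above, the macro can be run with "root -l localgroupdisk_chain.C".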