NIKHEF-ELPROD LOCALGROUPDISK

From Atlas Wiki
Revision as of 14:52, 28 September 2012

What is the local group grid storage at NIKHEF?

Grid storage for the local group has been set up at NIKHEF. Its DQ2 site name is "NIKHEF-ELPROD_LOCALGROUPDISK". It has 20 terabytes of disk space and has already been used by a few groups to host D3PD ntuples.

Unlike other grid storage spaces, which are managed centrally by ATLAS grid operations, the local group disk is managed by us. We decide ourselves how to share the space among the sub-groups at NIKHEF (TOP, SUSY, HIGGS, etc.).

How much space is left and who is using it?

The following plot shows the disk usage over the past 30 days. The gap between the green and red curves is the free space remaining.

http://bourricot.cern.ch/dq2/media/fig/NIKHEF-ELPROD_LOCALGROUPDISK_30.png

More accounting information can be found on this [page].

A list of datasets on the local group disk can be found on this [page]. The "User" column shows who replicated or created each dataset.

How to use it?

Below are instructions for

  • replicating grid datasets to NIKHEF-ELPROD_LOCALGROUPDISK
  • accessing files on NIKHEF-ELPROD_LOCALGROUPDISK with ROOT

The instructions are based on CVMFS, so they apply to both stoomboot and the NIKHEF desktop machines.

moving datasets to it

If the datasets already exist on the grid, you can simply request their replication to NIKHEF-ELPROD_LOCALGROUPDISK through the [DaTRI interface].

Placing dataset replication requests in DaTRI has some requirements:

  • You have a valid grid certificate loaded in the browser.
  • You are registered with the DaTRI service. You can register [here]. If you are not sure whether you have already registered, check your registration status [here].

Once everything is set up, you can request data replication via the [DaTRI interface].

The "Request Parameters" are mandatory. The "Data Pattern" field accepts the wildcard symbol ("*") and the container symbol ("/" at the end of the name). For the "Destination Sites", choose the "NL" cloud and "NIKHEF-ELPROD_LOCALGROUPDISK".
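The two symbols in the "Data Pattern" field can be illustrated with a small C++ sketch. It assumes that "*" matches any run of characters (including none) and that a trailing "/" marks a container name; globMatch() and isContainer() are hypothetical helpers written for this page, not part of DaTRI or DQ2, and the real service may match patterns differently.

```cpp
#include <string>

// Hypothetical helper (assumption: mirrors the intent of DaTRI's "Data Pattern",
// not its real implementation). '*' matches any possibly empty sequence of
// characters; every other character must match literally.
bool globMatch(const std::string& pattern, const std::string& name) {
    std::string::size_type p = 0, n = 0;
    std::string::size_type starPos = std::string::npos; // last '*' seen in pattern
    std::string::size_type matchPos = 0;                // name position when that '*' was seen
    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starPos = p++;   // remember the star; first try letting it match nothing
            matchPos = n;
        } else if (p < pattern.size() && pattern[p] == name[n]) {
            ++p;             // literal character match
            ++n;
        } else if (starPos != std::string::npos) {
            p = starPos + 1; // backtrack: let the last '*' absorb one more character
            n = ++matchPos;
        } else {
            return false;    // mismatch and no '*' to fall back on
        }
    }
    // Whatever is left of the pattern must consist only of '*'
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// Hypothetical helper: a trailing '/' marks a dataset container.
bool isContainer(const std::string& name) {
    return !name.empty() && name.back() == '/';
}
```

Under these assumptions, a pattern such as data10_7TeV.*NTUP_TOP* would match the dataset name used in the access example later on this page.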

For the "Control Parameters", simply enter "data analysis" as the "justification".

NOTE: All DaTRI requests need to be approved before data transfers take place. If the dataset size of a request is smaller than 500 gigabytes, the request is approved automatically as long as your grid certificate is recognized as a member of the NL group in ATLAS (i.e. you are able to do voms-proxy-init -voms atlas:/atlas/nl). Requests larger than 500 gigabytes have to be approved manually by the disk managers. For the moment, the managers are [Hurng-Chun Lee] and [Daniel Geerts]. Although the managers are notified of every request requiring manual approval, it would be appreciated if you also contacted them directly for a quicker response.
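The auto-approval rule above boils down to a single condition. This is a minimal C++ sketch, assuming "gigabyte" means 10^9 bytes and that a request of exactly 500 gigabytes is not auto-approved (the note only covers "smaller" and "larger"); isAutoApproved() is a hypothetical name, and the actual check is performed by the DaTRI service.

```cpp
#include <cstdint>

// Hypothetical helper modelling the DaTRI auto-approval rule.
// Assumptions: 1 GB = 10^9 bytes; exactly 500 GB is NOT auto-approved.
bool isAutoApproved(std::uint64_t requestBytes, bool isNlGroupMember) {
    const std::uint64_t kThresholdBytes = 500ULL * 1000ULL * 1000ULL * 1000ULL; // 500 GB
    // Both conditions are required: a small request AND NL group membership
    // (i.e. `voms-proxy-init -voms atlas:/atlas/nl` works for you).
    return requestBytes < kThresholdBytes && isNlGroupMember;
}
```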

accessing datasets/files on it

Suppose the dataset data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00 exists on NIKHEF-ELPROD_LOCALGROUPDISK and we want to access its files from ROOT. The steps are as follows:

  • set up the environment using CVMFS
    % source /project/atlas/nikhef/cvmfs/setup.sh
    % setupATLAS
    % localSetupDQ2Client
    % localSetupROOT
    
  • resolve file paths given the dataset name
    % dq2-ls -f -p -L NIKHEF-ELPROD_LOCALGROUPDISK data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00 | grep 'srm' | sed 's|srm://tbn18.nikhef.nl|rfio://|g'
    

    This command lists the files belonging to the dataset replica at "NIKHEF-ELPROD_LOCALGROUPDISK" and replaces the "srm://tbn18.nikhef.nl" prefix of each file path with the "rfio://" protocol prefix. The output is simply a list of file paths. If you prefer a PoolFileCatalog.xml, the following command generates the XML file with the proper file paths:

    % dq2-ls -f -p -L NIKHEF-ELPROD_LOCALGROUPDISK -P -R "srm://tbn18.nikhef.nl^rfio://" data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00
    
  • open files in ROOT

    Use TFile::Open() rather than new TFile() to open files via the rfio protocol, since TFile::Open() selects the TFile implementation that matches the protocol of the URL. The following example opens a file on NIKHEF-ELPROD_LOCALGROUPDISK directly over the network.
    root [0] TFile *f = TFile::Open("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000006.root.1")
    Warning in <TClass::TClass>: no dictionary for class AttributeListLayout is available
    Warning in <TClass::TClass>: no dictionary for class pair<string,string> is available
    
    root [1] cout << physics->GetEntries() << endl;
    11677
    

    For an analysis running over multiple files, a TChain is more convenient. The example below reads multiple files on NIKHEF-ELPROD_LOCALGROUPDISK through a TChain object.

    root [2] TChain *c = new TChain("physics");
    
    root [3] c->AddFile("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000006.root.1")
    (Int_t)1
    
    root [4] c->AddFile("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000001.root.1")
    (Int_t)1
    
    root [5] cout << c->GetEntries() << endl;
    39391
    
  • Important remark: reading events from files on NIKHEF-ELPROD_LOCALGROUPDISK means reading data over the network, so it is important to enable the [TTreeCache] when reading events from a TTree in the file. Studies have shown that a cache size of 10-100 megabytes substantially improves data-access performance and saves considerable network bandwidth. To enable the TTreeCache with a 10-megabyte cache, one can do:
    root [6] c->SetCacheSize(10000000);
    
    root [7] c->AddBranchToCache("*",kTRUE);
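The value 10000000 passed to SetCacheSize() above is a size in bytes, i.e. 10 megabytes counted in decimal units. A minimal C++ sketch of that arithmetic follows; cacheBytesFromMB() and inRecommendedRange() are hypothetical helpers, and the 10-100 MB bounds are simply the range quoted in the remark above.

```cpp
#include <cstdint>

// Hypothetical helper: convert megabytes (decimal, 1 MB = 10^6 bytes, as in the
// 10000000 used above) into the byte count that SetCacheSize() expects.
constexpr std::int64_t cacheBytesFromMB(std::int64_t megabytes) {
    return megabytes * 1000 * 1000;
}

// Hypothetical helper: is a cache size (in bytes) within the quoted 10-100 MB range?
constexpr bool inRecommendedRange(std::int64_t bytes) {
    return bytes >= cacheBytesFromMB(10) && bytes <= cacheBytesFromMB(100);
}
```

For instance, cacheBytesFromMB(100) gives 100000000, the upper end of the quoted range.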
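The grep/sed pipeline in the "resolve file paths" step can also be reproduced in C++, for example in an analysis macro that builds its own file list. This is a sketch under assumptions: it expects the dq2-ls output as plain text, one entry per line, and treats "srm://tbn18.nikhef.nl" as a literal string (sed interprets the dots as regular-expression wildcards, which makes no difference for these paths). srmToRfio() is a hypothetical name.

```cpp
#include <sstream> // std::istream, std::istringstream
#include <string>
#include <vector>

// Sketch of: grep 'srm' | sed 's|srm://tbn18.nikhef.nl|rfio://|g'
// Keeps only lines containing "srm" and replaces every occurrence of the
// SRM prefix on those lines with "rfio://".
std::vector<std::string> srmToRfio(std::istream& dq2lsOutput) {
    const std::string kSrmPrefix = "srm://tbn18.nikhef.nl";
    const std::string kRfioPrefix = "rfio://";
    std::vector<std::string> paths;
    std::string line;
    while (std::getline(dq2lsOutput, line)) {
        if (line.find("srm") == std::string::npos) continue; // grep 'srm'
        // sed's trailing 'g': replace all occurrences on the line
        std::string::size_type pos = 0;
        while ((pos = line.find(kSrmPrefix, pos)) != std::string::npos) {
            line.replace(pos, kSrmPrefix.size(), kRfioPrefix);
            pos += kRfioPrefix.size();
        }
        paths.push_back(line);
    }
    return paths;
}
```

Each returned path has the form rfio:///dpm/..., so it can be passed to TFile::Open() or TChain::AddFile() as in the ROOT sessions above.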