Difference between revisions of "Localgroupdisk nikhef"
Latest revision as of 11:05, 28 September 2012
What is the local group grid storage at NIKHEF?
A grid storage area for the local group has been set up at NIKHEF. In DQ2 terminology, its site name is "NIKHEF-ELPROD_LOCALGROUPDISK". It has 20 terabytes of disk space and has already been used by a few groups to host D3PD ntuples.
Unlike other grid storage spaces, which are centrally managed by the ATLAS central grid operation, the local group disk is managed by us. We decide ourselves how to share the space among the sub-groups (TOP, SUSY, HIGGS, etc.) at NIKHEF.
How much space is left and who is using it?
The following plot shows the disk usage over the past 30 days. The distance between the green and the red curves can be read as the free space remaining on the disk.
http://bourricot.cern.ch/dq2/media/fig/NIKHEF-ELPROD_LOCALGROUPDISK_30.png
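More accounting information can be found on this page (http://bourricot.cern.ch/dq2/accounting/site_view/NIKHEF-ELPROD_LOCALGROUPDISK/30/).
Lists of datasets on the local group disk can be found on this page (http://www.nikhef.nl/~dgeerts/OldDatasets.html). The "User" column shows who replicated or created each dataset.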
How to use it?
moving datasets to it
If the datasets already exist on the grid, you can simply request their replication to NIKHEF-ELPROD_LOCALGROUPDISK using the DaTRI interface (http://panda.cern.ch/server/pandamon/query?mode=ddm_req).
Placing dataset replication requests in DaTRI has the following requirements:
- You need a valid grid certificate loaded in your browser. If you applied for a Dutch grid certificate issued by TERENA, the certificate should be properly loaded into the browser. If not, follow this instruction.
- You have to register yourself with the DaTRI service. You can do so here (http://panda.cern.ch/server/pandamon/query?mode=ddm_user). If you are not sure whether you have already registered, check your registration status here (http://panda.cern.ch/server/pandamon/query?mode=ddm_user&action=info).
Once you have everything set up, you can request data replication through the DaTRI interface.
The "Request Parameters" are mandatory. The "Data Pattern" accepts the wildcard symbol ("*") and the container symbol ("/" at the end of the name). For the "Destination Sites", choose the "NL" cloud and then "NIKHEF-ELPROD_LOCALGROUPDISK".
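As a rough illustration of the two special symbols in the "Data Pattern", the sketch below sorts a pattern string into one of three kinds. The function name classify_pattern is hypothetical and is not part of DaTRI; it only mirrors the rule stated above (a trailing "/" marks a container, a "*" marks a wildcard) and does not contact any service.

```shell
#!/bin/sh
# Hypothetical helper, not part of DaTRI: report how a "Data Pattern"
# string would be read according to the rule described in the text.
classify_pattern() {
    case "$1" in
        */)   echo "container" ;;   # name ends with "/"
        *\**) echo "wildcard"  ;;   # name contains a literal "*"
        *)    echo "dataset"   ;;   # plain dataset name
    esac
}

classify_pattern 'data10_7TeV.*.NTUP_TOP.*'
classify_pattern 'data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00'
```

A pattern that both contains "*" and ends with "/" is reported as a container here; that ordering is a choice made for this sketch.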
For the "Control Parameters", simply enter "data analysis" as the "justification".
NOTE: All DaTRI requests need to be approved before any data transfer takes place. If the dataset size of a request is smaller than 500 gigabytes, the request is approved automatically as long as your grid certificate is recognized as a member of the NL group in ATLAS (i.e. you are able to run voms-proxy-init -voms atlas:/atlas/nl). Requests with a dataset size larger than 500 gigabytes have to be approved manually by the disk managers. For the moment, the managers are Hurng-Chun Lee (Hurng-Chun.Lee@cern.ch) and Daniel Geerts (dgeerts@nikhef.nl). Although the managers are notified of every request that requires manual approval, contacting them directly as well will get you a quicker reaction.
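The approval rule in the note can be summarised as a small decision function. This is only a sketch of the rule as worded above; the function name approval_mode and its yes/no membership argument are hypothetical, and treating every request that is not automatically approved as "manual" (including a request of exactly 500 gigabytes, or one from a non-member) is an assumption, since the note does not cover those cases.

```shell
#!/bin/sh
# Hypothetical helper: $1 = request size in gigabytes (integer),
#                      $2 = "yes" if the certificate is in /atlas/nl, else "no".
approval_mode() {
    if [ "$1" -lt 500 ] && [ "$2" = "yes" ]; then
        echo "automatic"
    else
        # Assumption: anything not auto-approved needs the disk managers.
        echo "manual"
    fi
}

approval_mode 120 yes
approval_mode 800 yes
```

Whether you are recognized as an NL member can be checked beforehand with the voms-proxy-init command quoted in the note.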
creating your own datasets on it
accessing datasets/files on it
Assume the dataset data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00 exists on NIKHEF-ELPROD_LOCALGROUPDISK and we want to access the files belonging to it from ROOT. The steps are as follows:
- set up the environment using CVMFS
% source /project/atlas/nikhef/cvmfs/setup.sh
% setupATLAS
% localSetupDQ2Client
% localSetupROOT
- resolve file paths given the dataset name
% dq2-ls -f -p -L NIKHEF-ELPROD_LOCALGROUPDISK data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00 | grep 'srm' | sed 's|srm://tbn18.nikhef.nl|rfio://|g'
This command lists the files belonging to the dataset replica at "NIKHEF-ELPROD_LOCALGROUPDISK" and replaces the SRM path prefix with the "rfio" protocol prefix. The output is simply a list of file paths. If you prefer a PoolFileCatalog.xml, use the following command instead:
% dq2-ls -f -p -L NIKHEF-ELPROD_LOCALGROUPDISK -P -R "srm://tbn18.nikhef.nl^rfio://" data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00
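To see what the grep and sed stages of the first command do without querying DQ2, the sketch below applies the same filter to a made-up input. The function name to_rfio, the shortened file path, and the non-SRM line are all hypothetical; real dq2-ls output may contain other lines and longer paths.

```shell
#!/bin/sh
# Same filter as in the dq2-ls pipeline above: keep lines that contain
# "srm" and replace the SRM host prefix with the rfio protocol prefix.
to_rfio() {
    grep 'srm' | sed 's|srm://tbn18.nikhef.nl|rfio://|g'
}

# Hypothetical sample input: one non-SRM line and one shortened SRM path.
printf '%s\n' \
    'some other output line' \
    'srm://tbn18.nikhef.nl/dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/example.root' \
    | to_rfio
```

The resulting path starts with three slashes (rfio:///dpm/...), which is the form used in the ROOT examples on this page.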
- open the file in ROOT
One should use TFile::Open() instead of new TFile() to open files using the rfio protocol. The following example opens a file on NIKHEF-ELPROD_LOCALGROUPDISK directly over the network.

root [0] TFile *f = TFile::Open("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000006.root.1")
Warning in <TClass::TClass>: no dictionary for class AttributeListLayout is available
Warning in <TClass::TClass>: no dictionary for class pair<string,string> is available
root [1] cout << physics->GetEntries() << endl;
11677
For an analysis running over multiple files, a TChain is more convenient. The example below shows how to read multiple files on NIKHEF-ELPROD_LOCALGROUPDISK via a TChain object.
root [2] TChain *c = new TChain("physics");
root [3] c->AddFile("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000006.root.1")
(Int_t)1
root [4] c->AddFile("rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/data10_7TeV/NTUP_TOP/r1647_p306_p307_p379_p405/data10_7TeV.00160387.physics_Egamma.merge.NTUP_TOP.r1647_p306_p307_p379_p405_tid255675_00/NTUP_TOP.255675._000001.root.1")
(Int_t)1
root [5] cout << c->GetEntries() << endl;
39391
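Typing one AddFile call per file becomes tedious for large datasets. One possible shortcut, sketched below, is to generate a small ROOT macro from a list of rfio paths such as the one produced in the previous step. The function name make_chain_macro and the generated function name make_chain are hypothetical, and the macro has not been tested against a particular ROOT version.

```shell
#!/bin/sh
# Hypothetical helper: read rfio paths on stdin (one per line) and print a
# ROOT macro that builds a TChain for the tree named in $1.
make_chain_macro() {
    printf 'TChain *make_chain() {\n'
    printf '    TChain *c = new TChain("%s");\n' "$1"
    while IFS= read -r path; do
        [ -n "$path" ] || continue          # skip empty lines
        printf '    c->AddFile("%s");\n' "$path"
    done
    printf '    return c;\n}\n'
}

# Example with a made-up path; in practice, pipe in the list of rfio paths.
printf '%s\n' 'rfio:///dpm/nikhef.nl/home/atlas/atlaslocalgroupdisk/example.root' \
    | make_chain_macro physics
```

Saved to a file, the output could be loaded in a ROOT session with .L and the chain obtained by calling make_chain().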
Remark: reading events from files on NIKHEF-ELPROD_LOCALGROUPDISK means reading data over the network. It is important to enable the TTreeCache (http://root.cern.ch/root/html/TTreeCache.html) when reading events from a TTree. Previous studies have shown that setting the cache size to 10-100 megabytes greatly improves data access performance and saves a lot of network bandwidth. To enable the TTreeCache with a 10 megabyte cache, one can do:
root [6] c->SetCacheSize(10000000);
root [7] c->AddBranchToCache("*",kTRUE);
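SetCacheSize takes the size in bytes, and the example above uses decimal megabytes (10 megabytes = 10000000 bytes). To try other values in the 10-100 megabyte range, the conversion can be sketched as follows; the function names cache_bytes and print_cache_setup are hypothetical, and "c" is the TChain pointer from the session above.

```shell
#!/bin/sh
# Convert a cache size in decimal megabytes to bytes.
cache_bytes() {
    echo $(( $1 * 1000000 ))
}

# Print the two ROOT commands from the text for a cache size given in megabytes.
print_cache_setup() {
    printf 'c->SetCacheSize(%s);\n' "$(cache_bytes "$1")"
    printf 'c->AddBranchToCache("*",kTRUE);\n'
}

print_cache_setup 10
print_cache_setup 100
```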