Difference between revisions of "WLCG Accounting"

From PDP/Grid Wiki

Revision as of 08:58, 18 June 2009

External input:

The LCG Office sends us monthly accounting summaries to check for correctness. These summaries are filled in using information from the GOC DB (for computing); at some point the storage figures will likely be filled in from the information system. Our task is to check whether the summaries are correct.

For Nikhef, the other piece of external input is Ron's once-per-month email with a CSV file containing the SARA numbers.

The procedure

First gather the computing numbers. Go to the NDPF accounting page (http://www.nikhef.nl/grid/accuse/). On that page, under "Other Accounting Information", choose "LCG Monthly Accounting Report and current disk usage" (http://www.nikhef.nl/grid/accuse/lcg). Fill in the month (for April 2009, enter 2009-04 in the box) and hit return. After a few seconds two chunks of information are returned, the first of which looks like this:

Aggregate use of the NDPF  from 2009-04-01 up till 2009-05-01 exclusive (kSI2k.days)
VO,CPU,WALL,GROUPS
alice, 21464, 22339, alice alicesgm
atlas, 22877, 27060, atlas atla atlb atlc atlsgm
cms, 0, 10, cms cmssgm
lhcb, 14, 38, lhcb lhcbprd lhcbsgm

This is the computing information in CSV format. Use this file (http://www.nikhef.nl/~templon/nik-200904.csv) as a template and, in your favorite text editor, replace its top lines with the output from the monthly accounting report.
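For anyone who would rather script this step than edit by hand, here is a minimal Python sketch that reads the report text shown above into a dictionary. The function name `parse_cpu_report` and the variable `REPORT` are made up for illustration; the only thing taken from the accounting page is the sample output itself, and the layout of the template file is not assumed.

```python
import csv

# Sample output copied from the monthly accounting report shown above.
REPORT = """\
Aggregate use of the NDPF  from 2009-04-01 up till 2009-05-01 exclusive (kSI2k.days)
VO,CPU,WALL,GROUPS
alice, 21464, 22339, alice alicesgm
atlas, 22877, 27060, atlas atla atlb atlc atlsgm
cms, 0, 10, cms cmssgm
lhcb, 14, 38, lhcb lhcbprd lhcbsgm
"""


def parse_cpu_report(text):
    """Return {vo: {"cpu": int, "wall": int, "groups": [str, ...]}}."""
    lines = [line.strip() for line in text.splitlines()]
    # Everything before (and including) the header line is skipped.
    start = lines.index("VO,CPU,WALL,GROUPS") + 1
    usage = {}
    for row in csv.reader(lines[start:], skipinitialspace=True):
        if len(row) < 4:
            continue  # blank or malformed line
        vo, cpu, wall, groups = row[:4]
        usage[vo] = {
            "cpu": int(cpu),
            "wall": int(wall),
            "groups": groups.split(),
        }
    return usage


if __name__ == "__main__":
    for vo, numbers in parse_cpu_report(REPORT).items():
        print(vo, numbers["cpu"], numbers["wall"])
```

The values are in kSI2k.days, as stated in the first line of the report.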

The second chunk of information is the DPM storage use, which gives the disk usage at Nikhef. Be careful: the tool has known problems handling read-only disk space, so sometimes one has to resort to running dpm-qryconf --si by hand and adding up the numbers.

ALICE:           1100 G capacity   544 G free   555 G used
ATLASNL:         5500 G capacity  2417 G free  3082 G used
ATLASPRD:       259940 G capacity 159130 G free 100810 G used
ATLASROF:        2930 G capacity  2930 G free     0 G used
GENERIC:         2200 G capacity  1303 G free   896 G used
LHCB:            1100 G capacity  1100 G free     0 G used
NCF:             2200 G capacity  2200 G free     0 G used
OPS:               40 G capacity    34 G free     5 G used
VIRGO:           1100 G capacity  1100 G free     0 G used
VLEMED:          1980 G capacity   612 G free  1367 G used
test:              26 G capacity     4 G free    22 G used

For ALICE and LHCb it's easy: the numbers here correspond to the "D1T0" numbers in the template file mentioned above (disk 1, tape 0, which is the only kind of storage we have at Nikhef), so just replace the template numbers with the numbers above. For ATLAS, ATLASNL is local storage and does not count for Tier-1; ATLASROF and ATLASPRD do, so add them together and use the sums in the template. If new pools have been added, I am not sure whether the accounting web page picks them up (I think so). Finally, recall that this page gives the *current* status of DPM, not the status as of the first of the month!
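The ATLAS sum can be checked with a small Python sketch like the one below. It parses the pool lines in the format shown above and adds ATLASPRD and ATLASROF together. The names `parse_dpm_report` and `tier1_atlas` are invented for this example, and the regular expression assumes the page keeps printing lines in exactly this "capacity / free / used" layout.

```python
import re

# A few pool lines copied from the DPM output shown above.
DPM_REPORT = """\
ALICE:           1100 G capacity   544 G free   555 G used
ATLASNL:         5500 G capacity  2417 G free  3082 G used
ATLASPRD:       259940 G capacity 159130 G free 100810 G used
ATLASROF:        2930 G capacity  2930 G free     0 G used
LHCB:            1100 G capacity  1100 G free     0 G used
"""

LINE_RE = re.compile(
    r"^(\S+):\s+(\d+) G capacity\s+(\d+) G free\s+(\d+) G used$"
)


def parse_dpm_report(text):
    """Return {pool: {"capacity": int, "free": int, "used": int}}."""
    pools = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, capacity, free, used = match.groups()
            pools[name] = {
                "capacity": int(capacity),
                "free": int(free),
                "used": int(used),
            }
    return pools


def tier1_atlas(pools):
    """Sum the two ATLAS pools that count for Tier-1 (ATLASNL is excluded)."""
    return {
        key: pools["ATLASPRD"][key] + pools["ATLASROF"][key]
        for key in ("capacity", "free", "used")
    }


if __name__ == "__main__":
    print(tier1_atlas(parse_dpm_report(DPM_REPORT)))
```

For the April 2009 numbers above this gives 262870 G capacity, 162060 G free and 100810 G used for ATLAS.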

Now there is the email from Ron with the SARA numbers. I usually have to make a few changes: first, remove a leading blank line; second, change all the semicolons to commas (a difference between US and NL CSV conventions: here commas are common in currency values, so semicolons are used as the separator instead). Finally, one of the disk classes at SARA has only two of the three VOs, so I always wind up adding a single line with all zeroes for the missing VO. The edited version of Ron's mail should look like this template file.
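The first two clean-up steps (dropping leading blank lines and switching the separator) could be scripted roughly as follows. This is a sketch only: the function name `normalize_sara_csv` is made up, and the sample input is invented because the actual layout of Ron's file is not shown here. For the same reason, adding the all-zero line for the missing VO is left as a manual step.

```python
import csv
import io


def normalize_sara_csv(text):
    """Drop leading blank lines and convert ';'-separated rows to ','-separated."""
    lines = text.splitlines()
    while lines and not lines[0].strip():
        lines.pop(0)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(lines, delimiter=";"):
        # csv.writer quotes any field that itself contains a comma,
        # so such values survive the change of separator intact.
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical input, NOT the real layout of the SARA file.
    sample = "\nVO;CPU;WALL\natlas;100;120\n"
    print(normalize_sara_csv(sample), end="")
```

Going through the csv module rather than a plain string replace is a deliberate choice: a field that already contains a comma would otherwise be split in two.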

Finally, I have an Excel file that does the proper summing of the internal numbers so that they can be plugged directly into the WLCG summaries. This template file (http://www.nikhef.nl/~templon/tot-200904.xls) contains the proper magic. It is the actual version used for April 2009, and the other two templates included above are the Nikhef and SARA CSV files for April 2009, so you have here the complete set actually used to produce the April 2009 numbers.

- open the summing template
- go to the Nikhef tab
- open the Nikhef CSV file
- select all the fields and "copy"
- go back to the Nikhef tab in the summing sheet, and paste in the new information
- repeat this process with the SARA tab and the SARA CSV file.

Now the "total" tab shows the correct summed values in columns F, G, and H. Before using them, go to the WLCG summary, select the pre-filled CPU numbers, and paste them "as values" into the fields in column J (the tan-colored area). This lets you check how close the WLCG figures are to our own. It also shows what fraction of each of our own numbers came from SARA, which helps debug site-dependent differences between the WLCG numbers and our own.
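The comparison itself is simple arithmetic. The Python sketch below illustrates the idea and is not a transcription of the spreadsheet formulas: the function name `compare_with_wlcg` is invented, and the numbers in the usage example are placeholders rather than real accounting figures.

```python
def compare_with_wlcg(nikhef, sara, wlcg):
    """Compare our own summed number with the WLCG pre-filled number.

    Returns (total, sara_fraction, relative_difference), where
    relative_difference is (wlcg - total) / total, or None if total is 0.
    """
    total = nikhef + sara
    if total == 0:
        return total, 0.0, None
    sara_fraction = sara / total
    relative_difference = (wlcg - total) / total
    return total, sara_fraction, relative_difference


if __name__ == "__main__":
    # Placeholder values for illustration only.
    total, sara_fraction, rel_diff = compare_with_wlcg(300, 100, 420)
    print(f"total={total} SARA fraction={sara_fraction:.0%} difference={rel_diff:+.1%}")
```

A large relative difference together with a large SARA fraction suggests looking at the SARA numbers first, and vice versa.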

Finally, copy the relevant numbers in columns G and H into the totals on the WLCG pre-filled sheet (make sure to copy "as values"). Also check that the installed capacities for CPU, disk, and tape on the WLCG summary sheet are still correct!

Once satisfied, send the result to the LCG office.

Common sources of problems:

- sometimes, if a new group (like the ATLAS pilot role) has been added to the farm, it may not yet have been added to what we publish to the GOC DB (so it would not be reflected in the pre-filled summaries), or it might not yet have been added to the "accuse" CGI script (so it will not be reflected in our own numbers).
- sometimes the "allocated" or "installed" fields are wrong. This is because somebody forgot to send a mail to Harry Renshall informing him that new capacity had been added.