Difference between revisions of "WLCG Accounting"
Revision as of 13:08, 23 January 2014
Note: any improvements to this procedure are most welcome.
External input:
The LCG Office sends us monthly accounting summaries to check for correctness. These summaries are pre-filled using information from the GOC DB (for computing) and at some point will likely be filled from the information system (for storage). Our task is to check whether the summaries are correct.
For Nikhef, the other piece of external input is Ron's monthly email containing a CSV file with the SARA numbers.
The procedure
First gather the computing numbers. Go to the NDPF accounting page. On that page, under "Other Accounting Information", choose LCG Monthly Accounting Report and current disk usage. Fill in the month (for example, for April 2009 enter 2009-04 in the box) and hit return. After some seconds two chunks of information are returned, the first of which looks like this:
    Aggregate use of the NDPF from 2009-04-01 up till 2009-05-01 exclusive (kSI2k.days)
    VO,CPU,WALL,GROUPS
    alice, 21464, 22339, alice alicesgm
    atlas, 22877, 27060, atlas atla atlb atlc atlsgm
    cms, 0, 10, cms cmssgm
    lhcb, 14, 38, lhcb lhcbprd lhcbsgm
This is the computing information in CSV format. Use this file (http://www.nikhef.nl/~templon/nik-201312.csv) as a template and replace (using perhaps your favorite text editor) the top lines with the output returned by the Monthly Accounting link.
The second chunk of information should be the DPM storage use, which gives the disk usage at Nikhef, but as of this writing (January 2014) it does not work. The solution is to use the CSV files in /var/adm on tbn18: take the one for the last day of the month. For example, here are the contents of the file for 31-12-2013, which should be used for the December 2013 accounting report submitted in January 2014:
    [root@tbn18 adm]# cat 20131231.csv
    DISK USAGE in GB
    T0D1 POOLS,USED,AVAILABLE,TOTAL
    atlas,1030525,217595,1248119
    lhcb,0,0,0
    alice,0,0,0
    tape usage in GB,USED,INSTALLED
    atlas,0,0
    lhcb,0,0
    alice,0,0
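Picking the right file by hand is easy to get wrong around month boundaries. The following is a minimal sketch of how the end-of-month file name could be derived; it assumes the files in /var/adm are all named YYYYMMDD.csv, which is inferred from the single example above and not otherwise confirmed. The function name is made up for illustration.

```python
import calendar


def end_of_month_csv(year: int, month: int) -> str:
    """Return the assumed file name (YYYYMMDD.csv) for the last day of a month."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return f"{year:04d}{month:02d}{last_day:02d}.csv"


# Report for December 2013, as in the example above
print(end_of_month_csv(2013, 12))
```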
Next, take the email from Ron with the SARA numbers. First remove the leading blank line, then change all the semicolons to commas (the difference between US-style and Dutch-style CSV). The edited version of Ron's mail should look like this template file (http://www.nikhef.nl/~templon/sar-201312.csv).
Finally, I have an Excel file that does the proper summing of the internal numbers so that they can be plugged directly into the WLCG summaries. This template file (http://www.nikhef.nl/~templon/tot-200904.xls) contains the proper magic; it is the actual version used for April 2009. The other two templates included above are the Nikhef and SARA CSV files for April 2009, so together they form the complete set actually used to produce the April 2009 numbers.
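The two edits to Ron's mail can also be scripted. This is only a sketch: the function name and the sample rows are made up for illustration, and it assumes no field contains a quoted semicolon or a decimal comma, since a blanket replacement would corrupt either. It drops every leading blank line, not just one.

```python
def normalize_sara_csv(text: str) -> str:
    """Drop leading blank lines and turn semicolon separators into commas."""
    lines = text.splitlines()
    # Remove blank lines at the top of the mail body
    while lines and not lines[0].strip():
        lines.pop(0)
    if not lines:
        return ""
    # Blanket replacement: only safe if no field itself contains ';' or ','
    return "\n".join(line.replace(";", ",") for line in lines) + "\n"


# Hypothetical input; the real layout of Ron's mail is not shown here
sample = "\nVO;CPU;WALL\natlas;100;120\n"
print(normalize_sara_csv(sample), end="")
```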
- open the summing template
- go to the Nikhef tab
- open the Nikhef CSV file
- select all the fields and "copy"
- go back to the Nikhef tab in the summing sheet, and paste in the new information
- repeat this process with the SARA tab and the SARA CSV file.
Now the "total" tab shows the correct summed values in columns F, G, and H. Before using them, go to the WLCG summary, select the pre-filled CPU numbers, and paste "as values" into the fields in column J (the tan-colored area). Doing this allows you to check how close the WLCG figures are to our own. It also shows, for our own numbers, what fraction of each number came from SARA, which helps debug site-dependent differences between the WLCG numbers and our own.
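The check described here boils down to simple arithmetic. The sketch below shows one way to express it; it is not a transcription of the spreadsheet formulas, which are not reproduced on this page. The function name, the SARA value, and the WLCG value are made up for illustration; only the Nikhef figure (22877, the atlas CPU number from the April 2009 listing) comes from the text.

```python
def compare_with_wlcg(nikhef: float, sara: float, wlcg: float) -> dict:
    """Sum the two sites and compare the result with the pre-filled WLCG figure."""
    total = nikhef + sara
    return {
        "total": total,
        # Share of our own number that came from SARA
        "sara_fraction": sara / total if total else 0.0,
        # Positive means the WLCG figure is higher than ours
        "relative_diff": (wlcg - total) / total if total else None,
    }


# 22877 is from the listing above; 7123 and 30300 are hypothetical
result = compare_with_wlcg(nikhef=22877, sara=7123, wlcg=30300)
print(result)
```

A large relative difference together with a large SARA fraction suggests starting the investigation on the SARA side, and vice versa.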
Finally, copy the relevant numbers in columns G and H into the totals on the WLCG pre-filled sheet (make sure to copy "as values"). Also, check that the installed capacities for CPU, disk, and tape on the WLCG summary sheet are still correct!
Once satisfied, send the result to the LCG Office.
Common sources of problems:
- sometimes if a new group (like ATLAS pilot role) has been added to the farm, it may not yet have been added to what we publish to the GOC DB (so would not be reflected in the pre-filled summaries), or it might not yet have been added to the "accuse" cgi script (so will not be reflected in our own numbers).
- sometimes the "allocated" or "installed" fields are wrong. This is because somebody forgot to send a mail to Harry Renshall informing him that new capacity had been added.
Here is another fallback for the disk numbers, for reference: resort to using dpm-qryconf --si; one can either add up the numbers by hand or use the script dpmqp in Jeff's home directory:
    [root@tbn18 ~]# dpm-qryconf --si | ~templon/dpmqp
    ['POOL', 'ATLASHOT']
    CAPACITY 4.40T FREE 0 ( 0.0%)
    USED, AVAILABLE, TOTAL
    1480.090, 2919.910, 4400.000
    ['POOL', 'ATLASPRD']
    CAPACITY 1.26P FREE 0 ( 0.0%)
    USED, AVAILABLE, TOTAL
    1057520.000, 207090.000, 1264610.000
    ['POOL', 'ATLASSGM']
    CAPACITY 107.32G FREE 3.47M ( 0.0%)
    USED, AVAILABLE, TOTAL
    107.317, 0.003, 107.320
    ['POOL', 'BIOMED']
    CAPACITY 2.20T FREE 1.22T ( 55.3%)
    USED, AVAILABLE, TOTAL
    980.000, 1220.000, 2200.000
    [ ... ]
After verifying that all the ATLAS pools are at the top, you might do something like this:
    [root@tbn18 ~]# dpm-qryconf --si | ~templon/dpmqp | head -14 | \
        awk 'BEGIN {used=avail=tot=0}; \
             $1 ~ /[0-9]+/ { used+=$1; avail+=$2; tot+=$3 }; \
             END { print "USED, AVAILABLE, TOTAL" ; \
                   print used, ",", avail, ",", tot }'
    USED, AVAILABLE, TOTAL
    1.06052e+06 , 208600 , 1.26912e+06
The last numeric line above is what needs to go into the disk storage part of the template file above:
    DISK USAGE in GB,,,
    T0D1 POOLS,USED,AVAILABLE,TOTAL
    atlas,1.06052e+06 , 208600 , 1.26912e+06
    lhcb,0,0,0
    alice,0,0,0
    [ ... ]
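The awk one-liner depends on the ATLAS pools being the first ones listed, and awk prints non-integer sums with only six significant digits by default (hence 1.06052e+06). As a hedged alternative, the sketch below selects pools by name and writes whole numbers. It assumes the dpmqp output has one item per line as in the listing above (pool header, capacity line, column header, numeric line); the function names are made up for illustration, and the sample text is copied from that listing.

```python
SAMPLE = """\
['POOL', 'ATLASHOT']
CAPACITY 4.40T FREE 0 ( 0.0%)
USED, AVAILABLE, TOTAL
1480.090, 2919.910, 4400.000
['POOL', 'ATLASPRD']
CAPACITY 1.26P FREE 0 ( 0.0%)
USED, AVAILABLE, TOTAL
1057520.000, 207090.000, 1264610.000
['POOL', 'ATLASSGM']
CAPACITY 107.32G FREE 3.47M ( 0.0%)
USED, AVAILABLE, TOTAL
107.317, 0.003, 107.320
['POOL', 'BIOMED']
CAPACITY 2.20T FREE 1.22T ( 55.3%)
USED, AVAILABLE, TOTAL
980.000, 1220.000, 2200.000
"""


def sum_pools(text, prefix="ATLAS"):
    """Sum USED, AVAILABLE, TOTAL over all pools whose name starts with prefix."""
    used = avail = total = 0.0
    selected = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("['POOL'"):
            # "['POOL', 'ATLASHOT']" -> pieces[3] is the pool name
            pieces = line.split("'")
            selected = len(pieces) > 3 and pieces[3].startswith(prefix)
            continue
        parts = [p.strip() for p in line.split(",")]
        if not selected or len(parts) != 3:
            continue
        try:
            u, a, t = (float(p) for p in parts)
        except ValueError:
            continue  # the "USED, AVAILABLE, TOTAL" header line
        used, avail, total = used + u, avail + a, total + t
    return used, avail, total


def template_row(vo, sums):
    """Format one row for the disk storage part of the template."""
    return "{},{:.0f},{:.0f},{:.0f}".format(vo, *sums)


print(template_row("atlas", sum_pools(SAMPLE)))
```

On the live system the text would come from piping `dpm-qryconf --si | ~templon/dpmqp` into such a script instead of using the embedded sample.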