VIRGOAccounting
VIRGO wants daily accounting. This page documents how it is set up. It's a bit of a hack; suggestions for improvement are welcome.
The ingredients needed:
- cpu time, wall time, number of cores, and user name for each VIRGO job on day N
- mapping of usernames to "real person" names
Where do we get this:
- the torque accounting log files are already on alento (for the log monitoring script)
- the mappings are located on the CEs.
Mappings
For the mappings, there is already a little program, gwhois2, located in ~templon/bin, that spits out the certificate subject associated with a username. A script, virmap, runs it to find all active VIRGO mappings and prints, for each username, the CN part of the certificate subject. Due to the permissions on the mappings, it has to be run as root.
[root@juk ~]# ~templon/bin/virmap
virgo000,username removed 1 skip@infn.it
virgo001,username removed 2 charlie@infn.it
virgo002,username removed 3 buffy@infn.it
virgo003,username removed 4
virgo075,pilot/osg-ligo-1.t2.ucsd.edu
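As an illustration of what "the CN part of the certificate subject" means, here is a minimal bash sketch. It is not how virmap or gwhois2 actually do it (their source is not shown on this page). The function name cn_of and the subject strings are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: print the text after the last "/CN=" in a subject string.
# If the string contains no "/CN=", it is printed unchanged.
cn_of() {
  local subject=$1
  printf '%s\n' "${subject##*/CN=}"
}

# Made-up subjects, for illustration only.
person=$(cn_of "/O=example/OU=people/CN=Example Person")
pilot=$(cn_of "/DC=org/DC=example/OU=Services/CN=pilot/host.example.org")
echo "$person"
echo "$pilot"
```

Taking everything after the last "/CN=" keeps CNs that themselves contain a slash, like the pilot entry above, intact. A proxy subject with an extra trailing CN component would need additional handling.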
The challenge is that the script has to run as root, while its output has to show up somewhere in userland so that it can be used for the mapping. We don't expect this information to change rapidly, so for the moment the script runs as a cron job on juk and the results are mailed to JT, who can adapt the other bit of scriptology as needed if there are changes.
[root@juk ~]# crontab -l
MAILTO=templon@nikhef.nl
33 11 * * * ~templon/bin/virmap
Accounting data
The Torque server accounting log files contain the necessary information. It can be extracted with the pjobstats program; one of the functions of that program is to convert E (end) records in a Torque accounting log file into CSV format. The resulting CSV can be processed further by any number of means. I chose Miller (mlr), a command-line tool for processing CSV files.
The following pipe Does The Right Thing:
pjobstats -ml -a resources_used.cput,Exit_status -f <(cat /var/spool/pbs/server_logs/accounting/20180406 /var/spool/pbs/server_logs/accounting/20180407) \
  -o /dev/stdout | mlr --icsv --opprint filter '$group=="virgo"' then filter '$end>1522972800 && $end<=1523059200' then put \
  '$cputime = hms2sec($resources_used_cput)/3600.; $walltime = hms2sec($resources_used_walltime)/3600.; $wallcore = $walltime*$total_execution_slots' \
  then stats1 -a count,sum -f cputime,wallcore -g user

user     cputime_count cputime_sum wallcore_count wallcore_sum
virgo075 118           2.860551    118            174.006114
virgo003 2             0.040000    2              0.247777
virgo001 1             0.015000    1              0.025000
Where 1522972800 is 2018-04-06 00:00 UTC and 1523059200 is 2018-04-07 00:00 UTC.
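The two epoch bounds and the two log file names in the pipe can be derived from a single date rather than typed in by hand. A minimal sketch, assuming GNU date (for the -d option); the variable names DAY, START, END and NEXT are invented here and do not come from the existing scripts.

```shell
#!/bin/bash
# Assumes GNU date. DAY is the accounting day in the same YYYYMMDD form
# used for the Torque accounting file names.
DAY=20180406

START=$(date -u -d "$DAY" +%s)      # 00:00 UTC at the start of DAY
END=$((START + 86400))              # 00:00 UTC the next day (UTC has no DST shifts)
NEXT=$(date -u -d "@$END" +%Y%m%d)  # name of the following day's log file

echo "$START $END $NEXT"
# prints: 1522972800 1523059200 20180407
```

The values can then be substituted into the filter expression, for example by building it in double quotes with the Miller field references escaped: "\$end>$START && \$end<=$END".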
Assembly
The script to be run as a cron job is viracc, in ~templon/bin. It calls viracc.awk, which is located in ~templon/lib and has world permissions turned off because it contains the account mapping information (it is group-readable by tbadmin). A cron job on alento runs viracc and puts the accounting data in a semi-private location for retrieval by VIRGO people.
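The contents of viracc.awk are not reproduced on this page. To show the general idea of the assembly step, here is a minimal sketch that replaces usernames in the per-user totals with the names from a mapping. The file layouts, the function name join_names and the names in the mapping are assumptions made for the example; the real script may work quite differently.

```shell
#!/bin/bash
# Sketch only: file layouts and names below are assumptions, not the real viracc.awk.
map=$(mktemp)
stats=$(mktemp)

# Hypothetical mapping, one "username,real name" pair per line.
cat > "$map" <<'EOF'
virgo001,Example Person One
virgo075,pilot/osg-ligo-1.t2.ucsd.edu
EOF

# Per-user totals in the layout printed by the mlr pipe (header plus one row per user).
cat > "$stats" <<'EOF'
user cputime_count cputime_sum wallcore_count wallcore_sum
virgo075 118 2.860551 118 174.006114
virgo003 2 0.040000 2 0.247777
virgo001 1 0.015000 1 0.025000
EOF

join_names() {
  # $1 = mapping file, $2 = stats file
  awk '
    NR == FNR {                      # first file: build the username to name table
      i = index($0, ",")
      if (i > 0) name[substr($0, 1, i - 1)] = substr($0, i + 1)
      next
    }
    FNR == 1 { next }                # skip the stats header line
    {
      who = ($1 in name) ? name[$1] : $1   # fall back to the username if unmapped
      printf "%s,%s,%s,%s\n", who, $2, $3, $5
    }
  ' "$1" "$2"
}

out=$(join_names "$map" "$stats")
echo "name,jobs,cpu_hours,wallcore_hours"
echo "$out"
rm -f "$map" "$stats"
```

Splitting the mapping line at the first comma, rather than on whitespace, keeps names that contain spaces in one piece. Users without a mapping entry are reported under their username instead of being dropped.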