VIRGOAccounting


VIRGO wants daily accounting. This page documents how it is set up. It is a bit of a hack; suggestions for improvement are welcome.

The ingredients needed:

  1. CPU time, wall time, number of cores, and username for each VIRGO job on day N
  2. a mapping of usernames to "real person" names

Where we get these:

  1. the Torque accounting log files are already on alento (for the log monitoring script)
  2. the mappings are located on the CEs.

Mappings

For the mappings, there is already a little program, gwhois2, located in ~templon/bin, that will spit out the certificate subject associated with a username. It is run from a script that finds all active VIRGO mappings and spits out, for each username, the CN part of the certificate subject. Because of the permissions on the mappings, the script has to be run as root.

 [root@juk ~]# ~templon/bin/virmap
 virgo000,username removed 1 skip@infn.it
 virgo001,username removed 2 charlie@infn.it
 virgo002,username removed 3 buffy@infn.it
 virgo003,username removed 4
 virgo075,pilot/osg-ligo-1.t2.ucsd.edu
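Taking "the CN part" of a subject can be done with plain shell parameter expansion. The following is only a sketch: the subject string is a made-up example (not a real mapping), and the actual script may extract the CN differently.

```shell
# Hypothetical certificate subject; real subjects come from the mapping files.
dn='/O=example/OU=users/CN=pilot/osg-ligo-1.t2.ucsd.edu'
user='virgo075'

# Strip everything up to and including the last "/CN=".
cn=${dn##*/CN=}

# Same username,CN layout as the output shown above.
echo "$user,$cn"
```

Because the longest matching prefix is removed, a subject with several CN components would keep only the last one; a CN that itself contains a slash, as in the pilot entry, survives intact.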

The challenge is to have the script run as root, but have its output show up somewhere in userland so that the information can be used for the mapping. We don't expect this information to change rapidly, so for the moment the script runs as a cron job on juk and the results are mailed to JT, who can adapt the other bit of scriptology as needed if there are changes.

 [root@juk ~]# crontab -l
 MAILTO=templon@nikhef.nl
 33 11 * * * ~templon/bin/virmap

Accounting data

The Torque server accounting log files contain the necessary information. It can be extracted with the pjobstats program; one of the functions of that program is to convert the E (end) records in a Torque accounting log file into CSV format. The resulting CSV file can be processed further by any number of means; I chose Miller, a command-line tool for processing CSV files.

The following pipe Does The Right Thing:

  pjobstats -ml -a resources_used.cput,Exit_status,end -f <(cat /var/spool/pbs/server_logs/accounting/20180406 /var/spool/pbs/server_logs/accounting/20180407 ) \
  -o /dev/stdout | mlr --icsv --opprint filter '$group=="virgo"' then filter '$end>1522972800 && $end<=1523059200' then put \
  '$cputime = hms2sec($resources_used_cput)/3600.; $walltime = hms2sec($resources_used_walltime)/3600.; $wallcore = $walltime*$total_execution_slots' \
  then stats1 -a count,sum -f cputime,wallcore -g user
 
 user     cputime_count cputime_sum wallcore_count wallcore_sum
 virgo075 129           3.006109    129            177.899723
 virgo003 2             0.040000    2              0.247777
 virgo001 1             0.015000    1              0.025000

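Here 1522972800 is 2018-04-06 00:00 UTC and 1523059200 is 2018-04-07 00:00 UTC, so the filter keeps jobs that ended during 6 April 2018 (UTC). Both days' log files are read, presumably because the log files are split at local midnight while the window is defined in UTC.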
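For a daily cron job the two bounds have to be computed rather than typed in. A minimal sketch, assuming GNU date is available; the variable names are illustrative and are not taken from the viracc script:

```shell
# Day N, interpreted as a UTC calendar day.
day='2018-04-06'

# GNU date: midnight UTC at the start of day N, in epoch seconds.
day_start=$(date -u -d "$day 00:00:00" +%s)
# A UTC day is exactly 86400 epoch seconds long.
day_end=$((day_start + 86400))

# Miller filter expression; \$end is the Miller field, not a shell variable.
mlr_filter="\$end>${day_start} && \$end<=${day_end}"
echo "$mlr_filter"
```

For 2018-04-06 this prints the same expression that is hard-coded in the pipe above; it can then be passed to mlr as filter "$mlr_filter".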

Assembly

The script to be run from cron is viracc, in ~templon/bin. It calls viracc.awk, which is located in ~templon/lib and has world permissions turned off because of the account mapping information (it is group-readable by tbadmin). A cron job on alento runs viracc and puts the accounting data in a semi-private location for retrieval by VIRGO people.
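The contents of viracc.awk are not reproduced on this page. The following is only a sketch of how the mapping and the per-user sums could be joined in awk, using the sample rows shown earlier; the file names, the output layout, and the "unmapped" placeholder are assumptions made for illustration.

```shell
# Scratch directory with sample inputs copied from the examples on this page.
workdir=$(mktemp -d)

# Mapping: username,CN (same layout as the virmap output).
cat > "$workdir/map.csv" <<'EOF'
virgo001,username removed 2 charlie@infn.it
virgo003,username removed 4
virgo075,pilot/osg-ligo-1.t2.ucsd.edu
EOF

# Accounting summary: Miller --opprint output, header line first.
cat > "$workdir/acct.txt" <<'EOF'
user     cputime_count cputime_sum wallcore_count wallcore_sum
virgo075 129           3.006109    129            177.899723
virgo003 2             0.040000    2              0.247777
virgo001 1             0.015000    1              0.025000
EOF

report=$(awk '
  # First file: remember the CN for each username (split at the first comma).
  NR == FNR { i = index($0, ","); name[substr($0, 1, i - 1)] = substr($0, i + 1); next }
  # Second file: skip the header line.
  FNR == 1 { next }
  # Emit CN, job count, CPU hours, wall-clock core hours.
  { cn = ($1 in name) ? name[$1] : "unmapped"
    printf "%s,%s,%s,%s\n", cn, $2, $3, $5 }
' "$workdir/map.csv" "$workdir/acct.txt")

echo "$report"
rm -r "$workdir"
```

Splitting the mapping at the first comma only keeps a CN intact even if it contains commas, and usernames without a mapping are reported as "unmapped" rather than dropped.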