NDPFAccounting

From PDP/Grid Wiki

Systems involved

vlaai (gridmapdir NFS server)
creates the poolmap files, based on the gridmapdir state, on a daily basis
stro (Torque server)
converts the Torque accounting files and inserts them into the NDPF accounting database on fulda, using a poolmap file (it will try to collect one automatically if it can mount the gridmapdir)
klomp (MON box)
converts data from the accounting database to lcgrecords (UR-WG format) and uploads them through R-GMA

All relevant scripts are contained in a single RPM package "ndpf-acctfeeder" that contains both the local and the EGEE scripts (and the accuse client tool). Formally, the dependencies include only perl, perl-DBI, and perl-DBD-MySQL, but there are a few others needed on specific hosts:

pbsnodes
needed for the facility capacity option (default) in ndpf-acctfeed on the Torque server
rgma & grid-proxy-init
needed for ndpf-lcgrecordsfeed on the MON host

These have not been included in the RPM dependencies, so that a single RPM can be installed everywhere. The RPM does not install any cron jobs, and you must edit these two files where relevant:

/etc/pbsaccdb.conf
on the Torque server, needed for pbsaccdb.pl and pbsstatusdb.pl
/etc/lcgrecords.conf
on the MON box, needed for lcgrecords to read the database password and (optionally) a new default group-to-VO definition

Both files, if present, must be readable by root (uid 0) only.

NDPF Local Accounting

The local accounting is the most important element, and must be (and is :-) fully reliable, because it is used as the basis for the cost reimbursement for projects where we make in-kind contributions in the form of compute cycles. These data are collected (yearly) from the NDPF accounting database on a per-VO basis.

Data is inserted into this database on a daily basis. The records are (or should be) inserted just after midnight, when the pbs/torque accounting files have been closed and are complete. Since the accounting is based on the "E" records in that file, we thus get all completed jobs. Jobs that are still running will not be accounted -- they will be filed only when they have finished.
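As a sketch, extracting the completed-job records from a Torque accounting file looks like the following. The standard Torque line format "timestamp;type;jobid;key=value ..." is assumed; field names and the parsing details are illustrative and not taken from pbsaccdb.pl itself.

```python
# Sketch: pick out the "E" (job exit) records from a Torque accounting file.
# Only these records are accounted; Q/S/D records are skipped.

def completed_jobs(path):
    """Yield (jobid, attrs) for every 'E' record in the accounting file."""
    with open(path) as f:
        for line in f:
            parts = line.rstrip("\n").split(";", 3)
            if len(parts) < 4 or parts[1] != "E":
                continue  # not a completed-job record
            jobid, message = parts[2], parts[3]
            # the message part is a space-separated list of key=value pairs
            attrs = dict(kv.split("=", 1) for kv in message.split() if "=" in kv)
            yield jobid, attrs
```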

Master insertion

Insertion in the database requires the collaboration of two components:

  • the mapping from poolaccounts to grid user DNs
  • the extraction of the pbs data from the accounting file, and linking the unix users of the facility to their grid credentials

At the moment, the grid group (FQAN) mappings are not part of this scheme, and only unix groups are stored in the database. The unix group-to-grid-VO mapping is only done in the EGEE upload phase. This is partly historical, but since the VO-FQAN mapping side of the grid software is in constant flux anyway, the current arrangement is preferable.

To ease the insertion, a meta-utility has been developed: ndpf-acctfeed. It is to be run on the PBS master (stro) every night, and by default will process yesterday's accounting file:

Usage: ndpf-acctfeed [-v] [-h] [-f] [--mapfile <poolmap>|--gridmapdir <dir>]
           [--date|-d Ymd] [--nocapacity] [-n|--dryrun]
           [--pbsaccdir dir] [--progacc command] [--progcapacity command]

and in accounting.ncm-cron.cron:

15 0 * * * root (date --iso-8601=seconds --utc; PATH=/sbin:/bin:/usr/sbin:/usr/bin:/share/acct/bin; /share/acct/bin/merge_new_accdata) >> /var/log/accounting.ncm-cron.log 2>&1

The utility will do its very best to find a mapping from unix uids to gridDNs. By default it will look for the .poolmap.YYYYMMDD files that are created at midnight on the gridmapdir NFS server in the gridmapdir (currently on vlaai at 23:58 local time). If such a poolmap cannot be found, it will first try the file you specified on the command line without the YYYYMMDD postfix, then will read the gridmapdir (/share/gridmapdir by default) and create a temporary poolmapfile just for this run. If both the gridmapdir and the poolmapfile(s) are unreadable, the utility aborts.

PS: the ndpf-acctfeed meta-tool replaces the historic 'merge_new_accdata' script.

Collecting data from PBS/Torque

The file /etc/pbsaccdb.conf must be present on the collecting system (i.e. the Torque server) and be formatted as described in NDPFAccouting_pbsaccdbconf.


Scaling and the GHzHourEquivalent

Incorporating grid mappings

The grid mappings are matched to the unix uids by the pbsaccdb.pl script, based on a mapfile. This map file is formatted with one mapping per line, with a single TAB character (\t) between the uid and the DN strings, like in:

unixuid       /DNstring

This file is not generated by pbsaccdb.pl; it must be prepared beforehand and passed as a command-line argument (or a sensible default is taken, based on the date specification in the pbs accounting filename given). Normally, the ndpf-acctfeed meta-utility takes care of matching the poolmapfile and the accounting file based on dates, and it will also look for a 'true' mapfile that reflects the actual grid-DN mappings in use on that date. So, such a file must be generated daily. Note that you need properly dated mapfiles, since poolaccounts expire after some time and get re-cycled (usually after 100 days of inactivity).
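Reading such a mapfile is straightforward; a minimal sketch (function name illustrative, DN value made up for the example):

```python
# Sketch: read a poolmap file (one "uid<TAB>DN" per line) into a dict,
# as pbsaccdb.pl is described to do with its mapfile argument.

def read_poolmap(path):
    """Map unix uids to grid DNs; blank and comment lines are skipped."""
    mapping = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue
            uid, sep, dn = line.partition("\t")
            if sep:  # only lines that actually contain a TAB
                mapping[uid] = dn
    return mapping
```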

The mapfiles are generated by the poolmaprun script (part of the managepoolmap package, vlaai currently has managepoolmap-1.1-2 installed) on the NFS server hosting the gridmapdir (today: vlaai). This script will both create the mapfile of today and afterwards release any poolaccount mappings that have been idle for 100 days. This script is run from cron on the gridmapdir server:

/etc/cron.d/manage_gridmapdir:
58 23 * * * root    MAX_AGE=100 CLEANING=1 /usr/local/bin/poolmaprun

The cron job is installed automatically by the RPM. The script must run on the server since the directory is (and should be) root-owned and the files are written (as '.poolmap.YYYYMMDD') to this directory. Really historic poolmaps are then later moved to a subdirectory gridmapdir/.history/
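The archival step can be sketched as follows: dated snapshots older than MAX_AGE days are moved into gridmapdir/.history/. This mirrors the behaviour described above; poolmaprun itself may differ in detail, and the function name is made up.

```python
import os
import shutil
import datetime

MAX_AGE = 100  # days, matching the cron job's MAX_AGE setting

def archive_old_poolmaps(gridmapdir, today=None):
    """Move .poolmap.YYYYMMDD files older than MAX_AGE days to .history/."""
    today = today or datetime.date.today()
    history = os.path.join(gridmapdir, ".history")
    os.makedirs(history, exist_ok=True)
    moved = []
    for name in os.listdir(gridmapdir):
        if not name.startswith(".poolmap."):
            continue
        try:
            d = datetime.datetime.strptime(name[len(".poolmap."):], "%Y%m%d").date()
        except ValueError:
            continue  # not a dated snapshot
        if (today - d).days > MAX_AGE:
            shutil.move(os.path.join(gridmapdir, name), os.path.join(history, name))
            moved.append(name)
    return moved
```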

Re-inserting historic data

First make sure you have accurate poolmap files, and then re-run the ndpf-acctfeed program with a date option:

ndpf-acctfeed --date 20081201

and do this for every missing day. It is harmless to re-insert the same day twice: the rows in the database will just be replaced, as the table is keyed on the JobID, which is generated as

md5_base64($pbsinfo{qtime}.$MasterFQDN.$JobID)

where qtime is the time the job was put in the queue by the user, the MasterFQDN is the hostname of the Torque master server, and the JobID is the numerical part of the Torque job id. These remain constant once a job is submitted by the user.
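For illustration, the equivalent key computation in Python. Perl's Digest::MD5 md5_base64 returns the base64 digest without the trailing "==" padding, hence the rstrip; the host and job values in the test are made up.

```python
import hashlib
import base64

def accounting_jobid(qtime, master_fqdn, jobid):
    """Python rendition of md5_base64(qtime . MasterFQDN . JobID)."""
    digest = hashlib.md5((str(qtime) + master_fqdn + str(jobid)).encode()).digest()
    # Digest::MD5::md5_base64 omits base64 padding, so strip it here too
    return base64.b64encode(digest).decode().rstrip("=")
```

Since all three inputs are fixed at submission time, the key is stable across re-insertions of the same job.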

EGEE uploads

The uploads to the GOC Accounting system (EGEE/LCG) use the direct R-GMA interface by inserting into the lcgRecords table. All data put there is taken exclusively from the NDPF accounting database, and it does not use any other source of records.

Requirements on the submission host

You must be able to talk to the database in perl, so you need perl-DBI and perl-DBD-MySQL, but you'll also need grid-proxy-init and rgma (in /opt/globus and /opt/glite). If any of these paths are non-standard, you must either recompile the RPM, or specify everything on the command line.

Like for pbsaccdb.pl, there is a convenience meta-utility available to automate the gory details of uploading data into R-GMA: ndpf-lcgrecordsfeed. By default, it will upload the records from yesterday, i.e. jobs that ENDED yesterday according to the NDPF Accounting database. For good measure, unless dates have been specified explicitly, it will also (again) upload all data from 3 to 7 days ago.

Run this on the MON box (klomp, for now) as root (uid=0):

Usage: ndpf-lcgrecordsfeed [-v] [-h] [-f] [-n|--dryrun] [--runallasroot] [--accuser uid]
           [--logfile=file|""] [--cert=file] [--key=file] [--proxydir dir]
           [--proxyhours=hrs]
           [--startdate|-s Ymd] [--enddate|-e Ymd] [--[no-]resiliance]

(see the source for even more hairy options). You'll need a /etc/lcgrecords.conf file to instruct the underlying lcgrecords program what to use to talk to the database. For example:

cat /etc/lcgrecords.conf
# lcgrecords.conf
$opt_dbuser="lcgusage";
$opt_dbpasswd="PASSWORD";

And you MUST have a user 'accuser' defined on the system. This user is used to execute the R-GMA command at the end, as we will never trust R-GMA to run as root :-)

The script will:

  • create a proxy for the accuser user in a temporary file, valid for 4 hours (unless --runallasroot has been specified)
  • extract yesterday's data (or the specified dates) from the NDPF accounting database and write it to a temporary file
  • (unless a date range is specified or --noresiliance is given) also get all data from 4 to 6 days ago and append this to the temporary file
  • execute the R-GMA command with the proper environment (X509_USER_PROXY and GLITE_LOCATION) and run the commands in the temporary datafile using the "-f" option to R-GMA
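The date windows involved can be sketched as follows. The exact resilience window lives in the script itself; the 4-to-6-day window is used here for illustration, and the function name is made up.

```python
import datetime

def upload_windows(today=None, resilience=True):
    """Return (start, end) date pairs to upload: yesterday, plus (for
    resilience) a re-upload of an earlier window."""
    today = today or datetime.date.today()
    day = datetime.timedelta(days=1)
    windows = [(today - day, today - day)]           # yesterday's records
    if resilience:
        windows.append((today - 6 * day, today - 4 * day))
    return windows
```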

The meta-utility, like lcgrecords, is very verbose, and will by default write a log file for each invocation to /var/tmp/lcgfeed.TIMESTAMP.XXXXXX. You can override this choice with the --logfile directive. If you want to see the output on the screen, use

ndpf-lcgrecordsfeed -o "" 

Make sure this script runs daily, but well after the records from the previous day have been inserted into the local NDPF accounting database. For example in /etc/cron.d/append-records:

34 8 * * * root    /usr/local/sbin/ndpf-lcgrecordsfeed

Note: this replaces the old append-records script from hooimijt.

Uploading historic data to the GOC

To do this, first make sure the data is actually there in the NDPF accounting database (see above). Then you can re-invoke the meta-utility:

ndpf-lcgrecordsfeed -s 20081201 -e 20081205 

and it will work as usual (including the writing of a log file with the current time as the time stamp). If you upload too much data, R-GMA will run out of memory and data will be lost. So, limit the time scales!
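To keep each invocation small enough for R-GMA, a long historic range can be split into short chunks, each fed to a separate `ndpf-lcgrecordsfeed -s ... -e ...` run. The 5-day chunk size is an assumption, not a documented limit.

```python
import datetime

def date_chunks(start, end, max_days=5):
    """Yield (chunk_start, chunk_end) date pairs covering [start, end]."""
    step = datetime.timedelta(days=max_days)
    one = datetime.timedelta(days=1)
    cur = start
    while cur <= end:
        chunk_end = min(cur + step - one, end)
        yield cur, chunk_end
        cur = chunk_end + one
```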


Translating accounting records to UR-WG format

This is done by the lcgrecords program (a specialised version of accuse, actually), and after this all hosts will have the standard performance of 410 SI2k per core. Scaling (to GHzHoursEquivalent) has already been taken care of by the pbsaccdb.pl script and should not be done again here. Anyway, you cannot convert hours to SI2k without re-inspecting the host list, since addition and multiplication in our algebra do not commute.

The output of the lcgrecords script is a set of gLite rgma INSERT commands. Note that the EDG version of rgma is no longer supported after this upgrade!

Upload protocol

Log monitoring and status pages

Have a look at the CESGA accounting GANTT chart once in a while to see if everything is still working.

Cron jobs to do accounting