Revision as of 16:31, 17 June 2011

Systems involved

vlaai	gridmapdir NFS	create poolmap files based on the gridmapdir state on a daily basis
stro	Torque server	conversion from Torque accounting files and inserting these into the NDPF accounting database on fulda, using a poolmapfile (it will try to collect one automatically if it can mount the gridmapdir)
bosui	any EL5 system	extraction of data from database into lcgRecords format and upload through AMQ-OpenWire (but note the program is called 'stompfeeder', vaguely hoping that one time the APEL group implements a usable upload protocol ...)

The relevant scripts for collecting and assembling the accounting information are contained in a single RPM package "ndpf-acctfeeder" that contains both the local and the EGEE scripts (and the accuse client tool). Formally, the dependencies include only perl, perl-DBI, and perl-DBD-MySQL, but there are a few others needed on specific hosts:

pbsnodes: needed for the facility capacity option (default) in ndpf-acctfeed on the Torque server

They have not been included in the rpm dependencies, so as to be able to have a single RPM that installs everywhere. This RPM does not install any cron jobs, and you must edit these two files where relevant:

/etc/pbsaccdb.conf: on the Torque server, needed for pbsaccdb.pl and pbsstatusdb.pl
/etc/stompfeeder.conf: on the uploader box, needed for ndpf-stompfeeder to (optionally) a new default group-to-VO definition, as well as database access passwords and the like unless specified on the command line

both files, if present, must only be readable by root (uid 0).

The new AMQ uploader tool, along with some necessary AMQ libraries, is installed on bosui via the rpm 'nikhef-apel'. Looking in /usr/local/bin/nikhef-apel is reasonably helpful for figuring out how to configure the thing. The AMQ uploader basically pipes SQL statements into the OpenWire uploader CLI, which sends the records off to the UK.

NDPF Local Accounting

The local accounting is the most important element, and must (and is :-) fully reliable, because it is used as the basis for the cost reimbursement for projects where we contribute in-kind contributions in the form of compute cycles. These data are collected (yearly) from the NDPF accounting database on a per-VO basis.

Data is inserted into this atabase on a daily basis. The records are (or should) inserted just after midnight, when the pbs/torque accounting files have been closed and are complete. Since the accounting is based on the "E" records in that file, we thus get all completed jobs. Jobs that are still running will not be accounted -- they will be filed only then they are finished.

Master insertion

Insertion in the database requires the collaboration of two components:

the mapping from poolaccounts to grid user DNs
the extraction of the pbs data from the accounting file, and linking the unix users of the facility to their grid credentials

At the moment, the grid group (FQAN) mappings are not part of this scheme, and only unix groups are stored in the database. The unix group-to-gridVO mapping is only done in the EGEE upload phase. This is partly historical, but since the VO-FQAN mapping side of the grid software is in constant flux anyway it is better like it is done now.

To ease the insertion, a meta-utility has been developed: ndpf-acctfeed. It is to be run on the PBS master (stro) every night, and by default will process yesterday's accounting file:

Usage: ndpf-acctfeed [-v] [-h] [-f] [--mapfile <poolmap>|--gridmapdir <dir>]
           [--date|-d Ymd] [--nocapacity] [-n|--dryrun]
           [--pbsaccdir dir] [--progacc command] [--progcapacity command]

and in accounting.ncm-cron.cron:

15 0 * * * root (PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin; /usr/local/sbin/ndpf-acctfeed) >> /var/log/accounting.ncm-cron.log 2>&1

The utility will do its very best to find a mapping from unix uids to gridDNs. By default it will look for the .poolmap.YYYYMMDD files that are created at midnight on the gridmapdir NFS server in the gridmapdir (currently on vlaai at 23:58 local time). If such a poolmap cannot be found, it will first try the file you specified on the command line without the YYYYMMDD postfix, then will read the gridmapdir (/share/gridmapdir by default) and create a temporary poolmapfile just for this run. If both the gridmapdir and the poolmapfile(s) are unreadable, the utility aborts.

PS: the ndpf-acctfeed meta-tool replaces the historic 'merge_new_accdata' script.

Collecting data from PBS/Torque

The file /etc/pbsaccdb.conf must be present on the collecting system (i.e. the Torque server) and be formatted as described in NDPFAccouting_pbsaccdbconf.

Scaling and the GHzHourEquivalent

Incorporating grid mappings

The grid mappings are matches to the unix uids by the pbsaccdb.pl script, based on a mapfile. This map file is formatted with one mapping per line, with a single TAB character (\t) between the uid and the DN strings, like in:

unixuid       /DNstring

This file is not generated by pbsaccdb.pl, but needs to be prepared and passed as a command-line argument (or a sensible default is taken, based on the date specification in the pbs accounting filename given). Normally, the ndpf-acctfeed meta-utility takes care of matching the poolmapfile and the accounting file based on dates, but also this utility will look for a 'true' mapfile that reflects the actual grid-DN-mappings in use on that date. So, such a file must be generated daily. Note that you do need actual dates mapfiles, since poolaccounts will expire after some time and get re-cycled (usually after 100 days of inactivity).

The mapfiles are generated by the poolmaprun script (part of the managepoolmap package, vlaai currently has managepoolmap-1.1-2 installed) on the NFS server hosting the gridmapdir (today: vlaai). This script will both create the mapfile of today and afterwards release any poolaccount mappings that have been idle for 100 days. This script is run from cron on the gridmapdir server:

/etc/cron.d/manage_gridmapdir:
58 23 * * * root    MAX_AGE=100 CLEANING=1 /usr/local/bin/poolmaprun

The cron job is installed automatically by the RPM. The script must run on the server since the directory is (and should be) root-owned and the files are written (as '.poolmap.YYYYMMDD') to this directory. Really historic poolmaps are then later moved to a subdirectory gridmapdir/.history/

Re-inserting historic data

First make sure you have accurate poolmap files, and then re-run the ndpf-acctfeed program with a date option:

ndpf-acctfeed --date 20081201

and do this for every missing day. It is harmless to re-insert the same day twice (the rows in the database will just be replaced, as the table is keyed on the JobID which is generated like

md5_base64($pbsinfo{qtime}.$MasterFQDN.$JobID)

where qtime is the time the job was put in the queue by the user, the MasterFQDN is the hostname of the Torque master server, and the JobID is the numerical part of the Torque job id. These remain constant once a job is submitted by the user.

EGEE uploads

The uploads to the GOC Accounting system (EGEE/LCG) use the direct R-GMA interface by inserting into the lcgRecords table. All data put there is taken exclusively from the NDPF accounting database, and it does not use any other source of records.

Requirements on the submission host

You must be able to talk to the database in perl, so you need perl_DBI and perl-DBD-MySQL, but you'll also need grid-proxy-init and rgma (in /opt/globus and /opt/glite). If any of these path are non-standard, you must either recompile the RPM, or specify everything on the command line.

Like for pbsaccdb.pl, there is a convenience meta-utility available to automate much of the gory details of uploading data in R-GMA: the ndpf-lcgrecordsfeed. By default, it will upload the records from yesterday, i.e. jobs that ENDED yesterday according to the NDPF Accounting database. For good measure, unless dates have been specified explicitly, it will also (again) upload all data from 3 to 7 days ago.

Run this on the MON box (klomp, for now) as root (uid=0)

Usage: ndpf-lcgrecordsfeed [-v] [-h] [-f] [-n|--dryrun] [--runallasroot] [--accuser uid]
           [--logfile=file|""] [--cert=file] [--key=file] [--proxydir dir]
           [--proxyhours=hrs]
           [--startdate|-s Ymd] [--enddate|-e Ymd] [--[no-]resiliance]

(see the source for even more hairy options). You'll need a /etc/lcgrecords.conf file to instruct the underlying lcgrecords program what to use to talk to the database. For example:

cat /etc/lcgrecords.conf
# lcgrecords.conf
$opt_dbuser="lcgusage";
$opt_dbpasswd="PASSWORD";

And you MUST have a user 'accuser' defined on the system. This user is used to execute the R-GMA command at the end, as we will never trust R-GMA to run as root :-)

The script will: - create a proxy for the accuser user (unless --runallasroot has been specified in a temporary file for 4 hours - extract yesterday's data (or specified dates) from the NDPF accounting database and write to a temporary file - (unless a daterange is specified or --noresiliance): als get all data from 4 to 6 days ago and also write this to the temporary file - execure the RGMA command with the proper environment (X509_USER_PROXY and GLITE_LOCATION) and execure the commands in the temporary datafile using the "-f" option to R-GMA)

The meta-utility, like lcgrecords, is very verbose, and will by default write a log file for each invocation to /var/tmp/lcgfeed.TIMESTAMP.XXXXXX. You can override this choice with the --logfile directive. If you want to see the output on the screen, use

ndpf-lcgrecordsfeed -o ""

Make sure this script runs daily, but well after the records from the previous day have been inserted into the local NDPF accounting database. For example in /etc/cron.d/append-records:

34 8 * * *     /usr/local/sbin/ndpf-lcgrecordsfeed

Note: this replaces the old append-records script from hooimijt.

Uploading historic data to the GOC

To do this, first make sure the data is actually there in the NDPF accounting database (see there). Then you can re-invoke the meta-utility:

ndpf-lcgrecordsfeed -s 20081201 -e 20081205

and is will work as usual (including the writing of a log file with the current time as the time stamp. If you upload too much data, R-GMA will run out of memory and data will be lost. So, limit the time scales!

Translating accounting records to UR-WG format

This is done by the lcgrecords program (a specialised version of accuse, actually), and aftr this all hosts will hve the standard performance of 410 SI2k per core. Scaling (to GHzHoursEquivalent) has already been taken case of by the pbsaccdb.pl script and should not be done again here. Anyway, you cannot convert hours to SI2k with re-inspecting the host list, since adding and multiplication in our algebra do not commute.

The output of the lcgrecords script is a set of gLite rgma INSERT commands. Note that the EDG version of rgma is no longer supported after this upgrade!

Upload protocol

Log monitoring and status pages

Have a look at the CESGA accounting GANTT chart once in a while to see if everything is still working.

Cron jobs to do accounting

There are three cron jobs:

vlaai: installed automatically from the managepoolmap-1.1-2 package in (when missing, the ndpf-acctfeeder will try and instantly create a map file as well)

58 23 * * * root    MAX_AGE=100 CLEANING=1 /usr/local/bin/poolmaprun

stro: installed via quattor in /etc/cron.d/accounting.ncm-cron.cron

15 0 * * * root (date --iso-8601=seconds --utc; PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin; /usr/local/sbin/ndpf-acctfeed) >> /var/log/accounting.ncm-cron.log 2>&1

bosui: installed via quattor in /etc/cron.d/amqrecords.ncm-cron.cron

34 9 * * * root (date --iso-8601=seconds --utc; PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin; LANG=C; unset TZ; date; echo Uploading new accounting data; /usr/local/sbin/ndpf-stompfeeder -v -n -q --dbpassword PASSWORD --dbuser USERNAME --pipe '/usr/local/bin/nikhef-apel' 2>&1 ; date ; echo Accounting upload completed.) >> /var/log/amqrecords.ncm-cron.log 2>&1

on each lcg-CE or GRAM node (with NDPF pbs.in) new: to be installed

45 0 * * * root (date --iso-8601=seconds --utc; PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin; /usr/local/sbin/ndpf-cejoiner.pl /var/log/messages /var/log/messages.1 /var/log/messages.1.gz) >> /var/log/cejoiner.ncm-cron.log 2>&1

on each CREAM CE node new: to be installed, the last 5 days of accounting data is arbitrary ... but yoiu need the max queue+run time at least. Also keep in mind that if you touch the directory, the ls -1tr command no longer gives you the latest n files!

45 0 * * * root (date --iso-8601=seconds --utc; PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin; /usr/local/sbin/ndpf-cejoiner.pl `ls -1tr /opt/glite/var/log/accounting/* | tail -5`) >> /var/log/cejoiner.ncm-cron.log 2>&1

Using the information in other ways

to get CVS files for wLCG manglement

The PHP script at

http://www.nikhef.nl/grid/accuse/

gets the data directly from the NDPF accounting database on fulda. It is exclusively geared towards wLCG management and cannot do much more. For better control, install the ndpf-acctfeeder package and use the accuse command.

to view graphs

The graphs on http://www.nikhef.nl/grid/stats/ndpf-prd/voview-short are generated completely independent from the account database using perl scripts from cron on ribble/naab. They are orthogonal and don't cross.

to view the GOC accounting status

The accounting grantt chart at CESGA is a good point to see if the data is still getting across. Q uick view is, for example:

http://www3.egee.cesga.es/gridsite/accounting/CESGA/gantt.php?option=ROC&optval=1.8&sYear=2008&sMonth=10&eYear=2009&eMonth=12&type=Production&tree=TIER1

with the full interface at http://goc.grid-support.ac.uk/gridsite/accounting/

to get accurate statistics

use the accuse command on bosui, or any other system where ndpf-acctfeeder is installed. Add -h to get a lot of help

to get the data for the VL-e accountant

Use /user/davidg/bin/vle-usage on any ikonet/hefnet host with the mysqql client commands installed.

Database Schema

The schema for the accounting database (for Nikhef at bedstee.nikhef.nl) looks like the graph shown on the right.

Description

The SQL sources are at https://ndpfsvn.nikhef.nl/repos/ndpf/nl.nikhef.ndpf.tools/ndpf-acctfeeder/accountingdb-IDextend.sql

The new schema allows for arbitrary credentials to be stored, and data collected from multiple sources, but the new tables like pubIdent and ingressPoint need a collector/aggregator be be run on each and every CE, over all log records written at the queueTime moment, for all jobs that are already in the database based on the endTime, i.e. recorded when the job has actually completed and the record uploaded from the Torque server.

The scripts to do that are in the save SubVersion repository as the database scema. In particular, ndpf-cejoiner.pl can run over both NDPF-style JobManager/pbs.om logs written in /var/log/messages (daemon.info), as well as the CREAM accounting files in /opt/glite/var/log/accounting/. The format is inferred automatically, and it can process gzipped files as well as regular text.

Uploading and registering data

Merging CE ingress information

Additional information is available only on the CE, such as the name of the ingress point (obviously), but also public identity information such as the user DN as seen on the CE, the primary FQAN, etc. For the subjectDN, this provides an alternative to the userid.commonName which is reverse-engineered from the gridmapdir.

help on ndpf-cejoiner.pl

Usage: ./ndpf-cejoiner.pl [-s|--start starttime:Y-m-d] [-e|--end endtime:Y-m-d]
  [-h] [-v] [-c|--config configfile] [-n|--dry]
  [--dbhost hostname] [--dbport port] [--dbname name] [--dbuser name]
  [--dbpassword password] [--pbsserver hostname] [--facility facilityname]
  [--ingressdefault defaultingresspointname]
  [--ingressname|--ceid fixedingresspointname]
  logfile ...

Merge job identity data logged on the CE with existing job records
in the accounting database, using the StartTime, facility name and the
jobID to find the matching job entry and then link it up with a new
or existing pubIdentity record using alinkage table
It will process the log files given on the command line (inasfar as they
exist), but only between <starttime> and <endtime> inclusive, and update
the database with these identities where possible.

  starttime    Y-m-d [HH:MM:SS]       jobs ended from this time onwards
               default: 2010-11-02 00:00:00
  endtime      Y-m-d [HH:MM:SS]       jobs ended up to this time
               default: 2010-11-03 00:00:00
  dbhost       hostname               hostname of the database server
               default: bedstee.nikhef.nl
  dbport       portnumber             portnumber of the database server
               default: 3306
  dbname       name                   name of the accouting database
               default: accounting
  dbuser       name                   username to access the acc database
               default: accounter
  dbpassword   string                 password for the database
               default: do NOT use the command line, use a secured config

  facility     string                 name of the NDPF facility
               default: lcg2elprod

  ingressdefault  string              name of this ingress point if no
                                      explicit name is found
                  default: bosui.nikhef.nl:2119/jobmanager-pbs (for lcg-CE)
                  default: bosui.nikhef.nl  (all other ingress points)
  ingressname  string                 force ingressname to be <name>
               default: none

The <logfile> can be either plain-text line oriented, or a gzipped log
file. The log file is parsed for lines matching an LRMS registration and
associated user subject and group information.
The log file can be either a NDPF-enhanced syslog file containing the
tokens produced by the JobManager/pbs.pm management script, or a CREAM CE
accounting file (from /opt/glite/var/log/accounting/)

Lines in these log files pertaining to jobs not in the database, or
to jobs that have not ended within the time window specified, will
be ignored. This means that you MUST iterate over a few days(?) worth
of historic CE accounting logs, since the association information is
usually written on job queue time, not the job end time.
Also, this script must be run *after* the job accounting records
have been published ot the database from the LRMS server itself (i.e.
after you have run pbsaccdb.pl on the LRMS server).

By default, the time window is the whole of the previous calender day.

@@ Line 160: / Line 160: @@
 There are three cron jobs:
-; vlaai : installed automatically from the managepoolmap-1.1-2 package in
+; vlaai : installed automatically from the managepoolmap-1.1-2 package in (when missing, the ndpf-acctfeeder will try and instantly create a map file as well)
 23 * * * root    MAX_AGE=100 CLEANING=1 /usr/local/bin/poolmaprun

Difference between revisions of "NDPFAccounting"