Historical WN information

From PDP/Grid Wiki
Jump to navigationJump to search

Introduction

This page resulted from a desire to look at historical trends of purchases, replacements, and node failures in the NDPF. The raw data is present in the NDPF accounting database; in order to get the data into a form suitable for real analysis, some cleaning needed to be done: during the early days of the NDPF we had various node-naming conventions, a naive SQL query will yield multiple answers for the same physical node. This page describes the process of producing the desired dataset, which is included at the end of the page.

SQL query

The basic task is, for each worker node ever seen, to find out the first time a job was ever run on it, and the last time a job was ever seen on it. The way to make a FOR loop (for each worker node) in SQL is an inner join. The "first time" query:

mysql -u dbuser --password="dbpasswd" -h dbhost accounting -e
    "select t1.Nodelist,t1.StartTime from job t1 INNER JOIN (
     select Nodelist, MIN(StartTime) AS MinDate from job where CPUcount=1 group by Nodelist
     )
     groupedt1 ON t1.Nodelist=groupedt1.Nodelist and t1.StartTime=groupedt1.MinDate;" > first-raw.txt

The "where CPUcount=1" gets rid of a lot of noise caused by multi-node jobs; "wn-class1-04/3+wn-class2-14/7" is a valid Nodelist (and has CPUcount>1). Even though not all combinations have actually been observed, there are still about 90k distinct Nodelist values in the database as of September 2014.

The "last time" query:

mysql -u dbuser --password="dbpasswd" -h dbhost accounting -e
    "select t1.Nodelist,t1.EndTime from job t1 INNER JOIN (
     select Nodelist, MAX(EndTime) AS MaxDate from job where CPUcount=1 group by Nodelist
     )
     groupedt1 ON t1.Nodelist=groupedt1.Nodelist and t1.EndTime=groupedt1.MaxDate;" > last-raw.txt

Cleaning the Data

Cleaning the data was nontrivial:

  • "node1" class nodes (D0 MC nodes) appeared both with and without ".nikhef.nl" on the end.
  • Halloweens appeared both as node15-NN.farmnet.nikhef.nl and wn-hall-0NN.farm.nikhef.nl
  • "node16" class nodes (GFRC/NCF/AMD) appeared both with and without ".nikhef.nl" on the end.
  • Bulldozers appeared both as node17-NN.farmnet.nikhef.nl and wn-bull-0NN.farm.nikhef.nl
  • "node18" class nodes (pizza0) appeared variously as node18-N, node18-N.farmnet, and node18-N.farmnet.nikhef.nl
  • Some nodes have been used as tests only (e.g. boszwijn)
  • The database claims that a significant number of nodes (of various classes, St. Maarten being the worst) ran their first job on New Year's Day 1970.

That last bit can be fixed by a slight modification of the "last time" query:

mysql -u dbuser --password="dbpasswd" -h dbhost accounting -e
    "select t1.Nodelist,t1.StartTime from job t1 INNER JOIN (
     select Nodelist, MIN(StartTime) AS MinDate from job where (CPUcount=1 and StartTime>'2001-01-01') group by Nodelist
     )
     groupedt1 ON t1.Nodelist=groupedt1.Nodelist and t1.StartTime=groupedt1.MinDate;"