Difference between revisions of "Historical WN information"

From PDP/Grid Wiki
Jump to navigationJump to search
(Created page with "This page resulted from a desire to look at historical trends of purchases, replacements, and node failures in the NDPF. The raw data is present in the NDPF accounting databa...")
 
Line 1: Line 1:
 +
== Introduction ==
 +
 
This page resulted from a desire to look at historical trends of purchases, replacements, and node failures in the NDPF.  The raw data is present in the NDPF accounting database; in order to get the data into a form suitable for real analysis, some cleaning needed to be done: during the early days of the NDPF we had various node-naming conventions, a naive SQL query will yield multiple answers for the same physical node.  This page describes the process of producing the desired dataset, which is included at the end of the page.
 
This page resulted from a desire to look at historical trends of purchases, replacements, and node failures in the NDPF.  The raw data is present in the NDPF accounting database; in order to get the data into a form suitable for real analysis, some cleaning needed to be done: during the early days of the NDPF we had various node-naming conventions, a naive SQL query will yield multiple answers for the same physical node.  This page describes the process of producing the desired dataset, which is included at the end of the page.
 +
 +
== SQL query ==
 +
 +
The basic task is, for each worker node ever seen, to find out the first time a job was ever run on it, and the last time a job was ever seen on it.  The way to make a FOR loop (for each worker node) in SQL is an inner join.  The "first time" query:
 +
 +
  mysql -u dbuser --password="dbpasswd" -h dbhost accounting -e "select t1.Nodelist,t1.EndTime from job t1 INNER JOIN (select Nodelist, MIN(EndTime) AS MaxDT from job where CPUcount=1 group by Nodelist) groupedt1 ON t1.Nodelist=groupedt1.Nodelist and t1.EndTime=groupedt1.MaxDT;" > first-raw.txt

Revision as of 09:54, 9 September 2014

Introduction

This page resulted from a desire to look at historical trends of purchases, replacements, and node failures in the NDPF. The raw data is present in the NDPF accounting database; in order to get the data into a form suitable for real analysis, some cleaning needed to be done: during the early days of the NDPF we had various node-naming conventions, a naive SQL query will yield multiple answers for the same physical node. This page describes the process of producing the desired dataset, which is included at the end of the page.

SQL query

The basic task is, for each worker node ever seen, to find out the first time a job was ever run on it, and the last time a job was ever seen on it. The way to make a FOR loop (for each worker node) in SQL is an inner join. The "first time" query:

 mysql -u dbuser --password="dbpasswd" -h dbhost accounting -e "select t1.Nodelist,t1.EndTime from job t1 INNER JOIN (select Nodelist, MIN(EndTime) AS MaxDT from job where CPUcount=1 group by Nodelist) groupedt1 ON t1.Nodelist=groupedt1.Nodelist and t1.EndTime=groupedt1.MaxDT;" > first-raw.txt