Difference between revisions of "DANS Data Management"

From BiGGrid Wiki
 
(18 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
==DANS Data Management==
 
==DANS Data Management==
  
One of the main goals of [http://www.dans.knaw.nl DANS] is to to longer-term management of a variety of data. The goal of the DANS Data Management project is to allow DANS to reliably back up their data onto BiG Grid storage resources.  
+
One of the main goals of [http://www.dans.knaw.nl DANS] is to do longer-term management of a variety of data. The goal of the DANS Data Management project is to allow DANS to reliably back up their data onto BiG Grid storage resources.  
  
 
'Reliably' means that the consistency of the data can be verified by DANS engineers at any given time. For this, a set of tools and procedures have been developed to allow DANS engineers to
 
'Reliably' means that the consistency of the data can be verified by DANS engineers at any given time. For this, a set of tools and procedures have been developed to allow DANS engineers to
Line 9: Line 9:
 
* verify the MD5 checksums of all data stored inside the tarballs on the grid
 
* verify the MD5 checksums of all data stored inside the tarballs on the grid
  
===Uploading data===
+
===DANS Workflow phase 1: Uploading data===
  
Data can be uploaded from 'twister11.dans.knaw.nl' using the [[DANS_Upload Upload]] procedure.
+
[[File:DANS_Workflow1.png|640px]]
* create a new directory with the name of the archive. As an example we use the 'Crome' archive:
 
  mkdir -p ~/dans/Crome/
 
* In this directory create another directory with the same name; this directory will contain the list of files and directories that need to be uploaded
 
  cd ~/dans/Crome
 
  mkdir Crome
 
* copy over the scripts from the repository
 
  cp -a ~/dans/scripts/* .
 
  
===Compressing data===
+
Data can be uploaded from the 'DANS archief' using the [[Dans Data Upload]] procedure.
  
 +
===DANS Workflow phase 2: Compressing data===
  
===Verifying data===
+
[[File:DANS_Workflow2.png|640px]]
 +
 
 +
After a successful upload, the data stored on the grid can be compressed using the [[Dans Data Compress]] procedure.
 +
This is done primarily to save disk space on the grid storage infrastructure. It can also improve data download speeds under certain circumstances.
 +
 
 +
===DANS Workflow phase 3: Verifying data===
 +
 
 +
[[File:DANS_Workflow3.png|640px]]
 +
 
 +
Periodically the integrity of the data stored on "the grid" needs to be verified. For this, an extensive 'md5sum' verification procedure is available. Read more about it in the [[Dans Data Verify]] procedure.
 +
 
 +
==DANS Job Scripts==
 +
A set of shell scripts was developed to automate the workflow phases above. The latest version of these scripts (including RHEL5 binaries of <tt>adler32sum</tt> and <tt>md5deep</tt>) can be downloaded [http://www.nikhef.nl/~janjust/dans/dans-dm-scripts.tar.gz here]. You can find documentation on these scripts on the [[DANS Job Scripts]] page.

Latest revision as of 11:22, 11 April 2013

DANS Data Management

One of the main goals of DANS is to do longer-term management of a variety of data. The goal of the DANS Data Management project is to allow DANS to reliably back up their data onto BiG Grid storage resources.

'Reliably' means that the consistency of the data can be verified by DANS engineers at any given time. For this, a set of tools and procedures have been developed to allow DANS engineers to

  • upload data from DANS to the grid (in 'tarball' format)
  • compress the data stored on the grid
  • verify the MD5 checksums of all data stored inside the tarballs on the grid

DANS Workflow phase 1: Uploading data

DANS Workflow1.png

Data can be uploaded from the 'DANS archief' using the Dans Data Upload procedure.
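
As an illustration of the 'tarball' format mentioned above, the following Python sketch packs a directory into a tarball and records an MD5 checksum for every file in an `md5sum`-style manifest. It is not the Dans Data Upload procedure itself: the function names, the manifest file, and the 'Crome' layout are assumptions made for this example, and the actual transfer to grid storage is not shown.

```python
import hashlib
import tarfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_with_manifest(source_dir, tar_path, manifest_path):
    """Pack source_dir into an uncompressed tarball and write a manifest.

    Hypothetical helper: each manifest line is '<md5>  <member name>',
    the layout that 'md5sum' prints. Returns the number of files packed.
    """
    source_dir = Path(source_dir).resolve()
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    lines = []
    with tarfile.open(tar_path, "w") as archive:
        for path in files:
            # Member names start with the archive directory, e.g. 'Crome/a.txt'.
            member_name = path.relative_to(source_dir.parent).as_posix()
            lines.append(f"{md5_of_file(path)}  {member_name}")
            archive.add(path, arcname=member_name)
    Path(manifest_path).write_text("".join(line + "\n" for line in lines))
    return len(files)
```

A call such as `pack_with_manifest("Crome", "Crome.tar", "Crome.md5")` would leave a tarball and a manifest side by side. Keeping the checksums taken before the upload is what makes a later verification meaningful.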

DANS Workflow phase 2: Compressing data

DANS Workflow2.png

After a successful upload, the data stored on the grid can be compressed using the Dans Data Compress procedure. This is done primarily to save disk space on the grid storage infrastructure. It can also improve data download speeds under certain circumstances.
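
The idea of compressing a tarball without putting its contents at risk can be sketched in Python as follows. This is only an illustration under assumptions: gzip is assumed as the compression format, the helper names are hypothetical, and the sketch works on local files, whereas the Dans Data Compress procedure operates on grid storage.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def md5_of_stream(handle, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of an open binary stream."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def compress_tarball(tar_path):
    """Write '<tar_path>.gz' and check that it decompresses to the same bytes.

    Hypothetical helper: the original tarball is left in place, so the
    caller decides when it is safe to delete it.
    """
    tar_path = Path(tar_path)
    gz_path = tar_path.with_name(tar_path.name + ".gz")
    with open(tar_path, "rb") as source, gzip.open(gz_path, "wb") as target:
        shutil.copyfileobj(source, target)

    # Round-trip check: the decompressed stream must match the original.
    with open(tar_path, "rb") as source:
        original_md5 = md5_of_stream(source)
    with gzip.open(gz_path, "rb") as compressed:
        roundtrip_md5 = md5_of_stream(compressed)
    if original_md5 != roundtrip_md5:
        gz_path.unlink()
        raise IOError(f"compressed copy of {tar_path} does not match the original")
    return gz_path
```

Verifying the round trip before removing the uncompressed copy means a failed compression never leaves the compressed file as the only copy.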

DANS Workflow phase 3: Verifying data

DANS Workflow3.png

Periodically the integrity of the data stored on "the grid" needs to be verified. For this, an extensive 'md5sum' verification procedure is available. Read more about it in the Dans Data Verify procedure.
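
The kind of check described here, comparing the MD5 checksum of every file inside a tarball against a previously recorded value, can be sketched in Python as follows. The function names and the dictionary of expected checksums are assumptions made for this example; the Dans Data Verify procedure and its scripts may work differently.

```python
import hashlib
import tarfile


def md5_of_members(tar_path, chunk_size=1 << 20):
    """Return {member name: MD5 hex digest} for every regular file in a tarball.

    Mode 'r:*' lets tarfile detect gzip or bzip2 compression transparently,
    so nothing has to be unpacked to disk.
    """
    checksums = {}
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links and other special entries
            digest = hashlib.md5()
            stream = archive.extractfile(member)
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                digest.update(chunk)
            checksums[member.name] = digest.hexdigest()
    return checksums


def verify_tarball(tar_path, expected):
    """Compare the tarball contents with expected {name: md5} checksums.

    Hypothetical helper: returns sorted lists of names that differ, that
    are missing from the tarball, or that were not expected at all.
    """
    actual = md5_of_members(tar_path)
    return {
        "mismatched": sorted(
            name for name in expected
            if name in actual and actual[name] != expected[name]
        ),
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
    }
```

A tarball passes when all three lists in the report are empty. Reporting missing and unexpected members separately from mismatches helps distinguish a corrupted file from an incomplete archive.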

DANS Job Scripts

A set of shell scripts was developed to automate the workflow phases above. The latest version of these scripts (including RHEL5 binaries of adler32sum and md5deep) can be downloaded from http://www.nikhef.nl/~janjust/dans/dans-dm-scripts.tar.gz. You can find documentation on these scripts on the DANS Job Scripts page.
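
The scripts bundle an adler32sum binary, which suggests that Adler-32 checksums are used alongside MD5; how the scripts actually use them is not described on this page. As a rough illustration of what such a tool computes, the following Python sketch derives the Adler-32 checksum of a file with the standard `zlib` module. The function name and the 8-digit hexadecimal output format are assumptions for this example.

```python
import zlib


def adler32_of_file(path, chunk_size=1 << 20):
    """Return the Adler-32 checksum of a file as 8 lowercase hex digits.

    The file is read in chunks, so large tarballs need not fit in memory.
    """
    value = 1  # Adler-32 starts from 1, which is also zlib's default
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")
```

Adler-32 is much cheaper to compute than MD5 but far weaker at detecting changes, so it is better suited to quick transfer checks than to long-term integrity verification.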