Dans Data Verify

From BiGGrid Wiki

Revision as of 14:44, 15 November 2012

In this step of the DANS workflow, the .tar.gz files are checked to verify that the md5sum checksums of all files contained in them match the checksums of the same files as found on the DANS data server. This verification is done on the grid itself, so a set of jobs has to be submitted to the grid; the 'check-tar' script does this automatically.
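As a rough illustration of what each grid job computes, the sketch below returns the md5 checksum of every regular file inside a .tar.gz archive without extracting anything to disk. It is a minimal sketch assuming Python's standard tarfile and hashlib modules; the real check-archive.sh script used by the DANS jobs is not shown on this page and may work differently, and the function name member_md5sums is hypothetical.

```python
import hashlib
import tarfile


def member_md5sums(archive_path):
    """Return {member name: md5 hex digest} for every regular file in a .tar.gz.

    Hypothetical helper: each member is streamed out of the archive in
    1 MiB chunks, so no files are written to disk.
    """
    sums = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links and other special entries
            digest = hashlib.md5()
            fileobj = tar.extractfile(member)
            for chunk in iter(lambda: fileobj.read(1 << 20), b""):
                digest.update(chunk)
            sums[member.name] = digest.hexdigest()
    return sums
```

Printing each entry as f"{digest}  {name}" gives lines in the same layout that md5sum uses, which makes the result easy to compare against checksums recorded elsewhere.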

$ ./check-tar
Found 84 tar.gz files in lfc.grid.sara.nl:/grid/dans/soundbites
Splitting into 11 jobs, start=1, end=84
Delegating proxy
Submitting DANS job 156: https://wms1.grid.sara.nl:9000/R5PWohEeStwR4u-3I-ojNA
Submitting DANS job 157: https://wms1.grid.sara.nl:9000/qcHwM00A7wmSWCUfcGhcnQ
Submitting DANS job 158: https://wms1.grid.sara.nl:9000/lxzCOTFcFsXlqXyC3IwYcQ
Submitting DANS job 159: https://wms1.grid.sara.nl:9000/k30pquu-p5mDnf2nW-UW5A
Submitting DANS job 160: https://wms1.grid.sara.nl:9000/ZmdEOuBMesm_3e-L9jBSVg
Submitting DANS job 161: https://wms1.grid.sara.nl:9000/9yNPzoZYU6ykx-8YcOb9PA
Submitting DANS job 162: https://wms1.grid.sara.nl:9000/V1DWV2qbmRe-CiRB5g06YA
Submitting DANS job 163: https://wms1.grid.sara.nl:9000/0GJ7YyfwHC1bTPzabG2LSw
Submitting DANS job 164: https://wms1.grid.sara.nl:9000/vKIqcsPzIt3ugtJTB8qa6Q
Submitting DANS job 165: https://wms1.grid.sara.nl:9000/QQmeZGZr3eOeagGvLfjh7g
Submitting DANS job 166: https://wms1.grid.sara.nl:9000/BSx6PWo3xIqsccVqDb3cPA
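The line 'Splitting into 11 jobs, start=1, end=84' can be reproduced with simple batching. The sketch below assumes a fixed batch size of 8 tarballs per job, as stated in the next paragraph; the resulting ranges match the arguments later passed to check-archive.sh (1 to 8, 9 to 16, ..., 81 to 84). It is a hypothetical illustration, not the check-tar source, and job_ranges is an illustrative name.

```python
def job_ranges(start, end, per_job=8):
    """Split the inclusive range start..end into consecutive batches of per_job."""
    ranges = []
    for first in range(start, end + 1, per_job):
        last = min(first + per_job - 1, end)  # the final batch may be shorter
        ranges.append((first, last))
    return ranges


ranges = job_ranges(1, 84)
print(len(ranges))            # 11
print(ranges[0], ranges[-1])  # (1, 8) (81, 84)
```

Equivalently, the number of jobs is the ceiling of 84 / 8, which is 11.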

The 'soundbites' archive consists of 84 .tar.gz files that need to be checked. Each grid job verifies 8 tarballs, so 11 jobs were submitted in total (84 / 8 = 10.5, rounded up to 11; the last job checks the remaining 4 tarballs). After the jobs have been submitted to the grid, you can track their status using the 'job-status' script:

$ ./job-status
00156: https://wms1.grid.sara.nl:9000/R5PWohEeStwR4u-3I-ojNA         Status=Running
00157: https://wms1.grid.sara.nl:9000/qcHwM00A7wmSWCUfcGhcnQ         Status=Running
00158: https://wms1.grid.sara.nl:9000/lxzCOTFcFsXlqXyC3IwYcQ         Status=Running
00159: https://wms1.grid.sara.nl:9000/k30pquu-p5mDnf2nW-UW5A         Status=Running
00160: https://wms1.grid.sara.nl:9000/ZmdEOuBMesm_3e-L9jBSVg         Status=Running
00161: https://wms1.grid.sara.nl:9000/9yNPzoZYU6ykx-8YcOb9PA         Status=Running
00162: https://wms1.grid.sara.nl:9000/V1DWV2qbmRe-CiRB5g06YA         Status=Running
00163: https://wms1.grid.sara.nl:9000/0GJ7YyfwHC1bTPzabG2LSw         Status=Running
00164: https://wms1.grid.sara.nl:9000/vKIqcsPzIt3ugtJTB8qa6Q         Status=Running
00165: https://wms1.grid.sara.nl:9000/QQmeZGZr3eOeagGvLfjh7g         Status=Running
00166: https://wms1.grid.sara.nl:9000/BSx6PWo3xIqsccVqDb3cPA         Status=Running

The grid job IDs, which start with https://, look like URLs, and that is exactly what they are. The user who submitted a job can view its status in a web browser, provided that the user's grid certificate is installed in that browser.
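Because a job ID is an ordinary URL, a standard URL parser can split it into the host and port of the service that accepted the job plus a unique token identifying the job. The sketch below uses Python's urllib.parse on one of the IDs above; the function name split_job_id is hypothetical, and the code only parses the string without contacting the grid.

```python
from urllib.parse import urlsplit


def split_job_id(job_id):
    """Split a grid job ID URL into (host, port, token)."""
    parts = urlsplit(job_id)
    return parts.hostname, parts.port, parts.path.lstrip("/")


print(split_job_id("https://wms1.grid.sara.nl:9000/R5PWohEeStwR4u-3I-ojNA"))
# ('wms1.grid.sara.nl', 9000, 'R5PWohEeStwR4u-3I-ojNA')
```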

See the section DANS Job Scripts for more details on both the 'check-tar' and the 'job-status' scripts.

Job output

When a grid job has finished, the 'job-status' script automatically retrieves its output:

00166: https://wms1.grid.sara.nl:9000/BSx6PWo3xIqsccVqDb3cPA         Status=Done (Exit code=0)
       Retrieving job output into $HOME/dans/gridjobs/00166/output

The status message 'Done (Exit code=0)' means that the job finished and returned exit code 0, which indicates success.

Wait for all jobs to complete successfully before continuing to the next step.
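To check this without reading every line by eye, the captured 'job-status' output can be scanned for jobs whose status is anything other than 'Done (Exit code=0)'. This is a minimal sketch assuming the output format shown above (job number, job ID, then 'Status=...'); the regular expression and the function name unfinished_jobs are assumptions, not part of the DANS scripts.

```python
import re

# Matches lines such as
# 00166: https://wms1.grid.sara.nl:9000/BSx6PWo3xIqsccVqDb3cPA   Status=Done (Exit code=0)
# The colon after the job number is optional.
STATUS_LINE = re.compile(r"^(\d+):?\s+(https://\S+)\s+Status=(.+?)\s*$")


def unfinished_jobs(job_status_output):
    """Return {job number: status} for every job that is not 'Done (Exit code=0)'."""
    pending = {}
    for line in job_status_output.splitlines():
        match = STATUS_LINE.match(line)
        if match is None:
            continue  # e.g. the indented 'Retrieving job output into ...' lines
        number, _job_id, status = match.groups()
        if status != "Done (Exit code=0)":
            pending[number] = status
    return pending
```

An empty result means every listed job has completed successfully and you can continue with the next step.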

Comparing the checksums

After all jobs submitted by the 'check-tar' script have completed successfully, you can compare the checksums of the files found on the grid against the checksums of the local files. The local checksums were calculated before the files were uploaded to the grid, as part of the Data Upload procedure.

$ ./compare-checksums 
Scanning for DANS 'soundbites' jobs:
00167: check-archive.sh "soundbites    1    8" Comparing md5sums: Equal
00169: check-archive.sh "soundbites    9   16" Comparing md5sums: Equal
00170: check-archive.sh "soundbites   17   24" Comparing md5sums: Equal
00171: check-archive.sh "soundbites   25   32" Comparing md5sums: Equal
00172: check-archive.sh "soundbites   33   40" Comparing md5sums: Equal
00173: check-archive.sh "soundbites   41   48" Comparing md5sums: Equal
00174: check-archive.sh "soundbites   49   56" Comparing md5sums: Equal
00175: check-archive.sh "soundbites   57   64" Comparing md5sums: Equal
00176: check-archive.sh "soundbites   65   72" Comparing md5sums: Equal
00177: check-archive.sh "soundbites   73   80" Comparing md5sums: Equal
00178: check-archive.sh "soundbites   81   84" Comparing md5sums: Equal
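Conceptually, each 'Equal' above comes from matching two checksum lists file by file. The sketch below assumes both lists are in md5sum's text format (a hex digest, whitespace, then the file name); how 'compare-checksums' actually locates the local and grid lists is not described on this page, so the function names and inputs are hypothetical.

```python
def parse_md5sum_listing(text):
    """Parse md5sum-style lines ('<hex digest>  <name>') into {name: digest}."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        digest, name = line.split(None, 1)
        # md5sum prefixes the name with '*' when reading in binary mode
        sums[name.lstrip("*")] = digest.lower()
    return sums


def compare_listings(local_text, grid_text):
    """Return 'Equal' if both listings hold the same files with the same digests."""
    local = parse_md5sum_listing(local_text)
    grid = parse_md5sum_listing(grid_text)
    if local == grid:
        return "Equal"
    missing = sorted(local.keys() - grid.keys())
    extra = sorted(grid.keys() - local.keys())
    changed = sorted(n for n in local.keys() & grid.keys() if local[n] != grid[n])
    return f"Different: missing={missing} extra={extra} changed={changed}"
```

Reporting missing, extra, and changed files separately makes it clearer whether a mismatch comes from a corrupted file or from an incomplete upload.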