Dans Data Verify
Latest revision as of 09:06, 28 November 2012
In this step of the DANS workflow the .tar.gz files are checked, to verify that the MD5 checksums of all files contained in the .tar.gz files match the checksums of the files as found on the DANS data server. This verification step is done on the grid itself, so a set of jobs must be submitted to the grid. The 'check-tar' script does this automatically. Before starting either the 'compress-tar' script or the 'check-tar' script for the first time, a special gridjobs directory needs to be created. The default location for this directory is
$HOME/dans/gridjobs
so running
$ mkdir -p $HOME/dans/gridjobs
is sufficient. Next, launch the 'check-tar' script:
$ ./check-tar
Found 84 tar.gz files in lfc.grid.sara.nl:/grid/dans/soundbites
Splitting into 11 jobs, start=1, end=84
Delegating proxy
00167 Submitting DANS job 67: https://graskant.nikhef.nl:9000/PJFSZ_piRH8WBnVUXUyVAQ
00168 Submitting DANS job 68: https://grasveld.nikhef.nl:9000/rm19k1kWjtXGaGEjcl6AYA
00169 Submitting DANS job 69: https://wms2.grid.sara.nl:9000/dN_QM8ZxIS3JCqa63QmILw
00170 Submitting DANS job 70: https://wms2.grid.sara.nl:9000/n_qLv1K_CNC26JQdarqd-A
00171 Submitting DANS job 71: https://wms2.grid.sara.nl:9000/bS-C2WTuj1nFw7xpSt2hXw
00172 Submitting DANS job 72: https://wms2.grid.sara.nl:9000/Q0cBRixMFsrFgpLyZtK_4g
00173 Submitting DANS job 73: https://wms2.grid.sara.nl:9000/ooUaSiBLargL74brqKeS7w
00174 Submitting DANS job 74: https://wms2.grid.sara.nl:9000/ux8PPNk71_Utf53a8x-_Yg
00175 Submitting DANS job 75: https://wms2.grid.sara.nl:9000/zqBI1QXfOMBKEP7sWDD-aA
00176 Submitting DANS job 76: https://wms2.grid.sara.nl:9000/OtU_naI5EYBilqML81krsA
00177 Submitting DANS job 77: https://wms2.grid.sara.nl:9000/fl05LVqrQS9JzkXsn73R4Q
00178 Submitting DANS job 78: https://wms2.grid.sara.nl:9000/XVlE9d-ufjWwiZc2EyFsDA
The 'soundbites' archive consists of 84 .tar.gz files which need to be checked. Each grid job verifies up to 8 tarballs, hence a total of 11 jobs were submitted.
After the jobs have been submitted to the grid, you can track their status using the 'job-status' script:
$ ./job-status
00167 https://graskant.nikhef.nl:9000/PJFSZ_piRH8WBnVUXUyVAQ Status=Running
00168 https://grasveld.nikhef.nl:9000/rm19k1kWjtXGaGEjcl6AYA Status=Running
00169 https://wms2.grid.sara.nl:9000/dN_QM8ZxIS3JCqa63QmILw Status=Running
00170 https://wms2.grid.sara.nl:9000/n_qLv1K_CNC26JQdarqd-A Status=Running
00171 https://wms2.grid.sara.nl:9000/bS-C2WTuj1nFw7xpSt2hXw Status=Running
00172 https://wms2.grid.sara.nl:9000/Q0cBRixMFsrFgpLyZtK_4g Status=Running
00173 https://wms2.grid.sara.nl:9000/ooUaSiBLargL74brqKeS7w Status=Running
00174 https://wms2.grid.sara.nl:9000/ux8PPNk71_Utf53a8x-_Yg Status=Running
00175 https://wms2.grid.sara.nl:9000/zqBI1QXfOMBKEP7sWDD-aA Status=Running
00176 https://wms2.grid.sara.nl:9000/OtU_naI5EYBilqML81krsA Status=Running
00177 https://wms2.grid.sara.nl:9000/fl05LVqrQS9JzkXsn73R4Q Status=Running
00178 https://wms2.grid.sara.nl:9000/XVlE9d-ufjWwiZc2EyFsDA Status=Running
The grid job IDs, starting with https://, look like URLs and that's exactly what they are. The user who submitted a job can view its status in a web browser, provided that the user's grid certificate is installed in that browser.
See the section DANS Job Scripts for more details on both the 'check-tar' and the 'job-status' scripts.
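The split of 84 tarballs into jobs of at most 8 tarballs each, as described above, can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code of the 'check-tar' script; the function name split_ranges and its parameters are hypothetical.

```python
def split_ranges(total, per_job, start=1):
    """Return inclusive (first, last) index pairs covering start..total.

    Hypothetical helper: it only illustrates the arithmetic, it is not
    taken from the check-tar script.
    """
    ranges = []
    first = start
    while first <= total:
        # The last job may receive fewer than per_job tarballs.
        last = min(first + per_job - 1, total)
        ranges.append((first, last))
        first = last + 1
    return ranges


ranges = split_ranges(total=84, per_job=8)
print(len(ranges))             # 11
print(ranges[0], ranges[-1])   # (1, 8) (81, 84)
```

With 84 tarballs and 8 per job, ten jobs receive a full set of 8 and the eleventh receives the remaining 4.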
Job output
When a grid job is finished the 'job-status' script automatically retrieves the output:
00172 https://wms2.grid.sara.nl:9000/Q0cBRixMFsrFgpLyZtK_4g Status=Done (Exit code=0)
Retrieving job output into $HOME/dans/gridjobs/00172/output
The status message 'Done (Exit code=0)' means that the job has finished and returned exit code 0, which indicates success.
Wait for all jobs to complete successfully before continuing to the next step.
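To decide whether every job has finished, the 'job-status' output can be scanned for lines that do not yet report 'Done (Exit code=0)'. The following is a minimal sketch that assumes the line layout shown in the examples on this page (job number, job URL, Status=...); the names parse_status and unfinished are hypothetical and not part of the DANS scripts.

```python
import re

# Assumed line layout, taken from the examples on this page:
#   <job number> <https job URL> Status=<status text>
LINE_RE = re.compile(r"^(\d+)\s+(https://\S+)\s+Status=(.+)$")


def parse_status(text):
    """Map job number -> (job URL, status text); other lines are ignored."""
    jobs = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            number, url, status = match.groups()
            jobs[number] = (url, status.strip())
    return jobs


def unfinished(jobs):
    """Return the job numbers that have not reported a successful exit."""
    return sorted(n for n, (_, status) in jobs.items()
                  if status != "Done (Exit code=0)")


sample = """\
00171 https://wms2.grid.sara.nl:9000/bS-C2WTuj1nFw7xpSt2hXw Status=Running
00172 https://wms2.grid.sara.nl:9000/Q0cBRixMFsrFgpLyZtK_4g Status=Done (Exit code=0)
Retrieving job output into $HOME/dans/gridjobs/00172/output
"""
print(unfinished(parse_status(sample)))   # ['00171']
```

An empty list means that all listed jobs are done and the next step can be started.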
Comparing the checksums
After all jobs submitted by the 'check-tar' script have successfully completed you can compare the checksums of the files found on the grid against the checksums of the local files. The local checksums were calculated before the files were uploaded to the grid, as part of the Data Upload procedure.
$ ./compare-checksums
Scanning for DANS 'soundbites' jobs:
00167: check-archive.sh "soundbites 1 8" Comparing md5sums: Equal
00169: check-archive.sh "soundbites 9 16" Comparing md5sums: Equal
00170: check-archive.sh "soundbites 17 24" Comparing md5sums: Equal
00171: check-archive.sh "soundbites 25 32" Comparing md5sums: Equal
00172: check-archive.sh "soundbites 33 40" Comparing md5sums: Equal
00173: check-archive.sh "soundbites 41 48" Comparing md5sums: Equal
00174: check-archive.sh "soundbites 49 56" Comparing md5sums: Equal
00175: check-archive.sh "soundbites 57 64" Comparing md5sums: Equal
00176: check-archive.sh "soundbites 65 72" Comparing md5sums: Equal
00177: check-archive.sh "soundbites 73 80" Comparing md5sums: Equal
00178: check-archive.sh "soundbites 81 84" Comparing md5sums: Equal
Found 11 DANS jobs
This output shows that the MD5 checksums for all files found in the 'soundbites' archive on the grid are equal to the checksums that were generated when this archive was uploaded for the first time.
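The idea behind the comparison can be illustrated with a small Python sketch: compute the MD5 checksum of every file inside a .tar.gz archive and compare the result with a reference list. This is not the implementation of 'check-archive.sh' or 'compare-checksums'; the function names, the in-memory demo archive and the file name clip001.wav are all assumptions made for the example.

```python
import hashlib
import io
import tarfile


def md5_of_members(fileobj):
    """Map each regular file in a .tar.gz stream to its MD5 hex digest."""
    sums = {}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            digest = hashlib.md5()
            extracted = tar.extractfile(member)
            # Read in 1 MiB chunks so large files do not fill memory.
            for chunk in iter(lambda: extracted.read(1 << 20), b""):
                digest.update(chunk)
            sums[member.name] = digest.hexdigest()
    return sums


def compare(reference, found):
    """Return the names whose checksum is missing or differs on either side."""
    names = set(reference) | set(found)
    return sorted(n for n in names if reference.get(n) != found.get(n))


# Demo with a tiny in-memory archive (hypothetical file name and content).
payload = b"hello grid\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="clip001.wav")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

reference = {"clip001.wav": hashlib.md5(payload).hexdigest()}
found = md5_of_members(buf)
print("Equal" if not compare(reference, found) else "Different")
```

In this sketch a file that is present on only one side counts as a mismatch, so a missing file is reported just like a corrupted one.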