Dans Data Compress
Revision as of 09:32, 13 November 2012
After the tarballs have been uploaded to the grid, the next step is to compress the tarballs to so-called .tar.gz files to save space. This compression step is done on the grid itself, hence we need to submit a set of jobs to the grid. The 'compress-tar' script does this automatically.
 $ ./compress-tar
 Found 84 tar balls in /grid/dans/soundbites
 Splitting into 6 jobs, start=1, end=84
 Delegating proxy
 Submitting DANS job 143: https://wms1.grid.sara.nl:9000/pMAeF4iD5K2lb-pVCnJ3cg
 Submitting DANS job 144: https://wms1.grid.sara.nl:9000/Ti3gKeBMdsyA8sbTkSf-Ww
 Submitting DANS job 145: https://wms1.grid.sara.nl:9000/EFtjzbMx9exvZqb8nJRLSw
 Submitting DANS job 146: https://wms1.grid.sara.nl:9000/qffuMRJbGLhasmTvMBJ4hA
 Submitting DANS job 147: https://wms1.grid.sara.nl:9000/1J3mYfFmyewSvmZiz9r9-w
 Submitting DANS job 148: https://wms1.grid.sara.nl:9000/pTC_FRSp-qGQXJGumA7QgQ
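The "Splitting into 6 jobs" line can be reproduced with a little shell arithmetic. The sketch below assumes a batch size of 15 tarballs per job and consecutive, 1-based ranges. The function names <tt>jobs_needed</tt> and <tt>job_ranges</tt> are hypothetical and are not taken from the 'compress-tar' script, which may split the work differently.

```shell
# Hypothetical helpers, not part of 'compress-tar'.

# Number of jobs needed for TOTAL tarballs at PER_JOB tarballs per job
# (integer ceiling division).
jobs_needed() {
    total=$1
    per_job=$2
    echo $(( (total + per_job - 1) / per_job ))
}

# Print one "start-end" range per job, assuming consecutive 1-based chunks.
job_ranges() {
    total=$1
    per_job=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$(( start + per_job - 1 ))
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "$start-$end"
        start=$(( end + 1 ))
    done
}

jobs_needed 84 15
job_ranges 84 15
```

Under these assumptions the last job handles fewer tarballs than the others, because 84 is not a multiple of 15.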
The 'soundbites' archive consists of 84 tarballs which need to be compressed. Each grid job compresses at most 15 tarballs, hence a total of 6 jobs were submitted. After the jobs have been submitted to the grid you can track their status using the 'job-status' script:
 $ ./job-status
 00143 https://wms1.grid.sara.nl:9000/pMAeF4iD5K2lb-pVCnJ3cg Status=Running
 00144 https://wms1.grid.sara.nl:9000/Ti3gKeBMdsyA8sbTkSf-Ww Status=Running
 00145 https://wms1.grid.sara.nl:9000/EFtjzbMx9exvZqb8nJRLSw Status=Running
 00146 https://wms1.grid.sara.nl:9000/qffuMRJbGLhasmTvMBJ4hA Status=Running
 00147 https://wms1.grid.sara.nl:9000/1J3mYfFmyewSvmZiz9r9-w Status=Running
 00148 https://wms1.grid.sara.nl:9000/pTC_FRSp-qGQXJGumA7QgQ Status=Running
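With many jobs it can be handy to condense this listing into one line per status. The following is a minimal sketch that assumes every line of 'job-status' output ends in <tt>Status=&lt;value&gt;</tt>, as in the example above. The function name <tt>summarize_status</tt> is hypothetical, and status values other than "Running" are not shown on this page.

```shell
# Hypothetical helper: read 'job-status' output on stdin and print
# "<status>: <count>" for each distinct status, sorted by status name.
summarize_status() {
    awk -F'Status=' '
        NF > 1 {
            sub(/[ \t]+$/, "", $2)   # drop trailing whitespace, if any
            count[$2]++
        }
        END {
            for (s in count) print s ": " count[s]
        }' | sort
}
```

It would be used as <tt>./job-status | summarize_status</tt>.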
Notes
- The order in which the jobs are executed on the grid is not necessarily the same as the order in which they were submitted.
- The grid job IDs, starting with https://, look like URLs, and that is exactly what they are. The user who submitted a job can view its status in a web browser, provided that the user's grid certificate is installed in that browser.
See the section DANS Job Scripts for more details on both the 'compress-tar' and the 'job-status' scripts.
Important notes
- The tarballs need to be compressed only once. Compression is done on the grid because that is much faster than doing it on the DANS dataserver.
- After the .tar.gz files have been created and verified (see "Phase 3" for more details), the original tarballs need to be deleted from the grid storage. This is not done automatically. A command line to delete all files ending in '.tar' from a single directory on the LFC is
 lfc-ls /grid/dans/$ARCHIVE | grep '\.tar$' > tarball-list
 lcg-del -a `cat tarball-list`   # CHECK!
The 'lcg-del' command can take quite some time to complete.
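Because deletion is irreversible, it may help to do a dry run first. The sketch below only filters a directory listing: it prints each name ending in '.tar' for which a matching '.tar.gz' name appears in the same listing. The function name <tt>deletable_tarballs</tt> is hypothetical. The presence of a '.tar.gz' entry is not a substitute for the verification described in "Phase 3", and nothing is deleted by this code.

```shell
# Hypothetical helper: read file names on stdin (one per line) and print
# the '.tar' names that have a '.tar.gz' counterpart in the same listing.
deletable_tarballs() {
    awk '
        { seen[$0] = 1 }                 # remember every name in the listing
        /\.tar$/ { tars[++n] = $0 }      # keep uncompressed tarballs in order
        END {
            for (i = 1; i <= n; i++)
                if ((tars[i] ".gz") in seen) print tars[i]
        }'
}
```

For example, <tt>lfc-ls /grid/dans/$ARCHIVE | deletable_tarballs</tt> would list the candidates for review before any 'lcg-del' command is run.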