DANS Data Compress
After the tarballs have been uploaded to the grid, the next step is to compress them into so-called .tar.gz files to save storage space. The compression is done on the grid itself, so a set of jobs has to be submitted to the grid. The 'compress-tar' script does this automatically.
$ ./compress-tar
Found 84 tar balls in /grid/dans/soundbites
Splitting into 6 jobs, start=1, end=84
Delegating proxy
Submitting DANS job 149: https://wms1.grid.sara.nl:9000/5To794h9GaRL-mPH9E7TpQ
Submitting DANS job 150: https://wms1.grid.sara.nl:9000/W9YklyQ6MFeKsvzeHBVZXg
Submitting DANS job 151: https://wms1.grid.sara.nl:9000/bmcj7Ja548EAHT4NmrNnPg
Submitting DANS job 152: https://wms1.grid.sara.nl:9000/O0BB1AQuQQ8llmYEmAdIZQ
Submitting DANS job 153: https://wms1.grid.sara.nl:9000/UdxeMCitwmPKKhoVXcB_ug
Submitting DANS job 154: https://wms1.grid.sara.nl:9000/LlxU9gTVTMvxi2sSjsqr9g
The 'soundbites' archive consists of 84 tarballs that need to be compressed. Each grid job compresses at most 15 tarballs, so a total of 6 jobs was submitted (the last job handles the remaining 9). Once the jobs have been submitted, you can track their status with the 'job-status' script:
$ ./job-status
00149 https://wms1.grid.sara.nl:9000/5To794h9GaRL-mPH9E7TpQ Status=Running
00150 https://wms1.grid.sara.nl:9000/W9YklyQ6MFeKsvzeHBVZXg Status=Running
00151 https://wms1.grid.sara.nl:9000/bmcj7Ja548EAHT4NmrNnPg Status=Running
00152 https://wms1.grid.sara.nl:9000/O0BB1AQuQQ8llmYEmAdIZQ Status=Running
00153 https://wms1.grid.sara.nl:9000/UdxeMCitwmPKKhoVXcB_ug Status=Running
00154 https://wms1.grid.sara.nl:9000/LlxU9gTVTMvxi2sSjsqr9g Status=Running
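The job count follows from simple integer arithmetic: with at most 15 tarballs per job, 84 tarballs need ceil(84/15) = 6 jobs. The sketch below reproduces that split in bash. It is a hypothetical helper for illustration only, not the code inside 'compress-tar', and it assumes consecutively numbered tarballs as in the 'start=1, end=84' line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of compress-tar): print one "first last"
# tarball range per grid job, given the first and last tarball number and
# the maximum number of tarballs per job.
split_jobs() {
    local start=$1 end=$2 per_job=$3
    local first last
    for (( first = start; first <= end; first += per_job )); do
        last=$(( first + per_job - 1 ))
        # The final job only gets the tarballs that are left over.
        (( last > end )) && last=$end
        echo "$first $last"
    done
}

# Number of jobs = ceil(tarballs / per_job), using integer arithmetic only.
count_jobs() {
    local n=$1 per_job=$2
    echo $(( (n + per_job - 1) / per_job ))
}

split_jobs 1 84 15
count_jobs 84 15
```

Running it prints the ranges 1 15, 16 30, 31 45, 46 60, 61 75 and 76 84, followed by 6. The last range matches the '[soundbites 76 84]' range in the log of job 154 shown under Job output.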
Notes
- The order in which the jobs are executed on the grid is not necessarily the same as the order in which they were submitted.
- The grid job IDs starting with https:// look like URLs, and that is exactly what they are. The user who submitted a job can view its status in a web browser, provided that the user's grid certificate is installed in that browser.
See the section DANS Job Scripts for more details on both the 'compress-tar' and the 'job-status' scripts.
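To see at a glance whether all jobs have finished, the 'job-status' output can be piped through a small filter. The two functions below are a hypothetical sketch, not part of the DANS job scripts. They assume the one-line-per-job format shown above ('Status=Running', 'Status=Done (Exit code=0)'); any other state is simply treated as not (yet) successful.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count jobs per state in 'job-status' output on stdin.
# Assumes each job is reported on one line containing "Status=<state>";
# other lines (e.g. "Retrieving job output into ...") are ignored.
summarise_status() {
    grep -o 'Status=.*' | sed 's/^Status=//' | sort | uniq -c
}

# Succeeds (exit status 0) only if at least one job is listed and every
# listed job reports "Done (Exit code=0)".
all_jobs_succeeded() {
    local lines total ok
    lines=$(grep 'Status=')
    total=$(printf '%s\n' "$lines" | grep -c 'Status=')
    ok=$(printf '%s\n' "$lines" | grep -cF 'Status=Done (Exit code=0)')
    [ "$total" -gt 0 ] && [ "$ok" -eq "$total" ]
}
```

For example, '$ ./job-status | summarise_status' prints one count per state, and '$ ./job-status | all_jobs_succeeded && echo "all jobs done"' prints the message only when every job has finished with exit code 0.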
Job output
When a grid job has finished, the 'job-status' script automatically retrieves its output:
00154 https://wms1.grid.sara.nl:9000/LlxU9gTVTMvxi2sSjsqr9g Status=Done (Exit code=0)
Retrieving job output into $HOME/dans/gridjobs/00054/output
The status message 'Done (Exit code=0)' means that the job ran to completion and returned exit code 0, which indicates success. The directory '$HOME/dans/gridjobs/00054/output' then contains two files: an empty file 'stderr' and the job's output file 'stdout':
2012/11/13-14:47:45 Job start: [soundbites 76 84]
Retrieving file lfn://grid/dans/soundbites/soundbites-0076.tar
Storing file lfn://grid/dans/soundbites/soundbites-0076.tar.gz guid:f5c00083-554a-48b5-b4c5-645b68b75402
[...]
Retrieving file lfn://grid/dans/soundbites/soundbites-0084.tar
Storing file lfn://grid/dans/soundbites/soundbites-0084.tar.gz guid:3d90fd6c-0db5-4e65-b769-1aefaccac9dc
2012/11/13-16:21:48 Job end
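A quick sanity check on such a log is to compare the tarball range in the 'Job start' line with the number of 'Storing file' lines, and to make sure the 'Job end' line is present. The function below is a hypothetical sketch that assumes exactly the log format above, with one 'Storing file' line per tarball. It does not replace the verification of the .tar.gz files described in "Phase 3".

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a job's 'stdout' log. Assumes a
# "Job start: [<archive> <first> <last>]" line, one "Storing file" line
# per compressed tarball and a final "Job end" line.
check_job_log() {
    local log=$1 range first last expected stored
    # Extract "<first> <last>" from the "Job start" line.
    range=$(sed -n 's/.*Job start: \[[^ ]* \([0-9]*\) \([0-9]*\)].*/\1 \2/p' "$log")
    read -r first last <<< "$range"
    if [ -z "$first" ] || [ -z "$last" ]; then
        echo "no 'Job start' line found in $log" >&2
        return 1
    fi
    # 10# forces base 10, so numbers with leading zeros are not read as octal.
    expected=$(( 10#$last - 10#$first + 1 ))
    stored=$(grep -c 'Storing file' "$log")
    if ! grep -q 'Job end' "$log"; then
        echo "job did not reach 'Job end'" >&2
        return 1
    fi
    if [ "$stored" -ne "$expected" ]; then
        echo "stored $stored of $expected tarballs" >&2
        return 1
    fi
    echo "OK: $stored tarballs stored ($first-$last)"
}
```

For the job above this would be run as 'check_job_log "$HOME/dans/gridjobs/00054/output/stdout"', which should report 9 stored tarballs for the range 76-84.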
Important notes
- The tarballs need to be compressed only once. This is done on the grid rather than on the DANS dataserver because it is much faster that way.
- After the .tar.gz files have been created and verified (see "Phase 3" for more details), the original tarballs must be deleted from grid storage. This is not done automatically. An effective set of commands to delete all files whose names end in '.tar' from a single LFC directory is:
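$ lfcpath=/grid/dans/$ARCHIVE
$ lfc-ls "$lfcpath" | grep '\.tar$' | sed "s|^|lfn:$lfcpath/|" > tarball-list
$ lcg-del -a -f tarball-list
Note the use of the '|' character as the separator in the 'sed' command instead of the usual '/' character: this way there is no need to escape the slashes in the LFC path variable. The 'sed' expression is enclosed in double quotes so that the shell expands $lfcpath (inside single quotes it would be inserted literally), and each line of 'tarball-list' then holds a full logical file name such as lfn:/grid/dans/soundbites/soundbites-0076.tar.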
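Because a tarball should only be deleted once its compressed copy exists, the deletion list can also be built more defensively. The function below is a hypothetical sketch: it reads an 'lfc-ls' style listing (one file name per line) and keeps a .tar file only if the matching .tar.gz is listed as well, writing each kept entry as an 'lfn:' logical file name. It only checks that the .tar.gz exists, not that it has been verified.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from a listing on stdin (one file name per line),
# print "lfn:<dir>/<name>" for every .tar file whose .tar.gz counterpart is
# also listed. Tarballs without a compressed copy are reported on stderr
# and left out of the list.
safe_tarball_list() {
    local dir=$1 listing name
    listing=$(cat)
    while IFS= read -r name; do
        # Only consider names ending in ".tar" (skips the .tar.gz files).
        case $name in
            *.tar) ;;
            *) continue ;;
        esac
        if printf '%s\n' "$listing" | grep -qxF "$name.gz"; then
            printf 'lfn:%s/%s\n' "$dir" "$name"
        else
            echo "skipping $name: no $name.gz found" >&2
        fi
    done <<< "$listing"
}
```

It would be used in place of the 'grep' and 'sed' steps, e.g. '$ lfc-ls "$lfcpath" | safe_tarball_list "$lfcpath" > tarball-list', after which the list can be inspected before anything is deleted.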
The 'lcg-del' command can take quite some time to complete.