Difference between revisions of "Dans Data Compress"
Revision as of 16:21, 9 November 2012
After the tarballs have been uploaded to the grid, the next step is to compress them into .tar.gz files to save space. The compression runs on the grid itself, so a set of jobs has to be submitted to the grid. The 'compress-tar' script does this automatically.
$ ./compress-tar
<INSERT SAMPLE OUTPUT HERE>
After the jobs have been submitted to the grid you can track the status of these jobs using the 'job-status' script.
See the section DANS Job Scripts for more details on both the 'compress-tar' and the 'job-status' scripts.
Important notes
- The tarballs need to be compressed only once. Compression is done on the grid because that is much faster than doing it on the DANS dataserver.
- After the .tar.gz files have been created and verified (see "Phase 3" for more details), the original tarballs need to be deleted from the grid storage. This is not done automatically. An effective command line to delete all files whose names end in '.tar' from a single directory on the LFC is:
lfc-ls /grid/dans/$ARCHIVE | grep "\.tar$" > tarball-list
lcg-del -a `cat tarball-list`
CHECK!
The 'lcg-del' command can take quite some time to complete.
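Because the deletion is irreversible and the command line above is still marked as unchecked, it may help to preview what would be deleted before running anything. The following is a minimal dry-run sketch, not a verified procedure: it only prints one delete command per uncompressed tarball and does not call 'lcg-del' itself. The function name 'list_tar_deletes' and the directory '/grid/dans/example' are hypothetical, and the sketch assumes that the listing contains one bare file name per line and that each file is addressed by a logical file name of the form lfn:<directory>/<name>.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one delete command per uncompressed tarball.
# ASSUMPTION: the listing on stdin holds one bare file name per line.
# ASSUMPTION: files are addressed as lfn:<directory>/<name>; adjust if your setup differs.

# list_tar_deletes is a hypothetical helper, not part of the DANS scripts.
list_tar_deletes() {
    dir="$1"
    # Keep only names ending in a literal ".tar"; ".tar.gz" files are skipped.
    grep '\.tar$' | while IFS= read -r name; do
        printf 'lcg-del -a lfn:%s/%s\n' "$dir" "$name"
    done
}

# Example with a made-up listing instead of real 'lfc-ls' output.
printf '%s\n' a.tar a.tar.gz b.tar notes.txt | list_tar_deletes /grid/dans/example
```

On the grid, the made-up listing would be replaced by the output of 'lfc-ls' for the archive directory. Printing the commands first makes it possible to confirm that no .tar.gz file appears in the list before anything is removed.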