Difference between revisions of "Dans Data Compress"

From BiGGrid Wiki

Revision as of 13:21, 12 November 2012

After the tarballs have been uploaded to the grid, the next step is to compress the tarballs to so-called .tar.gz files to save space. This compression step is done on the grid itself, hence we need to submit a set of jobs to the grid. The 'compress-tar' script does this automatically.

$ ./compress-tar
Found 84 tar balls in /grid/dans/soundbites
Splitting into 6 jobs, start=1, end=84
Delegating proxy
Submitting DANS job 137: https://wms3.grid.sara.nl:9000/hZTcpbTVjlz5fk9jwZuaMQ
Submitting DANS job 138: https://wms3.grid.sara.nl:9000/KG3c7ajdkAirK-0DelH8FQ
Submitting DANS job 139: https://wms3.grid.sara.nl:9000/O2vAt-4-qUPpHlm47DqaOg
Submitting DANS job 140: https://wms3.grid.sara.nl:9000/GVZ1kWP_E1mCRVlISMg6WQ
Submitting DANS job 141: https://wms3.grid.sara.nl:9000/tn5EXd2wf_CbVhUyPQF2nw
Submitting DANS job 142: https://wms3.grid.sara.nl:9000/1l50LbjqZOTxREOi9JWopA
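
The "Splitting into 6 jobs, start=1, end=84" line can be reproduced with simple batch arithmetic. The sketch below assumes a batch size of at most 15 tarballs per job, the figure given for this archive; the function name `split_ranges` and its arguments are illustrative only, and how 'compress-tar' divides the work internally is not shown on this page.

```shell
#!/bin/sh
# Sketch only: print the start/end index of each batch of tarballs.
# Usage: split_ranges TOTAL PER_JOB   (hypothetical helper, not part of compress-tar)
split_ranges() {
    total=$1
    per_job=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + per_job - 1))
        # The last batch may hold fewer than per_job tarballs.
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "start=$start end=$end"
        start=$((end + 1))
    done
}

# 84 tarballs in batches of at most 15 gives 6 ranges.
split_ranges 84 15
```

With these numbers the last range is smaller than the others, which is why 84 tarballs need 6 jobs rather than 5.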

The 'soundbites' archive consists of 84 tarballs which need to be compressed. Each grid job compresses at most 15 tarballs, hence a total of 6 jobs were submitted (84 / 15, rounded up). After the jobs have been submitted to the grid, you can track their status using the 'job-status' script:

$ ./job-status
00137  https://wms3.grid.sara.nl:9000/hZTcpbTVjlz5fk9jwZuaMQ         Status=Running
00138  https://wms3.grid.sara.nl:9000/KG3c7ajdkAirK-0DelH8FQ         Status=Scheduled
00139  https://wms3.grid.sara.nl:9000/O2vAt-4-qUPpHlm47DqaOg         Status=Running
00140  https://wms3.grid.sara.nl:9000/GVZ1kWP_E1mCRVlISMg6WQ         Status=Scheduled
00141  https://wms3.grid.sara.nl:9000/tn5EXd2wf_CbVhUyPQF2nw         Status=Running
00142  https://wms3.grid.sara.nl:9000/1l50LbjqZOTxREOi9JWopA         Status=Scheduled

As you can see, the order in which the jobs are executed on the grid is not necessarily the same as the order in which they were submitted. See the section DANS Job Scripts for more details on both the 'compress-tar' and the 'job-status' scripts.
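
When many jobs are in flight, a per-state summary is easier to read than the full listing. The following is a minimal sketch that assumes the 'job-status' output format shown above, with the state in the last column as "Status=...". The helper name `count_states` is hypothetical and not part of the DANS job scripts.

```shell
#!/bin/sh
# Sketch only: summarise job-status output read from stdin.
# Prints one "<count> <state>" line per state, sorted by state name.
count_states() {
    awk '$NF ~ /^Status=/ {
             s = $NF
             sub(/^Status=/, "", s)   # keep only the state name
             n[s]++
         }
         END { for (s in n) print n[s], s }' | sort -k2
}

# Demonstration on the listing shown above.
count_states <<'EOF'
00137  https://wms3.grid.sara.nl:9000/hZTcpbTVjlz5fk9jwZuaMQ         Status=Running
00138  https://wms3.grid.sara.nl:9000/KG3c7ajdkAirK-0DelH8FQ         Status=Scheduled
00139  https://wms3.grid.sara.nl:9000/O2vAt-4-qUPpHlm47DqaOg         Status=Running
00140  https://wms3.grid.sara.nl:9000/GVZ1kWP_E1mCRVlISMg6WQ         Status=Scheduled
00141  https://wms3.grid.sara.nl:9000/tn5EXd2wf_CbVhUyPQF2nw         Status=Running
00142  https://wms3.grid.sara.nl:9000/1l50LbjqZOTxREOi9JWopA         Status=Scheduled
EOF
```

In practice the function would be fed from the script itself, for example `./job-status | count_states`.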

Important notes

  • The tarballs need to be compressed only once. This is done on the grid because it is much faster than compressing them on the DANS dataserver.
  • After the .tar.gz files have been created and verified (see "Phase 3" for more details), the original tarballs need to be deleted from the grid storage. This is not done automatically. A command line to delete all files ending in '.tar' from a single directory on the LFC is
    lfc-ls /grid/dans/$ARCHIVE | grep '\.tar$' > tarball-list
    lcg-del -a `cat tarball-list`
    Note: this command line has not been verified yet; check it before use.

The 'lcg-del' command can take quite some time to complete.
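
Because the deletion is irreversible, it can help to restrict the list to tarballs that already have a compressed counterpart in the same directory listing. The sketch below only filters a listing and deletes nothing. The helper name `compressed_tarballs` and the file names in the demonstration are hypothetical, and the presence of a .tar.gz file is not a substitute for the verification described in "Phase 3".

```shell
#!/bin/sh
# Sketch only: read a directory listing (one name per line) on stdin and
# print each NAME.tar for which NAME.tar.gz also appears in the listing.
compressed_tarballs() {
    awk '
        # Record NAME.tar for every NAME.tar.gz (strip the trailing ".gz").
        /\.tar\.gz$/ { gz[substr($0, 1, length($0) - 3)] = 1; next }
        # Record every plain tarball.
        /\.tar$/     { tar[$0] = 1 }
        END { for (t in tar) if (t in gz) print t }
    ' | sort
}

# Demonstration with made-up names: only part01.tar has a .tar.gz counterpart.
printf '%s\n' part01.tar part01.tar.gz part02.tar | compressed_tarballs
```

Assuming 'lfc-ls' prints one entry name per line, the filter could take the place of the grep step, for example `lfc-ls /grid/dans/$ARCHIVE | compressed_tarballs > tarball-list`, with the resulting list reviewed before anything is deleted.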