Dans Data Upload
Revision as of 11:52, 10 May 2012 by Janjust@nikhef.nl
How to upload a DANS archive to the grid
Note: as a convention, commands that need to be typed in are preceded by a UNIX prompt sign '$'. Output is shown without a preceding prompt sign.
- create a new directory with the name of the archive. As an example we use the 'Crome' archive. We refer to the name of the archive using the environment variable '${ARCHIVE}':
$ export ARCHIVE=Crome
$ mkdir -p ~/dans/${ARCHIVE}
- In this directory, create another directory with the same name; this directory will contain the files and directories that need to be uploaded:
$ cd ~/dans/${ARCHIVE}
$ mkdir ${ARCHIVE}
- copy over the scripts from the repository
$ cp -a ~/dans/scripts/* .
- generate a sorted list of files. Note: All further actions are done based on this list!
$ find -L ${ARCHIVE} -type f | sort > ${ARCHIVE}-files.txt
- check the list of files and remove any unwanted entries, such as '.Trash' folders, if desired.
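Removing unwanted entries can be done with a small filter over the list. The sketch below is a hypothetical helper (`filter_file_list` is not one of the DANS scripts); it assumes GNU grep and drops every line matching an extended regular expression. Because `grep -v` keeps the input order, the list stays sorted.

```shell
# Hypothetical helper, not part of the DANS scripts: remove every entry
# matching an extended regular expression from the file list, in place.
# grep -v preserves the original order, so the list remains sorted.
filter_file_list() {
    local list=$1 pattern=$2 tmp status
    tmp=$(mktemp) || return 1
    grep -Ev -- "$pattern" "$list" > "$tmp"
    status=$?
    # grep exits 1 when no lines remain, which is not an error here
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        return 1
    fi
    echo "Removing $(( $(wc -l < "$list") - $(wc -l < "$tmp") )) entries"
    # Overwrite via cat so the list file keeps its original permissions
    cat "$tmp" > "$list" && rm -f "$tmp"
}
```

For example, `filter_file_list ${ARCHIVE}-files.txt '/\.Trash[^/]*/'` removes everything below '.Trash' or '.Trash-<uid>' directories. Running `grep -E '/\.Trash[^/]*/' ${ARCHIVE}-files.txt` first shows what would be removed.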
- generate a set of tarball list files. Each '.tar.lst' file contains a subset of the entries in the ${ARCHIVE}-files.txt file that, when tarred up into a single .tar file, is roughly 8 GB in size. The output files are named '${ARCHIVE}-<N>.tar.lst', where <N> is a 4-digit counter starting at 1:
$ ./gen-tar-list ${ARCHIVE}-files.txt
Crome-0001.tar.lst
Crome-0002.tar.lst
...
Crome-0072.tar.lst
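The splitting idea can be illustrated with a greedy pass over the sorted list: files are appended to the current chunk until the next file would push it past the size limit. This is a hypothetical sketch (`split_file_list` is an illustrative name), not the actual gen-tar-list script, which may work differently. It assumes GNU `stat` and ignores the per-file tar header and padding overhead, so chunk sizes are only approximate.

```shell
# Hypothetical sketch of the chunking idea behind gen-tar-list; the real
# script may differ. Writes <prefix>-0001.tar.lst, <prefix>-0002.tar.lst, ...
# and prints each file name. Default limit: 8 GB (8 * 10^9 bytes).
split_file_list() {
    local list=$1 prefix=$2 limit=${3:-8000000000}
    local n=1 total=0 size f out
    out=$(printf '%s-%04d.tar.lst' "$prefix" "$n")
    : > "$out"
    while IFS= read -r f; do
        # -L follows symbolic links, consistent with 'find -L'
        size=$(stat -L -c %s -- "$f") || return 1
        # Start a new chunk if this file would exceed the limit
        if [ "$total" -gt 0 ] && [ $((total + size)) -gt "$limit" ]; then
            echo "$out"
            n=$((n + 1))
            total=0
            out=$(printf '%s-%04d.tar.lst' "$prefix" "$n")
            : > "$out"
        fi
        printf '%s\n' "$f" >> "$out"
        total=$((total + size))
    done < "$list"
    echo "$out"
}
```

Called as `split_file_list ${ARCHIVE}-files.txt ${ARCHIVE}`, it produces output in the same shape as the listing above. A single file larger than the limit ends up alone in its own chunk.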
- Now the final big step: run the 'upload-tar' script, which will
  - generate the tarballs
  - generate md5 checksums for all files in each tarball
  - upload each tarball to the grid.
This script will take a long time to run, depending on how many tarballs there are.
Note: For this step a valid grid proxy is required!
$ ./upload-tar
Checksumming tarball contents
Generating ${ARCHIVE}-0001.tar
Uploading ${ARCHIVE}-0001.tar
guid:86f11fb4-9b57-4fae-a787-8019663e248c
Moving ${ARCHIVE}-0001.tar.lst and ${ARCHIVE}-0001.tar.md5sum to directory "done"
Checksumming tarball contents
Generating ${ARCHIVE}-0002.tar
Uploading ${ARCHIVE}-0002.tar
...
Generating ${ARCHIVE}-0072.tar
Uploading ${ARCHIVE}-0072.tar
guid:e6781de9-9709-4bf6-a019-b5fa4a1fb3c8
Moving ${ARCHIVE}-0072.tar.lst and ${ARCHIVE}-0072.tar.md5sum to directory "done"
For each tarball that is processed successfully, the '${ARCHIVE}-<N>.tar.lst' file (together with its '.tar.md5sum' file) is moved to a separate directory 'done'. This way the 'upload-tar' script can be stopped and restarted at will: it continues processing the remaining '${ARCHIVE}-<N>.tar.lst' files until all of them have been moved to the 'done' directory.
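The restartable behaviour follows from treating the presence of a '.tar.lst' file in the working directory as "still to do". The sketch below illustrates that pattern; it is not the actual upload-tar script. `process_tar_lists` and `upload_to_grid` are hypothetical names, and `upload_to_grid` is only a placeholder that must be replaced by the real grid upload command (which, judging by the output above, prints a guid). It assumes GNU `xargs` and GNU `tar`, and uses `tar -h` to archive the targets of symbolic links, consistent with `find -L`.

```shell
# Placeholder: replace with the real grid upload command.
upload_to_grid() {
    echo "placeholder: would upload $1" >&2
}

# Hypothetical sketch of a restartable loop in the spirit of upload-tar.
# A .tar.lst file is only moved to done/ after every step succeeded, so
# rerunning the function resumes with the first unfinished list.
process_tar_lists() {
    local lst base
    mkdir -p done || return 1
    for lst in *.tar.lst; do
        [ -e "$lst" ] || continue      # no lists left: nothing to do
        base=${lst%.lst}               # e.g. Crome-0001.tar

        echo "Checksumming tarball contents"
        echo "# $base START" > "$base.md5sum"
        xargs -r -d '\n' md5sum -- < "$lst" >> "$base.md5sum" || return 1
        echo "# $base END" >> "$base.md5sum"

        echo "Generating $base"
        tar -chf "$base" -T "$lst" || return 1   # -h: follow symlinks

        echo "Uploading $base"
        upload_to_grid "$base" || return 1

        echo "Moving $lst and $base.md5sum to directory \"done\""
        mv -- "$lst" "$base.md5sum" done/ || return 1
    done
}
```

If the loop is interrupted, the partially written '.md5sum' and '.tar' files for the current list are simply regenerated on the next run, because that list has not been moved yet.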
- Check the contents of the 'done' directory, especially the contents of the '${ARCHIVE}-<N>.tar.md5sum' files:
$ cat done/${ARCHIVE}-0001.tar.md5sum
# Crome-0001.tar START
939131fac0d40184b5681e18f7b9856c Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_01/407_0001_01.MP4
03a1c56e97923f083bd554981a555a0f Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_01/407_0001_01.SMI
462e8ebbfe70127e46a9b447a14706f4 Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_01/407_0001_01I01.PPN
581c66b2da8852111d449300926c1d52 Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_01/407_0001_01M01.XML
9d5c4d62b4d410e4e1f44372cefe2132 Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_01/407_0001_01R01.BIM
b0ca419552a5bb9004ff1849c7edbb3e Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_02/407_0002_01.MP4
68c36464994cebe84df5ff2b38320a32 Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_02/407_0002_01.SMI
6d72650cd3cfd3481a97af6d1aacfef7 Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_02/407_0002_01I01.PPN
1c1be7c0bd0d11302536663079c0fd69 Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_02/407_0002_01M01.XML
0924e3973a329a0824a400d8ac78d0b8 Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_02/407_0002_01R01.BIM
d1d08d0818ffe4c5ae3ebb0d0ea349a0 Crome/crome_0045_zeljko_obradovic/crome_0045_zeljko_obradovic_03/407_0002_02.MP4
# Crome-0001.tar END
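Part of this check can be automated. The sketch below is a hypothetical helper (`check_done_dir` is not one of the DANS scripts). For every '.tar.md5sum' file in 'done' it verifies that the first and last lines are the '# ... START' and '# ... END' markers shown above, and that there is one checksum line per entry in the matching '.tar.lst' file. It assumes the marker format seen in the example output.

```shell
# Hypothetical consistency check, not part of the DANS scripts.
# Returns 0 if every done/*.tar.md5sum file looks complete, 1 otherwise.
check_done_dir() {
    local sum base lst nsum nlst bad=0
    for sum in done/*.tar.md5sum; do
        [ -e "$sum" ] || continue
        base=$(basename "$sum" .md5sum)       # e.g. Crome-0001.tar
        lst="done/$base.lst"
        if [ ! -f "$lst" ]; then
            echo "$sum: no matching $lst"; bad=1; continue
        fi
        # The first and last lines must be the START and END markers
        if ! head -n 1 "$sum" | grep -qxF "# $base START" ||
           ! tail -n 1 "$sum" | grep -qxF "# $base END"; then
            echo "$sum: missing START/END marker"; bad=1; continue
        fi
        # One checksum line per file listed in the .tar.lst file
        nsum=$(grep -vc '^#' "$sum")
        nlst=$(wc -l < "$lst")
        if [ "$nsum" -ne "$nlst" ]; then
            echo "$sum: $nsum checksums, but $lst lists $nlst files"; bad=1
        fi
    done
    return "$bad"
}
```

Run it from `~/dans/${ARCHIVE}`. Any line it prints points at a tarball whose checksum file should be inspected by hand.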
By combining all '${ARCHIVE}-<N>.tar.md5sum' files, a full list of all md5sums can be generated and compared to the output of an 'md5deep' command:
$ for i in done/${ARCHIVE}-*.tar.md5sum ; do grep -v '^#' "$i" ; done | sort > total.md5sum
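As an alternative to md5deep, the combined list can also be checked with GNU coreutils `md5sum --check`. This swaps in a different tool and is only a sketch: `verify_total` is a hypothetical name, and it assumes the entries in total.md5sum are in md5sum's standard '<hash>  <path>' format with paths relative to `~/dans/${ARCHIVE}`. It first confirms that exactly the files in ${ARCHIVE}-files.txt have a checksum, then re-reads every file and compares it with the recorded checksum.

```shell
# Sketch using GNU md5sum instead of md5deep (a substitution).
# $1: combined checksum list, $2: original sorted file list.
verify_total() {
    local total=$1 files=$2 tmp
    tmp=$(mktemp) || return 1
    # Strip "<32 hex digits><space>[space or *]" to recover the path column
    sed 's/^[0-9a-f]\{32\} [ *]\{0,1\}//' "$total" | sort > "$tmp"
    if ! sort "$files" | cmp -s - "$tmp"; then
        rm -f "$tmp"
        echo "file list and checksum list differ" >&2
        return 1
    fi
    rm -f "$tmp"
    # Re-read every file; --quiet only reports files that fail
    md5sum -c --quiet -- "$total"
}
```

From `~/dans/${ARCHIVE}`, `verify_total total.md5sum ${ARCHIVE}-files.txt` returns 0 only if every listed file has exactly one checksum and all checksums still match. Note that this re-reads the entire archive, which takes roughly as long as the checksumming step of 'upload-tar'.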