Dans Data Upload
Revision as of 10:54, 10 May 2012
How to upload a DANS archive to the grid
- Create a new directory with the name of the archive. As an example we use the 'Crome' archive. We refer to the name of the archive using the environment variable '${ARCHIVE}':
$ export ARCHIVE=Crome
$ mkdir -p ~/dans/${ARCHIVE}
- In this directory, create another directory with the same name; this directory will contain the list of files and directories that need to be uploaded:
$ cd ~/dans/${ARCHIVE}
$ mkdir ${ARCHIVE}
- Copy over the scripts from the repository:
$ cp -a ~/dans/scripts/* .
- Generate a sorted list of files. Note: All further actions are based on this list!
$ find -L ${ARCHIVE} -type f | sort > ${ARCHIVE}-files.txt
- Check the list of files and remove any unwanted entries, such as '.Trash' folders, if desired.
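The cleanup step above can be done by hand in an editor, or with a small helper such as the sketch below. The function name `clean_list` and the `.orig` backup suffix are illustrative assumptions and are not part of the scripts in the repository. The helper removes every line matching a pattern and then confirms that the list is still sorted, since all later steps rely on that.

```shell
# clean_list LIST PATTERN
# Hypothetical helper (not one of the DANS scripts): removes all lines of
# LIST that match the grep PATTERN, keeping a backup in LIST.orig.
clean_list() (
    list=$1
    pattern=$2

    # Keep the unfiltered list so that the change can be reviewed or undone.
    cp -p "$list" "$list.orig" || exit 1

    # grep -v exits with status 1 when no lines remain; only a status
    # above 1 indicates a real error.
    grep -v -- "$pattern" "$list.orig" > "$list"
    [ $? -le 1 ] || exit 1

    # Deleting lines keeps the order intact; sort -c verifies this and
    # returns a non-zero status if the list is not sorted.
    sort -c "$list"
)

# Example call, using the archive name from the text:
#   clean_list "${ARCHIVE}-files.txt" '/\.Trash'
```

Comparing the result with the backup (for example with `diff`) shows exactly which entries were dropped.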
- Generate a list of tarball.lst files. Each tarball.lst file contains a subset of the entries in the ${ARCHIVE}-files.txt file that, when tarred up into a single .tar file, is roughly 8 GB in size. The output files are named '${ARCHIVE}-<N>.tar.lst', where <N> is a four-digit counter starting at 1:
$ ./gen-tar-list ${ARCHIVE}-files.txt
Crome-0001.tar.lst
Crome-0002.tar.lst
...
Crome-0072.tar.lst
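The 'gen-tar-list' script itself is not shown on this page. The sketch below only illustrates the kind of splitting described above and should not be read as the actual implementation. The function name `split_list` and its optional size-limit argument are assumptions made for the example. It walks the sorted list, adds up file sizes, and starts a new '.tar.lst' file whenever the next file would push the running total over the limit. The tar header overhead is ignored, so the resulting tarballs are only roughly the target size.

```shell
# split_list LIST PREFIX [LIMIT_BYTES]
# Hypothetical illustration, not the real gen-tar-list script.
# Writes PREFIX-0001.tar.lst, PREFIX-0002.tar.lst, ... and prints each
# name as it is created. LIMIT_BYTES defaults to 8 GB.
split_list() (
    list=$1
    prefix=$2
    limit=${3:-$((8 * 1024 * 1024 * 1024))}

    n=1
    total=0
    out=$(printf '%s-%04d.tar.lst' "$prefix" "$n")
    : > "$out"
    echo "$out"

    while IFS= read -r file; do
        # Size of the file in bytes; abort if it cannot be read.
        size=$(wc -c < "$file") || exit 1

        # Start a new list if this file would exceed the limit, unless
        # the current list is still empty (a single oversized file).
        if [ "$total" -gt 0 ] && [ $((total + size)) -gt "$limit" ]; then
            n=$((n + 1))
            total=0
            out=$(printf '%s-%04d.tar.lst' "$prefix" "$n")
            : > "$out"
            echo "$out"
        fi

        printf '%s\n' "$file" >> "$out"
        total=$((total + size))
    done < "$list"
)

# Example call, using the archive name from the text:
#   split_list "${ARCHIVE}-files.txt" "${ARCHIVE}"
```

Because the input list is sorted and is read in order, files from the same directory tend to end up in the same tarball.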
- Now the final big step: run the 'upload-tar' script, which will
  - generate the tarballs
  - generate md5 checksums for all files in each tarball
  - upload each tarball to the grid.
This script will take a long time to run, depending on how many tarballs there are. Note: For this step a valid grid proxy is required!
$ ./upload-tar
Checksumming tarball contents
Generating ${ARCHIVE}-0001.tar
...
For each tarball that is successfully processed, the '${ARCHIVE}-<N>.tar.lst' file is moved to a separate directory 'done'. This way the 'upload-tar' script can be stopped and restarted at will: it continues processing '${ARCHIVE}-<N>.tar.lst' files until all have been moved to the 'done' directory.
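The restart behaviour can be pictured with the following sketch. It is not the actual 'upload-tar' script: the function names `process_lists` and `upload_to_grid` and the '.md5' output file name are assumptions made for the example, and the upload is a placeholder, because the grid copy command in use is not given on this page. The sketch also assumes `md5sum` from GNU coreutils and a `tar` that supports `-T`.

```shell
# HYPOTHETICAL placeholder: replace the body with the site's real grid
# copy command. Here it only reports what would be uploaded.
upload_to_grid() {
    echo "would upload $1"
}

# process_lists PREFIX
# Illustration only, not the real upload-tar script. Handles every
# remaining PREFIX-<N>.tar.lst file and moves it to 'done' on success.
process_lists() (
    prefix=$1
    mkdir -p done || exit 1

    for lst in "$prefix"-*.tar.lst; do
        # If the pattern matched nothing, all lists are already in 'done'.
        [ -e "$lst" ] || continue
        tarball=${lst%.lst}

        echo "Checksumming tarball contents"
        while IFS= read -r file; do
            md5sum "$file" || exit 1
        done < "$lst" > "$tarball.md5"

        echo "Generating $tarball"
        # Add -h if the listed paths are symbolic links that tar should follow.
        tar -cf "$tarball" -T "$lst" || exit 1

        upload_to_grid "$tarball" || exit 1

        # Only a fully processed list is moved, so an interrupted run
        # simply repeats the unfinished tarball on the next start.
        mv "$lst" done/ || exit 1
    done
)

# Example call, using the archive name from the text:
#   process_lists "${ARCHIVE}"
```

Moving the list file as the very last action is what makes the loop safe to interrupt: a list is either still in the working directory and will be processed again, or in 'done' and will be skipped.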