Grid storage
[Figure: Grid storage illustration]
Intro
Each cluster on the Grid is equipped with a Storage Element (SE) where data is stored. Which SEs you can store data on depends on your VO (Virtual Organization). You can list them with the following command:
$> lcg-infosites --vo lsgrid se
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
981270000        118902          n.a   gb-se-lumc.lumc.nl
8250147054295    282442433       n.a   srm.grid.sara.nl
1320000000       4429468         n.a   gb-se-nki.els.sara.nl
376360000        44957579        n.a   gb-se-kun.els.sara.nl
536752010        118902          n.a   se.grid.rug.nl
1190000000       18978219        n.a   gb-se-ams.els.sara.nl
1290000000       19570444        n.a   gb-se-wur.els.sara.nl
1320000000       4372571         n.a   gb-se-amc.amc.nl
375067564        44362835        n.a   srm.grid.sara.nl
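If you want to choose an SE from a script, for example the one with the most available space, you can parse the output above. The following is a minimal sketch that is not part of the LCG tools: the function name se_with_most_space is hypothetical. It also assumes the four-column layout shown above (available space, used space, type, SE name), which may differ between versions of lcg-infosites.

```shell
# Hypothetical helper (not an LCG command): read "lcg-infosites --vo <vo> se"
# output on stdin and print the SE with the most available space.
# Assumes the column layout: Avail Space(Kb), Used Space(Kb), Type, SE name.
se_with_most_space() {
    awk '
        # Data rows start with a number; this skips the header and the dashed line.
        $1 ~ /^[0-9]+$/ && NF >= 4 {
            # "+ 0" forces a numeric comparison of the available space.
            if (!found || $1 + 0 > best) { best = $1 + 0; se = $4; found = 1 }
        }
        END { if (found) print se }
    '
}
```

Usage: $> lcg-infosites --vo lsgrid se | se_with_most_space
With the listing above this prints srm.grid.sara.nl. Free space is only one criterion. For jobs, an SE close to where they run is often the better choice.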
Sometimes it is useful to store your data on more than one SE; this is called replication. Replication allows jobs to run close to where the data is located, which reduces long-range network traffic.
SEs identify files by a Storage URL (SURL), but SURLs are not very descriptive. For example, a SURL might look like this:
srm://gb-se-ams.els.sara.nl/dpm/els.sara.nl/home/lsgrid/generated/2008-02-28/file2868595b-ae21-4d6a-8a60-0ed4a668b809
The LCG File Catalog (LFC) allows you to give your files more descriptive, logical names and to organize them in a directory structure. Bear in mind that the LFC is not itself a storage system: it is only a database that keeps track of your files.
About the command line tools
You can manipulate your data with a set of command line tools. Some of these commands start with lfc-..., while others start with lcg-..., which can be confusing at first.
- lcg-...
- All these commands operate on the data itself. Some of them also have side effects in the LFC: for example, lcg-cr uploads a file to an SE and registers it in the LFC under a logical file name (LFN).
- lfc-...
- These commands only operate on the LFC, and don't manipulate data. E.g. the command lfc-mkdir creates a new directory in the LFC.
Important: before you can store, retrieve and replicate data, you need:
- a valid Grid proxy
- the environment variable LFC_HOST set to your LFC server. On all SARA user interface machines this should be lfc.grid.sara.nl; you could put the following line in your $HOME/.bash_profile:
export LFC_HOST='lfc.grid.sara.nl'
For reference, see the LCG command line utilities reference guide, an overview of the lcg-... commands with explanations, and the reference for the lfc-... commands.
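In job or setup scripts it can help to fail early when LFC_HOST is missing, instead of getting an error from the first lfc-... command. Below is a minimal sketch, assuming a POSIX shell. The name require_lfc_host is a hypothetical helper. It only checks that the variable is non-empty, not that the server is reachable or that your proxy is valid.

```shell
# Hypothetical helper: stop early if LFC_HOST, listed above as a
# prerequisite, is not set. Only checks the variable, not the server.
require_lfc_host() {
    if [ -z "${LFC_HOST:-}" ]; then
        echo "LFC_HOST is not set; try: export LFC_HOST='lfc.grid.sara.nl'" >&2
        return 1
    fi
    echo "Using LFC server: $LFC_HOST"
}
```

Calling require_lfc_host || exit 1 at the top of a script stops it before any catalog command runs.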
Listing files and directories
For each supported VO, a separate top-level directory exists under /grid/. For example, to see all the files stored for the lsgrid VO, make sure you have a valid lsgrid VOMS proxy and then type:
$> lfc-ls -l /grid/lsgrid/
drwxrwxr-x   2 30125    3010     0 Feb 05 12:56 arni
drwxrwxr-x   3 30146    3010     0 Mar 06 15:21 dutilh
drwxrwxr-x   3 30147    3010     0 Feb 22 16:12 emc-gwatest
...
Rather than typing an absolute path for every file and directory, you can define a home directory and then use paths relative to it. To do this, set the environment variable LFC_HOME:
$> export LFC_HOME='/grid/lsgrid'
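The lfc-... commands perform this resolution themselves. Purely to illustrate the rule, the sketch below mimics it. It assumes that a path not starting with / is simply appended to $LFC_HOME. The name lfc_resolve is hypothetical, and you do not need it to use the tools.

```shell
# Hypothetical helper: show which absolute catalog path a given path
# refers to once LFC_HOME is set.
lfc_resolve() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;  # absolute path: used unchanged
        *)
            if [ -z "${LFC_HOME:-}" ]; then
                echo "LFC_HOME is not set; use an absolute path" >&2
                return 1
            fi
            # relative path: appended to $LFC_HOME (a trailing / is dropped first)
            printf '%s/%s\n' "${LFC_HOME%/}" "$1"
            ;;
    esac
}
```

With LFC_HOME='/grid/lsgrid', lfc_resolve your_username/text_file.txt prints /grid/lsgrid/your_username/text_file.txt.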
Creating a new directory
In the following examples, replace your_username and your_vo with your own user name and VO; don't just copy and paste!
Before you can register any file of your own, you must create a new directory in the file catalog:
$> lfc-mkdir /grid/your_vo/your_username
To check that your directory has been created, type:
$> export LFC_HOME=/grid/your_vo
$> lfc-ls -l
and you should see your directory (plus possibly those of others).
Storing a file on a Storage Element (SE)
Use the lcg-cr command (cr stands for copy and register) to upload a file to an SE. This command not only uploads the file but also registers it in the LFC, so you can easily find it later. For example, to store a local file text_file.txt on an SE (in this example the SRM at SARA) and register it in the LFC under a logical file name, do the following:
$> echo 'Put something here' > text_file.txt
$> lcg-cr --vo your_vo -d srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/your_vo/your_dir/text_file.txt \
     -l lfn:/grid/your_vo/your_username/text_file.txt "file://$PWD/text_file.txt"
guid:624d679c-c81e-4054-9031-f4d5cd1b8f2b
The GUID is a globally unique identifier for the file you just uploaded. All replicas of the same file share this GUID.
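When scripting uploads you may want to keep the GUID for later use. Below is a minimal sketch, assuming lcg-cr prints the GUID on a line of the form guid:<uuid>, as in the example above. The name extract_guid is a hypothetical helper. It prints only the identifier and ignores any other output lines.

```shell
# Hypothetical helper: read lcg-cr output on stdin and print the GUID
# from a line of the form guid:<36-character UUID>; other lines are ignored.
extract_guid() {
    sed -n 's/^guid:\([0-9a-fA-F-]\{36\}\)$/\1/p'
}
```

For example, GUID=$(lcg-cr ... | extract_guid), where ... stands for the lcg-cr arguments shown above, stores 624d679c-c81e-4054-9031-f4d5cd1b8f2b for the example upload.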
You should now be able to see this file in the LFC:
$> lfc-ls -l /grid/your_vo/your_username/text_file.txt
-rw-rw-r--   1 30085    3010    19 Mar 12 15:16 /grid/your_vo/your_username/text_file.txt
Before continuing, note the difference between storing the file and creating the directory in the previous section. The directory is purely virtual: it exists only in the catalog of logical file names (LFNs). The file, on the other hand, physically exists on an SE and additionally has a "virtual" file name in the catalog.
File replication
A single file can be stored on multiple SEs. Running jobs can then retrieve the files they need from a nearby SE, which gives faster access to the data. Replication also protects against failures of, or access problems with, a particular SE.
To find out which replicas exist for a file, use the lcg-lr command (lr stands for list replicas):
$> lcg-lr lfn:/grid/your_vo/your_username/text_file.txt
srm://your_se.your.domain.edu/dpm/your.domain.edu/home/your_vo/generated/2008-03-12/file4391629d-a8f8-48f2-9dc4-fec8ee81093d
srm://otherse.your.domain.edu/dpm/your.domain.edu/home/your_vo/generated/2008-03-12/file51e95256-2f51-4e00-bc0d-9d466d27059b
In this example, there are two replicas of the file.
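Because each replica is identified by a SURL, the host part of the SURL tells you which SE holds it. Below is a minimal sketch, assuming SURLs of the form srm://host[:port]/path as in the examples on this page. The name replica_hosts is a hypothetical helper.

```shell
# Hypothetical helper: read SURLs (one per line, as printed by lcg-lr)
# on stdin and print the SE host of each replica.
replica_hosts() {
    # Keep only the host: drop "srm://" and everything from the first ":" or "/".
    sed -n 's#^srm://\([^/:]*\).*#\1#p'
}
```

Usage: $> lcg-lr lfn:/grid/your_vo/your_username/text_file.txt | replica_hosts
For the listing above this prints your_se.your.domain.edu and otherse.your.domain.edu. Piping the lcg-lr output to wc -l instead counts the replicas.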
To replicate this file to yet another SE, do this:
$> lcg-rep --vo your_vo -d yet_another_se lfn:/grid/your_vo/your_username/text_file.txt
If you run lcg-lr again, you should now see a third replica of the file.
Note that the path of each replica is different. Because you refer to the file by its LFN, you don't need to know where or how each SE stores its replica.
Retrieving files from a Storage Element
To download a file you have already uploaded, use the lcg-cp command (cp stands for copy):
$> lcg-cp --vo your_vo lfn:/grid/your_vo/your_username/text_file.txt "file://$PWD/text_file.txt"
Unfortunately, this command does not necessarily download the file from a nearby SE: if the file has replicas on several SEs, it is downloaded from a randomly chosen one.
To make sure you download the replica from a nearby SE, do this:
$> SURL=$(lcg-lr "lfn:/grid/your_vo/your_username/text_file.txt" | grep "//$VO_YOURVO_DEFAULT_SE/")
$> lcg-cp --vo your_vo "$SURL" "file://$PWD/text_file.txt"
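The first line lists all replica SURLs of the file and greps for the name of the nearby SE. The second line then retrieves the file using that SURL instead of the LFN. The $VO_YOURVO_DEFAULT_SE environment variable (with YOURVO replaced by your VO's name in upper case) contains the name of the default SE for your current user interface or worker node, which is normally the nearest one. If no replica is stored on that SE, $SURL is empty and lcg-cp will fail; in that case, retrieve the file by its LFN as shown above. To find out which variable to use, try this: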
$> set | grep _DEFAULT_SE
VO_ALICE_DEFAULT_SE=srm.grid.sara.nl
VO_ATLAS_DEFAULT_SE=srm.grid.sara.nl
VO_BEAPPS_DEFAULT_SE=srm.grid.sara.nl
...
VO_VLIBU_DEFAULT_SE=srm.grid.sara.nl
Your VO should be listed there.
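The grep approach leaves $SURL empty when no replica is stored on your default SE. It would also miss a SURL that includes a port number, such as the :8443 in the lcg-cr example. The sketch below handles both cases and falls back to the first replica. It is an illustration under several assumptions. The helper names default_se_var and pick_replica are hypothetical. The variable name is assumed to follow the pattern shown above: the VO name in upper case, with . and - replaced by _. Finally, the variable must be exported in your environment.

```shell
# Hypothetical helper: print the name of the VO_<VO>_DEFAULT_SE variable.
# Assumption: the VO name is upper-cased and '.' and '-' become '_'.
default_se_var() {
    printf 'VO_%s_DEFAULT_SE\n' "$(printf '%s' "$1" | tr 'a-z.-' 'A-Z__')"
}

# Hypothetical helper: read replica SURLs (one per line, as printed by
# lcg-lr) on stdin; print the replica on the VO's default SE, or the
# first replica if none is stored there. Returns 1 if there are no SURLs.
pick_replica() {
    # printenv only sees exported variables; an unset variable gives "".
    pr_se=$(printenv "$(default_se_var "$1")" || true)
    pr_first=''
    pr_match=''
    while IFS= read -r pr_surl || [ -n "$pr_surl" ]; do
        [ -n "$pr_surl" ] || continue
        [ -n "$pr_first" ] || pr_first=$pr_surl
        if [ -n "$pr_se" ]; then
            # Match the SE host exactly, with or without a port number.
            case "$pr_surl" in
                *"//$pr_se/"* | *"//$pr_se:"*) pr_match=$pr_surl; break ;;
            esac
        fi
    done
    pr_result=${pr_match:-$pr_first}
    [ -n "$pr_result" ] || return 1
    printf '%s\n' "$pr_result"
}
```

Usage, in place of the grep line above:
$> SURL=$(lcg-lr "lfn:/grid/your_vo/your_username/text_file.txt" | pick_replica your_vo)
$> lcg-cp --vo your_vo "$SURL" "file://$PWD/text_file.txt"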
Deleting files from a Storage Element
To delete a file from all storage elements, type:
$> lcg-del -a lfn:/grid/your_vo/your_username/text_file.txt
The -a option removes all replicas of the file and unregisters it from the LFC. See the lcg-del man page for more options.
Troubleshooting
The LFC is where files are registered, so that you can find their replicas, which are physically stored on storage elements.
If the physical copy is removed or lost and no other replica exists, you are left with a registration in the LFC that no longer points to any data.