== Introduction to the FileStager package ==
 
The File Stager was created to solve the problem of slow analysis jobs running on an offline batch cluster, such as [http://www.nikhef.nl/grid/stats/stbc/ Stoomboot]. It is based on the idea of caching the input files at a location close to the worker nodes, to reduce the latency of remote access (typically via the rfio or dcap protocol) to these files by the analysis jobs. It works in a pipelined fashion: processing of the event data inside a "staged" local file happens concurrently (in parallel, overlapped) with staging of the next input file from the Grid. Copying of a file starts at the same time that the previous file is opened for processing. For example: file_2 is staged while file_1 is processed, file_3 is staged while file_2 is processed, and so on.
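
To make the pipelining concrete, below is a minimal Python sketch of the idea. It is not the actual FileStager implementation: stage_file() and process_file() are hypothetical stand-ins for the Grid copy (e.g. lcg-cp) and the event loop over one input file.

 # Minimal sketch of the staging pipeline -- not the actual FileStager code.
 import os
 import threading
 
 def stage_file(grid_url, local_path):
     # hypothetical stand-in: in reality a Grid copy such as lcg-cp
     open(local_path, "w").close()
 
 def process_file(local_path):
     # hypothetical stand-in: in reality the event loop over the file;
     # the staged file is removed as soon as the job is done with it
     os.remove(local_path)
 
 def run(grid_urls, tmpdir="/tmp"):
     if not grid_urls:
         return
     local = lambda i: os.path.join(tmpdir, "staged_%d.root" % i)
     # staging the first file is the only unavoidable wait
     stager = threading.Thread(target=stage_file, args=(grid_urls[0], local(0)))
     stager.start()
     for i in range(len(grid_urls)):
         stager.join()               # block until file i is fully staged
         if i + 1 < len(grid_urls):  # start copying file i+1 ...
             stager = threading.Thread(target=stage_file,
                                       args=(grid_urls[i + 1], local(i + 1)))
             stager.start()
         process_file(local(i))      # ... while file i is processed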

Under normal circumstances (network not saturated, per-event processing time of ~10 ms, file sizes of ~2 GB) staging of a file finishes earlier than the processing of the previous one, so the stager effectively performs as if the files were on a local disk. If staging of a file has not finished before it is needed for processing, the FileStager blocks until the transfer is completed. The only unavoidable waiting time is during staging of the very first file.

A staged file is kept on the local storage only for the time required by the job to process that file, after which it is automatically removed. In case the job fails or crashes, any orphaned staged files associated with the job are removed by a separate, independent garbage-collection process.
 
The stager also works with Event Tag Collections, in which case the .root file is scanned to obtain the original Grid files that contain the event data.
 
The input files specified in the job options are picked up by the FileStager package and translated to local URLs, which the EventSelector will use for the job.
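
Conceptually the translation looks like the following sketch; the function name and the URL forms here are illustrative only, not the package's actual interface.

 # Illustrative only -- not the package's actual interface. Maps a remote
 # Grid URL from the job options to the local URL of the staged copy.
 def to_local_url(grid_url, base_tmpdir="/tmp/someuser_pID1234"):
     filename = grid_url.rsplit("/", 1)[-1]
     return "file:%s/%s" % (base_tmpdir, filename)
 
 # e.g. to_local_url("rfio://server//castor/data/file_1.root")
 #      -> "file:/tmp/someuser_pID1234/file_1.root"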
  
 
== Requirements for using the File Stager ==
 
*A valid grid certificate
*Grid environment setup correctly:

       source /global/ices/lcg/glite3.2.4/external/etc/profile.d/grid-env.sh
== Setting up the File Stager ==
  
 
== Troubleshooting ==
 
    Error with lcg_cp utility!
    Error message: Some error message from the Lcg_util middleware
    Checking file size: File does not exist, or problems with gfal library.

This is usually an indication that no valid certificate exists, or that the grid environment is not set up correctly. Try:

    echo $LCG_LOCATION

to see if this environment variable is set correctly. It should point to some directory like

   /global/ices/lcg/glite3.2.4/lcg

Also try:

  lcg-cp -v --vo lhcb srm://aFiletoStage file:/tmp/localFileCopy

If this isn't working correctly, ask a grid expert to solve the problem.

   No permission to write in temporary directory <someusername>. Switching back to default <$TMPDIR>, or else </tmp>.
   Default tmpdir set to <someusername>

The temporary directory is where the staged Grid files are stored during the job execution. It has the format BaseTmpdir/<username>_pIDXXXX. By default, the following sequence of locations is tried for the BaseTmpdir (see the sketch after this list):

# If the BaseTmpdir property of the FileStagerSvc is set, AND the job has permission to write in that directory, then this location is used.
# Else, if any of the following environment variables is set, the first available one is used: TMPDIR, EDG_WL_SCRATCH, OSG_WN_TMP, WORKDIR.
# Else, the /tmp directory is used (be careful, as it is normally just a small partition with a few gigabytes of disk space).
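
As a paraphrase of this fallback sequence (not the package's actual code), the selection amounts to:

 # Sketch of the documented BaseTmpdir fallback -- not the actual code.
 import os
 
 def pick_base_tmpdir(configured=None):
     # 1. an explicitly configured, writable BaseTmpdir wins
     if configured and os.access(configured, os.W_OK):
         return configured
     # 2. otherwise, the first available of these environment variables
     for var in ("TMPDIR", "EDG_WL_SCRATCH", "OSG_WN_TMP", "WORKDIR"):
         if os.environ.get(var):
             return os.environ[var]
     # 3. last resort: /tmp (usually a small partition)
     return "/tmp"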

You can override the default location of the BaseTmpdir by including the following line in the jobOptions file:

   FileStagerSvc().BaseTmpdir="/some/temp/directory/with/sufficient/space"
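
Put in context, a jobOptions fragment might look like the sketch below. The import line is an assumption about how the FileStagerSvc configurable is exposed (via a generated Conf module); check the FileStager package in your release.

 # Sketch of a jobOptions fragment; the import path below is an assumption,
 # check how the FileStagerSvc configurable is named in your release.
 from FileStager.FileStagerConf import FileStagerSvc
 
 FileStagerSvc().BaseTmpdir = "/some/temp/directory/with/sufficient/space"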