Checksumming support in SRM implementations

From PDP/Grid Wiki

Some applications have requested checksumming support for files stored on the storage elements in the Netherlands. The first VO to request this was ATLAS, but other VOs are also very interested. Unfortunately, checksumming support seems to be implemented inconsistently across the different storage backend systems.

This Wiki page is the result of a short investigation into the checksumming differences between dCache, DPM, StoRM and CASTOR. First a short overview is given of the checksumming features of each storage system. At the end the output of the (relevant) commands used in this investigation is shown.

Why only dCache, DPM, StoRM and CASTOR?

At the time of writing (August 25th 2009) the VO dteam had access to 319 SEs in total, of which

  • 195 DPM
  • 64 dCache
  • 35 StoRM
  • 16 CASTOR
  • 3 unknown (but at least one looks like DPM)
  • 6 down

So I did not have access to other SRM systems.

Note: so-called GridFTP checksums are checksums that can be calculated on the fly using a special GridFTP command:

QUOTE CKSM ADLER32 0 -1 full-path
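
For comparison with what a server reports, the same kind of checksum can be computed locally. The following is a minimal sketch, not part of the original investigation: it uses Python's zlib.adler32 and assumes that the two numeric CKSM arguments are an offset and a length, with -1 meaning "up to the end of the file". The function name adler32_hex and its parameters are hypothetical.

```python
import zlib


def adler32_hex(path, offset=0, length=-1, chunk_size=1 << 20):
    """Return the Adler-32 of a byte range of a local file as 8 lowercase hex digits.

    offset and length mirror the two numeric CKSM arguments; length=-1 is
    assumed to mean "up to the end of the file".
    """
    value = 1  # Adler-32 starts from 1, not 0
    remaining = length
    with open(path, "rb") as handle:
        handle.seek(offset)
        while remaining != 0:
            # Read whole chunks when unbounded, otherwise never read past the range.
            want = chunk_size if remaining < 0 else min(chunk_size, remaining)
            chunk = handle.read(want)
            if not chunk:
                break  # end of file reached
            value = zlib.adler32(chunk, value)
            if remaining > 0:
                remaining -= len(chunk)
    # Mask to 32 bits and zero-pad so the result always has 8 hex digits.
    return format(value & 0xFFFFFFFF, "08x")
```

Reading in chunks keeps memory use constant for large files, and the zero-padded output matches the 8-digit form that the tools on this page print for adler32.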

dCache

  • srmping SARA: v2.2 dCache production-1.9.3-3
  • seems to support only adler32 checksums (which is what WLCG seems to want); according to Ron, md5 is also supported, but this requires a server reconfiguration
  • checksums
    • are computed when the file is transferred to dCache or during a move between pool nodes.
    • are stored in the dCache namespace
    • can be retrieved using the srmls -l command

DPM

  • srmping Nikhef: v2.2 DPM 1.7.0-5
  • supports adler32, md5 and crc32 checksums
  • supports only gridftp checksums; adler32 and crc32 are supported using a special DPM-DSI GridFTP plugin
  • checksums
    • can be computed using the lcg-get-checksum command
    • are stored in the DPM namespace but cannot be retrieved from there (!). This is fixed in DPM 1.7.2+
    • are not displayed when using the srmls -l command. This is fixed for DPM 1.7.2+ servers
    • are recalculated every time. This is fixed for DPM 1.7.2+ servers

StoRM

  • srmping Groningen: v2.2 StoRM <FE:1.5.0-1.sl4><BE:1.5.1-2.sl4>
  • supports md5 or adler32 checksums from version 1.5 onwards
  • checksums
    • can be retrieved using the lcg-get-checksum command; however, the checksum type needs to be specified, otherwise you get the wrong checksum
    • are displayed when using the srmls -l command to list a single file. If multiple files are listed, the checksum is displayed only for the first file
    • are recalculated every time

CASTOR

  • srmping: v2.2 CASTOR v2_7_15 2.1.7
  • supports only adler32 checksums
  • checksums
    • are computed in the background
    • cannot be specified with the --checksum option
    • can be retrieved using the srmls -l command
    • are stored in the CASTOR namespace
  • gridftp checksums are not supported

Command output

dCache

lcg-cr output:

$ lcg-cr -l /grid/pvier/janjust/my-dcache-file-checksum-adler32 -d $SRM file:/user/janjust/myfile --checksum 
guid:d156757d-ecca-472e-adb9-64ebefde23c4

srmls output:

$ srmls -l $SRM/pnfs/....
 8508 $SRM/pnfs/....
 space token(s) :15253459
 storage type:PERMANENT
 retention policy:CUSTODIAL
 access latency:NEARLINE
 locality:NEARLINE
 - Checksum value:  cd5d9820
 - Checksum type:  adler32
 UserPermission: uid=18010 PermissionsRW
 GroupPermission: gid=1276 PermissionsR
 WorldPermission: R
 created at:2009/08/22 00:02:18
 modified at:2009/08/22 00:02:18
 - Assigned lifetime (in seconds):  -1
 - Lifetime left (in seconds):  -1
 - Original SURL:  $SRM/pnfs/....
 - Status:  null
 - Type:  FILE
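
The checksum fields in a listing like the one above can be extracted with a small parser. This is only a sketch: it assumes the "- Checksum value:" and "- Checksum type:" line format shown on this page, and the names extract_checksum and SAMPLE are hypothetical. It returns None when a listing carries no checksum fields.

```python
import re

# Matches lines such as " - Checksum value:  cd5d9820" in srmls -l output.
_FIELD = re.compile(r"^\s*-\s*Checksum (value|type):\s*(\S+)\s*$", re.MULTILINE)


def extract_checksum(listing):
    """Return (type, value) from srmls -l text, or None if either field is absent."""
    fields = {}
    for name, content in _FIELD.findall(listing):
        fields.setdefault(name, content)  # keep the first occurrence only
    if "type" in fields and "value" in fields:
        return fields["type"], fields["value"]
    return None


# A few lines copied from the dCache listing on this page.
SAMPLE = """\
 8508 $SRM/pnfs/....
 locality:NEARLINE
 - Checksum value:  cd5d9820
 - Checksum type:  adler32
 UserPermission: uid=18010 PermissionsRW
"""

print(extract_checksum(SAMPLE))  # prints ('adler32', 'cd5d9820')
```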

lcg-get-checksum output:

$ lcg-get-checksum $SRM/pnfs/....
cd5d9820	$SRM/pnfs/....

DPM

lcg-cr output:

$ lcg-cr -l /grid/pvier/janjust/my-dpm-file-checksum-adler32 -d $SRM file:/user/janjust/myfile --checksum 
guid:2352aac8-37b9-4cfd-ba91-5eaee71fd5f1

srmls output:

$ srmls -l $SRM
 8508 /dpm/science.uu.nl/....
space token(s) :none found
 storage type:PERMANENT
retentionpolicyinfo : null
 locality:ONLINE
  UserPermission: uid=/O=dutchgrid/O=users/O=nikhef/CN=Jan Just Keijser PermissionsRW
  GroupPermission: gid=pvier PermissionsRW
 WorldPermission: R
created at:2009/08/21 23:56:27
modified at:2009/08/21 23:56:27
 - Lifetime left (in seconds):  -1
 - Original SURL:  /dpm/science.uu.nl/....
- Status:  null
- Type:  FILE

lcg-get-checksum output:

$ lcg-get-checksum $SRM/dpm/science.uu.nl/....
cd5d9820	$SRM/dpm/science.uu.nl/....

StoRM

(the partial success with StoRM v1.5+ is described further below)

lcg-cr output:

$ lcg-cr -l /grid/pvier/janjust/my-storm-file-checksum-adler32 -d $SRM file:/user/janjust/myfile --checksum 
[LCG-UTIL][lcg_cr4][] Destination may be corrupted: 
Source checksum (cd5d9820) != Destination checksum (d41d8cd98f00b204e9800998ecf8427e )
[LCG-UTIL][lcg_cp4][] $SRM/pvier/.... has been DELETED, please try again (temporary network problem? )
lcg_cr: Resource temporarily unavailable

a different file:

$ lcg-cr -l /grid/pvier/janjust/my-storm-file-checksum-adler32 -d $SRM file:/user/janjust/wms_soap_msgs.txt --checksum 
[LCG-UTIL][lcg_cr4][] Destination may be corrupted:
Source checksum (8d9e8485) != Destination checksum (d41d8cd98f00b204e9800998ecf8427e)
[LCG-UTIL][lcg_cp4][] $SRM/pvier/.... has been DELETED, please try again (temporary network problem?)
lcg_cr: Resource temporarily unavailable

Interesting: the destination checksum is the same, even though it is a completely different file!
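
The two values in the error message also differ in length: the source checksum has 8 hex digits and the destination checksum has 32. Since adler32 and crc32 are 32-bit values (8 hex digits) and md5 is a 128-bit digest (32 hex digits), the length alone narrows down which type a value can be. A minimal sketch, with the hypothetical names HEX_LENGTHS and candidate_types:

```python
import string

# Hex string lengths of the checksum types mentioned on this page.
HEX_LENGTHS = {"adler32": 8, "crc32": 8, "md5": 32}


def candidate_types(checksum):
    """Return the checksum types whose hex length matches the given string."""
    text = checksum.strip().lower()
    if not text or any(c not in string.hexdigits for c in text):
        return []  # not a plain hex string
    return sorted(name for name, n in HEX_LENGTHS.items() if n == len(text))


print(candidate_types("cd5d9820"))                           # ['adler32', 'crc32']
print(candidate_types("d41d8cd98f00b204e9800998ecf8427e "))  # ['md5']
```

So the comparison that lcg-cr reports here appears to be between values of two different checksum types, which can never match.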

srmls output: not shown as the file was not successfully copied.

For another file that was copied to StoRM without --checksum:

$ lcg-get-checksum $SRM/pvier/....
d41d8cd98f00b204e9800998ecf8427e	$SRM/pvier/....

Wait! That's exactly the same checksum as shown above!

Let's try a different StoRM SRM:

$ lcg-get-checksum srm://storm-fe-cms.cr.cnaf.infn.it/dteam/testfile-cp-20090727-171238-6e877e21562d1.txt
d41d8cd98f00b204e9800998ecf8427e	srm://storm-fe-cms.cr.cnaf.infn.it/dteam/testfile-cp-20090727-171238-6e877e21562d1.txt

$ lcg-get-checksum srm://storm-fe-cms.cr.cnaf.infn.it/dteam/helloworld.txt1248346365.88
d41d8cd98f00b204e9800998ecf8427e	srm://storm-fe-cms.cr.cnaf.infn.it/dteam/helloworld.txt1248346365.88

Again, the same checksum. Note that these files were already present on the SRM system.

So at least we know it's not a site-specific issue ;-) There is a magic StoRM checksum value:

 MagicStormChecksum = d41d8cd98f00b204e9800998ecf8427e
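
One thing worth noting about this value: it is identical to the MD5 digest of empty input (zero bytes). That would be consistent with a checksum calculated over no data at all, although nothing in this investigation confirms why StoRM returns it. A small check in Python, where the constant and function names are only illustrative:

```python
import hashlib

# The value returned by StoRM for every file tested on this page.
MAGIC_STORM_CHECKSUM = "d41d8cd98f00b204e9800998ecf8427e"


def is_empty_input_md5(checksum):
    """Return True if the checksum equals the MD5 digest of zero bytes."""
    return checksum.strip().lower() == hashlib.md5(b"").hexdigest()


print(is_empty_input_md5(MAGIC_STORM_CHECKSUM))  # prints True
```

A client could use such a check to flag a returned checksum as suspicious before comparing it with a locally computed one.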

In StoRM v1.5+ there is partial support for checksumming provided that

  • the service is turned on by the local StoRM administrator
  • you know which checksum type the administrator configured.

For the following test, the StoRM SRM was configured with MD5 checksumming support:

$ lcg-cr -d $SRM/myfile-with-md5-cksum -l lfn://grid/pvier/janjust/my-storm-md5-file \
    file:/user/janjust/myfile  --checksum --checksum-type md5
guid:2f918356-2376-42b1-97bc-a4595a4987bc

OK, so the file is uploaded.

srmls output:

$ srmls -l $SRM/myfile-with-md5-cksum
  8508 /pvier/janjust/myfile-with-md5-cksum
 space token(s) :none found
  storage type:PERMANENT
  retention policy:REPLICA
  access latency:ONLINE
  locality:ONLINE
  - Checksum value:  14bd7bc6962e01631dc5c63b728453c0
  - Checksum type:  md5
   UserPermission: uid=11523 PermissionsRW
   GroupPermission: gid=11523 PermissionsRW
  WorldPermission: RW
 modified at:2010/03/23 16:07:28
   - Assigned lifetime (in seconds):  -1
  - Lifetime left (in seconds):  -1
  - Original SURL:  /pvier/janjust/myfile-with-md5-cksum
 - Status:  Successful request completion
 - Type:  FILE

(Note: md5sum myfile also returns 14bd7bc6962e01631dc5c63b728453c0)

But:

$ srmls -l $SRM                      
  0 /pvier/janjust/
  [...]
 - Type:  FILE
      8508 /pvier/janjust/myfile-with-md5-cksum
 space token(s) :none found
      storage type:PERMANENT
      retention policy:REPLICA
      access latency:ONLINE
      locality:ONLINE
       UserPermission: uid=11523 PermissionsRW
       GroupPermission: gid=11523 PermissionsRW
      WorldPermission: RW
     modified at:2010/03/23 16:07:28
       - Assigned lifetime (in seconds):  -1
      - Lifetime left (in seconds):  -1
      - Original SURL:  /pvier/janjust/myfile-with-md5-cksum
 - Status:  Successful request completion

When using srmls -l on a directory, StoRM returns the checksum only for the first file in that directory and skips the checksums for all other files!

lcg-get-checksum output:

$ lcg-get-checksum $SRM/...
d41d8cd98f00b204e9800998ecf8427e        $SRM/...

That's still the MagicStormChecksum. If we explicitly specify the checksum type:

$ lcg-get-checksum --checksum-type md5 $SRM/...
14bd7bc6962e01631dc5c63b728453c0        srm://srm.grid.rug.nl:8444/pvier/...

We do get the right checksum.

Conclusion:

  • from StoRM v1.5 onwards, checksumming support partially works
  • a user needs to know in advance which checksum type the StoRM administrator has configured
  • checksums are still calculated on the fly, so it is still quite possible to take down a StoRM SRM by requesting too many checksums
  • srmls -l shows only the checksum of the first file when multiple files are listed

CASTOR

lcg-cr output:

$ lcg-cr -d $SRM/myfile2 -l lfn:/grid/pvier/janjust/my-castor-file2 file://$PWD/myfile --checksum
[LCG-UTIL][lcg_get_checksum_surls][] $SRM/castor/....:
Communication error on send 
guid:64344a07-f3f3-41e5-9079-c86bf61a874e
lcg_cr: Communication error on send 

Afterwards the file is stored in SRM but not in the LFC.

srmls output (after 10 minutes; the checksum calculation happens in the background):

$ srmls -l $SRM/myfile
 8508 /castor/....
 space token(s) :none found
 type: null
 retentionpolicyinfo : null
  locality:ONLINE_AND_NEARLINE
  - Checksum value:  0xcd5d9820
  - Checksum type:  adler32 
   UserPermission: uid=dteam001 PermissionsRWX
   GroupPermission: gid=dteam PermissionsRWX
  WorldPermission: RW
 created at:2009/08/20 17:11:06

lcg-get-checksum output:

$ lcg-get-checksum $SRM/castor/....
0xcd5d9820	$SRM/castor/....

Notice that the checksum is prefixed with 0x here!
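
Because of this prefix, a plain string comparison between the CASTOR value and the value reported by dCache or DPM for the same file would fail. A minimal sketch of a normalization step before comparing, with the hypothetical helper names normalize_checksum and same_checksum:

```python
def normalize_checksum(raw):
    """Lowercase a checksum string and drop surrounding whitespace and a leading 0x."""
    text = raw.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    return text


def same_checksum(a, b):
    """Compare two checksum strings after normalization."""
    return normalize_checksum(a) == normalize_checksum(b)


# Values taken from the CASTOR and dCache outputs on this page.
print(same_checksum("0xcd5d9820", "cd5d9820"))  # prints True
```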