Checksumming support in SRM implementations
Some applications have requested checksumming support for files stored on the storage elements in the Netherlands. The first VO to request this was ATLAS, but other VOs are also very interested. Unfortunately, checksumming support appears to be implemented inconsistently across the different storage backend systems.
This Wiki page is the result of a short investigation into the checksumming differences between dCache, DPM, StoRM and CASTOR. It first gives a short overview of the checksumming features of each storage system, followed by the output of the relevant commands used in this investigation.
Why only dCache, DPM, StoRM and CASTOR? At the time of writing (August 25th 2009) the VO dteam had access to:
- 319 SEs in total, of which
- 195 DPM
- 64 dCache
- 35 StoRM
- 16 CASTOR
- 3 unknown (but at least one looks like DPM)
- 6 down
So I did not have access to other SRM systems.
Note: so-called GridFTP checksums are checksums that are calculated on the fly by the GridFTP server using a special GridFTP command (CKSM).
dCache
- srmping: v2.2 dCache production-1.9.3-3
- seems to support only adler32 checksums (which is what WLCG seems to want); according to Ron, md5 is also supported, but this requires a server reconfiguration
- checksums
- are computed when the file is transferred to dCache or during a move between pool nodes.
- are stored in the dCache namespace
- can be retrieved using the srmls -l command
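To compare a local file with the value dCache reports, the adler32 checksum has to be computed locally in the same textual form. The sketch below uses only Python's standard `zlib` module and streams the data in chunks so large files are not read into memory at once. The function names are my own (hypothetical), and zero-padding to 8 lowercase hex digits is an assumption based on the single value seen in the srmls output on this page (`cd5d9820`); I have not checked how dCache prints values with leading zeros.

```python
import zlib
from typing import BinaryIO, Iterable, Iterator


def read_chunks(f: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a binary file in fixed-size chunks (1 MiB by default)."""
    while chunk := f.read(chunk_size):
        yield chunk


def adler32_hex(chunks: Iterable[bytes]) -> str:
    """Adler-32 over a stream of byte chunks, as 8 zero-padded lowercase hex digits."""
    value = 1  # Adler-32 is seeded with 1, not 0
    for chunk in chunks:
        # Passing the running value lets zlib continue the checksum across chunks
        value = zlib.adler32(chunk, value)
    return f"{value:08x}"
```

Usage: `with open("myfile", "rb") as f: print(adler32_hex(read_chunks(f)))`. Splitting the input into chunks does not change the result, because `zlib.adler32` accepts the previous value as its starting point.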
DPM
- srmping: v2.2 DPM 1.7.0-5
- supports adler32, md5 and crc32 checksums
- supports only GridFTP checksums; adler32 and crc32 are supported via a special DPM-DSI GridFTP plugin
- checksums
- can be computed using the lcg-get-checksum command
- are stored in the DPM namespace, but can never be retrieved from there (!)
- are not displayed by the srmls -l command
- are recalculated on every request
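Since DPM offers three algorithms, it can be handy to compute all of them locally in a single pass and then compare whichever one the SE returns. This is a minimal Python sketch (function name hypothetical). One assumption to flag loudly: it uses zlib's CRC-32 (the IEEE/PKZIP variant); I did not verify which CRC variant DPM's crc32 uses, and it could be a different one, for example the POSIX `cksum` CRC, in which case the crc32 values will not match.

```python
import hashlib
import zlib
from typing import Dict, Iterable


def all_checksums(chunks: Iterable[bytes]) -> Dict[str, str]:
    """Compute adler32, crc32 and md5 in one pass over a stream of byte chunks."""
    adler = 1  # Adler-32 seed
    crc = 0    # zlib CRC-32 seed
    md5 = hashlib.md5()
    for chunk in chunks:
        adler = zlib.adler32(chunk, adler)
        crc = zlib.crc32(chunk, crc)
        md5.update(chunk)
    return {
        "adler32": f"{adler:08x}",
        # zlib/IEEE CRC-32; DPM may use a different CRC variant (unverified)
        "crc32": f"{crc:08x}",
        "md5": md5.hexdigest(),
    }
```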
StoRM
- srmping: v2.2 StoRM <FE:1.4.0-01.sl4><BE:1.4.0-00>
- supports only md5 GridFTP checksums
- checksums
- can be retrieved using the lcg-get-checksum command; however, the same value is returned for every file
- are not displayed by the srmls -l command
- are recalculated on every request
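The value StoRM returns for every file in the command output further down, d41d8cd98f00b204e9800998ecf8427e, is the well-known MD5 digest of zero bytes of input. The short Python check below demonstrates this. It suggests, although this investigation did not confirm it, that the StoRM versions tested hash an empty stream instead of the file contents. The helper name is hypothetical, and such a check only flags the symptom: a genuinely empty file legitimately has this digest too.

```python
import hashlib

# MD5 of zero bytes of input: the value StoRM returned for every file tested
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def looks_like_empty_md5(checksum: str) -> bool:
    """True if an MD5 value is really the digest of an empty input."""
    return checksum.strip().lower() == EMPTY_MD5
```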
CASTOR
- srmping: v2.2 CASTOR v2_7_15 2.1.7
- supports only adler32 checksums
- checksums
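- are computed asynchronously in the background, some time after the file has been written
- cannot be verified during upload with the lcg-cr --checksum option (the transfer fails with a communication error)
- can be retrieved using the srmls -l command
- are stored in the CASTOR namespace
- GridFTP checksums are not supported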
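The overview above can be condensed into a small lookup table, which makes the heterogeneity concrete: there is no single algorithm that all four backends support. This Python sketch encodes only what was observed in this investigation (August 2009, the versions listed); the table and function names are my own and hypothetical, and the dCache entry deliberately leaves out md5 because it reportedly needs a server reconfiguration.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Checksum algorithms per backend, as observed on this page (August 2009).
SUPPORTED: Dict[str, FrozenSet[str]] = {
    "dCache": frozenset({"adler32"}),  # md5 reportedly possible after reconfiguration
    "DPM": frozenset({"adler32", "md5", "crc32"}),
    "StoRM": frozenset({"md5"}),
    "CASTOR": frozenset({"adler32"}),
}


def common_algorithms(backends: Iterable[str]) -> FrozenSet[str]:
    """Return the checksum algorithms supported by every given backend."""
    result: Optional[FrozenSet[str]] = None
    for name in backends:
        algos = SUPPORTED[name]
        result = algos if result is None else result & algos
    return result if result is not None else frozenset()
```

For all four backends the intersection is empty; adler32 covers dCache, DPM and CASTOR, and md5 is the only option shared by DPM and StoRM.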
Command output
dCache
lcg-cr output:
$ lcg-cr -l /grid/pvier/janjust/my-dcache-file-checksum-adler32 -d $SRM file:/user/janjust/myfile --checksum
guid:d156757d-ecca-472e-adb9-64ebefde23c4
srmls output:
$ srmls -l $SRM/pnfs/....
8508 $SRM/pnfs/....
    space token(s) :15253459
    storage type:PERMANENT
    retention policy:CUSTODIAL
    access latency:NEARLINE
    locality:NEARLINE
    - Checksum value: cd5d9820
    - Checksum type: adler32
    UserPermission: uid=18010 PermissionsRW
    GroupPermission: gid=1276 PermissionsR
    WorldPermission: R
    created at:2009/08/22 00:02:18
    modified at:2009/08/22 00:02:18
    - Assigned lifetime (in seconds): -1
    - Lifetime left (in seconds): -1
    - Original SURL: $SRM/pnfs/....
    - Status: null
    - Type: FILE
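When checking many files it is convenient to pull the checksum fields out of srmls -l output programmatically. The Python sketch below does this with two regular expressions. It assumes the "Checksum value:" / "Checksum type:" field labels exactly as they appear in the dCache and CASTOR output on this page; the function name is hypothetical, and other srmls client versions may format these fields differently. It returns the value as printed, so a CASTOR-style 0x prefix is kept.

```python
import re
from typing import Optional, Tuple

# Field labels as seen in the srmls -l output on this page (assumed format)
_VALUE_RE = re.compile(r"Checksum value:\s*(\S+)")
_TYPE_RE = re.compile(r"Checksum type:\s*(\S+)")


def parse_srmls_checksum(output: str) -> Optional[Tuple[str, str]]:
    """Return (type, value) from srmls -l output, or None if no checksum is listed."""
    value = _VALUE_RE.search(output)
    ctype = _TYPE_RE.search(output)
    if value is None or ctype is None:
        return None  # e.g. the DPM output on this page has no checksum fields
    return ctype.group(1), value.group(1)
```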
lcg-get-checksum output:
$ lcg-get-checksum $SRM/pnfs/....
cd5d9820 $SRM/pnfs/....
DPM
lcg-cr output:
$ lcg-cr -l /grid/pvier/janjust/my-dpm-file-checksum-adler32 -d $SRM file:/user/janjust/myfile --checksum
guid:2352aac8-37b9-4cfd-ba91-5eaee71fd5f1
srmls output:
$ srmls -l $SRM
8508 /dpm/science.uu.nl/....
    space token(s) :none found
    storage type:PERMANENT
    retentionpolicyinfo : null
    locality:ONLINE
    UserPermission: uid=/O=dutchgrid/O=users/O=nikhef/CN=Jan Just Keijser PermissionsRW
    GroupPermission: gid=pvier PermissionsRW
    WorldPermission: R
    created at:2009/08/21 23:56:27
    modified at:2009/08/21 23:56:27
    - Lifetime left (in seconds): -1
    - Original SURL: /dpm/science.uu.nl/....
    - Status: null
    - Type: FILE
lcg-get-checksum output:
$ lcg-get-checksum $SRM/dpm/science.uu.nl/....
cd5d9820 $SRM/dpm/science.uu.nl/....
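Because DPM only exposes its checksums through lcg-get-checksum, a verification script has to parse that command's output and compare it with a locally computed value. The Python sketch below assumes the "<checksum> <surl>" layout seen on this page and compares against a local adler32; both function names are hypothetical. Note that it compares the strings literally after lowercasing, so a value printed with a 0x prefix (as CASTOR does) would not match.

```python
import zlib
from typing import Iterable, Tuple


def parse_lcg_get_checksum(line: str) -> Tuple[str, str]:
    """Split one line of lcg-get-checksum output into (checksum, surl)."""
    checksum, surl = line.split(maxsplit=1)
    return checksum, surl.strip()


def matches_local_adler32(line: str, chunks: Iterable[bytes]) -> bool:
    """Compare the remote checksum with an adler32 computed over local data."""
    remote, _surl = parse_lcg_get_checksum(line)
    local = 1  # Adler-32 seed
    for chunk in chunks:
        local = zlib.adler32(chunk, local)
    return remote.lower() == f"{local:08x}"
```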
StoRM
lcg-cr output:
$ lcg-cr -l /grid/pvier/janjust/my-storm-file-checksum-adler32 -d $SRM file:/user/janjust/myfile --checksum
[LCG-UTIL][lcg_cr4][] Destination may be corrupted: Source checksum (cd5d9820) != Destination checksum (d41d8cd98f00b204e9800998ecf8427e )
[LCG-UTIL][lcg_cp4][] $SRM/pvier/.... has been DELETED, please try again (temporary network problem? )
lcg_cr: Resource temporarily unavailable
a different file:
$ lcg-cr -l /grid/pvier/janjust/my-storm-file-checksum-adler32 -d $SRM file:/user/janjust/wms_soap_msgs.txt --checksum
[LCG-UTIL][lcg_cr4][] Destination may be corrupted: Source checksum (8d9e8485) != Destination checksum (d41d8cd98f00b204e9800998ecf8427e)
[LCG-UTIL][lcg_cp4][] $SRM/pvier/.... has been DELETED, please try again (temporary network problem?)
lcg_cr: Resource temporarily unavailable
Interesting: the destination checksum is the same, even though it is a completely different file?!
srmls output: not shown, as the file was not copied successfully.
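There is a second problem hidden in these error messages: the two values have different lengths. The source checksum has 8 hex digits, which is the size of a 32-bit checksum such as adler32 (and cd5d9820 matches the adler32 that dCache reports for the same file), while the destination checksum has 32 hex digits, the size of an MD5 digest. So lcg-cr appears to be comparing values from different algorithms, which could never match even if StoRM returned a correct MD5. The Python heuristic below guesses the checksum family from the digit count; it is my own sketch, not part of any grid tool, and it cannot tell adler32 from crc32 or recognize a value whose leading zeros were dropped.

```python
from typing import Optional

# Number of hex digits per checksum family (32-bit checksums -> 8 digits)
_HEX_LENGTHS = {8: "adler32 or crc32", 32: "md5"}
_HEX_DIGITS = set("0123456789abcdef")


def guess_algorithm(checksum: str) -> Optional[str]:
    """Guess the checksum family from the number of hex digits (heuristic only)."""
    digits = checksum.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if not digits or not set(digits) <= _HEX_DIGITS:
        return None  # not a hex string at all
    return _HEX_LENGTHS.get(len(digits))
```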
For another file that was copied to StoRM without --checksum:
$ lcg-get-checksum $SRM/pvier/....
d41d8cd98f00b204e9800998ecf8427e $SRM/pvier/....
Wait! That is exactly the same checksum as shown above!
Let's try a different StoRM SRM:
$ lcg-get-checksum srm://storm-fe-cms.cr.cnaf.infn.it/dteam/testfile-cp-20090727-171238-6e877e21562d1.txt
d41d8cd98f00b204e9800998ecf8427e srm://storm-fe-cms.cr.cnaf.infn.it/dteam/testfile-cp-20090727-171238-6e877e21562d1.txt
$ lcg-get-checksum srm://storm-fe-cms.cr.cnaf.infn.it/dteam/helloworld.txt1248346365.88
d41d8cd98f00b204e9800998ecf8427e srm://storm-fe-cms.cr.cnaf.infn.it/dteam/helloworld.txt1248346365.88
Again, the same checksum. Note that these files were already present on the SRM system.
So at least we know it's not a site-specific issue ;-)
CASTOR
lcg-cr output:
$ lcg-cr -d $SRM/myfile2 -l lfn:/grid/pvier/janjust/my-castor-file2 file://$PWD/myfile --checksum
[LCG-UTIL][lcg_get_checksum_surls][] $SRM/castor/....: Communication error on send
guid:64344a07-f3f3-41e5-9079-c86bf61a874e
lcg_cr: Communication error on send
Afterwards, the file is stored on the SRM but is not registered in the LFC.
srmls output (after 10 minutes; the checksum calculation happens in the background):
$ srmls -l $SRM/myfile
8508 /castor/....
    space token(s) :none found
    type: null
    retentionpolicyinfo : null
    locality:ONLINE_AND_NEARLINE
    - Checksum value: 0xcd5d9820
    - Checksum type: adler32
    UserPermission: uid=dteam001 PermissionsRWX
    GroupPermission: gid=dteam PermissionsRWX
    WorldPermission: RW
    created at:2009/08/20 17:11:06
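Because CASTOR fills in the checksum only after some minutes, a script that reads it right after the upload will find nothing. A simple remedy is to poll until the value appears or a timeout expires. The Python sketch below is generic: `fetch` is a hypothetical caller-supplied function (for example one that runs srmls -l and extracts the checksum field) that returns None while no checksum is available yet. The 15-minute timeout and 60-second interval are arbitrary choices of mine, loosely based on the roughly 10 minutes observed here; `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional


def wait_for_checksum(
    fetch: Callable[[], Optional[str]],
    timeout: float = 900.0,
    interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Call fetch() until it returns a checksum or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        value = fetch()
        if value is not None:
            return value  # checksum has been computed by the backend
        if clock() >= deadline:
            return None  # give up; the checksum is still not available
        sleep(interval)
```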
lcg-get-checksum output:
$ lcg-get-checksum $SRM/castor/....
0xcd5d9820 $SRM/castor/....
Notice that here the checksum is prefixed with 0x, whereas dCache and DPM report the same adler32 value without it!
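A client that compares checksums from several backends therefore has to normalize them first, otherwise 0xcd5d9820 (CASTOR) and cd5d9820 (dCache, DPM) count as different. A minimal Python sketch, with hypothetical function names: it strips an optional 0x prefix, lowercases, and zero-pads to 8 hex digits for 32-bit checksums such as adler32. Whether any backend actually drops leading zeros is an assumption I have not tested; the padding is only there to be safe.

```python
_HEX_DIGITS = set("0123456789abcdef")


def normalize_checksum(value: str, width: int = 8) -> str:
    """Strip an optional 0x prefix, lowercase, and zero-pad to `width` hex digits."""
    digits = value.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if not digits or not set(digits) <= _HEX_DIGITS:
        raise ValueError(f"not a hex checksum: {value!r}")
    return digits.zfill(width)


def checksums_equal(a: str, b: str, width: int = 8) -> bool:
    """Compare two checksum strings after normalization."""
    return normalize_checksum(a, width) == normalize_checksum(b, width)
```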