Difference between revisions of "SintMaarten network"
Revision as of 11:30, 24 November 2009
In October 2009 the SintMaarten cluster was commissioned. This cluster is based on HP blades. Soon after commissioning a serious performance issue was reported:
[BG-NLT1-Support] #287: bad gridftp transfer rate - smrt wns
This page is the result of the analysis of this performance issue.
Problem report
The reported performance issue was seen when copying a file from the Nikhef storage system to a SintMaarten worker node. Transfer speeds were OK at first but dropped to very low levels after about 120 MB of data, eventually causing timeouts in the lcg-cp command used. Copying the exact same file from the exact same storage element to a slightly older worker node did not show this problem:
===
wn-smrt-006 (Bad!)
===
# lcg-cp --vo atlas -v srm://.... file://.....
[snip]
# streams: 1
62914560 bytes 1279.98 KB/sec avg 512.00 KB/sec inst
vs
===
wn-val-066 (Good!)
===
# lcg-cp --vo atlas -v srm://.... file://.....
[snip]
# streams: 1
1672478720 bytes 68053.21 KB/sec avg 70142.84 KB/sec inst
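To put a number on the difference between the two worker nodes, the progress lines above can be parsed and compared. The following is a minimal sketch, assuming the progress line format is exactly as shown in the two excerpts; the function name parse_progress and the regular expression are illustrative and not part of lcg-cp.

```python
import re

# Matches progress lines of the form shown in the excerpts above, e.g.
# "62914560 bytes 1279.98 KB/sec avg 512.00 KB/sec inst"
# (the format is assumed from those two lines only).
PROGRESS_RE = re.compile(
    r"(?P<bytes>\d+) bytes\s+"
    r"(?P<avg>[\d.]+) KB/sec avg\s+"
    r"(?P<inst>[\d.]+) KB/sec inst"
)


def parse_progress(line):
    """Return bytes transferred and the two reported rates, or None if no match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return {
        "bytes": int(match.group("bytes")),
        "avg_kb_s": float(match.group("avg")),
        "inst_kb_s": float(match.group("inst")),
    }


# The two progress lines quoted in the problem report.
bad = parse_progress("62914560 bytes 1279.98 KB/sec avg 512.00 KB/sec inst")
good = parse_progress("1672478720 bytes 68053.21 KB/sec avg 70142.84 KB/sec inst")

# Ratio of the average rates: how much faster the older worker node was.
ratio = good["avg_kb_s"] / bad["avg_kb_s"]
print(f"average rate, good vs bad: {ratio:.1f}x")
```

With the figures quoted above, the older worker node averaged roughly fifty times the rate of the SintMaarten node at the moment the snapshots were taken.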
Analysis
At first it was thought that the lcg-cp command itself was causing the error:
- when copying the file using lcg-cp the command timed out after several minutes
- when copying the exact same file using globus-url-copy the command finished in less than a minute
However, when using
globus-url-copy -nodcau
the command also timed out. It is worth noting that
- -nodcau means 'no data channel authentication'; it makes file transfers less secure, but it works better when firewalls are in place
- lcg-cp also disables data channel authentication, which pointed away from lcg-cp itself and towards transfers made without data channel authentication
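The comparison described above (the same copy with and without -nodcau) could be repeated in a controlled way with a small wrapper. This is only a sketch under stated assumptions: the helper names build_copy_command and timed_copy, the timeout value, and the URLs in the usage comment are hypothetical, and globus-url-copy is assumed to be on the PATH with a valid grid proxy.

```python
import subprocess
import time


def build_copy_command(src_url, dst_url, nodcau=False):
    """Build the globus-url-copy argument list, optionally adding -nodcau."""
    cmd = ["globus-url-copy"]
    if nodcau:
        # Disable data channel authentication, as discussed above.
        cmd.append("-nodcau")
    cmd += [src_url, dst_url]
    return cmd


def timed_copy(src_url, dst_url, nodcau=False, timeout_s=600):
    """Run one copy and return (return code or None on timeout, elapsed seconds)."""
    cmd = build_copy_command(src_url, dst_url, nodcau=nodcau)
    start = time.monotonic()
    try:
        returncode = subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # The copy was still running when the (hypothetical) timeout expired.
        returncode = None
    return returncode, time.monotonic() - start


# Hypothetical usage; the URLs are placeholders, not the ones from the ticket:
# for flag in (False, True):
#     rc, elapsed = timed_copy("gsiftp://<storage-element>/<path>",
#                              "file:///tmp/testfile", nodcau=flag)
#     print(f"nodcau={flag}: rc={rc}, {elapsed:.1f} s")
```

Running both variants from the same worker node against the same file keeps everything except the data channel authentication setting constant.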