Job matching on graszode vs graspol

From PDP/Grid Wiki

Latest revision as of 10:56, 17 February 2010

Issue

Running glite-wms-job-list-match against graszode and graspol gives different results: graszode returns the list of matching CEs/queues, while graspol returns an empty list. The JDL is the following:

 Type = "job";
 JobType = "normal";
 Executable = "/bin/sh";
 Arguments  = "mach2dat.sh ergo test.dat test.ped ergo.chr1_1.mlinfo ergo.chr1_1.dose.gz";
 StdOutput = "lfn.out";
 StdError  = "lfn.err";
 InputSandbox  = {"mach2dat.sh", "test.dat", "test.ped"};
 OutputSandbox = {"lfn.out", "lfn.err"};
 DataCatalog = "http://lfc.grid.sara.nl:8085";
 InputData   = {
       "lfn:/grid/lsgrid/aabuseiris/grimp/bin/mach2dat",
       "lfn:/grid/lsgrid/aabuseiris/grimp/datasets/ergo/ergo.chr1_1.mlinfo",
       "lfn:/grid/lsgrid/aabuseiris/grimp/datasets/ergo/ergo.chr1_1.dose.gz"
 };
 DataAccessProtocol = {"rfio","gsiftp","gsidcap","https"};


These are the results:

 $ glite-wms-job-list-match -d fbernabe -e https://graszode.nikhef.nl:7443/glite_wms_wmproxy_server ego.jdl
 Connecting to the service https://graszode.nikhef.nl:7443/glite_wms_wmproxy_server
 ==========================================================================
    COMPUTING ELEMENT IDs LIST 
  The following CE(s) matching your job requirements have been found:
    *CEId*
  - gb-ce-emc.erasmusmc.nl:2119/jobmanager-pbs-express
  - gb-ce-emc.erasmusmc.nl:2119/jobmanager-pbs-medium
  - gb-ce-rug.sara.usor.nl:8443/cream-pbs-express
  - gb-ce-rug.sara.usor.nl:8443/cream-pbs-medium
  - gb-ce-ams.els.sara.nl:2119/jobmanager-pbs-express
  - gb-ce-ams.els.sara.nl:2119/jobmanager-pbs-medium
 ==========================================================================
 $ glite-wms-job-list-match -d fbernabe -e https://graspol.nikhef.nl:7443/glite_wms_wmproxy_server ego.jdl
 Connecting to the service https://graspol.nikhef.nl:7443/glite_wms_wmproxy_server
 ==================== glite-wms-job-list-match failure ====================
   No Computing Element matching your job requirements has been found!
 ==========================================================================
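To compare the two results mechanically, the CE IDs can be extracted from each output and diffed. The following is a minimal sketch, assuming the output format shown above (one CE per line, prefixed with "- "). The function names extract_ces and match_ces and the file names graszode.ces and graspol.ces are hypothetical; the -d and -e options are used exactly as in the commands above.

```shell
#!/bin/sh
# Extract CE IDs (lines of the form "  - host:port/queue") from
# glite-wms-job-list-match output read on stdin, sorted for comparison.
# A failed match prints no such lines, so the result is empty.
extract_ces() {
    sed -n 's/^[[:space:]]*-[[:space:]]\{1,\}\([^[:space:]]\{1,\}\)[[:space:]]*$/\1/p' | sort
}

# Hypothetical helper: run the match against one WMProxy endpoint.
#   $1 = delegation ID, $2 = WMProxy endpoint URL, $3 = JDL file
match_ces() {
    glite-wms-job-list-match -d "$1" -e "$2" "$3" 2>&1 | extract_ces
}

# Usage sketch (requires a valid proxy and the gLite UI tools):
#   match_ces fbernabe https://graszode.nikhef.nl:7443/glite_wms_wmproxy_server ego.jdl > graszode.ces
#   match_ces fbernabe https://graspol.nikhef.nl:7443/glite_wms_wmproxy_server ego.jdl > graspol.ces
#   comm -3 graszode.ces graspol.ces   # CEs reported by only one of the two WMSs
```

An empty comm -3 output would mean both WMSs matched the same CEs; in the case described here, every CE would appear in the graszode column only.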

Cause

Savannah bug https://savannah.cern.ch/bugs/index.php?57421 may give more information about the cause.


Workaround

Restarting glite-wms-wm on the problematic WMS solves the issue.


Resolution

The WMS still has to be monitored until we are sure that this issue does not recur.

Notes

A couple of notes were posted in the bug:

 Marco Cecchi: Hi, next time it happens please check the ism dump file. Before restarting, just wait for another purchase cycle.
 Stephen Burke: I'm not sure it's exactly the same, in this case it would be a partial failure because apparently the match worked without the data
                requirement, so either the problem is the dynamic lookup from the LFC or it's a loss of the close SE info only.