Job matching on graszode vs graspol

From PDP/Grid Wiki
(Described issue of ticket 312.)
 
Latest revision as of 12:56, 17 February 2010

Issue

Running glite-wms-job-list-match against graszode and graspol gives different results: one of them returns the list of matching CEs/queues, while the other returns an empty list. The JDL is the following:

 Type = "job";
 JobType = "normal";
 Executable = "/bin/sh";
 Arguments  = "mach2dat.sh ergo test.dat test.ped ergo.chr1_1.mlinfo ergo.chr1_1.dose.gz";
 StdOutput = "lfn.out";
 StdError  = "lfn.err";
 InputSandbox  = {"mach2dat.sh", "test.dat", "test.ped"};
 OutputSandbox = {"lfn.out", "lfn.err"};
 DataCatalog = "http://lfc.grid.sara.nl:8085";
 InputData   = {
       "lfn:/grid/lsgrid/aabuseiris/grimp/bin/mach2dat",
       "lfn:/grid/lsgrid/aabuseiris/grimp/datasets/ergo/ergo.chr1_1.mlinfo",
       "lfn:/grid/lsgrid/aabuseiris/grimp/datasets/ergo/ergo.chr1_1.dose.gz"
 };
 DataAccessProtocol = {"rfio","gsiftp","gsidcap","https"};


These are the results:

 $ glite-wms-job-list-match -d fbernabe -e https://graszode.nikhef.nl:7443/glite_wms_wmproxy_server ego.jdl
 Connecting to the service https://graszode.nikhef.nl:7443/glite_wms_wmproxy_server
 ==========================================================================
    COMPUTING ELEMENT IDs LIST 
  The following CE(s) matching your job requirements have been found:
    *CEId*
  - gb-ce-emc.erasmusmc.nl:2119/jobmanager-pbs-express
  - gb-ce-emc.erasmusmc.nl:2119/jobmanager-pbs-medium
  - gb-ce-rug.sara.usor.nl:8443/cream-pbs-express
  - gb-ce-rug.sara.usor.nl:8443/cream-pbs-medium
  - gb-ce-ams.els.sara.nl:2119/jobmanager-pbs-express
  - gb-ce-ams.els.sara.nl:2119/jobmanager-pbs-medium
 ==========================================================================
 $ glite-wms-job-list-match -d fbernabe -e https://graspol.nikhef.nl:7443/glite_wms_wmproxy_server ego.jdl
 Connecting to the service https://graspol.nikhef.nl:7443/glite_wms_wmproxy_server
 ==================== glite-wms-job-list-match failure ====================
   No Computing Element matching your job requirements has been found!
 ==========================================================================
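
To repeat this comparison in one go, the two commands above can be wrapped in a small script. This is only a minimal sketch: the function names `classify_match_output` and `compare_wms` are hypothetical, and the classification relies solely on the two messages visible in the output above. Anything else (for example an expired proxy) is reported as "unknown".

```shell
#!/bin/sh
# Classify glite-wms-job-list-match output read from stdin.
# Prints "no-match", "match" or "unknown", based only on the
# messages shown in the results above.
classify_match_output() {
    output=$(cat)
    case "$output" in
        *"No Computing Element matching your job requirements has been found"*)
            echo "no-match" ;;
        *"matching your job requirements have been found"*)
            echo "match" ;;
        *)
            echo "unknown" ;;
    esac
}

# Run the list-match against each WMS host and print one result per host.
# Usage: compare_wms <delegation-id> <jdl-file> <wms-host>...
# Assumes every WMProxy endpoint follows the URL pattern used above.
compare_wms() {
    deleg="$1"
    jdl="$2"
    shift 2
    for host in "$@"; do
        endpoint="https://${host}:7443/glite_wms_wmproxy_server"
        result=$(glite-wms-job-list-match -d "$deleg" -e "$endpoint" "$jdl" 2>&1 \
                 | classify_match_output)
        echo "${host}: ${result}"
    done
}
```

With these functions sourced, `compare_wms fbernabe ego.jdl graszode.nikhef.nl graspol.nikhef.nl` prints one line per WMS, which makes it easy to see when the two disagree.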

Cause

See bug https://savannah.cern.ch/bugs/index.php?57421, which may give more information about the cause.


Workaround

Restarting glite-wms-wm on the affected WMS solves the issue.


Resolution

The WMS still has to be monitored until we are sure that the issue does not recur.

Notes

A couple of notes were posted in the bug:

 Marco Cecchi: Hi, next time it happens please check the ism dump file. Before restarting, just wait for another purchase cycle.
 Stephen Burke: I'm not sure it's exactly the same, in this case it would be a partial failure because apparently the match worked without the data
                requirement, so either the problem is the dynamic lookup from the LFC or it's a loss of the close SE info only.
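
Stephen Burke's remark suggests a quick way to narrow the problem down: run the list-match again with the data-related attributes removed from the JDL. The following is a minimal sketch of that step, not part of the original report. The function name `strip_data_requirements` is hypothetical, and it assumes that each attribute ends with a semicolon at the end of a line, as in the JDL above.

```shell
#!/bin/sh
# Print a JDL file with the data-related attributes removed
# (DataCatalog, InputData, DataAccessProtocol), including
# values that span several lines.
# Usage: strip_data_requirements <jdl-file>
strip_data_requirements() {
    awk '
        # Start skipping at a data-related attribute ...
        /^[ \t]*(DataCatalog|InputData|DataAccessProtocol)[ \t]*=/ { skipping = 1 }
        # ... print only the lines outside such an attribute ...
        !skipping { print }
        # ... and stop skipping after the terminating semicolon.
        skipping && /;[ \t]*$/ { skipping = 0 }
    ' "$1"
}
```

For example, `strip_data_requirements ego.jdl > ego-nodata.jdl` (the output file name is arbitrary) produces a JDL that can be passed to glite-wms-job-list-match on the affected WMS. If that match succeeds while the full JDL fails, the problem is in the data requirement rather than in CE matching as a whole.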