= Passing job requirements through the WMS =

== Abstract ==

Contrary to available documentation on CREAM and WMS middleware, the only known way to pass multi-core and memory requirements through the WMS is

# by using the JDL parameters SMPGranularity, CPUNumber and CERequirements as indicated by the example below, and
# by submitting to a CREAM CE and not a Globus GRAM5 CE.

The typical stanza in the JDL for multi-core jobs would read:

CPUNumber = m;
SMPGranularity = n;

where ''m'' and ''n'' are integers indicating, respectively, the total number of cores for the job and the number of cores per node. For high-memory jobs the following line is required:

CERequirements = "other.GlueHostMainMemoryRAMSize >= j";

where ''j'' is the total number of megabytes required.
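
For instance (the numbers here are purely illustrative), a job needing eight cores in total, laid out as four cores per node, would declare:

CPUNumber = 8;
SMPGranularity = 4;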

For specifying required CPU time or wall clock time, use e.g.:

CERequirements = "other.GlueCEPolicyMaxCPUTime >= 5 && other.GlueCEPolicyMaxWallClockTime >= 8";

to ask for five minutes of CPU time and eight minutes of total run time.
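
Since CERequirements is a single string expression, memory and run-time constraints can be combined with &&; for example (values illustrative):

CERequirements = "other.GlueHostMainMemoryRAMSize >= 4096 && other.GlueCEPolicyMaxWallClockTime >= 60";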

== Requesting more memory for Grid jobs ==

=== Statement of the problem ===

Different users have different needs. Scheduling computational resources such as CPU cores and memory in a grid environment in a way that treats all users fairly requires enforced limits on the consumption of those resources.

By default, a job slot is treated as a simple unit of computation: you get one core and some predetermined amount of memory for a fixed duration, as set by the queue properties. Want more cores? You get more memory. Want more memory? You are also allocated more cores. But increasingly we see users who just wish to run a single process on hardware with lots of memory, which leaves cores idling that other users could still use.

=== The solution ===

Job schedulers can do a remarkable job of mixing and matching requirements, placing jobs on nodes and assigning resources, provided the jobs indicate exactly what they really need. For multi-core jobs this means opportunistic back-filling with short jobs; for high-memory jobs it means co-location with low-memory jobs. If a user indicates her requirements, the scheduler may accommodate them.

=== The Catch ===

There is a catch. Grid jobs are submitted through systems such as the CREAM CE and the EMI WMS. Jobs are expressed in a language called JDL (Job Description Language), and over time there have been attempts to treat certain expressions in this language as parameters that should be passed on to the local batch system (a.k.a. the LRMS). Not all of the advertised ways work equally well.

=== What doesn't work ===

Let's start by mentioning what doesn't work. This has been tested on a reasonably up-to-date system, though not the latest and greatest; the software in production comes from the UMD-3 distribution:

glite-wms-ui-commands 3.4.0 UMD3
glite-wms-core 3.5.0 UMD3
emi-cream-ce 1.2.2 UMD3

According to the CREAM documentation, a requirement like

Requirements = "other.GlueHostMainMemoryRAMSize > 100"

which states that the host should have at least 100 MB of main memory, should be forwarded to CREAM as a CE requirement if the WMS configuration (/etc/glite-wms/glite_wms.conf) lists the parameter in the CeForwardParameters setting (in the WorkloadManager section). This doesn't work. It is unclear whether it ever did, but it certainly does not with the above versions of the software. Since this was tracked in Savannah bug #42288, it should have worked at some point, so this may be a regression.
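
For reference, the relevant stanza in /etc/glite-wms/glite_wms.conf would look roughly as follows; the attribute list shown here is a sketch of the expected syntax, not a verified working configuration:

WorkloadManager = [
    // other WorkloadManager settings omitted
    CeForwardParameters = {"GlueHostMainMemoryRAMSize", "GlueCEPolicyMaxCPUTime"};
];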

Another way, although not widely advertised, involves generic parameter passing from the WMS through the PropagateToLRMS setting:

PropagateToLRMS = {
    [ name = "smpgranularity"; value = jdl.SMPGranularity ],
    [ name = "wholenodes"; value = jdl.WholeNodes; requires = jdl.WholeNodes == true; ],
    [ name = "hostsmpsize"; value = ce.GlueHostArchitectureSMPSize ],
    [ name = "mpi_type"; value = jdl.MpiType; requires = ce.GlueCEInfoLRMSType == "lsf"; ],
    [ name = "hostmainmem"; value = ce.GlueHostMainMemoryRAMSize; requires = ce.GlueCEInfoLRMSType == "pbs"; ]
};

See Savannah bug #58878. This also does not work.

==== WMS to GRAM5 ====

What certainly doesn't work is passing parameters to GRAM5 from the WMS. Memory requirements are not passed at all, and multi-core requirements are mangled as well. A JDL statement like

SMPGranularity = 4;
CPUNumber = 4;

is passed straight to CREAM and correctly translated into the PBS directive

#PBS -l nodes=1:ppn=4

but when sent to GRAM5, the Globus RSL reads

(queue=short)(jobtype=single)(count=4)(hostCount=4)

which is translated to

#PBS -l nodes=4

and that is certainly not correct: it requests four separate nodes rather than four cores on a single node.

This information can be found by running commands such as glite-wms-job-info, which shows the JDL (the original when given -j, or the version passed to CREAM when given --jdl), or glite-wms-job-logging-info -v 3, which shows the RSL sent to Globus.
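
As a usage sketch, following the options just described (<job_id> is a placeholder for the identifier returned at submission time):

glite-wms-job-info -j <job_id>             # original JDL
glite-wms-job-info --jdl <job_id>          # JDL as passed to CREAM
glite-wms-job-logging-info -v 3 <job_id>   # verbose log, including the RSL sent to Globus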

=== What does work ===

It turns out that it is possible to simply use the CERequirements parameter in the JDL. This works for direct submission to CREAM, but apparently also through the WMS. There is little documentation, but see this article on WNoDeS.

The following JDL requests 8 GB of memory:

Executable     = "memlimits.sh";
StdOutput      = "stdout.txt";
StdError       = "stderror.txt";
InputSandbox   = "memlimits.sh";
OutputSandbox  = {"stdout.txt","stderror.txt"};
CERequirements = "other.GlueHostMainMemoryRAMSize >= 8192";

This works, but not in combination with a GRAM5 CE. Note that some configuration is required on the CREAM side to pick up these requirements.
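
For reference, submitting this JDL might look as follows. The commands are the standard gLite UI ones, but the CE endpoint and queue name are made-up placeholders:

# through the WMS, with automatic proxy delegation
glite-wms-job-submit -a memlimits.jdl

# or directly to a CREAM CE
glite-ce-job-submit -a -r cream.example.org:8443/cream-pbs-medium memlimits.jdl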

Here is the memlimits.sh script used above. It just allocates memory until it hits the limit.

#!/bin/bash
echo "\$ ulimit -a"
ulimit -a
cat > memeat.c <<'EOF'
#include <stdio.h>
#include <stdlib.h>
/* Eat up memory, one chunk at a time, until allocation fails. */
int main(int argc, char *argv[])
{
    size_t chunk = 1024 * 1024; /* one-megabyte allocation size */
    size_t total = 0;           /* how many bytes we got in total */
    printf("Going to allocate %zu bytes at a time\n", chunk);
    while (1) {
        printf(".");
        fflush(stdout); /* show progress immediately, not just at exit */
        if (NULL == malloc(chunk)) {
            break;
        }
        total += chunk;
    }
    printf("\nThat's our limit. Total allocation: %zu bytes (%zu kB)\n",
           total, total / 1024);
    return 0;
}
EOF
echo "compiling memeat.c"
gcc -o memeat memeat.c
echo "running memeat. Cross fingers."
./memeat
echo "All done, exit code = $?"