Passing job requirements through the WMS
Abstract
Contrary to available documentation on CREAM and WMS middleware, the only known way to pass multi-core and memory requirements through the WMS is
- by using the JDL parameters SMPGranularity, CPUNumber and CERequirements as indicated by the example below and
- by submitting to a CREAM CE and not a Globus GRAM5 CE.
The typical stanza in the JDL for multi-core jobs would read:
CPUNumber = m;
SMPGranularity = n;
where m and n are integers indicating, respectively, the total number of cores for the job and the number of cores per node. For high-memory jobs the following line is required:
CERequirements = "other.GlueHostMainMemoryRAMSize >= j";
where j is the total number of megabytes required.
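Putting both together, a complete JDL for a job that needs, say, four cores on a single node and at least 4 GB of RAM could look like the following sketch (the script name and sandbox files are placeholders):
Executable     = "multicore.sh";
StdOutput      = "stdout.txt";
StdError       = "stderror.txt";
InputSandbox   = "multicore.sh";
OutputSandbox  = {"stdout.txt","stderror.txt"};
CPUNumber      = 4;
SMPGranularity = 4;
CERequirements = "other.GlueHostMainMemoryRAMSize >= 4096";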
Requesting more memory for Grid jobs
Statement of the problem
Different users have different needs. Scheduling the computational resources of a grid environment, cores and memory, in a way that treats all users fairly requires enforced limits on the consumption of those resources.
By default, a job slot is treated as a simple unit of computation: you get one core and a predetermined amount of memory for a fixed duration, as set by the queue properties. Want more cores? You also get more memory. Want more memory? You are also allocated more cores. But increasingly we see users who just wish to run a single process on hardware with lots of memory. This leaves cores idling that other users could otherwise put to work.
The solution
Job schedulers can do a remarkable job of mixing and matching requirements, placing jobs on nodes, and assigning resources, provided the jobs state exactly what they really need. For multi-core jobs this means opportunistic back-filling with short jobs; for high-memory jobs it means co-location with low-memory jobs. If a user indicates her requirements, the scheduler may accommodate them.
The catch
There is a catch. Grid jobs are submitted through systems such as the CREAM CE and the EMI WMS. The jobs are expressed in the Job Description Language (JDL), and over time there have been attempts to treat certain expressions in this language as parameters to be passed on to the local batch system (a.k.a. the LRMS). Not all of the advertised ways work equally well.
What doesn't work
Let's start by mentioning what doesn't work. This has been tested on a reasonably up-to-date system, though not the latest and greatest; the software in production comes from the UMD-3 distribution.
package               | version | repository
----------------------|---------|-----------
glite-wms-ui-commands | 3.4.0   | UMD3
glite-wms-core        | 3.5.0   | UMD3
emi-cream-ce          | 1.2.2   | UMD3
According to the CREAM documentation, a requirement like
Requirements = "other.GlueHostMainMemoryRAMSize > 100"
which states that the host should have at least 100 MB of main memory, should be forwarded to CREAM as a CE requirement if the WMS configuration (/etc/glite-wms/glite_wms.conf) lists the parameter in the CeForwardParameters setting (in the WorkloadManager section). This doesn't work. Whether it ever did is unclear, but it certainly does not with the software versions listed above. Since this was tracked in Savannah bug #42288, it should have worked, so this might be a regression.
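For reference, the relevant part of the WorkloadManager section would look roughly like the sketch below; the attribute list shown here is illustrative, not a verified default, and other settings in the section are omitted:
WorkloadManager = [
    CeForwardParameters = {"GlueHostMainMemoryVirtualSize",
                           "GlueHostMainMemoryRAMSize",
                           "GlueCEPolicyMaxCPUTime"};
];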
Another way, although not widely advertised, involves generic parameter passing from the WMS through the PropagateToLRMS setting:
PropagateToLRMS = {
    [ name = "smpgranularity"; value = jdl.SMPGranularity ],
    [ name = "wholenodes";     value = jdl.WholeNodes; requires = jdl.WholeNodes == true; ],
    [ name = "hostsmpsize";    value = ce.GlueHostArchitectureSMPSize ],
    [ name = "mpi_type";       value = jdl.MpiType; requires = ce.GlueCEInfoLRMSType == "lsf"; ],
    [ name = "hostmainmem";    value = ce.GlueHostMainMemoryRAMSize; requires = ce.GlueCEInfoLRMSType == "pbs"; ]
};
See Savannah bug #58878. This also does not work.
WMS to GRAM5
What certainly doesn't work is passing parameters from the WMS to GRAM5. Memory requirements are not passed at all, and multi-core requirements get mangled as well. A JDL statement like
SMPGranularity = 4;
CPUNumber = 4;
is passed straight to CREAM and correctly translated into the PBS directive
#PBS -l nodes=1:ppn=4
but when sent to GRAM5, the Globus RSL reads
(queue=short)(jobtype=single)(count=4)(hostCount=4)
which is translated to
#PBS -l nodes=4
and that is certainly not correct: it asks for four nodes with one core each, rather than four cores on a single node.
This information can be found by running commands such as glite-wms-job-info, which shows the JDL (the original when given -j, or the version passed to CREAM when given --jdl), or glite-wms-job-logging-info -v 3, which shows the RSL sent to Globus.
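For example, with a placeholder job identifier:
glite-wms-job-info -j <jobid>
glite-wms-job-info --jdl <jobid>
glite-wms-job-logging-info -v 3 <jobid>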
What does work
It turns out that it is possible to simply use the CERequirements parameter in the JDL. This works for direct submission to CREAM, but apparently also through the WMS. There is little documentation, but see this article on WNoDeS.
The following JDL requests 8 GB of memory:
Executable = "memlimits.sh"; Stdoutput = "stdout.txt"; StdError = "stderror.txt"; InputSandBox = "memlimits.sh"; OutputSandbox = {"stdout.txt","stderror.txt"}; CERequirements = "other.GlueHostMainMemoryRAMSize >= 8192";
This works, but not in combination with a GRAM5 CE.
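Assuming this JDL is saved as memlimits.jdl (a file name chosen here for illustration), submission through the WMS would look something like:
glite-wms-job-submit -a -o jobids memlimits.jdl
where -a delegates a proxy automatically and -o stores the returned job identifier in the file jobids.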
Here is the memlimits.sh script used above. It just allocates memory until it hits the limit.
#!/bin/bash
echo "\$ ulimit -a"
ulimit -a

cat > memeat.c <<'EOF'
#include <stdio.h>
#include <stdlib.h>

/* eat up memory until fail or crash */
int main(int argc, char *argv[])
{
    size_t chunk = 1024 * 1024; /* one megabyte allocation size */
    size_t total = 0;           /* how many bytes we got in total */
    printf("Going to allocate %zu bytes at a time\n", chunk);
    while (1) {
        printf(".");
        if (NULL == malloc(chunk)) {
            break;
        }
        total += chunk;
    }
    printf("\nThat's our limit. Total allocation: %zu bytes (%zu kB)\n",
           total, total / 1024);
    return 0;
}
EOF

echo "compiling memeat.c"
gcc -o memeat memeat.c
echo "running memeat. Cross fingers."
./memeat
echo "All done, exit code = $?"
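Once the job has finished, its status and output sandbox can be retrieved as usual; for example (the job identifier is again a placeholder):
glite-wms-job-status <jobid>
glite-wms-job-output --dir ./memlimits-output <jobid>
The retrieved stdout.txt then shows the limits reported by ulimit -a and the total amount of memory memeat managed to allocate before hitting them.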