Enabling multicore jobs and jobs requesting large amounts of memory

Introduction

Certain applications will benefit from access to more than one core (logical CPU) on the same physical computer. Grid jobs that use more than one core are referred to as multicore jobs.

Other applications require a specific amount of memory to run efficiently or successfully. Such jobs are called large-memory jobs in this article (because they often require a higher-than-default amount of memory on the machine).

The Cream Computing Elements (CreamCEs) offer support for multicore jobs and large-memory jobs, although they need some additional configuration to forward the job requirements to the batch system.

This article describes the support of multicore jobs or large-memory jobs at the Cream Computing Elements at Nikhef. Section System Configuration describes the setup of the system. In section Submitting multicore or (large) memory jobs, the information relevant to end users is presented.

The information presented here is valid for the UMD-1 version of the CreamCE in combination with a batch system based on Torque 2.3. Other versions of the CreamCE (in particular the nearly unsupported gLite 3.2 version) may require different configuration. Other versions of the Torque batch system may work fine, although that hasn't been verified. Different batch systems fall outside the scope of this article.

System Configuration

Two services are involved in the submission of grid jobs requiring multiple cores or specific amounts of memory: the CreamCE and the Torque batch system. The CreamCE is the entry point for a grid job at a site. The CreamCE processes the resource requests (passed via the JDL file) and translates them into a format that is specific for the batch system implementation. The batch system can then allocate the requested resources.

Setup at the CreamCE

To support multicore jobs or memory requests, the CreamCE must recognize these requests and translate them into directives for the Torque batch system. In the implementation discussed here, the CreamCE needs 3 files:

  • A script to process specific resource requests Media:pbs_local_submit_attributes.sh. This file should be installed as /usr/bin/pbs_local_submit_attributes.sh on the CreamCE. It processes memory resource requests.
  • A configuration file to activate this submit filter Media:torque.cfg. This file should be stored as /var/spool/pbs/torque.cfg; a sketch of its contents follows this list.
  • A Torque submit filter to write the input for the batch system Media:torque_submit_filter.pl. This file should be installed as /usr/local/sbin/torque_submit_filter.pl (actually: the location specified in file torque.cfg above as value for SUBMITFILTER). The submit filter is discussed below in more detail.
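
The torque.cfg file itself can be very small; a minimal sketch, assuming the filter is installed at the location given above, is just the SUBMITFILTER directive:

 SUBMITFILTER /usr/local/sbin/torque_submit_filter.pl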

The submit filter

The submit filter processes the initial input for the batch system. It inspects the requested resources and may change them before writing the (possibly modified) input to standard output (which is forwarded to the batch server).
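
To illustrate the mechanism, the sketch below shows the general contract of a Torque submit filter; it is not the actual Nikhef filter linked above. qsub pipes the job script to the filter on standard input, and whatever the filter writes to standard output is what the batch server receives.

 #!/usr/bin/perl
 # Minimal illustration of a Torque submit filter. The real filter
 # (torque_submit_filter.pl above) implements the scaling logic described
 # in the following paragraphs.
 use strict;
 use warnings;
 while (my $line = <STDIN>) {
     # Resource requests embedded in the script appear as "#PBS -l ..."
     # directives; they can be inspected and rewritten here.
     if ($line =~ /^#PBS\s+-l\s/) {
         # ... adjust mem/vmem/pmem/pvmem or nodes/ppn as needed ...
     }
     print $line;   # forward the (possibly modified) line
 }
 exit 0;            # a non-zero exit status makes qsub reject the job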

By default, the batch system allocates 1 core and (optionally) a certain amount of memory to each job. It is reasonable that a multicore job requesting N cores can also access N times the amount of memory of a single-core job.

For jobs requesting a certain amount of memory, the situation is the opposite. When a job requests M times the amount of memory of a default job, M-1 other jobs cannot run on the same machine. From a resource point of view, the large-memory job is thus equivalent to a job requesting M cores. In the resource usage accounting system, both types of jobs should be treated as if they were N or M simple jobs.

The submit filter presented here scales the amount of memory available to a multicore job with the number of requested cores. The amount of memory per core is obtained from the queue definition (see subsection Setup at the batch system), possibly capped by a maximum (which is also taken from the queue definition). Note: Torque allows defaults and maximum limits to be defined for physical memory (option mem), physical memory per process (pmem), virtual memory (vmem), and virtual memory per process (pvmem). The submit filter scales mem and vmem with the total number of cores; the per-process limits pmem and pvmem scale with the maximum number of cores on any of the allocated physical machines (in case more than one physical machine is used). For example, a 6-core job spread over two machines with 3 cores each gets mem and vmem scaled by a factor of 6, but pmem and pvmem scaled by a factor of 3.

For large-memory jobs, the submit filter scales the number of allocated cores with the amount of requested memory divided by the amount of memory per single job; the number of cores is rounded up to the nearest integer. For example, with 2 GB of memory per job slot, a job requesting 7 GB is allocated ceil(7/2) = 4 cores. A large-memory job thus also gets more than one core allocated for computing.

After the number of cores has been established, the maximum CPU time available to the job is computed as the product of the allocated number of cores and the default CPU time limit for a single-core job. For example, with a default limit of 36 hours of CPU time per core, a 4-core job may consume up to 144 hours of CPU time.

Setup at the batch system

As described in the previous section, the submit filter will try to determine the default amount of memory per job slot and the CPU time limit per core from the queue definition. In addition, there may be limits to the amount of memory. These limits have to be defined per queue in Torque via the parameters resources_default and resources_max:

set queue <QUEUE> resources_default.cput = <HH:MM:SS>
set queue <QUEUE> resources_default.pmem = <N>mb
set queue <QUEUE> resources_max.cput = <HH:MM:SS>
set queue <QUEUE> resources_max.pmem = <M>mb

The above may be extended with limits or defaults for vmem, mem and pvmem, depending on site policies.

Note: although not required by Torque, the grid information system for certain VOs expects to find a CPU time limit, so omitting the CPU time limit will cause problems.

At Nikhef, a dedicated queue flex was created for flexible resource requests. The number of jobs that can be queued/running in this queue is limited.
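
As an illustration, such a queue could be defined via qmgr as follows; the values shown are hypothetical and not the actual Nikhef settings:

create queue flex
set queue flex queue_type = Execution
set queue flex resources_default.cput = 36:00:00
set queue flex resources_default.pmem = 2048mb
set queue flex resources_max.cput = 96:00:00
set queue flex resources_max.pmem = 8192mb
set queue flex max_queuable = 100
set queue flex max_running = 32
set queue flex enabled = True
set queue flex started = True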

Submitting multicore or (large) memory jobs

Resource requirements are specified in the JDL file that is passed during submission to the WMS or CreamCE.

For submission to the Nikhef CEs, make sure the JDL file is submitted to the Cream endpoint:

<CE>.nikhef.nl:8443/cream-pbs-flex

or make sure to submit to queue flex.
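
For direct submission with the CREAM CLI, this could look as follows (job.jdl is a placeholder name for the JDL file):

 glite-ce-job-submit -a -r <CE>.nikhef.nl:8443/cream-pbs-flex job.jdl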

Note: jobs requesting multiple cores on the same physical machine (or, equivalently, large amounts of memory) will typically wait much longer in the batch queue than single-core jobs before they can run. On busy systems it will take some time before the scheduler can find all requested resources on one physical machine.

Note: the examples below are valid for the UMD-1 version of the CreamCE. They differ from the parameters that should be used when submitting to the gLite 3.2 version of the CreamCE!

Multicore jobs

Job requirements concerning the number of reserved cores and the number of cores on a physical machine can be specified via the attributes CpuNumber and SMPgranularity, respectively.

Example: to request 4 cores on 1 node, use the following statements in the JDL:

 SMPgranularity = 4;
 CpuNumber = 4;

Note: this will fail if the requested number of cores is larger than the number of cores present in the physical machine (actually: configured number of cores in the physical machine).

A similar construction can be used to request multiple cores on multiple physical machines. For example, M cores spread over multiple machines, requiring N cores per machine (where M > N):

 SMPgranularity = N;
 CpuNumber = M;

The batch system will try to find (M/N) physical machines, with N cores per machine. If M is not an integer multiple of N, the batch system will reserve int(M/N) machines with N cores and 1 machine with (M mod N) cores. Say M=8 and N=3; then there will be int(8/3)=2 machines with 3 cores and 1 machine with (8 mod 3)=2 cores.

Large-memory jobs

The JDL attribute CERequirements can be used to request an amount of physical memory on the node via the field other.GlueHostMainMemoryRAMSize. The unit is MB.

Example: to request at least 8 GB physical memory on a node, use the following statement in the JDL:

 CERequirements = "other.GlueHostMainMemoryRAMSize >= 8192";
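
Combined with the submit filter described earlier and assuming, for example, 2048 MB of memory per job slot, such a request would result in ceil(8192/2048) = 4 allocated cores.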

Whole-node jobs

To request all resources on one node, the attribute WholeNodes should be set to true:

 WholeNodes = true;
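
Putting the pieces together, a minimal complete JDL file for a whole-node job could look like the sketch below; the executable and sandbox file names are placeholders:

 [
 Executable = "run_analysis.sh";
 StdOutput = "out.txt";
 StdError = "err.txt";
 InputSandbox = {"run_analysis.sh"};
 OutputSandbox = {"out.txt", "err.txt"};
 WholeNodes = true;
 ]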