Enabling multicore jobs and jobs requesting large amounts of memory

From PDP/Grid Wiki


Introduction

Certain applications benefit from access to more than one core (logical CPU) on the same physical computer. Grid jobs that use more than one core are referred to as multicore jobs.

Other applications need a specific amount of memory to run efficiently, or to run at all. In this article such jobs are called large-memory jobs, because they often require more memory than a job receives on the machine by default.
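To make the two definitions concrete, here is a minimal Python sketch of a job's resource request. The `ResourceRequest` class and the 2048 MB-per-core default are hypothetical illustrations chosen for this example; they are not values used by the CreamCE or at Nikhef.

```python
from dataclasses import dataclass

# Hypothetical default memory per core (MB); real defaults are site-specific.
DEFAULT_MEMORY_MB_PER_CORE = 2048


@dataclass
class ResourceRequest:
    """Illustrative container for the resources a grid job asks for."""

    cores: int = 1      # logical CPUs needed on the same physical machine
    memory_mb: int = 0  # explicit memory request in MB; 0 means none given

    def is_multicore(self) -> bool:
        # Multicore: more than one core on the same machine.
        return self.cores > 1

    def is_large_memory(self) -> bool:
        # In this article's terms: the job states an explicit memory need.
        return self.memory_mb > 0

    def exceeds_default_memory(self) -> bool:
        # True when the request is above the (hypothetical) default
        # allocation for the requested number of cores.
        return self.memory_mb > self.cores * DEFAULT_MEMORY_MB_PER_CORE


job = ResourceRequest(cores=4, memory_mb=16000)
print(job.is_multicore(), job.is_large_memory(), job.exceeds_default_memory())
# True True True
```

A job can be both multicore and large-memory at once; the two properties are independent, which is why the sketch keeps them as separate checks.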

The Cream Computing Elements (CreamCEs) support multicore jobs and large-memory jobs, but they need some additional configuration to pass the job requirements on to the batch system.

This article describes the support for multicore jobs and large-memory jobs on the Cream Computing Elements at Nikhef. Section "System Configuration" describes the setup of the system. Section "Submitting multicore or (large) memory jobs" presents the information relevant to users of the Computing Elements.

The information presented here applies to the UMD-1 version of the CreamCE combined with a batch system based on Torque 2.3. Other versions of the CreamCE (in particular the nearly unsupported gLite 3.2 version) may require a different configuration. Other versions of the Torque batch system may work as well, although this has not been verified. Other batch systems fall outside the scope of this article.


System Configuration

Two services are involved in submitting grid jobs that require multiple cores or specific amounts of memory: the CreamCE and the Torque batch system. The CreamCE is the entry point for a grid job at a site. It processes the resource requests and translates them into a format specific to the batch system implementation. The batch system can then allocate the requested resources.
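The translation step can be pictured as a function from a generic resource request to Torque directives. The sketch below assumes the standard Torque resource syntax, where `nodes=1:ppn=N` asks for N processors on a single node and `mem=<n>mb` sets the memory limit for the whole job. The function name `torque_directives` is hypothetical, and the actual CreamCE scripts may generate different directives.

```python
from typing import List, Optional


def torque_directives(cores: int = 1, memory_mb: Optional[int] = None) -> List[str]:
    """Translate a generic resource request into Torque '#PBS -l' lines.

    Sketch only: 'nodes=1:ppn=N' requests N processors on one node and
    'mem=<n>mb' sets the total memory limit of the job.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    directives: List[str] = []
    if cores > 1:
        # Single-core jobs keep the batch system's default allocation.
        directives.append(f"#PBS -l nodes=1:ppn={cores}")
    if memory_mb is not None:
        if memory_mb <= 0:
            raise ValueError("memory_mb must be positive")
        directives.append(f"#PBS -l mem={memory_mb}mb")
    return directives


print(torque_directives(cores=4, memory_mb=8192))
# ['#PBS -l nodes=1:ppn=4', '#PBS -l mem=8192mb']
```

Requesting all cores on one node (`nodes=1`) matches the definition of a multicore job as one that uses several cores on the same physical computer.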

To support multicore jobs or memory requests, the CreamCE must recognize these requests and translate them into directives for the Torque batch system. In the implementation discussed here, the CreamCE needs three files:

* A script to process specific resource requests
* A Torque submit filter to write the input for the batch system
* A configuration file to activate this submit filter
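When a submit filter is activated (in Torque this is done with the `SUBMITFILTER` parameter in `torque.cfg`), `qsub` pipes the job script to the filter's standard input and submits whatever the filter writes to standard output; a non-zero exit status makes `qsub` reject the job. The Python sketch below shows the shape of such a filter under a hypothetical design. Torque only honours `#PBS` directives that appear before the first executable line, so directives added late in a script would be ignored; this filter therefore moves every `#PBS` line to the top. The names `hoist_directives` and `main` are illustrative, and this is not the filter used at Nikhef.

```python
import sys
from typing import List, TextIO

DIRECTIVE_PREFIX = "#PBS"


def hoist_directives(script: str) -> str:
    """Return the script with every '#PBS' line moved just below the shebang."""
    lines = script.splitlines(keepends=True)
    header: List[str] = []
    if lines and lines[0].startswith("#!"):
        header.append(lines.pop(0))
    directives: List[str] = []
    body: List[str] = []
    for line in lines:
        if line.startswith(DIRECTIVE_PREFIX):
            # Ensure a newline so a moved last line cannot merge with the body.
            directives.append(line if line.endswith("\n") else line + "\n")
        else:
            body.append(line)
    return "".join(header + directives + body)


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> int:
    """Filter entry point: job script in on stdin, filtered script on stdout."""
    stdout.write(hoist_directives(stdin.read()))
    return 0  # exit status 0 lets qsub accept the job


# When installed as the Torque submit filter, the script would end with:
#     sys.exit(main())
```

Keeping the transformation in a pure function makes it easy to test without running `qsub`. A production filter would need more care, for example with `#PBS` text that appears inside here-documents in the job script.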

Submitting multicore or (large) memory jobs