GRAM5 In EGI

From PDP/Grid Wiki

Revision as of 16:58, 25 June 2012

A computing element (CE) in the EGI is far more than an interface to submit jobs: the ecosystem makes many implicit assumptions about what a CE is supposed to do, the set of services it should run, the information it should publish in a variety of formats, and the programs that have to be installed on the CE itself to 'play nice' with the other EGI services. In this writeup, we attempt to document what needs to be done to turn the basic GRAM5 CE into a service that integrates with EGI. To do this, we use the GRAM5 service from the Initiative for Globus in Europe (IGE), as well as other components distributed via the EGI UMD repository.

Goals of the GRAM5 CE

The GRAM5 service we have in mind should support the following:

  • provide a GRAM5/GRAM2 compatible submission interface (what you get from IGE gram5 proper)
  • resemble the LCG-CE application interface
  • be visible to end-users using lcg-infosites
  • be visible to the EMI Workload Management System (WMS) and 'automatically' attract jobs for supported VOs
  • interoperate with GLUE2 information system clients
  • support legacy GLUE1.3 (mainly for the WMS and lcg-infosites)
  • support all IGTF CAs and honour CRLs
  • fully support VOMS FQAN mappings and poolaccounts
  • also allow local accounts to be used in conjunction with (and override) VOMS FQANs
  • be configured for scalability using the Scheduler Event Generator (SEG)
  • be resilient to users: use TMPDIR as default home, and find executables using PATH
  • support accounting output in NDPF (CREAMy) compatible form
  • log details in conformance to the CSIRT requirements

Requisite software

Add the repository at http://software.nikhef.nl/temporary/umd-gram5/rhel5/RPMS/ to your yum repos, preferably using a local mirror:

[NIKHEF-UMD-GRAM5]
name=Nikhef UMD GRAM5 extras
baseurl=http://software.nikhef.nl/temporary/umd-gram5/rhel5/RPMS/
gpgcheck=0
enabled=1
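
If you use a local mirror, only the baseurl of the stanza above changes. The following is a minimal Python sketch, not part of the original setup, that renders the same stanza for an arbitrary base URL. The function name `render_repo` and the mirror URL `http://mirror.example.org/...` are hypothetical placeholders.

```python
import configparser
import io

def render_repo(baseurl):
    """Render the NIKHEF-UMD-GRAM5 yum repo stanza for the given base URL."""
    # interpolation=None so that a '%' in a URL is written out literally
    cp = configparser.ConfigParser(interpolation=None)
    cp["NIKHEF-UMD-GRAM5"] = {
        "name": "Nikhef UMD GRAM5 extras",
        "baseurl": baseurl,
        "gpgcheck": "0",
        "enabled": "1",
    }
    buf = io.StringIO()
    # write key=value without spaces, matching the stanza shown above
    cp.write(buf, space_around_delimiters=False)
    return buf.getvalue()

# Hypothetical local mirror; replace with your own mirror location.
print(render_repo("http://mirror.example.org/umd-gram5/rhel5/RPMS/"))
```

The output can be saved as a `.repo` file under /etc/yum.repos.d/. The file name is your own choice.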

Base installation

  • Either install "ige-meta-globus-gram5-2.1-1.el5.noarch.rpm" and then remove the superfluous packages, or install the list of RPMs for basic GRAM5.

Information system

  • Add the BDII server from UMD by adding/installing the following RPMs (for Torque)
"bdii","5.2.5-2.el5","noarch"
"glite-yaim-bdii","4.3.4-1.el5","noarch"
"glite-yaim-torque-utils","5.0.0-1.sl5","noarch"
  • Install Yaim to be able to configure BDII and the GLUE1.3 information providers, if Yum has not already installed it as a dependency of glite-yaim-bdii
"glite-yaim-core","5.0.2-1.sl5","noarch"
  • Install the GLUE schemas and basic information providers (in GIP style) from gLite/EMI
"glue-schema","2.0.8-1.el5","noarch"
"glite-info-provider-service", "1.7.0-1.el5", "noarch"
  • Install the dynamic infoproviders from LCG, which are used for GLUE1.3 information (GLUE2 only has static bits for the moment):
"lcg-info-dynamic-pbs", "2.0.0-1.sl5", "noarch"
"lcg-info-dynamic-scheduler-pbs", "2.2.1-1.sl5", "noarch"
"lcg-info-dynamic-scheduler-generic", "2.3.5-1.sl5", "noarch"
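
The quoted entries in the lists above appear to follow a name, version-release, architecture layout. The following is a minimal Python sketch, assuming that layout holds for every entry, that converts such lines into the `name-version-release.arch` form that Yum accepts on the command line. The helper name `to_yum_spec` is hypothetical and serves only as illustration.

```python
import csv

def to_yum_spec(line):
    """Convert '"name","version-release","arch"' into 'name-version-release.arch'."""
    # skipinitialspace tolerates the '", "' spacing used in some entries
    fields = next(csv.reader([line], skipinitialspace=True))
    name, version, arch = (field.strip() for field in fields)
    return f"{name}-{version}.{arch}"

# Two entries copied from the lists above, one in each spacing style.
packages = [
    '"bdii","5.2.5-2.el5","noarch"',
    '"glite-info-provider-service", "1.7.0-1.el5", "noarch"',
]
specs = [to_yum_spec(p) for p in packages]
print("yum install " + " ".join(specs))
```

Pinning the exact version this way is optional. Passing only the package names to Yum installs whatever version the enabled repositories currently offer.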

Mkgridmap support for local users

To support local users, either dynamically generated from LDAP or VOMS or via local additions, install the mkgridmap tool:

"edg-mkgridmap", "4.0.0-1", "noarch"

CRL support

Install fetch-crl version 3 from EPEL for RHEL5:

"fetch-crl3", "3.0.7-1.el5", "noarch"

WMS support

The gLite WMS assumes that the Logging and Bookkeeping (LB) components are already installed on every CE. To keep the WMS happy, add:

"glite-lb-logger", "2.2.6-1.sl5", "x86_64"
"glite-lb-client", "5.0.8-1.sl5", "x86_64"

Scalability and accounting fixes for the PBS job manager

This only works if you use Torque/PBS: install the Nikhef PBS job manager to get TMPDIR relocation, proper accounting log files, and VOMS FQAN logging for accounting. Install it with Yum from the Nikhef NDPF repository:

"globus-gram-job-manager-pbs-nikhef", "0.2-1", "noarch"

This package obsoletes the default Globus pbs.pm job manager and replaces it with a modified version. The source is at https://ndpfsvn.nikhef.nl/repos/pdpsoft/trunk/nl.nikhef.ndpf.tools/globus-gram-job-manager-pbs-nikhef/ and the RPM at http://software.nikhef.nl/temporary/umd-gram5/rhel5/RPMS/noarch/globus-gram-job-manager-pbs-nikhef-0.2-2.noarch.rpm

Dependencies

Installing the above packages via Yum, with the EPEL and UMD repositories enabled, will automatically download and install all dependencies. For Quattor, use the "checkdeps" tool.

Configuration

Lists and templates

RPM package lists

Superfluous packages in IGE meta-package via UMD

The following packages are not needed once you have a working batch system client setup. Dependency resolution through UMD pulls in client and server tools for every supported scheduler, possibly in a version you do not want. Install your own preferred version of a batch system instead, and install only the job manager batch system plugins that you really need:

fedora-usermgmt
fedora-usermgmt-core
fedora-usermgmt-default-fedora-setup
fedora-usermgmt-shadow-utils

gridengine
libtorque
munge
munge-libs
torque
torque-client

and after installing your own batch system client, pick one of

globus-gram-job-manager-condor
globus-gram-job-manager-pbs
globus-gram-job-manager-sge

or they'll complain about missing dependencies.

Specific Packages

globus-gram-job-manager-pbs-nikhef

Replaces the standard pbs job manager, adding VOMS-based accounting info, applying the VDT and EDG patches on executable validation, and forcing cwd to $TMPDIR by default to forestall inadvertent home directory use by the end-users. The configuration file /etc/globus/globus-pbs.conf is extended with a single variable setting:

vomsinfo=path-to-voms-proxy-info

Source: https://ndpfsvn.nikhef.nl/repos/pdpsoft/trunk/nl.nikhef.ndpf.tools/globus-gram-job-manager-pbs-nikhef/

Tar-ball: http://software.nikhef.nl/temporary/umd-gram5/tgz/globus-gram-job-manager-pbs-nikhef-0.2.src.tgz

Example config:

log_path="/var/spool/pbs/server_logs"
pbs_default="stro.nikhef.nl"
mpiexec=yes
qsub="/usr/bin/qsub"
qstat="/usr/bin/qstat"
qdel="/usr/bin/qdel"
cluster="1"
remote_shell="no"
cpu_per_node="1"
softenv_dir=
vomsinfo="/usr/bin/voms-proxy-info"
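
A quick sanity check on such a configuration is to confirm that the tool paths it names exist and are executable. The following is a minimal Python sketch, assuming the file uses the shell-style key=value layout shown in the example. The function name `parse_conf` is hypothetical, and the job manager itself may parse the file differently.

```python
import os
import shlex

def parse_conf(text):
    """Parse shell-style key=value lines into a dict, stripping quotes."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes surrounding quotes; an empty value yields []
        parts = shlex.split(value)
        settings[key.strip()] = parts[0] if parts else ""
    return settings

# A subset of the example config above.
sample = '''
qsub="/usr/bin/qsub"
softenv_dir=
vomsinfo="/usr/bin/voms-proxy-info"
'''
conf = parse_conf(sample)
for key in ("qsub", "vomsinfo"):
    path = conf.get(key, "")
    status = "ok" if os.access(path, os.X_OK) else "missing or not executable"
    print(f"{key}={path}: {status}")
```

On a real CE, read the text from /etc/globus/globus-pbs.conf instead of the inline sample.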

globus-gram5-glue2-info-providers

The BDII-GIP style information providers for static information in GLUE2 format for the gatekeeper. Some of the GLUE2 values reflect arbitrary choices.

Source: https://ndpfsvn.nikhef.nl/repos/pdpsoft/trunk/nl.nikhef.ndpf.tools/globus-gram5-glue2-info-providers/

Tar-ball: http://software.nikhef.nl/temporary/umd-gram5/tgz/globus-gram5-glue2-info-providers-0.2.src.tgz

globus-yaim-gram5

The GIP configuration function, and the gram5 node type that also configures the use of LCAS/LCMAPS, the VOMSES information, and GIP. The node type is full-fledged and includes:

gram5_FUNCTIONS="
config_vomsdir
config_vomses
config_users
config_vomsmap
config_mkgridmap
config_lcas_lcmaps_gt4
config_gip_gram5_glue2
config_gip_gram5_glue13
"

Source: https://ndpfsvn.nikhef.nl/repos/pdpsoft/trunk/nl.nikhef.ndpf.tools/globus-yaim-gram5/

Tar-ball: http://software.nikhef.nl/temporary/umd-gram5/tgz/globus-yaim-gram5-0.3.src.tgz