Working with CREAM CE servers and VOs

From PDP/Grid Wiki

Introduction

The Computing Resource Execution And Management (CREAM) service manages jobs across the nodes of a Computing Element (CE). Nikhef runs four production and two testbed CREAM CE servers in the Nikhef Grid Cluster. (The authoritative list of the current CREAM CE servers is at NDPF:System_Functions.)

Production servers:
  • klomp.nikhef.nl
  • juk.nikhef.nl
  • gazon.nikhef.nl
  • stremsel.nikhef.nl

Testbed (ITB) servers:
  • tbn15.nikhef.nl
  • tbn21.nikhef.nl

The CREAM CE servers accept job submissions only from valid users, so authorization is part of the CREAM CE function. The servers are managed by Quattor from stal.nikhef.nl and run UMD-4.

This documentation page covers enabling a new assurance profile on the CREAM CE servers and enabling a specific VO to run jobs.

Adding plugins in Quattor

Enabling a differentiated assurance profile for access to the Grid cluster requires installing additional packages for the CREAM CE servers. In this case we enabled Identifier-Only Trust Assurance (IOTA) or Combined Assurance/Adequacy Model (CAM). (More information can be found in the IGTF documentation on Combined Assurance.)

We added the lcmaps-plugins-vo-ca-ap package to the rpms template in /project/quattor/conf/cfg/grid/umd-4/glite/ce/rpms.tpl:

'/software/packages/{lcmaps-plugins-vo-ca-ap}' = nlist();

and the ca-policy-egi-cam package as a new template in /project/quattor/conf/cfg/grid/common/security/

template common/security/ca-policy-egi-cam;
"/software/packages/{ca-policy-egi-cam}" = nlist();

Since LCMAPS is managed by YAIM (which is independent of the Quattor deployment), the variable VO_CA_AP_FILE needed to be defined in two places:

  • /quattor/conf/cfg/grid/yaim/mapping_functions.tpl (to map the variable from Quattor to YAIM)
  • /quattor/conf/cfg/sites/ndpf/site/config/yaim.tpl (template that YAIM takes as input for the site-info.pre file which is used to create the lcmaps.db on the CREAM CE server)

Creating vo-ca-ap-file

This file is defined as a Quattor template under grid/common/security and is later pushed to the CREAM CE servers. More detailed information about the plugin can be found on the LCMAPS Plugin (vo-ca-ap) wiki page.

unique template common/security/vo-ca-ap-file;

include { 'components/filecopy/config' };

variable VO_CA_AP_FILE ?= '/etc/grid-security/vo-ca-ap-file';

variable VO_CA_AP_CONTENT =<<EOF;
# EGI Combined Adequacy Assurance Model (CAM) reference configuration
# for LCMAPS-plugins-vo-ca-ap. For instructions, see
#   https://wiki.nikhef.nl/grid/Lcmaps-plugins-vo-ca-ap
# Reference configuration follows https://documents.egi.eu/document/2930
# and associated implementation guidance as for the EGI SPG processes
#
# EGI egi-cam version 1.95
#
/atlas   		file:policy-egi-core.info, file:policy-egi-cam.info
/cms     		file:policy-egi-core.info, file:policy-egi-cam.info
/lhcb    		file:policy-egi-core.info, file:policy-egi-cam.info
/alice  		file:policy-egi-core.info, file:policy-egi-cam.info
#/lsgrid/Project_MinE	file:policy-egi-core.info, file:policy-egi-cam.info
#subgroups are not supported at the moment so lsgrid was added
/lsgrid			file:policy-egi-core.info, file:policy-egi-cam.info
# Default policy for other VOs:
/*       		file:policy-egi-core.info
#
# for non-VOMS enabled credentials, supports only core policy trust anchors:
"-"      file:policy-egi-core.info
EOF

"/software/components/filecopy/services" = npush(
    escape(VO_CA_AP_FILE),
      nlist("config",VO_CA_AP_CONTENT,
            "perms","0644",
            "owner","root"),
);
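
Because a malformed line in this file only shows up once LCMAPS reads it, a quick syntax check before pushing can save a round trip. The following is a minimal sketch, not part of the LCMAPS plugin: lint_vo_ca_ap is a hypothetical helper, and the line format it accepts (a pattern followed by one or more comma-separated file: entries) is inferred only from the template above.

```shell
# Hypothetical helper: lint a vo-ca-ap-file.
# Assumption: every non-comment, non-blank line looks like
#   <pattern> <file:policy>[, <file:policy> ...]
# as in the template above. Prints each bad entry and returns non-zero.
lint_vo_ca_ap() {
    awk '
        /^[ \t]*(#|$)/ { next }                 # skip comments and blank lines
        {
            pattern = $1
            rest = $0
            # strip the leading pattern and the whitespace after it
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", rest)
            n = split(rest, entries, /,[ \t]*/)
            for (i = 1; i <= n; i++) {
                if (entries[i] !~ /^file:[^ \t]+[ \t]*$/) {
                    printf "line %d: bad entry \"%s\" for %s\n", NR, entries[i], pattern
                    bad = 1
                }
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

On a server this could be run as lint_vo_ca_ap /etc/grid-security/vo-ca-ap-file; it checks only the line shape, not whether the referenced policy files exist.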

Rebuilding Nikhef RPMs

The variable VO_CA_AP_FILE also needed to be defined in the nikhef-yaim-cream-ce and nikhef-yaim-core RPMs. These RPMs are built locally from an svn repository. [1] Both RPMs needed an additional function and variable.

  • in trunk/nikhef-yaim-core/functions/local/config_lcas_lcmaps_gt4 (this file creates the lcmaps.db file on the server, which now needs to include the vo-ca-ap policy)
vo_ca_ap = "lcmaps_vo_ca_ap.mod"
           " -certdir ${X509_CERT_DIR}"
           " -vo_ca_ap_file ${VO_CA_AP_FILE}"
# policies
withvoms:
vo_ca_ap -> vomslocalgroup
vomslocalgroup -> vomslocalaccount
vomslocalaccount -> posix_enf | vomspoolaccount
vomspoolaccount -> posix_enf
standard:
vo_ca_ap -> localaccount
localaccount -> posix_enf | poolaccount
poolaccount -> posix_enf
  • and again in trunk/nikhef-yaim-cream-ce/functions/local/config_cream_glexec
vo_ca_ap = "lcmaps_vo_ca_ap.mod"
           " -certdir ${X509_CERT_DIR}"
           " -vo_ca_ap_file ${VO_CA_AP_FILE}"
# policies
withvoms:
proxycheck -> vo_ca_ap
vo_ca_ap -> vomslocalgroup
vomslocalgroup -> vomslocalaccount
vomslocalaccount -> posix_enf | vomspoolaccount
vomspoolaccount -> posix_enf
standard:
proxycheck -> vo_ca_ap
vo_ca_ap -> localaccount
localaccount -> posix_enf | poolaccount
poolaccount -> posix_enf

If the files to amend do not exist yet, they can be taken from the latest yaim-core release in the EGI repository: http://repository.egi.eu/sw/production/umd/4/centos7/x86_64/updates/

Next, the RPMs are built using the makefile available in the svn repository. The standard directory structure for building an RPM needs to be available where the RPM is being built: BUILD, BUILDROOT, RPMS, SOURCES, SPECS, and SRPMS directories.
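
The directory layout can be created in one step. This is a minimal sketch: make_rpm_tree is a hypothetical helper, and the default top directory ~/rpmbuild is an assumption, so use whichever top directory the Makefile in the svn repository expects.

```shell
# Hypothetical helper: create the standard rpmbuild directory layout.
# Assumption: the top directory defaults to ~/rpmbuild; pass another path
# as the first argument if the Makefile expects a different location.
make_rpm_tree() {
    topdir=${1:-"$HOME/rpmbuild"}
    for d in BUILD BUILDROOT RPMS SOURCES SPECS SRPMS; do
        mkdir -p "$topdir/$d" || return 1
    done
}
```

Calling make_rpm_tree /some/build/area is safe to repeat, since mkdir -p leaves existing directories untouched.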

The new noarch RPMs are then copied to stal.nikhef.nl under /project/quattor/www/html/mirror/nikhef/tools/el6/... Finally, a new snapshot has to be created for the Quattor profile:

makesnapshot nikhef-tools-el6 YEARMONTHDAY
deploy-to-mirrors

(The deploy-to-mirrors command will push the updated snapshot to stalkaars-01 and stalkaars-03.)
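
To avoid mistyping the date, the snapshot commands can be generated rather than typed. This is a sketch under one assumption: YEARMONTHDAY is taken to mean an 8-digit date such as 20240131. The hypothetical helper snapshot_commands only prints the two commands shown above; it does not run them.

```shell
# Hypothetical helper: print the snapshot commands for today's date.
# Assumption: YEARMONTHDAY is an 8-digit date (date +%Y%m%d).
# Nothing is executed; review the output before running it on stal.
snapshot_commands() {
    repo=${1:-nikhef-tools-el6}
    stamp=$(date +%Y%m%d)
    printf 'makesnapshot %s %s\n' "$repo" "$stamp"
    printf 'deploy-to-mirrors\n'
}
```

The same date string is the one that later has to be entered in the snapshot configuration files.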

Edit the nikhef-tools snapshot in Quattor: /srv/quattor/etc/repository-snapshots/nikhef-tools-el6.conf. Also update the snapshot date for the testbed in Quattor: conf/cfg/clusters/itb/repository/config.tpl

For production (prd), edit /project/quattor/conf/cfg/sites/ndpf/site/repository/default-snapshots.tpl with the updated snapshot date.

More policy implementations

The general CA policy (/srv/quattor/conf/cfg/grid/common/security/CA.tpl) includes only the core policy, so ca-policy-egi-cam has to be added separately on the nodes that need it:

include { 'common/security/ca-policy-egi-core' };

Add the CAM policy to the CE (/srv/quattor/conf/cfg/grid/umd-4/glite/ce/service.tpl)

# include CAs
include { 'common/security/CA' };
# including cam policy for ce
include { 'common/security/ca-policy-egi-cam' };

and worker nodes (/srv/quattor/conf/cfg/grid/umd-4/glite/wn/service.tpl)

# include CAs
include { 'common/security/CA' };
# manually adding the cam policy for the IOTA CAs here instead of in the general CA, since not all nodes should use the IOTA CAs
include { 'common/security/ca-policy-egi-cam' };
# write the Nikhef vo-ca-ap file for nikhef policies
include { 'common/security/vo-ca-ap-file' };


If needed, add the VOs to the CREAM CE node profile under clusters/itb/profiles/ in Quattor.

Pushing the changes

Check if the configuration builds with

makexprof -f itb

Debug any mistakes, then push the configuration to the testbed nodes.

pushxprof -f itb

Once this is done, YAIM must be rerun manually on the server. Quattor installs the pushed software packages on the updated servers, but the CREAM CE servers do not know that YAIM has to be run again to regenerate the configuration. Running

ncm-ncd --co yaim

should apply the updated policies on the servers.

Checking if it worked...

Before pushing the changes into production (prd), it is advised to push the changes to the testbed (ITB) and run a simple test job with the help of a user.

Log onto a test CREAM CE server (e.g., tbn15.nikhef.nl) and check that:

  • the LCMAPS plugin has been installed:
rpm -q lcmaps-plugins-vo-ca-ap
  • /etc/lcmaps/lcmaps.db and /etc/lcmaps/lcmaps-glexec.db contain the updated vo-ca-ap policies
  • and/or the YAIM functions config_cream_glexec and config_lcas_lcmaps_gt4 under /opt/glite/yaim/functions/ are correct
  • /etc/grid-security/vo-ca-ap-file exists and has the expected content
  • /usr/share/doc/ca-policy-egi/ca-policy-egi-cam.vo-ca-ap is present

Troubleshooting hints

  • to rerun YAIM manually on the server through Quattor: ncm-ncd --co yaim
  • or to run YAIM directly: /opt/glite/yaim/bin/yaim -c -s /etc/siteinfo/lcg-quattor-site-info.def -n creamCE -n TORQUE_utils
  • useful log for checking the Quattor deployment: /var/log/ncm/ncd.log (more verbose information is available in /var/log/ncm-cdispd.log)

Useful references

  1. If you need to check out the repository, the URL is: svn+ssh://svn@ndpfsvn.nikhef.nl/repos/ndpf/nl.nikhef.ndpf.yaim/trunk
  2. IGTF documentation on Combined Assurance
  3. LCMAPS Plugin (vo-ca-ap)
  4. Other notes from David G:

The quickest fix is to (first) allow the DOGWOOD CA bundle and configure the LCMAPS plugin accordingly. ...Such full support also required new software and configuration at each resource centre. You must deploy the additional trust anchor meta-packages and the new local policies in unison, and never install the cam package without such software support.