Working with CREAM CE servers and VOs
Introduction
The Computing Resource Execution And Management (CREAM) service handles job management across the nodes of a Computing Element (CE). Nikhef runs four production and two test CREAM CE servers in the Nikhef Grid Cluster. (The authoritative list of current CREAM CE servers is at NDPF:System_Functions.)
Production servers
- klomp.nikhef.nl
- juk.nikhef.nl
- gazon.nikhef.nl
- stremsel.nikhef.nl
Testbed (ITB) servers
- tbn15.nikhef.nl
- tbn21.nikhef.nl
These are managed by Quattor from stal.nikhef.nl and are running UMD-4.
The CREAM CE servers accept job submissions from valid users only, so authorization is part of the CREAM CE function.
Adding plugins in Quattor
If you need to enable a differentiated assurance profile for access to the systems, you'll need to install some additional packages for the CREAM CE servers. In this case we have enabled Identifier-Only Trust Assurance (IOTA) or Combined Assurance/Adequacy Model (CAM).
We added the lcmaps-plugins-vo-ca-ap package to the rpms template in /project/quattor/conf/cfg/grid/umd-4/glite/ce/rpms.tpl:
'/software/packages/{lcmaps-plugins-vo-ca-ap}' = nlist();
And the ca-policy-egi-cam package as a new template in /project/quattor/conf/cfg/grid/common/security/
template common/security/ca-policy-egi-cam;
"/software/packages/{ca-policy-egi-cam}" = nlist();
Since LCMAPS is managed by YAIM (which is independent of the Quattor deployment), the variable VO_CA_AP_FILE needs to be defined in a few places:
- /quattor/conf/cfg/grid/yaim/mapping_functions.tpl (to map the variable from Quattor to YAIM)
- /quattor/conf/cfg/sites/ndpf/site/config/yaim.tpl (template that YAIM takes as input for the site-info.pre file which is used to create the lcmaps.db on the CREAM CE server)
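Once both templates carry the variable, it should end up in the site-info file that YAIM reads on the CE. A minimal sketch of such a check follows. The helper name siteinfo_has_var is hypothetical (it is not part of YAIM or Quattor), and the sketch assumes the file uses plain shell-style NAME=value assignments.

```shell
# Succeeds if the given variable is assigned in a YAIM site-info file.
# siteinfo_has_var is a hypothetical helper, not part of YAIM.
# Assumption: the file contains shell-style NAME=value lines.
siteinfo_has_var() {
    local var="$1" file="$2"
    grep -Eq "^[[:space:]]*${var}=" "$file"
}

# Usage on a CREAM CE (file path as used in the YAIM command on this page):
#   siteinfo_has_var VO_CA_AP_FILE /etc/siteinfo/lcg-quattor-site-info.def && echo defined
```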
Rebuilding Nikhef RPMs
The VO_CA_AP_FILE variable needs to be defined or updated in the nikhef-yaim-cream-ce and nikhef-yaim-core RPMs. These RPMs are built locally from an svn repository. [1] Both RPMs need an additional function and variable:
- in trunk/nikhef-yaim-core/functions/local/config_lcas_lcmaps_gt4 (this file creates the lcmaps.db file on the server, which now needs to include the vo-ca-ap policy)
vo_ca_ap = "lcmaps_vo_ca_ap.mod"
           " -certdir ${X509_CERT_DIR}"
           " -vo_ca_ap_file ${VO_CA_AP_FILE}"

# policies
withvoms:
vo_ca_ap -> vomslocalgroup
vomslocalgroup -> vomslocalaccount
vomslocalaccount -> posix_enf | vomspoolaccount
vomspoolaccount -> posix_enf

standard:
vo_ca_ap -> localaccount
localaccount -> posix_enf | poolaccount
poolaccount -> posix_enf
- and again in trunk/nikhef-yaim-cream-ce/functions/local/config_cream_glexec
vo_ca_ap = "lcmaps_vo_ca_ap.mod"
           " -certdir ${X509_CERT_DIR}"
           " -vo_ca_ap_file ${VO_CA_AP_FILE}"

# policies
withvoms:
proxycheck -> vo_ca_ap
vo_ca_ap -> vomslocalgroup
vomslocalgroup -> vomslocalaccount
vomslocalaccount -> posix_enf | vomspoolaccount
vomspoolaccount -> posix_enf

standard:
proxycheck -> vo_ca_ap
vo_ca_ap -> localaccount
localaccount -> posix_enf | poolaccount
poolaccount -> posix_enf
If you do not have the files to amend, they are available from the EGI repository at http://repository.egi.eu/sw/production/umd/4/centos7/x86_64/updates/ (look for the latest yaim-core release).
Next, build the RPMs using the makefile available in the svn repository. You will need to create the standard structure for building an RPM by creating directories called BUILD, BUILDROOT, RPMS, SOURCES, SPECS, and SRPMS.
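The directory layout can be created in one step. A minimal sketch follows, assuming the build tree lives under ~/rpmbuild (the rpmbuild default). The makefile in the svn repository may expect a different location, in which case pass that path instead. The function name make_rpm_tree is hypothetical.

```shell
# Create the standard rpmbuild directory tree.
# make_rpm_tree is a hypothetical helper; the default location (~/rpmbuild)
# is an assumption, so pass another path if the makefile expects one.
make_rpm_tree() {
    local topdir="${1:-$HOME/rpmbuild}"
    local d
    for d in BUILD BUILDROOT RPMS SOURCES SPECS SRPMS; do
        mkdir -p "$topdir/$d" || return 1
    done
}

# Usage:
#   make_rpm_tree /path/to/build/tree
```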
Copy the new noarch RPMs to stal.nikhef.nl under /project/quattor/www/html/mirror/nikhef/tools/el6/...
Create a new snapshot in the Quattor profile:
makesnapshot nikhef-tools-el6 YEARMONTHDAY
deploy-to-mirrors
(The deploy-to-mirrors command will push the updated snapshot to stalkaars-01 and stalkaars-03.)
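The YEARMONTHDAY placeholder can be filled in from the current date. A minimal sketch follows, assuming the label is the date in YYYYMMDD form (an assumption based on the placeholder name). The helper snapshot_cmd is hypothetical and only prints the makesnapshot command so that it can be inspected before it is run. The deploy-to-mirrors step is left out of the sketch.

```shell
# Print the makesnapshot command for today's date.
# snapshot_cmd is a hypothetical helper.
# Assumption: YEARMONTHDAY means the date formatted as YYYYMMDD.
snapshot_cmd() {
    local repo="${1:-nikhef-tools-el6}"
    printf 'makesnapshot %s %s\n' "$repo" "$(date +%Y%m%d)"
}

# Usage:
#   snapshot_cmd                    # prints the command for nikhef-tools-el6
#   snapshot_cmd some-other-repo    # prints the command for another repository
```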
Edit the nikhef-tools snapshot in quattor: /srv/quattor/etc/repository-snapshots/nikhef-tools-el6.conf
More policy implementations
Add the ca-policy-egi-cam to the general CA policy (/srv/quattor/conf/cfg/grid/common/security/CA.tpl)
include { 'common/security/ca-policy-egi-core' };
This also needs to be added to the CE (/srv/quattor/conf/cfg/grid/umd-4/glite/ce/service.tpl)
# include CAs
include { 'common/security/CA' };
# including cam policy for wn
include { 'common/security/ca-policy-egi-cam' };
and worker nodes (/srv/quattor/conf/cfg/grid/umd-4/glite/wn/service.tpl)
# include CAs
include { 'common/security/CA' };
# manually adding the cam policy for the IOTA CAs instead of in the general CA,
# since not all nodes should be using the IOTA CAs
include { 'common/security/ca-policy-egi-cam' };
# write the Nikhef vo-ca-ap file for nikhef policies
include { 'common/security/vo-ca-ap-file' };
Pushing the changes
makexprof -f itb
Debug any errors that makexprof reports, then push:
pushxprof -f itb
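The build-then-push sequence can be wrapped so that nothing is pushed when the build fails. A minimal sketch follows. The wrapper build_and_push is hypothetical, and it assumes that makexprof exits with a non-zero status when the profiles do not compile, which has not been verified here.

```shell
# Build the profiles for a target and push only if the build succeeded.
# build_and_push is a hypothetical wrapper around the site commands
# makexprof and pushxprof.
# Assumption: makexprof returns a non-zero exit status on failure.
build_and_push() {
    local target="${1:-itb}"
    if ! makexprof -f "$target"; then
        echo "makexprof failed for $target; fix the templates before pushing" >&2
        return 1
    fi
    pushxprof -f "$target"
}

# Usage:
#   build_and_push itb
```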
Checking if it worked...
Before pushing the changes into production (prd), it is advised to push the changes to the testbed (ITB) and run a simple test job with the help of a user.
Log on to a test CREAM CE server (e.g., tbn15.nikhef.nl) and check the following:
- check that the lcmaps plugin has been installed (rpm -q reports installed packages; yum search only shows what is available in the repositories)
rpm -q lcmaps-plugins-vo-ca-ap
- check that /etc/lcmaps/lcmaps.db and /etc/lcmaps/lcmaps-glexec.db contain the updated vo-ca-ap policies
- and/or check that the YAIM functions config_cream_glexec and config_lcas_lcmaps_gt4 under /opt/glite/yaim/functions/ are correct
- check /etc/grid-security/vo-ca-ap-file
- check /usr/share/doc/ca-policy-egi/ca-policy-egi-cam.vo-ca-ap
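The check of the two LCMAPS database files can be scripted. A minimal sketch follows. The helper check_vo_ca_ap is hypothetical, and the patterns it looks for are taken from the lcmaps.db snippets on this page (a module definition for vo_ca_ap and at least one policy rule that starts from it).

```shell
# Succeeds if an LCMAPS database file defines the vo_ca_ap module
# and uses it in at least one policy rule.
# check_vo_ca_ap is a hypothetical helper, not part of LCMAPS.
check_vo_ca_ap() {
    local file="$1"
    # module definition: vo_ca_ap = "lcmaps_vo_ca_ap.mod" ...
    grep -Eq '^[[:space:]]*vo_ca_ap[[:space:]]*=.*lcmaps_vo_ca_ap\.mod' "$file" || return 1
    # policy rule: vo_ca_ap -> <next module>
    grep -Eq '^[[:space:]]*vo_ca_ap[[:space:]]*->' "$file" || return 1
}

# Usage on the CE:
#   for f in /etc/lcmaps/lcmaps.db /etc/lcmaps/lcmaps-glexec.db; do
#       if check_vo_ca_ap "$f"; then echo "$f: ok"; else echo "$f: vo_ca_ap missing"; fi
#   done
```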
Troubleshooting hints
- YAIM can be rerun manually with /opt/glite/yaim/bin/yaim, for example:
/opt/glite/yaim/bin/yaim -c -s /etc/siteinfo/lcg-quattor-site-info.def -n creamCE -n TORQUE_utils
- useful logs for checking the Quattor deployment: /var/log/ncm/ncd.log (more verbose information is available in /var/log/ncm-cdispd.log)
Useful references
- If you need to check out the repository, the link is: svn+ssh://svn@ndpfsvn.nikhef.nl/repos/ndpf/nl.nikhef.ndpf.yaim/trunk
- IGTF documentation on Combined Assurance
- LCMAPS Plugin (vo-ca-ap)
- Other notes from David G:
The quickest fix is to (first) allow the DOGWOOD CA bundle and configure the LCMAPS plugin accordingly. ...Such full support also required new software and configuration at each resource centre. You must deploy the additional trust anchor meta-packages and the new local policies in unison, and never install the cam package without such software support.