Working with CREAM CE servers and VOs
Revision as of 16:04, 6 March 2019
Introduction
The Computing Resource Execution And Management (CREAM) Service is used for job management across nodes in a Computing Element (CE). Nikhef has 4 production and 2 test CREAM CE servers that run in the Nikhef Grid Cluster. (The most authoritative list of the current CREAM CE servers is kept on NDPF:System_Functions.)
| Production servers |
|---|
| klomp.nikhef.nl |
| juk.nikhef.nl |
| gazon.nikhef.nl |
| stremsel.nikhef.nl |

| Testbed (ITB) servers |
|---|
| tbn15.nikhef.nl |
| tbn21.nikhef.nl |
Because the CREAM CE servers accept jobs only from valid users, authorization is an essential part of their function. The servers are managed by Quattor from stal.nikhef.nl and run UMD-4.
This page describes how to enable a new assurance profile on the CREAM CE servers and allow a specific VO to run jobs. In this case we enabled Identifier-Only Trust Assurance (IOTA) via the Combined Assurance/Adequacy Model (CAM). (More information can be found in the IGTF documentation on Combined Assurance.)
Adding plugins in Quattor
Enabling a differentiated assurance profile for access to the Grid cluster required two additional software package installations: lcmaps-plugins-vo-ca-ap.x86_64 and ca-policy-egi-cam.noarch.
These two packages had to be added to the following Quattor templates.
1. The lcmaps-plugins-vo-ca-ap package was added to /project/quattor/conf/cfg/grid/umd-4/glite/ce/rpms.tpl:
'/software/packages/{lcmaps-plugins-vo-ca-ap}' = nlist();
2. The ca-policy-egi-cam package had to be included in /project/quattor/conf/cfg/grid/umd-4/glite/ce/service.tpl, next to the general CA.tpl:
# manually adding the cam policy for the IOTA CAs instead of in the general CA since all nodes shouldn't be using the IOTA CAs
include { 'common/security/ca-policy-egi-cam' };
3. A new template file was created for the ca-policy-egi-cam package in /project/quattor/conf/cfg/grid/common/security/
template common/security/ca-policy-egi-cam;

"/software/packages/{ca-policy-egi-cam}" = nlist();
4. The worker nodes needed to pick up these packages through /srv/quattor/conf/cfg/grid/umd-4/glite/wn-torque4/service.tpl. (Note: the /srv/quattor/conf/cfg/grid/umd-4/glite/wn/ directory is no longer active and will not push any packages to the worker nodes.) So these lines were added to that service.tpl:
# manually adding the cam policy for the IOTA CAs instead of in the general CA since all nodes should not be using the IOTA CAs
include { 'common/security/ca-policy-egi-cam' };
If needed, add the VOs to the CREAM CE node ITB profile under clusters/itb/profiles/ in Quattor.
For production servers, /site/vo/cream-selected.tpl manages the list of allowed VOs.
Creating new variables
LCMAPS is configured by YAIM, which runs independently of the Quattor deployment, so the variable VO_CA_AP_FILE had to be defined in a few places. These definitions are later used to build the site-info.pre files that YAIM reads.
The VO_CA_AP_FILE variable needed to be added to these two templates:
- /quattor/conf/cfg/grid/yaim/mapping_functions.tpl (to map the variable from Quattor to YAIM)
- /quattor/conf/cfg/sites/ndpf/site/config/yaim.tpl (the template from which the site-info.pre file is generated; YAIM uses that file to create lcmaps.db on the CREAM CE server)
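One way to confirm that VO_CA_AP_FILE made it from Quattor into the YAIM input is to read the generated site-info file on the node. The minimal Python sketch below assumes that file consists of plain one-line shell assignments (KEY=value, optionally quoted); read_siteinfo_vars and the sample content are hypothetical and not part of the Nikhef tooling.

```python
import shlex


def read_siteinfo_vars(text):
    """Collect simple KEY=value assignments from a YAIM site-info style file.

    Assumption: only plain one-line assignments are handled; shell
    expansions, 'export' lines, arrays and multi-line values are skipped.
    """
    variables = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, comment or not an assignment
        key, _, value = line.partition("=")
        key = key.strip()
        if not key.isidentifier():
            continue  # e.g. 'export FOO=bar' is ignored in this sketch
        # shlex.split removes surrounding quotes the way a shell would
        parts = shlex.split(value, comments=True)
        variables[key] = parts[0] if parts else ""
    return variables


# Hypothetical sample of what the generated file might contain.
sample = """
# generated from Quattor
VO_CA_AP_FILE="/etc/grid-security/vo-ca-ap-file"
X509_CERT_DIR=/etc/grid-security/certificates
"""

print(read_siteinfo_vars(sample)["VO_CA_AP_FILE"])
```

If the key is missing, the mapping in mapping_functions.tpl or yaim.tpl is the first place to look.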
Creating vo-ca-ap-file
The vo-ca-ap-file template was created in Quattor under grid/common/security, so that the security and policy settings are kept in one place. More detailed information about the LCMAPS plugin and about creating this file can be found on the LCMAPS Plugin (vo-ca-ap) wiki page.
unique template common/security/vo-ca-ap-file;

include { 'components/filecopy/config' };

variable VO_CA_AP_FILE ?= '/etc/grid-security/vo-ca-ap-file';

variable VO_CA_AP_CONTENT = <<EOF;
# EGI Combined Adequacy Assurance Model (CAM) reference configuration
# for LCMAPS-plugins-vo-ca-ap. For instructions, see
# https://wiki.nikhef.nl/grid/Lcmaps-plugins-vo-ca-ap
# Reference configuation follows https://documents.egi.eu/document/2930
# and associated implementation guidance as for the EGI SPG processes
#
# EGI egi-cam version 1.95
#
/atlas file:policy-egi-core.info, file:policy-egi-cam.info
/cms file:policy-egi-core.info, file:policy-egi-cam.info
/lhcb file:policy-egi-core.info, file:policy-egi-cam.info
/alice file:policy-egi-core.info, file:policy-egi-cam.info
#/lsgrid/Project_MinE file:policy-egi-core.info, file:policy-egi-cam.info
#subgroups are not supported at the moment so lsgrid was added
/lsgrid file:policy-egi-core.info, file:policy-egi-cam.info
# Default policy for other VOs:
/* file:policy-egi-core.info
#
# for non-VOMS enabled credentials, supports only core policy trust anchors:
"-" file:policy-egi-core.info
EOF

"/software/components/filecopy/services" = npush(
    escape(VO_CA_AP_FILE),
    nlist("config", VO_CA_AP_CONTENT,
          "perms", "0644",
          "owner", "root"),
);
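The vo-ca-ap-file is a small table that maps a VO (or "/*" for any other VO, and "-" for credentials without VOMS attributes) to the trust-anchor policy files its users may rely on. The Python sketch below parses that table and looks up the policies for an FQAN. The lookup rule (match on the top-level VO group, then fall back to "/*") is an assumption made for illustration, consistent with the comment that subgroups are not supported; it is not the plugin's actual matching code, and parse_vo_ca_ap and policies_for are hypothetical names.

```python
def parse_vo_ca_ap(text):
    """Return an ordered list of (pattern, [policy files]) entries."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # ignore malformed lines in this sketch
        pattern, rest = fields
        policies = [p.strip() for p in rest.split(",") if p.strip()]
        entries.append((pattern.strip('"'), policies))  # '"-"' becomes '-'
    return entries


def policies_for(entries, fqan):
    """Assumed lookup: top-level VO first, then '/*'; '-' when there is no VOMS FQAN."""
    if fqan is None:
        candidates = ["-"]
    else:
        candidates = ["/" + fqan.strip("/").split("/")[0], "/*"]
    for wanted in candidates:
        for pattern, policies in entries:
            if pattern == wanted:
                return policies
    return []  # no matching entry: no trust anchors accepted in this model


SAMPLE = """\
/atlas file:policy-egi-core.info, file:policy-egi-cam.info
/lsgrid file:policy-egi-core.info, file:policy-egi-cam.info
# Default policy for other VOs:
/* file:policy-egi-core.info
"-" file:policy-egi-core.info
"""

entries = parse_vo_ca_ap(SAMPLE)
print(policies_for(entries, "/lsgrid/Project_MinE"))
```

In this model a member of a VO without its own line only gets the core (non-IOTA) trust anchors, which is the intent of the "/*" default in the template.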
Rebuilding Nikhef RPMs
The VO_CA_AP_FILE variable also had to be defined in the nikhef-yaim-cream-ce and nikhef-yaim-core RPMs.
These RPMs were built locally from an svn repository (svn+ssh://svn@ndpfsvn.nikhef.nl/repos/ndpf/nl.nikhef.ndpf.yaim/trunk). In each RPM, a YAIM function had to define an additional LCMAPS module, vo_ca_ap, and add it to the policies: config_lcas_lcmaps_gt4 in nikhef-yaim-core and config_cream_glexec in nikhef-yaim-cream-ce.
- in trunk/nikhef-yaim-core/functions/local/config_lcas_lcmaps_gt4 (this function generates the lcmaps.db file on the server, which now includes the vo_ca_ap policy):
vo_ca_ap = "lcmaps_vo_ca_ap.mod"
           " -certdir ${X509_CERT_DIR}"
           " -vo_ca_ap_file ${VO_CA_AP_FILE}"

# policies
withvoms:
vo_ca_ap -> vomslocalgroup
vomslocalgroup -> vomslocalaccount
vomslocalaccount -> posix_enf | vomspoolaccount
vomspoolaccount -> posix_enf

standard:
vo_ca_ap -> localaccount
localaccount -> posix_enf | poolaccount
poolaccount -> posix_enf
- and again in trunk/nikhef-yaim-cream-ce/functions/local/config_cream_glexec
vo_ca_ap = "lcmaps_vo_ca_ap.mod"
           " -certdir ${X509_CERT_DIR}"
           " -vo_ca_ap_file ${VO_CA_AP_FILE}"

# policies
withvoms:
proxycheck -> vo_ca_ap
vo_ca_ap -> vomslocalgroup
vomslocalgroup -> vomslocalaccount
vomslocalaccount -> posix_enf | vomspoolaccount
vomspoolaccount -> posix_enf

standard:
proxycheck -> vo_ca_ap
vo_ca_ap -> localaccount
localaccount -> posix_enf | poolaccount
poolaccount -> posix_enf
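In lcmaps.db, a rule of the form "a -> b | c" means: run module a; if it succeeds continue with b, otherwise with c. Because vo_ca_ap sits at the start of every policy (after proxycheck in the glexec variant) and has no alternative branch, a credential rejected by vo_ca_ap never reaches an account-mapping module. The Python sketch below simulates that walk. It is a simplified model in which the succeeds set stands in for real module results; it is not LCMAPS itself, and parse_policy and evaluate are hypothetical helpers.

```python
def parse_policy(rules_text):
    """Map each module to (on_success, on_failure) for rules 'a -> b | c'."""
    rules = {}
    start = None
    for raw in rules_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "->" not in line:
            continue
        left, right = (part.strip() for part in line.split("->", 1))
        on_success, _, on_failure = right.partition("|")
        rules[left] = (on_success.strip(), on_failure.strip() or None)
        if start is None:
            start = left  # the first rule's source is where the policy starts
    return start, rules


def evaluate(start, rules, succeeds):
    """Walk the chain; 'succeeds' is the set of modules assumed to succeed.

    Assumes the rules contain no cycles.
    """
    path = []
    module = start
    while module is not None:
        path.append(module)
        ok = module in succeeds
        if module not in rules:
            return ok, path  # terminal module such as posix_enf
        on_success, on_failure = rules[module]
        module = on_success if ok else on_failure
    return False, path  # a failing module without an alternative ends the policy


STANDARD = """
vo_ca_ap -> localaccount
localaccount -> posix_enf | poolaccount
poolaccount -> posix_enf
"""

start, rules = parse_policy(STANDARD)
print(evaluate(start, rules, {"vo_ca_ap", "poolaccount", "posix_enf"}))
print(evaluate(start, rules, {"localaccount", "poolaccount", "posix_enf"}))
```

The second call shows the intended effect of the change: when vo_ca_ap fails, the walk stops immediately and no local or pool account is assigned.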
If the files (or functions) to be amended do not exist in the existing RPMs, they are available from the EGI repository at http://repository.egi.eu/sw/production/umd/4/centos7/x86_64/updates/; search there for the latest yaim-core release.
Next, the RPMs were built with the makefile from the svn repository. The standard RPM build directory structure (BUILD, BUILDROOT, RPMS, SOURCES, SPECS and SRPMS) must exist in the location where the RPM is built.
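Creating these directories can be scripted. The minimal Python sketch below creates them under a single top directory and leaves existing ones untouched. Where the Nikhef makefile expects them is an assumption here (recent rpmbuild versions default their top directory to ~/rpmbuild), and make_rpmbuild_tree is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Directory names listed on this page for building an RPM.
RPMBUILD_DIRS = ("BUILD", "BUILDROOT", "RPMS", "SOURCES", "SPECS", "SRPMS")


def make_rpmbuild_tree(topdir):
    """Create the RPM build directories under topdir (existing ones are kept)."""
    top = Path(topdir).expanduser()
    for name in RPMBUILD_DIRS:
        (top / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in top.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Demonstrate in a throw-away directory instead of touching ~/rpmbuild.
    with tempfile.TemporaryDirectory() as tmp:
        print(make_rpmbuild_tree(tmp))
```

Because mkdir is called with exist_ok=True, running the helper twice is harmless.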
The new noarch RPMs were then copied to stal.nikhef.nl under /project/quattor/www/html/mirror/nikhef/tools/el6/... Finally, a new snapshot was needed for the Quattor profile:
makesnapshot nikhef-tools-el6 YEARMONTHDAY deploy-to-mirrors
(The deploy-to-mirrors argument pushes the updated snapshot to stalkaars-01 and stalkaars-03.)
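The YEARMONTHDAY placeholder in the makesnapshot call is presumably a date such as 20190306. The Python sketch below builds the argument list under that assumption (a zero-padded YYYYMMDD string); the result could be passed to subprocess.run on stal. The helper name snapshot_command is hypothetical.

```python
from datetime import date


def snapshot_command(day, repo="nikhef-tools-el6"):
    """Build the makesnapshot argument list from this page for a given day.

    Assumption: the YEARMONTHDAY placeholder means a zero-padded YYYYMMDD date.
    """
    return ["makesnapshot", repo, day.strftime("%Y%m%d"), "deploy-to-mirrors"]


print(" ".join(snapshot_command(date(2019, 3, 6))))
```

The same date string is what has to be entered in the snapshot configuration and templates.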
Next, edit the nikhef-tools snapshot configuration on stal, /srv/quattor/etc/repository-snapshots/nikhef-tools-el6.conf, and update the snapshot date in the Quattor ITB repository template (also on stal), conf/cfg/clusters/itb/repository/config.tpl.
For production (prd), update the snapshot date in /project/quattor/conf/cfg/sites/ndpf/site/repository/default-snapshots.tpl.
Pushing the changes
Check if the configuration builds with
makexprof -f itb
Debug any mistakes, then push the configuration to the testbed nodes.
pushxprof -f itb
Once this is done, YAIM has to be rerun manually on each CREAM CE server. The software packages pushed from Quattor are installed on the updated servers, but the CREAM CE servers do not know that YAIM must run again to regenerate their configuration. Running:
ncm-ncd --co yaim
should therefore apply the updated policies.
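The order of these steps matters: a profile that fails to compile should never be pushed. A hypothetical Python wrapper that runs the commands from this page in order and stops at the first failure could look like the sketch below. It defaults to a dry run that only returns the command lines, and it assumes makexprof and pushxprof are run on stal, while the YAIM rerun happens on each CREAM CE server.

```python
import subprocess

# Commands from this page, in the order they must run.
ITB_STEPS = [
    ["makexprof", "-f", "itb"],  # compile the testbed profiles first
    ["pushxprof", "-f", "itb"],  # push only if compilation succeeded
]
# Afterwards, on each CREAM CE server:
RERUN_YAIM = ["ncm-ncd", "--co", "yaim"]


def run_steps(steps, dry_run=True):
    """Run the commands in order; return the command lines attempted."""
    attempted = []
    for cmd in steps:
        attempted.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps are skipped
            subprocess.run(cmd, check=True)
    return attempted


print(run_steps(ITB_STEPS))
```

Passing dry_run=False executes the commands for real.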
Checking if it worked...
Before pushing the changes into production (prd), it is advised to push the changes to the testbed (itb) and run a simple test job with the help of a user.
Log onto a test CREAM CE server (e.g., tbn15.nikhef.nl):
- check that the lcmaps-plugins-vo-ca-ap package is installed:
rpm -q lcmaps-plugins-vo-ca-ap
- check that /etc/lcmaps/lcmaps.db and /etc/lcmaps/lcmaps-glexec.db contain the vo_ca_ap policies
- and/or check that the YAIM functions /opt/glite/yaim/functions/config_cream_glexec and config_lcas_lcmaps_gt4 are correct
- check /etc/grid-security/vo-ca-ap-file
- check /usr/share/doc/ca-policy-egi/ca-policy-egi-cam.vo-ca-ap
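Part of the lcmaps.db check above can be automated. The Python sketch below looks for the vo_ca_ap module definition and its -vo_ca_ap_file argument, and verifies that every policy section (such as withvoms: and standard:) contains a rule starting from vo_ca_ap. It assumes the file layout shown in the YAIM function snippets on this page; check_lcmaps_db is a hypothetical helper, not an existing tool.

```python
import re
import sys
from pathlib import Path


def check_lcmaps_db(text):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not re.search(r'^\s*vo_ca_ap\s*=\s*"lcmaps_vo_ca_ap\.mod"', text, re.M):
        problems.append("vo_ca_ap module definition missing")
    if "-vo_ca_ap_file" not in text:
        problems.append("-vo_ca_ap_file argument missing")
    # re.split with a capture group yields [preamble, name1, body1, name2, body2, ...]
    parts = re.split(r"^\s*(\w+):\s*$", text, flags=re.M)
    sections = list(zip(parts[1::2], parts[2::2]))
    if not sections:
        problems.append("no policy sections found")
    for name, body in sections:
        if not re.search(r"^\s*vo_ca_ap\s*->", body, re.M):
            problems.append(f"policy '{name}' does not route through vo_ca_ap")
    return problems


if __name__ == "__main__":
    # e.g. python3 check_lcmaps.py /etc/lcmaps/lcmaps.db /etc/lcmaps/lcmaps-glexec.db
    for path in sys.argv[1:]:
        found = check_lcmaps_db(Path(path).read_text())
        print(path, "OK" if not found else "; ".join(found))
```

A passing check only shows that the policies are wired up; a real test job, as suggested above, is still needed to confirm the behaviour.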
Troubleshooting hints
- to rerun YAIM manually through Quattor: ncm-ncd --co yaim
- to run YAIM directly for the creamCE and TORQUE_utils node types: /opt/glite/yaim/bin/yaim -c -s /etc/siteinfo/lcg-quattor-site-info.def -n creamCE -n TORQUE_utils
- useful logs to check the quattor deployment: /var/log/ncm/ncd.log (more verbose information is available from /var/log/ncm-cdispd.log)
Useful references
- IGTF documentation on Combined Assurance
- LCMAPS Plugin (vo-ca-ap)
- Other notes from David G:
The quickest fix is to (first) allow the DOGWOOD CA bundle and configure the LCMAPS plugin accordingly. ...Such full support also required new software and configuration at each resource centre. You must deploy the additional trust anchor meta-packages and the new local policies in unison, and never install the cam package without such software support.