Working with CREAM CE servers and VOs

From PDP/Grid Wiki

Latest revision as of 14:22, 1 May 2019

Introduction

The Computing Resource Execution And Management (CREAM) Service is used for job management across nodes in a Computing Element (CE). Nikhef has 4 production and 2 test CREAM CE servers that run in the Nikhef Grid Cluster. (The most authoritative list of the current CREAM CE servers can be found at: NDPF:System_Functions.)

Production servers
  • klomp.nikhef.nl
  • juk.nikhef.nl
  • gazon.nikhef.nl
  • stremsel.nikhef.nl
Testbed (ITB) servers
  • tbn15.nikhef.nl
  • tbn21.nikhef.nl

The CREAM CE servers accept jobs only from authorized users, so authorization is central to how they work. Nikhef's servers are managed by Quattor from stal.nikhef.nl and run UMD-4.

This page describes how to enable a new assurance profile on the CREAM CE servers and allow a specific VO to run jobs. In this case, Identifier-Only Trust Assurance (IOTA) CAs were enabled through the Combined Assurance/Adequacy Model (CAM). (More information can be found in the IGTF documentation on Combined Assurance.)

Adding plugins in Quattor

Enabling a differentiated assurance profile for access to the Grid cluster requires two additional software package installations: lcmaps-plugins-vo-ca-ap.x86_64 and ca-policy-egi-cam.noarch.

Both packages are added through the following Quattor templates:

1. In /project/quattor/conf/cfg/grid/umd-4/glite/ce/rpms.tpl, add the lcmaps-plugins-vo-ca-ap package:

'/software/packages/{lcmaps-plugins-vo-ca-ap}' = nlist();

2. In /project/quattor/conf/cfg/grid/umd-4/glite/ce/service.tpl, include the ca-policy-egi-cam template next to the include of the general CA template (common/security/CA):

# manually adding the CAM policy for the IOTA CAs here instead of in the general CA template, since not all nodes should use the IOTA CAs
include { 'common/security/ca-policy-egi-cam' };

3. In /project/quattor/conf/cfg/grid/common/security/, create a new template file, ca-policy-egi-cam.tpl, containing:

template common/security/ca-policy-egi-cam;
"/software/packages/{ca-policy-egi-cam}" = nlist();

4. The worker nodes also need to load the ca-policy-egi-cam package. (Note: /srv/quattor/conf/cfg/grid/umd-4/glite/wn/ is no longer an active directory and will not push any packages to the worker nodes.) These lines were therefore added to /srv/quattor/conf/cfg/grid/umd-4/glite/wn-torque4/service.tpl:

# manually adding the CAM policy for the IOTA CAs here instead of in the general CA template, since not all nodes should use the IOTA CAs
include { 'common/security/ca-policy-egi-cam' };

If needed, add the VOs to the CREAM CE node ITB profile under clusters/itb/profiles/ in Quattor.

For production servers, /site/vo/cream-selected.tpl manages the list of allowed VOs.
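The four template changes above can be spot-checked before compiling the profiles. The following is a minimal sketch under stated assumptions. It takes the template contents as strings, keyed by hypothetical short labels of our own, and only does whitespace-insensitive text matching. It does not compile Pan, and it will not notice a line that has been commented out.

```python
from __future__ import annotations

# Hypothetical short labels for the templates touched in steps 1-4,
# each mapped to the lines that step adds (copied from the steps above).
EXPECTED = {
    "ce/rpms.tpl": [
        "'/software/packages/{lcmaps-plugins-vo-ca-ap}' = nlist();",
    ],
    "ce/service.tpl": [
        "include { 'common/security/ca-policy-egi-cam' };",
    ],
    "common/security/ca-policy-egi-cam.tpl": [
        "template common/security/ca-policy-egi-cam;",
        '"/software/packages/{ca-policy-egi-cam}" = nlist();',
    ],
    "wn-torque4/service.tpl": [
        "include { 'common/security/ca-policy-egi-cam' };",
    ],
}


def _squash(text: str) -> str:
    """Collapse runs of whitespace so indentation differences do not matter."""
    return " ".join(text.split())


def missing_declarations(templates: dict[str, str]) -> dict[str, list[str]]:
    """Return, per template label, the expected lines not found in its text."""
    missing: dict[str, list[str]] = {}
    for label, lines in EXPECTED.items():
        text = _squash(templates.get(label, ""))
        absent = [line for line in lines if _squash(line) not in text]
        if absent:
            missing[label] = absent
    return missing
```

An empty result means every step's lines were found. A non-empty result names the template that still needs editing.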

Creating new variables

Because LCMAPS is configured by YAIM, which runs independently of the Quattor deployment, the VO_CA_AP_FILE variable must be defined in a few places. The changes to those files build the site-info.pre file that YAIM reads. The definition is:

VO_CA_AP_FILE="/etc/grid-security/vo-ca-ap-file"

This definition is added to:

  • /quattor/conf/cfg/grid/yaim/mapping_functions.tpl (to map the variable from Quattor to YAIM)
  • /quattor/conf/cfg/sites/ndpf/site/config/yaim.tpl (template that YAIM takes as input for the site-info.pre file which is used to create the lcmaps.db on the CREAM CE server.)
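Once the profile has been deployed, you can confirm that the variable reached the site-info file YAIM reads. The troubleshooting hints below name /etc/siteinfo/lcg-quattor-site-info.def as YAIM's -s input. This is a minimal sketch that assumes the file consists of shell-style KEY="value" assignments. The function name is hypothetical.

```python
from __future__ import annotations

import shlex


def read_site_info(text: str) -> dict[str, str]:
    """Collect KEY=value assignments from a shell-style site-info file."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if not key.isidentifier():
            continue  # not a plain variable assignment
        try:
            # shlex removes the quotes and any trailing "# comment".
            parts = shlex.split(value, comments=True)
        except ValueError:
            continue  # unbalanced quotes: skip rather than guess
        values[key] = " ".join(parts)
    return values
```

For example, `read_site_info(open("/etc/siteinfo/lcg-quattor-site-info.def").read()).get("VO_CA_AP_FILE")` should return /etc/grid-security/vo-ca-ap-file if the mapping worked.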

Creating vo-ca-ap-file

The vo-ca-ap-file template was created in Quattor under grid/common/security, so that the security and policy configuration is kept in a single place. More detailed information about the LCMAPS plugin and the format of this file can be found on the LCMAPS Plugin (vo-ca-ap) wiki page.

unique template common/security/vo-ca-ap-file;

include { 'components/filecopy/config' };

variable VO_CA_AP_FILE ?='/etc/grid-security/vo-ca-ap-file'; 

variable VO_CA_AP_CONTENT =<<EOF;
# EGI Combined Adequacy Assurance Model (CAM) reference configuration
# for LCMAPS-plugins-vo-ca-ap. For instructions, see
#   https://wiki.nikhef.nl/grid/Lcmaps-plugins-vo-ca-ap
# Reference configuration follows https://documents.egi.eu/document/2930
# and associated implementation guidance as for the EGI SPG processes
#
# EGI egi-cam version 1.95
#
/atlas   		file:policy-egi-core.info, file:policy-egi-cam.info
/cms     		file:policy-egi-core.info, file:policy-egi-cam.info
/lhcb    		file:policy-egi-core.info, file:policy-egi-cam.info
/alice  		file:policy-egi-core.info, file:policy-egi-cam.info
#/lsgrid/Project_MinE	file:policy-egi-core.info, file:policy-egi-cam.info
#subgroups are not supported at the moment so lsgrid was added
/lsgrid			file:policy-egi-core.info, file:policy-egi-cam.info
# Default policy for other VOs:
/*       		file:policy-egi-core.info
#
# for non-VOMS enabled credentials, supports only core policy trust anchors:
"-"      file:policy-egi-core.info
EOF

"/software/components/filecopy/services" = npush(
    escape(VO_CA_AP_FILE),
      nlist("config",VO_CA_AP_CONTENT,
            "perms","0644",
            "owner","root"),
);
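Each non-comment line of the generated file pairs a VO pattern with a comma-separated list of policy files. `/*` is the default for other VOs, and `"-"` covers credentials without VOMS attributes. The sketch below parses that layout and shows which policies a given VO would get. The lookup is a deliberate simplification: it matches only the top-level VO of an FQAN, mirroring the note that subgroups are not supported. It is not the plugin's actual matching algorithm, which is described on the LCMAPS Plugin (vo-ca-ap) wiki page. Both function names are hypothetical.

```python
from __future__ import annotations


def parse_vo_ca_ap(text: str) -> dict[str, list[str]]:
    """Map each VO pattern in a vo-ca-ap-file to its list of policy references."""
    rules: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (including commented-out VO entries).
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)  # pattern, then the policy list
        if len(fields) != 2:
            continue
        pattern = fields[0].strip('"')  # the non-VOMS entry is written as "-"
        rules[pattern] = [p.strip() for p in fields[1].split(",") if p.strip()]
    return rules


def policies_for(rules: dict[str, list[str]], fqan: str | None) -> list[str]:
    """Simplified lookup (assumption): match on the top-level VO only."""
    if fqan is None:  # credential without VOMS attributes
        return rules.get("-", [])
    vo = "/" + fqan.strip("/").split("/")[0]
    return rules.get(vo, rules.get("/*", []))
```

With the content above, `/lsgrid/Project_MinE` resolves to both the core and CAM policies through the `/lsgrid` entry. An unlisted VO such as `/dteam` falls back to the core policy only.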

Rebuilding Nikhef RPMs

The VO_CA_AP_FILE variable must also be defined in the nikhef-yaim-cream-ce and nikhef-yaim-core RPMs.

These RPMs are built locally from an SVN repository (svn+ssh://svn@ndpfsvn.nikhef.nl/repos/ndpf/nl.nikhef.ndpf.yaim/trunk). In each RPM, one YAIM function (config_lcas_lcmaps_gt4 in nikhef-yaim-core, config_cream_glexec in nikhef-yaim-cream-ce) must define the vo_ca_ap module and add it to the policies:

  • in trunk/nikhef-yaim-core/functions/local/config_lcas_lcmaps_gt4 (this file creates the lcmaps.db file on the server which now includes the vo-ca-ap policy)
vo_ca_ap = "lcmaps_vo_ca_ap.mod"
           " -certdir ${X509_CERT_DIR}"
           " -vo_ca_ap_file ${VO_CA_AP_FILE}"

# policies
withvoms:
vo_ca_ap -> vomslocalgroup
vomslocalgroup -> vomslocalaccount
vomslocalaccount -> posix_enf | vomspoolaccount
vomspoolaccount -> posix_enf

standard:
vo_ca_ap -> localaccount
localaccount -> posix_enf | poolaccount
poolaccount -> posix_enf
  • and again in trunk/nikhef-yaim-cream-ce/functions/local/config_cream_glexec
vo_ca_ap = "lcmaps_vo_ca_ap.mod"
           " -certdir ${X509_CERT_DIR}"
           " -vo_ca_ap_file ${VO_CA_AP_FILE}"

# policies
withvoms:
proxycheck -> vo_ca_ap
vo_ca_ap -> vomslocalgroup
vomslocalgroup -> vomslocalaccount
vomslocalaccount -> posix_enf | vomspoolaccount
vomspoolaccount -> posix_enf

standard:
proxycheck -> vo_ca_ap
vo_ca_ap -> localaccount
localaccount -> posix_enf | poolaccount
poolaccount -> posix_enf
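In the policy sections above, each rule has the form `a -> b | c`. As we understand the LCMAPS policy syntax, `a` is run first; on success evaluation moves to `b`, and on failure it moves to `c`. The left-hand side of a policy's first rule is the starting state. The sketch below is built on that reading. It parses the policies out of a generated lcmaps.db or lcmaps-glexec.db and follows the success branch, so you can confirm that vo_ca_ap gates every policy. It ignores module definitions, and all names in it are hypothetical.

```python
from __future__ import annotations

import re

# "state -> on_success [| on_failure]"
RULE = re.compile(r"^(\w+)\s*->\s*(\w+)(?:\s*\|\s*(\w+))?$")


def parse_policies(text: str) -> dict[str, dict[str, tuple[str, str | None]]]:
    """Return {policy name: {state: (on_success, on_failure)}} in file order."""
    policies: dict[str, dict[str, tuple[str, str | None]]] = {}
    current: str | None = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments such as "# policies"
        if not line:
            continue
        if line.endswith(":"):  # e.g. "withvoms:" starts a new policy
            current = line[:-1].strip()
            policies[current] = {}
            continue
        match = RULE.match(line)
        if match and current is not None:
            policies[current][match.group(1)] = (match.group(2), match.group(3))
    return policies


def success_path(rules: dict[str, tuple[str, str | None]]) -> list[str]:
    """Follow the success branches from the first rule's starting state."""
    if not rules:
        return []
    state = next(iter(rules))
    path = [state]
    while state in rules:
        state = rules[state][0]
        if state in path:  # guard against a loop in a broken policy
            break
        path.append(state)
    return path


def all_policies_use(policies: dict[str, dict[str, tuple[str, str | None]]], module: str) -> bool:
    """True if `module` lies on the success path of every policy."""
    return all(module in success_path(rules) for rules in policies.values())
```

For the config_cream_glexec policies shown above, the success path of `standard` would be proxycheck, vo_ca_ap, localaccount, posix_enf.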

If the files (or functions) to amend do not exist in the existing RPMs, they can be taken from the latest yaim-core release in the EGI repository: http://repository.egi.eu/sw/production/umd/4/centos7/x86_64/updates/

Next, the RPMs are built using the makefile available in the svn repository. The standard directory structure for building an RPM may need to be created: BUILD, BUILDROOT, RPMS, SOURCES, SPECS, and SRPMS directories.
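If the build tree is missing, it can be created before running the makefile. This is a minimal sketch. ~/rpmbuild is rpmbuild's usual default top directory, but whether the Nikhef makefile expects that location is an assumption to check against the makefile. The helper name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

RPM_SUBDIRS = ("BUILD", "BUILDROOT", "RPMS", "SOURCES", "SPECS", "SRPMS")


def ensure_rpm_tree(topdir: str | Path = "~/rpmbuild") -> list[Path]:
    """Create any missing standard RPM build directories; return the ones created."""
    top = Path(topdir).expanduser()
    created = []
    for name in RPM_SUBDIRS:
        path = top / name
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

The function is safe to run repeatedly: a second call creates nothing and returns an empty list.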

The new noarch RPMs are then copied to stal.nikhef.nl under /project/quattor/www/html/mirror/nikhef/tools/el6/... Finally, a new snapshot of the repository must be made for the Quattor profiles:

makesnapshot nikhef-tools-el6 YEARMONTHDAY
deploy-to-mirrors

(The deploy-to-mirrors command will push the updated snapshot to stalkaars-01 and stalkaars-03.)

Then edit the nikhef-tools snapshot configuration on stal (/srv/quattor/etc/repository-snapshots/nikhef-tools-el6.conf) and set the new snapshot date in conf/cfg/clusters/itb/repository/config.tpl.

For production (prd), update /project/quattor/conf/cfg/sites/ndpf/site/repository/default-snapshots.tpl with the new snapshot date.
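The same date tag is used by makesnapshot and by the snapshot-date edits. The sketch below builds, but does not run, the two commands for a given day. It assumes YEARMONTHDAY means a zero-padded YYYYMMDD tag such as 20190501. makesnapshot and deploy-to-mirrors are site-local tools on stal, so check the tag format against existing snapshots before relying on it.

```python
from __future__ import annotations

import datetime as dt
import shlex


def snapshot_commands(repo: str = "nikhef-tools-el6",
                      day: dt.date | None = None) -> list[list[str]]:
    """Build (but do not run) the snapshot commands for a given day."""
    day = day or dt.date.today()
    tag = day.strftime("%Y%m%d")  # assumed YEARMONTHDAY format, e.g. 20190501
    return [["makesnapshot", repo, tag], ["deploy-to-mirrors"]]


if __name__ == "__main__":
    for cmd in snapshot_commands():
        print(shlex.join(cmd))
```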

Pushing the changes

Check that the configuration compiles with:

makexprof -f itb

Debug any mistakes, then push the configuration to the testbed nodes.

pushxprof -f itb

Once this is done, YAIM must be rerun manually on each CREAM CE server. Quattor installs the pushed software packages on the updated servers, but the CREAM CE servers do not know that YAIM needs to run to update the configuration. Running:

ncm-ncd --co yaim

on each CE should then apply the updated policies.

It also helps later debugging to log in on a worker node and check that the software packages were installed there.
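The order of this section is: compile, push, then rerun YAIM on every CE. The sketch below writes that order down as a dry-run plan. The host list is the testbed CEs from the Introduction. Reaching each CE with `ssh root@host` is an assumption, so substitute the local practice. The sketch also does not wait for the pushed profile to be applied on the CEs before rerunning YAIM. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess

# Testbed CREAM CEs listed in the Introduction.
TESTBED_CES = ("tbn15.nikhef.nl", "tbn21.nikhef.nl")


def push_plan(cluster: str = "itb", ces: tuple[str, ...] = TESTBED_CES) -> list[list[str]]:
    """Return the ordered commands: compile, push, then rerun YAIM on each CE."""
    plan = [["makexprof", "-f", cluster], ["pushxprof", "-f", cluster]]
    # Assumption: root SSH access from stal to each CE.
    plan += [["ssh", f"root@{host}", "ncm-ncd", "--co", "yaim"] for host in ces]
    return plan


def run(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step, e.g. a makexprof error.
            subprocess.run(cmd, check=True)
```

`run(push_plan())` only prints the steps. Pass `dry_run=False` once the printed plan looks right.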

Checking if it worked...

Before pushing the changes into production (prd), it is advised to push the changes to the testbed (itb) and run a simple test job with the help of a user.

Log onto a test CREAM CE server (e.g., tbn15.nikhef.nl) and check the following:

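  • the lcmaps-plugins-vo-ca-ap package is installed (rpm -q queries the installed packages, whereas yum search only lists what the repositories offer):
rpm -q lcmaps-plugins-vo-ca-ap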
  • /etc/lcmaps/lcmaps.db and /etc/lcmaps/lcmaps-glexec.db contain the vo-ca-ap policies
  • and/or the YAIM functions config_cream_glexec and config_lcas_lcmaps_gt4 under /opt/glite/yaim/functions/ are correct
  • /etc/grid-security/vo-ca-ap-file exists
  • /usr/share/doc/ca-policy-egi/ca-policy-egi-cam.vo-ca-ap is present
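The file checks in this list can be bundled into one script to run on each test CE. This is a minimal sketch. The optional `root` argument is our own addition so that the checks can be tried against a copy of the files. The lcmaps.db check only looks for the string vo_ca_ap; it does not validate the policy. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

LCMAPS_DBS = ("etc/lcmaps/lcmaps.db", "etc/lcmaps/lcmaps-glexec.db")
REQUIRED_FILES = (
    "etc/grid-security/vo-ca-ap-file",
    "usr/share/doc/ca-policy-egi/ca-policy-egi-cam.vo-ca-ap",
)


def check_ce(root: str = "/") -> dict[str, bool]:
    """Report, per path, whether the expected file (and vo_ca_ap policy) is there."""
    base = Path(root)
    results: dict[str, bool] = {}
    for rel in LCMAPS_DBS:
        path = base / rel
        results[rel] = path.is_file() and "vo_ca_ap" in path.read_text(errors="replace")
    for rel in REQUIRED_FILES:
        results[rel] = (base / rel).exists()
    return results


def rpm_installed(name: str) -> bool:
    """True if `rpm -q NAME` succeeds; False if not installed or rpm is absent."""
    try:
        return subprocess.run(["rpm", "-q", name], capture_output=True).returncode == 0
    except FileNotFoundError:
        return False
```

On a CE, `check_ce()` together with `rpm_installed("lcmaps-plugins-vo-ca-ap")` should all come back True after YAIM has been rerun.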

Troubleshooting hints

  • ncm-ncd --co yaim reruns YAIM manually on the server.
  • To run YAIM directly (-c configure, -s site-info file, -n node type): /opt/glite/yaim/bin/yaim -c -s /etc/siteinfo/lcg-quattor-site-info.def -n creamCE -n TORQUE_utils
  • Useful logs for checking the Quattor deployment: /var/log/ncm/ncd.log (more verbose information is available in /var/log/ncm-cdispd.log)
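When the Quattor logs are long, it can help to pull out only the recent error lines. This is a minimal sketch. It makes no assumption about the ncd.log line format beyond a plain, case-insensitive substring match, so it may also catch harmless lines that merely mention "error". The function name is hypothetical.

```python
from __future__ import annotations

from collections import deque


def recent_matches(log_path: str, needle: str = "error", keep: int = 20) -> list[str]:
    """Return the last `keep` lines of a log that contain `needle` (case-insensitive)."""
    needle = needle.lower()
    hits: deque[str] = deque(maxlen=keep)  # keeps only the newest matches
    with open(log_path, errors="replace") as fh:
        for line in fh:
            if needle in line.lower():
                hits.append(line.rstrip("\n"))
    return list(hits)
```

For example, `recent_matches("/var/log/ncm/ncd.log")` lists the last twenty lines mentioning an error.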

If jobs still fail, it helps to add a VO that you can submit test jobs from to the vo-ca-ap-file on a CREAM CE and/or UI server, and then find out which worker node the job is being dispatched to.

This was the error the user received:

******  JobID=https://stremsel.nikhef.nl:8443/CREAM910312615

Status        = [DONE-FAILED]

ExitCode      = [N/A]

FailureReason = [Cannot move ISB (retry_copy ${globus_transfer_cmd} gsiftp://stremsel.nikhef.nl/var/cream_sandbox/mine/CN_USERNAME_HBIW9YENQ_9vUbLu_O_surfsara_nl_DC_rcauth_clients_DC_rcauth_DC_eu_VONAME_NULL_Capability_NULL_mine000/91/CREAM910312615/ISB/test.sh file:///tmp/jobdir/42164683.korf.nikhef.nl/CREAM910312615/test.sh):

 error: globus_ftp_client: the server responded with an error 500 500-Command failed. : callback failed. 500-an end-of-file was reached 500-globus_xio: The GSI XIO driver failed to establish a secure connection. The failure occured during a handshake read. 500-globus_xio: An end of file occurred 500 End.; reason=1;
Cannot move ISB (retry_copy ${globus_transfer_cmd} gsiftp://stremsel.nikhef.nl/var/cream_sandbox/mine/CN_USERNAME_HBIW9YENQ_9vUbLu_O_surfsara_nl_DC_rcauth_clients_DC_rcauth_DC_eu_VONAME_NULL_Capability_NULL_mine000/91/CREAM910312615/ISB/test.sh file:///tmp/jobdir/42164683.korf.nikhef.nl/CREAM910312615/test.sh): error: globus_ftp_client: the server responded with an error 500 500-Command failed. : callback failed.  500-an end-of-file was reached  500-globus_xio: The GSI XIO driver failed to establish a secure connection. The failure occured during a handshake read.  500-globus_xio: An end of file occurred  500 End.]
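To find the worker node, the failure report contains two useful identifiers. One is the CREAM job ID. The other is the batch job directory, /tmp/jobdir/42164683.korf.nikhef.nl/, where korf.nikhef.nl appears to be the Torque server. The sketch below extracts both. The patterns are inferred from this single report and are an assumption. The batch ID can then be looked up on the batch server, for example with Torque's tracejob, or with qstat -f (the exec_host attribute) while the server still knows the job. The function name is hypothetical.

```python
from __future__ import annotations

import re

# Patterns inferred from the single error report above (assumption).
CREAM_JOB = re.compile(r"JobID=(https://\S+/CREAM\d+)")
BATCH_JOB = re.compile(r"/tmp/jobdir/(\d+)\.([\w.-]+)/")


def job_ids(report: str) -> dict[str, str | None]:
    """Pull the CREAM job ID and the batch job ID/server out of a status report."""
    cream = CREAM_JOB.search(report)
    batch = BATCH_JOB.search(report)
    return {
        "cream_job": cream.group(1) if cream else None,
        "batch_job": batch.group(1) if batch else None,
        "batch_server": batch.group(2) if batch else None,
    }
```

For the report above, this yields batch job 42164683 on korf.nikhef.nl.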

The cause of this error appeared to be that the worker nodes did not have the egi-cam packages installed.

It may also be necessary to temporarily install ca_RCauth-Pilot-ICA-G1.noarch on a UI server from which you can submit the jobs.

Useful references

  1. IGTF documentation on Combined Assurance
  2. LCMAPS Plugin (vo-ca-ap)
  3. Other notes from David G:

The quickest fix is to (first) allow the DOGWOOD CA bundle and configure the LCMAPS plugin accordingly. ...Such full support also required new software and configuration at each resource centre. You must deploy the additional trust anchor meta-packages and the new local policies in unison, and never install the cam package without such software support.