Upgrading Quattor-managed gLite servers

Updating a gLite release

Updating a gLite release consists of the following steps. Important: only if you have already followed all the steps on the ITB (test) system can you move directly to the 'Compilation' step.

  • Synchronizing our copy of the gLite repository with the official release repository at CERN
  • Generating Quattor templates for the updates and creating a new gLite update branch in the name space
  • Compilation of the profiles for the hosts in the Installation Test Bed using the new gLite updates
  • Deployment and troubleshooting
  • Using the latest gLite updates as default for all clusters

The following notes were taken during the upgrade from glite-3.1-update-29 to glite-3.1-update-31. These notes depend heavily on our installation and might not be applicable at other Quattor installations without major changes; copy and paste will surely not work.

Synchronization of the local gLite repository

As ndpfmgr@stal

  • Run the script ~/bin/mirror-glite

This synchronizes our gLite mirror on stal (under directory /project/quattor/www/html/mirror/glite) with the official repository at CERN (releases are fetched from http://glitesoft.cern.ch/EGEE/gLite/).

Generation of the update templates & creation of a new update branch

This needs to be executed in your own working Quattor environment, either on a Quattor server (e.g. stal) or on your own laptop. In particular, the environment variable $L must point to a usable Quattor repository checkout.

Generating the update templates

The first argument of the rpmUpdates.pl script is the directory of the mirror that was synchronized as ndpfmgr in the previous step.

  • $L/../bin/rpmUpdates.pl /project/quattor/www/html/mirror/glite/3.1/generic/sl4/i386/updates/ > /tmp/31-i386
  • $L/../bin/rpmUpdates.pl /project/quattor/www/html/mirror/glite/3.1/generic/sl4/x86_64/updates/ > /tmp/31-x86-64

The commands above create Quattor templates that will replace all existing packages (in the Quattor profile) with the most recent versions found in the update repositories. Note that this is a blunt approach that does not take into account packages that were added or deleted as part of an update.
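The idea of "most recent version per package" can be illustrated with a small shell sketch. This is not how rpmUpdates.pl is implemented; the function name newest_rpms is hypothetical, the file names are assumed to follow the name-version-release.arch.rpm pattern, and GNU sort -V only approximates the version comparison that rpm itself performs.

```shell
# Hypothetical helper, not part of rpmUpdates.pl: read RPM file names on stdin
# and print the newest file for every (package name, architecture) pair.
# Assumes names of the form name-version-release.arch.rpm and GNU sort.
newest_rpms() {
    # Turn each file name into: name arch version-release filename
    sed -n 's/^\(.*\)-\([^-]*\)-\([^-]*\)\.\([^.]*\)\.rpm$/\1 \4 \2-\3 &/p' |
        # Order by name, arch, then version (version sort is an approximation)
        LC_ALL=C sort -k1,1 -k2,2 -k3,3V |
        # Remember the last (highest) file seen for each name/arch pair
        awk '{ key = $1 " " $2; last[key] = $4 }
             END { for (k in last) print last[k] }' |
        LC_ALL=C sort
}
```

For example, `ls /project/quattor/www/html/mirror/glite/3.1/generic/sl4/i386/updates/ | newest_rpms` would list one file per package. Like the generated templates, this says nothing about packages that were added or removed by an update.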

Creating a new update branch

Our Quattor hierarchy contains one directory hierarchy per gLite update. This structure makes it possible to select which gLite updates will be installed per cluster or even per individual node. There is also a site-wide default setting.

The easiest way to create an update branch is to copy an existing one:

  • cd $L/cfg/grid/glite-3.1/update/
  • cp -a 29 31

The copied branch still contains the directories that Subversion uses internally. They need to be removed:

  • find $L/cfg/grid/glite-3.1/update/31 -type d -name .svn -exec rm -rf {} \;

The templates in the new branch still contain the namespace paths for the original branch. They need to be corrected because otherwise the compilation will fail.

  • for file in `grep -H -r "/29/" 31/* |awk -F : '{print $1}'`; do sed -i "s/\/29\//\/31\//g" $file; done
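The loop above visits a file once for every matching line. A slightly more defensive variant is sketched below; the function name retarget_branch is hypothetical, and the sketch assumes GNU grep, xargs and sed.

```shell
# Hypothetical helper: rewrite the namespace paths in a copied update branch.
# Usage: retarget_branch <branch-dir> <old-nr> <new-nr>
# Assumes GNU grep (-Z), GNU xargs (-0 -r) and GNU sed (-i).
retarget_branch() {
    dir=$1 old=$2 new=$3
    # grep -rl lists every matching file exactly once; -Z and -0 pass the
    # names NUL-separated so unusual file names survive; -r skips sed
    # entirely when nothing matches.
    grep -rlZ -- "/$old/" "$dir" | xargs -0 -r sed -i "s#/$old/#/$new/#g"
}
```

Run from $L/cfg/grid/glite-3.1/update/, the call `retarget_branch 31 29 31` would have the same effect as the loop. Using # as the sed delimiter avoids escaping the slashes.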

Then replace the template containing the updates with the contents of the template generated by rpmUpdates.pl. Note that this is not a straightforward operation because the original update template may contain manual additions (or deletions). The generated template only ensures that the newest version of each rpm is installed. Unfortunately, this is usually not sufficient for a gLite release: an update release may introduce new packages (found in the release repository and thus not processed by rpmUpdates.pl), and package dependencies may require additional rpms to be installed (or deleted). Manual editing is required to ensure that such customizations are copied from the original update template to the newly generated one.

Specifically for gLite 3.1, the Torque packages should be uncommented in the generated repository because newer gLite releases contain updates for them in the externals repository.

Note: pay attention to the namespace in the template! The generated one does not match the required namespace and will thus cause compilation errors.

  • cp /tmp/31-i386 $L/cfg/grid/glite-3.1/update/31/i386/rpms.tpl
  • edit $L/cfg/grid/glite-3.1/update/31/i386/rpms.tpl
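To find candidates for the manual editing step, it can help to list the packages that appear in the previous update template but not in the generated one. The sketch below is only an aid: the function names pkg_names and missing_in_new are hypothetical, and it assumes that every package entry uses the pkg_repl("name","version","arch") form and that commented-out entries start with #.

```shell
# Hypothetical helpers: compare the package names in two rpms templates.
# Assumes entries of the form
#   "/software/packages"=pkg_repl("name","version","arch");
# and that lines starting with # are comments.

# Print the sorted, unique package names found in one template.
pkg_names() {
    sed -n -e '/^[[:space:]]*#/d' \
           -e 's/.*pkg_repl *( *"\([^"]*\)".*/\1/p' "$1" | LC_ALL=C sort -u
}

# Print the names that occur in the old template ($1) but not in the new ($2).
missing_in_new() {
    old_names=$(mktemp)
    new_names=$(mktemp)
    pkg_names "$1" > "$old_names"
    pkg_names "$2" > "$new_names"
    LC_ALL=C comm -23 "$old_names" "$new_names"
    rm -f "$old_names" "$new_names"
}
```

For instance, `missing_in_new $L/cfg/grid/glite-3.1/update/29/i386/rpms.tpl /tmp/31-i386` would print the names that need a closer look. It only compares names, so deliberate deletions and version pins in the old template still have to be checked by hand.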

Note: if the generated templates were stored somewhere under $L/cfg, they will cause compilation errors because their namespace will not match. They need to be deleted or moved to another directory.

Compilation

Still in your local environment, you need to decide which hosts or cluster(s) will be upgraded. For the initial test, only use the installation testbed (ITB). If there are problems, they will not affect all production machines.

Using the conventions below, in particular the use of the ?= conditional assignment operator, setting GLITE_UPDATE_VERSION for a host takes precedence over the cluster setting, which takes precedence over the gLite default setting.

To define the gLite update version for a specific host, define the variable GLITE_UPDATE_VERSION in the object template ($L/cfg/cluster/<clustername>/profiles/profile_<host>.tpl) before the line include machine-types/<type>:

variable GLITE_UPDATE_VERSION ?= "31";
include machine-types/se_dpm_disk;

To define the gLite update version for an entire cluster, set the update version number in $L/cfg/cluster/<clustername>/local/pro_facility_config.tpl:

variable GLITE_UPDATE_VERSION ?= {
    if ( ! exists( GLITE_VERSION ) ) { error("GLITE_VERSION undefined"); };
    if ( GLITE_VERSION == "glite-3.1" ) {
        return('31');                       # gLite 3.1 update 31
    }
    else {
        return('44');                       # gLite 3.0 update 44
    };
};

Defining the gLite default update level is done in $L/cfg/grid/glite-3.1/glite/defaults.tpl:

variable GLITE_UPDATE_VERSION ?= '31';
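The precedence rule can be pictured with a shell analogy. This is shell, not Pan, and it is only a sketch: the function name resolve_update_version is hypothetical, and it assumes that the templates are processed in the order host, cluster, gLite default, so that the first conditional assignment wins.

```shell
# Analogy only (shell, not Pan): "${VAR=value}" assigns only when VAR is
# still unset, much like a conditional assignment that keeps the first value.
# Usage: resolve_update_version <host-value> <cluster-value> <default-value>
# An empty argument stands for "this level does not set the variable".
resolve_update_version() {
    unset GLITE_UPDATE_VERSION
    for candidate in "$@"; do
        # Later levels cannot override a value that is already set.
        [ -n "$candidate" ] && : "${GLITE_UPDATE_VERSION=$candidate}"
    done
    echo "${GLITE_UPDATE_VERSION-}"
}
```

With no host setting, `resolve_update_version "" 29 31` yields the cluster value; with a host setting, `resolve_update_version 31 29 29` yields the host value. The version numbers here are purely illustrative.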

Finally, perform a compilation of the selected host(s) or cluster. It is mandatory to use the -u option of makexprof during the first compilation after running the mirror script because the contents of the gLite repository have changed. To perform the first compilation for the ITB:

  • makexprof -u -f itb

If there are no problems, the changes may be committed to the SVN repository. The new update branch ($L/cfg/grid/glite-3.1/update/<nr>/) needs to be added to svn:

cd $L/cfg/grid/glite-3.1/update
svn add 31
svn commit 31

Don't forget to commit the changes for the host, cluster or gLite-wide defaults!

Deployment

Deploying the new configuration is described in detail in the article How_to_work_with_our_Quattor_setup#Deploying.

Work as ndpfmgr@stal, get the latest from SVN and push the profiles for the selected host(s) or cluster. Again, don't forget to use option -u for the first compilation to refresh the repository templates:

cd $L/cfg
svn up
pushxprof -u -f itb

Troubleshooting

And finally-A, since Quattor does some kind of package management

If you are notified, preferably by a monitoring system (if you don't have one: good luck), that something does not work, you could try the following:

  • Log on to the host that shows problems
  • tail /var/log/ncm/ncd.log
2008/10/02-15:25:30 [INFO] Errors while configuring spma (1)
2008/10/02-15:25:30 [ERROR] 1 errors, 0 warnings executing configure

This tells us that we actually have to look into the spma log:

  • tail /var/log/spma.log
2008/10/02-15:25:28 [WARN] Errors found:
depcheck: package glite-UI 3.1.19-0 needs glite-amga-api-python >= 1.3.0-1
depcheck: package glite-UI 3.1.19-0 needs glite-amga-cli >= 1.3.0-4
there were 2 dependency problem(s) and 0 conflict(s)
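The unmet dependencies can be pulled out of the log with a short filter. This is a sketch only: the function name missing_deps is hypothetical, and it handles just the "needs <package> >= <version>" form shown in the log excerpt above.

```shell
# Hypothetical helper: read an spma log on stdin and print one
# "package version" pair per unmet dependency reported by depcheck.
# Only lines of the form
#   depcheck: package <pkg> <ver> needs <dep> >= <dep-version>
# are recognized; other operators would need an extended pattern.
missing_deps() {
    sed -n 's/^depcheck: package .* needs \([^ ]*\) >= \([^ ]*\)$/\1 \2/p' |
        LC_ALL=C sort -u
}
```

A call such as `missing_deps < /var/log/spma.log` gives a compact list to compare against the published rpm list. The architecture of each package is not in the log and still has to be looked up.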

With this information, visit the web page that has the rpm list for this particular server, in our case: http://glite.web.cern.ch/glite/packages/R3.1/deployment/glite-UI/3.1.20-0/glite-UI-3.1.20-0.html

These packages have to be added to the respective rpm list, step by step:

And finally-B, again as normal-user@stal

Here the Quattor package list for the UI is wrong. Note that the missing packages are not added to the updates directory; they go into the "base list".

  • In our example the following should work:
cat << EOF >> $L/cfg/grid/glite-3.1/glite/ui/rpms.tpl
"/software/packages"=pkg_repl("glite-amga-cli","1.3.0-4","i386");
"/software/packages"=pkg_repl("glite-amga-api-python","1.3.0-1","noarch");
EOF
  • makexprof -f prd bosui
  • If compilation is successful proceed with:
svn commit

And finally-C, again as ndpfmgr@stal

Get the updates committed as normal user above, confirm that they compile, and push the profile to the node that is missing packages (here bosui, a UI):

  • cd $L/cfg/grid/glite-3.1/glite/ui
  • svn update rpms.tpl
  • makexprof -u -f prd bosui
  • if OK, pushxprof -u -f prd bosui
  • check monitoring system

Repeat until all services are usable again.