Difference between revisions of "Upgrading Quattor managed glite servers"
Revision as of 14:51, 12 June 2010
Updating a gLite release
Updating a gLite release consists of the following steps:
(IMPORTANT NOTE: If you have already followed all the steps on the ITB (test) system, you can move directly to the 'Compilation' step when repeating the procedure for the PRD (production) system.)
- Synchronizing our copy of the gLite repository with the official release repository at CERN
- Generating Quattor templates for the updates and creating a new gLite update branch in the name space
- Compilation of the profiles for the hosts in the Installation Test Bed using the new gLite updates
- Deployment and troubleshooting
- Using the last gLite updates as default for all clusters
The following notes were taken during the upgrade from glite-3.1-update-29 to glite-3.1-update-31. These notes depend heavily on our installation and might not be applicable at other Quattor installations without major changes; copy and paste will surely not work.
Synchronization of the local gLite repository
As ndpfmgr@stal
- Run the script ~/bin/mirror-glite
This synchronizes our gLite mirror at stal (under directory /project/quattor/www/html/mirror/glite) with the official repository at CERN (fetch release from: http://glitesoft.cern.ch/EGEE/gLite/).
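Before generating templates it can help to confirm that the mirror really contains packages. The following is a minimal sketch, not part of our tooling; the function name count_rpms is hypothetical, and it only assumes that the mirrored packages are files ending in .rpm.

```shell
# Hypothetical helper: count the RPM files below a mirror directory.
# Prints 0 if the directory is empty or does not exist.
count_rpms() {
    find "$1" -type f -name '*.rpm' 2>/dev/null | wc -l | tr -d '[:space:]'
}
```

For example, count_rpms /project/quattor/www/html/mirror/glite/3.1/generic/sl4/i386/updates/ should print a non-zero number after a successful run of the mirror script.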
Generation of the update templates & creation of a new update branch
This needs to be executed in your own working Quattor environment, either on a Quattor server (e.g. stal) or on your own laptop. It assumes that you have a working Quattor environment, in particular that the environment variable $L is defined and points to a usable Quattor repository checkout.
Generating the update templates
The first argument of the rpmUpdates.pl script is the directory of the mirror that was synchronized as ndpfmgr in the previous step.
- $L/../bin/rpmUpdates.pl /project/quattor/www/html/mirror/glite/3.1/generic/sl4/i386/updates/ > /tmp/31-i386
- $L/../bin/rpmUpdates.pl /project/quattor/www/html/mirror/glite/3.1/generic/sl4/x86_64/updates/ > /tmp/31-x86-64
The commands above create Quattor templates that will replace all existing packages (in the Quattor profile) with the most recent versions found in the update repositories. Note that this is a blunt approach that does not take into account packages that were added or deleted as part of an update.
Creating a new update branch
Our Quattor hierarchy contains one directory hierarchy per gLite update. This structure makes it possible to select which gLite updates will be installed per cluster or even per individual node. There is also a site-wide default setting.
The easiest way to create an update branch is to copy an existing one:
- cd $L/cfg/grid/glite-3.1/update/
- cp -a 29 31
The copied branch still contains the directories that Subversion uses internally. They need to be removed:
- find $L/cfg/grid/glite-3.1/update/31 -type d -name .svn -exec rm -rf {} \;
The templates in the new branch still contain the namespace paths of the original branch. They need to be corrected, otherwise the compilation will fail. Run the following from $L/cfg/grid/glite-3.1/update/:
- grep -rl "/29/" 31/ | xargs -r sed -i "s|/29/|/31/|g"
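After rewriting the namespace paths, a quick check can confirm that no reference to the old branch was missed. This is a minimal sketch; the function name find_stale_refs is hypothetical, and it assumes a grep that supports recursive search (-r), as GNU grep does.

```shell
# Hypothetical helper: list files below a branch directory that still
# reference the old update number, e.g. "/29/".
# Returns 1 if any stale reference is found, 0 otherwise.
find_stale_refs() {
    branch_dir=$1
    old_update=$2
    if grep -rl -- "/${old_update}/" "$branch_dir"; then
        echo "stale references to /${old_update}/ remain" >&2
        return 1
    fi
}
```

Called as find_stale_refs 31 29 from the update directory, it prints nothing and succeeds once all paths have been corrected.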
Then replace the template containing the updates with the contents of the template generated by rpmUpdates.pl. This is not a straightforward operation because the original update template may contain manual additions (or deletions). The generated template only ensures that the newest version of each rpm is installed. Unfortunately this is usually not sufficient for a gLite release: an update release may introduce new packages (found in the release repository and thus not processed by rpmUpdates.pl), and package dependencies may require additional rpms to be installed (or deleted). Manual editing is required to ensure that such customizations are copied from the original update template to the newly generated one. Specifically for gLite 3.1, the Torque packages should be uncommented in the generated repository because newer gLite releases contain updates for them in the externals repository.
Note: pay attention to the namespace in the template! The generated one does not match the required namespace and will thus cause compilation errors.
- cp /tmp/31-i386 $L/cfg/grid/glite-3.1/update/31/i386/rpms.tpl
- edit $L/cfg/grid/glite-3.1/update/31/i386/rpms.tpl
Note: if the generated templates were stored somewhere under $L/cfg, they will cause compilation errors because their namespace will not match. They need to be deleted or moved to another directory.
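To find manual additions that need to be carried over, the package names in the original update template can be compared with those in the generated one. The following is a minimal sketch, assuming that packages are declared with pkg_repl("name","version","arch") lines, as in the rpm lists shown further down, and that commented-out lines start with #. The function names list_pkgs and missing_pkgs are hypothetical.

```shell
# Hypothetical helper: print the package names declared with pkg_repl(...)
# in a template, skipping lines that are commented out with '#'.
list_pkgs() {
    sed -n '/^[[:space:]]*#/!s/.*pkg_repl([[:space:]]*"\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Hypothetical helper: print packages that occur in the old template ($1)
# but not in the newly generated template ($2).
missing_pkgs() {
    old_list=$(mktemp) || return 1
    new_list=$(mktemp) || { rm -f "$old_list"; return 1; }
    list_pkgs "$1" > "$old_list"
    list_pkgs "$2" > "$new_list"
    comm -23 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}
```

For example, missing_pkgs $L/cfg/grid/glite-3.1/update/29/i386/rpms.tpl /tmp/31-i386 lists candidates for manual review. It only compares names, so it does not replace reading the release notes.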
Compilation
Still in your local environment, you need to decide which hosts or cluster(s) will be upgraded. For the initial test, only use the installation testbed (ITB). If there are problems, they will not affect all production machines.
Using the conventions below, in particular the use of the ?= conditional assignment operator, setting GLITE_UPDATE_VERSION for a host takes precedence over the cluster setting, which takes precedence over the gLite default setting.
To define the gLite update version for a specific host, define the variable GLITE_UPDATE_VERSION in the object template ($L/cfg/cluster/<clustername>/profiles/profile_<host>.tpl) before the line include machine-types/<type>:
variable GLITE_UPDATE_VERSION ?= "31";
include machine-types/se_dpm_disk;
To define the gLite update version for an entire cluster, set the update version number in $L/cfg/cluster/<clustername>/local/pro_facility_config.tpl:
variable GLITE_UPDATE_VERSION ?= {
    if ( ! exists( GLITE_VERSION ) ) {
        error("GLITE_VERSION undefined");
    };
    if ( GLITE_VERSION == "glite-3.1" ) {
        return('31');   # gLite 3.1 update 31
    } else {
        return('44');   # gLite 3.0 update 44
    };
};
Defining the gLite default update level is done in $L/cfg/grid/glite-3.1/glite/defaults.tpl:
variable GLITE_UPDATE_VERSION ?= '31';
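The precedence that results from these three conditional assignments can be illustrated with a shell analogy. This is only a sketch of the logic, not Pan code: the shell expansion ${var:=value} assigns a value only when the variable is still empty, which is roughly how ?= behaves here. The function name resolve_update_version is hypothetical.

```shell
# Hypothetical illustration of the precedence: host, then cluster, then default.
# $1 = host setting, $2 = cluster setting, $3 = gLite default ("" = not set)
resolve_update_version() {
    version=$1
    : "${version:=$2}"   # cluster value is used only if the host set nothing
    : "${version:=$3}"   # default is used only if neither host nor cluster set it
    printf '%s\n' "$version"
}
```

With a cluster setting of 31 and a default of 29, a host without its own setting resolves to 31, while a host that sets 30 keeps 30.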
Finally, perform a compilation of the selected host(s) or cluster. It is mandatory to use the -u option of makexprof during the first compilation after running the mirror script because the contents of the gLite repository have changed. To perform the first compilation for the ITB:
- makexprof -u -f itb
If there are no problems, the changes may be committed to the SVN repository. The new update branch ($L/cfg/grid/glite-3.1/update/<nr>/) needs to be added to svn:
cd $L/cfg/grid/glite-3.1/update
svn add 31
svn commit 31
Don't forget to commit the changes for the host, cluster or gLite-wide defaults!
Deployment
Deploying the new configuration is described in detail in the article How_to_work_with_our_Quattor_setup#Deploying.
Work as ndpfmgr@stal, get the latest from SVN and push the profiles for the selected host(s) or cluster. Again, don't forget to use option -u for the first compilation to refresh the repository templates:
cd $L/cfg
svn up
pushxprof -u -f itb
Troubleshooting
And finally (A), since Quattor does some kind of package management
If you are notified, preferably by a monitoring system (if you don't have one, good luck), that something does not work, you could try the following:
- Log on to the host that shows problems
- tail /var/log/ncm/ncd.log
2008/10/02-15:25:30 [INFO] Errors while configuring spma (1)
2008/10/02-15:25:30 [ERROR] 1 errors, 0 warnings executing configure
This tells us that we actually have to look into the SPMA log:
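On a host with a long log it may be easier to extract the failing component names than to read the tail. A minimal sketch, assuming the message format shown in the excerpt above and a grep that supports -o; the function name failed_components is hypothetical.

```shell
# Hypothetical helper: print the names of components for which the log
# reports "Errors while configuring <component>".
failed_components() {
    grep -o 'Errors while configuring [^ ]*' "$1" | awk '{print $4}' | sort -u
}
```

For the excerpt above, failed_components /var/log/ncm/ncd.log would print spma.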
- tail /var/log/spma.log
2008/10/02-15:25:28 [WARN] Errors found:
depcheck: package glite-UI 3.1.19-0 needs glite-amga-api-python >= 1.3.0-1
depcheck: package glite-UI 3.1.19-0 needs glite-amga-cli >= 1.3.0-4
there were 2 dependency problem(s) and 0 conflict(s)
With this information, visit the web page that lists the rpms for this particular server type, in our case: http://glite.web.cern.ch/glite/packages/R3.1/deployment/glite-UI/3.1.20-0/glite-UI-3.1.20-0.html
These packages have to be added to the respective rpm list, stepwise:
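The missing dependencies can also be pulled out of the log mechanically. The following is a minimal sketch, assuming the depcheck message format shown above ("needs <package> >= <version>") and a grep that supports -o; the function name missing_deps is hypothetical.

```shell
# Hypothetical helper: print "<package> <minimum version>" for every
# dependency that depcheck reports as missing in an SPMA log.
missing_deps() {
    grep -o 'needs [^ ]* >= [^ ]*' "$1" | awk '{print $2, $4}' | sort -u
}
```

Running missing_deps /var/log/spma.log gives one line per missing package. The architecture is not part of the log message and still has to be looked up in the rpm list of the release.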
And finally (B), again as normal-user@stal
Here the Quattor package list for the UI is incomplete. Note that the missing packages are not added to the updates directory; they go into the "base list".
- In our example the following should work:
cat << EOF >> $L/cfg/grid/glite-3.1/glite/ui/rpms.tpl
"/software/packages"=pkg_repl("glite-amga-cli","1.3.0-4","i386");
"/software/packages"=pkg_repl("glite-amga-api-python","1.3.0-1","noarch");
EOF
- makexprof -f prd bosui
- If compilation is successful proceed with:
cvs commit
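When this has to be done for several packages, appending lines by hand can lead to duplicates. The following is a minimal sketch of a helper that appends a pkg_repl line only if the identical line is not yet present. The function name add_pkg is hypothetical, and the line format is copied from the example above.

```shell
# Hypothetical helper: append a pkg_repl entry to an rpm list template
# unless exactly the same line is already in the file.
# $1 = template file, $2 = package name, $3 = version, $4 = architecture
add_pkg() {
    tpl=$1
    line=$(printf '"/software/packages"=pkg_repl("%s","%s","%s");' "$2" "$3" "$4")
    if grep -qxF -- "$line" "$tpl" 2>/dev/null; then
        return 0    # entry already present, nothing to do
    fi
    printf '%s\n' "$line" >> "$tpl"
}
```

For the example above this would be add_pkg $L/cfg/grid/glite-3.1/glite/ui/rpms.tpl glite-amga-cli 1.3.0-4 i386, followed by the same call for glite-amga-api-python with architecture noarch.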
And finally (C), again as ndpfmgr@stal
Get the updates that were committed as a normal user above, confirm that they compile, and push the profile to the node that is missing packages (here bosui, a UI):
- cd $L/cfg/grid/glite-3.1/glite/ui
- cvs update rpms.tpl
- makexprof -u -f prd bosui
- if OK, pushxprof -u -f prd bosui
- check monitoring system