NDPF TODO List

== Enable backup on hooimijt ==
 
  
The startup script is in place (/etc/init.d/adsm), but as soon as it is started the load goes to infinity, because tsmc first runs an expiry of /export/cache. I think that takes a few hours, during which the whole machine is unresponsive (the tsmc process sits in a "D" state the entire time). The rest of the farm cannot cope with that.

First have SARA move the old backup, and then try again at a quiet moment? (Do not throw it away! :-)
 
 
== Upgrade Torque ==
 
A new version (2.0) of Torque is out. This one includes the TMPDIR patch, IIRC. Time to upgrade. Maybe we should let JT destroy the Torque server.

The latest RPMs for this (from SteveT, with the TMPDIR patches etc.) are now at

  http://hepunx.rl.ac.uk/~traylens/rpms/torque

but SteveT did warn: "The newer ones have had less than a day of testing so be warned."
 
 
In SteveT's version the startup scripts are different: no longer a single "/etc/init.d/pbs", but a set "pbs_{mom,sched,server}". The startup configs in the Quattor config must be adjusted accordingly (at the moment there is still a manual override in the local/ config). For the sources, see also:

  http://www.gridpp.rl.ac.uk/viewcvs/viewcvs.cgi/torque/
 
 
 
== Check pool accounts ==
 
Apparently things can go wrong if we have e.g.

  dteamsm01

and

  dteam001

as pool accounts for 'dteamsm' and 'dteam', because 'dteamsm01' is also a valid pool account for .dteam. Check and repair.
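
A minimal sketch of how the check could be automated. It assumes that a pool account is a pool prefix followed only by digits, and that the problem arises when a pool is matched by plain prefix comparison; both are assumptions for illustration, and the function name and inputs are hypothetical. It reports every account that starts with a pool prefix without being a prefix-plus-digits member of that pool.

```python
def find_prefix_collisions(prefixes, accounts):
    """Return (account, prefix) pairs where the account starts with the
    pool prefix but is not of the form <prefix><digits>.

    Assumption (not confirmed by this page): pools are matched by simple
    prefix comparison, so such an account would wrongly count as a member.
    """
    collisions = []
    for account in sorted(accounts):
        for prefix in sorted(prefixes):
            suffix = account[len(prefix):]
            if account.startswith(prefix) and not suffix.isdigit():
                collisions.append((account, prefix))
    return collisions


if __name__ == "__main__":
    # Example names taken from the text above.
    pools = ["dteam", "dteamsm"]
    accounts = ["dteam001", "dteamsm01"]
    for account, prefix in find_prefix_collisions(pools, accounts):
        print(f"{account} also matches pool prefix '{prefix}'")
```

Run against the real account list, any reported pair points at a pool whose prefix needs renaming or whose accounts need to be separated.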
 
== VOBOX installation ==
 
LCG now has an official "VOBOX" profile. We need to install one of these, evaluate it and, based on what we see, maybe install one more.
 
 
== Fixes needed for information published to BDII ==
 
There are a number of new attributes in the GlueVOView blocks that are not yet being published, like the software dir and data dir. This is going to require some serious Quattor work and is not a task to be taken lightly.

Also, there is a warning in the GIIS monitor that our publishing of teras.sara.nl as a close SE is failing some sanity checks. This last one may be a fault in the test; someone needs to look carefully at this.
 
== Update of Resource Broker ==
 
See [http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0509&L=lcg-rollout&F=&S=&P=34762 message on LCG-ROLLOUT]
 
== R-GMA GIN Update ==
 
See [http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0510&L=lcg-rollout&F=&S=&P=391 message on LCG-ROLLOUT]
 
== LFC and DPM Updates ==
 
After we get a DPM, we need to install [http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0510&L=lcg-rollout&F=&S=&P=43649 these updates].
 
== FTS client ==
 
Once we get a DPM we can start doing service challenge stuff.  For this we will need an
 
FTS client.  Here is some info from Gavin McCance:
 
 
Clients:
 
 
Either manually: [https://uimon.cern.ch/twiki/bin/view/LCG/FtsClientInstall13 Link]
 
 
or with whatever comes with LCG-2.6.0 (yaim?)
 
 
The difficulty is the FTS server URL (i.e. "where does my client command-line tool point to"). Currently, we have not finished our integration with the BDII (or indeed any information system), so we use a file in the local filesystem. An XML file as well.

[https://uimon.cern.ch/twiki/bin/view/LCG/FtsClientInstall13 Link]

describes the format of this services.xml file, but it is really only suitable for pointing the client command-line tool to *one* FTS server. It is possible to point it to multiple servers and use the "-s" option of the command-line tool to select between them, but it's a hack that will have to go away once we do it properly with the BDII, so I wouldn't rely on it.
 
 
Clients at NIKHEF: point to the SARA server.

Clients at SARA: point to either the CERN T0 server or the local SARA server. If the same client on a given machine needs to point to both servers (the CERN one to manage the T0-T1 and the SARA one to manage the T1-T2), then use the "-s" option with two entries in the services.xml file.
 
 
hope this helps,
 
cheers,
 
gav
 
== Cleanpool Script ==
 
The '''bad''' version of the cleanpool script is in /export/perm/adm/bin. It needs to have the apostrophes removed and be put back into test mode. This means replacing the "rm -fr" and "rmdir" commands with "ls -l" to see what it thinks it will do. If this all looks OK, it needs to be tested with the "rm" commands put back, but in a sandboxed environment: say, copy a pool directory somewhere and try it there.
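
The two-stage test described above (list first, then delete inside a sandbox copy) can be sketched as follows. This is not the cleanpool script itself: what the real script removes is not described on this page, so the hypothetical function below simply empties a directory, and the directory and file names in the demo are made up.

```python
import os
import shutil
import tempfile


def clean_pool_dir(pool_dir, dry_run=True):
    """Empty pool_dir and return the list of (action, path) steps.

    With dry_run=True nothing is deleted; the steps are only reported,
    which corresponds to swapping "rm"/"rmdir" for "ls -l".
    """
    actions = []
    # Walk bottom-up so that directories are empty before rmdir is tried.
    for root, dirs, files in os.walk(pool_dir, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            actions.append(("rm", path))
            if not dry_run:
                os.remove(path)
        for name in dirs:
            path = os.path.join(root, name)
            if os.path.islink(path):
                # A symlink to a directory is removed like a file.
                actions.append(("rm", path))
                if not dry_run:
                    os.remove(path)
            else:
                actions.append(("rmdir", path))
                if not dry_run:
                    os.rmdir(path)
    return actions


if __name__ == "__main__":
    # Build a throwaway sandbox instead of touching a real pool directory.
    sandbox = tempfile.mkdtemp(prefix="cleanpool-test-")
    try:
        fake_home = os.path.join(sandbox, "dteam001")  # made-up account dir
        os.makedirs(os.path.join(fake_home, "jobdir"))
        with open(os.path.join(fake_home, "jobdir", "stdout.txt"), "w") as f:
            f.write("leftover\n")
        for action, path in clean_pool_dir(fake_home, dry_run=True):
            print("would", action, path)
        clean_pool_dir(fake_home, dry_run=False)
        print("remaining entries:", os.listdir(fake_home))
    finally:
        shutil.rmtree(sandbox)
```

Keeping the dry run as the default means a forgotten argument lists files instead of deleting them.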
 
 
== hooibaal OS upgrade ==

"hooibaal" is now the last RH73 machine  :-)
 
 
== ganglia monitoring ==
 
ganglia monitoring on tbn06 is as good as dead. Time for a new machine to take over ganglia/syslog/auditing. It should be a 'new' box (and not an old pizza0-class box).
 
== more LCG updates ==
 
Two updates have been made available to LCG-2_6_0.
 
 
edg-utils-system-1.8.0-1_sl3.noarch.rpm
 
edg-utils-user-1.8.0-1_sl3.noarch.rpm
 
 
They are in the apt/yum updates folder:

[http://grid-deployment.web.cern.ch/grid-deployment/gis/apt/LCG-2_6_0/sl3/en/i386/RPMS.lcg_sl3.updates/ link]
 

Latest revision as of 15:24, 22 March 2012