NDPF News
At A Glance
Dear users of the Nikhef Data Processing Facility and SSO,
On Tuesday March 8 (next week), intrusive network maintenance will be performed on selected parts of the Nikhef network infrastructure, affecting Grid and NikIdM (single sign-on) services. The routing equipment at the core of the grid network will be replaced, and all links have to be reconnected to the new device. Due to physical limitations (cable lengths and cabinet space), this cannot be done without service interruption.
The following services WILL NOT BE AVAILABLE on March 8, from 09:00 until approximately 17:00 CET:
- single sign-on and federative services (SURFspot, MailFilter, grid certificates);
- changing your password or mail aliases;
- grid computing services at NIKHEF-ELPROD;
- the qsub-tunnel ('nsub') from the Nikhef desktop network;
- access to data stored at tbn18.nikhef.nl;
- WMS, brokering, and other Grid services hosted on .nikhef.nl domains.
Apart from the SSO and storage services, all of these services are also offered by our partner BiG Grid sites, such as SARA, RUG-CIT and HTC-Philips, and by the other sites in EGI and wLCG. Those sites remain available. Other services at Nikhef are not affected by this maintenance.
The new network router allows for consolidation of bandwidth and better interconnects, enables the introduction of production-level IPv6 services in the grid network, and prepares for the introduction of high-throughput cloud services at Nikhef in the context of BiG Grid.
However extensive the planning and testing, there is always the possibility of some horrible failure. At this scale, it is unrealistic to test every possible interaction in the system. Should such a failure occur, we will restore the old situation at the end of the day and, after due diagnosis of the failure, make a new attempt on Thursday March 10.
We hope for your understanding!
DavidG.
Current
An up-to-date overview of downtimes for Nikhef's grid services and those of other BiG Grid sites is available on the BiG Grid downtime overview page.
Past announcements
Change in queue names and properties
To make the computing resources at the various BiG Grid sites look uniform to users, all sites will define identical queues (same names and properties) on their computing systems.
At grid site NIKHEF-ELPROD, we have created new queues that will replace some of the existing queues. The following queues will be removed from the systems on December 7th, 2009:
- "test": replaced by queue "infra";
- "qshort": replaced by queue "short" with a maximum wall time of 4 hours;
- "qlong": replaced by queue "medium" with a maximum wall time of 36 hours.
Note that the replacement queues can already be used.
How does this affect users of the computing infrastructure?
- Users who do not explicitly submit jobs to a specific queue do not have to take any action;
- Users who put a statement in the .jdl file to select a specific queue may have to change the queue name in the .jdl file;
- Users who directly submit jobs to a Computing Element and queue via the command glite-wms-job-submit using the option
"--resource <CE>:2119/jobmanager-pbs-<QUEUE>"
may have to change the name for <QUEUE>.
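As an illustration of the rename, the following minimal Python sketch rewrites a resource string of the form "<CE>:2119/jobmanager-pbs-<QUEUE>" so that a retired queue name is replaced by its successor. The mapping is taken from the list above; the helper name update_resource is hypothetical, and pairing the CE gazon.nikhef.nl with the queue "qlong" in the example is an assumption made only for demonstration.

```python
import re

# Retired queue name -> replacement queue, as listed in this announcement.
QUEUE_RENAMES = {
    "test": "infra",
    "qshort": "short",
    "qlong": "medium",
}

# Splits "<CE>:2119/jobmanager-pbs-<QUEUE>" into the part up to and
# including "jobmanager-pbs-" and the trailing queue name.
_RESOURCE_RE = re.compile(r"^(?P<prefix>.+/jobmanager-pbs-)(?P<queue>[^/\s]+)$")


def update_resource(resource: str) -> str:
    """Return the resource string with a retired queue name replaced.

    Hypothetical helper: strings that do not match the expected form,
    and queue names that are not being retired, are returned unchanged.
    """
    match = _RESOURCE_RE.match(resource)
    if match is None:
        return resource
    queue = match.group("queue")
    return match.group("prefix") + QUEUE_RENAMES.get(queue, queue)


if __name__ == "__main__":
    # Example value only; the CE/queue combination is an assumption.
    print(update_resource("gazon.nikhef.nl:2119/jobmanager-pbs-qlong"))
```

The result can be passed to the "--resource" option in place of the old value. Queue names that are not in the mapping pass through untouched, so the helper can be applied to every resource string in a submission script.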
WN upgrade to CentOS-5 x86-64 and gLite 3.2
On Monday October 26th, 2009, all worker nodes will be upgraded to CentOS-5 x86-64 with gLite 3.2 middleware. The worker nodes will be taken offline during the preceding weekend (Oct 24/25) to allow running jobs to complete.
Moving grid services to new data center
A new data center has been built at Nikhef. The existing grid infrastructure at Nikhef will be moved to this new data center between 10 August and 21 August. During the migration process, grid services will be unavailable.
This large-scale operation will take place in two phases:
1) Moving grid services and network infrastructure (10-14 August 2009)
During this phase, all grid services at site NIKHEF-ELPROD will be unavailable. For grid users this means that the following services cannot be used:
- Computing services (CEs gazon.nikhef.nl and trekker.nikhef.nl);
- Storage services (SE tbn18.nikhef.nl);
- Job submission services (WMS graszode.nikhef.nl, graspol.nikhef.nl, dorsvlegel.nikhef.nl);
- Requesting renewal of grid certificates via the Dutchgrid CA web site will not be possible on 10 and 11 August (requests can be submitted via mail but will not be processed);
- The web sites www.dutchgrid.nl, www.vl-e.nl and poc.vl-e.nl will be unavailable.
2) Moving compute and storage clusters (15-21 August 2009)
In this phase, the computing and storage clusters will be unavailable. Grid users will not be able to:
- Use the computing services (CEs gazon.nikhef.nl and trekker.nikhef.nl);
- Access certain data files via SE tbn18.nikhef.nl.
We advise all users of the grid infrastructure to:
- Request renewal of grid certificates before August 5th (only if the certificate will expire in early or mid-August);
- Use the grid computing services at SARA (CE ce.gina.sara.nl);
- Submit grid jobs via the WMS at SARA (WMS wms.grid.sara.nl);
- Plan their work so that no access to data files via storage element tbn18.nikhef.nl is required, or copy the relevant data elsewhere.
During the weekend of 25-26 April, work will be done on the electrical systems that supply power to the grid computing nodes at Nikhef. The grid computing nodes ("worker nodes") at Nikhef's grid facility will therefore be unavailable from Friday April 24, 13:00 until Monday April 27, 10:00.
The facility will stop accepting new jobs on Thursday April 23 at 8:00, so that running jobs can finish before the nodes are shut down.
Stopping resource broker services at NIKHEF-ELPROD
The resource broker services at site NIKHEF-ELPROD will be stopped on 02-February-2009. This affects the RBs bosheks.nikhef.nl, boszwijn.nikhef.nl and the alias rb03.nikhef.nl.
Current users of the resource brokers are advised to start submitting their jobs via the replacement WMS servers as soon as possible, and no later than January 15, 2009. Output sandboxes have to be retrieved before February 2, 2009; after this date they will be deleted.
The WMS hosts at NIKHEF-ELPROD are graszode.nikhef.nl and graspol.nikhef.nl (or use the alias wms03.nikhef.nl).
Removing installation of VL-e PoC release 2
The installation of VL-e PoC release 2 will be removed from the worker nodes at site NIKHEF-ELPROD on 02-February-2009. Users of the PoC should migrate to release 3 of the PoC, which is already available.
Change in services
The Classic Storage Element tbn15.nikhef.nl (alias se03.nikhef.nl) will be removed from the information system on Monday 17-Nov-2008. Access via gridftp to the storage element will remain available until further notice.
After removal of the service from the information system, the lcg-* tools (e.g. lcg-cr, lcg-cp) can no longer be used to access the data on this storage element.
EGEE-broadcast: https://cic.gridops.org/index.php?section=rc&page=broadcastretrieval&step=2&typeb=C&idbroadcast=37321