Difference between revisions of "NDPF News"

From PDP/Grid Wiki
  
 
An up-to-date overview of downtimes for Nikhef's grid services and those of other BiG Grid sites is available at the [http://www.nikhef.nl/~ronalds/downtime/index.php Big Grid downtime overview page].
 
 
 
= Past announcements =
 
 
== Change in queue names and properties ==
 
 
To give users a more uniform view of the computing resources at the various BiG Grid sites, all sites will define identical queues (with the same names and properties) on their computing systems.
 
 
At grid site NIKHEF-ELPROD, we have created new queues that will replace some of the existing queues. The following queues will be removed from the systems as of December 7th, 2009:
 
* "test": replaced by queue "infra";
 
* "qshort": replaced by queue "short" with a maximum wall time of 4 hours;
 
* "qlong": replaced by queue "medium" with a maximum wall time of 36 hours.
 
Note that the replacement queues are already available.
 
 
How does this affect users of the computing infrastructure?
 
* Users who do not explicitly submit jobs to a specific queue do not have to take any action;
 
* Users who select a specific queue with a statement in their .jdl file may have to change the queue name in that file;
 
* Users who submit jobs directly to a specific Computing Element and queue with the command glite-wms-job-submit and the option

  "--resource <CE>:2119/jobmanager-pbs-<QUEUE>"

may have to change the value of <QUEUE>.
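For users with many job descriptions or submit scripts, the renaming can be applied mechanically. The Python sketch below is a hypothetical helper (the name migrate_queue_names is made up for illustration, not part of gLite) that rewrites the old queue names listed above inside CE identifiers of the form <CE>:2119/jobmanager-pbs-<QUEUE>. It assumes the queue name always directly follows the jobmanager-pbs- prefix; the Requirements line in the example is only an illustration of a queue-selecting .jdl statement, so check your own files.

```python
import re

# Old -> new queue names at NIKHEF-ELPROD, as listed in the announcement.
QUEUE_RENAMES = {"test": "infra", "qshort": "short", "qlong": "medium"}

# Matches "jobmanager-pbs-<old queue>" anywhere in a string. The negative
# lookahead requires the whole queue name to match, so e.g. "qlong2" or
# "test-x" are left alone. (Assumption: queue names use only \w, '.', '-'.)
_OLD_QUEUE = re.compile(
    r"(jobmanager-pbs-)("
    + "|".join(map(re.escape, QUEUE_RENAMES))
    + r")(?![\w.-])"
)


def migrate_queue_names(text: str) -> str:
    """Replace old queue names in all CE identifiers found in `text`.

    Identifiers that already use a new queue name are returned unchanged.
    """
    return _OLD_QUEUE.sub(lambda m: m.group(1) + QUEUE_RENAMES[m.group(2)], text)


if __name__ == "__main__":
    examples = [
        # Value as passed to glite-wms-job-submit --resource
        "gazon.nikhef.nl:2119/jobmanager-pbs-qlong",
        # Illustrative queue-selecting statement in a .jdl file
        'Requirements = other.GlueCEUniqueID == "trekker.nikhef.nl:2119/jobmanager-pbs-qshort";',
        # Already migrated: left unchanged
        "gazon.nikhef.nl:2119/jobmanager-pbs-medium",
    ]
    for line in examples:
        print(migrate_queue_names(line))
```

Using a single regular expression with a lookahead, rather than plain string replacement, avoids accidentally rewriting longer queue names that merely start with an old name.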
 
 
== WN upgrade to CentOS-5 x86-64 and gLite 3.2 ==
 
 
On Monday October 26th, 2009, all worker nodes will be upgraded to CentOS-5 x86-64 with gLite 3.2 middleware. The worker nodes will be taken offline during the preceding weekend (October 24/25) to allow running jobs to complete.
 
 
== Moving grid services to new data center ==
 
 
A new data center has been built at Nikhef. The existing grid infrastructure at Nikhef will be moved to this new data center between 10 August and 21 August. During the migration process, grid services will be unavailable.
 
 
This large-scale operation will take place in two phases:
 
 
=== 1) Moving grid services and network infrastructure (10-14 August 2009) ===
 
During this phase, all grid services at site NIKHEF-ELPROD will be unavailable.
 
For grid users this means that the following services cannot be used:
 
* Computing services (CEs gazon.nikhef.nl and trekker.nikhef.nl);
 
* Storage services (SE tbn18.nikhef.nl);
 
* Job submission services (WMS graszode.nikhef.nl, graspol.nikhef.nl, dorsvlegel.nikhef.nl);
 
* Requesting renewal of grid certificates via the Dutchgrid CA web site will not be possible on 10 and 11 August (requests can be sent by e-mail, but will not be processed);
 
* The web sites www.dutchgrid.nl, www.vl-e.nl and poc.vl-e.nl will be unavailable.
 
 
=== 2) Moving compute and storage clusters (15-21 August 2009) ===
 
In this phase, the computing and storage clusters will be unavailable.
 
Grid users will not be able to:
 
* Use the computing services (CEs gazon.nikhef.nl and trekker.nikhef.nl);
 
* Access certain data files via SE tbn18.nikhef.nl.
 
 
We advise all users of the grid infrastructure to:
 
* Request renewal of grid certificates before August 5th (only needed if the certificate expires in early or mid-August);

* Use the grid computing services at SARA (CE ce.gina.sara.nl);

* Submit grid jobs via the WMS at SARA (WMS wms.grid.sara.nl);

* Plan their work so that they do not need access to data files on storage element tbn18.nikhef.nl, or copy the relevant data elsewhere.
 
 
== Grid computing nodes unavailable 24-27 April ==
 
 
During the weekend of 25-26 April, work will be done on the electrical systems providing power to the grid computing nodes at Nikhef. Therefore, the grid computing nodes ("worker nodes") at Nikhef's grid facility will be unavailable from Friday April 24, 13:00 until Monday April 27, 10:00.
 
 
The facility will stop accepting new jobs on Thursday April 23 at 08:00, so that running jobs can finish before the nodes are shut down.
 
 
== Stopping resource broker services at NIKHEF-ELPROD ==
 
 
The resource broker services at site NIKHEF-ELPROD will be stopped on 02-February-2009. This affects the RBs bosheks.nikhef.nl, boszwijn.nikhef.nl and the alias rb03.nikhef.nl.
 
 
Current users of the resource brokers are advised to start submitting their jobs via the replacement WMS servers as soon as possible, and no later than January 15, 2009. Output sandboxes must be retrieved before February 2, 2009; after this date they will be deleted.
 
 
The WMS hosts at NIKHEF-ELPROD are graszode.nikhef.nl and graspol.nikhef.nl (or use the alias wms03.nikhef.nl).
 
 
 
== Removing installation of VL-e PoC release 2 ==
 
 
The installation of VL-e PoC release 2 will be removed from the worker nodes at site NIKHEF-ELPROD as of 02-February-2009. Users of the PoC should migrate to release 3 of the PoC, which is already available.
 
 
 
 
== Change in services ==
 
 
The Classic Storage Element tbn15.nikhef.nl (alias se03.nikhef.nl) will be removed from the information system on Monday 17-Nov-2008. Access via gridftp to the storage element will remain available until further notice.
 
 
After removal of the service from the information system, the lcg-* tools (e.g. lcg-cr, lcg-cp) can no longer be used to access the data on this storage element.
 
 
EGEE-broadcast: https://cic.gridops.org/index.php?section=rc&page=broadcastretrieval&step=2&typeb=C&idbroadcast=37321
 

Revision as of 08:56, 3 March 2011

At A Glance

Dear users of the Nikhef Data Processing Facility and SSO,

On Tuesday 8 March (next week), scheduled maintenance will take place on
part of the network facilities at Nikhef. The central 'router' in the NDPF
network will be replaced by a new unit, which means that all connections
will be broken. Due to physical limitations (cable lengths and cabinet
space), it is unfortunately not possible to carry out this replacement
without interrupting the grid and NikIdM services.

The following services will NOT BE AVAILABLE on 8 March from 09:00 CET
until approx. 17:00:
* SSO and federated services (SURFspot, MailFilter, grid certificates)
* changing passwords or mail aliases
* grid computing services at NIKHEF-ELPROD
* the qsub tunnel ('nsub') on ikohefnet desktop systems
* data stored on tbn18.nikhef.nl
* WMS, brokering and other grid services on .nikhef.nl domains

Apart from SSO and storage, all of these services are also available at
the other BiG Grid sites, such as SARA, RUG-CIT and HTC-Philips, and at
the other sites in EGI and wLCG. These remain available as usual.
Other Nikhef services also remain reachable during this maintenance.

This replacement is necessary for the consolidation of bandwidth, the
introduction of production IPv6 in the grid networks, and the preparation
for high-throughput cloud services within BiG Grid.

With intrusive work like this there is always a chance that things go
wrong, despite our prior simulations and tests; at this scale it is
impossible to test everything in advance. If that happens, the old
situation will be restored at the end of the day and, after diagnosis,
a new attempt will be made on Thursday the 10th.

We hope for your understanding!

	DavidG.


On Tuesday March 8 (next week) intrusive network maintenance will be
performed on selected parts of the Nikhef network infrastructure, affecting
Grid and NikIdM (single sign-on) services. The routing equipment at the
core of the grid network will be replaced and all links have to be
reconnected to the new device. Due to physical limitations - cable lengths
and cabinet space - this cannot be done without service interruption.

The following services WILL NOT BE AVAILABLE on March 8, from 0900-1700 CET:
* single sign-on and federated services (SURFspot, MailFilter, certificates)
* changing your password or adding email aliases
* grid computing services at NIKHEF-ELPROD
* the qsub-tunnel ('nsub') from the Nikhef desktop network
* access to data stored at tbn18.nikhef.nl
* WMS, brokering, and other Grid services hosted on .nikhef.nl domains

Apart from the SSO and storage services, alternatives are available at
our partner BiG Grid sites, such as SARA, RUG-CIT and HTC-Philips. All
other sites in EGI and wLCG can also be used.
Other services at Nikhef are not affected by this maintenance.

The new network router allows for consolidation of bandwidth and better
interconnects, the introduction of production-level IPv6 services in the
grid network, and prepares for the introduction of high-throughput
cloud services at Nikhef in the context of BiG Grid.

However extensive the planning and testing, there is always the possibility
of some horrible failure. At this scale, it is unrealistic to test every
possible interaction in the system. Should such a failure occur, we will
restore the old situation at the end of the day, and make a new attempt on
Thursday March 10, after diagnosing the failure.

We hope for your understanding!

	DavidG.

Actual

An up-to-date overview of downtimes for Nikhef's grid services and those of other BiG Grid sites is available at the Big Grid downtime overview page.