NDPF

From GridWiki

Nikhef Data Processing Facility

NDPF TODO List 
pending actions needed on NDPF
Acceptable Use Policy 
also available in Oud-Hollandsch as "Gebruiksvoorwaarden" for those that are linguistically challenged

Hardware Inventory and Connections

NDPF_Systems 
system function and logical network tables: NDPF:System_Functions (which node does what) and NDPF:Network (network connections in the NDPF, all routers and switches)
NDPF_Standalone_Nodes 
list of manually managed hosts, with access, usage and restoration info on the CT Wiki
Hardware_Issues
overview of current hardware problems
NDPF_OPN_Interfaces 
Cabling of the OPN switches and routers
Marsepein node type details
Details of the DELL R630 worker nodes type 'marsepein'
Oliebol node type details
Details of the Supermicro 4u 36 disk node type 'oliebol'
Strijker node type details
Details of the Dell PowerEdge R515 2u 12 disk node type 'strijker'
Biet node type details
Details of the Dell PowerEdge R710 disk node type 'biet'
NDPF_Disk_Servers 
which disk server contains what?
NDPF_DomUlist 
domUs on a Dom0
HW_X4500_MAC_20080417 
MAC addresses "EasterEgg" Sun X4500 Thumpers
HooiThumpersLocation 
Where is my Thumper? Physical locations per serial number
Valentine_memory 
Status and information about the DRAM read errors on the Valentines
SintMaarten_network 
Analysis of the performance issue with the SintMaarten (HP) worker nodes
Agile_testbed 
Management of the agile testbed for experimentation
OpenVPN 
How to use the OpenVPN servers
EGI Infra 
Infrastructure and order lists for EGI TF10
NDPF_System_Locations  
old cabling information

Operating procedures and manuals

External_Procedures_and_Tools 
a collection of links to external documentation on procedures and websites "to get things done"
CT Wiki NDPF 
system descriptions of all stand-alone systems, access, usage and restoration
GridPBX 
The Grid Asterisk/Elastix service
SurfVideoConf 
The Big Grid and Nikhef videoconference
DTConsole 
The Cantankerous HoD Guide to accessing machine consoles from your Desktop
Shutting_Down_WorkerNodes 
How to shut down the worker nodes, for example when the temperature in the server room becomes too high
WLCG_Accounting 
Guide to assembling the monthly accounting summaries for WLCG
Monthly Review 
Guide to assembling the monthly "vak N review" for the NL-T1 meeting.
GLite on Nikhef Desktops 
Guide to installing the gLite UI tarball for Nikhef desktop use

User Management

NDPFDirectoryImplementation 
Structure of the NDPF Directory and Authorization in the NDPF
Creating_Pool_Accounts_With_LDAP 
how to create pool accounts for a new VO (or extend an existing set), or even recover from an empty gridmapdir
Adding local users 
How to create a new user account.
Adding a new VO 
How to add a new VO
NDPF_UidsAndGids 
Uid and Gid number plan for the NDPF clusters

Grid, Quattor and Yaim

GRID Issues 
Articles that may help resolve issues encountered when working with the Grid.
How to work with our Quattor setup 
How to work with the template repository, compiling and deploying node templates
Quattor package management with yum-based ncm-spma 
Description of package management with the yum-based ncm-spma
Installing a new VM via Quattor
What to do to create a new virtual machine from scratch using our Quattor system.
Installing updates: OS, CAs Quattor, VL-e 
The procedure to download, configure and install updates for the OS via Quattor.
Upgrading Quattor managed glite servers 
Information on how to upgrade Quattor managed glite servers.
Upgrading Quattor Components 
Information on how to upgrade Quattor components using the CVS repository on stal and the local tools for RPM management.
Quattor and IPv6 
Summary of what has been done so far to configure servers with IPv6 connectivity via Quattor.
Pan Tutorial 
Information about the Pan language.
RPM Repository 
Information about working with the RPM repository.
Checkdeps
A Quattor tool to verify the RPM dependencies before deploying on a host
Maintaining_local_Yaim_functions 
How to build a release of the local nikhef-yaim-* rpms and include them in the Quattor profiles.
Namespaced Quattor configuration 
Documentation on the Quattor setup, including the namespaced templates
AII version 2 and complex block device schemas 
How to configure complex block device schemas in Pan with AII version 2
Setting up a gLite WMS 
Notes for installing and configuring a gLite WMS
LCAS and LCMAPS installation for gLExec and (GT4) gatekeepers 
What to install, and which components to configure and how
Requesting or Renewing Host certificates 
Guide on the procedure to request a new host certificate or renew an existing certificate

Security Operations

How to ban users with quattor 
Using Quattor to ban a user on WMS, CE and DPM nodes.
How to install security updates using quattor 
Updating one or more vulnerable RPMs on Quattor-managed hosts.
HowToLimitSSHScans 
How to limit ssh scans by blocking probing IPs using standard iptables modules on RL4 and EL5
HowTo Disk images, RAID, LVM 
How to access data in disk images created with dd

Misc

Adding a VO to a VOMS server 
How to add a new VO to a VOMS-is-evil server
Monitoring Script 
Monitors the status of jobs to identify those that have aborted but are still in a queue.
Managing RAID Controllers 
How to deal with failing disks and batteries; how do I find my disk?

Batch Systems and Scheduling

Adding/removing nodes to PBS 
Information on how to add or remove nodes from PBS.
NDPFAccouting 
Information on how the accounting chain works from PBS to the local and APEL accounting portals (historic information is in Accounting).
VoViewGraphing 
Creating the voview/grisview graphs on www.nikhef.nl/grid/stats, and the cron jobs that run on naab.
Enabling multicore jobs and jobs requesting large amounts of memory 
Overview of the steps to enable flexible resource requests
Maui reservations 
Creating a system reservation in Maui
Various local tools 
Overview of local tools: when_idle (perform an action when a node has drained), mom-taskset (pin a job to the slots/cores allocated), prune_userprocs (remove user processes not related to a batch job).

CVMFS

SoftDrive backend
behind the curtains of SoftDrive.
CVMFS servers and repositories
Overview of the cvmfs repositories hosted at the Stratum-0 and Stratum-1 servers at Nikhef
Adding a new cvmfs repository
How to build a new cvmfs repository at the Stratum-0 server, update its contents, synchronize the contents to the Stratum-1 server and configure the clients to use the repository.

Systems Documentation

NDPFAlarmsSystem 
handling NL-T1 alarms at Nikhef
Nagios Monitoring Setup 
current setup of Nagios monitoring
NDPFSubVersion 
NDPF SubVersion Repository set up
NDPFIpmiNikhefNlBindConfig 
modifying the ipmi.nikhef.nl domain
NDPF rsync backup 
backup of service nodes using rsync and indirect ADSM usage
NDPF MySQL configurations 
Configuration and monitoring of the MySQL service on bedstee
BuildingCentOSXfsKernelModules  
Building the XFS kernel modules for CentOS (if CentOS plus is late)
CricketGraphing 
generating the configuration for network graphs
NDPF Dell switch config 
setup and configuration of the Dell switches (disabling port-fast), required for every new installation
Nortel 4548GT switches 
Luilak/Bulldozer subcluster switch documentation
Arista 7148S switches 
Core switch documentation
Remote usage of the Dell console switches 
How to remotely connect to the Dell console switches.
Serial Consoles 
how to set up your serial (over LAN) console and IPMI 2.0 SoL, even with Xen, on a PE1950
SettingLcdPanelText 
how to set the LCD panel text at runtime on a Dell PowerEdge using IPMI


NDPF_Dell_OpenManage_SA_Installation 
use and configuration of the chassis and other controls of the Dell (PE1950) systems
NDPF_Equinox_ELS16 
documentation for the Equinox ELS-16 TS
NDPF GS environment 
Grid Service (specialties) environment documentation
RAID-1 configuration and management 
how to set up RAID-1 and how to manage RAID devices
SunX4500Documentation 
some tidbits on the X4500 Thumpers
Increasing Thumper filesystems for DPM 
recipe for increasing the logical volume, file system and making it visible to DPM
HP BL460c G6 
How to set up, configure and manage the "Sint Maarten" cluster
HooiMaanden 
How to set up, configure and manage the "Hooimaanden"
Hooikar/hooiwagen
Configuration of DPM disk servers exporting iSCSI-mounted file systems
Bieten
How to set up, configure and manage the "Suiker Bieten"
backup_umts_location 
Information about the backup UMTS connection
Remote Location 
Information about the remote location

Problem Recovery

Procedures to be followed when recovering from major outages. No extensive descriptions, just a list of what to do, without having to think/know much about the topic.

rebooting XCP VMs the hard way 
How to handle a virtual machine that doesn't want to go anymore.
cvmfs errors/warnings 
description of the errors and warnings that cvmfs reports via its Nagios check
Restoring Services 
order in which to bring services back on-line
Resurrecting kuiken 
steps to restart and check the VOMS server kuiken

Troubleshooting

How to figure out various things, diagnose problems, fix them, etc.

Tracing Jobs 
Given a job ID, how to figure out where the job came from.
CREAM CE Troubleshooting 
Error messages and how to solve them for CREAM.
Generating crash reports with abrt 
How to generate core dumps from crashing services

System Utilities

PBS_Caching_Utilities 
The current LCG software can put severe load on a PBS server because of the number of times it calls qstat (we have seen as many as 25 qstat calls per second). This load can eventually bring the entire system to a halt. To reduce the impact of this problem (a full solution requires rewriting part of the grid job manager), Davide wrote a set of utilities that wrap the original PBS commands and provide caching.
MoniFarm Utilities 
Graphing package for farm utilisation (used to make the production graphs for the NDPF Facilities)
NDPF_VOBOX_Alice_Renewalscript 
The proxy renewal wrapper script, enhanced with status info
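The caching idea mentioned under PBS_Caching_Utilities above can be illustrated with a minimal sketch. This is not the code of the actual utilities: the class name `CachedCommand`, the `ttl` parameter and the injectable `runner` and `clock` arguments are all hypothetical, chosen only to show how a wrapper might serve a stored result instead of calling the real command every time.

```python
import subprocess
import time


class CachedCommand:
    """Cache the output of a command for `ttl` seconds (illustrative only)."""

    def __init__(self, argv, ttl=30.0, runner=None, clock=time.monotonic):
        self.argv = list(argv)
        self.ttl = ttl
        # The runner is injectable so the sketch can be tried without a PBS server.
        self.runner = runner or self._run
        self.clock = clock
        self._output = None
        self._stamp = None

    @staticmethod
    def _run(argv):
        # Run the real command and return its standard output as text.
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return result.stdout

    def output(self):
        now = self.clock()
        # Re-run the command only when nothing is cached or the cache has expired.
        if self._stamp is None or now - self._stamp >= self.ttl:
            self._output = self.runner(self.argv)
            self._stamp = now
        return self._output
```

With a wrapper like this, many callers asking for the same listing within the `ttl` window trigger a single call to the underlying command; the trade-off is that the answer may be up to `ttl` seconds stale.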


Virtualization

GSP Virtualisation with Xen 
Grid Server Park virtualisation with Xen Cloud Platform
Xen on CentOS 5 
Installation and configuration of XEN Dom0 CentOS 5 x86_64, DomU CentOS 4 i386
Managing the security training sites 
Quick writeup of managing the security training hosts with XCP, cobbler and salt.

Virtualisation Issues

NDPF VMware authentication 
controlling and creating VMs on the central VMware server
NDPF VMware tips 
tips and tricks for generating vmware images
Xen on CentOS 5 - Notes 
Notes for installing and configuring a XEN System on CentOS 5
Xen on CentOS - Automating Installation-Administration  
Quattor managed Xen-Dom0 and DomUs
Xen 3.2, CentOS 5.1 and NAT HOWTO 
HOWTO document to set up Xen with NAT networking

Performance data and analysis

NDPF Node Performance 
performance figures for nodes used for accounting purposes
NDPF Disk Performance 
performance figures for disk servers
RunningSPEC 
how to run the SPEC2006 and SPEC2000 suites at Nikhef, including using the Intel Compiler Suite and SmartHeap
Running HS06 
notes from running the HEPSPEC-06 benchmark
Historical WN information 
which nodes had what names during which time periods, etc.

User Security and eTokens

Using an Aladdin eToken PRO to store grid certificates 
Using an Aladdin eToken PRO with grid certificates (including gridproxy generation)
Using voms-proxy-init on an OSX (10.4 or higher) system 
Tiny how-to on getting voms-proxy-init working on your Mac.

User Tips and Tricks

Passing job requirements through the WMS 
If your jobs need more cores or more memory, and you still want to use the WMS
How to control access rights for LFC/SRM files 
How to control access rights to your files stored on the grid in LFC, SRM or both
Checksumming support in SRM implementations 
An overview of the support for checksumming in different SRM implementations
VO-specific software and modules 
How to install VO-specific software including automatic modules support
Qsub-tunnel 
job submission from the desktop and login systems to the NDPF clusters
Qsub-tunnel-admin 
administering the qsub tunnel and adding new users

Software development tips

How to handle OpenSSL and not get hurt 
Intended for developers who use the OpenSSL library. Organized as a (collective) braindump.
Building gLExec from src rpm 
How to (re)build gLExec from a .src.rpm.
Software development on Arista hardware
Useful tips, how-tos and more.
Funny Curly things
An investigation into the different versions of the cURL tool.
How to work with VOMS
A detailed overview of how VOMS must work and how you need to work with VOMS.

AOB

Grid Opendag preparation 
What to do, checklists, ideas and other pieces of information suited to prepare yourself for the Opendag (Open House) at Nikhef.
Grid contributions Open Dag 2013 
Plans and programme of what to show during the Open Dag 2013 (5 October).