Difference between revisions of "NDPF"

From PDP/Grid Wiki
Line 141: Line 141:
 
== Software development tips ==
 
== Software development tips ==
  
; [[How to handle OpenSSL and not get hurt]] : Intended for the developer of OpenSSL. Organized as a (collective) braindump.
+
; [[How to handle OpenSSL and not get hurt]] : Intended for developer that use the OpenSSL library. Organized as a (collective) braindump.

Revision as of 06:39, 30 September 2009

Nikhef Data Processing Facility

NDPF TODO List
pending actions needed on NDPF
Voorbereiding verhuizing
Overview of the preparations for the data center move


Hardware Inventory and Connections

NDPF_Node_Functions
which node does what? Includes a table of IPMI interfaces
NDPF_Standalone_Nodes
list of manually managed hosts
Hardware_Issues
overview of current hardware problems
NetworkDeelConnections
Network Connections in the NDPF (all routers and switches)
NDPF_OPN_Interfaces
Cabling of the OPN switches and routers
NDPF_Disk_Servers
which disk server contains what?
NDPF_DomUlist
domUs on a Dom0
NDPF H139
Layout of the cabinets in the server room
HW_X4500_MAC_20080417
MAC addresses "EasterEgg" Sun X4500 Thumpers
HooiThumpersLocation
Where is my Thumper? Physical locations per serial number
Valentine_memory
Status and information about the dram read errors on the Valentines
Agile_testbed
Management of the agile testbed for experimentation
OpenVPN
How to use the OpenVPN servers

Operating procedures and manuals

External_Procedures_and_Tools
a collection of links to external documentation on procedures and websites "to get things done"
GridPBX
The Grid Asterisk/Elastix service
DTConsole
The Cantankerous HoD Guide to accessing machine consoles from your Desktop
Shutting_Down_WorkerNodes
How to shut down the worker nodes, for example when the temperature in the server room becomes too high
WLCG_Accounting
Guide to assembling the monthly accounting summaries for WLCG
GLite on Nikhef Desktops
Guide to install the glite UI tarball for Nikhef desktop use

User Management

NDPFDirectoryImplementation
Structure of the NDPF Directory and Authorization in the NDPF
Creating_Pool_Accounts_With_LDAP
how to create pool accounts for a new VO (or extend an existing set), or even recover from an empty gridmapdir
Adding local users
How to create a new user account.
Adding a new VO
How to add a new VO
NDPF_UidsAndGids
Uid and Gid number plan for the NDPF clusters

Grid, Quattor and Yaim

How to work with our Quattor setup
How to work with the template repository, compiling and deploying node templates
Installing updates: OS, CAs Quattor, VL-e
The procedure to download, configure and install updates for the OS via Quattor.
Upgrading Quattor managed glite servers
Information on how to upgrade Quattor managed glite servers.
Upgrading Quattor Components
Information on how to upgrade Quattor components using the CVS repository on stal and the local tools for rpm management.
Maintaining_local_Yaim_functions
How to build a release of the local nikhef-yaim-* rpms and include them in the Quattor profiles.
Namespaced Quattor configuration
Documentation on the Quattor setup, including the namespaced templates
AII version 2 and complex block device schemas
How to configure complex block device schemas in Pan with AII version 2
Setting up a gLite WMS
Notes for installing and configuring a gLite WMS
LCAS and LCMAPS installation for gLExec and (GT4) gatekeepers
What to install and what/how to configure the components
Requesting or Renewing Host certificates
Guide on the procedure to request a new host certificate or renew an existing certificate

Security Operations

How to ban users with quattor
Using quattor to ban a user on WMS, CE, DPM nodes.
How to install security updates using quattor
Update one or more vulnerable rpms on quattor managed hosts.
HowToLimitSSHScans
How to limit ssh scans by blocking probing IPs using standard iptables modules on RL4 and EL5

Misc

Adding a VO to a VOMS server
How to add a new VO to a VOMS-is-evil server

Batch Systems and Scheduling

Adding/removing nodes to PBS
Information on how to add or remove nodes from PBS.
NDPFAccouting
Information on how the accounting chain works from PBS to the local and APEL accounting portals. (historic information is in Accounting).
VoViewGraphing
Creating the voview/grisview graphs on www.nikhef.nl/grid/stats, and the cron jobs that run on naab.

Systems Documentation

NDPFAlarmsSystem
handling NL-T1 alarms at Nikhef
Nagios Monitoring Setup
current setup of Nagios monitoring
NDPFSubVersion
NDPF SubVersion Repository set up
NDPFIpmiNikhefNlBindConfig
modifying the ipmi.nikhef.nl domain
NDPF rsync backup
backup of service nodes using rsync and indirect ADSM usage
NDPF MySQL configurations
Configuration and monitoring of the MySQL service on bedstee
BuildingCentOSXfsKernelModules
Building the XFS kernel modules for CentOS (if CentOS plus is late)
CricketGraphing
generating the configuration for network graphs
NDPF Dell switch config
setup and configuration of the Dell switches (disabling port-fast) - required for every new installation
Nortel 4548GT switches
Luilak/Bulldozer subcluster switch documentation
Arista 7148S switches
Core switch documentation
Remote usage of the Dell console switches
How to remotely connect to the Dell console switches.
Serial Consoles
how to set up your serial (over LAN) console and IPMI 2.0 SoL stuff, even with Xen, on a PE1950
SettingLcdPanelText
how to set the LCD panel text at runtime on a Dell PowerEdge using IPMI


NDPF_Dell_OpenManage_SA_Installation
use and configuration of the chassis and other controls of the Dell (PE1950) systems
NDPF_Equinox_ELS16
documentation for the Equinox ELS-16 TS
NDPF GS environment
Grid Service (specialties) environment documentation
RAID-1 configuration and management
how to set up RAID-1 and how to manage RAID devices
SunX4500Documentation
some tidbits on the X4500 Thumpers
Increasing Thumper filesystems for DPM
recipe for increasing the logical volume, file system and making it visible to DPM
backup_umts_location
Information about the backup UMTS connection
Remote Location
Information about the remote location

Problem Recovery

Procedures to be followed when recovering from major outages. No extensive descriptions, just a list of what to do, without having to think/know much about the topic.

Restoring Services
order in which to bring services back on-line
Resurrecting kuiken
steps to restart and check the VOMS server kuiken

System Utilities

PBS_Caching_Utilities
The current LCG software can put severe load on a PBS server because of how often it calls qstat (we have seen as many as 25 qstat calls per second). This load can eventually bring the entire system to a halt. To reduce the impact of this problem (a full solution requires rewriting part of the grid job manager), Davide wrote a set of utilities that wrap the original PBS commands and provide caching.
MoniFarm Utilities
Graphing package for farm utilisation (used to make the production graphs for the NDPF Facilities)
NDPF_VOBOX_Alice_Renewalscript
The proxy renewal wrapper script, enhanced with status info
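
The caching idea behind PBS_Caching_Utilities can be illustrated with a minimal sketch. This is not the code of the actual utilities: the function name `cached_output`, the 30 second lifetime, the cache directory and the `runner` parameter are all assumptions made for illustration. The sketch stores the output of a command in a file and reuses it while the file is younger than the lifetime, so that many callers in a short period trigger only one real qstat call.

```python
import hashlib
import os
import subprocess
import tempfile
import time


def _run(cmd):
    """Run the real command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def cached_output(cmd, ttl=30.0, cache_dir=None, runner=_run):
    """Return the output of cmd, reusing a cached copy younger than ttl seconds.

    Hypothetical helper: name, defaults and cache layout are illustrative only.
    """
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "pbs-cache")
    os.makedirs(cache_dir, exist_ok=True)
    # One cache file per distinct command line.
    key = hashlib.sha256("\0".join(cmd).encode()).hexdigest()
    path = os.path.join(cache_dir, key)
    try:
        if time.time() - os.path.getmtime(path) < ttl:
            with open(path) as fh:
                return fh.read()
    except OSError:
        pass  # no cache entry yet
    output = runner(cmd)
    # Write to a temporary file and rename it, so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w") as fh:
        fh.write(output)
    os.replace(tmp, path)
    return output
```

A wrapper script installed in place of qstat could then call something like `cached_output(["qstat", "-f"])` and print the result. The trade-off is that callers may see job information that is up to `ttl` seconds old.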


Virtualization

Xen on CentOS 5
Installation and configuration of XEN Dom0 CentOS 5 x86_64, DomU CentOS 4 i386

Virtualisation Issues

NDPF VMware authentication
controlling and creating VMs on the central VMware server
NDPF VMware tips
tips and tricks for generating vmware images
Xen on CentOS 5 - Notes
Notes for installing and configuring a XEN System on CentOS 5
Xen on CentOS - Automating Installation-Administration
Quattor managed Xen-Dom0 and DomUs
Xen 3.2, CentOS 5.1 and NAT HOWTO
HOWTO document to set up Xen with NAT networking

Performance data and analysis

NDPF Node Performance
performance figures for nodes used for accounting purposes
NDPF Disk Performance
performance figures for disk servers
RunningSPEC
how to run the SPEC2006 and SPEC2000 suites at Nikhef, including using the Intel Compiler Suite and SmartHeap

User Security and eTokens

Using an Aladdin eToken PRO to store grid certificates
Using an Aladdin eToken PRO with grid certificates (including gridproxy generation)


User Tips and Tricks

How to control access rights for LFC/SRM files
How to control access rights to your files stored on the grid in LFC, SRM or both
Checksumming support in SRM implementations
An overview of the support for checksumming in different SRM implementations


Software development tips

How to handle OpenSSL and not get hurt
Intended for developers who use the OpenSSL library. Organized as a (collective) braindump.