NDPF
From PDP/Grid Wiki
Revision as of 13:47, 29 July 2009 by Janjust@nikhef.nl (talk | contribs) (→Operating procedures and manuals)
Nikhef Data Processing Facility
- NDPF TODO List
- pending actions needed on NDPF
- Voorbereiding verhuizing
- Overview of the preparations for the data center move
Hardware Inventory and Connections
- NDPF_Node_Functions
- which node does what? Table of IPMI interfaces
- NetworkDeelConnections
- Network Connections in the NDPF (all routers and switches)
- NDPF_OPN_Interfaces
- Cabling of the OPN switches and routers
- NDPF_Disk_Servers
- which disk server contains what?
- NDPF_DomUlist
- domUs on a Dom0
- NDPF H139
- Layout of the cabinets in the server room
- HW_X4500_MAC_20080417
- MAC addresses "EasterEgg" Sun X4500 Thumpers
- HooiThumpersLocation
- Where is my Thumper? Physical locations per serial number
- Valentine_memory
- Status and information about the DRAM read errors on the Valentines
- OpenVPN
- How to use the OpenVPN servers
Operating procedures and manuals
- External_Procedures_and_Tools
- a collection of links to external documentation on procedures and websites "to get things done"
- GridPBX
- The Grid Asterisk/Elastix service
- DTConsole
- The Cantankerous HoD Guide to accessing machine consoles from your Desktop
- Shutting_Down_WorkerNodes
- How to shut down the worker nodes, for example when the temperature in the server room becomes too high
- WLCG_Accounting
- Guide to assembling the monthly accounting summaries for WLCG
- Glite on Nikhef Desktops
- Guide to installing the glite UI tarball for Nikhef desktop use
User Management
- NDPFDirectoryImplementation
- Structure of the NDPF Directory and Authorization in the NDPF
- Creating_Pool_Accounts_With_LDAP
- how to create pool accounts for a new VO (or extend an existing set), or even recover from an empty gridmapdir
- Adding local users
- How to create a new user account.
- Adding a new VO
- How to add a new VO
- NDPF_UidsAndGids
- Uid and Gid number plan for the NDPF clusters
Grid, Quattor and Yaim
- How to work with our Quattor setup
- How to work with the template repository, compiling and deploying node templates
- Installing updates: OS, CAs Quattor, VL-e
- The procedure to download, configure and install updates for the OS via Quattor.
- Upgrading Quattor managed glite servers
- Information on how to upgrade Quattor managed glite servers.
- Upgrading Quattor Components
- Information on how to upgrade Quattor components using the CVS repository on stal and the local tools for rpm management.
- Maintaining_local_Yaim_functions
- How to build a release of the local nikhef-yaim-* rpms and include them in the Quattor profiles.
- Namespaced Quattor configuration
- Documentation on the Quattor setup, including the namespaced templates
- AII version 2 and complex block device schemas
- How to configure complex block device schemas in Pan with AII version 2
- Setting up a gLite WMS
- Notes for installing and configuring a gLite WMS
- LCAS and LCMAPS installation for gLExec and (GT4) gatekeepers
- What to install and what/how to configure the components
- Requesting or Renewing Host certificates
- Guide on the procedure to request a new host certificate or renew an existing certificate
Security Operations
- How to ban users with quattor
- Using Quattor to ban a user on WMS, CE and DPM nodes.
- How to install security updates using quattor
- Update one or more vulnerable rpms on Quattor-managed hosts.
Misc
- Adding a VO to a VOMS server
- How to add a new VO to a VOMS-is-evil server
Batch Systems and Scheduling
- Adding/removing nodes to PBS
- Information on how to add or remove nodes from PBS.
- NDPFAccouting
- Information on how the accounting chain works from PBS to the local and APEL accounting portals. (historic information is in Accounting).
- VoViewGraphing
- Creating the voview/grisview graphs on www.nikhef.nl/grid/stats, and the cron jobs that run on naab.
Systems Documentation
- NDPFAlarmsSystem
- handling NL-T1 alarms at Nikhef
- Nagios Monitoring Setup
- current setup of Nagios monitoring
- NDPFSubVersion
- NDPF SubVersion Repository set up
- NDPFIpmiNikhefNlBindConfig
- modifying the ipmi.nikhef.nl domain
- NDPF rsync backup
- backup of service nodes using rsync and indirect ADSM usage
- NDPF MySQL configurations
- Configuration and monitoring of the MySQL service on bedstee
- BuildingCentOSXfsKernelModules
- Building the XFS kernel modules for CentOS (if CentOS plus is late)
- CricketGraphing
- generating the configuration for network graphs
- NDPF Dell switch config
- setup and configuration of the Dell switches (disabling port-fast), required for every new installation
- Nortel 4548GT switches
- Valentine subcluster switch documentation
- Remote usage of the Dell console switches
- How to remotely connect to the Dell console switches.
- Serial Consoles
- how to set up your serial (over LAN) console and IPMI 2.0 SoL stuff, even with Xen, on a PE1950
- SettingLcdPanelText
- how to set the LCD panel text at runtime on a Dell PowerEdge using IPMI
- NDPF_Dell_OpenManage_SA_Installation
- use and configuration of the chassis and other controls of the Dell (PE1950) systems
- NDPF_Equinox_ELS16
- documentation for the Equinox ELS-16 TS
- NDPF GS environment
- Grid Service (specialties) environment documentation
- RAID-1 configuration and management
- how to set up RAID-1 and how to manage RAID devices
- SunX4500Documentation
- some tidbits on the X4500 Thumpers
- Increasing Thumper filesystems for DPM
- recipe for increasing the logical volume, file system and making it visible to DPM
- backup_umts_location
- Information about the backup UMTS connection
- Remote Location
- Information about the remote location
Problem Recovery
Procedures to be followed when recovering from major outages. No extensive descriptions, just a list of what to do, without having to think/know much about the topic.
- Restoring Services
- order in which to bring services back on-line
- Resurrecting kuiken
- steps to restart and check the VOMS server kuiken
System Utilities
- PBS_Caching_Utilities
- The current LCG software can put severe load on a PBS server because of the number of times it calls qstat (we have seen as many as 25 qstat calls per second). This load can eventually bring the entire system to a halt. To reduce the impact of this problem (a full solution requires rewriting part of the grid job manager), Davide wrote a set of utilities that wrap the original PBS commands and provide caching.
- MoniFarm Utilities
- Graphing package for farm utilisation (used to make the production graphs for the NDPF Facilities)
- NDPF_VOBOX_Alice_Renewalscript
- The proxy renewal wrapper script, enhanced with status info
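The caching idea mentioned under PBS_Caching_Utilities above can be illustrated with a minimal sketch. This is not the code of the actual utilities, which are documented on their own page; the function names `cached_command` and `cache_path`, the 30-second default lifetime and the cache directory are all assumptions made for illustration. The wrapper runs the real command at most once per lifetime window and serves every other caller from a file.

```python
import hashlib
import os
import subprocess
import tempfile
import time


def cache_path(argv, cache_dir):
    """Map a command line to a cache file name (hypothetical naming scheme)."""
    key = hashlib.sha256("\0".join(argv).encode()).hexdigest()
    return os.path.join(cache_dir, key)


def cached_command(argv, ttl=30.0, cache_dir=None, run=None):
    """Return the output of argv, re-running it only when the cached
    copy is older than ttl seconds."""
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "cmd-cache")
    os.makedirs(cache_dir, exist_ok=True)
    if run is None:
        # Default runner: execute the real command and capture stdout.
        def run(args):
            return subprocess.run(
                args, capture_output=True, text=True, check=True
            ).stdout
    path = cache_path(argv, cache_dir)
    try:
        # Serve from the cache while the file is still fresh.
        if time.time() - os.path.getmtime(path) < ttl:
            with open(path) as fh:
                return fh.read()
    except OSError:
        pass  # no cache entry yet, fall through and run the command
    output = run(argv)
    # Write to a temporary file and rename it into place, so that
    # concurrent readers never see a half-written cache file.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w") as fh:
        fh.write(output)
    os.replace(tmp, path)
    return output
```

A caller would use `cached_command(["qstat"])` in place of invoking `qstat` directly. With many callers per second, only the first call in each window reaches the PBS server, at the cost of output that may be up to `ttl` seconds stale.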
Virtualization
- Xen on CentOS 5
- Installation and configuration of Xen Dom0 CentOS 5 x86_64, DomU CentOS 4 i386
Virtualisation Issues
- NDPF VMware authentication
- controlling and creating VMs on the central VMware server
- NDPF vmware tips
- tips and tricks for generating vmware images
- Xen on CentOS 5 - Notes
- Notes for installing and configuring a Xen system on CentOS 5
- Xen on CentOS - Automating Installation-Administration
- Quattor-managed Xen Dom0 and DomUs
- Xen 3.2, CentOS 5.1 and NAT HOWTO
- HOWTO document to set up Xen with NAT networking
Performance data and analysis
- NDPF Node Performance
- performance figures for nodes used for accounting purposes
- NDPF Disk Performance
- performance figures for disk servers
- RunningSPEC
- how to run the SPEC2006 and SPEC2000 suites at Nikhef, including using the Intel Compiler Suite and SmartHeap
User Security and eTokens
- Using an Aladdin eToken PRO to store grid certificates
- Using an Aladdin eToken PRO with grid certificates (including gridproxy generation)