Difference between revisions of "NDPF"
From PDP/Grid Wiki
Revision as of 14:57, 15 August 2019
Nikhef Data Processing Facility
- NDPF TODO List
- pending actions needed on NDPF
- Acceptable Use Policy
- also available in Oud-Hollandsch as "Gebruiksvoorwaarden" for those that are linguistically challenged
Hardware Inventory and Connections
- system function and logical network tables: NDPF:System_Functions (which node does what) and NDPF:Network (network connections in the NDPF, all routers and switches)
- list of manually managed hosts, with access, usage and restoration info on the CT Wiki
- overview of current hardware problems
- Cabling of the OPN switches and routers
- Marsepein node type details
- Details of the Dell R630 worker node type 'marsepein'
- Oliebol node type details
- Details of the Supermicro 4u 36 disk node type 'oliebol'
- Strijker node type details
- Details of the Dell PowerEdge R515 2u 12 disk node type 'strijker'
- Biet node type details
- Details of the Dell PowerEdge R710 disk node type 'biet'
- which disk server contains what?
- domUs on a Dom0
- MAC addresses "EasterEgg" Sun X4500 Thumpers
- Where is my Thumper? Physical locations per serial number
- Status and information about the dram read errors on the Valentines
- Analysis of the performance issue with the SintMaarten (HP) worker nodes
- Management of the agile test for experimentation
- How to use the OpenVPN servers
- EGI Infra
- Infrastructure and order lists for EGI TF10
- old cabling information
Operating procedures and manuals
- a collection of links to external documentation on procedures and websites "to get things done"
- Guide to using the salt/reclass configuration management system that is taking over from quattor.
- CT Wiki NDPF
- system descriptions of all stand-alone systems, access, usage and restoration
- The Grid Asterisk/Elastix service
- The Big Grid and Nikhef videoconference
- The Cantankerous HoD Guide to accessing machine consoles from your Desktop
- How to shut down the worker nodes, for example when the temperature in the server room becomes too high
- Guide to assembling the monthly accounting summaries for WLCG
- Monthly Review
- Guide to assembling the monthly "vak N review" for the NL-T1 meeting.
- GLite on Nikhef Desktops
- Guide to install the glite UI tarball for Nikhef desktop use
- Structure of the NDPF Directory and Authorization in the NDPF
- Creating Pool Accounts With LDAP
- how to create poolaccounts for a new VO (or extend an existing set), or even recover from an empty gridmapdir
- Adding local users
- How to create a new user account.
- Adding a new VO
- How to add a new VO
- Uid and Gid number plan for the NDPF clusters
Grid, Quattor and Yaim
- GRID Issues
- Articles that might be interesting for the resolution of issues that are seen when working with the GRID.
- How to work with our Quattor setup
- How to work with the template repository, compiling and deploying node templates
- Quattor package management with yum-based ncm-spma
- Description of package management with the yum-based ncm-spma
- Installing a new VM via Quattor
- What to do to create a new virtual machine from scratch using our Quattor system.
- Installing updates: OS, CAs, Quattor, VL-e
- The procedure to download, configure and install updates for the OS via Quattor.
- Upgrading Quattor managed glite servers
- Information on how to upgrade Quattor managed glite servers.
- Upgrading Quattor Components
- Information on how to upgrade quattor components using the CVS repository on stal and the local tools for rpm management.
- Quattor and IPv6
- Summary of what has been done so far to configure servers via Quattor with IPv6 connectivity.
- Pan Tutorial
- Information about the Pan language.
- RPM Repository
- Information about working with the RPM repository.
- A Quattor tool to verify the RPM dependencies before deploying on a host
- How to build a release of the local nikhef-yaim-* rpms and include them in the Quattor profiles.
- Namespaced Quattor configuration
- Documentation on the Quattor setup, including the namespaced templates
- AII version 2 and complex block device schemas
- How to configure complex block device schemas in Pan with AII version 2
- Setting up a gLite WMS
- Notes for installing and configuring a gLite WMS
- LCAS and LCMAPS installation for gLExec and (GT4) gatekeepers
- What to install and what/how to configure the components
- Requesting or Renewing Host certificates
- Guide on the procedure to request a new host certificate or renew an existing certificate
- Working with CREAM CE servers and VOs
- Information about adding plugins and files for VO authorization to CREAM-CE servers in Quattor.
- How to ban users with quattor
- Using quattor to ban a user on WMS, CE, DPM nodes.
- How to install security updates using quattor
- Update single or multiple rpms which are vulnerable on quattor managed hosts.
- How to limit ssh scans by blocking probing IPs using standard iptables modules on RL4 and EL5
- HowTo Disk images, RAID, LVM
- How to access data in disk images created with dd
- Adding a VO to a VOMS server
- How to add a new VO to a VOMS-is-evil server
- Monitoring Script
- Monitors the status of jobs in order to identify those that have aborted but are still in a queue.
- Managing RAID Controllers
- How to deal with failing disks and batteries; how do I find my disk?
- USB stick for Fujitsu Firmware update
- how to prepare a bootable USB stick for firmware updates for Fujitsu servers.
Batch Systems and Scheduling
- Adding/removing nodes to PBS
- Information on how to add or remove nodes from PBS.
- Information on how the accounting chain works from PBS to the local and APEL accounting portals. (historic information is in Accounting).
- Documentation of how the VIRGO daily accounting has been implemented.
- Creating the voview/grisview graphs on www.nikhef.nl/grid/stats, and the cron jobs that run on naab.
- Enabling multicore jobs and jobs requesting large amounts of memory
- Overview of the steps to enable flexible resource requests
- Maui reservations
- Creating a system reservation in Maui
- Various local tools
- Overview of local tools: when_idle (perform an action when a node has drained), mom-taskset (pin a job to the slots/cores allocated), prune_userprocs (remove user processes not related to a batch job).
- SoftDrive backend
- behind the curtains of SoftDrive.
- CVMFS servers and repositories
- Overview of the cvmfs repositories hosted at the Stratum-0 and Stratum-1 servers at Nikhef
- Adding a new cvmfs repository
- How to build a new cvmfs repository at the Stratum-0 server, update its contents, synchronize the contents to the Stratum-1 server and configure the clients to use the repository.
- handling NL-T1 alarms at Nikhef
- Nagios Monitoring Setup
- current setup of Nagios monitoring
- NDPF SubVersion Repository set up
- modifying the ipmi.nikhef.nl domain
- NDPF rsync backup
- backup of service nodes using rsync and indirect ADSM usage
- NDPF MySQL configurations
- Configuration and monitoring of the MySQL service on bedstee
- Building the XFS kernel modules for CentOS (if CentOS plus is late)
- generating the configuration for network graphs
- NDPF Dell switch config
- setup and configuration of the dell switches (disabling port-fast) - required for every new installation
- Nortel 4548GT switches
- Luilak/Bulldozer subcluster switch documentation
- Arista 7148S switches
- Core switch documentation
- Remote usage of the Dell console switches
- How to remotely connect to the Dell console switches.
- Serial Consoles
- how to set up your serial (over LAN) console and IPMI 2.0 SoL stuff, even with Xen, on a PE1950
- how to set the LCD panel text at runtime on a Dell PowerEdge using IPMI
- use and configuration of the chassis and other controls of the Dell (PE1950) systems
- documentation for the Equinox ELS-16 TS
- NDPF GS environment
- Grid Service (specialties) environment documentation
- RAID-1 configuration and management
- how to set up RAID-1 and how to manage RAID devices
- some tidbits on the X4500 Thumpers
- Increasing Thumper filesystems for DPM
- recipe for increasing the logical volume, file system and making it visible to DPM
- HP BL460c G6
- How to set up, configure and manage the "Sint Maarten" cluster
- How to set up, configure and manage the "Hooimaanden"
- Configuration of DPM disk servers exporting iSCSI-mounted file systems
- How to set up, configure and manage the "Suiker Bieten"
- Hooikanon storage details
- Details of the NetApp E2860
- Hooikanon server set up (ppc64le)
- Set up details for the Power9 hooikanon systems
- Information about the backup UMTS connection
- Remote Location
- Information about the remote location
Procedures to be followed when recovering from major outages. No extensive descriptions, just a list of what to do, without having to think/know much about the topic.
- rebooting XCP VMs the hard way
- How to handle a virtual machine that doesn't want to go anymore.
- cvmfs errors/warnings
- description of the errors and warnings that cvmfs reports via its Nagios check
- Restoring Services
- order in which to bring services back on-line
- Resurrecting kuiken
- steps to restart and check the VOMS server kuiken
How to figure out various things, diagnose problems, fix them, etc.
- Tracing Jobs
- Given a job ID, how to figure out where the job came from.
- CREAM CE Troubleshooting
- Error messages and how to solve them for CREAM.
- Generating crash reports with abrt
- How to generate core dumps from crashing services
- The current LCG software can put severe load on a PBS server because of how often it calls qstat (we have seen as many as 25 qstat calls per second). This load can eventually bring the entire system to a halt. To reduce the impact of this problem (a full solution requires rewriting part of the grid job manager), Davide wrote a set of utilities that wrap the original PBS commands and provide caching.
- MoniFarm Utilities
- Graphing package for farm utilisation (used to make the production graphs for the NDPF Facilities)
- The proxy renewal wrapper script, enhanced with status info
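The caching idea behind the PBS command wrappers mentioned in this list can be illustrated with a short sketch. This is not the code of the utilities referred to above, which this page does not show; it is a minimal in-process illustration of time-based caching. The names `CachedCommand` and `run_qstat`, and the 30-second lifetime, are hypothetical choices made for the example, and `run_qstat` assumes a `qstat` binary is available on the PATH.

```python
import subprocess
import time


class CachedCommand:
    """Wrap a command runner and reuse its output for `ttl` seconds.

    Hypothetical illustration only: repeated calls with the same
    arguments within the lifetime are answered from memory instead of
    hitting the batch server again.
    """

    def __init__(self, runner, ttl=30.0, clock=time.monotonic):
        self._runner = runner   # callable that actually runs the command
        self._ttl = ttl         # cache lifetime in seconds (assumed value)
        self._clock = clock     # injectable clock, handy for testing
        self._cache = {}        # argument tuple -> (timestamp, output)

    def __call__(self, *args):
        now = self._clock()
        hit = self._cache.get(args)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]       # still fresh: no call to the real command
        output = self._runner(*args)
        self._cache[args] = (now, output)
        return output


def run_qstat(*args):
    """Run the real qstat and return its standard output as text."""
    result = subprocess.run(
        ["qstat", *args], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A caller would build the wrapper once, for example `cached_qstat = CachedCommand(run_qstat, ttl=30.0)`, and call `cached_qstat("-a")` wherever `qstat -a` output is needed. A wrapper that has to serve many separate processes would need a shared store such as a file on disk rather than an in-memory dictionary.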
- GSP Virtualisation with Xen
- Grid Server Park virtualisation with Xen Cloud Platform
- Xen on CentOS 5
- Installation and configuration of XEN Dom0 CentOS 5 x86_64, DomU CentOS 4 i386
- Managing the security training sites
- Quick writeup of managing the security training hosts with XCP, cobbler and salt.
- NDPF VMware authentication
- controlling and creating VMs on the central VMware server
- NDPF VMware tips
- tips and tricks for generating vmware images
- Xen on CentOS 5 - Notes
- Notes for installing and configuring a XEN System on CentOS 5
- Xen on CentOS - Automating Installation-Administration
- Quattor managed Xen-Dom0 and DomUs
- Xen 3.2, CentOS 5.1 and NAT HOWTO
- HOWTO document to set up Xen with NAT networking
Performance data and analysis
- NDPF Node Performance
- performance figures for nodes used for accounting purposes
- NDPF Disk Performance
- performance figures for disk servers
- how to run the SPEC2006 and SPEC2000 suites at Nikhef, including using the Intel Compiler Suite and SmartHeap
- Running HS06
- notes from running the HEPSPEC-06 benchmark
- Historical WN information
- which nodes had what names during which time periods, etc.
User Security and eTokens
- Using an Aladdin eToken PRO to store grid certificates
- Using an Aladdin eToken PRO with grid certificates (including gridproxy generation)
- Using voms-proxy-init on an OSX (10.4 or higher) system
- A tiny how-to for getting voms-proxy-init working on your Mac.
User Tips and Tricks
- PXE UEFI booting and installing
- what to do when your systems are so hip they don't do legacy BIOS boot anymore
- Passing job requirements through the WMS
- If your jobs need more cores or more memory, and you still want to use the WMS
- How to control access rights for LFC/SRM files
- How to control access rights to your files stored on the grid in LFC, SRM or both
- Checksumming support in SRM implementations
- An overview of the support for checksumming in different SRM implementations
- VO-specific software and modules
- How to install VO-specific software including automatic modules support
- job submission from the desktop and login systems to the NDPF clusters
- administering the qsub tunnel and adding new users
Software development tips
- How to handle OpenSSL and not get hurt
- Intended for developers who use the OpenSSL library. Organized as a (collective) braindump.
- Building gLExec from src rpm
- How to (re)build gLExec from a .src.rpm.
- Software development on Arista hardware
- Useful tips, how-tos and more.
- Funny Curly things
- An investigation into the different versions of the cURL tool.
- How to work with VOMS
- A detailed overview of how VOMS must work and how you need to work with VOMS.
- Grid Opendag preparation
- What to do, checklists, ideas and other pieces of information suited to prepare yourself for the Opendag (Open House) at Nikhef.
- Grid contributions Open Dag 2013
- Plans and programme of what to show during the Open Dag 2013 (5 October).
- Updating a kernel module for XenServer 7
- Getting Intel X722 10Gbps SFP+ cards to work with XenServer 7.0.
- A workaround for a powerpc saltstack bug
- Getting around a host architecture processing bug in Saltstack.