PXE UEFI booting and installing

From PDP/Grid Wiki
Revision as of 20:14, 22 August 2019 by Davidg@nikhef.nl (talk | contribs)

We recently got a batch of DELL PE640s and, for fast storage, opted for a superfast 3 TB NVMe card. That was almost the most expensive part of the whole order. Because the card could not be used with legacy BIOS boot (either the BIOS could not handle the card, or the card would only do UEFI), we were forced to do UEFI boot, a first for the NDPF.

We install all our machines through PXE booting, and the configuration is managed by our trusty (legacy) quattor installation. This is based on pxelinux.cfg. For UEFI boot we need a slightly different approach, because the boot loader is GRUB rather than pxelinux.

The boot loader image is called bootx64.efi (from any CentOS mirror, the path is /centos/6/os/x86_64/EFI/BOOT/). This file needs to be placed in the pxelinux.cfg directory along with the usual files for the machines to install. But note that this is different from pxelinux.0, which resides in the directory above pxelinux.cfg/. This also means that subsequent kernel and initrd files will be retrieved from the pxelinux.cfg directory, so you will have to set up symlinks accordingly.
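The symlink arrangement can be tried out in a scratch directory before touching the real tftp root. The following is a minimal sketch, not our actual tooling: the function name `build_layout` is made up for illustration, the files are empty placeholders, and the directory names (`centos-6.9/x86_64` for the installer, `centos69_x86_64` for the symlink) are the ones used in our setup.

```python
import os
import tempfile
from pathlib import Path


def build_layout(root: Path) -> Path:
    """Create a mock tftp root under `root` and return the kernel path
    as it is seen from the pxelinux.cfg directory."""
    installer = root / "centos-6.9" / "x86_64"
    cfg_dir = root / "pxelinux.cfg"
    installer.mkdir(parents=True)
    cfg_dir.mkdir()

    # Empty placeholders standing in for the real installer files.
    for name in ("vmlinuz", "initrd.img"):
        (installer / name).touch()

    # The EFI boot loader sits inside pxelinux.cfg, not next to pxelinux.0.
    (cfg_dir / "bootx64.efi").touch()

    # Relative symlink, so the tree keeps working if the root is moved.
    os.symlink("../centos-6.9/x86_64", cfg_dir / "centos69_x86_64")

    return cfg_dir / "centos69_x86_64" / "vmlinuz"


scratch = Path(tempfile.mkdtemp(prefix="nbp-"))
kernel = build_layout(scratch)
print("kernel reachable from pxelinux.cfg:", kernel.exists())
```

Because the boot loader resolves paths relative to the directory it was loaded from, the kernel has to be reachable through `pxelinux.cfg/`, which is what the symlink provides.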

Our setup basically looks like this:

/osinstall/nbp (the root of the tftp server).
/osinstall/nbp/centos-6.9 (the installer lives here)
/osinstall/nbp/pxelinux.cfg/centos69_x86_64 -> ../centos-6.9/x86_64

The grub configuration file for a machine contains (e.g.)

timeout 5
title EFI Boot
	root (nd)
	kernel /centos69_x86_64/vmlinuz ramdisk=32768 ks=http://stal.nikhef.nl/ks/machine.ks ksdevice=eth4 biosdevname=0
	initrd /centos69_x86_64/initrd.img

Mind the extra line for the root. In legacy GRUB, "(nd)" denotes the network drive, so the grub boot loader will download the kernel and initrd via tftp.
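If such files have to be produced for many machines, the stanza is easy to template. The sketch below is only an illustration and not how quattor generates them; the function `grub_efi_stanza` and its parameters are hypothetical, and the kernel arguments are simply copied from the example above.

```python
def grub_efi_stanza(kernel_dir: str, ks_url: str, ksdevice: str, timeout: int = 5) -> str:
    """Render a legacy GRUB stanza that boots the installer over the network."""
    lines = [
        f"timeout {timeout}",
        "title EFI Boot",
        # (nd) selects the network drive, so the paths below are fetched via tftp.
        "\troot (nd)",
        f"\tkernel /{kernel_dir}/vmlinuz ramdisk=32768 ks={ks_url} "
        f"ksdevice={ksdevice} biosdevname=0",
        f"\tinitrd /{kernel_dir}/initrd.img",
    ]
    return "\n".join(lines) + "\n"


stanza = grub_efi_stanza(
    kernel_dir="centos69_x86_64",
    ks_url="http://stal.nikhef.nl/ks/machine.ks",
    ksdevice="eth4",
)
print(stanza, end="")
```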

The tricky part was convincing dhcpd to do the right thing. There are numerous sources on the internet that suggest all kinds of wild things, but in our case we only needed two key statements. Mind you that this is with a Mellanox 25Gbit card, which may behave differently from other cards. The PXE spec is a mess and YMMV.

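The first statement we really needed was option code 93, specified by RFC 4578. This option is set by the client and helps the server distinguish clients that want to do classic PXE boot from those that want EFI boot. RFC 4578 itself lists value 7 as EFI byte code, but it is the value that x86-64 UEFI clients send in practice, and the IANA registry now assigns 7 to x64 UEFI.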
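To make the decision the server has to take concrete, here is a small sketch in Python of the same dispatch that dhcpd performs. It is an illustration only: the function name `boot_filename` is made up, the table is copied from the list in RFC 4578, and the two file names are the ones from our dhcpd configuration.

```python
import struct

# Architecture types as listed in RFC 4578, section 2.1.
RFC4578_ARCH = {
    0: "Intel x86PC",
    1: "NEC/PC98",
    2: "EFI Itanium",
    3: "DEC Alpha",
    4: "Arc x86",
    5: "Intel Lean Client",
    6: "EFI IA32",
    7: "EFI BC",  # in practice this is what x86-64 UEFI firmware sends
    8: "EFI Xscale",
    9: "EFI x86-64",
}


def boot_filename(option93: bytes) -> str:
    """Pick the boot file from the raw value of DHCP option 93.

    The option carries one or more 16-bit big-endian architecture codes;
    this sketch only looks at the first one.
    """
    if len(option93) < 2:
        raise ValueError("option 93 needs at least two bytes")
    (arch,) = struct.unpack("!H", option93[:2])
    if arch == 7:
        return "pxelinux.cfg/bootx64.efi"
    return "pxelinux/pxelinux.0"


for raw in (b"\x00\x07", b"\x00\x00"):
    code = int.from_bytes(raw, "big")
    print(code, RFC4578_ARCH.get(code, "unknown"), "->", boot_filename(raw))
```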

The second important statement was tftp-server-name (DHCP option 66). It serves the same purpose as next-server, which sets the siaddr field in the DHCP header, but which of the two a client actually uses is apparently client-dependent.

Finally, we used to send a vendor-encapsulated-options value, possibly for legacy reasons, but Wireshark analysis revealed that the client could interpret it as an invitation to use mtftp (multicast tftp), which we certainly don't have.
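The bytes in question, 01:04:00:00:00:00:ff in our dhcpd configuration, can be taken apart by hand. The sketch below decodes them as tag/length/value sub-options. The function name is made up for illustration, and the tag names are taken from my reading of the PXE 2.1 specification, so treat them as an assumption rather than as authoritative.

```python
import ipaddress

# Sub-option tags of option 43 for PXE clients (assumed from the PXE 2.1 spec).
PXE_TAGS = {
    1: "PXE_MTFTP_IP",
    2: "PXE_MTFTP_CPORT",
    3: "PXE_MTFTP_SPORT",
    4: "PXE_MTFTP_TMOUT",
    5: "PXE_MTFTP_DELAY",
    6: "PXE_DISCOVERY_CONTROL",
}


def parse_vendor_options(raw: bytes) -> list:
    """Split an option 43 payload into (tag, value) pairs, stopping at tag 255."""
    pairs = []
    i = 0
    while i < len(raw):
        tag = raw[i]
        if tag == 255:  # end marker
            break
        if tag == 0:  # padding byte, no length field
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated sub-option header")
        length = raw[i + 1]
        value = raw[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-option value")
        pairs.append((tag, value))
        i += 2 + length
    return pairs


raw = bytes.fromhex("01:04:00:00:00:00:ff".replace(":", ""))
parsed = parse_vendor_options(raw)
for tag, value in parsed:
    name = PXE_TAGS.get(tag, f"tag {tag}")
    shown = str(ipaddress.IPv4Address(value)) if len(value) == 4 else value.hex()
    print(name, shown)
```

Under that reading the value is a single sub-option 1 carrying the address 0.0.0.0, that is, a multicast tftp address, which fits the behaviour seen in the capture.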

Before we used the configuration below, Wireshark analysis revealed that the client was trying to connect to the DHCP server on port 4011, which is the PXE port for DHCP proxy servers. We don't have a proxy server, however; this was a bit confusing, but I guess that the Mellanox card came up with this idea on its own.

option arch code 93 = unsigned integer 16; # RFC4578

if option arch = 00:07 {
    # UEFI client (arch code 7): hand out the grub EFI boot loader
    option tftp-server-name "stal.nikhef.nl";
    filename "pxelinux.cfg/bootx64.efi";
} else {
    # everything else: classic PXE boot with pxelinux
    option vendor-class-identifier "PXEClient";
    option vendor-encapsulated-options 01:04:00:00:00:00:ff;
    filename "pxelinux/pxelinux.0";