Re: Kernel DHCP unpredictable/fails (PXE boot), userspace DHCP works just fine

From: Attila Nagy <nagy.attila_at_gmail.com>
Date: Fri, 17 Mar 2023 20:05:55 UTC
Rick Macklem <rick.macklem@gmail.com> wrote (on Thu, 16 Mar 2023,
23:01):

> On Thu, Mar 16, 2023 at 1:44 PM Attila Nagy <nagy.attila@gmail.com> wrote:
> >
> > The problem is that the newer machines take an indefinite time to boot.
> > The older ones (with igb NIC) work reliably; they always boot fast.
> Haven't you at least partially answered the question yourself here?
> In other words, it sounds like there is an issue with the NIC driver
> for the newer chip. (If you can replace the NIC with one with
> a different chip, I'd try that.)
>
Yes, this driver is quite bad and has a lot of flaws, but once the OS
has booted, it otherwise works fine.
I can't change the NIC. :(


>
> A possible workaround would be to switch to using "options NFS_ROOT"
> instead of "BOOTP_NFSROOT". This way of doing diskless NFS depends on
> pxeboot loading the FreeBSD boot loader, which then sets enough
> environment variables so that a kernel built with "options NFS_ROOT"
> and no "options BOOTP_NFSROOT" will boot.
>
Oh, I had long forgotten this option, thanks for bringing it up!
Yes, it skips that code, and the DHCP query along with it, and works
wonderfully: the machine boots fast.
For me this confirms that the problem lies in the DHCP code in bootp_subr.c
(at least, something goes wrong with this NIC; I guess it might be a timing
issue).

I had to dig through my memory for why we don't use that, but the first
boot brought it back:
bootp_subr.c takes option 134 from the DHCP response and stores it in
kern.bootp_cookie, which /etc/rc.initdiskless then uses to set up the
class, which we depend on.
Well, it's possible to work around that ("encoding" the class in the NFS
root path), but that's not the same (we have different initdiskless
"classes" with the same NFS root paths).

I'm not sure whether pxeboot could get that information from the PXE stack,
but even if it has access to the DHCP reply, I guess nobody is interested in
modifying it to actually pass option 134 through to kern.bootp_cookie, given
that it hasn't been implemented in all these years. :)