[Bug 279951] dhclient unable to reuse recorded lease after timeout, since 12.1

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 25 Jun 2024 15:55:01 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279951

--- Comment #1 from Viktor Štujber <viktor.stujber+freebsd-bugs_v4CCPfay@gmail.com> ---
I have been made aware of a detail that changes the conclusion of the previous
findings. The source code change itself is reasonable; what it in fact did is
uncover an issue in dhclient-script.

You see, currently, all other invokable commands in dhclient-script have no
individual exit points (so no error checking and/or no failure states) and just
fall through to a catch-all 'exit_with_hooks 0'. TIMEOUT does do checking, and
as per its documentation, should exit with 0 if it thinks the saved lease is
valid. Considering this, the 2019 change of the meaning of script_go()'s return
value still makes 'zero' the expected value for success, and the nonzero wait()
process failure code continues to be returned as a nonzero bitmasked value. So
there's no issue there.

In addition, I have misunderstood the layout of dhclient-script's TIMEOUT
section. The last part that exits 1 is not the default expected path; it is a
cleanup step, and the ass-backwards if-statement structure is there to avoid
copy-pasting, 'goto fail;' or dummy scope constructs. It makes it difficult to
see the positive execution path, especially since it includes a negated AND
condition, split into two if statements.

Ultimately, what the script is doing is:
0. TIMEOUT) - so the DHCP server is not available, but we have saved leases
1. add_new_address()
2. lease must have 'routers'
3. first router on the list must be pingable
4. add_new_alias()
5. add_new_routes()
6. add_new_resolv_conf()
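
To make the positive path easier to see, here is a rough sketch of that control
flow with the if statements un-inverted. It is not the actual dhclient-script:
the helper bodies are stubs that only print the step, router_pingable and the
GATEWAY_UP variable are hypothetical stand-ins for the ping test, and
handle_timeout is a name made up for this sketch.

```sh
#!/bin/sh
# Sketch only: stubs standing in for the real dhclient-script helpers.
add_new_address()     { echo "step 1: add_new_address"; }
add_new_alias()       { echo "step 4: add_new_alias"; }
add_new_routes()      { echo "step 5: add_new_routes"; }
add_new_resolv_conf() { echo "step 6: add_new_resolv_conf"; }

# Hypothetical replacement for the ping test; GATEWAY_UP=1 means "router answers".
router_pingable() { [ "${GATEWAY_UP:-0}" = "1" ]; }

# $1: space-separated router list from the saved lease (may be empty).
handle_timeout() {
    routers=$1
    add_new_address                        # step 1
    if [ -n "$routers" ]; then             # step 2: lease must have routers
        set -- $routers                    # split the list; $1 is the first router
        if router_pingable "$1"; then      # step 3: first router must answer
            add_new_alias                  # step 4
            add_new_routes                 # step 5
            add_new_resolv_conf            # step 6
            return 0                       # saved lease accepted
        fi
    fi
    echo "cleanup"                         # the part that ends in exit 1
    return 1
}
```

With GATEWAY_UP unset, handle_timeout "192.0.2.1" prints step 1 and the cleanup
line and returns 1, which is the outage case described in this report.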

Up until 2019-02, step 3 would fail and the script would exit, but dhclient
proceeded as if it were successful because it wasn't checking the script's exit
code. This also meant that steps 4, 5 and 6 weren't performed, but in my case
there were no visible consequences. Now, the requirement for the gateway to be
pingable is enforced, and my host is no longer able to reuse the saved lease
during an outage scenario.
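
The difference between the two behaviours can be illustrated with a toy
example. This is not dhclient's code (dhclient is written in C); every name
below is hypothetical, and run_timeout_script merely stands in for a TIMEOUT
run of the script whose ping test fails.

```sh
#!/bin/sh
# Hypothetical stand-in: the script exits nonzero because the gateway is down.
run_timeout_script() { return 1; }

# Behaviour as described for before 2019-02: the exit code is ignored.
old_caller() {
    run_timeout_script || true     # status discarded
    echo "lease reused"
}

# Behaviour as described for now: a nonzero exit code rejects the saved lease.
new_caller() {
    if run_timeout_script; then
        echo "lease reused"
    else
        echo "lease rejected"
    fi
}
```

Calling old_caller prints "lease reused" even though the script failed, while
new_caller prints "lease rejected".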

Unfortunately at this point I conclude that everything is now 'working as
programmed', and any further discussion moves from "it's not supposed to do X!"
to "should it be doing X?" or "can it be improved", which is more about design
than bugfixing.

I haven't thought much about how to improve this. My guess is that the ping
test acts as some sort of primitive NLA / sanity check for when the host moves
around between networks or when the network suddenly changes IP ranges, all
while the DHCP server is absent. This makes it a very strange scenario. I think
a power outage, after which the server boots up faster than the router that's
also the DHCP server, is more likely. Anyway, dhclient-script and that ping
test were implemented by OpenBSD's krw@, who also maintains dhclient and who
eventually deleted the whole thing. Their current code only checks addressinuse
and nothing else, and they're supposedly transitioning to their dhcpleased.
