fxp0 interface going up/down/up/down (dhclient related?)

Sun Jun 9 13:08:50 UTC 2013

On Sun, Jun 09, 2013 at 02:48:29PM +0200, Alban Hertroys wrote:
> On Jun 9, 2013, at 12:44, Jeremy Chadwick <jdc at koitsu.org> wrote:
> 
> > On Sun, Jun 09, 2013 at 12:21:37PM +0200, Alban Hertroys wrote:
> >> I'm having an issue where my fxp0 interface keeps looping between DOWN/UP, with dhclient requesting a lease each time in between. I think it's caused by dhclient:
> >> 
> >> solfertje # dhclient -d fxp0
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> send_packet: Network is down
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> DHCPACK from 109.72.40.1
> >> bound to 141.105.10.89 -- renewal in 7200 seconds.
> >> fxp0 link state up -> down
> >> fxp0 link state down -> up
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> DHCPACK from 109.72.40.1
> >> bound to 141.105.10.89 -- renewal in 7200 seconds.
> >> fxp0 link state up -> down
> >> fxp0 link state down -> up
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> DHCPACK from 109.72.40.1
> >> bound to 141.105.10.89 -- renewal in 7200 seconds.
> >> fxp0 link state up -> down
> >> fxp0 link state down -> up
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> DHCPACK from 109.72.40.1
> >> bound to 141.105.10.89 -- renewal in 7200 seconds.
> >> fxp0 link state up -> down
> >> fxp0 link state down -> up
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> DHCPACK from 109.72.40.1
> >> bound to 141.105.10.89 -- renewal in 7200 seconds.
> >> fxp0 link state up -> down
> >> ^C
> >> 
> >> In the above test I turned off devd (/etc/rc.d/devd stop) and the background dhclient (/etc/rc.d/dhclient stop fxp0), and I still got the above result. There's practically no time between up/down cycles; this just keeps going on and on.
> >> fxp0 is the only interface that runs on DHCP. The others have static IPs.
> >> 
> >> Initially I thought the issue might be caused by devd, because I have both ethernet and 802.11 type NICs (2x ethernet, 1x wifi) in that system.
> >> 
> >> This is 9-STABLE from yesterday.
> >> 
> >> Before, I had 9-RELEASE running on this system with the same config, and that worked well.
> > 
> > And so what I predicted begins...
> > 
> > The issue is described in the 8.4-RELEASE Errata Notes; 8.4-RELEASE
> > ships the same fxp(4) driver version as stable/9, hence you're
> > experiencing the same problem.  See Open Issues:
> > 
> > http://www.freebsd.org/releases/8.4R/errata.html
> > 
> > No fix for this has been committed.  It is still under discussion by
> > multiple kernel folks as to where the fix should be applied (dhclient or
> > the fxp(4) driver), because the changes made to dhclient (that tickle
> > this bug) may actually affect more drivers than just fxp(4).
> > 
> > You can start by reading the (extremely long but very informative)
> > thread here.  I do urge you to read all the posts, not skim them:
> > 
> > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073440.html
> > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/thread.html#73440
> 
> Goodness, and here I was hoping it was just a silly mistake I made…
> 
> IIUC, the issue is a combination of:
> - dhclient now being aware of link state changes and
> - the fxp driver reinitializing for certain mode changes, such as assigning an IP address
> 
> This causes dhclient to think that the link state changed, fetch a "new" IP address and assign it to the fxp adapter again, causing the same link state change over and over again.
> 
> Is that about correct?

Someone else can answer this.

> > The only known workarounds at this time are:
> > 
> > a) Cease use of DHCP; set a static IP in rc.conf,
> > 
> > b) Try some of the patches mentioned within the above thread,
> > specifically this one:
> > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073581.html
> 
> Or c) Use DHCP with a static media setting:
> ifconfig_fxp0="DHCP media 100baseTX mediaopt full-duplex"

DO NOT DO THIS.  People who do this do not understand what this does.
This has bad effects on IEEE 802.3 and will not do/behave like you might
think.  The short version:

The ONLY TIME you should be hard-setting speed and duplex in ifconfig is
when you have a managed switch on the other end where you can set the
speed/duplex for that port as well.  Otherwise, if you have autoneg on
one side, and forced speed/duplex on the other, there is ABSOLUTELY NO
GUARANTEE it will work -- the behaviour at that point is "generally"
undefined (and chaotic), and in my experience what happens is the switch
ends up picking 100/half while the FreeBSD box thinks 100/full and you
end up with an insane collision rate + hilariously slow network speeds
(but usually only in one direction).  The behaviour varies per brand
(and revision) of switch, firmware, and other things.

So bottom line: if you're going to use autoneg, use it consistently on
both ends; if you're going to force speed/duplex, do so consistently on
both ends.  (If you don't own a managed switch, then autoneg is your
only choice.)
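
As a rough illustration of checking which of the two modes an interface
is in: the sketch below classifies the "media:" line printed by
ifconfig(8).  It assumes the usual FreeBSD output format, where an
autonegotiating interface shows the word "autoselect" on that line; the
function name media_mode is made up for this example.  It does nothing
about the dhclient/fxp(4) bug itself.

```shell
#!/bin/sh
# Classify the "media:" line of ifconfig(8) output read from stdin.
# Prints "autoselect" if the interface is autonegotiating, "forced" if a
# fixed media type is configured, or "unknown" if no media line is found.
# media_mode is a hypothetical helper name, not part of FreeBSD.
media_mode() {
    awk '
        $1 == "media:" {
            found = 1
            if ($0 ~ /autoselect/) print "autoselect"; else print "forced"
            exit
        }
        END { if (!found) print "unknown" }
    '
}

# Usage (fxp0 is the interface from this thread):
#   ifconfig fxp0 | media_mode
# To return a forced interface to autonegotiation at runtime:
#   ifconfig fxp0 media autoselect
```

If this reports "forced" and the other end is an unmanaged switch, that
is the mismatched setup described above.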

> That worked for two out of three people apparently.
> I'm not done reading this thread yet though and I noticed a patch by YongHyeon that I'll test first.

The fact it didn't work for 1 person is enough, and furthers my point
(re: the behaviour varies).

The problem needs to get fixed properly by kernel folks, but as I said,
"where" it's to be fixed is being discussed/debated.  Kernel committers'
time is very scarce right now, which is why the last post
in that thread was from May 29th (a week and a half ago).

As you can see in the thread, I tried to tell Glen Barber that demanding
that people just set a static IP in ifconfig and avoid DHCP was not
going to fly, and this follow-up thread is proof.  :-)
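
For reference, workaround (a) could look roughly like the following in
/etc/rc.conf.  The address, netmask, and gateway are placeholders from
the documentation range, not values from this thread; substitute
whatever your network actually uses, and note that this only helps if
your provider allows a fixed address at all.

```shell
# /etc/rc.conf fragment: static configuration for fxp0 instead of DHCP.
# 192.0.2.10, 255.255.255.0 and 192.0.2.1 are placeholder values.
ifconfig_fxp0="inet 192.0.2.10 netmask 255.255.255.0"
defaultrouter="192.0.2.1"
```

With no "DHCP" keyword in ifconfig_fxp0, dhclient is not started for
the interface, so the link-state loop has nothing to trigger it.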

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |