Re: 60+% ping packet loss on Pi3 under -current and stable-13

From: bob prohaska <fbsd_at_www.zefox.net>
Date: Mon, 02 May 2022 15:53:34 UTC
On Mon, May 02, 2022 at 08:56:12AM +0200, Hans Petter Selasky wrote:
[reply at end]
> On 5/2/22 03:13, bob prohaska wrote:
> > On Sun, May 01, 2022 at 05:10:59PM -0700, Mark Millard wrote:
> > [reply at end]
> > > On 2022-May-1, at 16:27, bob prohaska <fbsd@www.zefox.net> wrote:
> > > 
> > > > On Sun, May 01, 2022 at 12:58:45PM -0700, Mark Millard wrote:
> > > > > 
> > > > > Looks like there is some problem getting past
> > > > > gig1-1-1.gw.davsca11.sonic.net .
> > > > > 
> > > > 
> > > > That seems independent of my own internal connection problems,
> > > > but worth taking up with my ISP on Monday. Meanwhile, can you
> > > > ping any other hosts in the 50.1.20.31-24 range? All are up
> > > > at the moment. Hosts 28 and 24 are the troublemakers.
> > > > 
> > > > If anybody cares there's an ascii-art network diagram at
> > > > http://www.zefox.net/~fbsd/netmap
> > > > 
> > > > Not sure it'll survive the mailing list, but here goes:
> > > > dsl_modem-----switch---------router-----lan-------wifi-----pi4_workstation
> > > >                       |                  |             |
> > > >                       |                  |             |---Mac workstation
> > > >                       |                  |
> > > >                       |                  |------printer
> > > >     ------------------|
> > > >     |
> > > >     |------50.1.20.30 ns1.zefox.net Pi2 12.3 usb-serial----50.1.20.27
> > > >     |------50.1.20.29 ns2.zefox.net Pi2 12.3 usb-serial----50.1.20.30
> > > >     |------50.1.20.27 www.zefox.net Pi2 12.3 usb-serial----50.1.20.26
> > > >     |------50.1.20.26 www.zefox.com Pi2 -current usb-serial---50.1.20.24
> > > >     |------50.1.20.24 pelorus.zefox.org Pi3 13.1 usb-serial---50.1.20.28
> > > > switch
> > > >     |------50.1.20.25 nemesis.zefox.com Pi4 -current usb-serial---50.1.20.29
> > > >     |------50.1.20.28 www.zefox.org Pi3 -current usb-serial----50.1.20.25
> > > 
> > > 
> > > For ns1.zefox.net there is no problem and
> > > it looks like:
> > > 
> > >                                       My traceroute  [v0.95]
> > > amd64_ZFS (192.168.1.120) -> ns1.zefox.net (50.1.20.29)                2022-05-01T16:52:27-0700
> > > Keys:  Help   Display mode   Restart statistics   Order of fields   quit
> > >                                                         Packets               Pings
> > >   Host                                                Loss%   Snt   Last   Avg  Best  Wrst StDev
> > >   1. 192.168.1.1                                       0.0%    53    1.2   0.8   0.1   1.4   0.4
> > >   2. 172.30.26.67                                      0.0%    53   11.8  25.0  11.8  61.0  11.4
> > >   3. 68.85.243.125                                     0.0%    53   10.0  10.0   7.7  46.9   5.3
> > >   4. 96.216.60.165                                     0.0%    53    8.8   9.3   7.8  12.1   0.9
> > >   5. 68.85.243.197                                     0.0%    53    8.6  13.2   8.6  28.3   4.2
> > >   6. be-36231-cs03.seattle.wa.ibone.comcast.net        0.0%    53   15.3  14.8  13.0  16.9   1.0
> > >   7. be-2312-pe12.seattle.wa.ibone.comcast.net         0.0%    53   16.2  15.9  12.9  59.8   6.5
> > >   8. (waiting for reply)
> > >   9. be3717.ccr22.sfo01.atlas.cogentco.com             0.0%    53   29.8  30.9  26.5  97.9  10.1
> > > 10. be2430.ccr31.sjc04.atlas.cogentco.com             0.0%    53   29.0  29.0  26.6  39.3   1.8
> > > 11. 38.104.141.82                                     0.0%    53   28.9  33.8  26.1 115.0  17.0
> > > 12. 0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net            0.0%    53   32.1  31.3  29.2  33.9   1.0
> > > 13. 0.xe-0-0-0.cr1.scrmca13.sonic.net                 0.0%    53   30.5  32.1  29.2  57.6   4.3
> > > 14. gig1-1-1.gw.wscrca11.sonic.net                    0.0%    53   31.8  32.0  28.8  43.7   2.0
> > > 15. gig1-1-1.gw.davsca11.sonic.net                    0.0%    52   31.0  32.4  30.2  38.4   1.4
> > > 16. ns1.zefox.net                                     0.0%    52   51.4  51.1  49.8  53.4   0.8
> > > 
> > > ns2.zefox.net and others got a 17. instead of
> > > a 16. An example is:
> > > 
> > >                                       My traceroute  [v0.95]
> > > amd64_ZFS (192.168.1.120) -> ns2.zefox.net (50.1.20.30)                2022-05-01T16:58:45-0700
> > > Keys:  Help   Display mode   Restart statistics   Order of fields   quit
> > >                                                         Packets               Pings
> > >   Host                                                Loss%   Snt   Last   Avg  Best  Wrst StDev
> > >   1. 192.168.1.1                                       0.0%    55    0.3   0.9   0.1   1.4   0.4
> > >   2. 172.30.26.66                                      0.0%    55   13.5  26.4  10.4  54.7  10.1
> > >   3. 68.85.243.77                                      0.0%    55   10.5   9.1   7.9  10.5   0.6
> > >   4. 24.124.129.106                                    0.0%    54    8.3   9.5   8.2  13.4   1.0
> > >   5. 96.216.60.165                                     0.0%    54    8.8   9.8   7.8  22.8   2.2
> > >   6. 68.85.243.197                                     0.0%    54   17.1  15.1   9.0  37.3   5.9
> > >   7. be-36241-cs04.seattle.wa.ibone.comcast.net        0.0%    54   15.2  15.0  13.2  17.8   0.9
> > >   8. be-2412-pe12.seattle.wa.ibone.comcast.net         0.0%    54   15.0  14.8  13.2  17.1   1.0
> > >   9. (waiting for reply)
> > > 10. be2075.ccr21.sfo01.atlas.cogentco.com             0.0%    54   28.4  29.2  26.9  36.8   1.4
> > > 11. be2379.ccr31.sjc04.atlas.cogentco.com             0.0%    54   29.8  30.0  27.3  84.2   7.6
> > > 12. 38.104.141.82                                     0.0%    54   28.6  33.7  27.5 105.5  16.2
> > > 13. 0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net            0.0%    54   31.6  31.4  29.5  33.8   0.9
> > > 14. 0.xe-0-0-0.cr1.scrmca13.sonic.net                 0.0%    54   31.1  32.1  29.1  52.9   3.4
> > > 15. gig1-1-1.gw.wscrca11.sonic.net                    0.0%    54   31.2  31.9  30.0  34.1   0.9
> > > 16. gig1-1-1.gw.davsca11.sonic.net                    0.0%    54   33.3  32.6  30.8  45.8   2.1
> > > 17. ns2.zefox.net                                     0.0%    54   52.5  51.4  49.1  54.9   1.2
> > > 
> > > The routing need not be the same from one
> > > try to the next.
> > > 
> > > www.zefox.net     is similar.
> > > www.zefox.com     is similar.
> > > pelorus.zefox.org is similar.
> > > nemesis.zefox.com is similar.
> > > www.zefox.org     is similar.
> > > 
> > > Notably www.zefox.org was what I tried and
> > > reported on before that had the failures.
> > > 
> > > I observed an initial connection sequence once
> > > for pelorus.zefox.org where it briefly displayed
> > > something like (not captured, just from memory):
> > > 
> > > 16. gig1-1-1.gw.davsca11.sonic.net
> > > 17. (waiting for reply)
> > > 18. (waiting for reply)
> > > 19. pelorus.zefox.org
> > > 
> > > before changing to
> > > 
> > > 16. gig1-1-1.gw.davsca11.sonic.net
> > > 17. ns2.zefox.net
> > > 
> > > That may be normal, but usually timed such that I
> > > would not see it.
> > > 
> > > But it might actually be evidence of a stage that
> > > leads to the overall failure by never getting
> > > past the:
> > > 
> > > 16. gig1-1-1.gw.davsca11.sonic.net
> > > 17. (waiting for reply)
> > > 18. (waiting for reply)
> > > 19. WHATEVER
> > > 
> > > in some cases.
> > > 
> > > However, in the tests above, the hosts below worked fine:
> > > 
> > > 50.1.20.24 pelorus.zefox.org Pi3 13.1 usb-serial---50.1.20.28
> > > 50.1.20.28 www.zefox.org Pi3 -current usb-serial----50.1.20.25
> > > 
> > > What changed?
> > 
> > I restarted an outgoing ping so I could access those hosts via ssh,
> > to bring up a serial console connection to the next host in the "ring".
> > Usually I simply ping 50.1.20.31 (my router) but at least in the past
> > it did not matter what the destination was. In one case I tried an
> > unused address. That makes the role of a distant host somewhat
> > baffling.
> > 
> > Thanks for checking!
> > 
> > bob prohaska
> 
> Hi,
> 
> Did you try to force the link mode to 100MBit/s ?
> 

Not explicitly, but ifconfig -a reports
ue0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80009<RXCSUM,VLAN_MTU,LINKSTATE>
        ether b8:27:eb:71:46:4e
        inet 50.1.20.28 netmask 0xffffff00 broadcast 50.1.20.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
so I think it's 100MBit/s anyway.
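
Note that "autoselect" in the media line means the speed was negotiated,
not forced. A minimal Python sketch, assuming the ifconfig output format
shown above, that separates the configured mode from the active one
(parse_media and its result keys are hypothetical names used only for
illustration):

```python
import re

# Sample copied from the ifconfig output quoted in this message.
IFCONFIG_SAMPLE = """\
ue0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80009<RXCSUM,VLAN_MTU,LINKSTATE>
        ether b8:27:eb:71:46:4e
        inet 50.1.20.28 netmask 0xffffff00 broadcast 50.1.20.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
"""

# "media: Ethernet <configured> (<active> <options>)"; the parenthesised
# part is assumed to be absent when the mode is forced.
MEDIA_RE = re.compile(
    r"^\s*media:\s+Ethernet\s+(\S+)(?:\s+\((\S+)(?:\s+<([^>]*)>)?\))?",
    re.MULTILINE,
)

def parse_media(text):
    """Return the configured and active media of the first interface found."""
    m = MEDIA_RE.search(text)
    if m is None:
        return None
    configured, active, opts = m.groups()
    return {
        "configured": configured,
        "active": active or configured,
        "options": opts.split(",") if opts else [],
        "forced": configured != "autoselect",
    }

if __name__ == "__main__":
    print(parse_media(IFCONFIG_SAMPLE))
```

For the output above this reports an active mode of 100baseTX with
"forced" false. Actually forcing the mode would be something like
"ifconfig ue0 media 100baseTX mediaopt full-duplex"; whether the ue0
driver honors that has not been tried here.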

One new oddity: the daily security report now contains the lines
www.zefox.org kernel log messages:
+ue0: promiscuous mode enabled
+ue0: promiscuous mode disabled
+ue0: promiscuous mode enabled
+ue0: promiscuous mode disabled
+ue0: promiscuous mode enabled
+ue0: promiscuous mode disabled

I'm using static addresses set in /etc/rc.conf. The DHCP line is commented
out but not explicitly disabled. Could something else be trying to turn DHCP
on, which I gather would also place the interface in promiscuous mode?
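
To see how often the interface is toggled, the report lines can be paired
up. This is a minimal Python sketch, assuming the "+ue0: promiscuous mode
enabled/disabled" format shown above; promisc_sessions is a hypothetical
helper, not part of any FreeBSD tool. It only counts transitions and says
nothing about what caused them. Packet capture tools such as tcpdump are
one common source of such messages.

```python
import re

# Sample copied from the daily security report quoted in this message.
LOG_SAMPLE = """\
+ue0: promiscuous mode enabled
+ue0: promiscuous mode disabled
+ue0: promiscuous mode enabled
+ue0: promiscuous mode disabled
+ue0: promiscuous mode enabled
+ue0: promiscuous mode disabled
"""

# The leading "+" is the diff marker added by the security report.
PROMISC_RE = re.compile(r"^\+?(\w+): promiscuous mode (enabled|disabled)$")

def promisc_sessions(lines):
    """Map interface -> (completed enable/disable pairs, still enabled?)."""
    completed = {}
    enabled = {}
    for line in lines:
        m = PROMISC_RE.match(line.strip())
        if m is None:
            continue  # not a promiscuous-mode message
        iface, state = m.groups()
        completed.setdefault(iface, 0)
        if state == "enabled":
            enabled[iface] = True
        else:
            # Count a pair only if a matching "enabled" was seen first.
            if enabled.get(iface):
                completed[iface] += 1
            enabled[iface] = False
    return {i: (completed[i], enabled.get(i, False)) for i in completed}

if __name__ == "__main__":
    print(promisc_sessions(LOG_SAMPLE.splitlines()))
```

For the six lines above this gives three completed enable/disable pairs on
ue0, with the interface left out of promiscuous mode at the end.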

uname -a reports:

FreeBSD www.zefox.org 14.0-CURRENT FreeBSD 14.0-CURRENT #55 main-n255108-9fb40baf604: Fri Apr 29 20:42:26 PDT 2022     bob@www.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64


Thanks for writing!

bob prohaska