Re: 60+% ping packet loss on Pi3 under -current and stable-13

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 01 May 2022 19:15:27 UTC
On 2022-May-1, at 11:12, bob prohaska <fbsd@www.zefox.net> wrote:

> On Sat, Apr 30, 2022 at 06:39:57PM -0700, Bakul Shah wrote:
>> On Apr 29, 2022, at 7:12 PM, bob prohaska <fbsd@www.zefox.net> wrote:
>>> 
>>> Since about December of 2021 I've been noticing problems with
>>> network connectivity on a pair of Raspberry Pi 3 machines
>>> using wired network connections. One runs stable-13.1, the other
>>> runs -current; both are up to date as of a few days ago.
>>> 
>>> Essentially both machines fail to respond to inbound network
>>> connections via ssh or ping after reboot. If I get on the 
>>> serial console and start an outbound ping to anywhere, both
>>> machines respond to incoming pings with about a 65% packet
>>> loss. 
> 
>> Suggest running tcpdump on the rpi3 to see what is going on
>> when connected to the public vs private net. 
>> 
> 
> Public net first, since that's where the machine is now. Gateway.zefox.net
> is the name of my router's public interface, dcn.org belongs to my isp and
> fusionbroadband is their service provider.
> 
> While on the -current Pi3 serial console (with no outbound ping running) 
> and no inbound traffic from my hosts I see after a couple minutes:
> 
> root@www:/mnt # tcpdump
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on ue0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 10:39:40.887853 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
> 10:39:40.887929 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
> 10:39:40.893220 ARP, Request who-has 50-1-20-1.dsl.static.fusionbroadband.com tell www.zefox.org, length 28
> 10:39:40.915469 ARP, Reply 50-1-20-1.dsl.static.fusionbroadband.com is-at 00:1b:90:d2:4a:c4 (oui Unknown), length 50
> 10:39:40.915529 IP www.zefox.org.50714 > spoke.dcn.davis.ca.us.domain: 51409+ PTR? 28.20.1.50.in-addr.arpa. (41)
> 10:39:40.943602 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.50714: 51409 1/3/6 PTR www.zefox.org. (265)
> 10:39:40.945416 IP www.zefox.org.15986 > spoke.dcn.davis.ca.us.domain: 44966+ PTR? 31.20.1.50.in-addr.arpa. (41)
> 10:39:40.973487 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.15986: 44966 1/3/6 PTR gateway.zefox.net. (266)
> 10:39:40.975037 IP www.zefox.org.57611 > spoke.dcn.davis.ca.us.domain: 31749+ PTR? 1.20.1.50.in-addr.arpa. (40)
> 10:39:46.288219 IP www.zefox.org.49710 > wheel.dcn.davis.ca.us.domain: 31749+ PTR? 1.20.1.50.in-addr.arpa. (40)
> 10:39:46.316239 IP wheel.dcn.davis.ca.us.domain > www.zefox.org.49710: 31749 1/3/6 PTR 50-1-20-1.dsl.static.fusionbroadband.com. (291)
> 10:39:46.318267 IP www.zefox.org.17061 > spoke.dcn.davis.ca.us.domain: 37579+ PTR? 2.253.150.168.in-addr.arpa. (44)
> 10:39:46.346851 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.17061: 37579* 1/2/2 PTR spoke.dcn.davis.ca.us. (145)
> 10:39:46.348674 IP www.zefox.org.40440 > spoke.dcn.davis.ca.us.domain: 20572+ PTR? 1.253.150.168.in-addr.arpa. (44)
> 10:39:51.420705 IP www.zefox.org.64019 > wheel.dcn.davis.ca.us.domain: 20572+ PTR? 1.253.150.168.in-addr.arpa. (44)
> 10:39:51.448850 IP wheel.dcn.davis.ca.us.domain > www.zefox.org.64019: 20572* 1/2/2 PTR wheel.dcn.davis.ca.us. (145)
> 10:40:40.147603 ARP, Request who-has 50-1-20-1.dsl.static.fusionbroadband.com tell ns1.zefox.net, length 46
> 10:40:40.148844 IP www.zefox.org.46127 > spoke.dcn.davis.ca.us.domain: 12186+ PTR? 29.20.1.50.in-addr.arpa. (41)
> 10:40:40.176486 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.46127: 12186 1/3/6 PTR ns1.zefox.net. (262)
> 10:40:57.688225 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
> 10:40:57.688305 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
> 10:42:14.488727 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
> 10:42:14.488804 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
> 10:42:43.761226 ARP, Request who-has 50-1-20-1.dsl.static.fusionbroadband.com tell www.zefox.com, length 46
> 10:42:43.762522 IP www.zefox.org.56181 > spoke.dcn.davis.ca.us.domain: 28779+ PTR? 26.20.1.50.in-addr.arpa. (41)
> 10:42:43.790361 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.56181: 28779 1/3/6 PTR www.zefox.com. (265)
> 10:43:31.289103 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
> 10:43:31.289181 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
> 
> If I now start an inbound ping from one of my hosts it gets no reply and 
> tcpdump reports no additional traffic. With an outbound ping running there's
> at least a sparse reply.
> 
> ^C
> 28 packets captured
> 28 packets received by filter
> 0 packets dropped by kernel
> root@www:/mnt # 
> 
> The "oui unknown" looks like some sort of failure.....
> Can you ping www.zefox.org? I have no outside vantage point.
> There is still no outbound ping running and I would expect
> you'll get no or very sparse reply. 
> 
> 
> Thus far only the two Pi3s suffer from connectivity problems; Pi2s and a Pi4 have
> no difficulty on the same address block. Is there a switch for tcpdump that will
> limit records to relevant traffic? Otherwise it's a flood.
> 
> These results were obtained after standing idle overnight and
> are rather different (in ways I don't understand) from behavior
> immediately after reboot. I'll have to repeat the test as I learn more.

I wonder if there is a notable difference between
monitoring traffic from 2 places:

A) from the machine seeing the problem
vs.
B) from a machine not having problems but
   connected where all the traffic would be
   on the wire it is connected to.

It may be that monitoring from both and
comparing/contrasting the reported traffic
from the two provides additional evidence.
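
As a rough sketch of such a comparison (untested here): if the
tcpdump text output from each machine were saved to a file, one
could count the ICMP echo requests and replies that each vantage
point reports. The count_echo helper and the file names below are
hypothetical, and the sketch assumes tcpdump's usual
"ICMP echo request" / "ICMP echo reply" wording in its output.

```sh
#!/bin/sh
# Count ICMP echo requests and replies in saved tcpdump text output.
# Assumes lines such as:
#   ... IP a > b: ICMP echo request, id 1, seq 2, length 64
count_echo() {
    # $1: file holding tcpdump text output (hypothetical name)
    req=$(grep -c 'ICMP echo request' "$1")
    rep=$(grep -c 'ICMP echo reply' "$1")
    echo "$1: $req requests, $rep replies"
}

# Example use, with hypothetical file names:
#   count_echo pi3.txt      # captured on the Pi3 itself
#   count_echo other.txt    # captured on a machine without the problem
```

If the other machine shows requests arriving on the wire that the
Pi3 never reports, that would suggest the loss is on the Pi3's
receive side rather than upstream.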

There may be modes of monitoring that are
relevant for this. But I'm not familiar
with any detail here.
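
For what it is worth, tcpdump(1) does accept a filter expression
that limits what it reports, and -n stops it from doing name
lookups (the PTR queries in the capture above look like they could
be tcpdump's own lookups). A sketch, not tried on a Pi3, using the
ue0 interface and the 50.1.20.28 address shown elsewhere in this
message; the capture_ping function name is just for illustration.

```sh
#!/bin/sh
# Interface name as reported by tcpdump on the Pi3; adjust elsewhere.
IFACE=ue0
# Only ARP traffic, plus ICMP to or from the Pi3's address.
FILTER='arp or (icmp and host 50.1.20.28)'

capture_ping() {
    # -n: no name lookups; -e: print link-level (MAC) headers
    tcpdump -n -e -i "$IFACE" "$FILTER"
}

# Run as root on the machine doing the monitoring:
#   capture_ping
```

On a machine other than the Pi3 the interface name would differ,
so IFACE is an assumption to adjust.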


For reference:

# ping www.zefox.org
PING www.zefox.org (50.1.20.28): 56 data bytes
^C
--- www.zefox.org ping statistics ---
32 packets transmitted, 0 packets received, 100.0% packet loss

I found the command traceroute and it reports:

# traceroute www.zefox.org
traceroute to www.zefox.org (50.1.20.28), 64 hops max, 40 byte packets
 1  192.168.1.1 (192.168.1.1)  0.697 ms  0.486 ms  1.277 ms
 2  172.30.26.66 (172.30.26.66)  30.019 ms
    172.30.26.67 (172.30.26.67)  41.720 ms
    172.30.26.66 (172.30.26.66)  28.645 ms
 3  68.85.243.125 (68.85.243.125)  8.967 ms
    68.85.243.77 (68.85.243.77)  11.462 ms
    68.85.243.125 (68.85.243.125)  10.254 ms
 4  24.124.129.106 (24.124.129.106)  7.510 ms
    96.216.60.165 (96.216.60.165)  10.176 ms
    24.124.129.106 (24.124.129.106)  8.945 ms
 5  68.85.243.197 (68.85.243.197)  10.837 ms
    96.216.60.165 (96.216.60.165)  10.252 ms
    68.85.243.197 (68.85.243.197)  16.036 ms
 6  68.85.243.197 (68.85.243.197)  14.660 ms
    be-36211-cs01.seattle.wa.ibone.comcast.net (68.86.93.49)  14.629 ms
    68.85.243.197 (68.85.243.197)  8.849 ms
 7  be-2412-pe12.seattle.wa.ibone.comcast.net (96.110.34.142)  14.607 ms
    be-36221-cs02.seattle.wa.ibone.comcast.net (68.86.93.53)  14.122 ms
    be-2212-pe12.seattle.wa.ibone.comcast.net (96.110.34.134)  13.877 ms
 8  be-2412-pe12.seattle.wa.ibone.comcast.net (96.110.34.142)  14.133 ms *  13.663 ms
 9  be2075.ccr21.sfo01.atlas.cogentco.com (154.54.0.233)  30.176 ms *
    be3717.ccr22.sfo01.atlas.cogentco.com (154.54.86.209)  29.002 ms
10  be3717.ccr22.sfo01.atlas.cogentco.com (154.54.86.209)  28.477 ms
    be2430.ccr31.sjc04.atlas.cogentco.com (154.54.88.186)  27.203 ms
    be2075.ccr21.sfo01.atlas.cogentco.com (154.54.0.233)  28.515 ms
11  38.104.141.82 (38.104.141.82)  29.820 ms
    be2430.ccr31.sjc04.atlas.cogentco.com (154.54.88.186)  28.605 ms
    38.104.141.82 (38.104.141.82)  33.735 ms
12  38.104.141.82 (38.104.141.82)  27.160 ms
    0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net (135.180.179.146)  32.336 ms
    38.104.141.82 (38.104.141.82)  31.867 ms
13  0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  31.761 ms
    0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net (135.180.179.146)  29.864 ms
    0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  31.711 ms
14  0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  30.373 ms
    gig1-1-1.gw.wscrca11.sonic.net (50.1.36.106)  35.567 ms
    0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  31.146 ms
15  gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  31.513 ms
    gig1-1-1.gw.wscrca11.sonic.net (50.1.36.106)  31.203 ms
    gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  31.354 ms
16  gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  30.125 ms *  31.996 ms
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
^C

(There did not seem to be much point in having it continue.)

===
Mark Millard
marklmi at yahoo.com