Re: 60+% ping packet loss on Pi3 under -current and stable-13
- In reply to: Mark Millard : "Re: 60+% ping packet loss on Pi3 under -current and stable-13"
Date: Mon, 02 May 2022 15:25:39 UTC
On 2022-05-01 12:58, Mark Millard wrote:
> On 2022-May-1, at 12:15, Mark Millard <marklmi@yahoo.com> wrote:
>
>
>> On 2022-May-1, at 11:12, bob prohaska <fbsd@www.zefox.net> wrote:
>>
>>> On Sat, Apr 30, 2022 at 06:39:57PM -0700, Bakul Shah wrote:
>>>> On Apr 29, 2022, at 7:12 PM, bob prohaska <fbsd@www.zefox.net> wrote:
>>>>>
>>>>> Since about December of 2021 I've been noticing problems with
>>>>> wired network connectivity on a pair of raspberry pi 3 machines
>>>>> using wired network connections. One runs stable-13.1, the other
>>>>> runs -current, both are up to date as of a few days ago.
>>>>>
>>>>> Essentially both machines fail to respond to inbound network
>>>>> connections via ssh or ping after reboot. If I get on the
>>>>> serial console and start an outbound ping to anywhere, both
>>>>> machines respond to incoming pings with about a 65% packet
>>>>> loss.
>>>
>>>> Suggest running tcpdump on the rpi3 to see what is going on
>>>> when connected to the public vs private net.
>>>>
>>>
>>> Public net first, since that's where the machine is now. Gateway.zefox.net
>>> is the name of my router's public interface, dcn.org belongs to my isp and
>>> fusionbroadband is their service provider.
>>>
>>> While on the -current Pi3 serial console (with no outbound ping running)
>>> and no inbound traffic from my hosts I see after a couple minutes:
>>>
>>> root@www:/mnt # tcpdump
>>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>>> listening on ue0, link-type EN10MB (Ethernet), capture size 262144 bytes
>>> 10:39:40.887853 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
>>> 10:39:40.887929 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
>>> 10:39:40.893220 ARP, Request who-has 50-1-20-1.dsl.static.fusionbroadband.com tell www.zefox.org, length 28
>>> 10:39:40.915469 ARP, Reply 50-1-20-1.dsl.static.fusionbroadband.com is-at 00:1b:90:d2:4a:c4 (oui Unknown), length 50
>>> 10:39:40.915529 IP www.zefox.org.50714 > spoke.dcn.davis.ca.us.domain: 51409+ PTR? 28.20.1.50.in-addr.arpa. (41)
>>> 10:39:40.943602 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.50714: 51409 1/3/6 PTR www.zefox.org. (265)
>>> 10:39:40.945416 IP www.zefox.org.15986 > spoke.dcn.davis.ca.us.domain: 44966+ PTR? 31.20.1.50.in-addr.arpa. (41)
>>> 10:39:40.973487 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.15986: 44966 1/3/6 PTR gateway.zefox.net. (266)
>>> 10:39:40.975037 IP www.zefox.org.57611 > spoke.dcn.davis.ca.us.domain: 31749+ PTR? 1.20.1.50.in-addr.arpa. (40)
>>> 10:39:46.288219 IP www.zefox.org.49710 > wheel.dcn.davis.ca.us.domain: 31749+ PTR? 1.20.1.50.in-addr.arpa. (40)
>>> 10:39:46.316239 IP wheel.dcn.davis.ca.us.domain > www.zefox.org.49710: 31749 1/3/6 PTR 50-1-20-1.dsl.static.fusionbroadband.com. (291)
>>> 10:39:46.318267 IP www.zefox.org.17061 > spoke.dcn.davis.ca.us.domain: 37579+ PTR? 2.253.150.168.in-addr.arpa. (44)
>>> 10:39:46.346851 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.17061: 37579* 1/2/2 PTR spoke.dcn.davis.ca.us. (145)
>>> 10:39:46.348674 IP www.zefox.org.40440 > spoke.dcn.davis.ca.us.domain: 20572+ PTR? 1.253.150.168.in-addr.arpa. (44)
>>> 10:39:51.420705 IP www.zefox.org.64019 > wheel.dcn.davis.ca.us.domain: 20572+ PTR? 1.253.150.168.in-addr.arpa. (44)
>>> 10:39:51.448850 IP wheel.dcn.davis.ca.us.domain > www.zefox.org.64019: 20572* 1/2/2 PTR wheel.dcn.davis.ca.us. (145)
>>> 10:40:40.147603 ARP, Request who-has 50-1-20-1.dsl.static.fusionbroadband.com tell ns1.zefox.net, length 46
>>> 10:40:40.148844 IP www.zefox.org.46127 > spoke.dcn.davis.ca.us.domain: 12186+ PTR? 29.20.1.50.in-addr.arpa. (41)
>>> 10:40:40.176486 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.46127: 12186 1/3/6 PTR ns1.zefox.net. (262)
>>> 10:40:57.688225 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
>>> 10:40:57.688305 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
>>> 10:42:14.488727 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
>>> 10:42:14.488804 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
>>> 10:42:43.761226 ARP, Request who-has 50-1-20-1.dsl.static.fusionbroadband.com tell www.zefox.com, length 46
>>> 10:42:43.762522 IP www.zefox.org.56181 > spoke.dcn.davis.ca.us.domain: 28779+ PTR? 26.20.1.50.in-addr.arpa. (41)
>>> 10:42:43.790361 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.56181: 28779 1/3/6 PTR www.zefox.com. (265)
>>> 10:43:31.289103 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
>>> 10:43:31.289181 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
>>>
>>> If I now start an inbound ping from one of my hosts it gets no reply and
>>> tcpdump reports no additional traffic. With an outbound ping running there's
>>> at least a sparse reply.
>>>
>>> ^C
>>> 28 packets captured
>>> 28 packets received by filter
>>> 0 packets dropped by kernel
>>> root@www:/mnt #
>>>
>>> The "oui unknown" looks like some sort of failure.....
>>> Can you ping www.zefox.org? I have no outside vantage point.
>>> There is still no outbound ping running and I would expect
>>> you'll get no or very sparse reply.
>>>
>>>
>>> Thus far only the two Pi3s suffer from connectivity problems; Pi2s and a Pi4 have
>>> no difficulty on the same address block. Is there a switch for tcpdump that will
>>> limit records to relevant traffic? Otherwise it's a flood.
>>>
>>> These results were obtained after standing idle overnight and
>>> are rather different (in ways I don't understand) from behavior
>>> immediately after reboot, I'll have to repeat as I learn more.
>>
>> I wonder if there is a notable difference between
>> monitoring traffic from 2 places:
>>
>> A) from the machine seeing the problem
>> vs.
>> B) from a machine not having problems but
>> connected where all the traffic would be
>> on the wire it is connected to.
>>
>> It may be that monitoring from both and
>> comparing/contrasting the reported traffic
>> from the two provides additional evidence.
>>
>> There may be modes of monitoring that are
>> relevant for this. But I'm not familiar
>> with any detail here.
>>
>>
>> For reference:
>>
>> # ping www.zefox.org
>> PING www.zefox.org (50.1.20.28): 56 data bytes
>> ^C
>> --- www.zefox.org ping statistics ---
>> 32 packets transmitted, 0 packets received, 100.0% packet loss
>>
>> I found the command traceroute and it reports:
>>
>> # traceroute www.zefox.org
>> traceroute to www.zefox.org (50.1.20.28), 64 hops max, 40 byte packets
>>  1  192.168.1.1 (192.168.1.1)  0.697 ms  0.486 ms  1.277 ms
>>  2  172.30.26.66 (172.30.26.66)  30.019 ms
>>     172.30.26.67 (172.30.26.67)  41.720 ms
>>     172.30.26.66 (172.30.26.66)  28.645 ms
>>  3  68.85.243.125 (68.85.243.125)  8.967 ms
>>     68.85.243.77 (68.85.243.77)  11.462 ms
>>     68.85.243.125 (68.85.243.125)  10.254 ms
>>  4  24.124.129.106 (24.124.129.106)  7.510 ms
>>     96.216.60.165 (96.216.60.165)  10.176 ms
>>     24.124.129.106 (24.124.129.106)  8.945 ms
>>  5  68.85.243.197 (68.85.243.197)  10.837 ms
>>     96.216.60.165 (96.216.60.165)  10.252 ms
>>     68.85.243.197 (68.85.243.197)  16.036 ms
>>  6  68.85.243.197 (68.85.243.197)  14.660 ms
>>     be-36211-cs01.seattle.wa.ibone.comcast.net (68.86.93.49)  14.629 ms
>>     68.85.243.197 (68.85.243.197)  8.849 ms
>>  7  be-2412-pe12.seattle.wa.ibone.comcast.net (96.110.34.142)  14.607 ms
>>     be-36221-cs02.seattle.wa.ibone.comcast.net (68.86.93.53)  14.122 ms
>>     be-2212-pe12.seattle.wa.ibone.comcast.net (96.110.34.134)  13.877 ms
>>  8  be-2412-pe12.seattle.wa.ibone.comcast.net (96.110.34.142)  14.133 ms *  13.663 ms
>>  9  be2075.ccr21.sfo01.atlas.cogentco.com (154.54.0.233)  30.176 ms *
>>     be3717.ccr22.sfo01.atlas.cogentco.com (154.54.86.209)  29.002 ms
>> 10  be3717.ccr22.sfo01.atlas.cogentco.com (154.54.86.209)  28.477 ms
>>     be2430.ccr31.sjc04.atlas.cogentco.com (154.54.88.186)  27.203 ms
>>     be2075.ccr21.sfo01.atlas.cogentco.com (154.54.0.233)  28.515 ms
>> 11  38.104.141.82 (38.104.141.82)  29.820 ms
>>     be2430.ccr31.sjc04.atlas.cogentco.com (154.54.88.186)  28.605 ms
>>     38.104.141.82 (38.104.141.82)  33.735 ms
>> 12  38.104.141.82 (38.104.141.82)  27.160 ms
>>     0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net (135.180.179.146)  32.336 ms
>>     38.104.141.82 (38.104.141.82)  31.867 ms
>> 13  0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  31.761 ms
>>     0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net (135.180.179.146)  29.864 ms
>>     0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  31.711 ms
>> 14  0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  30.373 ms
>>     gig1-1-1.gw.wscrca11.sonic.net (50.1.36.106)  35.567 ms
>>     0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  31.146 ms
>> 15  gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  31.513 ms
>>     gig1-1-1.gw.wscrca11.sonic.net (50.1.36.106)  31.203 ms
>>     gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  31.354 ms
>> 16  gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  30.125 ms *  31.996 ms
>> 17  * * *
>> 18  * * *
>> 19  * * *
>> 20  * * *
>> 21  * * *
>> 22  * * *
>> 23  * * *
>> 24  * * *
>> 25  * * *
>> 26  * * *
>> 27  * * *
>> 28  * * *
>> 29  * * *
>> 30  * * *
>> ^C
>>
>> (There did not seem to be much point in having it continue.)
>
> I found and built a port called net/mtr-nox11
> ("My traceroute") and tried it, letting it just
> run. The initial try eventually got a connection
> but reported a 99.2% packet loss as of when I
> captured the below:
>
> My traceroute [v0.95]
> amd64_ZFS (192.168.1.120) -> www.zefox.org (50.1.20.28)
> 2022-05-01T12:40:22-0700
> Keys: Help Display mode Restart statistics Order of fields quit
>                                                   Packets               Pings
>     Host                                        Loss%  Snt   Last    Avg   Best   Wrst  StDev
>  1. 192.168.1.1                                  0.0%  135    0.4    0.8    0.1    3.1    0.4
>  2. 172.30.26.66                                 0.0%  134   28.2   26.1    9.3  132.7   18.1
>  3. 68.85.243.77                                 0.0%  134    8.6    9.0    7.5   11.2    0.8
>  4. 24.124.129.106                               0.0%  134   10.2    9.1    7.6   13.4    0.9
>  5. 96.216.60.165                                0.0%  134    9.0    9.1    7.8   14.3    0.9
>  6. 68.85.243.197                                0.0%  134   14.4   13.6    9.2   44.3    5.4
>  7. be-36241-cs04.seattle.wa.ibone.comcast.net   0.0%  134   16.8   14.9   13.0   22.6    1.1
>  8. be-2412-pe12.seattle.wa.ibone.comcast.net    0.0%  134   13.5   15.0   12.8   46.4    3.2
>  9. (waiting for reply)
> 10. be2075.ccr21.sfo01.atlas.cogentco.com        0.0%  134   29.3   29.0   26.7   54.1    2.9
> 11. be2379.ccr31.sjc04.atlas.cogentco.com        0.0%  134   28.0   28.7   27.1   40.3    1.3
> 12. 38.104.141.82                                0.0%  134   28.0   33.8   26.6  114.8   16.5
> 13. 0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net       0.0%  134   30.9   31.0   29.0   33.7    0.8
> 14. 0.xe-0-0-0.cr1.scrmca13.sonic.net            0.0%  134   31.1   32.3   29.3   93.2    6.7
> 15. gig1-1-1.gw.wscrca11.sonic.net               0.0%  134   31.3   34.9   29.5  330.4   26.5
> 16. gig1-1-1.gw.davsca11.sonic.net               0.0%  134   32.8   32.1   29.9   44.1    1.7
> 17. (waiting for reply)
> 18. (waiting for reply)
> 19. www.zefox.org                               99.2%  134   74.9   74.9   74.9   74.9    0.0
>
> I stopped and restarted it and so far no connection
> -- waiting even longer than that first time: Snt
> is now over 600. Rows 18 and 19 have not shown up,
> the last is 17.
>
> . . . (some more time goes by) . . .
>
> I have now stopped it, avoiding the extra load on the
> machines and network.
>
> Looks like there is some problem getting past
> gig1-1-1.gw.davsca11.sonic.net .

Apologies in advance if I'm just making noise. But here's what I see
on a 10Gb network attempting the same traceroute(8):

# traceroute www.zefox.org
traceroute to www.zefox.org (50.1.20.28), 64 hops max, 40 byte packets
 1  static-24-113-41-1.wavecable.com (24.113.41.1)  19.918 ms  16.258 ms  13.852 ms
 2  174.127.183.72 (174.127.183.72)  18.036 ms  19.647 ms  18.428 ms
 3  be4.cr2-sea-b.bb.as11404.net (174.127.137.16)  16.318 ms  19.963 ms  22.306 ms
 4  be1.cr2-sea-a.bb.as11404.net (174.127.149.136)  19.391 ms  14.457 ms  15.808 ms
 5  sea-b2-link.ip.twelve99.net (62.115.49.138)  19.613 ms  22.770 ms  20.330 ms
 6  sjo-b23-link.ip.twelve99.net (62.115.118.169)  39.478 ms  32.428 ms  34.416 ms
 7  palo-b24-link.ip.twelve99.net (62.115.115.216)  70.207 ms  41.846 ms  37.838 ms
 8  sonicnet-ic350733-palo-b24.ip.twelve99-cust.net (62.115.181.227)  44.718 ms  33.959 ms  42.723 ms
 9  0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net (135.180.179.146)  41.699 ms  42.660 ms  114.578 ms
10  0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  47.851 ms  51.590 ms  41.286 ms
11  gig1-1-1.gw.wscrca11.sonic.net (50.1.36.106)  51.199 ms  39.567 ms  40.553 ms
12  gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  45.005 ms  44.096 ms  41.183 ms
13  * * *
14  * www.zefox.org (50.1.20.28)  62.422 ms *

A trip to sonic net indicates they brag on having better privacy than their
competition. Are they using any privacy extensions that may affect your
ability to ping(8) || traceroute(8) -- TCP/UDP/ICMP?
Or is it just that gig1-1-1.gw.davsca11.sonic.net's BGP is out of date (stale)?

HTH

--Chris

>
> ===
> Mark Millard
> marklmi at yahoo.com