(ipfw) Re: HELP! fetch: stuck forever OR error: RPC failed: curl 56 recv failure: Operation timed out

From: Ronald Klop <ronald_at_FreeBSD.org>
Date: Sun, 08 Dec 2024 19:30:36 UTC
Hi,

I can reproduce your error.

Today I updated my RPI4 from a build of Oct 23 to one of Dec 6, and the problem shows up there as well.
After about 2 hours scp exits with:
client_loop: send disconnect: Broken pipe
scp: Connection closed

Working:
FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #4 main-d2e7bb630b8-dirty: Wed Oct 23 00:55:12 CEST 2024     ronald@rpi4:/data/ronald/freebsd/obj/data/ronald/freebsd/src/main/arm64.aarch64/sys/GENERIC-NODEBUG arm64

Broken:
FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #5 main-839fb85336a-dirty: Sat Dec  7 22:33:27 CET 2024     ronald@rpi4:/data/ronald/freebsd/obj/data/ronald/freebsd/src/main/arm64.aarch64/sys/GENERIC-NODEBUG arm64
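
The two kernel identifiers above bracket the regression between commits d2e7bb630b8 (working) and 839fb85336a (broken). A minimal sketch of how that range could be pulled out of the uname strings and handed to git bisect, assuming a src checkout is available. The helper name kernel_rev is hypothetical, and limiting the search to sys/netpfil/ipfw is only an assumption that the change is in the ipfw code; both builds are also marked -dirty, so local changes may play a role.

```shell
#!/bin/sh
# kernel_rev: hypothetical helper that extracts the abbreviated git hash from a
# "uname -a" style string, e.g. "... #5 main-839fb85336a-dirty: Sat Dec  7 ...".
# It also accepts the "main-n273978-b5a8abe9502e:" form with a commit count.
kernel_rev() {
    printf '%s\n' "$1" |
        sed -n 's/.* main-\(n[0-9]*-\)\{0,1\}\([0-9a-f]\{7,\}\)\(-dirty\)\{0,1\}:.*/\2/p'
}

good=$(kernel_rev "FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #4 main-d2e7bb630b8-dirty: Wed Oct 23 00:55:12 CEST 2024")
bad=$(kernel_rev "FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #5 main-839fb85336a-dirty: Sat Dec  7 22:33:27 CET 2024")

# Only print the bisect command; run it by hand inside the src checkout.
# Assumption: the path filter narrows the search to ipfw. Drop "-- sys/netpfil/ipfw"
# to search the whole tree instead.
echo "git bisect start $bad $good -- sys/netpfil/ipfw"
```

The script only prints the command, so it is safe to run anywhere; each bisect step would still need a kernel build and a test with the scp job.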

A cron job that runs scp to another server stopped working. When I boot back into the previous boot environment (BE) it works fine again.
Running "ipfw disable firewall" also makes the scp work.

Scp also seems to work fine if I replace the stateful firewall rules with stateless "pass all from any to any".
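
To make the comparison concrete, here is a minimal sketch of the two rule styles, not my actual ruleset: the rule numbers are arbitrary, the function names are hypothetical, and the stock "workstation" rules from rc.firewall differ in detail. By default it only prints the ipfw commands.

```shell
#!/bin/sh
# Dry run by default: prints the ipfw commands instead of executing them.
# Set IPFW=ipfw (as root, on a test box) to apply them for real.
IPFW=${IPFW:-echo ipfw}

# Stateful variant: "keep-state" creates a dynamic rule per connection and
# "check-state" matches later packets against those dynamic rules.
stateful_rules() {
    $IPFW add 100 check-state
    $IPFW add 200 allow tcp from me to any setup keep-state
    $IPFW add 300 allow tcp from any to me 22 setup keep-state
}

# Stateless variant used for the test: no dynamic rules are involved.
stateless_rules() {
    $IPFW add 100 pass all from any to any
}

case "${1:-stateful}" in
    stateful)  stateful_rules ;;
    stateless) stateless_rules ;;
esac
```

With the stateful variant loaded, "ipfw -d list" also shows the dynamic rules, which may help to see whether states disappear while a long transfer is still running.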

Regards,
Ronald.



On 06-12-2024 at 21:09, FreeBSD User wrote:
> On Fri, 6 Dec 2024 19:40:02 +0100 (CET)
> Ronald Klop <ronald-lists@klop.ws> wrote:
> 
>> Might be useful to share your ipfw config.
> 
> Sorry, my posting must have been confusing (having in mind a "deny any" rule and then
> disabling the FW ...).
> 
> Well, the IPFW setup itself is explained quickly - I use almost the vanilla rc.conf-issued
> IPFW (settings: firewall_type="workstation", firewall_logif="YES",
> firewall_myservices="22/tcp", firewall_allowservices="any"). The hosts in question have the
> following kernel configuration, I provide the option tags that might be of interest or, if
> not, just for the record, as they are not part of GENERIC, see below.
> 
> Also, I'll provide some sysctl setting performed via /etc/sysctl.conf.local, see below.
> 
> The configuration and settings have been mostly unchanged for a couple of months now and
> have not caused trouble so far.
> 
> As far as time and my limited skills allowed, I disabled and re-enabled the MAC_ and
> NETGRAPH_ options piece by piece - without any success so far. My "measurement" is fetching
> emails via claws-mail (all TLS). claws-mail reports "corrupted/broken stream", has
> authentication issues and is de facto unusable - it doesn't refresh IMAP based email fetches
> and doesn't even quit without a hard kill.
> Another "indicator" is the time taken to "git pull" of ZFS filesystems: cloning and pulling
> takes unusually long (/usr/src is UFS/FFS, /usr/ports on a ZFS pool and since the problem
> occurred, it makes a mutual difference).
> 
> While git pull or clone is stuck and claws-mail is endlessly fetching/authenticating
> emails and never responding in a usable manner, performing
> 
> "ipfw disable firewall"
> 
> suddenly makes the system work again as usual and expected.
> 
> As reported - the problem spreads across all of my CURRENT hosts as I'm going to update them
> towards a recent CURRENT (they all share similar static kernel configs as described here). Most
> of the boxes do not show the weird reluctant behaviour when pulling via git, but weren't
> capable of cloning, bailing out with the timeout reported earlier.
> 
> I use only one CURRENT box as my personal desktop, so none of the other (server) CURRENT
> boxes shows the email problem in the detail described.
> 
> And, for the record: I haven't commented out the "options     IPFIREWALL" yet in the kernel
> config ...
> 
> 
> Kind regards
> 
> oh
> 
> [ KERNEL config different from vanilla GENERIC ]
> 
> options     RATELIMIT
> options     ZFS
> options     TCPHPTS
> options     MROUTING
> options     IPSEC
> options     SCTP
> 
> options     MAC_BSDEXTENDED
> options     MAC_PORTACL
> options     MAC_IPACL
> options     MAC_NTPD
> #options     MAC_DO
>   
> options     NETGRAPH
> options     NETGRAPH_IPFW
> options     NETGRAPH_ETHER
> options     NETGRAPH_EIFACE
> options     NETGRAPH_VLAN
> #options        NETGRAPH_NAT
> options     NETGRAPH_DEVICE
> #options        NETGRAPH_PPPOE
> options     NETGRAPH_SOCKET
> options     NETGRAPH_KSOCKET
> options     NETGRAPH_NETFLOW
> #options        NETGRAPH_CAR
> 
> # IPFW firewall
> options     IPFIREWALL
> options     IPFIREWALL_VERBOSE
> options     DUMMYNET        # traffic shaper
> 
> options     BPF_JITTER  # adds support for BPF just-in-time compiler.
> 
> # Pseudo devices not in GENERIC.
> device      enc     # IPsec device
> device      stf     # 6to4 IPv6 over IPv4 encapsulation
> device      carp    # Common address redundancy protocol
> device      lagg    # Link aggregation
> device      gre     # GRE Tunnel
> device      epair   # A pair of virtual back-to-back connected Ethernet interfaces
> device      if_bridge   # bridge device
> device      vxlan   # Virtual eXtensible LAN interface
> 
> 
> For the MAC_ modules: the corresponding sysctl OIDs are disabled wherever an unconfigured
> MAC module would otherwise influence the default behaviour, for instance
> (/etc/sysctl.conf.local):
> 
> [ /etc/sysctl.conf.local ]
> security.mac.bsdextended.enabled=0
> security.mac.mls.enabled=0
> security.mac.portacl.enabled=0
> security.mac.do.enabled=0
> security.mac.ipacl.ipv6=0
> security.mac.ipacl.ipv4=0
> #
> net.bpf.optimize_writers=1
> #
> net.inet.ip.fw.verbose=1
> #net.inet.ip.fw.verbose_limit=10
> net.inet.ip.fw.dyn_keep_states=1
> 
> 
> 
> 
> 
>>
>> From: FreeBSD User <freebsd@walstatt-de.de>
>> Date: 6 December 2024 03:47
>> To: freebsd-current@freebsd.org, freebsd-ipfw@freebsd.org
>> Subject: Re: HELP! fetch: stuck forever OR error: RPC failed: curl 56 recv failure:
>> Operation timed out
>>
>>>
>>>
>>> On Thu, 5 Dec 2024 17:33:54 +0100
>>> FreeBSD User  wrote:
>>>
>>> I found the culprit!
>>>
>>> Disabling IPFW ("ipfw disable firewall") turns the system back to normal!
>>>
>>> For the record: since approx. Nov. 30 or December 1st, recent CURRENT seems
>>> to corrupt network connections.
>>>
>>> IPFW is compiled statically into the kernel.
>>>
>>> The problem sketched below can be reproduced in a more or less obvious manner on recent
>>> CURRENT: git pull/git clone of a regular FreeBSD source repo or ports via git+https either
>>> takes a long time (up to several minutes to initiate the pull) - or, in some worse
>>> cases here, the box runs into
>>> error: RPC failed; curl 56 Recv failure: Operation timed out
>>>
>>> claws-mail complains about "corrupted/broken stream", fetching emails takes Aeons -
>>> forever, the client does not come back even after several hours.
>>>    
>>>> On Thu, 5 Dec 2024 16:55:00 +0100
>>>> Daniel Tameling  wrote:
>>>>    
>>>>> On Thu, Dec 05, 2024 at 11:51:03AM +0100, FreeBSD User wrote:
>>>>>> On Wed, 04 Dec 2024 17:20:39 +0000
>>>>>> "Dave Cottlehuber"  wrote:
>>>>>>
>>>>>> Thank you very much for responding!
>>>>>>        
>>>>>>> On Tue, 3 Dec 2024, at 19:46, FreeBSD User wrote:
>>>>>>>> On most recent CURRENT (on some boxes of ours, not all) fetch/git seem
>>>>>>>> to be stuck
>>>>>>>> forever fetching tarballs from ports, fetching Emails via claws-mail
>>>>>>>> (TLS), opening
>>>>>>>> websites via librewolf and firefox or pulling repositories via git.
>>>>>>>>
>>>>>>>> CURRENT: FreeBSD 15.0-CURRENT #1 main-n273978-b5a8abe9502e: Mon Dec  2
>>>>>>>> 23:11:07 CET 2024
>>>>>>>> amd64
>>>>>>>>
>>>>>>>> When performing "git pull" under /usr/ports, I received after roughly 5-7 minutes:
>>>>>>>>
>>>>>>>> error: RPC failed: curl 56 recv failure: Operation timed out
>>>>>>>
>>>>>>> Generally it would be worth seeing if the HTTP(S) layers are doing the right thing
>>>>>>> or not, and then working down from there, to tcpdump / wireshark and then if
>>>>>>> necessary into kernel itself.
>>>>>>
>>>>>> My skills are limited when it comes to packet analysis with tcpdump/wireshark (and
>>>>>> theory, of course). I tried it because I had "a feeling" that the older Intel based NIC
>>>>>> I use could have some checksum issues like in the past (I saw e1000 driver updates
>>>>>> recently flowing into FreeBSD CURRENT).
>>>>>>>
>>>>>>> If fetch fails reliably in ports distfile fetching, then isolate a suitable
>>>>>>> tarball, and try it again in curl, with tcpdump already prepared to capture
>>>>>>> traffic to the remote host.
>>>>>>>
>>>>>>> tcpdump -w /tmp/curl.pcap -i ... host ...
>>>>>>>
>>>>>>> env SSLKEYLOGFILE=/tmp/ssl.keys curl -vsSLo /dev/null --trace /tmp/curl.log https://what.ev/er
>>>>>>>
>>>>>>> I would guess that between the two something useful should pop up.
>>>>>>>
>>>>>>> I like opening the pcap in wireshark, it often has angry red and black highlighted
>>>>>>> lines already giving me a hint.
>>>>>>>
>>>>>>> The SSLKEYLOGFILE can be imported into wireshark, and allows decrypting the TLS
>>>>>>> traffic as well in case there are issues further in. Very handy,
>>>>>>> see https://everything.curl.dev/usingcurl/tls/sslkeylogfile.html for how to do that.
>>>>>>>
>>>>>>> If your issues only occur with git pull, it's also curl inside and supports similar
>>>>>>> debugging. Ferreting
>>>>>>> through https://stackoverflow.com/questions/6178401/how-can-i-debug-git-git-shell-related-problems/56094711#56094711 should get you similar info.
>>>>>>>
>>>>>>> A+
>>>>>>> Dave
>>>>>>>        
>>>>>>
>>>>>> Thanks for the hints and precious tips! I'll dig deeper into the matter.
>>>>>>
>>>>>> In the meantime, I updated some other machines, which had been running an approx.
>>>>>> two-week-old CURRENT, to the most recent one - and face similar but not
>>>>>> identical problems!
>>>>>> Updating existing FreeBSD repositories, like src.git and ports.git, shows no problems
>>>>>> except that it takes longer than expected.
>>>>>> Cloning a repo is impossible; after 10 or 15 minutes I receive a timeout.
>>>>>>
>>>>>> On a CURRENT box that was recently updated and worked flawlessly before (CURRENT now:
>>>>>> FreeBSD 15.0-CURRENT #5 main-n274014-b2bde8a6d39: Wed Dec  4 22:22:22 CET 2024 amd64),
>>>>>> cloning attempts for 14.2-RELENG end up in this mess:
>>>>>>
>>>>>> # git clone --branch releng/14.2 https://git.freebsd.org/src.git 14.2-RELENG/src/
>>>>>> Cloning into '14.2-RELENG/src'...
>>>>>> error: RPC failed; curl 56 Recv failure: Operation timed out
>>>>>> fatal: expected 'packfile'
>>>>>>
>>>>>> This is nasty. The host now in question has an i350 based dual-port NIC - the host's
>>>>>> kernel is very similar to that of the box on which I first reported the issue; both
>>>>>> have customized kernels (in most cases, I compile several modules like ZFS and
>>>>>> several NETGRAPH modules statically into the kernel - a habit inherited from a small
>>>>>> FBSD project I configured (I wouldn't say developed) which does not allow loadable
>>>>>> kernel modules due to regulations).
>>>>>>
>>>>>> I hoped others would stumble over this tripwire in recent CURRENT sources, since the
>>>>>> phenomenon, spread over a bunch of CURRENT boxes with different OS states,
>>>>>> seemingly shows different behaviour on each.
>>>>>>
>>>>>> And for the record: I also build my ports via poudriere and mostly via make. I also
>>>>>> rebuilt all packages in a two-day marathon via "make -f" - for librewolf, curl and
>>>>>> so on - to ensure having the latest sources/packages.
>>>>>>
>>>>>> (I repeat myself here again, sorry, it's for the record).
>>>>>>
>>>>>> Will report in on further development and "investigations"
>>>>>>
>>>>>> Kind regards and thanks,
>>>>>>
>>>>>> oh
>>>>>>
>>>>>>        
>>>>>
>>>>> This is a shot in the dark, but is this a virtual machine? VirtualBox 7.1.0 had some
>>>>> networking issues that got fixed later.
>>>>
>>>> No, pure Hardware and FreeBSD ...
>>>>    
>>>>>
>>>>> Otherwise I would start with ping and traceroute to figure out if they show this issue
>>>>> and where it occurs.
>>>>>      
>>>>
>>>>    
>>>
>>>
>>>
>>> -- 
>>> O. Hartmann
>>>
>>>
>>>
>>>
>>>    
> 
> 
>