Network stack returning EFBIG?
Daniel Braniss
danny at cs.huji.ac.il
Thu Mar 20 13:32:06 UTC 2014
turn off TSO
the problem sounds similar to the one I reported a while back. turning off TSO fixed it.
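
For reference, a minimal sketch of what turning off TSO could look like on the machine described below. It assumes the two ixgbe ports named in the quoted dmesg output (ix0 and ix1) are the lagg members. The helper names are hypothetical, made up for illustration. The only system command used is FreeBSD's `ifconfig <interface> -tso`, which clears the TSO capability on an interface. The script only prints the commands unless dry_run is set to False, and running them for real needs root on the FreeBSD host.

```python
import subprocess


def tso_off_commands(interfaces):
    """Build one 'ifconfig <if> -tso' invocation per interface (hypothetical helper)."""
    return [["ifconfig", ifname, "-tso"] for ifname in interfaces]


def disable_tso(interfaces, dry_run=True):
    """Print the commands, or run them when dry_run is False (hypothetical helper)."""
    commands = tso_off_commands(interfaces)
    for cmd in commands:
        if dry_run:
            # Show what would be executed without touching the system.
            print(" ".join(cmd))
        else:
            # Raises CalledProcessError if ifconfig exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Assumption: ix0 and ix1 are the interfaces behind the lagg.
    disable_tso(["ix0", "ix1"])
```

A change made this way lasts only until the next reboot. To keep it, the same flag would have to go into the interface configuration in rc.conf.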
danny
On Mar 20, 2014, at 3:26 PM, Garrett Wollman <wollman at bimajority.org> wrote:
> I recently put a new server running 9.2 (with local patches for NFS)
> into production, and it's immediately started to fail in an odd way.
> Since I pounded this server pretty heavily and never saw the error in
> testing, I'm more than a little bit taken aback. We have identical
> hardware in production with 9.1, and I have the same kernel running
> just peachy on a machine with Chelsio T4 NICs. The problem machine has
> ixgbe(4):
>
> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port 0x9c00-0x9c1f mem 0xdef80000-0xdeffffff,0xdef7c000-0xdef7ffff irq 24 at device 0.0 on pci2
> ix0: Using MSIX interrupts with 7 vectors
> ix0: Ethernet address: 04:7d:7b:a5:87:32
> ix0: PCI Express Bus: Speed 5.0GT/s Width x4
> ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port 0x9880-0x989f mem 0xdee80000-0xdeefffff,0xdee7c000-0xdee7ffff irq 34 at device 0.1 on pci2
> ix1: Using MSIX interrupts with 7 vectors
> ix1: Ethernet address: 04:7d:7b:a5:87:33
> ix1: PCI Express Bus: Speed 5.0GT/s Width x4
>
> (pciconf tells me these are "82599EB 10-Gigabit SFI/SFP+ Network
> Connection". It's a bug that the driver doesn't tell me that.)
>
> These are glued together in a lagg(4) using LACP.
>
> Since we put this server into production, random network system calls
> have started failing with [EFBIG] or maybe sometimes [EIO]. I've
> observed this with a simple ping, but various daemons also log the
> errors:
> Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too large [preauth]
> Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL handshake. 5
>
> The machine eventually becomes unreachable and has to be rebooted from
> the console.
>
> So, can anyone tell me how this is possible, and what changed between
> 9.1 and 9.2 to cause it?
>
> -GAWollman
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"