Network stack returning EFBIG?
Markus Gebert
markus.gebert at hostpoint.ch
Thu Mar 20 15:22:39 UTC 2014
On 20.03.2014, at 14:51, wollman at bimajority.org wrote:
> In article <21290.60558.750106.630804 at hergotha.csail.mit.edu>, I wrote:
>
>> Since we put this server into production, random network system calls
>> have started failing with [EFBIG] or maybe sometimes [EIO]. I've
>> observed this with a simple ping, but various daemons also log the
>> errors:
>> Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too
>> large [preauth]
>> Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL
>> handshake. 5
>
> I found at least one call stack where this happens and it does get
> returned all the way to userspace:
>
> 17 15547 _bus_dmamap_load_buffer:return
> kernel`_bus_dmamap_load_mbuf_sg+0x5f
> kernel`bus_dmamap_load_mbuf_sg+0x38
> kernel`ixgbe_xmit+0xcf
> kernel`ixgbe_mq_start_locked+0x94
> kernel`ixgbe_mq_start+0x12a
> if_lagg.ko`lagg_transmit+0xc4
> kernel`ether_output_frame+0x33
> kernel`ether_output+0x4fe
> kernel`ip_output+0xd74
> kernel`tcp_output+0xfea
> kernel`tcp_usr_send+0x325
> kernel`sosend_generic+0x3f6
> kernel`soo_write+0x5e
> kernel`dofilewrite+0x85
> kernel`kern_writev+0x6c
> kernel`sys_write+0x64
> kernel`amd64_syscall+0x5ea
> kernel`0xffffffff808443c7
This looks pretty similar to what we’ve seen when we got EFBIG:
3 28502 _bus_dmamap_load_buffer:return
kernel`_bus_dmamap_load_mbuf_sg+0x5f
kernel`bus_dmamap_load_mbuf_sg+0x38
kernel`ixgbe_xmit+0xcf
kernel`ixgbe_mq_start_locked+0x94
kernel`ixgbe_mq_start+0x12a
kernel`ether_output_frame+0x33
kernel`ether_output+0x4fe
kernel`ip_output+0xd74
kernel`rip_output+0x229
kernel`sosend_generic+0x3f6
kernel`kern_sendit+0x1a3
kernel`sendit+0xdc
kernel`sys_sendto+0x4d
kernel`amd64_syscall+0x5ea
kernel`0xffffffff80d35667
In our case it looks like some of the ixgbe tx queues get stuck while others don’t. You can test whether your server shows the same symptoms with this command:
# for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.5 -c 2 -W 1 10.0.0.1 | grep sendto; done
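If you would rather get a list of the affected CPUs than read through the raw output, something along these lines might help. It is only a sketch: stuck_cpus is a made-up helper name, and it assumes ping reports the failure on stderr as a line containing "sendto:" (hence the 2>&1 and no grep in the commented usage loop).

```shell
# Hypothetical helper (not part of any FreeBSD tool): reads the loop's
# combined output on stdin and prints each CPU whose ping reported a
# sendto error, one CPU number per line.
stuck_cpus() {
    awk '
        # A "CPU<n>" label line starts the output for that CPU.
        /^CPU[0-9]+$/ { cpu = substr($0, 4); next }
        # Assumption: a failed send shows up as a line containing "sendto:".
        /sendto:/ && cpu != "" && !(cpu in seen) { seen[cpu] = 1; print cpu }
    '
}

# Usage (as root; adjust the CPU list and target address to your system):
#   for CPU in 0 1 2 3 4 5 6 7; do
#       echo "CPU${CPU}"
#       cpuset -l ${CPU} ping -i 0.5 -c 2 -W 1 10.0.0.1 2>&1
#   done | stuck_cpus
```

An empty result would mean none of the probed CPUs hit the error during the test. It does not prove the queues are healthy.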
We also use 82599EB-based ixgbe controllers on the affected systems.
Also see these two threads on freebsd-net:
http://lists.freebsd.org/pipermail/freebsd-net/2014-February/037967.html
http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038061.html
I started the second one; it has some more details about what we were seeing, in case you’re interested.
Then there is:
http://www.freebsd.org/cgi/query-pr.cgi?pr=183390
and:
https://bugs.freenas.org/issues/4560
Markus