Network stack returning EFBIG?
Markus Gebert
markus.gebert at hostpoint.ch
Fri Mar 21 10:33:12 UTC 2014
On 21.03.2014, at 03:45, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> Markus Gebert wrote:
>>
>> On 20.03.2014, at 14:51, wollman at bimajority.org wrote:
>>
>>> In article <21290.60558.750106.630804 at hergotha.csail.mit.edu>, I
>>> wrote:
>>>
>>>> Since we put this server into production, random network system calls
>>>> have started failing with [EFBIG] or maybe sometimes [EIO]. I've
>>>> observed this with a simple ping, but various daemons also log the
>>>> errors:
>>>> Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too large [preauth]
>>>> Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL handshake. 5
>>>
>>> I found at least one call stack where this happens and it does get
>>> returned all the way to userspace:
>>>
>>> 17 15547 _bus_dmamap_load_buffer:return
>>> kernel`_bus_dmamap_load_mbuf_sg+0x5f
>>> kernel`bus_dmamap_load_mbuf_sg+0x38
>>> kernel`ixgbe_xmit+0xcf
>>> kernel`ixgbe_mq_start_locked+0x94
>>> kernel`ixgbe_mq_start+0x12a
>>> if_lagg.ko`lagg_transmit+0xc4
>>> kernel`ether_output_frame+0x33
>>> kernel`ether_output+0x4fe
>>> kernel`ip_output+0xd74
>>> kernel`tcp_output+0xfea
>>> kernel`tcp_usr_send+0x325
>>> kernel`sosend_generic+0x3f6
>>> kernel`soo_write+0x5e
>>> kernel`dofilewrite+0x85
>>> kernel`kern_writev+0x6c
>>> kernel`sys_write+0x64
>>> kernel`amd64_syscall+0x5ea
>>> kernel`0xffffffff808443c7
>>
>> This looks pretty similar to what we’ve seen when we got EFBIG:
>>
>> 3 28502 _bus_dmamap_load_buffer:return
>> kernel`_bus_dmamap_load_mbuf_sg+0x5f
>> kernel`bus_dmamap_load_mbuf_sg+0x38
>> kernel`ixgbe_xmit+0xcf
>> kernel`ixgbe_mq_start_locked+0x94
>> kernel`ixgbe_mq_start+0x12a
>> kernel`ether_output_frame+0x33
>> kernel`ether_output+0x4fe
>> kernel`ip_output+0xd74
>> kernel`rip_output+0x229
>> kernel`sosend_generic+0x3f6
>> kernel`kern_sendit+0x1a3
>> kernel`sendit+0xdc
>> kernel`sys_sendto+0x4d
>> kernel`amd64_syscall+0x5ea
>> kernel`0xffffffff80d35667
>>
>> In our case it looks like some of the ixgbe tx queues get stuck, and
>> some don’t. You can test whether your server shows the same symptoms
>> with this command:
>>
>> # for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.5 -c 2 -W 1 10.0.0.1 | grep sendto; done
>>
>> We also use 82599EB based ixgbe controllers on affected systems.
>>
>> Also see these two threads on freebsd-net:
>>
>> http://lists.freebsd.org/pipermail/freebsd-net/2014-February/037967.html
>> http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038061.html
>>
>> I have started the second one, and there are some more details of
>> what we were seeing in case you’re interested.
>>
>> Then there is:
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=183390
>> and:
>> https://bugs.freenas.org/issues/4560
>>
> Well, the "before" printf() from my patch is indicating a packet > 65535
> and that will definitely result in an EFBIG. (There is no way that m_defrag()
> can squeeze > 64K into 32 MCLBYTES mbufs.)
Makes sense.
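Just to spell out for myself how that error makes it back to the sender: as far as I understand the driver, the transmit path retries roughly like the sketch below. This is only my illustration, not the real ixgbe_xmit() code; txr->txtag, map, m_head, segs and nsegs are placeholders for the driver's ring, DMA map and scatter/gather state.

	/*
	 * Sketch only -- placeholder names, not the actual ixgbe_xmit()
	 * code.  The driver's DMA tag allows at most 32 scatter/gather
	 * segments; after m_defrag() every mbuf holds at most MCLBYTES
	 * (2048) bytes, so anything larger than 32 * 2048 = 65536 bytes
	 * still needs more than 32 segments and the second load fails
	 * with EFBIG as well.
	 */
	error = bus_dmamap_load_mbuf_sg(txr->txtag, map, m_head, segs,
	    &nsegs, BUS_DMA_NOWAIT);
	if (error == EFBIG) {
		struct mbuf *m = m_defrag(m_head, M_NOWAIT);

		if (m == NULL) {
			m_freem(m_head);
			return (ENOBUFS);
		}
		m_head = m;
		error = bus_dmamap_load_mbuf_sg(txr->txtag, map, m_head,
		    segs, &nsegs, BUS_DMA_NOWAIT);
	}
	if (error != 0) {
		m_freem(m_head);
		return (error);		/* this EFBIG is what the sender sees */
	}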
> Note that the EFBIG will be returned by the call that dequeues this packet
> and tries to transmit it (not necessarily the one that generated/queued the
> packet). This was pointed out by Ryan in a previous discussion of this.
I remember that email, and it also explains why a ping could fail when it happens to land on the same queue. On the other hand, would it explain why every single ping on certain queues starts to fail, while other queues are unaffected? Of course it could be that whatever triggers the problem resends the huge segment immediately over the same TCP connection, blocking one queue for some time by repeating this quickly enough to kill every single ping packet. However, that sounds unlikely to me.

Also, once we saw the problem, I unmounted all NFS shares and thereby eliminated all sources of huge packets, yet the problem persisted. So, in my opinion, there must be more to it than just an oversized packet once in a while.
> The code snippet from sys/netinet/tcp_output.c looks pretty straightforward:
>     /*
> 772  * Limit a burst to t_tsomax minus IP,
> 773  * TCP and options length to keep ip->ip_len
> 774  * from overflowing or exceeding the maximum
> 775  * length allowed by the network interface.
> 776  */
> 777 if (len > tp->t_tsomax - hdrlen) {
> 778         len = tp->t_tsomax - hdrlen;
> 779         sendalot = 1;
> 780 }
> If it is a TSO segment of > 65535, at a glance it would seem that this "if"
> is busted. Just to see, you could try replacing line# 777-778 with
>     if (len > IP_MAXPACKET - hdrlen) {
>             len = IP_MAXPACKET - hdrlen;
> which was what it was in 9.1. (Maybe t_tsomax isn't set correctly or somehow
> screws up the calculation?)
I cannot answer your question, but this is an interesting catch. I’ll get this and your printfs into our 9.2 kernel as soon as I can.
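Concretely, the change I intend to test is just your replacement dropped into the block quoted above; a sketch, assuming our 9.2 sources still match that snippet:

	/*
	 * Experiment suggested by Rick: clamp the burst to IP_MAXPACKET
	 * the way 9.1 did, instead of relying on tp->t_tsomax, to see
	 * whether the > 65535 byte TSO segments go away.
	 */
	if (len > IP_MAXPACKET - hdrlen) {
		len = IP_MAXPACKET - hdrlen;
		sendalot = 1;
	}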
Markus
> rick
>
>>
>> Markus