read() returns ETIMEDOUT on steady TCP connection

Andre Oppermann andre at freebsd.org
Tue Apr 22 23:47:13 UTC 2008


Andre Oppermann wrote:
> Mark Hills wrote:
>> On Mon, 21 Apr 2008, Andre Oppermann wrote:
>>
>>> Mark Hills wrote:
>>>> On Sun, 20 Apr 2008, Peter Jeremy wrote:
>>
>>>>> I can't explain the problem but it definitely looks like a resource
>>>>> starvation issue within the kernel.
>>>>
>>>> I've traced the source of the ETIMEDOUT within the kernel to 
>>>> tcp_timer_rexmt() in tcp_timer.c:
>>>>
>>>>   if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
>>>>           tp->t_rxtshift = TCP_MAXRXTSHIFT;
>>>>           tcpstat.tcps_timeoutdrop++;
>>>>           tp = tcp_drop(tp, tp->t_softerror ?
>>>>                         tp->t_softerror : ETIMEDOUT);
>>>>           goto out;
>>>>   }
>>>
>>> Yes, this is related to either a lack of mbufs to create a segment
>>> or a problem in sending it.  That may be a full interface queue, a
>>> bandwidth manager (dummynet) or some firewall internally rejecting
>>> the segment (ipfw, pf).  Do you run any firewall in stateful mode?
>>
>> There's no firewall running.
>>
>>>> I'm new to FreeBSD, but it seems to imply that it's reaching a
>>>> limit on the number of retransmits when sending ACKs on the TCP
>>>> connection receiving the inbound data? But I checked this using
>>>> tcpdump on the server and could see no retransmissions.
>>>
>>> When you have internal problems the segment never makes it to the
>>> wire and thus you won't see it in tcpdump.
>>>
>>> Please report the output of 'netstat -s -p tcp' and 'netstat -m'.
>>
>> Posted below. You can see it in there: "131 connections dropped by
>> rexmit timeout"
>>
>>>> As a test, I ran a simulation with the necessary changes to increase
>>>> TCP_MAXRXTSHIFT (including adding appropriate entries to
>>>> tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to
>>>> reduce the frequency of the problem occurring, but not to a usable
>>>> level.
>>>
>>> Possible causes are timers that fire too early, resource starvation
>>> (you are doing a lot of traffic), or of course some bug in the code.
>>
>> As I said in my original email, the data transfer doesn't stop or 
>> splutter, it's simply cut mid-flow. Sounds like something happening 
>> prematurely.
>>
>> Thanks for the help,
> 
> The output doesn't show any obvious problems.  I have to write some
> debug code to run on your system.  I'll do that later today if time
> permits.  Otherwise tomorrow.

  http://people.freebsd.org/~andre/tcp_output-error-log.diff

Please apply this patch, enable the sysctl net.inet.tcp.log_debug=1,
and report any output.  You will likely get some (normal) noise from
syncache.  What we are looking for are reports from tcp_output.
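
For reference, the drop condition quoted above from tcp_timer_rexmt() can
be modelled in user space.  The following is only a sketch, not the kernel
code: struct conn_model and both helper functions are hypothetical names
made up for illustration, and the model assumes no ACK ever arrives to
reset the shift counter.  TCP_MAXRXTSHIFT is 12 in FreeBSD's tcp_timer.h.

```c
#include <errno.h>

#define TCP_MAXRXTSHIFT 12      /* value from FreeBSD's tcp_timer.h */

/* Hypothetical stand-in for the few tcpcb fields the check uses. */
struct conn_model {
    int t_rxtshift;     /* consecutive retransmit timer expiries */
    int t_softerror;    /* last soft error seen, 0 if none */
    int dropped_errno;  /* 0 while alive, else the errno handed to read() */
};

/*
 * Model one expiry of the retransmit timer.  Returns 1 when the
 * connection is dropped, mirroring the quoted kernel check.
 */
static int rexmt_timer_fires(struct conn_model *c)
{
    if (++c->t_rxtshift > TCP_MAXRXTSHIFT) {
        c->t_rxtshift = TCP_MAXRXTSHIFT;
        /* A pending soft error takes precedence over ETIMEDOUT. */
        c->dropped_errno = c->t_softerror ? c->t_softerror : ETIMEDOUT;
        return 1;
    }
    return 0;
}

/* Count timer expiries until the drop, assuming nothing is ever ACKed. */
static int expiries_until_drop(struct conn_model *c)
{
    int n = 0;

    do {
        n++;
    } while (!rexmt_timer_fires(c));
    return n;
}
```

Starting from t_rxtshift == 0, the model drops the connection on the
13th consecutive expiry.  If output fails internally on every attempt,
that count can be reached without a single retransmission showing up in
tcpdump.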

-- 
Andre

