svn commit: r222688 - head/sbin/hastd
Mikolaj Golub
trociny at freebsd.org
Fri Jun 10 16:43:53 UTC 2011
On Thu, 09 Jun 2011 11:31:31 -0700 Maxim Sobolev wrote:
MS> On 6/9/2011 6:10 AM, Mikolaj Golub wrote:
>> >>> Hmm, I am not sure what exactly is wrong. The sender does 3 writes to
>> >>> the TCP socket (32k, 32k and 1071 bytes), while the receiver does one
>> >>> recv(MSG_WAITALL) with a size of 66607. So I suspect the sender's kernel
>> >>> delivers the two 32k packets and fills up the receiver's buffer or
>> >>> something, and the remaining 1071 bytes stay somewhere in the sender's
>> >>> kernel indefinitely, while the receiver's recv() cannot complete. Using
>> >>> the same size when doing recv() solves the issue for me.
>>
>> With MSG_WAITALL, if the data to receive is larger than the receive buffer,
>> then after part of the data has been received it is drained to the user
>> buffer and the protocol is notified (by sending a window update) that there
>> is space in the receive buffer. So, normally, the scenario described above
>> should not be a problem. But there was a race in soreceive_generic(), which
>> I believe I fixed in r222454, where the connection could stall in sbwait().
>> Do you still observe the issue with only r222454 applied?
MS> The patch makes things slightly better, but it appears that there are
MS> still some "magic" buffer sizes that get stuck somewhere, particularly
MS> 66607 bytes in my case. You can probably reproduce the issue easily by
MS> creating a large disk with data of various kinds (e.g. FreeBSD UFS with
MS> source/object code), enabling compression and setting the block size
MS> to 128kb. Then, at least if you run this scenario over a WAN, it
MS> should get stuck from time to time when hitting that "magic" size. One
MS> could probably write a simple test case in C, with the server part
MS> sending 32k, 32k and 1071 bytes and the receiver reading the whole
MS> message with MSG_WAITALL. Unfortunately I am overloaded right now, so
MS> it's unlikely that I will do it.
I am observing the HAST connection getting stuck in my tests. I don't know if
this is the same case as yours, but investigating my case exposed two
problems:
1) Automatic receive buffer sizing does not work for half-open sockets (when
one direction has been shut down).
2) It looks like, with the combination of a small receive buffer and
MSG_WAITALL, it is possible to reach a state where the receive window, after
shrinking to 0, is never reopened. The receive window stays stuck at 0 and
pending data is sent only via TCP window probes (i.e. one byte every few
seconds).
Concerning (1): hastd sends data in one direction only, so the other
direction is closed with shutdown(2), and the receiving socket is in the
FIN_WAIT_2 state. As I see from the netstat output, the receive buffer
(R-HIWA) is 71680 and does not increase:
Proto Recv-Q Send-Q Local Address Foreign Address R-MBUF S-MBUF R-CLUS S-CLUS R-HIWA S-HIWA R-LOWA S-LOWA R-BCNT S-BCNT R-BMAX S-BMAX rexmt persist keep 2msl delack rcvtime
tcp4 58539 0 192.168.1.103.7772 192.168.1.103.61333 20 0 20 0 71680 43008 1 2048 82944 0 262144 262144 0.00 0.00 1799.42 0.00 0.00 0.58
tcp4 0 92160 192.168.1.103.61333 192.168.1.103.7772 0 27 0 24 0 92160 0 2048 0 103168 0 262144 0.00 4.52 1799.52 0.00 0.00 0.48
When I remove the socket shutdown from the HAST code, so that the sockets are
in the ESTABLISHED state, the receive buffer grows almost to R-BMAX:
Proto Recv-Q Send-Q Local Address Foreign Address R-MBUF S-MBUF R-CLUS S-CLUS R-HIWA S-HIWA R-LOWA S-LOWA R-BCNT S-BCNT R-BMAX S-BMAX rexmt persist keep 2msl delack rcvtime
tcp4 0 0 192.168.1.103.7772 192.168.1.103.18457 0 0 0 0 219136 43008 1 2048 0 0 262144 262144 0.00 0.00 1793.15 0.00 0.00 6.85
tcp4 0 0 192.168.1.103.18457 192.168.1.103.7772 0 0 0 0 71680 198656 1 2048 0 0 262144 262144 0.00 0.00 1793.25 0.00 0.00 6.75
and I can't reproduce the issue (2) in this case.
Maxim, could you please try the attached patch and see if it fixes the issue
for you? The patch removes the code that shuts down the unused direction
(effectively, it reverts r220271).
Concerning issue (2), below are the results of a preliminary
investigation :-). When the connection stalls, the tcpdump output looks like
this:
09:56:53.600274 IP 192.168.1.103.51645 > 192.168.1.103.7772: Flags [.], seq 515310314:515310315, ack 1135313145, win 8896, options [nop,nop,TS val 461359 ecr 1842953560], length 1
09:56:53.600323 IP 192.168.1.103.7772 > 192.168.1.103.51645: Flags [.], ack 1, win 0, options [nop,nop,TS val 1842954060 ecr 461359], length 0
09:56:58.600265 IP 192.168.1.103.51645 > 192.168.1.103.7772: Flags [.], seq 1:2, ack 1, win 8896, options [nop,nop,TS val 461859 ecr 1842954060], length 1
09:56:58.600322 IP 192.168.1.103.7772 > 192.168.1.103.51645: Flags [.], ack 2, win 0, options [nop,nop,TS val 1842954560 ecr 461859], length 0
...
kgdb shows that the receiving thread is blocked in
soreceive_generic() -> sbwait():
/*
* If we have less data than requested, block awaiting more (subject
* to any timeout) if:
* 1. the current count is less than the low water mark, or
* 2. MSG_WAITALL is set, and it is possible to do the entire
* receive operation at once if we block (resid <= hiwat).
* 3. MSG_DONTWAIT is not set
* If MSG_WAITALL is set but resid is larger than the receive buffer,
* we have to do the receive in sections, and thus risk returning a
* short count if a timeout or signal occurs after we start.
*/
if (m == NULL || (((flags & MSG_DONTWAIT) == 0 &&
so->so_rcv.sb_cc < uio->uio_resid) &&
(so->so_rcv.sb_cc < so->so_rcv.sb_lowat ||
((flags & MSG_WAITALL) && uio->uio_resid <= so->so_rcv.sb_hiwat)) &&
m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) {
...
error = sbwait(&so->so_rcv);
And here are some parameters from gdb:
uio:
uio_resid = 65536
so->so_rcv:
sb_cc = 63318,
sb_hiwat = 71680,
sb_mbcnt = 89600,
sb_mcnt = 22,
sb_ccnt = 21,
sb_mbmax = 262144,
sb_lowat = 1,
so->so_pcb->inp_ppcb:
t_state = 9,
t_flags = 525300,
rcv_nxt = 515310391,
rcv_adv = 515310391,
rcv_wnd = 8264,
rcv_up = 515310390,
t_maxseg = 14336,
The request (uio_resid) is 65536 bytes, of which 63318 bytes (sb_cc) have
been received; MSG_WAITALL is set and uio_resid is less than the buffer size
(sb_hiwat), so the thread is in sbwait() waiting for the rest of the data
(2218 bytes).
I suppose that although there is space in the buffer (according to sbspace(),
recwin is sb_hiwat - sb_cc = 8362), this value is too small to open the
window:
tcp_output():
/*
* Calculate receive window. Don't shrink window,
* but avoid silly window syndrome.
*/
if (recwin < (long)(so->so_rcv.sb_hiwat / 4) &&
recwin < (long)tp->t_maxseg)
recwin = 0;
So (if I have not missed something) we have a situation where the receive
buffer is almost full but still has enough space to satisfy the MSG_WAITALL
request without draining data to the user buffer, so soreceive() waits for
more data; but the sender cannot send it because the window is 0, and the
window is not going to be increased until the data is drained to the user
buffer...
I am going to check this by writing a test case and will send the results to
net@.
--
Mikolaj Golub
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hastd.no_shutdown.patch
Type: text/x-patch
Size: 1573 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/svn-src-all/attachments/20110610/6f27697f/hastd.no_shutdown.bin