Terrible NFS performance under 9.2-RELEASE?
Rick Macklem
rmacklem at uoguelph.ca
Tue Jan 21 02:01:14 UTC 2014
Since this is getting long-winded, I'm going to "cheat" and top post.
(Don't-top-post flame suit on ;-)
You could try setting
net.inet.tcp.delayed_ack=0
via sysctl.
I just looked and it appears that TCP delays ACKs for a while, even
when TCP_NODELAY is set (I didn't know that). I honestly don't know
how much/if any effect these delayed ACKs will have, but if you
disable them, you can see what happens.
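Something like this (it's a run-time sysctl; put it in /etc/sysctl.conf
if you want it to stick across reboots):

# check the current value
sysctl net.inet.tcp.delayed_ack
# turn delayed ACKs off for the test (as root)
sysctl net.inet.tcp.delayed_ack=0
# the "ack-only packets (NNN delayed)" counter in the tcp stats should
# stop climbing while it is off
netstat -s -p tcp | grep delayed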
rick
> Date: Sun, 19 Jan 2014 23:08:04 -0500
> From: J David <j.david.lists at gmail.com>
> To: Rick Macklem <rmacklem at uoguelph.ca>
> Subject: Re: Terrible NFS performance under 9.2-RELEASE?
>
> On Sun, Jan 19, 2014 at 9:32 AM, Alfred Perlstein
> <alfred at freebsd.org> wrote:
> > I hit nearly the same problem and raising the mbufs worked for me.
> >
> > I'd suggest raising that and retrying.
>
> That doesn't seem to be an issue here; mbufs are well below max on
> both client and server and all the "delayed"/"denied" lines are
> 0/0/0.
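> (For anyone else chasing this, the check/raise boils down to something
> like the following -- the number here is only an example, not a tuned
> value:)
>
> $ netstat -m | grep -E 'mbuf|denied|delayed'
> $ sysctl kern.ipc.nmbclusters
> # raise it at run time (as root), e.g.:
> #   sysctl kern.ipc.nmbclusters=262144
> # or persistently via /boot/loader.conf:
> #   kern.ipc.nmbclusters="262144"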
>
>
> On Sun, Jan 19, 2014 at 12:58 PM, Adam McDougall
> <mcdouga9 at egr.msu.edu> wrote:
> > Also try rsize=32768,wsize=32768 in your mount options, made a huge
> > difference for me.
>
> This does make a difference, but inconsistently.
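> (For reference, those sizes just go on the client mount, along the
> lines of the following -- server name and paths are placeholders:)
>
> $ mount -t nfs -o tcp,rsize=32768,wsize=32768 server:/export /mnt
> # or the /etc/fstab equivalent:
> #   server:/export  /mnt  nfs  rw,tcp,rsize=32768,wsize=32768  0  0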
>
> In order to test this further, I created a Debian guest on the same
> host as these two FreeBSD hosts and re-ran the tests with it acting
> as
> both client and server, and ran them for both 32k and 64k.
>
> Findings:
>
>
>                                                          random    random
>                        write  rewrite    read   reread     read     write
> S:FBSD,C:FBSD,Z:64k    67246     2923  103295  1272407   172475       196
> S:FBSD,C:FBSD,Z:32k    11951    99896  223787  1051948   223276     13686
> S:FBSD,C:DEB,Z:64k     11414    14445   31554    30156    30368     13799
> S:FBSD,C:DEB,Z:32k     11215    14442   31439    31026    29608     13769
> S:DEB,C:FBSD,Z:64k     36844   173312  313919  1169426   188432     14273
> S:DEB,C:FBSD,Z:32k     66928   120660  257830  1048309   225807     18103
>
> So the rsize/wsize makes a difference between two FreeBSD nodes, but
> with a Debian node as either client or server, it no longer seems to
> matter much. And /proc/mounts on the Debian box confirms that it
> negotiates and honors the 64k size as a client.
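> (For completeness, the quick way to see what actually got negotiated:
> the Linux client shows it in /proc/mounts, and on FreeBSD "nfsstat -m"
> reports the options the client is really using, if your nfsstat has
> that flag:)
>
> linux$ grep nfs /proc/mounts
> fbsd$ nfsstat -m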
>
> On Sun, Jan 19, 2014 at 6:36 PM, Rick Macklem <rmacklem at uoguelph.ca>
> wrote:
> > Yes, it shouldn't make a big difference but it sometimes does. When
> > it
> > does, I believe that indicates there is a problem with your network
> > fabric.
>
> Given that this is an entirely virtual environment, if your belief is
> correct, where would supporting evidence be found?
>
> As far as I can tell, there are no interface errors reported on the
> host (checking both taps and the bridge) or any of the guests,
> nothing
> in sysctl dev.vtnet of concern, etc. Also the improvement from using
> Debian on either side, even with 64k sizes, seems counterintuitive.
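> (Concretely, "no errors" means checks along these lines came back
> clean on the host and in both guests -- netstat's error/drop columns
> plus whatever counters the vtnet sysctl tree exposes:)
>
> $ netstat -i
> $ sysctl dev.vtnet.0 | grep -iE 'drop|err'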
>
> To try to help vindicate the network stack, I did iperf -d between
> the
> two FreeBSD nodes while the iozone was running:
>
> Server:
>
> $ iperf -s
>
> ------------------------------------------------------------
>
> Server listening on TCP port 5001
>
> TCP window size: 1.00 MByte (default)
>
> ------------------------------------------------------------
>
> [ 4] local 172.20.20.162 port 5001 connected with 172.20.20.169 port
> 37449
>
> ------------------------------------------------------------
>
> Client connecting to 172.20.20.169, TCP port 5001
>
> TCP window size: 1.00 MByte (default)
>
> ------------------------------------------------------------
>
> [ 6] local 172.20.20.162 port 28634 connected with 172.20.20.169
> port 5001
>
> Waiting for server threads to complete. Interrupt again to force
> quit.
>
> [ ID] Interval Transfer Bandwidth
>
> [ 6] 0.0-10.0 sec 15.8 GBytes 13.6 Gbits/sec
>
> [ 4] 0.0-10.0 sec 15.6 GBytes 13.4 Gbits/sec
>
>
> Client:
>
> $ iperf -c 172.20.20.162 -d
>
> ------------------------------------------------------------
>
> Server listening on TCP port 5001
>
> TCP window size: 1.00 MByte (default)
>
> ------------------------------------------------------------
>
> ------------------------------------------------------------
>
> Client connecting to 172.20.20.162, TCP port 5001
>
> TCP window size: 1.00 MByte (default)
>
> ------------------------------------------------------------
>
> [ 5] local 172.20.20.169 port 32533 connected with 172.20.20.162
> port 5001
>
> [ 4] local 172.20.20.169 port 5001 connected with 172.20.20.162 port
> 36617
>
> [ ID] Interval Transfer Bandwidth
>
> [ 5] 0.0-10.0 sec 15.6 GBytes 13.4 Gbits/sec
>
> [ 4] 0.0-10.0 sec 15.5 GBytes 13.3 Gbits/sec
>
>
> mbuf usage is pretty low.
>
> Server:
>
> $ netstat -m
>
> 545/4075/4620 mbufs in use (current/cache/total)
>
> 535/1819/2354/131072 mbuf clusters in use (current/cache/total/max)
>
> 535/1641 mbuf+clusters out of packet secondary zone in use
> (current/cache)
>
> 0/2034/2034/12800 4k (page size) jumbo clusters in use
> (current/cache/total/max)
>
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
>
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
>
> 1206K/12792K/13999K bytes allocated to network (current/cache/total)
>
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>
> 0/0/0 sfbufs in use (current/peak/max)
>
> 0 requests for sfbufs denied
>
> 0 requests for sfbufs delayed
>
> 0 requests for I/O initiated by sendfile
>
> 0 calls to protocol drain routines
>
>
> Client:
>
> $ netstat -m
>
> 1841/3544/5385 mbufs in use (current/cache/total)
>
> 1172/1198/2370/32768 mbuf clusters in use (current/cache/total/max)
>
> 512/896 mbuf+clusters out of packet secondary zone in use
> (current/cache)
>
> 0/2314/2314/16384 4k (page size) jumbo clusters in use
> (current/cache/total/max)
>
> 0/0/0/8192 9k jumbo clusters in use (current/cache/total/max)
>
> 0/0/0/4096 16k jumbo clusters in use (current/cache/total/max)
>
> 2804K/12538K/15342K bytes allocated to network (current/cache/total)
>
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>
> 0/0/0 sfbufs in use (current/peak/max)
>
> 0 requests for sfbufs denied
>
> 0 requests for sfbufs delayed
>
> 0 requests for I/O initiated by sendfile
>
> 0 calls to protocol drain routines
>
>
>
> Here's 60 seconds of netstat -ss for ip and tcp from the server with
> the 64k mount running iozone:
>
> ip:
>
> 4776 total packets received
>
> 4758 packets for this host
>
> 18 packets for unknown/unsupported protocol
>
> 2238 packets sent from this host
>
> tcp:
>
> 2244 packets sent
>
> 1427 data packets (238332 bytes)
>
> 5 data packets (820 bytes) retransmitted
>
> 812 ack-only packets (587 delayed)
>
> 2235 packets received
>
> 1428 acks (for 238368 bytes)
>
> 2007 packets (91952792 bytes) received in-sequence
>
> 225 out-of-order packets (325800 bytes)
>
> 1428 segments updated rtt (of 1426 attempts)
>
> 5 retransmit timeouts
>
> 587 correct data packet header predictions
>
> 225 SACK options (SACK blocks) sent
>
>
> And with 32k mount:
>
> ip:
>
> 24172 total packets received
>
> 24167 packets for this host
>
> 5 packets for unknown/unsupported protocol
>
> 26130 packets sent from this host
>
> tcp:
>
> 26130 packets sent
>
> 23506 data packets (5362120 bytes)
>
> 2624 ack-only packets (454 delayed)
>
> 21671 packets received
>
> 18143 acks (for 5362192 bytes)
>
> 20278 packets (756617316 bytes) received in-sequence
>
> 96 out-of-order packets (145964 bytes)
>
> 18143 segments updated rtt (of 17469 attempts)
>
> 1093 correct ACK header predictions
>
> 3449 correct data packet header predictions
>
> 111 SACK options (SACK blocks) sent
>
>
> So the 32k mount sends about 6x the packet volume. (This is on
> iozone's linear write test.)
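> (In case anyone wants to reproduce the 60-second numbers: nothing
> fancier than zeroing the counters with netstat's -z, if yours has it,
> waiting, and dumping the non-zero ones again -- otherwise take two
> snapshots and subtract:)
>
> $ netstat -s -z > /dev/null        # as root; resets the counters
> $ sleep 60
> $ netstat -ss -p ip
> $ netstat -ss -p tcp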
>
> One thing I've noticed is that when the 64k connection bogs down, it
> seems to "poison" things for a while. For example, iperf will start
> doing this afterward:
>
> From the client to the server:
>
> $ iperf -c 172.20.20.162
>
> ------------------------------------------------------------
>
> Client connecting to 172.20.20.162, TCP port 5001
>
> TCP window size: 1.00 MByte (default)
>
> ------------------------------------------------------------
>
> [ 3] local 172.20.20.169 port 14337 connected with 172.20.20.162
> port 5001
>
> [ ID] Interval Transfer Bandwidth
>
> [ 3] 0.0-10.1 sec 4.88 MBytes 4.05 Mbits/sec
>
>
> Ouch! That's quite a drop from 13 Gbits/sec. Weirdly, iperf to the
> Debian node is not affected:
>
> From the client to the Debian node:
>
> $ iperf -c 172.20.20.166
>
> ------------------------------------------------------------
>
> Client connecting to 172.20.20.166, TCP port 5001
>
> TCP window size: 1.00 MByte (default)
>
> ------------------------------------------------------------
>
> [ 3] local 172.20.20.169 port 24376 connected with 172.20.20.166
> port 5001
>
> [ ID] Interval Transfer Bandwidth
>
> [ 3] 0.0-10.0 sec 20.4 GBytes 17.5 Gbits/sec
>
>
> From the Debian node to the server:
>
> $ iperf -c 172.20.20.162
>
> ------------------------------------------------------------
>
> Client connecting to 172.20.20.162, TCP port 5001
>
> TCP window size: 23.5 KByte (default)
>
> ------------------------------------------------------------
>
> [ 3] local 172.20.20.166 port 43166 connected with 172.20.20.162
> port 5001
>
> [ ID] Interval Transfer Bandwidth
>
> [ 3] 0.0-10.0 sec 12.9 GBytes 11.1 Gbits/sec
>
>
> But if I let it run for longer, it will apparently figure things out
> and creep back up to normal speed and stay there until NFS strikes
> again. It's like the kernel is caching some sort of hint that
> connectivity to that other host sucks, and it has to either expire or
> be slowly overcome.
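> (If that theory is right, the obvious suspect is the TCP host cache,
> which keeps per-destination RTT and ssthresh estimates and reuses
> them for new connections. It can be inspected and flushed via sysctl
> -- names as on recent FreeBSD, worth double-checking on 9.2:)
>
> $ sysctl net.inet.tcp.hostcache.list | grep 172.20.20.162
> # as root, mark the cache for purging (applied at the next periodic
> # prune):
> $ sysctl net.inet.tcp.hostcache.purge=1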
>
> Client:
>
> $ iperf -c 172.20.20.162 -t 60
>
> ------------------------------------------------------------
>
> Client connecting to 172.20.20.162, TCP port 5001
>
> TCP window size: 1.00 MByte (default)
>
> ------------------------------------------------------------
>
> [ 3] local 172.20.20.169 port 59367 connected with 172.20.20.162
> port 5001
>
> [ ID] Interval Transfer Bandwidth
>
> [ 3] 0.0-60.0 sec 56.2 GBytes 8.04 Gbits/sec
>
>
> Server:
>
> $ netstat -I vtnet1 -ihw 1
>
> input (vtnet1) output
>
> packets errs idrops bytes packets errs bytes colls
>
> 7 0 0 420 0 0 0 0
>
> 7 0 0 420 0 0 0 0
>
> 8 0 0 480 0 0 0 0
>
> 8 0 0 480 0 0 0 0
>
> 7 0 0 420 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 11 0 0 12k 3 0 206 0   <--- starts here
>
> 17 0 0 227k 10 0 660 0
>
> 17 0 0 408k 10 0 660 0
>
> 17 0 0 417k 10 0 660 0
>
> 17 0 0 425k 10 0 660 0
>
> 17 0 0 438k 10 0 660 0
>
> 17 0 0 444k 10 0 660 0
>
> 16 0 0 453k 10 0 660 0
>
> input (vtnet1) output
>
> packets errs idrops bytes packets errs bytes colls
>
> 16 0 0 463k 10 0 660 0
>
> 16 0 0 469k 10 0 660 0
>
> 16 0 0 482k 10 0 660 0
>
> 16 0 0 487k 10 0 660 0
>
> 16 0 0 496k 10 0 660 0
>
> 16 0 0 504k 10 0 660 0
>
> 18 0 0 510k 10 0 660 0
>
> 16 0 0 521k 10 0 660 0
>
> 17 0 0 524k 10 0 660 0
>
> 17 0 0 538k 10 0 660 0
>
> 17 0 0 540k 10 0 660 0
>
> 17 0 0 552k 10 0 660 0
>
> 17 0 0 554k 10 0 660 0
>
> 17 0 0 567k 10 0 660 0
>
> 16 0 0 568k 10 0 660 0
>
> 16 0 0 581k 10 0 660 0
>
> 16 0 0 582k 10 0 660 0
>
> 16 0 0 595k 10 0 660 0
>
> 16 0 0 595k 10 0 660 0
>
> 16 0 0 609k 10 0 660 0
>
> 16 0 0 609k 10 0 660 0
>
> input (vtnet1) output
>
> packets errs idrops bytes packets errs bytes colls
>
> 16 0 0 620k 10 0 660 0
>
> 16 0 0 623k 10 0 660 0
>
> 17 0 0 632k 10 0 660 0
>
> 17 0 0 637k 10 0 660 0
>
> 8.7k 0 0 389M 4.4k 0 288k 0
>
> 42k 0 0 2.1G 21k 0 1.4M 0
>
> 41k 0 0 2.1G 20k 0 1.4M 0
>
> 38k 0 0 1.9G 19k 0 1.2M 0
>
> 40k 0 0 2.0G 20k 0 1.3M 0
>
> 40k 0 0 2.0G 20k 0 1.3M 0
>
> 40k 0 0 2G 20k 0 1.3M 0
>
> 39k 0 0 2G 20k 0 1.3M 0
>
> 43k 0 0 2.2G 22k 0 1.4M 0
>
> 42k 0 0 2.2G 21k 0 1.4M 0
>
> 39k 0 0 2G 19k 0 1.3M 0
>
> 38k 0 0 1.9G 19k 0 1.2M 0
>
> 42k 0 0 2.1G 21k 0 1.4M 0
>
> 44k 0 0 2.2G 22k 0 1.4M 0
>
> 41k 0 0 2.1G 20k 0 1.3M 0
>
> 41k 0 0 2.1G 21k 0 1.4M 0
>
> 40k 0 0 2.0G 20k 0 1.3M 0
>
> input (vtnet1) output
>
> packets errs idrops bytes packets errs bytes colls
>
> 43k 0 0 2.2G 22k 0 1.4M 0
>
> 41k 0 0 2.1G 20k 0 1.3M 0
>
> 40k 0 0 2.0G 20k 0 1.3M 0
>
> 42k 0 0 2.2G 21k 0 1.4M 0
>
> 39k 0 0 2G 19k 0 1.3M 0
>
> 42k 0 0 2.1G 21k 0 1.4M 0
>
> 40k 0 0 2.0G 20k 0 1.3M 0
>
> 42k 0 0 2.1G 21k 0 1.4M 0
>
> 38k 0 0 2G 19k 0 1.3M 0
>
> 39k 0 0 2G 20k 0 1.3M 0
>
> 45k 0 0 2.3G 23k 0 1.5M 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
>
> It almost looks like something is limiting it to 10 packets per
> second. So confusing! TCP super slow start?
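> (One way to see what those ~10 packets per second actually are would
> be a capture on the server during a stall -- interface and addresses
> as in the output above:)
>
> $ tcpdump -i vtnet1 -s 0 -w /tmp/nfs-stall.pcap host 172.20.20.169 and port 2049
> # then eyeball inter-packet gaps and retransmits:
> $ tcpdump -r /tmp/nfs-stall.pcap -ttt | head -50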
>
> Thanks!
>
> (Sorry Rick, forgot to reply all so you got an extra! :( )
>
> Also, here's the netstat from the client side showing the 10 packets
> per second limit and eventual recovery:
>
> $ netstat -I net1 -ihw 1
>
> input (net1) output
>
> packets errs idrops bytes packets errs bytes colls
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 15 0 0 962 11 0 114k 0
>
> 17 0 0 1.1k 10 0 368k 0
>
> 17 0 0 1.1k 10 0 411k 0
>
> 17 0 0 1.1k 10 0 425k 0
>
> 17 0 0 1.1k 10 0 432k 0
>
> 17 0 0 1.1k 10 0 439k 0
>
> 17 0 0 1.1k 10 0 452k 0
>
> 16 0 0 1k 10 0 457k 0
>
> 16 0 0 1k 10 0 467k 0
>
> 16 0 0 1k 10 0 477k 0
>
> 16 0 0 1k 10 0 481k 0
>
> 16 0 0 1k 10 0 495k 0
>
> 16 0 0 1k 10 0 498k 0
>
> 16 0 0 1k 10 0 510k 0
>
> 16 0 0 1k 10 0 515k 0
>
> 16 0 0 1k 10 0 524k 0
>
> 17 0 0 1.1k 10 0 532k 0
>
> input (net1) output
>
> packets errs idrops bytes packets errs bytes colls
>
> 17 0 0 1.1k 10 0 538k 0
>
> 17 0 0 1.1k 10 0 548k 0
>
> 17 0 0 1.1k 10 0 552k 0
>
> 17 0 0 1.1k 10 0 562k 0
>
> 17 0 0 1.1k 10 0 566k 0
>
> 16 0 0 1k 10 0 576k 0
>
> 16 0 0 1k 10 0 580k 0
>
> 16 0 0 1k 10 0 590k 0
>
> 17 0 0 1.1k 10 0 594k 0
>
> 16 0 0 1k 10 0 603k 0
>
> 16 0 0 1k 10 0 609k 0
>
> 16 0 0 1k 10 0 614k 0
>
> 16 0 0 1k 10 0 623k 0
>
> 16 0 0 1k 10 0 626k 0
>
> 17 0 0 1.1k 10 0 637k 0
>
> 18 0 0 1.1k 10 0 637k 0
>
> 17k 0 0 1.1M 34k 0 1.7G 0
>
> 21k 0 0 1.4M 42k 0 2.1G 0
>
> 20k 0 0 1.3M 39k 0 2G 0
>
> 19k 0 0 1.2M 38k 0 1.9G 0
>
> 20k 0 0 1.3M 41k 0 2.0G 0
>
> input (net1) output
>
> packets errs idrops bytes packets errs bytes colls
>
> 20k 0 0 1.3M 40k 0 2.0G 0
>
> 19k 0 0 1.2M 38k 0 1.9G 0
>
> 22k 0 0 1.5M 45k 0 2.3G 0
>
> 20k 0 0 1.3M 40k 0 2.1G 0
>
> 20k 0 0 1.3M 40k 0 2.1G 0
>
> 18k 0 0 1.2M 36k 0 1.9G 0
>
> 21k 0 0 1.4M 41k 0 2.1G 0
>
> 22k 0 0 1.4M 44k 0 2.2G 0
>
> 21k 0 0 1.4M 43k 0 2.2G 0
>
> 20k 0 0 1.3M 41k 0 2.1G 0
>
> 20k 0 0 1.3M 40k 0 2.0G 0
>
> 21k 0 0 1.4M 43k 0 2.2G 0
>
> 21k 0 0 1.4M 43k 0 2.2G 0
>
> 20k 0 0 1.3M 40k 0 2.0G 0
>
> 21k 0 0 1.4M 43k 0 2.2G 0
>
> 19k 0 0 1.2M 38k 0 1.9G 0
>
> 21k 0 0 1.4M 42k 0 2.1G 0
>
> 20k 0 0 1.3M 40k 0 2.0G 0
>
> 21k 0 0 1.4M 42k 0 2.1G 0
>
> 20k 0 0 1.3M 40k 0 2.0G 0
>
> 20k 0 0 1.3M 40k 0 2.0G 0
>
> input (net1) output
>
> packets errs idrops bytes packets errs bytes colls
>
> 24k 0 0 1.6M 48k 0 2.5G 0
>
> 6.3k 0 0 417k 12k 0 647M 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0
>
> 6 0 0 360 0 0 0 0