slow writes on nfs with bge devices
Bruce Evans
bde at zeta.org.au
Sun Jan 21 06:25:24 UTC 2007
nfs write performance is much worse with bge NICs than with other NICs
(sk, fxp, xl, even rl). Sometimes writing a 20K source file from vi
takes about 2 seconds instead of seeming to be instantaneous (this gets
faster as the system warms up). Iozone shows the problem more
reproducibly. E.g.:
100Mbps fxp server -> 1Gbps bge 5701 client, udp:
%%%
IOZONE: Performance Test of Sequential File I/O -- V1.16 (10/28/92)
By Bill Norcott
Operating System: FreeBSD -- using fsync()
IOZONE: auto-test mode
MB reclen bytes/sec written bytes/sec read
1 512 1516885 291918639
1 1024 1158783 491354263
1 2048 1573651 715694105
1 4096 1223692 917431957
1 8192 729513 1097929467
2 512 1694809 281196631
2 1024 1379228 507917189
2 2048 1659521 789608264
2 4096 4606056 1064567574
2 8192 1142288 1318131028
4 512 1242214 298269971
4 1024 1853545 492110628
4 2048 2120136 742888430
4 4096 1896792 1121799065
4 8192 850210 1441812403
8 512 1563847 281422325
8 1024 1480844 492749552
8 2048 1658649 850165954
8 4096 2105283 1211348180
8 8192 2098425 1554875506
16 512 1508821 296842294
16 1024 1966239 527850530
16 2048 2036609 842656736
16 4096 1666138 1200594889
16 8192 2293378 1620824908
Completed series of tests
%%%
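For reference, runs like the one above can presumably be reproduced with
something like the following from an NFS-mounted directory on the client;
the mount point and the auto-test invocation are my assumptions from the
banner in the output, not copied from the original runs:
%%%
# Hypothetical reproduction: cd into an NFS-mounted directory on the
# client and run the old Norcott iozone in auto-test mode, which sweeps
# the file sizes and record lengths shown in the tables here.
cd /mnt/nfs          # hypothetical mount point
iozone auto          # auto-test mode, per the "IOZONE: auto-test mode" banner
%%%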
Here bge barely reaches 10Mbps speeds (~1.2 MB/S) for writing. Reading
is cached well and fast. 100Mbps xl on the same client with the same
server goes at full 100Mbps speed (11.77 MB/S for all file sizes
including larger ones since the disk is not the limit at 100Mbps).
1Gbps sk on a different client with the same server goes at full 100Mbps
speed.
Switching to tcp gives full 100 Mbps speed. However, when the bge link
speed is reduced to 100Mbps, udp becomes about 10 times slower than the
above and tcp becomes about as slow as the above (maybe a bit faster, but
far below 11.77 MB/S).
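The udp/tcp switch above is just the client mount protocol. A minimal
sketch of the two mounts being compared, assuming FreeBSD mount_nfs with
made-up host and path names (the rsize/wsize values are my guesses, not
the ones used in these tests):
%%%
# Hypothetical examples of the two client mounts; "server:/export" and
# /mnt are placeholder names.
mount -t nfs -o udp,rsize=8192,wsize=8192 server:/export /mnt   # NFS over UDP
umount /mnt
mount -t nfs -o tcp,rsize=8192,wsize=8192 server:/export /mnt   # NFS over TCP
%%%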
bge is also slow at nfs serving:
1Gbps bge 5701 server -> 1Gbps sk client:
%%%
IOZONE: Performance Test of Sequential File I/O -- V1.16 (10/28/92)
By Bill Norcott
Operating System: FreeBSD -- using fsync()
IOZONE: auto-test mode
MB reclen bytes/sec written bytes/sec read
1 512 36255350 242114472
1 1024 3051699 413319147
1 2048 22406458 632021710
1 4096 22447700 851162198
1 8192 3522493 1047562648
2 512 3270779 48125247
2 1024 28992179 46693718
2 2048 5956380 753318255
2 4096 27616650 1053311658
2 8192 5573338 48290208
4 512 9004770 47435659
4 1024 9576276 45601645
4 2048 30348874 85116667
4 4096 8635673 86150049
4 8192 9356773 47100031
8 512 9762446 46424146
8 1024 10054027 58344604
8 2048 9197430 60253061
8 4096 15934077 59476759
8 8192 8765470 47647937
16 512 5670225 46239891
16 1024 9425169 45950990
16 2048 9833515 46242945
16 4096 14812057 51313693
16 8192 9203742 47648722
Completed series of tests
%%%
Now the available bandwidth is 10 times larger and about 9/10 of it is
still not used, with a high variance. For larger files, the variance is
lower and the average speed is about 10MB/S. The disk can only do about
40MB/S and the slowest of the 1Gbps NICs (sk) can only sustain 80MB/S
through udp and about 50MB/S through tcp (it is limited by the 33 MHz
32-bit PCI bus and by being less smart than the bge interface). When the
bge NIC was on the system which is now the server with the fxp NIC, bge
and nfs worked unsurprisingly, just slower than I would have liked. The
write speed was 20-30MB/S for large files and 30-40MB/S for medium-sized
files, with low variance. This is the only configuration in which nfs/bge
worked as expected.
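As a sanity check on the "limited by the 33 MHz 32-bit PCI bus" claim,
the raw bus numbers work out roughly as follows (my arithmetic, not a
measurement):
%%%
# 32-bit * 33 MHz PCI moves at most 4 bytes per cycle:
#   4 * 33,000,000 = 132,000,000 bytes/s ~= 132 MB/s peak,
# so after protocol, interrupt and bus-arbitration overhead the observed
# 50-80 MB/S ceiling for sk is about what such a bus can sustain.
echo $((4 * 33000000)) bytes/s peak PCI bandwidth
%%%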
The problem is very old and not very hardware dependent. Similar behaviour
happens when some of the following are changed:
OS -> FreeBSD-~5.2 or FreeBSD-6
hardware -> newer amd64 CPU (Turion X2) with 5705 (iozone output for this
below) instead of old amd64 CPU with 5701. The newer amd64
normally runs an i386-SMP current kernel while the old amd64
was running an amd64-UP current kernel in the above tests,
but normally runs ~5.2 amd64-UP and behaves similarly with that.
The combination that seemed to work right was an AthlonXP
for the server with the same 5701 and any kernel. The only
strangeness with that was that current kernels gave a 5-10%
slower nfs server despite giving a 30-90% larger packet rate
for small packets.
100Mbps fxp server -> 1Gbps bge 5705 client:
%%%
IOZONE: Performance Test of Sequential File I/O -- V1.16 (10/28/92)
By Bill Norcott
Operating System: FreeBSD -- using fsync()
IOZONE: auto-test mode
MB reclen bytes/sec written bytes/sec read
1 512 2994400 185462027
1 1024 3074084 337817536
1 2048 2991691 576792985
1 4096 3074759 884740798
1 8192 3078019 1176892296
2 512 4262096 186709962
2 1024 2994468 339893080
2 2048 5112176 584846610
2 4096 4754187 909815165
2 8192 5100574 1212919611
4 512 5298715 187129017
4 1024 5302620 344445041
4 2048 4985597 590579630
4 4096 3703618 927711124
4 8192 5236177 1240896243
8 512 5142274 186899396
8 1024 6207933 345564808
8 2048 6162773 593088329
8 4096 6031445 936751120
8 8192 6072523 1224102288
16 512 5427113 186797193
16 1024 5065901 345544445
16 2048 5462338 595487384
16 4096 5256552 937013065
16 8192 5097101 1226320870
Completed series of tests
%%%
rl on a system with 1/20 as much CPU is faster than this.
The problem doesn't seem to affect much besides writes on nfs. The
bge 5701 works very well for most things. It has a much better bus
interface than the 5705 and works even better after moving it to the
old amd64 system (it can now saturate 1Gbps where on the AthlonXP it
only got 3/4 of the way, while the 5705 only gets 1/4 of the way).
I've been working on minimising network latency and maximising packet
rate, and normally have very low network latency (60-80 uS for ping)
and fairly high packet rates. The changes for this are not the cause
of the bug :-), since the behaviour is not affected by running kernels
without these changes or by sysctl'ing the changes to be null. However,
the problem looks like one caused by large latencies combined with
non-streaming protocols. To write at just 11.77 MB/S, at least 8000
packets/second must be sent from the client to the server (a rough
calculation is sketched after the netstat output below). Working
clients sustain this rate, but on broken clients the rate is much lower
and not sustained:
Output from netstat -s 1 on server while writing a ~1GB file via 5701/udp:
%%%
input (Total) output
packets errs bytes packets errs bytes colls
900 0 1513334 142 0 33532 0
1509 0 2564836 236 0 57368 0
1647 0 2295802 259 0 51106 0
1603 0 1502736 252 0 32926 0
1055 0 637014 163 0 13938 0
558 0 1542510 86 0 34340 0
984 0 989854 155 0 21816 0
864 0 1320786 135 0 38152 0
883 0 1558060 165 0 34340 0
1177 0 3780102 203 0 85850 0
2087 0 954212 331 0 21210 0
1187 0 1413568 190 0 31310 0
650 0 3320604 101 0 75346 0
1565 0 1706542 246 0 37976 0
2055 0 2360620 329 0 52318 0
1554 0 2416996 244 0 54226 0
1402 0 2579894 220 0 58176 0
1690 0 774488 267 0 16968 0
1323 0 3690650 209 0 83830 0
591 0 4519858 92 0 103110 0
%%%
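The packet-rate figure quoted above works out roughly as follows; a
back-of-the-envelope sketch assuming full-size 1500-byte packets:
%%%
# 11.77 MB/S of write data at a 1500-byte MTU needs at least
#   11.77 * 1048576 / 1500 ~= 8200 packets/second from client to server,
# which matches the ~8600 packets/s in the working sk trace below, while
# the 5701 trace above only reaches about 600-2100 packets/s.
echo $((12341740 / 1500)) packets/s minimum for 11.77 MB/S
%%%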
There is no sign of any packet loss or switch problems. Forcing
1000baseTX full-duplex has no effect. Forcing 100baseTX full-duplex
makes the problem more obvious. The mtu is 1500 throughout since
only bge-5701 and sk support jumbo frames and I want to use udp for
nfs.
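The media forcing mentioned above was done with the usual ifconfig media
selection; a sketch of the commands, assuming the interface name bge0:
%%%
# Hypothetical commands for the media forcing described above.
ifconfig bge0 media 1000baseTX mediaopt full-duplex   # no effect on the problem
ifconfig bge0 media 100baseTX mediaopt full-duplex    # makes it more obvious
ifconfig bge0 media autoselect                        # back to autonegotiation
%%%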
5705/udp is better:
%%%
input (Total) output
packets errs bytes packets errs bytes colls
5209 0 6607758 846 0 151702 0
4763 0 6684546 773 0 153520 0
4758 0 6618498 769 0 151298 0
3582 0 7057568 576 0 162498 0
4935 0 5115068 800 0 116756 0
4924 0 6622026 798 0 152802 0
4095 0 6018462 657 0 137450 0
4647 0 5270442 751 0 120594 0
4673 0 5451948 758 0 123624 0
2340 0 6001986 372 0 138168 0
3750 0 6150610 604 0 140996 0
%%%
sk/udp works right:
%%%
input (Total) output
packets errs bytes packets errs bytes colls
8638 0 12384676 1440 0 293062 0
8636 0 12415646 1439 0 293708 0
8637 0 12415646 1441 0 293708 0
8637 0 12415646 1439 0 293708 0
8637 0 12417160 1440 0 293708 0
8636 0 12413162 1439 0 293506 0
8637 0 12414132 1439 0 293708 0
8636 0 12417160 1440 0 293708 0
8637 0 12415646 1439 0 293708 0
8636 0 12417160 1440 0 293708 0
8637 0 12414676 1439 0 293506 0
%%%
sk is under ~5.2 with latency/throughput/efficiency optimizations
that don't have much effect here.
Bruce