Increase mps sequential read performance with ZFS/zvol
John
jwd at FreeBSD.org
Tue Feb 5 02:36:09 UTC 2013
Hi Folks,
I'm in the process of putting together another ZFS server, and after
running some sequential read performance tests I think things could be
better. It's running 9.1-stable from late January:
FreeBSD vprzfs30p.unx.sas.com 9.1-STABLE FreeBSD 9.1-STABLE #1 r246079M
I have two HP D2700 shelves populated with 600GB drives, connected
to a pair of LSI 9207-8e HBA cards installed in a Dell R620 with 128GB
of RAM; the OS is installed on an internal RAID volume. The shelves are
dual-channel, and each LSI card has a path to both shelves.
gmultipath(8) is used to bind the paths so that each disk can be
reached through either controller and the I/O is balanced between them.
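For reference, the labels were created along these lines (the da numbers
are illustrative, since each physical disk shows up twice, once per
path; -A selects active/active):

# gmultipath label -v -A Z0 da2 da52
# gmultipath label -v -A Z1 da27 da77
  ...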
The ZFS pool consists of 24 mirrors, each pairing one disk from each
shelf. The multipath devices are rotated so that I/O is balanced across
both shelves and both controllers.
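The pool itself was created roughly like this (abbreviated), pairing one
path from each shelf in every mirror:

# zpool create pool0 \
    mirror multipath/Z0  multipath/Z1  \
    mirror multipath/Z2  multipath/Z3  \
    ...
    mirror multipath/Z46 multipath/Z47 \
    spare  multipath/Z48 multipath/Z49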
For testing, two 300GB zvols are created, each almost full:
NAME USED AVAIL REFER MOUNTPOINT
pool0 1.46T 11.4T 31K /pool0
pool0/lun000004 301G 11.4T 261G -
pool0/lun000005 301G 11.4T 300G -
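They were created with something like the following (volblocksize left
at the 8K default):

# zfs create -V 300g pool0/lun000004
# zfs create -V 300g pool0/lun000005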
Running a simple dd test:
# dd if=/dev/zvol/pool0/lun000005 of=/dev/null bs=512k
614400+0 records in
614400+0 records out
322122547200 bytes transferred in 278.554656 secs (1156406975 bytes/sec)
With the drives spread and balanced across four 6Gb/s SAS channels,
1.1GB/s seems a bit slow. Note that changing the bs= option makes no
real difference.
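For scale, 1156406975 bytes/sec across 48 spindles works out to only
about 24 MB/s per disk, and the gstat output further down shows each
multipath device at roughly 21-27 MB/s and only 25-35% busy, so the
disks themselves look far from saturated.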
Now, if I run two 'dd' operations against the two zvols in parallel:
# dd if=/dev/zvol/pool0/lun000005 of=/dev/null bs=512k
614400+0 records in
614400+0 records out
322122547200 bytes transferred in 278.605380 secs (1156196435 bytes/sec)
# dd if=/dev/zvol/pool0/lun000004 of=/dev/null bs=512k
614400+0 records in
614400+0 records out
322122547200 bytes transferred in 282.065008 secs (1142015274 bytes/sec)
This tells me the I/O subsystem has plenty of headroom, and that a
single 'dd' ought to be able to run faster than it does.
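A follow-up test I plan to try is splitting a single zvol between two
readers at different offsets, to see whether the limit is per-consumer
rather than per-pool (307200 512k blocks is half of the 300GB volume):

# dd if=/dev/zvol/pool0/lun000005 of=/dev/null bs=512k count=307200 &
# dd if=/dev/zvol/pool0/lun000005 of=/dev/null bs=512k skip=307200 &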
I've included some basic config information below. No kmem values in
/boot/loader.conf. I did play around with block_cap but it made no
difference. It seems like something is holding the system back.
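For completeness, the block_cap knob I played with, plus a couple of
related tunables I may try next, would go in /boot/loader.conf along
these lines (values shown are what I believe the defaults to be; the
numbers above were collected with everything at defaults):

# /boot/loader.conf (experimental ZFS tunables, currently at defaults)
vfs.zfs.zfetch.block_cap=256     # per-stream prefetch block cap
vfs.zfs.vdev.max_pending=10      # outstanding I/Os allowed per vdev
vfs.zfs.prefetch_disable=0       # 0 = file-level prefetch enabled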
Thanks for any ideas.
-John
Output from top during a single dd run:
5 root 11 -8 - 0K 208K zvol:i 1 5:11 41.65% zfskern
0 root 350 -8 0 0K 5600K - 5 3:59 15.23% kernel
1784 root 1 26 0 9944K 2072K CPU1 1 0:31 13.87% dd
The zvol:io state appears to be a simple wait loop, waiting for
outstanding I/O requests to complete. How can I get more I/O requests
going?
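If it would help, I can grab kernel stacks for the zfskern threads
mid-run (pid 5 per the top output above) with something like:

# procstat -kk 5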
High-water marks for outstanding I/O requests on each controller:
dev.mps.0.io_cmds_highwater: 207
dev.mps.1.io_cmds_highwater: 126
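The instantaneous queue depth on the HBAs can be watched during the run
with something along these lines (assuming io_cmds_active is the right
counter to look at):

# while :; do sysctl dev.mps.0.io_cmds_active dev.mps.1.io_cmds_active; sleep 1; done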
IOCFacts output (identical for both controllers):
mps0: <LSI SAS2308> port 0xec00-0xecff mem 0xdaff0000-0xdaffffff,0xdaf80000-0xdafbffff irq 48 at device 0.0 on pci5
mps0: Doorbell= 0x22000000
mps0: mps_wait_db_ack: successfull count(2), timeout(5)
mps0: Doorbell= 0x12000000
mps0: mps_wait_db_ack: successfull count(1), timeout(5)
mps0: mps_wait_db_ack: successfull count(1), timeout(5)
mps0: mps_wait_db_ack: successfull count(1), timeout(5)
mps0: mps_wait_db_ack: successfull count(1), timeout(5)
mps0: IOCFacts :
MsgVersion: 0x200
HeaderVersion: 0x1b00
IOCNumber: 0
IOCExceptions: 0x0
MaxChainDepth: 128
WhoInit: ROM BIOS
NumberOfPorts: 1
RequestCredit: 10240
ProductID: 0x2214
IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
FWVersion= 15-0-0-0
IOCRequestFrameSize: 32
MaxInitiators: 32
MaxTargets: 1024
MaxSasExpanders: 64
MaxEnclosures: 65
ProtocolFlags: 3<ScsiTarg,ScsiInit>
HighPriorityCredit: 128
MaxReplyDescriptorPostQueueDepth: 65504
ReplyFrameSize: 32
MaxVolumes: 0
MaxDevHandle: 1128
MaxPersistentEntries: 128
mps0: Firmware: 15.00.00.00, Driver: 14.00.00.01-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
And some output from 'gstat -f Z -I 300ms':
dT: 0.302s w: 0.300s filter: Z
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 202 202 25450 2.6 0 0 0.0 25.5| multipath/Z0
1 202 202 25046 6.2 0 0 0.0 36.6| multipath/Z2
7 185 185 23735 6.3 0 0 0.0 33.1| multipath/Z4
0 212 212 27125 5.4 0 0 0.0 30.4| multipath/Z6
0 169 169 21616 5.0 0 0 0.0 28.1| multipath/Z8
0 162 162 20768 5.0 0 0 0.0 25.7| multipath/Z10
0 175 175 22463 6.0 0 0 0.0 30.4| multipath/Z12
0 192 192 24582 4.4 0 0 0.0 32.1| multipath/Z14
2 169 169 21616 3.3 0 0 0.0 18.8| multipath/Z16
4 169 169 20808 4.1 0 0 0.0 23.0| multipath/Z18
2 195 195 24602 4.5 0 0 0.0 28.5| multipath/Z20
5 172 172 22039 4.4 0 0 0.0 22.7| multipath/Z22
0 166 166 21192 3.7 0 0 0.0 20.2| multipath/Z24
7 179 179 22887 5.4 0 0 0.0 27.8| multipath/Z26
7 172 172 22039 3.5 0 0 0.0 23.1| multipath/Z28
0 192 192 24582 3.8 0 0 0.0 25.5| multipath/Z30
1 175 175 22463 6.0 0 0 0.0 30.5| multipath/Z32
1 182 182 22907 3.9 0 0 0.0 25.6| multipath/Z34
0 212 212 27125 6.3 0 0 0.0 32.7| multipath/Z36
0 179 179 22483 4.8 0 0 0.0 27.5| multipath/Z38
2 185 185 23735 4.6 0 0 0.0 30.0| multipath/Z40
0 179 179 22887 4.5 0 0 0.0 28.2| multipath/Z42
3 195 195 25006 4.4 0 0 0.0 32.3| multipath/Z44
3 192 192 24582 4.0 0 0 0.0 30.5| multipath/Z46
0 0 0 0 0.0 0 0 0.0 0.0| multipath/Z48
0 179 179 22887 4.7 0 0 0.0 31.0| multipath/Z1
0 185 185 23331 4.1 0 0 0.0 24.8| multipath/Z3
0 175 175 21639 5.3 0 0 0.0 28.2| multipath/Z5
4 162 162 20768 5.1 0 0 0.0 26.6| multipath/Z7
0 195 195 25006 3.5 0 0 0.0 23.4| multipath/Z9
3 179 179 22887 5.0 0 0 0.0 25.7| multipath/Z11
4 159 159 20344 4.9 0 0 0.0 23.7| multipath/Z13
4 166 166 21192 4.3 0 0 0.0 25.1| multipath/Z15
0 169 169 21616 3.9 0 0 0.0 24.7| multipath/Z17
7 189 189 23334 4.2 0 0 0.0 25.7| multipath/Z19
4 169 169 21212 4.3 0 0 0.0 28.1| multipath/Z21
0 159 159 20344 5.3 0 0 0.0 25.8| multipath/Z23
5 185 185 23316 4.1 0 0 0.0 26.0| multipath/Z25
0 192 192 24582 4.9 0 0 0.0 30.6| multipath/Z27
0 172 172 22039 5.5 0 0 0.0 27.4| multipath/Z29
4 166 166 21192 4.2 0 0 0.0 23.7| multipath/Z31
0 169 169 20778 3.5 0 0 0.0 22.2| multipath/Z33
2 172 172 21232 5.1 0 0 0.0 29.4| multipath/Z35
3 169 169 21616 2.9 0 0 0.0 20.1| multipath/Z37
0 179 179 22887 5.2 0 0 0.0 32.0| multipath/Z39
0 212 212 26721 5.4 0 0 0.0 31.7| multipath/Z41
2 175 175 22463 4.4 0 0 0.0 28.0| multipath/Z43
0 179 179 22887 3.6 0 0 0.0 18.2| multipath/Z45
0 179 179 22887 4.3 0 0 0.0 28.3| multipath/Z47
0 0 0 0 0.0 0 0 0.0 0.0| multipath/Z49
Each individual disk on the system shows a capability of 255 tags:
# camcontrol tags da0 -v
(pass2:mps0:0:10:0): dev_openings 255
(pass2:mps0:0:10:0): dev_active 0
(pass2:mps0:0:10:0): devq_openings 255
(pass2:mps0:0:10:0): devq_queued 0
(pass2:mps0:0:10:0): held 0
(pass2:mps0:0:10:0): mintags 2
(pass2:mps0:0:10:0): maxtags 255
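To see how many requests are actually outstanding per spindle while the
dd runs, a quick loop over the disks should do it:

# for d in /dev/da[0-9]*; do camcontrol tags ${d#/dev/} -v | grep dev_active; done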
zpool:
# zpool status
pool: pool0
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
pool0 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
multipath/Z0 ONLINE 0 0 0
multipath/Z1 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
multipath/Z2 ONLINE 0 0 0
multipath/Z3 ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
multipath/Z4 ONLINE 0 0 0
multipath/Z5 ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
multipath/Z6 ONLINE 0 0 0
multipath/Z7 ONLINE 0 0 0
mirror-4 ONLINE 0 0 0
multipath/Z8 ONLINE 0 0 0
multipath/Z9 ONLINE 0 0 0
mirror-5 ONLINE 0 0 0
multipath/Z10 ONLINE 0 0 0
multipath/Z11 ONLINE 0 0 0
...
mirror-21 ONLINE 0 0 0
multipath/Z42 ONLINE 0 0 0
multipath/Z43 ONLINE 0 0 0
mirror-22 ONLINE 0 0 0
multipath/Z44 ONLINE 0 0 0
multipath/Z45 ONLINE 0 0 0
mirror-23 ONLINE 0 0 0
multipath/Z46 ONLINE 0 0 0
multipath/Z47 ONLINE 0 0 0
spares
multipath/Z48 AVAIL
multipath/Z49 AVAIL
errors: No known data errors