Performance question - istgt with dual 10g data links to linux client
John
jwd at FreeBSD.ORG
Tue Sep 25 19:33:09 UTC 2012
----- Gary Palmer's Original Message -----
> On Tue, Sep 25, 2012 at 03:09:53AM +0000, John wrote:
> > Hi Folks,
> >
> > I have a FreeBSD 9.1 ZFS server running the latest istgt, connected to a
> > RHEL 6.1 system. Regardless of how I configure the systems, I cannot seem
> > to exceed 1 GB/s of throughput. If I create a 25G /dev/md0, export it via
> > istgt (no mpio here), format it with default xfs values, and place a
> > 20G file on it, I get the following:
> >
> > dd if=/usr2/20g of=/dev/null bs=512K
> > 40960+0 records in
> > 40960+0 records out
> > 21474836480 bytes (21 GB) copied, 21.4256 s, 1.0 GB/s
> >
> > Running the above /dev/md0 with mpio, dual paths on 10G cards, with
> > rr_min_io set anywhere from 1 to 100 on the Linux side:
> >
> > [PortalGroup2]
> > Comment "Two networks - one port"
> > Portal DA1 10.59.6.14:5020 # 10G mtu 9000
> > Portal DA2 10.60.6.14:5020 # 10G mtu 9000
> > Comment "END: PortalGroup2"
> >
> > mpatha (33000000051ed39a4) dm-0 FreeBSD,USE136EXHF_iSCSI
> > size=25G features='0' hwhandler='0' wp=rw
> > `-+- policy='round-robin 0' prio=1 status=active
> > |- 11:0:0:0 sdd 8:48 active ready running
> > `- 12:0:0:0 sde 8:64 active ready running
> >
> >
> > dd if=/usr2/20g of=/dev/null bs=1M
> > 20480+0 records in
> > 20480+0 records out
> > 21474836480 bytes (21 GB) copied, 20.0076 s, 1.1 GB/s
> >
> > I can see the traffic spread evenly across both interfaces; I simply
> > can't seem to get the parallelization factor up. Higher rr_min_io
> > values have no effect.
> >
> > I realize I haven't included the entire configuration; I'm hoping
> > someone can offer some high-level thoughts. I do need to maximize
> > single-process, large-file I/O.
>
> Have you tried doing two dd's at the same time when you have mpio running?
> I'd be curious to know if you get double the throughput or if each dd
> gets 0.5GB/s
>
Hi Gary,
Not what I expected... half the bandwidth to each. I had to increase
the md0 size to place two different 20G files on the server; otherwise
the local system cache satisfied the read for the trailing dd.
Below are three test runs: the first is a single linear read, the other
two run in parallel.
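For reference, the runs were driven roughly like this (a sketch; the
name of the second file is illustrative, not the exact path I used):

  # single stream, stats captured in /tmp/dd.log.p1.1
  dd if=/usr2/20g of=/dev/null bs=512K > /tmp/dd.log.p1.1 2>&1

  # two streams in parallel, one per file, stats in /tmp/dd.log.p2.*
  dd if=/usr2/20g  of=/dev/null bs=512K > /tmp/dd.log.p2.1 2>&1 &
  dd if=/usr2/20gb of=/dev/null bs=512K > /tmp/dd.log.p2.2 2>&1 &
  wait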
/tmp/dd.log.p1.1
40960+0 records in
40960+0 records out
21474836480 bytes (21 GB) copied, 20.4676 s, 1.0 GB/s
p1p1 Link encap:Ethernet HWaddr 00:07:43:09:16:AE
inet addr:10.60.25.1 Bcast:10.255.255.255 Mask:255.255.0.0
inet6 addr: fe80::207:43ff:fe09:16ae/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:1527823 errors:0 dropped:0 overruns:0 frame:0
TX packets:771887 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:2000
RX bytes:10847808152 (10.1 GiB) TX bytes:57137678 (54.4 MiB)
Interrupt:38 Memory:df1fe000-df1fefff
p2p1 Link encap:Ethernet HWaddr 00:07:43:08:10:66
inet addr:10.59.25.1 Bcast:10.255.255.255 Mask:255.255.0.0
inet6 addr: fe80::207:43ff:fe08:1066/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:1531304 errors:0 dropped:0 overruns:0 frame:0
TX packets:773255 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:2000
RX bytes:10848499338 (10.1 GiB) TX bytes:57238014 (54.5 MiB)
Interrupt:40 Memory:df2fe000-df2fefff
Note that the two 10G (Chelsio) cards show equal traffic.
Now the two runs in parallel:
/tmp/dd.log.p2.1
40960+0 records in
40960+0 records out
21474836480 bytes (21 GB) copied, 38.0411 s, 565 MB/s
/tmp/dd.log.p2.2
40960+0 records in
40960+0 records out
21474836480 bytes (21 GB) copied, 37.9768 s, 565 MB/s
p1p1 Link encap:Ethernet HWaddr 00:07:43:09:16:AE
inet addr:10.60.25.1 Bcast:10.255.255.255 Mask:255.255.0.0
inet6 addr: fe80::207:43ff:fe09:16ae/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:4592000 errors:0 dropped:0 overruns:0 frame:0
TX packets:2319975 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:2000
RX bytes:32541584154 (30.3 GiB) TX bytes:168531554 (160.7 MiB)
Interrupt:38 Memory:df1fe000-df1fefff
p2p1 Link encap:Ethernet HWaddr 00:07:43:08:10:66
inet addr:10.59.25.1 Bcast:10.255.255.255 Mask:255.255.0.0
inet6 addr: fe80::207:43ff:fe08:1066/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:4596909 errors:0 dropped:0 overruns:0 frame:0
TX packets:2320666 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:2000
RX bytes:32542835932 (30.3 GiB) TX bytes:168585152 (160.7 MiB)
Interrupt:40 Memory:df2fe000-df2fefff
OK. Now create two md devices, md0 and md1, place a 20G file on each
device, and export them via iSCSI.
# mdconfig -l -v
md0 malloc 23G
md1 malloc 23G
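The istgt side exports them as two separate targets; the LogicalUnit
sections look roughly like this (a sketch with illustrative target and
initiator-group names, not copied verbatim from my istgt.conf):

[LogicalUnit1]
  Comment "md0 export"
  TargetName md0-disk
  Mapping PortalGroup2 InitiatorGroup1
  UnitType Disk
  LUN0 Storage /dev/md0 Auto

[LogicalUnit2]
  Comment "md1 export"
  TargetName md1-disk
  Mapping PortalGroup2 InitiatorGroup1
  UnitType Disk
  LUN0 Storage /dev/md1 Auto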
They show up on the Linux system as two separate multipath devices:
mpathb (3300000008aa9becd) dm-1 FreeBSD,USE136EXHF_iSCSI
size=23G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
|- 17:0:0:0 sde 8:64 active undef running
`- 18:0:0:0 sdg 8:96 active undef running
mpatha (33000000051ed39a4) dm-0 FreeBSD,USE136EXHF_iSCSI
size=23G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
|- 15:0:0:0 sdd 8:48 active undef running
`- 16:0:0:0 sdf 8:80 active undef running
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/mpatha 23G 21G 3.0G 88% /usr2
/dev/mapper/mpathb 23G 21G 3.0G 88% /usr3
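For completeness, the round-robin policy above comes from
/etc/multipath.conf on the RHEL box; the relevant stanza is roughly the
following (illustrative values; rr_min_io is the knob I was varying
between 1 and 100):

devices {
        device {
                vendor                "FreeBSD"
                product               "USE136EXHF_iSCSI"
                path_grouping_policy  multibus
                path_selector         "round-robin 0"
                rr_min_io             100
        }
}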
Run a dd to each target in parallel, reading back the 20G file on each:
dd if=/usr2/20g of=/dev/null bs=512K
40960+0 records in
40960+0 records out
21474836480 bytes (21 GB) copied, 25.0937 s, 856 MB/s
dd if=/usr3/20g of=/dev/null bs=512K
40960+0 records in
40960+0 records out
21474836480 bytes (21 GB) copied, 24.8599 s, 864 MB/s
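Both reads were started together so the two streams overlap; roughly:

  dd if=/usr2/20g of=/dev/null bs=512K &
  dd if=/usr3/20g of=/dev/null bs=512K &
  wait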
More along the lines of what I expected: about 1.7 GB/s total coming
from the server.
Which leaves me with the question: Is istgt or multipath
the bottleneck?
Thoughts?
Thanks!
John