mps driver chain_alloc_fail / performance ?
Desai, Kashyap
Kashyap.Desai at lsi.com
Wed Jan 18 04:30:11 UTC 2012
> -----Original Message-----
> From: John [mailto:jwd at freebsd.org]
> Sent: Tuesday, January 17, 2012 7:32 AM
> To: Desai, Kashyap; Kenneth D. Merry
> Cc: freebsd-scsi at freebsd.org
> Subject: Re: mps driver chain_alloc_fail / performance ?
>
> ----- Desai, Kashyap's Original Message -----
> > Which driver version is this? Our 09.00.00.00 driver (which is in the
> > pipeline to be committed) has a 2048 chain buffer count.
>
> I'm not sure how to answer your question directly. We're using the driver
> that comes with FreeBSD, not a driver directly from LSI. If we can get a
> copy of your 9.0 driver we can try testing against it.
If you type "sysctl -a | grep mps" you can see the driver version.
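For reference, the same hw.mps.* counters quoted later in this thread can
also be read from a small userland program via sysctlbyname(3). A minimal
sketch follows; the OID name is taken from the sysctl output below, and the
value width is probed at run time because the counter's exact type is an
assumption here:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
        /* OID taken from the hw.mps.2.* output quoted later in this thread. */
        const char *oid = "hw.mps.2.chain_alloc_fail";
        uint64_t val = 0;
        size_t len = sizeof(val);

        if (sysctlbyname(oid, &val, &len, NULL, 0) == -1) {
                perror(oid);
                return (1);
        }
        /* The kernel reports the actual width; handle 32- or 64-bit counters. */
        if (len == sizeof(uint32_t)) {
                uint32_t v32;

                memcpy(&v32, &val, sizeof(v32));
                printf("%s: %u\n", oid, v32);
        } else {
                printf("%s: %ju\n", oid, (uintmax_t)val);
        }
        return (0);
}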
>
> > And our test team has verified it with 150+ drives.
>
> Currently, we have 8 shelves, 25 drives per shelf, dual-attached and
> configured with geom multipath in Active/Active mode. Ignoring SSDs and
> OS disks on the internal card, we see 400 da devices on mps1 & mps2.
> For the record, the shelves are:
>
> ses0 at mps1 bus 0 scbus7 target 0 lun 0
> ses0: <HP D2700 SAS AJ941A 0131> Fixed Enclosure Services SCSI-5 device
> ses0: 600.000MB/s transfers
> ses0: Command Queueing enabled
> ses0: SCSI-3 SES Device
>
>
> > As suggested by Ken, can you try increasing MPS_CHAIN_FRAMES to 4096
> > or 2048?
>
> Absolutely. The current value is 2048. We are currently running with
> this patch to increase the value and emit a single warning message:
>
> --- sys/dev/mps/mpsvar.h.orig 2012-01-15 19:28:51.000000000 -0500
> +++ sys/dev/mps/mpsvar.h 2012-01-15 20:14:07.000000000 -0500
> @@ -34,7 +34,7 @@
> #define MPS_REQ_FRAMES 1024
> #define MPS_EVT_REPLY_FRAMES 32
> #define MPS_REPLY_FRAMES MPS_REQ_FRAMES
> -#define MPS_CHAIN_FRAMES 2048
> +#define MPS_CHAIN_FRAMES 4096
> #define MPS_SENSE_LEN SSD_FULL_SIZE
> #define MPS_MSI_COUNT 1
> #define MPS_SGE64_SIZE 12
> @@ -242,8 +242,11 @@
> sc->chain_free--;
> if (sc->chain_free < sc->chain_free_lowwater)
> sc->chain_free_lowwater = sc->chain_free;
> - } else
> + } else {
> sc->chain_alloc_fail++;
> + if (sc->chain_alloc_fail == 1)
> + device_printf(sc->mps_dev, "Insufficient chain_list buffers.\n");
> + }
> return (chain);
> }
>
>
> If the logic for outputting the message is appropriate, I think
> it would be nice to get it committed.
If this works for you and you really want to commit it, I would suggest adding a module parameter to pass the chain_max value.
Basically, the current implementation is not the correct way to handle the out-of-chain scenario.
The driver should calculate the maximum number of chain buffers required per HBA at run time from the IOC Facts reply from the firmware, and it should try to allocate
that many chain buffers at run time (instead of having a #define for the chain maximum).
If the driver cannot allocate that memory from the system at run time, it should fail to attach the HBA at load time.
From our Linux driver logs, I found that we need 29700 chain buffers per HBA (SAS2008 PCI-Express).
So it is better to increase MPS_CHAIN_FRAMES to (24 * 1024) until we have more robust support in the driver.
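As a rough sketch of the module-parameter idea (the "hw.mps.max_chains"
tunable name and the sc->max_chains field are only illustrative assumptions,
not an existing interface), the attach path could fetch a loader tunable
instead of relying on the #define:

/*
 * Sketch only: make the chain frame count a loader tunable rather than a
 * compile-time constant.  TUNABLE_INT_FETCH() comes from <sys/kernel.h>
 * and expects an int, so sc->max_chains is assumed to be an int here.
 */
static void
mps_get_tunables(struct mps_softc *sc)
{
        /* Start from the current compile-time default. */
        sc->max_chains = MPS_CHAIN_FRAMES;

        /* Allow e.g. hw.mps.max_chains="4096" in /boot/loader.conf. */
        TUNABLE_INT_FETCH("hw.mps.max_chains", &sc->max_chains);
}

The chain allocation loop would then use sc->max_chains instead of
MPS_CHAIN_FRAMES, and attach could fail cleanly if that many buffers cannot
be allocated, as described above.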
Hope this helps you.
~ Kashyap
>
> > ~ Kashyap
> >
> > > Kenneth D. Merry said:
> > >
> > > The firmware on those boards is a little old. You might consider
> > > upgrading.
>
> We updated the FW this morning and we're now showing:
>
> mps0: <LSI SAS2116> port 0x5000-0x50ff mem 0xf5ff0000-0xf5ff3fff,0xf5f80000-0xf5fbffff irq 30 at device 0.0 on pci13
> mps0: Firmware: 12.00.00.00
> mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
> mps1: <LSI SAS2116> port 0x7000-0x70ff mem 0xfbef0000-0xfbef3fff,0xfbe80000-0xfbebffff irq 48 at device 0.0 on pci33
> mps1: Firmware: 12.00.00.00
> mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
> mps2: <LSI SAS2116> port 0x6000-0x60ff mem 0xfbcf0000-0xfbcf3fff,0xfbc80000-0xfbcbffff irq 56 at device 0.0 on pci27
> mps2: Firmware: 12.00.00.00
> mps2: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
>
> We last updated around November of last year.
>
> > > > # camcontrol inquiry da10
> > > > pass21: <HP EG0600FBLSH HPD2> Fixed Direct Access SCSI-5 device
> > > > pass21: Serial Number 6XR14KYV0000B148LDKM
> > > > pass21: 600.000MB/s transfers, Command Queueing Enabled
> > >
> > > That's a lot of drives! I've only run up to 60 drives.
>
> See above. In general, I'm relatively pleased with how the system
> responds with all these drives.
>
> > > > When running the system under load, I see the following reported:
> > > >
> > > > hw.mps.2.allow_multiple_tm_cmds: 0
> > > > hw.mps.2.io_cmds_active: 0
> > > > hw.mps.2.io_cmds_highwater: 1019
> > > > hw.mps.2.chain_free: 2048
> > > > hw.mps.2.chain_free_lowwater: 0
> > > > hw.mps.2.chain_alloc_fail: 13307 <---- ??
>
> The current test case run is showing:
>
> hw.mps.2.debug_level: 0
> hw.mps.2.allow_multiple_tm_cmds: 0
> hw.mps.2.io_cmds_active: 109
> hw.mps.2.io_cmds_highwater: 1019
> hw.mps.2.chain_free: 4042
> hw.mps.2.chain_free_lowwater: 3597
> hw.mps.2.chain_alloc_fail: 0
>
> It may be a few hours before it progresses to the point where it
> ran low last time.
>
> > > Bump MPS_CHAIN_FRAMES to something larger. You can try 4096 and see
> > > what happens.
>
> Agreed. Let me know if you think there is anything we should add to
> the patch above.
>
> > > > A few layers up, it seems like it would be nice if buffer
> > > > exhaustion were reported even without debugging enabled... at least
> > > > maybe the first time.
> > >
> > > It used to report being out of chain frames every time it happened,
> > > which wound up being too much. You're right, doing it once might be
> > > good.
>
> Thanks, that's how I tried to put the patch together.
>
> > > Once you bump up the number of chain frames to the point where you
> > > aren't running out, I doubt the driver will be the big bottleneck.
> > > It'll probably be other things higher up the stack.
>
> Question. What "should" the layer of code above the mps driver do if the
> driver
> returns ENOBUFS? I'm wondering if it might explain some incorrect
> results.
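The usual CAM answer, as far as I understand it, is that the SIM should not
let a transient resource shortage surface as a hard error: it freezes its
queue and completes the CCB with CAM_REQUEUE_REQ so CAM retries the request
once the queue is released. A minimal sketch of that generic pattern follows;
whether the mps driver does exactly this, and the sc->sassc->sim field used
here, are assumptions:

/*
 * Generic CAM SIM pattern for temporary resource exhaustion (sketch).
 * Instead of failing the I/O, hold further requests and ask CAM to
 * requeue this one; release the queue again when chain frames are freed
 * via xpt_release_simq().
 */
static void
example_start_io(struct mps_softc *sc, union ccb *ccb)
{
        struct mps_chain *chain;

        chain = mps_alloc_chain(sc);
        if (chain == NULL) {
                /* Out of chain frames: freeze the SIM queue and retry later. */
                xpt_freeze_simq(sc->sassc->sim, 1);
                ccb->ccb_h.status = CAM_REQUEUE_REQ;
                xpt_done(ccb);
                return;
        }
        /* ... build the scatter/gather list and issue the command ... */
}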
>
> > > What sort of ZFS topology did you try?
> > >
> > > I know for raidz2, and perhaps for raidz, ZFS is faster if your
> > > number of data disks is a power of 2.
> > >
> > > If you want raidz2 protection, try creating arrays in groups of 10,
> > > so you wind up having 8 data disks.
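(Concretely: with ZFS's default 128 KiB recordsize, a 10-disk raidz2 vdev
has 8 data disks, so a full-stripe write splits evenly into 16 KiB per data
disk; with a non-power-of-two data-disk count the record does not divide
evenly, which is the usual explanation for the performance difference.)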
>
> The fastest we've seen is with a pool made of mirrors, though this uses
> up the most space. It also caused the most alloc fails (and leads to my
> question about ENOBUFS).
>
> Thank you both for your help. Any comments are always welcome! If I
> haven't
> answered a question, or otherwise said something that doesn't make
> sense, let me know.
>
> Thanks,
> John