Serious Dell Sadness - H200, H700, and H800

Neil Schelly nschelly at dyn.com
Fri Mar 25 12:49:36 UTC 2011


> Neil, you mentioned that there may be a performance hit from the extra
> read operation the patch executes. Does that mean for every single read
> or write operation, there is an extra read operation? Such that the
> number of I/Os to the disk is multiplied by two? Or is it only an extra
> read operation at the end of an interrupt or something (forgive my
> ignorance, I'm not fully versed on how interrupts affect the bus)? If
> the latter, would the performance hit only be like 1-2% in practice? If
> the former, would that mean a 50% performance hit?

Scott's off-the-cuff estimate of the performance hit was 1-5%. Here's his description of what the patch actually does.

> What my patch does is to re-flush the bus at the end of the interrupt
> handler and check for any new command completions that have happened
> while the handler was running. This isn't a perfect solution,
> unfortunately. First, it adds cost through extra PCI bus reads needed
> for the flush. Second, and most importantly, it doesn't completely
> close the race; even after the recheck is complete, an
> interrupt+completion could be transmitted from the controller in
> between the driver doing that re-check and then returning to the OS.
> So a race could still exist, albeit a lot smaller than it was when no
> recheck was done. The only real way to close the race is to have
> interrupt latching work properly so that interrupts don't get lost.
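
To make that description concrete, here is a minimal sketch in C of the recheck idea. It is a toy model, not the actual driver code or Scott's patch: struct fake_ctrl, read_reply_index(), drain(), and fake_intr() are all hypothetical names, and the "bus read" is just a counter. It assumes the flush costs one register read per pass, so the extra cost is paid once per interrupt rather than once per I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a controller reply queue (not real driver state). */
struct fake_ctrl {
    unsigned posted;    /* completions the "hardware" has posted so far */
    unsigned consumed;  /* completions the driver has processed */
    unsigned bus_reads; /* simulated PCI register reads performed */
    unsigned late;      /* completions that arrive while the handler runs */
};

/* Simulated register read: one bus read, which also acts as the flush. */
static unsigned
read_reply_index(struct fake_ctrl *c)
{
    c->bus_reads++;
    return c->posted;
}

/* Process every completion visible after one flush; return how many. */
static unsigned
drain(struct fake_ctrl *c)
{
    unsigned n = 0;
    unsigned idx = read_reply_index(c);

    while (c->consumed < idx) {
        c->consumed++;
        n++;
    }
    return n;
}

/*
 * Toy interrupt handler. With recheck set, it flushes and drains a second
 * time before returning, picking up completions whose interrupt was lost
 * while the first pass was running.
 */
unsigned
fake_intr(struct fake_ctrl *c, bool recheck)
{
    assert(c != NULL);

    unsigned handled = drain(c);

    /* Model completions landing mid-handler with no new interrupt. */
    c->posted += c->late;
    c->late = 0;

    if (recheck)
        handled += drain(c);
    return handled;
}
```

In this model, with three completions pending and one arriving mid-handler, fake_intr(&c, false) handles three with one bus read and leaves one stranded, while fake_intr(&c, true) handles all four at the cost of a second bus read. A completion that arrives after the second drain() returns would still be stranded, which is the remaining race Scott describes.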

Ultimately, it appears that the controller firmware's PCI emulation doesn't handle interrupt latching properly, causing lost interrupts. I suspect most implementations of this driver on other operating systems use MSI to request PCI Express semantics, and that the firmware has been more thoroughly tested with the edge-triggered interrupts used there. While I don't doubt that this patch could go into the driver code and make it a better driver, it's worth mentioning that the "right" way to fix it may be to switch to the more robust and better-performing PCI Express semantics.

--
Neil Schelly
Director of Uptime
Dynamic Network Services, Inc.
W: 603-296-1581
M: 508-410-4776
http://www.dyndns.com


More information about the freebsd-scsi mailing list