[PATCH] Possible fix to recent data corruption on HEAD since USB2
John Baldwin
jhb at freebsd.org
Thu Apr 16 20:03:25 UTC 2009
On Thursday 16 April 2009 2:47:38 pm Alexey Shuvaev wrote:
> On Thu, Apr 16, 2009 at 01:36:18PM -0400, John Baldwin wrote:
> > Due to some good sleuthing by avg@, there is a patch that might fix the recent
> > reports of data corruption on current. It would explain some of the recent
> > reports where a file that was read would have missing gaps of bytes. The
> > problem is with the BUS_DMA_KEEP_PG_OFFSET changes to bus_dma. When a bounce
> > page was used by USB2, the changes to bus_dma would actually change the
> > starting virtual and physical addresses of the bounce page. When the bounce
> > page was no longer needed it was left in this bogus state. Later if another
> > device used the same bounce page for DMA it would use the wrong offset and
> > address. The issue there is if the second device was doing a full page of
> > I/O. In that case the DMA from the device would actually spill over into the
> > next page which could in theory be used by another DMA request. It could
> > also break alignment assumptions (since the previous PG_OFFSET may not be
> > aligned and the bus_dma code assumes bounce pages for the !PG_OFFSET case are
> > page aligned). The quick fix is to always restore the bounce page to the
> > normal state when a PG_OFFSET DMA request is finished. I'd actually prefer
> > not ever touching the page's starting addresses, but those changes would be
> > more invasive I believe.
> >
> > http://www.FreeBSD.org/~jhb/patches/dma_sg.patch
> >
> Am I right that the hardware prerequisite for observing these problems
> is amd64 + 4GB or more of RAM?
Well, i386 with PAE would do it as well. Basically, you need USB plus one other
device, both using bounce pages, and the other device ends up with corruption.
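
The failure mode described in the original message can be sketched as a small
userland model. This is only an illustration under stated assumptions, not the
kernel code or the patch itself: struct bounce_page here is a simplified
stand-in, bounce_alloc(), bounce_free() and spill_after_reuse() are
hypothetical names, and the addresses are made up. A first, USB-like client
takes the page with its page offset kept, the page is returned either with or
without its start addresses restored, and a second client then does a full
page of I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (PAGE_SIZE - 1u)

/* Simplified stand-in for a bounce page (assumption, not the kernel struct). */
struct bounce_page {
    uintptr_t vaddr;    /* start of the bounce buffer, virtual */
    uintptr_t busaddr;  /* start of the bounce buffer, physical */
};

/* Hand the page to a client; a keep-offset request shifts both start addresses. */
static void
bounce_alloc(struct bounce_page *bp, uintptr_t client_addr, int keep_pg_offset)
{
    if (keep_pg_offset) {
        bp->vaddr |= client_addr & PAGE_MASK;
        bp->busaddr |= client_addr & PAGE_MASK;
    }
}

/* Return the page to the pool; restore != 0 models the quick fix. */
static void
bounce_free(struct bounce_page *bp, int restore)
{
    if (restore) {
        bp->vaddr &= ~(uintptr_t)PAGE_MASK;
        bp->busaddr &= ~(uintptr_t)PAGE_MASK;
    }
}

/*
 * Bytes by which a full-page transfer from the second device runs past the
 * end of the bounce page after the first device has used and freed it.
 */
size_t
spill_after_reuse(int restore)
{
    const uintptr_t page_base = 0x200000;   /* made-up, page-aligned address */
    struct bounce_page bp = { page_base, page_base };

    assert((page_base & PAGE_MASK) == 0);

    /* First client: keeps its page offset (0x200 in this example). */
    bounce_alloc(&bp, (uintptr_t)0x12345200, 1);
    bounce_free(&bp, restore);

    /* Second client: ordinary request for a full page of I/O. */
    bounce_alloc(&bp, (uintptr_t)0x0abcd000, 0);

    uintptr_t dma_end = bp.busaddr + PAGE_SIZE;
    uintptr_t page_end = page_base + PAGE_SIZE;
    return dma_end > page_end ? (size_t)(dma_end - page_end) : 0;
}
```

In this model spill_after_reuse(0) returns 512, the stale 0x200 offset left by
the first client, while spill_after_reuse(1) returns 0 because the start
addresses are page aligned again before the page is reused.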
> Is it possible to fabricate some (artificial) test case to stress this
> particular situation (interleaved use of bounce pages by USB and some other
> device (?HDD?))?
I haven't constructed one though it might be possible to do so.
> Asking because, as I understand it, the data corruption is silent
> and the affected consumer (of bounce pages) should have some mechanism
> of detecting this (e.g. zfs' CRCs).
> In my case stress testing an unpatched system until the UFS filesystems
> are dead is no fun...
Understood. I know some other folks are going to test this and if there is
early success that may make the risk easier to take.
--
John Baldwin