[PATCH] Possible fix to recent data corruption on HEAD since USB2
Robert Noland
rnoland at FreeBSD.org
Fri Apr 17 19:51:08 UTC 2009
On Fri, 2009-04-17 at 06:36 -0400, Damian Gerow wrote:
> Scott Long wrote:
> : John Baldwin wrote:
> : > On Thursday 16 April 2009 2:47:38 pm Alexey Shuvaev wrote:
> : >> On Thu, Apr 16, 2009 at 01:36:18PM -0400, John Baldwin wrote:
> : >>> Due to some good sleuthing by avg@, there is a patch that might fix the
> : >>> recent reports of data corruption on current. It would explain some of the recent
> : >>> reports where a file that was read would have missing gaps of bytes. The
> : >>> problem is with the BUS_DMA_KEEP_PG_OFFSET changes to bus_dma. When a bounce
> : >>> page was used by USB2, the changes to bus_dma would actually change the
> : >>> starting virtual and physical addresses of the bounce page. When the bounce
> : >>> page was no longer needed it was left in this bogus state. Later if another
> : >>> device used the same bounce page for DMA it would use the wrong offset and
> : >>> address. The issue there is if the second device was doing a full page of
> : >>> I/O. In that case the DMA from the device would actually spill over into the
> : >>> next page which could in theory be used by another DMA request. It could
> : >>> also break alignment assumptions (since the previous PG_OFFSET may not be
> : >>> aligned and the bus_dma code assumes bounce pages for the !PG_OFFSET case are
> : >>> page aligned). The quick fix is to always restore the bounce page to the
> : >>> normal state when a PG_OFFSET DMA request is finished. I'd actually prefer
> : >>> not ever touching the page's starting addresses, but those changes would be
> : >>> more invasive I believe.
> : >>>
> : >>> http://www.FreeBSD.org/~jhb/patches/dma_sg.patch
> : >>>
> : >> Am I right that the hardware prerequisite for observing these problems
> : >> is amd64 + 4GB or more of RAM?
> : >
> : > Well, i386 with PAE would do it as well. Basically, you need USB + one other
> : > device that use bounce pages and the other device ends up with corruption.
> : >
> : >> Is it possible to fabricate some (artificial) test case to stress this
> : >> particular situation (interleaved use of bounce pages by USB and some other
> : >> device (?HDD?))?
> : >
> : > I haven't constructed one though it might be possible to do so.
> : >
> : >> Asking because, as I understand it, the data corruption is silent
> : >> and the affected consumer (of bounce pages) would need some mechanism
> : >> for detecting it (e.g. ZFS's CRCs).
> : >> In my case, stress testing an unpatched system until the UFS filesystems
> : >> are dead is no fun...
> : >
> : > Understood. I know some other folks are going to test this and if there is
> : > early success that may make the risk easier to take.
> : >
> :
> : I have pretty high confidence that John and Andriy found the problem and
> : fixed it with this patch. It'll be good to get it tested, but I think
> : that the risk to testers will be pretty low.
>
> Having been running the patch for sixteen hours now, I can safely say that
> it fixes my issues.
I think I agree... I crashed my amd64 box a few times last night
and haven't had massive damage, which is refreshing... I haven't been
brave enough to panic with more than a USB keyboard, though...
robert.
> - Damian
> _______________________________________________
> freebsd-current at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe at freebsd.org"
--
Robert Noland <rnoland at FreeBSD.org>
FreeBSD