ZFS checksum errors on umass(4) insertion
John Baldwin
jhb at freebsd.org
Thu Apr 16 20:26:23 UTC 2009
On Thursday 16 April 2009 2:36:48 pm Richard Todd wrote:
> Damian Gerow <dgerow at afflictions.org> writes:
> > 1) Reverting the extended attribute locking change (r189967) does not change
> > the situation for me. I still experience checksum issues and data loss.
> > (Unsurprisingly.)
> >
> > 2) Without umass loaded, I have been completely unable to trigger the issue.
> >
> > 3) Once umass is loaded, and the symptoms start cropping up, unloading umass
> > does not make them go away (again, unsurprisingly). What I haven't yet
> > tested, but am currently working towards, is whether removing umass stops
> > further checksum errors from ocurring.
> >
> > 4) r189967 does remove some LORs for me, even though I don't use (that I
> > know of) extended attributes.
> >
> > 5) It seems that so long as umass is used at all, the symptoms will
> > eventually show up. I've been able to trigger the symptoms by inserting
> > then removing a umass device immediately after boot, then ramping up the
> > workload.
> >
> > 6) The only difference made by vfs.zfs.debug=1 is that zfs reclaims are
> > logged.
> >
> > I'm at a bit of a loss as to what to test next, other than checking for an
> > increased number of checksum errors after unloading umass. However, I'm not
> > convinced this is going to highlight the actual problem. I'm all ears as to
> > what to test for at this point, as I'm running out of ideas.
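
A quick way to compare checksum counts before and after unloading umass is to
watch the CKSUM column reported by zpool status; a rough sketch, with "tank"
standing in for the actual pool name:

  # record the current per-vdev error counters (CKSUM column)
  zpool status tank
  # optionally reset the counters so any new errors stand out
  zpool clear tank
  # unload the USB mass storage driver, then continue the workload
  kldunload umass
  # ...later, compare against the earlier output...
  zpool status tank
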
>
> I have a question or two, and an idea.
>
> The questions:
>
> 1) How much RAM do you have? Is it 4G or more? (I'm guessing the
> answer is "yes".)
>
> 2) What does "sysctl -a | grep bounced" say? Check this both before and after
> loading umass and seeing the bug triggered.
>
> My idea: I suspect a bug in the bounce-buffer code that does I/O to memory
> space beyond the area a given piece of hardware can access directly through
> DMA.
> I've had some similar issues with checksum errors, and they seem to have gone
> away since lowering hw.physmem to 3400M in loader.conf, which cuts memory
> usage down below the point where anything needs to use bounce buffers.
> You might try lowering hw.physmem and see if that helps; check with the
> "sysctl -a | grep bounced" command; you should see something like
>
> hw.busdma.zone0.total_bounced: 0
> hw.busdma.zone1.total_bounced: 0
> hw.busdma.zone2.total_bounced: 0
>
> if no bounce-buffer usage is going on. (The number of zones may be different
> on your system.)
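
For reference, the workaround described above amounts to capping usable RAM in
/boot/loader.conf and then confirming that the bounce counters stay at zero; a
minimal sketch using Richard's 3400M figure:

  # /boot/loader.conf -- keep physical memory below the device's DMA limit
  hw.physmem="3400M"

  # after rebooting, the total_bounced counters should remain 0:
  sysctl -a | grep bounced
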
Can you please try http://www.FreeBSD.org/~jhb/patches/dma_pg.patch? This
lines up with your analysis in that it fixes a problem in the bounce buffer
code that was introduced with the new USB stack (and only triggers when the
USB code has to use a bounce buffer).
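
In case it helps, the usual steps to test a patch like this are to apply it to
/usr/src and rebuild the kernel; a rough sketch, where the -p level and the
GENERIC kernel config name are assumptions to adjust for your setup:

  cd /usr/src
  fetch http://www.FreeBSD.org/~jhb/patches/dma_pg.patch
  patch -p0 < dma_pg.patch   # adjust -p to match the paths inside the patch
  make buildkernel KERNCONF=GENERIC
  make installkernel KERNCONF=GENERIC
  shutdown -r now
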
--
John Baldwin