ZFS: reproducible inability to access a pool (process hangs; other pools fine)

Pawel Jakub Dawidek pjd at FreeBSD.org
Wed Nov 7 03:57:36 PST 2007


On Mon, Oct 22, 2007 at 05:35:21PM +0200, Peter Schuller wrote:
> Hello,
> 
> On the same system I recently posted about on -stable, with RELENG_7
> from a few days ago, I am now running a SiL 3114 on a raidz2 in
> degraded mode with one disk missing (it is degraded by design because
> I wanted to create a 5 disk array but only had 4).
> 
> For the purpose of discovering any stability issues with the 3114
> controller I did some stress tests that have yet to reveal controller
> problems, but have triggered what appears to be a ZFS problem.
> 
> Test case:
> 
> /promraid       - root of the pool in question
> /promraid/ports - copy of /usr/ports tree from my machine
> /promraid/1     - empty directory
> /promraid/2     - empty directory
> 
> I now run concurrently in two shells:
> 
> while [ 1 ] ; do rsync -a /promraid/ports /promraid/1/pp ; rm -rf /promraid/1/pp ; done
> 
> and:
> 
> while [ 1 ] ; do rsync -a /promraid/ports /promraid/2/pp ; rm -rf /promraid/2/pp ; done
> 
> This runs fine for some hours, but eventually I end up with hung
> rsyncs in "zfs" state according to top. Attempting to e.g. ls /promraid
> hangs as well. Yet ZFS continues working (another pool is entirely
> fine), and there are no errors in dmesg.
> 
> iostat -x does NOT indicate that it is perpetually waiting on I/O from
> a disk or something like that (0% utilization). The processes are
> unkillable, even by SIGKILL.
> 
> I should have this environment for a few more days, so can hopefully
> reproduce this again. It has happened at least twice already (the
> first time I was in X and X hung; I thought I had a panic so re-ran
> the tests in the console; these two times I didn't get a panic but I
> am unsure whether the failure case is different).
> 
> Does anyone have suggestions for what to do to produce the best
> information possible? Given that there are no errors, no panic, etc.
> 
> One obvious step, I realize, is to ktrace them, if that gives me
> anything (the size of the trace if I were to trace it from the
> beginning would, I suspect, be prohibitive). Will do that next time.

I found a deadlock recently. Can you enter DDB, find the spa_zio_intr_X
threads, run 'tr <pid>' on their PIDs and send me the output?
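
A rough sketch of such a session, assuming the kernel is built with
options KDB and DDB and that you are on the system console or a serial
console (the machine stops while in the debugger). The text in
parentheses is annotation, not part of the commands, and <pid> is a
placeholder for each PID that ps reports:

```
# sysctl debug.kdb.enter=1     (or press Ctrl+Alt+Esc on the console)
db> ps                         (look for the spa_zio_intr_X entries)
db> tr <pid>                   (repeat for every PID found above)
db> c                          (continue, leaving the debugger)
```

A serial console makes it much easier to capture the backtraces as
text instead of copying them by hand.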

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!