kern/137037: [zfs] [hang] zfs rollback on root causes FreeBSD
to freeze in few seconds
Peter Much
pmc at citylink.dinoex.sub.org
Tue Aug 11 08:20:07 UTC 2009
The following reply was made to PR kern/137037; it has been noted by GNATS.
From: Peter Much <pmc at citylink.dinoex.sub.org>
To: bug-followup at FreeBSD.org, killasmurf86 at gmail.com
Cc:
Subject: Re: kern/137037: [zfs] [hang] zfs rollback on root causes FreeBSD to freeze in few seconds
Date: Tue, 11 Aug 2009 09:34:10 +0200
I had intended to investigate further before reporting my issue,
but after seeing this bug report I think an interim report from
my side can do no harm.
I also experience system failures after rollback. The significant
similarity is that in my case, too, the rollbacks succeed and the
system continues to work for some seconds (or sometimes even longer)
before it fails.
The failure is either (seldom) a system freeze or (much more often)
an instantaneous reboot without a dump. I am currently looking into
ways to capture some useful data. Maybe, if it freezes,
running "watchdog" can trick it into doing a dump...
I am running 7.2-STABLE as of mid-July (that is ZFS V13).
I admit I am somewhat low on memory for running ZFS (more memory is
on order ;) ), but I use it only for a very limited number of
filesystems and specific tasks, and I am keeping a careful watch on
my memory usage. Nevertheless, if the system ran out of memory,
I would expect an orderly panic and not a hard reset or freeze.
I am not using geli or anything like it, and I am not working with
the root filesystem; what I am doing is mainly making extensive use
of the rollback feature, from a script, like this:
while <some stuff>
do
    zfs mount jb/x
    mount -t zfs jb/p /jb/x/p
    ... do some work ...
    umount /jb/x/p
    umount /jb/x
    zfs rollback jb/x@base
    zfs rollback jb/p@base
done
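For illustration only, one iteration of the loop above could be wrapped
in a function that checks every exit status and refuses to roll back
when an unmount fails. This is a minimal sketch and not a fix for the
crashes: the function name rollback_cycle, the RUN variable and the
return codes are hypothetical, and the snapshot name "base" is assumed
to be written as jb/x@base and jb/p@base.

```shell
#!/bin/sh
# Hypothetical sketch; rollback_cycle and RUN are illustrative names.
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN=${RUN:-}

rollback_cycle() {
    # Mount both datasets; stop at the first failure.
    $RUN zfs mount jb/x || return 1
    $RUN mount -t zfs jb/p /jb/x/p || return 1

    # ... do some work ...

    # Unmount in reverse order; do not roll back if an unmount fails.
    $RUN umount /jb/x/p || { echo "umount /jb/x/p failed" >&2; return 2; }
    $RUN umount /jb/x || { echo "umount /jb/x failed" >&2; return 2; }

    # Roll both datasets back to the (assumed) snapshot "base".
    $RUN zfs rollback jb/x@base || return 3
    $RUN zfs rollback jb/p@base || return 3
}
```

Calling rollback_cycle with RUN=echo only prints the six commands, which
allows checking the sequence without touching the pool. A non-zero
return value tells the calling loop at which stage the iteration
stopped.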
At first I tried this without the unmounting, but the crashes
were so reproducible that I considered that approach unworkable. With
the unmounting it looked functional at first, but now I also experience
crashes about every 12 hours.
Beware: this is an interim report; I have not yet extensively
checked for possible mistakes of my own. Take it with
the appropriate grain of salt. ;)
------------------------------
Update: I was able to obtain a dump. After running the above loop
intensively while staying on the console, the system suddenly
started to wreak havoc, reporting that it was not able to unmount the
filesystems or could not find them (something I had also seen
occasionally before), and then dropped me into the debugger at
_sx_xlock+0x16 lock cmpxchgl %edx,0x10(%ecx)
The backtrace is attached below, but beware: since the havoc had
already started earlier, it will very likely NOT point to the
root cause of the problem. But maybe it gives a first impression.
I suppose this should be reproducible, and in any case I would be glad
to provide further data (or do further tests) if requested.
And as said before: if this is a result of low memory, then I am
just sorry. ;)
Ah, btw, it's a dual Pentium 3 SMP machine.
(gdb) add-symbol-file /usr/src/sys/i386/compile/D1R72V1/modules/usr/src/sys/modules/zfs/zfs.ko 0xc0a59860
add symbol table from file "/usr/src/sys/i386/compile/D1R72V1/modules/usr/src/sys/modules/zfs/zfs.ko" at
.text_addr = 0xc0a59860
(gdb) bt
#0 doadump () at pcpu.h:196
#1 0xc05e8be6 in boot (howto=260) at ../../../kern/kern_shutdown.c:418
#2 0xc05e8f07 in panic (fmt=Variable "fmt" is not available.
) at ../../../kern/kern_shutdown.c:574
#3 0xc046ed77 in db_panic (addr=Could not find the frame base for "db_panic".
) at ../../../ddb/db_command.c:446
#4 0xc046f52a in db_command (last_cmdp=0xc0932a54, cmd_table=0x0, dopager=1) at ../../../ddb/db_command.c:413
#5 0xc046f645 in db_command_loop () at ../../../ddb/db_command.c:466
#6 0xc047117c in db_trap (type=12, code=0) at ../../../ddb/db_main.c:228
#7 0xc0617581 in kdb_trap (type=12, code=0, tf=0xdb76b9fc) at ../../../kern/subr_kdb.c:524
#8 0xc0855adf in trap_fatal (frame=0xdb76b9fc, eva=76) at ../../../i386/i386/trap.c:929
#9 0xc0855d8b in trap_pfault (frame=0xdb76b9fc, usermode=0, eva=76) at ../../../i386/i386/trap.c:851
#10 0xc0856786 in trap (frame=0xdb76b9fc) at ../../../i386/i386/trap.c:529
#11 0xc083b70b in calltrap () at ../../../i386/i386/exception.s:166
#12 0xc05f0a56 in _sx_xlock (sx=0x3c, opts=0, file=0xc0b4953d "/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c", line=1807) at atomic.h:149
#13 0xc0a79185 in dmu_buf_update_user (db_fake=0x0, old_user_ptr=0xc2de3000, user_ptr=0x0, user_data_ptr_ptr=0x0, evict_func=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1807
#14 0xc0ad0cab in zfs_znode_dmu_fini (zp=0xc2de3000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:557
#15 0xc0aef214 in zfs_freebsd_reclaim (ap=0xdb76baf0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4385
#16 0xc0871602 in VOP_RECLAIM_APV (vop=0xc0b55560, a=0xdb76baf0) at vnode_if.c:1566
#17 0xc066d28f in vgonel (vp=0xc355ce04) at vnode_if.h:819
#18 0xc0670f26 in vflush (mp=0xc3db25a0, rootrefs=0, flags=Variable "flags" is not available.
) at ../../../kern/vfs_subr.c:2408
#19 0xc0aee0c8 in zfs_umount (vfsp=0xc3db25a0, fflag=134217728, td=0xc312bd80) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1005
#20 0xc066a201 in dounmount (mp=0xc3db25a0, flags=134217728, td=0xc312bd80) at ../../../kern/vfs_mount.c:1290
#21 0xc066a957 in unmount (td=0xc312bd80, uap=0xdb76bcfc) at ../../../kern/vfs_mount.c:1186
#22 0xc08560f5 in syscall (frame=0xdb76bd38) at ../../../i386/i386/trap.c:1089
#23 0xc083b770 in Xint0x80_syscall () at ../../../i386/i386/exception.s:262
#24 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(gdb)