_sx_xlock_hard panic - is this zfs_zget panic deadlock ? (8.1-RELEASE)

David P Discher dpd at bitgravity.com
Tue Aug 23 20:50:58 UTC 2011


Hey FreeBSD FS - I got a new one, well, new for us, possibly already fixed.

I got the following panic:

	Fatal trap 9: general protection fault while in kernel mode
	cpuid = 13; apic id = 21
	instruction pointer     = 0x20:0xffffffff80514848
	stack pointer           = 0x28:0xffffff9be057f230
	frame pointer           = 0x28:0xffffff9be057f2b0
	code segment            = base 0x0, limit 0xfffff, type 0x1b
	                        = DPL 0, pres 1, long 1, def32 0, gran 1
	processor eflags        = interrupt enabled, resume, IOPL = 0
	current process         = 94019 (rsync)
	[thread pid 94019 tid 102556 ]
	Stopped at      _sx_xlock_hard+0xd8:    movl    0x290(%r12),%r8d

More kgdb output is below. It looks like the fault occurred in kern_sx.c:

#9  0xffffffff80514848 in _sx_xlock_hard (sx=0xffffff008040e0d8, tid=18446742975725269984, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_sx.c:513
513                                     x = SX_OWNER(x);
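
For readers following the numbers: line 513 strips the flag bits from the lock word to get the owning thread pointer, and the next lines dereference that pointer in TD_IS_RUNNING(owner). The short Python sketch below decodes the values printed by kgdb further down in this post (`p x` and the `tid` argument). It assumes that the low four bits of an sx lock word are flag bits, and it uses the amd64 rule that a 48-bit virtual address must have bits 63..47 all equal; the helper names are made up for illustration. If the owner value is not canonical, a general protection fault on the load at _sx_xlock_hard+0xd8 would be the expected result, though this is only one reading of the trap.

```python
# Values copied from the kgdb session in this post.
X_RAW = 281466386776064       # (kgdb) p x, same value as sx->sx_lock
TID = 18446742975725269984    # tid argument of _sx_xlock_hard

# Assumption: the low four bits of the lock word are flag bits, so
# SX_OWNER(x) is modelled here as x & ~0xf.
SX_LOCK_FLAGMASK = 0xF


def sx_owner(lock_word: int) -> int:
    """Strip the assumed flag bits to get the owner thread pointer."""
    return lock_word & ~SX_LOCK_FLAGMASK


def is_canonical_amd64(addr: int) -> bool:
    """True if bits 63..47 are all zeros or all ones (48-bit canonical form)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


owner = sx_owner(X_RAW)
print(hex(owner))                  # 0xfffe00000000
print(is_canonical_amd64(owner))   # False
print(hex(TID))                    # 0xffffff005b0973e0
print(is_canonical_amd64(TID))     # True
```

The decoded `tid` matches the `td=0xffffff005b0973e0` pointer shown in the zfs_lookup frame of the backtrace, so the calling thread looks sane, while the recorded lock owner does not look like a kernel pointer at all.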


The call chain looks like this:

  zfs_zget -> dmu_bonus_hold -> dnode_hold_impl -> _sx_xlock -> _sx_xlock_hard

As best I can tell, this looks like a race or deadlock issue, a leaked lock, or
something along those lines. My searching has uncovered a patch in -head/-stable
that may address this particular issue:

	------------------------------------------------------------------------
	r209097 | mm | 2010-06-12 04:22:45 -0700 (Sat, 12 Jun 2010) | 8 lines
	Changed paths:
	   M /head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
	
	Fix ZFS panic deadlock: cycle in blocking chain via zfs_zget
	
	OpenSolaris onnv-revision:      9774:0bb234ab2287
	
	Obtained from:  OpenSolaris (Bug ID 6788152)
	Approved by:    pjd, delphij (mentor)
	MFC after:      3 days
	
	------------------------------------------------------------------------


I'd like some confirmation, one way or the other, of whether the panic I'm
looking at could in fact be fixed by r209097, or whether this is something
entirely different.

Thanks !

---
David P. Discher
dpd at bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com


(kgdb) bt
#0  doadump () at pcpu.h:223
#1  0xffffffff801f0e5c in db_fncall (dummy1=Variable "dummy1" is not available.
) at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801f1191 in db_command (last_cmdp=0xffffffff80b105e0, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801f13e0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xffffffff801f3339 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff8053aff5 in kdb_trap (type=9, code=0, tf=0xffffff9be057f180) at /usr/src/sys/kern/subr_kdb.c:535
#6  0xffffffff8079519d in trap_fatal (frame=0xffffff9be057f180, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:772
#7  0xffffffff80795a9a in trap (frame=0xffffff9be057f180) at /usr/src/sys/amd64/amd64/trap.c:588
#8  0xffffffff8077c827 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
#9  0xffffffff80514848 in _sx_xlock_hard (sx=0xffffff008040e0d8, tid=18446742975725269984, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_sx.c:513
#10 0xffffffff80514c89 in _sx_xlock (sx=0xffffff008040e0d8, opts=Variable "opts" is not available.
) at sx.h:148
#11 0xffffffff8104c82d in dnode_hold_impl (os=0xffffff002964f400, object=Variable "object" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:607
#12 0xffffffff81042c7a in dmu_bonus_hold (os=Variable "os" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:147
#13 0xffffffff81081e77 in zfs_zget (zfsvfs=0xffffff00295de000, obj_num=10078547, zpp=0xffffff9be057f518)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:869
#14 0xffffffff81092fb3 in zfs_dirent_lock (dlpp=0xffffff9be057f520, dzp=0xffffff13942938d0, name=0xffffff9be057f5f0 "n", zpp=0xffffff9be057f518, 
    flag=Variable "flag" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:321
#15 0xffffffff81093209 in zfs_dirlook (dzp=0xffffff13942938d0, name=0xffffff9be057f5f0 "n", vpp=0xffffff9be057f970, flags=Variable "flags" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:413
#16 0xffffffff810a2850 in zfs_lookup (dvp=0xffffff08daee6760, nm=0xffffff9be057f5f0 "n", vpp=0xffffff9be057f970, cnp=0xffffff9be057f998, nameiop=0, 
    cr=0xffffff028c6a5200, td=0xffffff005b0973e0, flags=Variable "flags" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1171
#17 0xffffffff810a3791 in zfs_freebsd_lookup (ap=0xffffff9be057f750)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4059
#18 0xffffffff807e08f4 in VOP_CACHEDLOOKUP_APV (vop=0xffffffff81107b20, a=0xffffff9be057f750) at vnode_if.c:187
#19 0xffffffff80580190 in vfs_cache_lookup (ap=Variable "ap" is not available.
) at vnode_if.h:80
#20 0xffffffff807e33ec in VOP_LOOKUP_APV (vop=0xffffffff81107b20, a=0xffffff9be057f810) at vnode_if.c:123
#21 0xffffffff80586864 in lookup (ndp=0xffffff9be057f940) at vnode_if.h:54
#22 0xffffffff80587797 in namei (ndp=0xffffff9be057f940) at /usr/src/sys/kern/vfs_lookup.c:269
#23 0xffffffff80594e02 in kern_statat_vnhook (td=0xffffff005b0973e0, flag=Variable "flag" is not available.
) at /usr/src/sys/kern/vfs_syscalls.c:2346
#24 0xffffffff80595025 in kern_statat (td=Variable "td" is not available.
) at /usr/src/sys/kern/vfs_syscalls.c:2327
#25 0xffffffff805950ea in lstat (td=Variable "td" is not available.
) at /usr/src/sys/kern/vfs_syscalls.c:2390
#26 0xffffffff8079564b in syscall (frame=0xffffff9be057fc80) at /usr/src/sys/amd64/amd64/trap.c:945
#27 0xffffffff8077cb01 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:374
#28 0x00000008007810fc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) up
#1  0xffffffff801f0e5c in db_fncall (dummy1=Variable "dummy1" is not available.
) at /usr/src/sys/ddb/db_command.c:548
548             *rv = (*f)(args[0], args[1], args[2], args[3], args[4], args[5],
(kgdb) up
#2  0xffffffff801f1191 in db_command (last_cmdp=0xffffffff80b105e0, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
445                 (*cmd->fcn)(addr, have_addr, count, modif);
(kgdb) up
#3  0xffffffff801f13e0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
498                 db_command(&db_last_command, &db_cmd_table, /* dopager */ 1);
(kgdb) up
#4  0xffffffff801f3339 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
229                     db_command_loop();
(kgdb) up
#5  0xffffffff8053aff5 in kdb_trap (type=9, code=0, tf=0xffffff9be057f180) at /usr/src/sys/kern/subr_kdb.c:535
535             handled = kdb_dbbe->dbbe_trap(type, code);
(kgdb) up
#6  0xffffffff8079519d in trap_fatal (frame=0xffffff9be057f180, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:772
772                     if (kdb_trap(type, 0, frame))
(kgdb) up
#7  0xffffffff80795a9a in trap (frame=0xffffff9be057f180) at /usr/src/sys/amd64/amd64/trap.c:588
588                     trap_fatal(frame, 0);
(kgdb) up
#8  0xffffffff8077c827 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
223             call    trap
Current language:  auto; currently asm
(kgdb) up
#9  0xffffffff80514848 in _sx_xlock_hard (sx=0xffffff008040e0d8, tid=18446742975725269984, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_sx.c:513
513                                     x = SX_OWNER(x);
Current language:  auto; currently c
(kgdb) p x
$1 = 281466386776064
(kgdb) p sx
$2 = (struct sx *) 0xffffff008040e0d8
(kgdb) p *sx
$3 = {lock_object = {lo_name = 0x2901fc0044005e <Address 0x2901fc0044005e out of bounds>, lo_flags = 2104063083, lo_data = 1953056627, 
    lo_witness = 0x154407f4763bc41}, sx_lock = 281466386776064}
(kgdb) list
508                      * running or the state of the lock changes.
509                      */
510                     x = sx->sx_lock;
511                     if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0) {
512                             if ((x & SX_LOCK_SHARED) == 0) {
513                                     x = SX_OWNER(x);
514                                     owner = (struct thread *)x;
515                                     if (TD_IS_RUNNING(owner)) {
516                                             if (LOCK_LOG_TEST(&sx->lock_object, 0))
517                                                     CTR3(KTR_LOCK,
(kgdb) up 
#10 0xffffffff80514c89 in _sx_xlock (sx=0xffffff008040e0d8, opts=Variable "opts" is not available.
) at sx.h:148
148                     error = _sx_xlock_hard(sx, tid, opts, file, line);
(kgdb) up
#11 0xffffffff8104c82d in dnode_hold_impl (os=0xffffff002964f400, object=Variable "object" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:607
607             mutex_enter(&dn->dn_mtx);
(kgdb) list
602                             dnode_destroy(dn);
603                             dn = winner;
604                     }
605             }
606
607             mutex_enter(&dn->dn_mtx);
608             type = dn->dn_type;
609             if (dn->dn_free_txg ||
610                 ((flag & DNODE_MUST_BE_ALLOCATED) && type == DMU_OT_NONE) ||
611                 ((flag & DNODE_MUST_BE_FREE) && type != DMU_OT_NONE)) {
(kgdb) print dn
$4 = (dnode_t *) 0xffffff008040e000
(kgdb) print *dn
$5 = {dn_struct_rwlock = {lock_object = {lo_name = 0xffffffff81100b7c "dn->dn_struct_rwlock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, 
    sx_lock = 1}, dn_link = {list_next = 0xffffff14dcffd630, list_prev = 0xffffff14e5f64c40}, dn_objset = 0xffffff002964f400, 
  dn_object = 4611409328054323539, dn_dbuf = 0x10c837, dn_phys = 0x3f, dn_type = 538976288, dn_bonuslen = 8224, dn_bonustype = 75 'K', 
  dn_nblkptr = 77 'M', dn_checksum = 51 '3', dn_compress = 48 '0', dn_nlevels = 49 '1', dn_indblkshift = 51 '3', dn_datablkshift = 72 'H', 
  dn_datablkszsec = 18247, dn_datablksz = 1145130323, dn_maxblkid = 4698733595085963320, dn_next_nblkptr = "08iH", dn_next_nlevels = "athc", 
  dn_next_indblkshift = " iUH", dn_next_bonuslen = {16695, 12851, 12339, 12353}, dn_next_blksz = {909397057, 538980384, 538976288, 538976288}, 
  dn_dirty_link = {{list_next = 0x2f00400080102020, list_prev = 0x7020002004000}, {list_next = 0xfc10003f00103fff, list_prev = 0xfffffff010000fb}, {
      list_next = 0x78000300070000, list_prev = 0x7800780078}, {list_next = 0x0, list_prev = 0x6170e001f0000}}, dn_mtx = {lock_object = {
      lo_name = 0x2901fc0044005e <Address 0x2901fc0044005e out of bounds>, lo_flags = 2104063083, lo_data = 1953056627, 
      lo_witness = 0x154407f4763bc41}, sx_lock = 281466386776064}, dn_dirty_records = {{list_size = 70088136784871424, 
      list_offset = 6724054219973732112, list_head = {list_next = 0xca00000001, list_prev = 0xcca250005a874000}}, {list_size = 2807702982, 
      list_offset = 0, list_head = {list_next = 0x401c409c0000, list_prev = 0x0}}, {list_size = 3096229038784512, 
      list_offset = 1004619926008233984, list_head = {list_next = 0x200440000001fa20, list_prev = 0x7000000000230}}, {list_size = 2258422671148295, 
      list_offset = 74596396452349191, list_head = {list_next = 0x4235413700060308, list_prev = 0x5db90000180a0000}}}, dn_ranges = {{
      avl_root = 0x8000ffff, avl_compar = 0, avl_offset = 8589934592, avl_numnodes = 0, avl_size = 0}, {avl_root = 0x0, avl_compar = 0, 
      avl_offset = 0, avl_numnodes = 0, avl_size = 0}, {avl_root = 0x0, avl_compar = 0, avl_offset = 61, avl_numnodes = 0, 
      avl_size = 2026619832316723200}, {avl_root = 0x0, avl_compar = 0x21101f, avl_offset = 0, avl_numnodes = 0, avl_size = 65011713}}, 
  dn_allocated_txg = 0, dn_free_txg = 0, dn_assigned_txg = 0, dn_notxholds = {cv_description = 0x0, cv_waiters = 1503985664}, 
  dn_dirtyctx = DN_UNDIRTIED, dn_dirtyctx_firstset = 0x0, dn_tx_holds = {rc_count = 0}, dn_holds = {rc_count = 1}, dn_dbufs_mtx = {lock_object = {
      lo_name = 0xffffffff81100b9e "dn->dn_dbufs_mtx", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, dn_dbufs = {
    list_size = 224, list_offset = 176, list_head = {list_next = 0xffffff07072075f0, list_prev = 0xffffff07072075f0}}, 
  dn_bonus = 0xffffff0abe353700, dn_zio = 0x0, dn_zfetch = {zf_rwlock = {lock_object = {lo_name = 0xffffffff81100ddf "zf->zf_rwlock", 
        lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, zf_stream = {list_size = 112, list_offset = 88, list_head = {
        list_next = 0xffffff159f7c5dd8, list_prev = 0xffffff159f7c5dd8}}, zf_dnode = 0xffffff008040e000, zf_stream_cnt = 1, zf_alloc_fail = 1}}
(kgdb) 
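
A note on reading the `print *dn` output above. The sketch below checks two things using only values from this transcript: how far the `sx` pointer passed to _sx_xlock_hard sits from the start of the dnode, and whether some of the odd-looking integer fields decode to printable ASCII. The field widths (4 bytes for dn_type and dn_datablksz, 2 bytes for dn_bonuslen and dn_datablkszsec) and the little-endian byte order are assumptions, and the helper name is hypothetical. This is an aid for interpreting the dump, not a diagnosis.

```python
# Addresses copied from the kgdb session in this post.
DN_ADDR = 0xFFFFFF008040E000   # (kgdb) print dn
SX_ADDR = 0xFFFFFF008040E0D8   # sx argument in frames #9 and #10

# Integer fields from "print *dn": (value, assumed width in bytes).
FIELDS = {
    "dn_type": (538976288, 4),
    "dn_bonuslen": (8224, 2),
    "dn_datablkszsec": (18247, 2),
    "dn_datablksz": (1145130323, 4),
}


def as_ascii(value: int, width: int):
    """Return the little-endian bytes as text if all are printable, else None."""
    raw = value.to_bytes(width, "little")
    if all(32 <= b < 127 for b in raw):
        return raw.decode("ascii")
    return None


# Offset of the lock being taken, relative to the start of the dnode.
print(hex(SX_ADDR - DN_ADDR))   # 0xd8

for name, (value, width) in FIELDS.items():
    print(name, repr(as_ascii(value, width)))
```

The lock address lies 0xd8 bytes into the dnode, which is consistent with it being `dn->dn_mtx`, and the four fields decode to runs of spaces, "GG", and "SMAD". Together with the character-looking values gdb already shows for dn_bonustype through dn_next_indblkshift, this may suggest that part of the structure holds text data rather than dnode state, while dn_struct_rwlock and dn_dbufs_mtx still look intact. Whether that points to memory corruption or to something else is not something the dump alone settles.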

