ZFS crashing during snapdir lookup for non-existent snapshot...

Sean Chittenden sean at chittenden.org
Wed Oct 10 20:57:08 UTC 2012


Using a FreeBSD -STABLE build from 2012-09-17, I can now crash FreeBSD/ZFS within a few hours of stress testing. There appears to be a locking problem when attempting to interrogate stats on a ZFS snapshot that no longer exists. I believe the scenario is as follows:

Background:

*) `zfs set snapdir=visible` /was/ set on a data set

*) Snapshots were being taken once an hour for weeks, long enough for zabbix to auto-discover the snapshots as valid file systems.

*) `zfs inherit snapdir` was run recently (about a week ago), but zabbix is still attempting to inquire about snapshots that are no longer visible or no longer exist.


After the snapshots were deleted through the normal process of aging, zabbix kept interrogating the file system, attempting to acquire information about the now-deleted snapshots.

FreeBSD crashes once every few minutes when zabbix is running and pulling ZFS information about the now hidden (or most likely deleted) snapshots. I believe that zabbix is using getfsspec(3) with the now stale snapshot name in rapid succession and is somehow triggering a race when there are two concurrent calls to two different non-existent snapshots.
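A minimal sketch of the access pattern described above, not zabbix's actual code: several processes repeatedly ask for file system statistics on snapshot paths that no longer exist. It assumes statvfs(3) as a portable stand-in for the statfs(2) call seen in the backtrace (on FreeBSD, statvfs(3) is, to my knowledge, a libc wrapper around statfs(2)). The function names probe_path and probe_concurrently are hypothetical, and forking one child per path is an assumption made for brevity, since the report does not describe how zabbix_agentd issues its calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/statvfs.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Look up one path `iterations` times.
 * Returns 0 only if every lookup failed with ENOENT (the expected result
 * for a snapshot that has been deleted), 1 otherwise.
 */
int probe_path(const char *path, int iterations)
{
    struct statvfs sv;
    int unexpected = 0;

    for (int i = 0; i < iterations; i++) {
        if (statvfs(path, &sv) == 0 || errno != ENOENT)
            unexpected = 1;
    }
    return unexpected;
}

/*
 * Fork one child per path so the lookups overlap in time, then wait for
 * all children. Returns the number of children whose lookups all failed
 * cleanly with ENOENT, or -1 if a fork() failed.
 */
int probe_concurrently(const char *const *paths, size_t n, int iterations)
{
    size_t started = 0;
    int clean = 0;
    int fork_failed = 0;

    assert(paths != NULL && iterations > 0);

    for (size_t i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            fork_failed = 1;
            break;
        }
        if (pid == 0)
            _exit(probe_path(paths[i], iterations)); /* child: probe and exit */
        started++;
    }

    /* Parent: reap every child that was started. */
    for (size_t i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
    }
    return fork_failed ? -1 : clean;
}
```

To exercise it, call probe_concurrently() with two different stale paths under a dataset's .zfs/snapshot directory (for example a hypothetical /tank/data/.zfs/snapshot/hourly-a and hourly-b) and a large iteration count. On a healthy system every lookup should simply return ENOENT; on an affected build this may panic the machine, so try it only on a test box.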

-sc


kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 0; apic id = 00
kernel: fault virtual address    = 0x368
kernel: fault code               = supervisor read data, page not present
kernel: instruction pointer      = 0x20:0xffffffff80922be2
kernel: stack pointer            = 0x28:0xffffff8487d7b0d0
kernel: frame pointer            = 0x28:0xffffff8487d7b170
kernel: code segment             = base 0x0, limit 0xfffff, type 0x1b
kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags = interrupt enabled, resume, IOPL = 0
kernel: current process          = 3536 (zabbix_agentd)
kernel: trap number              = 12
kernel: panic: page fault
kernel: cpuid = 0
kernel: KDB: stack backtrace:
kernel: #0 0xffffffff80950800 at kdb_backtrace+0x60
kernel: #1 0xffffffff8091ac2d at panic+0x1fd
kernel: #2 0xffffffff80c21858 at trap_fatal+0x388
kernel: #3 0xffffffff80c21b23 at trap_pfault+0x2b3
kernel: #4 0xffffffff80c212b5 at trap+0x5b5
kernel: #5 0xffffffff80c0ba22 at calltrap+0x8
kernel: #6 0xffffffff8092271e at _sx_xlock+0x5e
kernel: #7 0xffffffff816e9384 at zfsctl_snapdir_lookup+0x124
kernel: #8 0xffffffff80cb385f at VOP_LOOKUP_APV+0x5f
kernel: #9 0xffffffff809a307f at lookup+0x5ef
kernel: #10 0xffffffff809a263d at namei+0x62d
kernel: #11 0xffffffff809b2b39 at kern_statfs+0x89
kernel: #12 0xffffffff809b2a80 at sys_statfs+0x20
kernel: #13 0xffffffff80c22134 at amd64_syscall+0x334

FreeBSD example.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #1: Mon Sep 17 04:34:37 UTC 2012     root at example.com:/usr/obj/usr/src/sys/GENERIC  amd64

0xffffffff80922be2 is in _sx_xlock_hard (/usr/src/sys/kern/kern_sx.c:546).
541			x = sx->sx_lock;
542			if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0) {
543				if ((x & SX_LOCK_SHARED) == 0) {
544					x = SX_OWNER(x);
545					owner = (struct thread *)x;
546					if (TD_IS_RUNNING(owner)) {
547						if (LOCK_LOG_TEST(&sx->lock_object, 0))
548							CTR3(KTR_LOCK,
549						    "%s: spinning on %p held by %p",
550							    __func__, sx, owner);




--
Sean Chittenden
sean at chittenden.org


