Reproduceable freeze with quotas enabled
Attila Nagy
bra at fsn.hu
Fri Nov 4 23:02:47 PST 2005
Hello,
Robert Watson wrote:
>
> On Thu, 3 Nov 2005, Attila Nagy wrote:
>
>> Hello,
>>
>> I have an "easily" (at least to me) reproduceable freeze with both
>> 6-STABLE and 7-CURRENT on an amd64 SMP machine.
>>
>> What I do is simply copy a lot of directories, files and symlinks with
>> different uids from another machine to this one, using rsync.
>
>
> This is probably a vnode lock leak or deadlock. As it looks like you
> have both the ability to get into the debugger and also a serial or
> other remote console, the output of:
>
> show allpcpu
> alltrace
> show lockedvnods
> show allocks
>
> would be helpful. There are also "options DEBUG_LOCKS" which extends
> the debugging information available via "show lockedvnods" with
> information about where the lock was acquired, which can be quite helpful.
>
> Robert N M Watson
>
>>
>> With quotas enabled it freezes constantly, without quotas it doesn't.
>>
>> I have MP_WATCHDOG enabled (the kernel config is the default
>> GENERIC+SMP with debuggers and MP_WATCHDOG only) so after the freeze
>> it drops me into the debugger in which I can see the following:
>>
>> FreeBSD/amd64 (helen) (ttyv0)
>>
>> login: Watchdog timer: 3
>> Watchdog timer: 2
>> Watchdog timer: 1
>> Watchdog timer: 0
>> Watchdog firing!
>> NMI ... going to debugger
>> [thread pid 157 tid 100143 ]
>> Stopped at vinactive+0x83: jz 0x365245a9
>> db> bt
>> Tracing pid 157 tid 100143 td 0xffffff007b9c14c0
>> vinactive() at vinactive+0x83
>> vput() at vput+0x1cf
>> qsync() at qsync+0x260
>> ffs_sync() at ffs_sync+0x159
>> sync_fsync() at sync_fsync+0x191
>> VOP_FSYNC_APV() at VOP_FSYNC_APV+0x3d
>> sched_sync() at sched_sync+0x428
>> fork_exit() at fork_exit+0x86
>> fork_trampoline() at fork_trampoline+0xe
>> --- trap 0, rip = 0, rsp = 0xffffffffb2029d00, rbp = 0 ---
>> db> ps
>> 157 ffffff007b9489c0 0 0 0 0000204 [CPU 0] syncer
This is an HP machine, so I can access it via it's ILO port. But it
mangles larger outputs, so alltrace will be truncated.
I will find a way to get a serial console attached to it, but until that...
I've recompiled the kernel with "options DEBUG_LOCKS", rebooted and
started the rsync process.
This is what I see now:
login: Watchdog timer: 3
Watchdog timer: 2
Watchdog timer: 1
Watchdog timer: 0
Watchdog firing!
NMI ... going to debugger
[thread pid 14 tid 100010 ]
db> bt
Tracing pid 14 tid 100010 td 0xffffff007b94ebe0
acpi_cpu_c1() at acpi_cpu_c1+0x6
acpi_cpu_idle() at acpi_cpu_idle+0x174
idle_proc() at idle_proc+0x14e
fork_exit() at fork_exit+0x86
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb1e81d00, rbp = 0 ---
db> ps
[...]
14 ffffff007b97b680 0 0 0 000020c [CPU 0] idle: cpu0
[...]
db> show allpcpu
cpuid = 0
curthread = 0xffffff007b94ebe0: pid 14 "idle: cpu0"
curpcb = 0xffffffffb1e81d10
fpcurthread = none
idlethread = 0xffffff007b94ebe0: pid 14 "idle: cpu0"
cpuid = 1
curthread = 0xffffff007b94e000: pid 13 "idle: cpu1"
curpcb = 0xffffffffb1e68d10
fpcurthread = none
idlethread = 0xffffff007b94e000: pid 13 "idle: cpu1"
cpuid = 2
curthread = 0xffffff007b9014c0: pid 157 "syncer"
curpcb = 0xffffffffb20e8d10
fpcurthread = none
idlethread = 0xffffff007b94dbe0: pid 12 "idle: cpu2"
cpuid = 3
curthread = 0xffffff007b94d980: pid 11 "mp_watchdog cpu 3"
curpcb = 0xffffffffb1e5ed10
fpcurthread = none
idlethread = 0xffffff007b94d980: pid 11 "mp_watchdog cpu 3"
db> show lockedvnods
Locked vnodes
0xffffff0061c05470: tag syncer, type VNON
usecount 1, writecount 0, refcount 2 mountedhere 0
flags ()
lock type syncer: EXCL (count 1) by thread 0xffffff007b9014c0 (pid
157)
0xffffff00418086a8: tag ufs, type VDIR
usecount 1, writecount 0, refcount 90 mountedhere 0
flags ()
lock type ufs: EXCL (count 1) by thread 0xffffff005d435980 (pid 1180)
ino 18983427, on dev da1s1d
0xffffff0015a1b6a8: tag ufs, type VLNK
usecount 0, writecount 0, refcount 1 mountedhere 0
flags (VI_DOOMED)
lock type ufs: EXCL (count 1) by thread 0xffffff005d435980 (pid 1180)
ino 19058341, on dev da1s1d
WARNING this is not complete!
db> alltrace
--- syscall (93, FreeBSD ELF64, select), rip = 0x800d4c14c, rsp =
0x7fffffffc2d8
, rbp = 0x1 ---
Tracing command rsync pid 1180 tid 100267 td 0xffffff005d435980
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
turnstile_wait() at turnstile_wait+0x272
_mtx_lock_sleep() at _mtx_lock_sleep+0xa6
msleep() at msleep+0x2a6
acquire() at acquire+0x76
debuglockmgr() at debuglockmgr+0x500
vop_stdlock() at vop_stdlock+0x3e
VOP_LOCK_APV() at VOP_LOCK_APV+0x49
VOP_LOCK_APV() at VOP_LOCK_APV+0x49
debug_vn_lock() at debug_vn_lock+0x8f
dqsync() at dqsync+0x95
dqrele() at dqrele+0x68
ufs_reclaim() at ufs_reclaim+0x100
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0x3d
vgonel() at vgonel+0x12f
vnlru_free() at vnlru_free+0x29f
getnewvnode() at getnewvnode+0x2ca
ffs_vget() at ffs_vget+0xe9
ffs_valloc() at ffs_valloc+0x109
ufs_makeinode() at ufs_makeinode+0x6e
ufs_symlink() at ufs_symlink+0x3f
VOP_SYMLINK_APV() at VOP_SYMLINK_APV+0x65
kern_symlink() at kern_symlink+0x3d3
syscall() at syscall+0x642
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (57, FreeBSD ELF64, symlink), rip = 0x8006b93ec, rsp =
0x7fffffffb7d
8, rbp = 0xa64388 ---
Tracing command sh pid 1178 tid 100202 td 0xffffff00601f8be0
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
sleepq_wait_sig() at sleepq_wait_sig+0xe
cv_wait_sig() at cv_wait_sig+0x17f
kern_select() at kern_select+0xbd4
select() at select+0x3e
syscall() at syscall+0x642
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (93, FreeBSD ELF64, select), rip = 0x4165ac, rsp =
0x7fffffffe948, r
bp = 0x7fffffffe970 ---
Tracing command adjkerntz pid 271 tid 100195 td 0xffffff0062431000
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
sleepq_wait_sig() at sleepq_wait_sig+0xe
msleep() at msleep+0x21b
kern_sigsuspend() at kern_sigsuspend+0xb1
sigsuspend() at sigsuspend+0x40
syscall() at syscall+0x642
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (341, FreeBSD ELF64, sigsuspend), rip = 0x80067a18c, rsp =
0x7ffffff
fed28, rbp = 0xfffffffffffff1f0 ---
(null)() at 0x80067a18c
*** error reading from address fffffffffffff1f8 ***
I will try to provide a full alltrace output tomorrow.
Thanks,
--
Attila Nagy e-mail: Attila.Nagy at fsn.hu
Adopt a directory on our free software phone: +3630 306 6758
server! http://www.fsn.hu/?f=brick
More information about the freebsd-amd64
mailing list