amd64/163815: HDD timeout on ZFS + SB7x0 SATA Controller [AHCI]

Zaphod zaphod at berentweb.com
Wed Jan 4 15:20:05 UTC 2012


>Number:         163815
>Category:       amd64
>Synopsis:       HDD timeout on ZFS + SB7x0 SATA Controller [AHCI]
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-amd64
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jan 04 15:20:05 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Zaphod
>Release:        9.0
>Organization:
NA
>Environment:
FreeBSD 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0 r228984: Fri Dec 30 12:57:09 EET 2011   amd64
>Description:
The problem first showed itself during port builds (heavy HDD usage):
--------------------------------
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 32262, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 66056, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 82746, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 44091, size: 4096
ahcich0: Timeout on slot 29 port 0
ahcich0: is 00000000 cs 000000ff ss e00000ff rs e00000ff tfd c0 serr 00000000 cmd 0004e017
ahcich0: AHCI reset...
ahcich0: SATA connect time=100us status=00000123
ahcich0: AHCI reset: device found
(ada0:ahcich0:0:0:0): Command timed out
(ada0:ahcich0:0:0:0): Retrying command
-------------------------------
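
For anyone reading the register dump above, the slot bitmasks are easier to compare once expanded into slot numbers. The following is only a minimal Python sketch; the helper names (slots, parse_ahci_dump) are made up for illustration, and the field meanings are an assumption from my reading of the ahci(4) driver source rather than something verified against this revision: cs as the port Command Issue register, ss as SATA Active, rs as the driver's own record of running slots.

```python
import re

def slots(mask):
    """Return the bit positions (command slots) set in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]

def parse_ahci_dump(line):
    """Parse the 'name hexvalue' pairs of an ahcich register dump line."""
    body = line.split(":", 1)[1]  # drop the 'ahcich0:' prefix
    return {name: int(value, 16)
            for name, value in re.findall(r"(\w+) ([0-9a-fA-F]+)", body)}

# Line copied from the log above.
line = ("ahcich0: is 00000000 cs 000000ff ss e00000ff rs e00000ff "
        "tfd c0 serr 00000000 cmd 0004e017")
regs = parse_ahci_dump(line)

# Field meanings (cs/ss/rs) are assumed, see the note above.
for name in ("cs", "ss", "rs"):
    print(name, slots(regs[name]))
```

With the values from this log, cs expands to slots 0-7, while ss and rs expand to slots 0-7 plus 29, 30 and 31. Slot 29 is the one named in the "Timeout on slot 29" message.
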
Now, after the latest update to /usr/src, buildworld breaks with a "seg.fault 11" message, but the actual cause appears to be the swap_pager timeout. The break occurs near clang/lib/ARCMigrate/TransAutoreleasePool.cpp (the exact location is probably not relevant). Also, CPU usage is not very heavy before the system freezes.

Hardware & Setup Info:
- controller: ahci0 at pci0:0:17:0: class=0x010601 card=0x43911002 chip=0x43911002 rev=0x00 hdr=0x00 device= 'SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]' on 'RS780 Host Bridge'. Board: Biostar A780L
- HDD is SAMSUNG HD322HJ, 320GB, ATA-8-ACS revision 3b, all FS on ZFS.
- CPU: K8 [Athlon64/Opteron]
- mem/swap: RAM 1 GB / swap 2 GB (not zfs). Usage during buildworld: max RAM 65% / max swap 58% 
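
The chip= and class= values in the controller line pack several fields into one number. As a small illustrative sketch (the function names are hypothetical, not from any tool), they can be split like this in Python, assuming the usual pciconf layout where the low 16 bits of chip are the vendor ID and the high 16 bits are the device ID:

```python
def split_pci_id(value):
    """Split a pciconf 'chip' or 'card' value into (vendor, device)."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF

def split_pci_class(value):
    """Split a 24-bit 'class' value into (base class, subclass, prog-if)."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

# Values copied from the controller line above.
vendor, device = split_pci_id(0x43911002)
base, sub, progif = split_pci_class(0x010601)

print(f"vendor=0x{vendor:04x} device=0x{device:04x}")
print(f"class=0x{base:02x} subclass=0x{sub:02x} prog-if=0x{progif:02x}")
```

This gives vendor 0x1002 (ATI/AMD) and device 0x4391, with base class 0x01 (mass storage), subclass 0x06 (SATA) and programming interface 0x01 (AHCI), which matches the "[AHCI mode]" description.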

A previously built kernel with full debugging enabled shows errors such as:
kernel: lock order reversal:
kernel: 1st 0xfffffe0010598248 filedesc structure (filedesc structure) @ /asp/src/sys/kern/kern_descrip.c:1197
kernel: 2nd 0xfffffe001052ccf0 zfs (zfs) @ /asp/src/sys/kern/vfs_subr.c:4245
kernel: KDB: stack backtrace:
kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kernel: kdb_backtrace() at kdb_backtrace+0x37
kernel: _witness_debugger() at _witness_debugger+0x65
kernel: witness_checkorder() at witness_checkorder+0x833
kernel: __lockmgr_args() at __lockmgr_args+0xd9d
kernel: vop_stdlock() at vop_stdlock+0x39
kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
kernel: _vn_lock() at _vn_lock+0x68
kernel: knlist_remove_kq() at knlist_remove_kq+0xfc
kernel: knote_fdclose() at knote_fdclose+0x177
kernel: kern_close() at kern_close+0xe8
kernel: amd64_syscall() at amd64_syscall+0x27b
kernel: Xfast_syscall() at Xfast_syscall+0xf7
kernel: --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8015abcdc, rsp = 0x7fffffffd868, rbp = 0x801807b20 ---

Two more:
kernel: lock order reversal:
kernel: 1st 0xfffffe0018e3a448 filedesc structure (filedesc structure) @ /asp/src/sys/kern/kern_descrip.c:1197
kernel: 2nd 0xfffffe0004533cf0 devfs (devfs) @ /asp/src/sys/kern/vfs_subr.c:4245
kernel: KDB: stack backtrace:
kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kernel: kdb_backtrace() at kdb_backtrace+0x37
kernel: _witness_debugger() at _witness_debugger+0x65
kernel: witness_checkorder() at witness_checkorder+0x833
kernel: __lockmgr_args() at __lockmgr_args+0xd9d
kernel: vop_stdlock() at vop_stdlock+0x39
kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
kernel: _vn_lock() at _vn_lock+0x68
kernel: knlist_remove_kq() at knlist_remove_kq+0xfc
kernel: knote_fdclose() at knote_fdclose+0x177
kernel: kern_close() at kern_close+0xe8
kernel: amd64_syscall() at amd64_syscall+0x27b
kernel: Xfast_syscall() at Xfast_syscall+0xf7
kernel: --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8015abcdc, rsp = 0x7fffffffd498, rbp = 0x80180e230 ---
kernel: lock order reversal:
kernel: 1st 0xfffffe0018e3a448 filedesc structure (filedesc structure) @ /asp/src/sys/kern/kern_descrip.c:1197
kernel: 2nd 0xfffffe000b4f4a78 pseudofs (pseudofs) @ /asp/src/sys/kern/vfs_subr.c:4245
kernel: KDB: stack backtrace:
kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kernel: kdb_backtrace() at kdb_backtrace+0x37
kernel: _witness_debugger() at _witness_debugger+0x65
kernel: witness_checkorder() at witness_checkorder+0x833
kernel: __lockmgr_args() at __lockmgr_args+0xd9d
kernel: vop_stdlock() at vop_stdlock+0x39
kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
kernel: _vn_lock() at _vn_lock+0x68
kernel: knlist_remove_kq() at knlist_remove_kq+0xfc
kernel: knote_fdclose() at knote_fdclose+0x177
kernel: kern_close() at kern_close+0xe8
kernel: amd64_syscall() at amd64_syscall+0x27b
kernel: Xfast_syscall() at Xfast_syscall+0xf7
kernel: --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8015abcdc, rsp = 0x7fffffffd498, rbp = 0x80180e230 ---
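
The reports above share the same call path and differ mainly in the second lock. To summarize a longer log, something like the following Python sketch could pull out the lock name pairs; it is only an illustration, lor_pairs is a hypothetical helper, and the regular expression assumes the "1st"/"2nd" line format exactly as it appears in this report:

```python
import re

def lor_pairs(log_text):
    """Return (first lock, second lock) name pairs from WITNESS reports."""
    pairs = []
    first = None
    for line in log_text.splitlines():
        # Matches e.g. "1st 0xfffffe... filedesc structure (filedesc structure) @"
        m = re.search(r"\b(1st|2nd) 0x[0-9a-f]+ .*? \((.+?)\) @", line)
        if not m:
            continue
        if m.group(1) == "1st":
            first = m.group(2)
        elif first is not None:
            pairs.append((first, m.group(2)))
            first = None
    return pairs

# Sample lines copied from the first report above.
sample = """\
kernel: lock order reversal:
kernel: 1st 0xfffffe0010598248 filedesc structure (filedesc structure) @ /asp/src/sys/kern/kern_descrip.c:1197
kernel: 2nd 0xfffffe001052ccf0 zfs (zfs) @ /asp/src/sys/kern/vfs_subr.c:4245
"""
print(lor_pairs(sample))
```

Run over all three reports, this would list "filedesc structure" as the first lock each time, paired with zfs, devfs and pseudofs respectively.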

More details posted in forum: http://forums.freebsd.org/showthread.php?t=27452
>How-To-Repeat:

>Fix:
NA

>Release-Note:
>Audit-Trail:
>Unformatted:

