Swapping deadlock due to aic/scsi errors?
Dave Dolson
ddolson at sandvine.com
Wed Aug 6 12:27:37 PDT 2003
We have a reproducible bug characterized by the system
becoming unresponsive (although the kernel debugger, ddb, can still be entered).
The system is based on FreeBSD 4.7 (i386)
and uses the aic79xx SCSI driver.
Common elements:
pagedaemon waiting in wswbuf0
(waiting for free page from swapper?)
swapper waiting in vmwait
(waiting for free page from disk?)
nsw_wcount_async=0
If any processes page-fault, they wait on swread,
and the following message is then seen (once every 20 s):
swap_pager: indefinite wait buffer: device: #da/0x30001, blkno: 10352, size: 4096
I believe that the swapper is waiting for the SCSI driver
to call vunmapbuf() after asynchronously sending the page
to be swapped out.
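
To make the suspected failure mode concrete, here is a minimal user-space
model (not kernel code). It assumes nsw_wcount_async counts the free buffers
available for asynchronous swap writes, that each issued write takes one, and
that only the I/O completion gives it back. The names pbuf_pool,
issue_async_write and complete_async_write are hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the async swap-write buffer pool. */
struct pbuf_pool {
    int free_async; /* stands in for nsw_wcount_async (assumption) */
    int in_flight;  /* writes issued but not yet completed */
};

static void pool_init(struct pbuf_pool *p, int nbufs)
{
    assert(nbufs >= 0);
    p->free_async = nbufs;
    p->in_flight = 0;
}

/*
 * Try to start an asynchronous page-out.
 * Returns false when no buffer is free; in the real system this is
 * where the pagedaemon would sleep (seen here as wswbuf0).
 */
static bool issue_async_write(struct pbuf_pool *p)
{
    if (p->free_async == 0)
        return false;
    p->free_async--;
    p->in_flight++;
    return true;
}

/* Completion path: the only place a buffer is returned to the pool. */
static void complete_async_write(struct pbuf_pool *p)
{
    if (p->in_flight == 0)
        return;
    p->in_flight--;
    p->free_async++;
}
```

In this model, if the driver never reports completion for the outstanding
writes, complete_async_write() is never called, free_async stays at 0, and
every later issue_async_write() fails. That matches the observed
nsw_wcount_async=0 with the pagedaemon blocked.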
The following message is sometimes seen, followed
by a "dump card state": "SCB 0x1f - timed out"
I would like to add some debugging to detect the lost command
and possibly retry it. Can someone suggest where the lost
command is supposed to be detected, and where the retry is
supposed to occur?
(I've been looking through the cam and ahd code, but need
some direction.)
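
As an illustration of the kind of debugging check meant here, the following
is a user-space sketch under stated assumptions, not the CAM or ahd
mechanism. Each outstanding command records when it was issued, and a
periodic scan flags any command older than a timeout, retrying it a bounded
number of times. struct cmd, scan_overdue and all of their fields are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one outstanding command. */
struct cmd {
    int  in_use;    /* nonzero while the command is outstanding */
    long issued_at; /* time of the most recent issue, arbitrary ticks */
    int  retries;   /* how many times it has been reissued */
};

/*
 * Scan the table and return the number of commands that have been
 * outstanding for at least `timeout` ticks. An overdue command is
 * reissued (modelled by restamping issued_at) until max_retries is
 * reached, after which it is dropped, which stands in for completing
 * it with an error.
 */
static int scan_overdue(struct cmd *cmds, size_t n, long now,
                        long timeout, int max_retries)
{
    int overdue = 0;

    for (size_t i = 0; i < n; i++) {
        if (!cmds[i].in_use)
            continue;
        if (now - cmds[i].issued_at < timeout)
            continue;
        overdue++;
        if (cmds[i].retries < max_retries) {
            cmds[i].retries++;
            cmds[i].issued_at = now; /* reissue */
        } else {
            cmds[i].in_use = 0;      /* give up on this command */
        }
    }
    return overdue;
}
```

A scan like this, run from a periodic timer, would at least log which
command went missing and when, even if the retry itself has to happen
elsewhere.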
Thanks in advance,
David Dolson (ddolson at sandvine.com, www.sandvine.com)