Swapping deadlock due to aic/scsi errors?

Dave Dolson ddolson at sandvine.com
Wed Aug 6 12:27:37 PDT 2003


We have a reproducible bug in which the system becomes
unresponsive (although the kernel debugger, ddb, can still be entered).
The system is based on FreeBSD 4.7 (i386) and uses the
aic79xx SCSI driver.

Common elements:
pagedaemon waiting in wswbuf0 
  (waiting for free page from swapper?)
swapper waiting in vmwait
  (waiting for free page from disk?)
nsw_wcount_async=0

If any process page-faults, it ends up waiting in swread,
and the following message is then printed once every 20 seconds:
swap_pager: indefinite wait buffer: device: #da/0x30001, blkno: 10352, size: 4096

I believe the swapper is waiting for the SCSI driver
to call vunmapbuf() after the page has been sent
asynchronously to be swapped out.
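To make the suspected failure mode concrete, here is a minimal user-space C model of the buffer accounting, assuming that each asynchronous swap write holds one buffer from a fixed pool until the device reports completion. The counter swbuf_free stands in for something like nsw_wcount_async; the pool size and the names issue_async_write(), complete_async_write() and simulate() are hypothetical and are not the FreeBSD swap_pager code. It only illustrates why a command that never completes would leave the pool at zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a fixed pool of async swap-write buffers.
 * swbuf_free plays the role of a counter like nsw_wcount_async.
 * The pool size below is an assumption, not the real kernel value. */
#define NSWBUF_ASYNC 4

static int swbuf_free = NSWBUF_ASYNC;

/* Try to start an async write. Returns false when the pool is empty,
 * which is where a real pagedaemon would sleep waiting for a buffer. */
static bool issue_async_write(void)
{
    if (swbuf_free == 0)
        return false;
    swbuf_free--;
    return true;
}

/* Completion path: only runs if the SCSI layer finishes the command.
 * A completion can never return more buffers than the pool holds. */
static void complete_async_write(void)
{
    assert(swbuf_free < NSWBUF_ASYNC);
    swbuf_free++;
}

/* Issue up to n writes; the first 'lost' of them never complete
 * (e.g. a timed-out SCB that is neither retried nor failed back).
 * Returns the number of buffers still free at the end. */
static int simulate(int n, int lost)
{
    swbuf_free = NSWBUF_ASYNC;
    for (int i = 0; i < n; i++) {
        if (!issue_async_write())
            break; /* in the kernel this caller would block forever */
        if (i >= lost)
            complete_async_write();
    }
    return swbuf_free;
}
```

With no lost completions, simulate(10, 0) ends with all four buffers free; with four lost completions the pool drains to zero and the next write cannot start, which is the same shape as the nsw_wcount_async=0 state listed among the common elements.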

The message "SCB 0x1f - timed out" is sometimes seen,
followed by a "dump card state".

I would like to add some debugging to detect the lost command
and possibly retry it. Can someone suggest where the lost
command is supposed to be detected, and where the retry is
supposed to occur?

(I've been looking through the CAM and ahd code, but need
some direction.)

Thanks in advance,
David Dolson (ddolson at sandvine.com, www.sandvine.com)



More information about the freebsd-scsi mailing list