Boot hangs on ips0: resetting adapter, this may take up to 5 minutes
John Baldwin
jhb at freebsd.org
Thu Mar 23 18:59:24 UTC 2006
On Thursday 23 March 2006 04:14, Oleg Sharoiko wrote:
> Hi!
>
> On Mon, 13 Mar 2006, John Baldwin wrote:
>
> JB>> To make GENERIC usable it's enough to comment
> JB>> options PREEMPTION
> JB>> Not sure if this helps much.
> JB>It could point to a bug in a driver.
>
> All this time I have been experimenting, but the more I did the less I
> understood. I now suspect the problem is not with a particular device,
> but rather with a number of devices installed in the system. Behaviour
> differs depending on the hardware setup and kernel configuration. Just a
> few examples:
>
> The only configuration I have never seen fail is the one with no PCI cards
> installed and several devices disabled in the BIOS (mouse, floppy, ata,
> serial ata). That way the system boots fine with the GENERIC kernel. As
> soon as I install an additional SCSI card (Adaptec 29160), SCB timeouts
> start happening on the internal SCSI adapter during "Waiting 5 seconds for
> SCSI devices to settle". The system still boots after "ahd0: Recovery
> Initiated - Card was not paused". If I remove the bge driver from the
> kernel (keeping the additional SCSI card in the system), these timeouts go
> away.
>
> The GENERIC kernel on the system with no PCI cards and all devices
> enabled in the BIOS sometimes boots and sometimes hangs, with "lo0: bpf
> attached" as the last line printed. The same happens with a kernel without
> bge, except that such a kernel is more likely to boot.
>
> When the ips PCI card is installed, the GENERIC kernel definitely hangs
> at boot. A kernel without bge almost certainly boots. On an SMP kernel I
> was even able to kldload bge after boot had completed. The same action on
> a UP system produces rather strange results. If I boot to single-user mode
> and load if_bge, then the system returns to the command prompt and I can
> edit the command line and everything looks normal. But as soon as I try to
> execute something (I suppose disk I/O is involved here, but I'm not sure)
> the system becomes extremely slow. It takes about 30 seconds to print a
> single character on the console. The same happens if I load if_bge in
> multiuser mode.
This points to an interrupt storm.
> One thing is common to all cases: when the system hangs (or becomes slow)
> Ctrl+Alt+Esc doesn't work, but sending a break on the com port still does,
> and it's possible to get into the kernel debugger. Unfortunately this
> doesn't help me. To be honest, I don't think I can cope with this on my
> own. I set up remote gdb for this box, but it tells me nothing because I
> don't know how interrupt delivery works or how interrupt handling is done
> in FreeBSD. Would it be possible for you, John, or maybe for someone else,
> to look at this box? I can provide full remote access to it with remote
> gdb, serial console and IP KVM.
Can you drop into the debugger and do 'show intrcnt' after you have triggered
the interrupt storm from bge?
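
One way to read the counters, sketched in Python under stated assumptions:
capture 'show intrcnt' twice a known number of seconds apart and compare the
per-source rates. The line format assumed here (a source name followed by a
count), the sample numbers, and the threshold are all illustrative
assumptions, not output from this machine.

```python
import re

# Assumed line shape: an interrupt source name followed by a decimal count,
# e.g. "irq16: bge0    123456". Real debugger output may differ by release.
LINE_RE = re.compile(r"^(?P<name>.+?)\s+(?P<count>\d+)\s*$")


def parse_intrcnt(text):
    """Return {source name: count} for every line matching LINE_RE."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counts[m.group("name").strip()] = int(m.group("count"))
    return counts


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each source between two snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {name: (after[name] - before.get(name, 0)) / seconds
            for name in after}


def likely_storms(rates, threshold):
    """Sources whose rate exceeds threshold, highest rate first."""
    hits = [(name, rate) for name, rate in rates.items() if rate > threshold]
    return sorted(hits, key=lambda item: -item[1])


# Hypothetical captures taken 10 seconds apart.
BEFORE = "irq16: bge0    1000\nirq14: ata0    50\n"
AFTER = "irq16: bge0    901000\nirq14: ata0    60\n"

if __name__ == "__main__":
    rates = interrupt_rates(parse_intrcnt(BEFORE), parse_intrcnt(AFTER), 10)
    for name, rate in likely_storms(rates, threshold=1000):
        print(f"{name}: {rate:.0f} interrupts/s")
```

A source whose count climbs far faster than any other while the machine is
crawling is the one to look at first; if it shares an IRQ with another
device, either driver could be the one failing to clear its interrupt.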
--
John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org