Boot hangs on ips0: resetting adapter, this may take up to 5 minutes

John Baldwin jhb at freebsd.org
Thu Mar 23 18:59:24 UTC 2006


On Thursday 23 March 2006 04:14, Oleg Sharoiko wrote:
> Hi!
> 
> On Mon, 13 Mar 2006, John Baldwin wrote:
> 
> JB>> To make GENERIC usable it's enough to comment out
> JB>> options      PREEMPTION
> JB>> Not sure if this helps much.
> JB>It could point to a bug in a driver.
> 
>  All this time I have been experimenting, but the more I did the less I 
> understood. Now I suspect the problem is not with a particular device, 
> but rather with a number of devices installed in the system. The 
> behavior differs depending on hardware setup and kernel configuration. 
> Just a few examples:
> 
>  The only configuration which I've never seen fail was with no PCI cards 
> installed and several devices disabled in BIOS (mouse, floppy, ATA, serial 
> ATA). This way the system boots fine with the GENERIC kernel. As soon as I 
> install an additional SCSI card (Adaptec 29160), SCB timeouts start 
> happening on the internal SCSI adapter during "Waiting 5 seconds for SCSI 
> devices to settle". The system would still boot after "ahd0: Recovery 
> Initiated - Card was not paused". If I remove the bge driver from the 
> kernel (keeping the additional SCSI card in the system) these timeouts go 
> away.
> 
>  The GENERIC kernel on the system with no PCI cards and all devices 
> enabled in BIOS sometimes boots and sometimes hangs with the last line 
> "lo0: bpf attached". The same happens with a kernel without bge, except 
> that its chances of booting are higher.
> 
>  When the ips PCI card is installed the GENERIC kernel definitely hangs 
> at boot. A kernel without bge boots almost for sure. On an SMP kernel I 
> was even able to kldload bge after boot had completed. The same 
> action on a UP system produces rather strange results. If I boot to 
> single-user mode and load if_bge, then the system returns to the command 
> prompt and I can edit the command line and everything looks normal. But as 
> soon as I try to execute something (I suppose disk I/O is the point here, 
> but I'm not sure) the system becomes extremely slow. It takes about 30 
> seconds to print a single character on the console. The same happens if I 
> load if_bge in multi-user mode.

This points to an interrupt storm.

>  One thing is common to all cases: when the system hangs (or becomes slow) 
> Ctrl+Alt+Esc doesn't work, but sending a break on the com port still does 
> and it's possible to get into the kernel debugger. Unfortunately this 
> doesn't help me. To be honest, I don't think I can cope with this on my 
> own. I set up remote gdb for this box but it tells me nothing, due to my 
> lack of knowledge of how interrupt delivery works and how interrupt 
> handling is done in FreeBSD. Would it be possible for you, John, or maybe 
> for someone else to look at this box? I can provide full remote access to 
> it with remote gdb, serial console and IP KVM.

Can you drop into the debugger and do 'show intrcnt' after you have triggered
the interrupt storm from bge?
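
If it helps to compare counter dumps taken a short time apart, here is a rough sketch of one way to spot the storming source. It assumes each captured line has the form "<source name> <count>", which may not match the real debugger output exactly. The helper names parse_counts and storm_candidates are hypothetical, and the sample numbers and IRQ assignments are made up for illustration, not taken from the machine in this thread.

```python
def parse_counts(text):
    """Parse lines of the assumed form '<source name> <count>' into a dict."""
    counts = {}
    for line in text.splitlines():
        # Split off the last whitespace-separated field as the count,
        # so source names containing spaces stay intact.
        parts = line.rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or unrecognized lines
        counts[parts[0].strip()] = int(parts[1])
    return counts


def storm_candidates(before, after, threshold):
    """Return (name, delta) pairs whose count grew by at least `threshold`,
    sorted with the largest increase first."""
    old = parse_counts(before)
    new = parse_counts(after)
    deltas = [(name, new[name] - old.get(name, 0)) for name in new]
    hot = [(name, delta) for name, delta in deltas if delta >= threshold]
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


# Made-up sample dumps (hypothetical names and values).
BEFORE = """\
irq1: atkbd0    12
irq16: bge0     40
irq28: ahd0     900
"""
AFTER = """\
irq1: atkbd0    12
irq16: bge0     2500040
irq28: ahd0     910
"""

if __name__ == "__main__":
    for name, delta in storm_candidates(BEFORE, AFTER, threshold=100000):
        print(f"{name}: +{delta}")
```

A source whose count jumps by orders of magnitude more than the others between the two dumps is the likely culprit. The threshold is arbitrary and would need adjusting to the interval between dumps.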

-- 
John Baldwin <jhb at FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

