amr(4) driver broken with old hardware
Andy Farkas
chuzzwassa at gmail.com
Mon Dec 28 07:26:36 UTC 2009
Hello -scsi,
My main gateway box had been running FreeBSD 4.12 for quite a while until
I recently decided to upgrade it, via the source upgrade route:
1/ cvsup RELENG_5_5_0_RELEASE, make (GENERIC) world, install world, reboot.
2/ cvsup RELENG_6_0_0_RELEASE, make (GENERIC) world, install world, reboot.
3/ cvsup RELENG_6_1_0_RELEASE, make (GENERIC) world, install world, reboot.
6.1-RELEASE is where the problem started. Processes started hanging during
disk I/O. 6.0-R works flawlessly and is able to do buildworlds without fail.
The disk controller is a:
amr0 at pci0:6:0: class=0x018000 card=0x00000000 chip=0x9010101e rev=0x03 hdr=0x00
    vendor = 'American Megatrends Inc.'
    device = 'MegaRAID 428 Ultra Fast Wide SCSI RAID Controller'
    class  = mass storage
After some research, I discovered there was a "mega update" merged into the
amr(4) driver between 6.0-R and 6.1-R. So this is what I am concentrating on.
Booting back into 6.0-R I built the 6.1-R source again and included options
DDB, KDB, and BREAK_TO_DEBUGGER in the kernel config. I ran this kernel for
a bit until it started hanging. I then did a 'shutdown now' and some processes
were still hung but I got the single user prompt. I typed 'reboot', it hung.
Then I pressed CTRL-ALT-ESC and got the DDB prompt. I typed 'panic' and got
a crash dump. I then rebooted back to a working 6.0-R kernel.
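For anyone wanting to reproduce the debug kernel, here is a minimal sketch of the config step. It assumes an i386 source tree under /usr/src; the config name AMRDEBUG and the helper function make_debug_conf are hypothetical and not part of the setup described above.

```shell
#!/bin/sh
# Sketch only: derive a debug kernel config from GENERIC by appending the
# three debugger options mentioned above. The directory and the config
# name passed in are assumptions chosen for illustration.

make_debug_conf() {
    confdir=$1   # e.g. /usr/src/sys/i386/conf (assumed location)
    name=$2      # e.g. AMRDEBUG (hypothetical config name)

    # Start from an unmodified copy of GENERIC.
    cp "$confdir/GENERIC" "$confdir/$name" || return 1

    # Append the debugger options to the copy.
    {
        printf 'options\t\tKDB\n'
        printf 'options\t\tDDB\n'
        printf 'options\t\tBREAK_TO_DEBUGGER\n'
    } >> "$confdir/$name"
}

# Typical use on the target box (not run here):
#   make_debug_conf /usr/src/sys/i386/conf AMRDEBUG
#   cd /usr/src && make buildkernel KERNCONF=AMRDEBUG
#   make installkernel KERNCONF=AMRDEBUG
```

Copying GENERIC rather than editing it in place keeps the stock config available for comparison.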
So I have a crash dump and wish to track down what happened. Here is what ps
says about the dump:
hewey# ps axlHwwwM /var/crash/vmcore.3 -N /boot/kernel/kernel -O lockname
 UID   PID  PPID CPU PRI NI   VSZ RSS MWCHAN STAT TT       TIME COMMAND            PID LOCK TT STAT       TIME COMMAND
   0     0     0   4  96  0     0   0 -      WLs  ??    0:00.00 [swapper]            0 -    ?? WLs     0:00.00 [swapper]
   0     1     0   0   8  0   724   0 wait   DLs  ??    0:00.31 [init]               1 -    ?? DLs     0:00.31 [init]
   0     2     0   0  -8  0     0   0 -      DL   ??    0:56.69 [g_event]            2 -    ?? DL      0:56.69 [g_event]
   0     3     0   0  -8  0     0   0 -      DL   ??    0:40.25 [g_up]               3 -    ?? DL      0:40.25 [g_up]
   0     4     0   0  -8  0     0   0 -      DL   ??    0:53.29 [g_down]             4 -    ?? DL      0:53.29 [g_down]
   0     5     0   0   8  0     0   0 -      DL   ??    0:00.00 [thread taskq]       5 -    ?? DL      0:00.00 [thread taskq]
   0     6     0   0   8  0     0   0 -      DL   ??    0:00.00 [kqueue taskq]       6 -    ?? DL      0:00.00 [kqueue taskq]
   0     7     0   0  -8  0     0   0 -      DL   ??    0:05.39 [fdc0]               7 -    ?? DL      0:05.39 [fdc0]
   0     8     0   0 -16  0     0   0 psleep DL   ??    0:01.68 [pagedaemon]         8 -    ?? DL      0:01.68 [pagedaemon]
   0     9     0   4  20  0     0   0 psleep DL   ??    0:00.00 [vmdaemon]           9 -    ?? DL      0:00.00 [vmdaemon]
   0    10     0   0 -16  0     0   0 ktrace DL   ??    0:00.00 [ktrace]            10 -    ?? DL      0:00.00 [ktrace]
   0    11     0  49 171  0     0   0 -      RL   ?? 5045:25.09 [idle]              11 -    ?? RL   5045:25.09 [idle]
   0    12     0   0 -44  0     0   0 -      WL   ??    1:58.20 [swi1: net]         12 -    ?? WL      1:58.20 [swi1: net]
   0    13     0   0 -32  0     0   0 -      WL   ??   13:22.54 [swi4: clock sio    13 -    ?? WL     13:22.54 [swi4: clock sio]
   0    14     0   0 -36  0     0   0 -      WL   ??    0:00.00 [swi3: vm]          14 -    ?? WL      0:00.00 [swi3: vm]
   0    15     0   0 -16  0     0   0 -      DL   ??    1:05.02 [yarrow]            15 -    ?? DL      1:05.02 [yarrow]
   0    16     0   0 -40  0     0   0 -      WL   ??    0:00.00 [swi2: cambio]      16 -    ?? WL      0:00.00 [swi2: cambio]
   0    17     0   0 -28  0     0   0 -      WL   ??    0:00.00 [swi5: +]           17 -    ?? WL      0:00.00 [swi5: +]
   0    18     0   0 -24  0     0   0 -      WL   ??    0:00.00 [swi6: +]           18 -    ?? WL      0:00.00 [swi6: +]
   0    19     0   0 -24  0     0   0 -      WL   ??    0:00.00 [swi6: task queu    19 -    ?? WL      0:00.00 [swi6: task queue]
   0    20     0   0 -64  0     0   0 -      WL   ??    0:00.00 [irq14: ata0]       20 -    ?? WL      0:00.00 [irq14: ata0]
   0    21     0   0 -64  0     0   0 -      WL   ??    0:00.00 [irq15: ata1]       21 -    ?? WL      0:00.00 [irq15: ata1]
   0    22     0   0 -64  0     0   0 -      WL   ??    0:06.59 [irq11: amr0]       22 -    ?? WL      0:06.59 [irq11: amr0]
   0    23     0   0 -68  0     0   0 -      WL   ??    0:42.83 [irq10: fxp0]       23 -    ?? WL      0:42.83 [irq10: fxp0]
   0    24     0   0 -60  0     0   0 -      RL   ??    0:00.10 [irq1: atkbd0]      24 -    ?? RL      0:00.10 [irq1: atkbd0]
   0    25     0   0 -60  0     0   0 -      WL   ??    0:00.00 [irq7: ppc0]        25 -    ?? WL      0:00.00 [irq7: ppc0]
   0    26     0   0 -48  0     0   0 -      WL   ??    0:12.15 [swi0: sio]         26 -    ?? WL      0:12.15 [swi0: sio]
   0    27     0   0 171  0     0   0 pgzero DL   ??    0:23.90 [pagezero]          27 -    ?? DL      0:23.90 [pagezero]
   0    28     0   0 -16  0     0   0 psleep DL   ??    0:07.56 [bufdaemon]         28 -    ?? DL      0:07.56 [bufdaemon]
   0    29     0   0  -4  0     0   0 getblk DL   ??    0:59.97 [syncer]            29 -    ?? DL      0:59.97 [syncer]
   0    30     0   0  -7  0     0   0 bo_wwa DL   ??    0:03.41 [vnlru]             30 -    ?? DL      0:03.41 [vnlru]
   0    31     0   0 -16  0     0   0 sdflus DL   ??    0:17.20 [softdepflush]      31 -    ?? DL      0:17.20 [softdepflush]
   0    32     0   0  96  0     0   0 -      DL   ??    1:37.15 [schedcpu]          32 -    ?? DL      1:37.15 [schedcpu]
 100   572     1   0  -4  0 14756   0 ufs    D    ??    6:23.72 [squid]            572 -    ?? D       6:23.72 [squid]
 100   604   572   0 -84  0     0   0 -      ZW   ??    0:00.00 <defunct>          604 -    ?? ZW      0:00.00 <defunct>
   0 11202     1   0  -4  0  1304   0 ufs    D    ??    0:34.76 [find]           11202 -    ?? D       0:34.76 [find]
   0 12415     1   0   8  0  1632   0 wait   Ds   ??    0:00.03 [sh]             12415 -    ?? Ds      0:00.03 [sh]
   0 12416 12415   0  -8  0  1232   0 biord  D+   ??    0:00.04 [reboot]         12416 -    ?? D+      0:00.04 [reboot]
hewey#
As you can see, PIDs 29, 30, and 31 (syncer, vnlru, and softdepflush) are
stuck because of disk I/O.
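The same observation can be pulled out of the listing with a short filter. This is only a sketch: it assumes the ps output has MWCHAN as the ninth column, as in the header above, and the function name wchan_summary is made up for illustration.

```shell
#!/bin/sh
# Sketch: print "PID MWCHAN" for every row of "ps axlH"-style output on
# stdin whose wait channel is not "-". Rows that do not start with a
# numeric UID (the header, wrapped continuation lines) are skipped.
wchan_summary() {
    awk '$1 ~ /^[0-9]+$/ && $9 != "-" { print $2, $9 }'
}

# Possible use against the crash dump (not run here):
#   ps axlHwwwM /var/crash/vmcore.3 -N /boot/kernel/kernel | wchan_summary
```

The filter does not decide which wait channels are I/O related; it only narrows the list to threads that are sleeping on something.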
But this is where I am stuck. I do not know kgdb well enough to debug
the crash dump I have. The only thing I have learnt to do is 'info
threads' and then 'thread 22'. Then, bang, brick wall.
Is it worth pursuing this to debug the driver for such an old piece of
hardware in an old Pentium-Pro 200MHz?
Or can anyone give me a clue as to some kgdb commands to probe into why
the processes are stuck in IO?
Thanks,
-andyf