[ahd driver] 12.3: kernel crash when stopping disks

From: Peter <pmc_at_citylink.dinoex.sub.org>
Date: Thu, 09 Dec 2021 12:23:43 UTC
> Dec  5 01:08:25 <local0.info> edge gstopd[64139]: Error received from stop unit command
> Dec  5 01:08:25 <kern.crit> edge kernel: ahd0: Recovery Initiated - Card was not paused
> Dec  5 01:08:25 <kern.crit> edge kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> Dec  5 01:08:25 <kern.crit> edge kernel: ahd0: Dumping Card State at program address 0x7e Mode 0x22

> Dec  5 01:08:25 <kern.crit> edge kernel: (pass0:ahd0:0:0:0): SCB 247 - timed out
> Dec  5 01:08:25 <kern.crit> edge kernel: (pass0:ahd0:0:0:0): Queuing a BDR SCB
> Dec  5 01:08:25 <kern.crit> edge kernel: (pass0:ahd0:0:0:0): Bus Device Reset Message Sent


Hija,

I had a closer look at this one:

There must be a timeout flaw in the driver logic. I tried running the
STOP UNIT from camcontrol with "-t 30", but these controller errors
still appear after some 5 or 10 seconds. So wherever it gets the
timeout from, it is not the right one.

The kernel crash is then an occasional consequence of these strange
timeouts - it happened only once, while the erroneous timeouts happen
more often.

I now work around the issue: since STOP UNIT is the only command
affected, I issue it with the IMMED bit set and do not wait until the
disk finally stops. As this is only for saving the rain-forests (and my
power bill), I don't care if or when the disks actually manage to stop.
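
For illustration only, here is a minimal sketch of what a STOP UNIT
request with IMMED set looks like at the CDB level. It assumes the
standard 6-byte START STOP UNIT layout (opcode 0x1b, IMMED in bit 0 of
byte 1, START in bit 0 of byte 4). The device name "da0" and the helper
names are hypothetical, and passing the bytes via "camcontrol cmd ... -c"
is just one plausible way to send them, not necessarily the exact
invocation used here.

```python
import shlex

START_STOP_UNIT = 0x1B  # SCSI opcode for START STOP UNIT


def stop_unit_cdb(immed=True):
    """Build a 6-byte START STOP UNIT CDB that requests a spin-down."""
    byte1 = 0x01 if immed else 0x00  # bit 0 of byte 1: IMMED (return at once)
    byte4 = 0x00                     # bit 0 of byte 4: START, clear means stop
    return bytes([START_STOP_UNIT, byte1, 0x00, 0x00, byte4, 0x00])


def camcontrol_args(device, cdb):
    """Hypothetical helper: argv for sending the CDB with 'camcontrol cmd'."""
    hex_bytes = " ".join(f"{b:02x}" for b in cdb)
    return ["camcontrol", "cmd", device, "-c", hex_bytes]


if __name__ == "__main__":
    # "da0" is a placeholder device name (assumption).
    print(shlex.join(camcontrol_args("da0", stop_unit_cdb())))
```

With IMMED set, the target is expected to report status as soon as it
has accepted the command, so the initiator does not sit waiting for the
spin-down to complete.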

No more problems or errors since then.


cheerio,
PMc