SATA disks suddenly stop working
Elliot Schlegelmilch
elliot at schlegelmilch.org
Mon Mar 2 11:20:50 PST 2009
Alexander Motin wrote:
[snip]
>>
>> ata2: <ATA channel 0> on atapci1
>> ata2: AHCI reset...: 2
>> ata2: SATA connect time=0ms
>> ata2: ready wait time=0ms52 (12272 MB)
>> ata2: software reset port 15...
>> ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
>> ata2: software reset set timeout
>> ata2: software reset port 0...
>> ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
>> ata2: software reset set timeout
>> ata2: SIGNATURE: ffffffff
>> ata2: Unknown signature, assuming disk device
>> ata2: AHCI reset done: devices=00000001
>> ata2: [MPSAFE]
>> ata2: [ITHREAD]
>>
>> One for each channel, up to ata7.
>
> Does it happen during boot or what do you mean by unable to reattach
> drive now?
Yes, I saw the above during boot.
By "unable to reattach" I was describing a change from the old behavior:
sometimes my ad12 would fall off the bus, and I could usually get it back
with 'atacontrol detach ata6; atacontrol attach ata6'.
Now I get: ata6: still BUSY after softreset
and attempting the detach/attach results in:
Tracing pid 12 tid 100007 td 0xffffff0001afb390
device_get_parent() at device_get_parent+0x1
ata_start() at ata_start+0x1c5
ata_reinit() at ata_reinit+0x1dd
ata_completed() at ata_completed+0x75
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x68
ithread_loop() at ithread_loop+0xb2
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffffffe4004ad40, rbp = 0 ---
This isn't a huge deal and is probably a red herring, as I suspect
the disk is going bad at this point. This is running a kernel from
Feb 1, as I recall. However, the disk has stayed attached for weeks at
a time before.
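For reference, the detach/attach cycle mentioned above can be wrapped in a
small sh function. This is only a sketch: the function name
reattach_channel and the ATACONTROL override variable are made up for
illustration, and the only real commands it runs are the two atacontrol
subcommands quoted earlier.

```sh
#!/bin/sh
# Hypothetical helper: detach and then re-attach one ATA channel.
# ATACONTROL is an illustrative override (for example "echo" for a dry run);
# by default the real atacontrol utility is used.
ATACONTROL="${ATACONTROL:-atacontrol}"

reattach_channel() {
    channel="$1"
    if [ -z "$channel" ]; then
        echo "usage: reattach_channel ataN" >&2
        return 2
    fi
    # Only attempt the attach if the detach succeeded.
    "$ATACONTROL" detach "$channel" && "$ATACONTROL" attach "$channel"
}
```

Setting ATACONTROL=echo before calling 'reattach_channel ata6' prints the
two commands instead of running them, which is the safer way to try it on
a box where the real cycle currently ends in the trace above.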
>> atapci0 at pci0:0:31:1: class=0x01018a card=0x948115d9 chip=0x269e8086
>> rev=0x09 hdr=0x00
>> vendor = 'Intel Corporation'
>> device = '631xESB/632xESB/3100 Ultra ATA Storage Controller'
>> class = mass storage
>> subclass = ATA
>>
>> The last kernel known to work was from Dec 17, but a kernel rebuilt
>> from that date doesn't see the SATA disks either (as the kernel
>> which sees the disks zfs doesn't work). Or perhaps I'm csup'ing
>> incorrectly.
>
> Haven't you tried to just touched reset sequence on 15.
Do you mean a kernel from Feb 15? Did anything else change between the
15th and the 22nd or so?
> Once you manage to boot, can you try some experiments against
> HEAD? Maybe one of them fixes the problem:
> 1) comment out this line inside ata_ahci_issue_cmd():
> ATA_OUTL(ctlr->r_res2, ATA_AHCI_P_FBS + offset, (port << 8) |
> 0x00000001);
>
> 2) comment out these lines inside ata_sata_phy_reset():
> if ((ATA_IDX_INL(ch, ATA_SCONTROL) & ATA_SC_DET_MASK) ==
> ATA_SC_DET_IDLE)
> return ata_sata_connect(ch);
>
> 3) comment out the first occurrence of this line inside ata_ahci_softreset():
> return (-1);
>
> Thanks.
>
I'll try these patches and report back right after I freshen up my backups. :)