lsi1064e
Eugene M. Zheganin
eugene at zhegan.in
Thu Jun 2 12:58:49 UTC 2011
Hi.
I'm using FreeBSD 8.2 on IBM System x3250 servers, which come with an
onboard LSI 1064e controller.
I run them with geom_mirror and zfs (I have about a dozen of these).
Some time ago I noticed a weird thing on a server with gmirror: one drive
died and the server hung until it was rebooted. This week I was examining
some zfs-related freezes (I guess it's about the ARC size, but someone on
IRC told me that disk timeouts can be the reason too) and I was
experimenting on my test server (which is waiting to be put into
production). And I noticed some wrong (at least I think it's wrong)
behaviour: keeping in mind that last time I got a freeze when a drive
died, I pulled out one of the two drives in a zfs mirrored pool. I got an
immediate freeze: all disk operations froze, but the
system was alive. I entered the kernel debugger and saw a bunch of
processes in the D state, including some of the zfs threads.
I updated the LSI 1064e firmware (the latest 1.30.xx found on the IBM site)
and the BIOS, but nothing helps. When one of the disks is pulled out
(there's no need to do that in production, but I guess the exact same
thing happens when a drive dies along with all of its electronics)
the system waits indefinitely, until the drive is pushed back
or the server is rebooted. Then (if the drive is pushed back) the
mpt driver realises that either the drive was reset or the device was
lost (I don't know what this depends on).
Funny thing: after the drive is pulled out and pushed back, and a
camcontrol rescan is issued, you can pull it out again, and this time
(and any time after that) the system will detect that the drive is gone
quite fast, and disk operations will not freeze.
You can imagine that this behaviour is not what anyone expects when
a drive dies. So I want to ask: can this be tuned so that the
system keeps running and detects that the drive has
failed within a short time, like 3-15 seconds? Or is this a bug, and I
need to file a PR?
Thanks.
Eugene.