sporadic CAM (all devices) outage on 11-stable, mps(4), ahci(4) and bhyve(8) involved. [Was: Re: mps(4) blocks panic-reboot]
Harry Schmalzbauer
freebsd at omnilan.de
Wed Jun 7 08:18:33 UTC 2017
Regarding Harry Schmalzbauer's message of 01.06.2017 21:10 (localtime):
> Regarding Stephen Mcconnell's message of 01.06.2017 20:55 (localtime):
>> Take a look at PR 212914. Could that be the issue? It was MFC'd to stable/11
>> with r309273 on Nov 28th, 2016.
> Thanks a lot, but that's unrelated.
Unfortunately, a similar lockup occurred today :-(
I was informed by mps(4):
(da1:mps0:0:3:0): READ(10). CDB: 28 00 06 7e 4d 53 00 00 10 00
(da1:mps0:0:3:0): CAM status: Unrecoverable Host Bus Adapter Error
(da1:mps0:0:3:0): Retrying command
(da1:mps0:0:3:0): WRITE(10). CDB: 2a 00 06 f8 c5 1f 00 00 38 00
(da1:mps0:0:3:0): CAM status: Unrecoverable Host Bus Adapter Error
(da1:mps0:0:3:0): Retrying command
(da1:mps0:0:3:0): WRITE(10). CDB: 2a 00 06 f8 c5 1f 00 00 38 00
(da1:mps0:0:3:0): CAM status: SCSI Status Error
(da1:mps0:0:3:0): SCSI status: Check Condition
(da1:mps0:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:mps0:0:3:0): Error 6, Retries exhausted
(da1:mps0:0:3:0): Invalidating pack
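For anyone who wants to map the failing commands to on-disk locations, the CDBs in the messages above can be decoded by hand. The following is only a minimal sketch, assuming the standard 10-byte READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8); `decode_cdb10` is a hypothetical helper name, not part of any FreeBSD tool.

```python
# Sketch: decode a 10-byte SCSI READ(10)/WRITE(10) CDB as printed by CAM.
# decode_cdb10 is a hypothetical helper, used here for illustration only.

OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(cdb_hex):
    """Return (command name, starting LBA, transfer length in blocks)."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    name = OPCODES.get(cdb[0], "opcode 0x%02x" % cdb[0])
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return name, lba, blocks


if __name__ == "__main__":
    # The two distinct CDBs from the kernel messages above.
    for cdb in ("28 00 06 7e 4d 53 00 00 10 00",
                "2a 00 06 f8 c5 1f 00 00 38 00"):
        print(decode_cdb10(cdb))
```

Under that assumption, the failed READ covers 16 blocks starting at LBA 108940627, and the retried WRITE covers 56 blocks starting at LBA 116966687.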
But it seems all drives were lost again (although the corresponding
kernel messages could no longer be printed): in another, still
responsive session (memory-disk rootfs) I could get the zpool status,
and ZFS reported all members as having outstanding requests:
  pool: cetusPsys
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-JQ
  scan: none requested
config:

        NAME                    STATE     READ WRITE CKSUM
        cetusPsys               ONLINE     370    13     0
          mirror-0              ONLINE      40    12     0
            gpt/cetusSYSzd1of4  ONLINE       3    26     0
            da2                 ONLINE       3    16     0
          mirror-1              ONLINE     700     9     0
            gpt/cetusSYSzd2of4  ONLINE       3     9     0
            da3                 ONLINE       3    54     0
I'll do anything I can to help track down this problem, since the one
thing I have taken massive precautions to prevent has now happened... a
freezing hypervisor :-(
Thanks,
-harry
(In case anyone is following any of my other recent PRs: this time, no
passthru-enabled VM was involved. The latter causes some very serious
memory corruption IMHO... This machine is a Xeon E3 with ECC; neither
MBC nor MCE reports ECC errors...)
More information about the freebsd-scsi
mailing list