Server with 3TB Crashing at boot

Konstantin Belousov kostikbel at gmail.com
Sat Mar 14 13:47:40 UTC 2015


On Sat, Mar 14, 2015 at 02:22:52PM +0100, Michael Fuckner wrote:
> On 3/14/2015 12:43 PM, Steven Hartland wrote:
> >
> >
> > On 14/03/2015 06:59, Michael Fuckner wrote:
> >> On 3/13/2015 10:17 PM, Ryan Stone wrote:
> >>> On Fri, Mar 13, 2015 at 4:42 PM, Michael Fuckner <michael at fuckner.net>
> >>> wrote:
> >>>
> >>>>    Now I can kldload zfs without exploding kernel. I'll do some more
> >>>> tests
> >>>> tomorrow, but this looks fine!
> >>>>
> >>>
> >>> Excellent news!  I'd be interested to know whether this fixes the panics
> >>> that you saw when zfs.ko was loaded by the bootloader.  It's definitely
> >>> possible, as the symptoms of this bug are likely to be random memory
> >>> corruption after zfs initializes, but your crash happened pretty
> >>> early on
> >>> and I'm not sure whether zfs would have had a chance to do anything that
> >>> early.
> >>>
> >>> Thanks for all of the work that you did to debug this.
> >>
> >>
> >> Currently there is another issue that prevents me from testing ZFS:
> >> only one HBA gets initialized.
> >>
> >> mpr0: 9300-8i with 8x Intel S3700
> >> mpr1: 9300-4i4e with 4x Intel S3700 and an external JBOD with 24HDD.
> >>
> >> mpr0 initializes fine, mpr1 fails
> >>
> >> root at s4l:~ # dmesg |grep mpr
> >> mpr1: <LSI SAS3008> port 0xf000-0xf0ff mem 0xfb100000-0xfb10ffff irq
> >> 112 at device 0.0 on pci195
> >> mpr1: IOCFacts  :
> >> mpr1: Firmware: 07.00.01.00, Driver: 05.255.05.00-fbsd
> >> mpr1: IOCCapabilities:
> >> 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
> >>
> >> mpr1: Cannot allocate queues memory
> >> mpr1: mpr_iocfacts_allocate failed to alloc queues with error 12
> >> mpr1: mpr_attach IOC Facts based allocation failed with error 12
> >> device_attach: mpr1 attach returned 12
> 
> I applied your patch for queue size and sense size.
> 
> I just saw:
> 
> http://dedi3.fuckner.net/~molli123/temp/mpr2.cap (just the second half 
> of the boot, but nothing changed but the two debug echos)
> 
> 
> root at s4l:~ # dmesg |grep mpr
> mpr0: <LSI SAS3008> port 0x2000-0x20ff mem 0xaba00000-0xaba0ffff irq 42 
> at device 0.0 on pci4
> mpr0: IOCFacts  :
> mpr0: Firmware: 07.00.01.00, Driver: 05.255.05.00-fbsd
> mpr0: IOCCapabilities: 
> 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
> mpr0: attempting to allocate 1 MSI-X vectors (96 supported)
> mpr0: using IRQ 300 for MSI-X
> mpr1: <LSI SAS3008> port 0xf000-0xf0ff mem 0xfb100000-0xfb10ffff irq 112 
> at device 0.0 on pci195
> mpr1: IOCFacts  :
> mpr1: Firmware: 07.00.01.00, Driver: 05.255.05.00-fbsd
> mpr1: IOCCapabilities: 
> 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
> mpr1: Cannot allocate sense size 258048 memory
> mpr1: mpr_iocfacts_allocate failed to alloc queues with error 12
> mpr1: mpr_attach IOC Facts based allocation failed with error 12
> device_attach: mpr1 attach returned 12
> (probe0:mpr0:0:0:0): Down reving Protocol Version from 4 to 0?
> da0 at mpr0 bus 0 scbus0 target 0 lun 0
> pass0 at mpr0 bus 0 scbus0 target 0 lun 0
> 
> In the BIOS I have VT-d enabled. In the VT-d menu there are two other options:
> ATS (Non-Isoch VT-D Engine ATS Support), default: enabled
ATS is not used by FreeBSD right now, and your device probably does not
support it either.

> Coherency Support (Non Isoch VT_D Engine Coherency Support), default:
> Disabled
There is a bug in 10.1 that incorrectly invalidates the IOMMU page cache.
Enabling coherency works around the bug.

> 
> In the Processor Configuration tab there was also
> Extended APIC support (default is disabled)
On 10.1 this option should not result in any behaviour change.

> 
> Should I give this option a try?

Up to you.  I am somewhat curious whether it boots with DMAR enabled and
what happens if it does not.
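
For reference, the "error 12" in the quoted mpr1 attach messages is the
errno value ENOMEM, which matches the "Cannot allocate ... memory" lines
printed just before it. A minimal sketch for decoding such a number from
dmesg output follows. It is not part of the driver or of any patch in this
thread; the helper name describe_errno is made up for illustration, and the
message text returned by the C library varies by platform.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and C-library message for an errno value."""
    # errno.errorcode maps numeric values to names such as "ENOMEM".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror() returns the platform's message for the same value.
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # 12 is the value reported by "device_attach: mpr1 attach returned 12".
    print(describe_errno(12))
```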

