[stable-ish 9] Dell R815 ipmi(4) attach failure
Alexander Motin
mav at FreeBSD.org
Fri Apr 6 17:27:15 UTC 2012
On 04/06/12 20:12, Doug Ambrisko wrote:
> Alexander Motin writes:
> [ Charset ISO-8859-1 unsupported, converting... ]
> | On 04/04/12 21:47, John Baldwin wrote:
> |> On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
> |>> John Baldwin writes:
> |>> | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
> |>> |> John Baldwin writes:
> |>> |> | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
> |>> |> |> Doug Ambrisko writes:
> |>> |> |> | John Baldwin writes:
> |>> |> |> | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
> |>> |> |> | |> Sean Bruno writes:
> |>> |> |> | |> | Noting a failure to attach to the onboard IPMI controller
> |> with
> |>> | this
> |>> |> | dell
> |>> |> |> | |> | R815. Not sure what to start poking at and thought I'd
> |> though
> |>> | this
> |>> |> | over
> |>> |> |> | |> | here for comment.
> |>> |> |> | |> |
> |>> |> |> | |> | -bash-4.2$ dmesg |grep ipmi
> |>> |> |> | |> | ipmi0: KCS mode found at io 0xca8 on acpi
> |>> |> |> | |> | ipmi1:<IPMI System Interface> on isa0
> |>> |> |> | |> | device_attach: ipmi1 attach returned 16
> |>> |> |> | |> | ipmi1:<IPMI System Interface> on isa0
> |>> |> |> | |> | device_attach: ipmi1 attach returned 16
> |>> |> |> | |> | ipmi0: Timed out waiting for GET_DEVICE_ID
> |>> |> |> | |>
> |>> |> |> | |> I've run into this recently. A quick hack to fix it is:
> |>> |> |> | |>
> |>> |> |> | |> Index: ipmi.c
> |>> |> |> | |>
> [snip]
> |>> | If you use "-ct" then you get a file you can feed into schedgraph.
> |>> | However, just reading the log, it seems that IRQ 20 keeps preempting
> |>> | the KCS worker thread preventing it from getting anything done. Also,
> |>> | there seem to be a lot of threads on CPU 0's runqueue waiting for a
> |>> | chance to run (load average of 12 or 13 the entire time). You can try
> |>> | just bumping up the max timeout from 3 seconds to higher perhaps. Not
> |>> | sure why IRQ 20 keeps firing though. It might be related to USB, so
> |>> | you could try fiddling with USB options in the BIOS perhaps, or disabling
> |>> | the USB drivers to see if that fixes IPMI.
> |>>
> |>> Tried without USB in kernel:
> |>> http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
> |>
> |> Hmm, it's still just running constantly (note that the idle thread is
> |> _never_ scheduled). The lion's share of the time seems to be spent in
> |> "xpt_thrd". Note that there are several places where nothing happens except
> |> that "xpt_thrd" runs constantly (spinning) during 10's of statclock ticks. I
> |> would maybe start debugging that to see what in the world it is doing. Maybe
> |> it is polling some hardware down in xpt_action() (i.e., xpt_action() for a
> |> single bus called down into a driver and it is just spinning using polling
> |> instead of sleeping and waiting for an interrupt).
> |
> | "xpt_thrd" is a bus scanner thread. It is scheduled by CAM for every bus
> | on attach and by controller driver on hot-plug events. For some
> | controllers it may be quite CPU-hungry. For example, for legacy ATA
> | controllers, where bus reset may take many seconds of hardware polling,
> | while devices just spinning up. For ahci(4) it was improved about year
> | ago to not use polling when possible, but it still may loop for some
> | time if controller is not responding on reset. What mfi(4), mentioned in
> | log, does during scanning, I am not sure.
>
> I thought that mfi(4) could be an issue. There are some ata controllers
> with nothing attached. I built a GENERIC with USB and mfi commented out
> and then the timeout issue went away:
> ipmi0: KCS mode found at io 0xca8 on acpi
> ipmi1:<IPMI System Interface> on isa0
> device_attach: ipmi1 attach returned 16
> ipmi1:<IPMI System Interface> on isa0
> device_attach: ipmi1 attach returned 16
> ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
> ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
> ipmi0: DEBUG ipmi_complete_request 529 after wakeup 2272
> ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 2332
> ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
>
> Without mfi and with USB and it had issues:
> ipmi0: KCS mode found at io 0xca8 on acpi
> ipmi1:<IPMI System Interface> on isa0
> device_attach: ipmi1 attach returned 16
> ipmi1:<IPMI System Interface> on isa0
> device_attach: ipmi1 attach returned 16
> ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
> ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
> ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
> ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
> ipmi0: Timed out waiting for GET_DEVICE_ID
> ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
>
> I can post more ktrdump traces if needed. A 1U Dell machine without
> mfi also has this problem. As John mentioned it might be good to
> bump up the timeout from 3s to 6s. I did that with the USB no mfi
> kernel and that passed:
>
> % dmesg | grep ipmi
> ipmi0: KCS mode found at io 0xca8 on acpi
> ipmi1:<IPMI System Interface> on isa0
> device_attach: ipmi1 attach returned 16
> ipmi1:<IPMI System Interface> on isa0
> device_attach: ipmi1 attach returned 16
> ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
> ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
> ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
> ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
> ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
>
> So maybe we need to agressively bump up the timeout. I put a
> timeout since I didn't want the system to hang. Anyone have a
> good idea of a timeout. I thought I tried 6s initially and it
> had issues but then the machine I was playing with had 3 mfi
> cards and various disks hanging off it.
I have no idea about IPMI timeout to propose, but can't that check be
remade opposite: if response received -- use it, otherwise -- check
error value? Obviously it is not IPMI problem that CPU is busy, but
ability to work in those conditions would be a bonus.
--
Alexander Motin
More information about the freebsd-stable
mailing list