svn commit: r253708 - head/sys/dev/ipmi
Sean Bruno
sean_bruno at yahoo.com
Mon Jul 29 22:51:04 UTC 2013
[sbruno_comment_blocks == 4]
>
> The identify function in 7.x has no such check:
>
> static void
> ipmi_isa_identify(driver_t *driver, device_t parent)
> {
> struct ipmi_get_info info;
> uint32_t devid;
>
> if (ipmi_smbios_identify(&info) && info.iface_type != SSIF_MODE &&
> device_find_child(parent, "ipmi", -1) == NULL) {
Ok then what is this ^^^^^^^^^ ? Doesn't this mean that if
device_find_child() returns a child node that we should abort? Is that
not the same as what I'm going on about?
>
> The only check in 7 was the one you just moved in ipmi_isa_attach():
>
> static int
> ipmi_isa_attach(device_t dev)
> {
> struct ipmi_softc *sc = device_get_softc(dev);
> struct ipmi_get_info info;
> const char *mode;
> int count, error, i, type;
>
> /*
> * Pull info out of the SMBIOS table. If that doesn't work, use
> * hints to enumerate a device.
> */
> if (!ipmi_smbios_identify(&info) &&
> !ipmi_hint_identify(dev, &info))
> return (ENXIO);
>
> /*
> * Give other drivers precedence. Unfortunately, this doesn't
> * work if we have an SMBIOS table that duplicates a PCI device
> * that's later on the bus than the PCI-ISA bridge.
> */
> if (ipmi_attached)
> return (EBUSY);
> ...
> }
>
> > Am I just confused on the bus relationship here?
> >
> > We've gone over this a couple of times in different emails on different
> > lists. I've just never sat down and walked through the code. If you
> > see a better way to keep ipmi(4) from erroneously attaching to the ISA
> > interface, let me know.
>
> I haven't seen any mention of this problem before. I've seen threads about
> the watchdog issue (trying to set watchdog on shutdown which wants to use
> threads, etc.), but not this.
>
http://lists.freebsd.org/pipermail/freebsd-stable/2012-March/067023.html
The thread gets broken apart by the mail list software though.
Somewhere in there I point at ipmi1 things. But hell if I can find it
anymore.
> Also, can you provide the console messages without this patch? The previous
> check in ipmi_isa_attach() is long before we touch the BMC or ever get
> around to creating /dev/ipmi1. (Just because you see ipmi1: in dmesg doesn't
> mean it's created /dev/ipmi1.)
>
Definitely does *not* create /dev/ipmi1.
> > > > Move the check for ipmi_attached out of the ipmi_isa_attach function and into
> > > > the ipmi_isa_identify function. Remove the check of the device tree for
> > > > ipmi devices attached.
> > > >
> > > > This probing appears to make Broadcom management firmware on Dell machines
> > > > crash and emit NMI EISA warnings at various times requiring power cycles
> > > > of the machines to restore.
> > >
> > > This makes no sense. All you are doing is skipping ipmi_smbios_identify()
> > > which just looks at the SMBIOS table in RAM. It doesn't try to probe the
> > > BMC at all (no register accesses, etc.). If just reading a table in memory
> > > causes side effects, then running dmidecode in userland should be hosing your
> > > machines as well.
> > >
> >
> > Probably right. I'm not exactly sure what is making the Broadcom
> > firmware fall over and die. But when I see the console emitting "NMI
> > EISA" error messages, this ususally requires a full reboot as the
> > network interface has crashed. Hopefully, I can dig up more "fact" soon
> > as testing continues.
>
> I'd rather be sure this is the right fix, and if it isn't I'd prefer to
> revert this as I don't think it is actually fixing anything.
>
It definitely does *not* have the effect that I advertised in my commit
message.
the commit DOES:
-- remove any attempt to do anything in ipmi_isa_* functions.
-- does not emit any errors on attach failure (which are noisy and
distracting).
-- make attaching to ipmi0 more "reliable" by blindly raising the
timeout value to 6 seconds. (6 seconds is the totally empirical
value I came up with in testing that never failed to attach across
100+ reboots).
I disagree that it should be reverted. We can argue about it if you
wish and I'm open to modifying back to the original code. I don't think
I'd agree with removing the error messages on attachment failure though.
I view the attachment failures as "sysadmin noise" but they should be
there *if* there is a real attach failure.
sean
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: This is a digitally signed message part
URL: <http://lists.freebsd.org/pipermail/svn-src-all/attachments/20130729/1c49248f/attachment.sig>
More information about the svn-src-all
mailing list