svn commit: r253708 - head/sys/dev/ipmi

Sean Bruno sean_bruno at yahoo.com
Mon Jul 29 22:51:04 UTC 2013


[sbruno_comment_blocks == 4]

> 
> The identify function in 7.x has no such check:
> 
> static void
> ipmi_isa_identify(driver_t *driver, device_t parent)
> {
> 	struct ipmi_get_info info;
> 	uint32_t devid;
> 
> 	if (ipmi_smbios_identify(&info) && info.iface_type != SSIF_MODE &&
> 	    device_find_child(parent, "ipmi", -1) == NULL) {

Ok then what is this ^^^^^^^^^ ?  Doesn't this mean that if
device_find_child() returns a child node that we should abort?  Is that
not the same as what I'm going on about?

> 
> The only check in 7 was the one you just moved in ipmi_isa_attach():
> 
> static int
> ipmi_isa_attach(device_t dev)
> {
> 	struct ipmi_softc *sc = device_get_softc(dev);
> 	struct ipmi_get_info info;
> 	const char *mode;
> 	int count, error, i, type;
> 
> 	/*
> 	 * Pull info out of the SMBIOS table.  If that doesn't work, use
> 	 * hints to enumerate a device.
> 	 */
> 	if (!ipmi_smbios_identify(&info) &&
> 	    !ipmi_hint_identify(dev, &info))
> 		return (ENXIO);
> 
> 	/*
> 	 * Give other drivers precedence.  Unfortunately, this doesn't
> 	 * work if we have an SMBIOS table that duplicates a PCI device
> 	 * that's later on the bus than the PCI-ISA bridge.
> 	 */
> 	if (ipmi_attached)
> 		return (EBUSY);
> 	...
> }
> 
> > Am I just confused on the bus relationship here?
> > 
> > We've gone over this a couple of times in different emails on different
> > lists.  I've just never sat down and walked through the code.  If you
> > see a better way to keep ipmi(4) from erroneously attaching to the ISA
> > interface, let me know.
> 
> I haven't seen any mention of this problem before.  I've seen threads about
> the watchdog issue (trying to set watchdog on shutdown which wants to use
> threads, etc.), but not this.
> 

http://lists.freebsd.org/pipermail/freebsd-stable/2012-March/067023.html

The thread gets broken apart by the mail list software though.
Somewhere in there I point at ipmi1 things.  But hell if I can find it
anymore.

> Also, can you provide the console messages without this patch?  The previous
> check in ipmi_isa_attach() is long before we touch the BMC or ever get
> around to creating /dev/ipmi1.  (Just because you see ipmi1: in dmesg doesn't
> mean it's created /dev/ipmi1.)
> 
Definitely does *not* create /dev/ipmi1.

> > > >   Move the check for ipmi_attached out of the ipmi_isa_attach function and into
> > > >   the ipmi_isa_identify function.  Remove the check of the device tree for
> > > >   ipmi devices attached.
> > > >   
> > > >   This probing appears to make Broadcom management firmware on Dell machines
> > > >   crash and emit NMI EISA warnings at various times requiring power cycles
> > > >   of the machines to restore.
> > > 
> > > This makes no sense.  All you are doing is skipping ipmi_smbios_identify()
> > > which just looks at the SMBIOS table in RAM.  It doesn't try to probe the
> > > BMC at all (no register accesses, etc.).  If just reading a table in memory
> > > causes side effects, then running dmidecode in userland should be hosing your
> > > machines as well.
> > > 
> > 
> > Probably right.  I'm not exactly sure what is making the Broadcom
> > firmware fall over and die.  But when I see the console emitting "NMI
> > EISA" error messages, this ususally requires a full reboot as the
> > network interface has crashed.  Hopefully, I can dig up more "fact" soon
> > as testing continues.
> 
> I'd rather be sure this is the right fix, and if it isn't I'd prefer to
> revert this as I don't think it is actually fixing anything.
> 
It definitely does *not* have the effect that I advertised in my commit
message.

the commit DOES:
-- remove any attempt to do anything in ipmi_isa_* functions.
-- does not emit any errors on attach failure (which are noisy and
   distracting).
-- make attaching to ipmi0 more "reliable" by blindly raising the
   timeout value to 6 seconds.  (6 seconds is the totally empirical
   value I came up with in testing that never failed to attach across
   100+ reboots).

I disagree that it should be reverted.  We can argue about it if you
wish and I'm open to modifying back to the original code.  I don't think
I'd agree with removing the error messages on attachment failure though.
I view the attachment failures as "sysadmin noise" but they should be
there *if* there is a real attach failure.

sean
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: This is a digitally signed message part
URL: <http://lists.freebsd.org/pipermail/svn-src-all/attachments/20130729/1c49248f/attachment.sig>


More information about the svn-src-all mailing list