Re: git: 3cf97e91fac5 - main - Revert "newbus: Change attach failure behavior"

From: Warner Losh <imp_at_bsdimp.com>
Date: Tue, 06 Dec 2022 16:09:36 UTC
On Tue, Dec 6, 2022 at 3:57 AM Hans Petter Selasky <hps@selasky.org> wrote:

> On 12/6/22 03:09, Warner Losh wrote:
> > The branch main has been updated by imp:
> >
> > URL: https://cgit.FreeBSD.org/src/commit/?id=3cf97e91fac5f53fc0375bc816cc541a8864ffc4
> >
> > commit 3cf97e91fac5f53fc0375bc816cc541a8864ffc4
> > Author:     Warner Losh <imp@FreeBSD.org>
> > AuthorDate: 2022-12-05 23:57:58 +0000
> > Commit:     Warner Losh <imp@FreeBSD.org>
> > CommitDate: 2022-12-06 00:00:26 +0000
> >
> >      Revert "newbus: Change attach failure behavior"
> >
> >      This reverts commit 68c3f0302106643207dcdfe3b414810e245228e5. There are
> >      some weird crashes when KVMs switch caused by this, so revert this
> >      commit until they are sorted out.
> >
> >      Reported by:            cy@
> >      Sponsored by:           Netflix
> > ---
> >   UPDATING            | 2 ++
> >   sys/kern/subr_bus.c | 2 +-
> >   2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/UPDATING b/UPDATING
> > index 099066031b8e..001ec9f6de3a 100644
> > --- a/UPDATING
> > +++ b/UPDATING
> > @@ -43,6 +43,8 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 14.x IS SLOW:
> >       needs to use devctl to re-enable the device, and reprobe it (or set
> >       the sysctl/tunable hw.bus.disable_failed_devices=false).
> >
> > +     NOTE: This was reverted 20221205 due to unexpected compatibility issues
> > +
> >   20221122:
> >       pf no longer accepts 'scrub fragment crop' or 'scrub fragment drop-ovl'.
> >       These configurations are no longer automatically reinterpreted as
> > diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c
> > index 6a5ec4efc38d..b9615b033007 100644
> > --- a/sys/kern/subr_bus.c
> > +++ b/sys/kern/subr_bus.c
> > @@ -69,7 +69,7 @@ SYSCTL_NODE(_hw, OID_AUTO, bus, CTLFLAG_RW | CTLFLAG_MPSAFE, NULL,
> >   SYSCTL_ROOT_NODE(OID_AUTO, dev, CTLFLAG_RW | CTLFLAG_MPSAFE, NULL,
> >       NULL);
> >
> > -static bool disable_failed_devs = true;
> > +static bool disable_failed_devs = false;
> >   SYSCTL_BOOL(_hw_bus, OID_AUTO, disable_failed_devices, CTLFLAG_RWTUN, &disable_failed_devs,
> >       0, "Do not retry attaching devices that return an error from DEVICE_ATTACH the first time");
> >
>
> Thinking about it, this flag shouldn't be set for USB devices and hubs
> and such. It probably only makes sense for PCI devices, though there is
> also Thunderbolt, which may fail during probe/attach because the user
> yanked the device.
>

I think it makes perfect sense for all devices everywhere. When a device
goes away like you say, its device_t will be gone soonish, and this flag
will clear if it is reinserted in the future. The bus will get a signal
for that yanking and will remove the device_t. (Now, maybe we have a bug
in device deletion when that happens, which is what I suspected when I
saw this and a couple of other tracebacks.)
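A minimal sketch of the lifecycle described above, as a self-contained C
model. Everything prefixed model_ (and the two attach_* helpers) is
hypothetical and is not the subr_bus.c implementation. It only assumes
that a failed attach sets a per-device "disabled" flag while the policy
is on, that later probe passes skip a disabled device, and that yanking
the device deletes it, so a reinsert starts with the flag cleared.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical model of the policy discussed in this thread; NOT the
 * real subr_bus.c code. Plays the role of hw.bus.disable_failed_devices.
 */
static bool model_disable_failed_devs = true;

struct model_dev {
	bool disabled;	/* set after a failed attach while the policy is on */
	bool attached;
};

enum model_result { MODEL_ATTACHED, MODEL_FAILED, MODEL_SKIPPED };

/* Stand-in for DEVICE_ATTACH: 0 on success, an errno value on failure. */
typedef int (*model_attach_fn)(struct model_dev *);

/* One probe/attach pass over a device, e.g. from a bus rescan. */
static enum model_result
model_probe_and_attach(struct model_dev *dev, model_attach_fn attach)
{
	assert(dev != NULL && attach != NULL);
	if (dev->attached)
		return MODEL_ATTACHED;
	if (dev->disabled)
		return MODEL_SKIPPED;	/* failed before: do not retry */
	if (attach(dev) != 0) {
		if (model_disable_failed_devs)
			dev->disabled = true;
		return MODEL_FAILED;
	}
	dev->attached = true;
	return MODEL_ATTACHED;
}

/* Hot-plug insert: the bus creates a fresh device with cleared flags. */
static struct model_dev *
model_insert(void)
{
	return calloc(1, sizeof(struct model_dev));
}

/* Hot-plug removal: the bus deletes the device, and its flags with it. */
static void
model_remove(struct model_dev *dev)
{
	free(dev);
}

/* Example attach routines: one that always fails, one that succeeds. */
static int
attach_fails(struct model_dev *dev)
{
	(void)dev;
	return ENXIO;
}

static int
attach_works(struct model_dev *dev)
{
	(void)dev;
	return 0;
}
```

With the policy on, a device whose first attach fails is skipped on
every later pass until it is removed; a reinserted device is a new
object, so it gets a fresh attempt. With the policy off, a failed device
is simply retried on the next pass.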


> Regarding the assert in the USB stack, maybe the state was not correctly
> set on the device_t ?
>

It's unclear to me. Newbus doesn't guarantee particular states to the bus
drivers, so maybe the assert in the USB stack is too strict about which
states it assumes the device is in? I'm unsure. I haven't looked deeply
enough to know exactly what is going on. Since there were problems and I
didn't have time to do the proper deep dive, I just reverted for now and
will revisit when I have the time.

Warner