Re: iflib_timer() vs ixl_admin_timer() race

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Wed, 12 Jan 2022 17:16:19 UTC
Thanks, Krzystof,

Grepping now for iflib_admin_intr_deferred() through the sources I see 
the same issue in other Intel NIC drivers, plus bnxt(4).  So the main 
controversy I see is this: either admin intr should not stop and restart 
the callouts (and then question is why it does that now), or if it 
should be so heavy-weight, it should not be called so regularly (and 
then question why so many drivers do it).

In few drivers I've found it even more interesting: iflib_timer() calls 
IFDI_TIMER(), which calls ixgbe_if_timer(), which calls 
iflib_admin_intr_deferred(), which in its turn restart the 
iflib_timer().  Ouroboros. ;)

On 12.01.2022 11:46, Galazka, Krzysztof wrote:
> Hi Alexander,
> 
> Thank you for pointing this out. ixl_admin_timer is used by a callout so 
> I don’t think we can acquire any locks there. IIRC it was added to let 
> tools for NVM update interact with FW while interface is down, so maybe 
> stopping it when interface goes UP would be an option. Let me think this 
> through.
> 
> Thanks,
> 
> Krzysiek
> 
> *From:* Eric Joyner <erj@freebsd.org>
> *Sent:* Wednesday, January 12, 2022 8:22 AM
> *To:* Alexander Motin <mav@freebsd.org>
> *Cc:* Joyner, Eric <eric.joyner@intel.com>; Galazka, Krzysztof 
> <krzysztof.galazka@intel.com>
> *Subject:* Re: iflib_timer() vs ixl_admin_timer() race
> 
> Well, I think the situation is more-or-less as you say -- it's just that 
> the intent of ixl_admin_timer() is supposed to have the adapter's admin 
> queue processed constantly, regardless of interrupt status or down/up 
> status. I think as a shorthand we just called _task_fn_admin because it 
> handles a bunch of other things as well as getting down to calling 
> ixl_if_update_admin_status() which does the admin queue processing. But 
> as you found, I don't think it was originally written to be called 
> periodically on a regular basis like iflib_timer(), so the callouts are 
> interacting badly.
> 
> My first thought would be to replace the call to 
> iflib_admin_intr_deferred() in ixl_admin_timer() with 
> ixl_if_update_admin_status() while taking the CTX_LOCK(). I'm not sure 
> everything else in _task_fn_admin() needs to run periodically like that 
> in the driver. That would avoid the callouts getting stopped every 500 
> ticks.
> 
> I CC'd my coworker Krzysztof who currently owns the driver; he might 
> have better thoughts on this.
> 
> - Eric
> 
> On Tue, Jan 11, 2022 at 10:46 AM Alexander Motin <mav@freebsd.org 
> <mailto:mav@freebsd.org>> wrote:
> 
>     Yes.  I've reverted my patch for now to not break anything, but all
>     this
>     situation does not look right for me too, so I'd appreciate your look.
> 
>     On 11.01.2022 12:21, Eric Joyner wrote:
>      > Hi,
>      >
>      > I came back from vacation yesterday, but I'll look into this for you
>      > today. It's not a good situation when changing the period of the
>     iflib
>      > timer breaks something in the driver...
>      >
>      > - Eric
>      >
>      > On Sun, Jan 9, 2022 at 8:19 PM Alexander Motin <mav@freebsd.org
>     <mailto:mav@freebsd.org>
>      > <mailto:mav@freebsd.org <mailto:mav@freebsd.org>>> wrote:
>      >
>      >     Hi Eric,
>      >
>      >     In 90bc1cf65778 I've done what looked like a trivial
>     optimization.  But
>      >     just after committing it I've noticed weird race it created
>     between
>      >     iflib_timer() and ixl_admin_timer() timers.  I see ixl(4) calls
>      >     iflib_admin_intr_deferred() every 0.5s, which calls
>     _task_fn_admin(),
>      >     which every time stops and restart txq->ift_timer callout (AKA
>      >     iflib_timer()), which actually has exactly the same period of
>     0.5s.  It
>      >     seems before my change iflib_timer() managed to run once for
>     all TX
>      >     queues before being stopped, but after the change due to
>     additional
>      >     jitter many of callouts are getting stopped before firing.
>      >
>      >     Could you please sched some light for me on what is going on
>     there?
>      >     That race between two callouts looks like potential bug to
>     me, working
>      >     by some coinncedence, especially considering iflib_timer()
>     period is
>      >     tunable.
>      >
>      >     Thanks in advance.
>      >
>      >     --
>      >     Alexander Motin
>      >
> 
>     -- 
>     Alexander Motin
> 
> ------------------------------------------------------------------------
> *Intel Technology Poland sp. z o.o.
> * ul. Słowackiego 173 | 80-298 Gdańsk | Sąd Rejonowy Gdańsk Północ | VII 
> Wydział Gospodarczy Krajowego Rejestru Sądowego - KRS 101882 | NIP 
> 957-07-52-316 | Kapitał zakładowy 200.000 PLN.
> 
> Ta wiadomość wraz z załącznikami jest przeznaczona dla określonego 
> adresata i może zawierać informacje poufne. W razie przypadkowego 
> otrzymania tej wiadomości, prosimy o powiadomienie nadawcy oraz trwałe 
> jej usunięcie; jakiekolwiek przeglądanie lub rozpowszechnianie jest 
> zabronione.
> This e-mail and any attachments may contain confidential material for 
> the sole use of the intended recipient(s). If you are not the intended 
> recipient, please contact the sender and delete all copies; any review 
> or distribution by others is strictly prohibited.
> 

-- 
Alexander Motin