kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST
Jack Vogel
jfvogel at gmail.com
Fri Apr 19 19:40:01 UTC 2013
The following reply was made to PR kern/176446; it has been noted by GNATS.
From: Jack Vogel <jfvogel at gmail.com>
To: John Baldwin <jhb at freebsd.org>
Cc: FreeBSD Net <freebsd-net at freebsd.org>, bug-followup at freebsd.org,
Mike Karels <mike at karels.net>
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving
out-of-order packet process and spurious RST
Date: Fri, 19 Apr 2013 12:32:59 -0700
--14dae9cdc48767db0904dabbc98d
Content-Type: text/plain; charset=ISO-8859-1
Thanks John, I'm incorporating your changes into my source tree. I also
plan on changing the
"glue" between mq_start and mq_start_locked on igb after some UDP testing
that was done, and
believe ixgbe should follow suit. Results there have shown the latency is
just too high if I only use
the task_enqueue... What works best is to always queue to the buf ring, but
then also always to
do the TRY_LOCK. I will update HEAD as soon as I handle an internal
firedrill I have today :)
Jack
On Fri, Apr 19, 2013 at 9:27 AM, John Baldwin <jhb at freebsd.org> wrote:
> A second patch. This is not something I mentioned before, but I had this
> in
> my checkout. In the legacy IRQ case this could also result in out-of-order
> processing. It also fixes a potential OACTIVE-stuck type bug that we used
> to
> have in igb. I have no way to test this, so it would be good if some other
> folks could test this.
>
> The patch changes ixgbe_txeof() return void and changes the few places that
> checked its return value to ignore it. While it is true that ixgbe has a
> tx
> processing limit (which I think is dubious.. TX completion processing is
> very
> cheap unlike RX processing, so it seems to me like it should always run to
> completion as in igb), in the common case I think the result will be to do
> what igb used to do: poll the ring at 100% CPU (either in the interrupt
> handler or in the task it keeps rescheduling) waiting for pending TX
> packets
> to be completed (which is pointless: the host CPU can't make the NIC
> transmit
> packets any faster by polling).
>
> It also changes the interrupt handlers to restart packet transmission
> synchronously rather than always deferring that to a task (the former is
> what
> (nearly) all other drivers do). It also fixes the interrupt handlers to be
> consistent (one looped on txeof but not the others). In the case of the
> legacy interrupt handler it is possible it could fail to restart packet
> transmission if there were no pending RX packets after rxeof returned and
> txeof fully cleaned its ring without this change.
>
> It also fixes the legacy interrupt handler to not re-enable the interrupt
> if
> it schedules the task but to wait until the task completes (this could
> result
> in concurrent, out-of-order RX processing).
>
> Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c
> ===================================================================
> --- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c (revision
> 249553)
> +++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c (working
> copy)
> @@ -149,7 +149,7 @@
> static void ixgbe_enable_intr(struct adapter *);
> static void ixgbe_disable_intr(struct adapter *);
> static void ixgbe_update_stats_counters(struct adapter *);
> -static bool ixgbe_txeof(struct tx_ring *);
> +static void ixgbe_txeof(struct tx_ring *);
> static bool ixgbe_rxeof(struct ix_queue *);
> static void ixgbe_rx_checksum(u32, struct mbuf *, u32);
> static void ixgbe_set_promisc(struct adapter *);
> @@ -1431,7 +1414,10 @@
> }
>
> /* Reenable this interrupt */
> - ixgbe_enable_queue(adapter, que->msix);
> + if (que->res != NULL)
> + ixgbe_enable_queue(adapter, que->msix);
> + else
> + ixgbe_enable_intr(adapter);
> return;
> }
>
> @@ -1449,8 +1435,9 @@
> struct adapter *adapter = que->adapter;
> struct ixgbe_hw *hw = &adapter->hw;
> struct tx_ring *txr = adapter->tx_rings;
> - bool more_tx, more_rx;
> - u32 reg_eicr, loop = MAX_LOOP;
> + struct ifnet *ifp = adapter->ifp;
> + bool more;
> + u32 reg_eicr;
>
>
> reg_eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
> @@ -1461,17 +1448,19 @@
> return;
> }
>
> - more_rx = ixgbe_rxeof(que);
> + more = ixgbe_rxeof(que);
>
> IXGBE_TX_LOCK(txr);
> - do {
> - more_tx = ixgbe_txeof(txr);
> - } while (loop-- && more_tx);
> + ixgbe_txeof(txr);
> +#if __FreeBSD_version >= 800000
> + if (!drbr_empty(ifp, txr->br))
> + ixgbe_mq_start_locked(ifp, txr, NULL);
> +#else
> + if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
> + ixgbe_start_locked(txr, ifp);
> +#endif
> IXGBE_TX_UNLOCK(txr);
>
> - if (more_rx || more_tx)
> - taskqueue_enqueue(que->tq, &que->que_task);
> -
> /* Check for fan failure */
> if ((hw->phy.media_type == ixgbe_media_type_copper) &&
> (reg_eicr & IXGBE_EICR_GPI_SDP1)) {
> @@ -1484,7 +1473,10 @@
> if (reg_eicr & IXGBE_EICR_LSC)
> taskqueue_enqueue(adapter->tq, &adapter->link_task);
>
> - ixgbe_enable_intr(adapter);
> + if (more)
> + taskqueue_enqueue(que->tq, &que->que_task);
> + else
> + ixgbe_enable_intr(adapter);
> return;
> }
>
> @@ -1501,27 +1493,24 @@
> struct adapter *adapter = que->adapter;
> struct tx_ring *txr = que->txr;
> struct rx_ring *rxr = que->rxr;
> - bool more_tx, more_rx;
> + struct ifnet *ifp = adapter->ifp;
> + bool more;
> u32 newitr = 0;
>
> ixgbe_disable_queue(adapter, que->msix);
> ++que->irqs;
>
> - more_rx = ixgbe_rxeof(que);
> + more = ixgbe_rxeof(que);
>
> IXGBE_TX_LOCK(txr);
> - more_tx = ixgbe_txeof(txr);
> - /*
> - ** Make certain that if the stack
> - ** has anything queued the task gets
> - ** scheduled to handle it.
> - */
> + ixgbe_txeof(txr);
> #ifdef IXGBE_LEGACY_TX
> if (!IFQ_DRV_IS_EMPTY(&adapter->ifp->if_snd))
> + ixgbe_start_locked(txr, ifp);
> #else
> - if (!drbr_empty(adapter->ifp, txr->br))
> + if (!drbr_empty(ifp, txr->br))
> + ixgbe_mq_start_locked(ifp, txr, NULL);
> #endif
> - more_tx = 1;
> IXGBE_TX_UNLOCK(txr);
>
> /* Do AIM now? */
> @@ -1575,7 +1564,7 @@
> rxr->packets = 0;
>
> no_calc:
> - if (more_tx || more_rx)
> + if (more)
> taskqueue_enqueue(que->tq, &que->que_task);
> else /* Reenable this interrupt */
> ixgbe_enable_queue(adapter, que->msix);
> @@ -3557,7 +3545,7 @@
> * tx_buffer is put back on the free queue.
> *
> **********************************************************************/
> -static bool
> +static void
> ixgbe_txeof(struct tx_ring *txr)
> {
> struct adapter *adapter = txr->adapter;
> @@ -3605,13 +3593,13 @@
> IXGBE_CORE_UNLOCK(adapter);
> IXGBE_TX_LOCK(txr);
> }
> - return FALSE;
> + return;
> }
> #endif /* DEV_NETMAP */
>
> if (txr->tx_avail == txr->num_desc) {
> txr->queue_status = IXGBE_QUEUE_IDLE;
> - return FALSE;
> + return;
> }
>
> /* Get work starting point */
> @@ -3705,12 +3693,8 @@
> if ((!processed) && ((ticks - txr->watchdog_time) >
> IXGBE_WATCHDOG))
> txr->queue_status = IXGBE_QUEUE_HUNG;
>
> - if (txr->tx_avail == txr->num_desc) {
> + if (txr->tx_avail == txr->num_desc)
> txr->queue_status = IXGBE_QUEUE_IDLE;
> - return (FALSE);
> - }
> -
> - return TRUE;
> }
>
> /*********************************************************************
>
>
> --
> John Baldwin
>
--14dae9cdc48767db0904dabbc98d
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr"><div><div><div><div>Thanks John, I'm incorporating you=
r changes into my source tree. I also plan on changing the<br></div>"g=
lue" between mq_start and mq_start_locked on igb after some UDP testin=
g that was done, and<br>
</div>believe ixgbe should follow suit. Results there have shown the latenc=
y is just too high if I only use<br>the task_enqueue... What works best is =
to always queue to the buf ring, but then also always to<br></div>do the TR=
Y_LOCK. I will update HEAD as soon as I handle an internal firedrill I have=
today :)<br>
<br></div>Jack<br><br></div><div class=3D"gmail_extra"><br><br><div class=
=3D"gmail_quote">On Fri, Apr 19, 2013 at 9:27 AM, John Baldwin <span dir=3D=
"ltr"><<a href=3D"mailto:jhb at freebsd.org" target=3D"_blank">jhb at freebsd.=
org</a>></span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">A second patch. =A0This is not something I m=
entioned before, but I had this in<br>
my checkout. =A0In the legacy IRQ case this could also result in out-of-ord=
er<br>
processing. =A0It also fixes a potential OACTIVE-stuck type bug that we use=
d to<br>
have in igb. =A0I have no way to test this, so it would be good if some oth=
er<br>
folks could test this.<br>
<br>
The patch changes ixgbe_txeof() return void and changes the few places that=
<br>
checked its return value to ignore it. =A0While it is true that ixgbe has a=
tx<br>
processing limit (which I think is dubious.. TX completion processing is ve=
ry<br>
cheap unlike RX processing, so it seems to me like it should always run to<=
br>
completion as in igb), in the common case I think the result will be to do<=
br>
what igb used to do: poll the ring at 100% CPU (either in the interrupt<br>
handler or in the task it keeps rescheduling) waiting for pending TX packet=
s<br>
to be completed (which is pointless: the host CPU can't make the NIC tr=
ansmit<br>
packets any faster by polling).<br>
<br>
It also changes the interrupt handlers to restart packet transmission<br>
synchronously rather than always deferring that to a task (the former is wh=
at<br>
(nearly) all other drivers do). =A0It also fixes the interrupt handlers to =
be<br>
consistent (one looped on txeof but not the others). =A0In the case of the<=
br>
legacy interrupt handler it is possible it could fail to restart packet<br>
transmission if there were no pending RX packets after rxeof returned and<b=
r>
txeof fully cleaned its ring without this change.<br>
<br>
It also fixes the legacy interrupt handler to not re-enable the interrupt i=
f<br>
it schedules the task but to wait until the task completes (this could resu=
lt<br>
in concurrent, out-of-order RX processing).<br>
<div class=3D"im"><br>
Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c<br>
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D<br>
--- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c =A0 =A0 =A0 (revi=
sion 249553)<br>
+++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c =A0 =A0 =A0 (work=
ing copy)<br>
</div>@@ -149,7 +149,7 @@<br>
=A0static void =A0 =A0 ixgbe_enable_intr(struct adapter *);<br>
=A0static void =A0 =A0 ixgbe_disable_intr(struct adapter *);<br>
=A0static void =A0 =A0 ixgbe_update_stats_counters(struct adapter *);<br>
-static bool =A0 =A0ixgbe_txeof(struct tx_ring *);<br>
+static void =A0 =A0ixgbe_txeof(struct tx_ring *);<br>
=A0static bool =A0 =A0ixgbe_rxeof(struct ix_queue *);<br>
=A0static void =A0 =A0ixgbe_rx_checksum(u32, struct mbuf *, u32);<br>
=A0static void =A0 =A0 ixgbe_set_promisc(struct adapter *);<br>
@@ -1431,7 +1414,10 @@<br>
=A0 =A0 =A0 =A0 }<br>
<br>
=A0 =A0 =A0 =A0 /* Reenable this interrupt */<br>
- =A0 =A0 =A0 ixgbe_enable_queue(adapter, que->msix);<br>
+ =A0 =A0 =A0 if (que->res !=3D NULL)<br>
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_queue(adapter, que->msix);<br=
>
+ =A0 =A0 =A0 else<br>
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_intr(adapter);<br>
=A0 =A0 =A0 =A0 return;<br>
=A0}<br>
<br>
@@ -1449,8 +1435,9 @@<br>
=A0 =A0 =A0 =A0 struct adapter =A0*adapter =3D que->adapter;<br>
=A0 =A0 =A0 =A0 struct ixgbe_hw *hw =3D &adapter->hw;<br>
=A0 =A0 =A0 =A0 struct =A0 =A0 =A0 =A0 =A0tx_ring *txr =3D adapter->tx_r=
ings;<br>
- =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more_tx, more_rx;<br>
- =A0 =A0 =A0 u32 =A0 =A0 =A0 =A0 =A0 =A0 reg_eicr, loop =3D MAX_LOOP;<br>
+ =A0 =A0 =A0 struct ifnet =A0 =A0*ifp =3D adapter->ifp;<br>
+ =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more;<br>
+ =A0 =A0 =A0 u32 =A0 =A0 =A0 =A0 =A0 =A0 reg_eicr;<br>
<br>
<br>
=A0 =A0 =A0 =A0 reg_eicr =3D IXGBE_READ_REG(hw, IXGBE_EICR);<br>
@@ -1461,17 +1448,19 @@<br>
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;<br>
=A0 =A0 =A0 =A0 }<br>
<br>
- =A0 =A0 =A0 more_rx =3D ixgbe_rxeof(que);<br>
+ =A0 =A0 =A0 more =3D ixgbe_rxeof(que);<br>
<br>
=A0 =A0 =A0 =A0 IXGBE_TX_LOCK(txr);<br>
- =A0 =A0 =A0 do {<br>
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 more_tx =3D ixgbe_txeof(txr);<br>
- =A0 =A0 =A0 } while (loop-- && more_tx);<br>
+ =A0 =A0 =A0 ixgbe_txeof(txr);<br>
+#if __FreeBSD_version >=3D 800000<br>
+ =A0 =A0 =A0 if (!drbr_empty(ifp, txr->br))<br>
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_mq_start_locked(ifp, txr, NULL);<br>
+#else<br>
+ =A0 =A0 =A0 if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))<br>
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_start_locked(txr, ifp);<br>
+#endif<br>
=A0 =A0 =A0 =A0 IXGBE_TX_UNLOCK(txr);<br>
<br>
- =A0 =A0 =A0 if (more_rx || more_tx)<br>
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(que->tq, &que->qu=
e_task);<br>
-<br>
=A0 =A0 =A0 =A0 /* Check for fan failure */<br>
=A0 =A0 =A0 =A0 if ((hw->phy.media_type =3D=3D ixgbe_media_type_copper) =
&&<br>
=A0 =A0 =A0 =A0 =A0 =A0 (reg_eicr & IXGBE_EICR_GPI_SDP1)) {<br>
@@ -1484,7 +1473,10 @@<br>
=A0 =A0 =A0 =A0 if (reg_eicr & IXGBE_EICR_LSC)<br>
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(adapter->tq, &adap=
ter->link_task);<br>
<br>
- =A0 =A0 =A0 ixgbe_enable_intr(adapter);<br>
+ =A0 =A0 =A0 if (more)<br>
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(que->tq, &que->qu=
e_task);<br>
+ =A0 =A0 =A0 else<br>
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_intr(adapter);<br>
=A0 =A0 =A0 =A0 return;<br>
=A0}<br>
<br>
@@ -1501,27 +1493,24 @@<br>
=A0 =A0 =A0 =A0 struct adapter =A0*adapter =3D que->adapter;<br>
=A0 =A0 =A0 =A0 struct tx_ring =A0*txr =3D que->txr;<br>
=A0 =A0 =A0 =A0 struct rx_ring =A0*rxr =3D que->rxr;<br>
- =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more_tx, more_rx;<br>
+ =A0 =A0 =A0 struct ifnet =A0 =A0*ifp =3D adapter->ifp;<br>
+ =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more;<br>
=A0 =A0 =A0 =A0 u32 =A0 =A0 =A0 =A0 =A0 =A0 newitr =3D 0;<br>
<br>
=A0 =A0 =A0 =A0 ixgbe_disable_queue(adapter, que->msix);<br>
=A0 =A0 =A0 =A0 ++que->irqs;<br>
<br>
- =A0 =A0 =A0 more_rx =3D ixgbe_rxeof(que);<br>
+ =A0 =A0 =A0 more =3D ixgbe_rxeof(que);<br>
<br>
=A0 =A0 =A0 =A0 IXGBE_TX_LOCK(txr);<br>
- =A0 =A0 =A0 more_tx =3D ixgbe_txeof(txr);<br>
- =A0 =A0 =A0 /*<br>
- =A0 =A0 =A0 ** Make certain that if the stack<br>
- =A0 =A0 =A0 ** has anything queued the task gets<br>
- =A0 =A0 =A0 ** scheduled to handle it.<br>
- =A0 =A0 =A0 */<br>
+ =A0 =A0 =A0 ixgbe_txeof(txr);<br>
=A0#ifdef IXGBE_LEGACY_TX<br>
=A0 =A0 =A0 =A0 if (!IFQ_DRV_IS_EMPTY(&adapter->ifp->if_snd))<br>
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_start_locked(txr, ifp);<br>
=A0#else<br>
- =A0 =A0 =A0 if (!drbr_empty(adapter->ifp, txr->br))<br>
+ =A0 =A0 =A0 if (!drbr_empty(ifp, txr->br))<br>
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_mq_start_locked(ifp, txr, NULL);<br>
=A0#endif<br>
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 more_tx =3D 1;<br>
=A0 =A0 =A0 =A0 IXGBE_TX_UNLOCK(txr);<br>
<br>
=A0 =A0 =A0 =A0 /* Do AIM now? */<br>
@@ -1575,7 +1564,7 @@<br>
=A0 =A0 =A0 =A0 =A0rxr->packets =3D 0;<br>
<br>
=A0no_calc:<br>
- =A0 =A0 =A0 if (more_tx || more_rx)<br>
+ =A0 =A0 =A0 if (more)<br>
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(que->tq, &que->=
que_task);<br>
=A0 =A0 =A0 =A0 else /* Reenable this interrupt */<br>
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_queue(adapter, que->msix);<=
br>
@@ -3557,7 +3545,7 @@<br>
=A0 * =A0tx_buffer is put back on the free queue.<br>
=A0 *<br>
=A0 **********************************************************************/=
<br>
-static bool<br>
+static void<br>
=A0ixgbe_txeof(struct tx_ring *txr)<br>
=A0{<br>
=A0 =A0 =A0 =A0 struct adapter =A0 =A0 =A0 =A0 =A0*adapter =3D txr->adap=
ter;<br>
@@ -3605,13 +3593,13 @@<br>
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 IXGBE_CORE_UNLOCK(adapter);=
<br>
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 IXGBE_TX_LOCK(txr);<br>
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }<br>
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 return FALSE;<br>
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;<br>
=A0 =A0 =A0 =A0 }<br>
=A0#endif /* DEV_NETMAP */<br>
<br>
=A0 =A0 =A0 =A0 if (txr->tx_avail =3D=3D txr->num_desc) {<br>
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr->queue_status =3D IXGBE_QUEUE_IDLE;<=
br>
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 return FALSE;<br>
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;<br>
=A0 =A0 =A0 =A0 }<br>
<br>
=A0 =A0 =A0 =A0 /* Get work starting point */<br>
@@ -3705,12 +3693,8 @@<br>
=A0 =A0 =A0 =A0 if ((!processed) && ((ticks - txr->watchdog_time=
) > IXGBE_WATCHDOG))<br>
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr->queue_status =3D IXGBE_QUEUE_HUNG;<=
br>
<br>
- =A0 =A0 =A0 if (txr->tx_avail =3D=3D txr->num_desc) {<br>
+ =A0 =A0 =A0 if (txr->tx_avail =3D=3D txr->num_desc)<br>
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr->queue_status =3D IXGBE_QUEUE_IDLE;<=
br>
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 return (FALSE);<br>
- =A0 =A0 =A0 }<br>
-<br>
- =A0 =A0 =A0 return TRUE;<br>
=A0}<br>
<br>
=A0/*********************************************************************<b=
r>
<span class=3D"HOEnZb"><font color=3D"#888888"><br>
<br>
--<br>
John Baldwin<br>
</font></span></blockquote></div><br></div>
--14dae9cdc48767db0904dabbc98d--
More information about the freebsd-net
mailing list