From nobody Thu Dec 21 16:53:05 2023 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4SwxMN20fCz550cj for ; Thu, 21 Dec 2023 16:53:20 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ed1-x532.google.com (mail-ed1-x532.google.com [IPv6:2a00:1450:4864:20::532]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4SwxMN08zhz3PlP for ; Thu, 21 Dec 2023 16:53:20 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-ed1-x532.google.com with SMTP id 4fb4d7f45d1cf-55370780c74so1210442a12.1 for ; Thu, 21 Dec 2023 08:53:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20230601.gappssmtp.com; s=20230601; t=1703177597; x=1703782397; darn=freebsd.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=3SiH5TwXpxn0G8RhsmOaPmvyOsgheDE+zkydkmnAfr4=; b=ZL+ujJCNy771CJER1ZBGBo7Utv4/1Wjrhhp3E1jPSyyr8wmJsLx8iRgV4qeKQKtSaC IgTVxnTYMEZF4bI7eE6pY1KqkTNnejgGFrWJv5MCSgiYOOzEIoJuDwZB2DLfnuipaHTW NdeTUBvAj8zPlBHky3xCw/d5Q8A5kYe8zB5JTPfUUu12J08WWOElGV5BHnK9mqBa9p+s DP8gixQtE9YqTMVBE9pMkhNEu5t2UPKvAgHdYdohdZ55r8lQ0Zq9VIWbIS9XN/QSBQzK H97TiwIWdtyaC/aV0iYbcmI5aUBNYF6f3DiIGBsK6HDEgztmSfnVS2tsr/MualRWCLgh TKVA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1703177597; x=1703782397; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=3SiH5TwXpxn0G8RhsmOaPmvyOsgheDE+zkydkmnAfr4=; b=gIIwvuZs42iqr/PRONNxhxvHX0eE1efKR0a1ZByKGFkat6FqygmECIfwQQh/VP+lOD OEt3AElOYcHjKJxwrs5F7yAzv85l04MRHI+5H+mtPwaYpwBw5FJZlGMD5TbRQsvMkPsW +39Gdc60SOskmXmj/4vTvAMNQvP+TzvYWWLQ5J/oMsXVCg8qutjwJugEegoxSzXltYSi zHJFGSG+etXGjg2/BhsXjSvvVKS9QTtIIWY110g0kZuksLmf5vmJ5jVvcOT7dAVcUVtp q/u/bApHyVYahthhSmXXw9G59loyuDl1NbUK9eGLbGDiTPkV8mWlINATtFNN41vFIZvu k/RQ== X-Gm-Message-State: AOJu0YxmU47CA4p14s1GHtE4h4la+TMdMhRcjIO7HLByJfzarXbPMpA8 B6C3/2m7eNPUGw5cvWAvJOxC6iBYHmfW5HMm34VAPhMAgXAKpQ== X-Google-Smtp-Source: AGHT+IF99LuiRyBup7uq9Cxbp1Gok3NHtXWc1rXm4pFuGBRTd8+7HbCb+/KmsjvsElv1YFuyFCkJbg8vHWgnV0wGq/U= X-Received: by 2002:a17:906:3b0d:b0:a23:f86:f2c4 with SMTP id g13-20020a1709063b0d00b00a230f86f2c4mr45653ejf.67.1703177597559; Thu, 21 Dec 2023 08:53:17 -0800 (PST) List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org MIME-Version: 1.0 References: <202312210421.3BL4Lb9T036317@gitrepo.freebsd.org> In-Reply-To: From: Warner Losh Date: Thu, 21 Dec 2023 09:53:05 -0700 Message-ID: Subject: Re: git: 9e6d11ce9a51 - main - vtnet: Adjust rx buffer so IP header 32-bit aligned To: Mark Johnston Cc: Warner Losh , src-committers , "" , "" Content-Type: multipart/alternative; boundary="000000000000b137bf060d07efbc" X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US] X-Spamd-Bar: ---- X-Rspamd-Queue-Id: 4SwxMN08zhz3PlP --000000000000b137bf060d07efbc Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Dec 21, 2023, 8:54=E2=80=AFAM Mark Johnston wro= te: > On Thu, Dec 21, 2023 at 08:31:36AM -0700, Warner Losh wrote: > > On Thu, Dec 21, 2023 at 8:13=E2=80=AFAM Mark Johnston wrote: > > > > > On Thu, Dec 21, 2023 at 04:21:37AM +0000, Warner Losh wrote: > > > > The branch main has been updated by imp: > > > > > > > > URL: > > > > https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51d75ed6a94e180f2fb4e= 9188a2ba4 > > > > > > > > commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4 > > > > Author: Warner Losh > > > > AuthorDate: 2023-12-20 19:09:09 +0000 > > > > Commit: Warner Losh > > > > CommitDate: 2023-12-21 04:16:45 +0000 > > > > > > > > vtnet: Adjust rx buffer so IP header 32-bit aligned > > > > > > > > Call madj(m, ETHER_ALIGN) to offset rx buffers when allocating > them. > > > > This improves performance everywhere, and allows armv7 to work = at > > > all. > > > > > > > > PR: 271288 (PR had a different fix than I > wound > > > up with) > > > > MFC After: 3 days > > > > Sponsored by: Netflix > > > > Differential Revision: https://reviews.freebsd.org/D43136 > > > > --- > > > > sys/dev/virtio/network/if_vtnet.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/sys/dev/virtio/network/if_vtnet.c > > > b/sys/dev/virtio/network/if_vtnet.c > > > > index 287af7751066..360176e4f845 100644 > > > > --- a/sys/dev/virtio/network/if_vtnet.c > > > > +++ b/sys/dev/virtio/network/if_vtnet.c > > > > @@ -1532,8 +1532,8 @@ vtnet_rx_alloc_buf(struct vtnet_softc *sc, in= t > > > nbufs, struct mbuf **m_tailp) > > > > m_freem(m_head); > > > > return (NULL); > > > > } > > > > - > > > > m->m_len =3D size; > > > > + m_adj(m, ETHER_ALIGN); > > > > > > The driver is expecting to get a cluster of size > sc->vtnet_rx_clustersz, > > > but now it's getting one of size sc->vtnet_rx_clustersz - 2. I don't > > > see how this change can be sufficient on its own: what prevents virti= o > > > from writing sc->vtnet_rx_clustersz bytes and thereby overwriting the > > > two bytes following the cluster? > > > > > > > The only trouble that I saw was here: > > > > /* > > * Every mbuf should have the expected cluster size sin= ce > > that > > * is also used to allocate the replacements. > > */ > > KASSERT(m->m_len =3D=3D clustersz, > > ("%s: mbuf size %d not expected cluster size %d", > > __func__, > > m->m_len, clustersz)); > > > > and even that's fine: We can adjust that assert. The next lines I think > > make it fine: > > m->m_len =3D MIN(m->m_len, len); > > > > so we only use what we say is there to land bytes into the mbuf. I'm n= ow > > not sure though what to do about a few lines later: > > > > m_new =3D vtnet_rx_alloc_buf(sc, nreplace, &m_tail); > > if (m_new =3D=3D NULL) { > > m_prev->m_len =3D clustersz; > > return (ENOBUFS); > > } > > > > which likely needs to be adjusted (or the old saved size). > > I'm not certain about what it should be doing instead, but yes this code > looks wrong now. > Yea. We need to do this at time of use not time of alloc. > I likely didn't see asserts in my testing because I didn't have LRO slow > > path packets. > > > > Do you see other places that would need adjusting? > > Presumably vtnet_rx_cluster_size() needs to be adjusted too? > > I also don't understand why we do this adjustment unconditionally > instead of only when __NO_STRICT_ALIGNMENT is not defined, as almost all > other NIC drivers do. > None of the ones I looked at had this... maybe I got unlucky. I'd think we'd have better performance if things are aligned all the way... is there a reason not to? Warner > > > All the other places > > seem to use m_len correctly on the rx path, but I'm not a nic driver gu= y, > > and I might be missing something. > --000000000000b137bf060d07efbc Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Thu, Dec 21, 2023, 8:54=E2=80=AFAM Mark Johnston &l= t;markj@freebsd.org> wrote:
=
On Thu, Dec 21, 2023 at 08:31:36AM -07= 00, Warner Losh wrote:
> On Thu, Dec 21, 2023 at 8:13=E2=80=AFAM Mark Johnston <markj@freebsd= .org> wrote:
>
> > On Thu, Dec 21, 2023 at 04:21:37AM +0000, Warner Losh wrote:
> > > The branch main has been updated by imp:
> > >
> > > URL:
> > https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51d75ed6a94e180f2fb4= e9188a2ba4
> > >
> > > commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4
> > > Author:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org&g= t;
> > > AuthorDate: 2023-12-20 19:09:09 +0000
> > > Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org&g= t;
> > > CommitDate: 2023-12-21 04:16:45 +0000
> > >
> > >=C2=A0 =C2=A0 =C2=A0vtnet: Adjust rx buffer so IP header 32-b= it aligned
> > >
> > >=C2=A0 =C2=A0 =C2=A0Call madj(m, ETHER_ALIGN) to offset rx bu= ffers when allocating them.
> > >=C2=A0 =C2=A0 =C2=A0This improves performance everywhere, and= allows armv7 to work at
> > all.
> > >
> > >=C2=A0 =C2=A0 =C2=A0PR:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0271288 (PR had a different fix than I= wound
> > up with)
> > >=C2=A0 =C2=A0 =C2=A0MFC After:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 3 days
> > >=C2=A0 =C2=A0 =C2=A0Sponsored by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0Netflix
> > >=C2=A0 =C2=A0 =C2=A0Differential Revision:=C2=A0 https://reviews.freebsd.org/D43136
> > > ---
> > >=C2=A0 sys/dev/virtio/network/if_vtnet.c | 2 +-
> > >=C2=A0 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/sys/dev/virtio/network/if_vtnet.c
> > b/sys/dev/virtio/network/if_vtnet.c
> > > index 287af7751066..360176e4f845 100644
> > > --- a/sys/dev/virtio/network/if_vtnet.c
> > > +++ b/sys/dev/virtio/network/if_vtnet.c
> > > @@ -1532,8 +1532,8 @@ vtnet_rx_alloc_buf(struct vtnet_softc = *sc, int
> > nbufs, struct mbuf **m_tailp)
> > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0m_freem(m_head);
> > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0return (NULL);
> > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
> > > -
> > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m->= m_len =3D size;
> > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m_adj(m, ET= HER_ALIGN);
> >
> > The driver is expecting to get a cluster of size sc->vtnet_rx_= clustersz,
> > but now it's getting one of size sc->vtnet_rx_clustersz - = 2.=C2=A0 I don't
> > see how this change can be sufficient on its own: what prevents v= irtio
> > from writing sc->vtnet_rx_clustersz bytes and thereby overwrit= ing the
> > two bytes following the cluster?
> >
>
> The only trouble that I saw was here:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/*
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * Every = mbuf should have the expected cluster size since
> that
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * is als= o used to allocate the replacements.
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 */
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0KASSERT(m= ->m_len =3D=3D clustersz,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0("%s: mbuf size %d not expected cluster size %d",
> __func__,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0m->m_len, clustersz));
>
> and even that's fine: We can adjust that assert. The next lines I = think
> make it fine:
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m->m_l= en =3D MIN(m->m_len, len);
>
> so we only use what we say is there to land bytes into the mbuf.=C2=A0= I'm now
> not sure though what to do about a few lines later:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 m_new =3D vtnet_rx_alloc_buf(sc, nreplace, = &m_tail);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (m_new =3D=3D NULL) {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m_prev-&g= t;m_len =3D clustersz;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return (E= NOBUFS);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
>
> which likely needs to be adjusted (or the old saved size).

I'm not certain about what it should be doing instead, but yes this cod= e
looks wrong now.

Yea. We need to do this at time of use not time of alloc.

> I likely didn't see asserts in my testing because I didn't hav= e LRO slow
> path packets.
>
> Do you see other places that would need adjusting?

Presumably vtnet_rx_cluster_size() needs to be adjusted too?

I also don't understand why we do this adjustment unconditionally
instead of only when __NO_STRICT_ALIGNMENT is not defined, as almost all other NIC drivers do.

None of the ones I looked at had this... maybe I got u= nlucky. I'd think we'd have better performance if things are aligne= d all the way... is there a reason not to?

Warner

> All the other places
> seem to use m_len correctly on the rx path, but I'm not a nic driv= er guy,
> and I might be missing something.
--000000000000b137bf060d07efbc--