From nobody Mon Jan 13 08:22:51 2025 X-Original-To: stable@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4YWld14L2vz5kn27 for ; Mon, 13 Jan 2025 08:23:01 +0000 (UTC) (envelope-from zlei@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [IPv6:2610:1c1:1:606c::24b:4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R10" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4YWld13NxFz3SVr; Mon, 13 Jan 2025 08:23:01 +0000 (UTC) (envelope-from zlei@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1736756581; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=ULKJJGsDblUg+MCJperY5rqbZvz+56vdFjXBHpvemtQ=; b=sGPVPzJVfdHUDrGlrOA/sL1viVvtNhwcXXV962rTzHF4hTNUhAfaGnTX4mAzIUBzU8n3+1 O2DvNBoHmeiQAQ6+NES2yHWbMzxTt9/2UU257Ryx/3UZA4zvCWTydiawUYysz8X5P4ht9o R6mg/kB0CIXN33Wfk9BYHIyKuB8CXszfeg/Vf77vsdv2Hnb3wK7pr6cRUCgcI8/XeqF9zy AvcNuLiXiSoG4ePrIsYIWTw+qV71ItZ3kS9qvjHUVaVIr3D9BL2rGHGUoJxcxWJmk1K+hJ m32ZLKDKtGLdR3+yTEywIbIK/FbMs6Cn0Qe1Si7JOXdn1j6ipgfGGBm7bSi6KQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1736756581; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=ULKJJGsDblUg+MCJperY5rqbZvz+56vdFjXBHpvemtQ=; b=afqDL3QcInbgUS6wFkC6jodnK8It7wn4Lt0A0Rx32Wzzc9Wdg22mMUUMb7V1o8IlwT9oYa xzhLki02ajGDvsFvs8P0JROevhege3h32QB0VnqGWjlHQ0BeC5xfj2JB0TQ99phZXkhthR FdT8djQOCvKooZH7MaDuoXn/mN3XW9B0XNBOFkEn5SHlOOZ5JUGY/TgRxXFPzqSyEw+w8S z7CKOvkXWcSqzVPRjtt7jXksLhyGSmjYRjqRdP173lKRBVdFDqvtnR5vTfwuCZ9JoR4nG5 H9xyAXUTiZPoNPT6KdKarw7NvtKQSw6ixei3RiisZYWm86vmj5vqgJzzwD74uA== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1736756581; a=rsa-sha256; cv=none; b=PGyQW1DeQQgL+f8Eyi8db/1TOXSqF9c6NG3V19poSnYZiwuenu4TKqM9PJByyuyS2tmnBo lTsjvlq5e0uow1KWv5rjJ+Axj943MKYKf89rjV6uPHVjMgkM6PN1eFUUGwawQGjib3ouUc NzwsH2DffFfg5s70lu7k5z0RgivfWBQJ7mLGpNJ26M+zhHJAMZEeHTkRt++fKSvIIw8TiN gcne+VSkNiBAVXWzaF6EWnOVgS103Es6UdKA4h9xK+zObU/vtAGX6BLLI56D4lcQeaLq19 4l/IOTkYV9FKnX8KSFObSY7GVSJzK0j6Md64L/31qy4ikao/wfMw6jqaB68xfg== ARC-Authentication-Results: i=1; mx1.freebsd.org; none Received: from smtpclient.apple (ns1.oxydns.net [45.32.91.63]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) (Authenticated sender: zlei/mail) by smtp.freebsd.org (Postfix) with ESMTPSA id 4YWlcz3kMWz1K8W; Mon, 13 Jan 2025 08:22:59 +0000 (UTC) (envelope-from zlei@FreeBSD.org) From: Zhenlei Huang Message-Id: <3CB8230B-7938-4503-AADD-7F691482908C@FreeBSD.org> Content-Type: multipart/alternative; boundary="Apple-Mail=_E704F548-0365-47C9-9E73-361C3FC8B029" List-Id: Production branch of FreeBSD source code List-Archive: https://lists.freebsd.org/archives/freebsd-stable List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: freebsd-stable@freebsd.org Sender: owner-freebsd-stable@FreeBSD.org Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.10\)) Subject: Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver Date: Mon, 13 Jan 2025 16:22:51 +0800 In-Reply-To: Cc: Warner Losh , stable@freebsd.org, Edward Tomasz Napierala To: Warner Losh References: <0DDE1B66-B794-472D-A901-54FA2FF1E853@FreeBSD.org> X-Mailer: Apple Mail (2.3696.120.41.1.10) --Apple-Mail=_E704F548-0365-47C9-9E73-361C3FC8B029 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Dec 28, 2024, at 6:03 AM, Warner Losh wrote: >=20 >=20 >=20 > On Mon, Dec 2, 2024 at 2:41=E2=80=AFAM Zhenlei Huang > wrote: > Hi Warner, >=20 > Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed weird = report for sas speed. > The boot console has the following, >=20 > ``` > da0 at pvscsi0 bus 0 scbus2 target 0 lun 0 > da0: Fixed Direct Access SPC-4 SCSI device > da0: 4294967.295MB/s transfers > ``` > But camcontrol report the correct value, > ``` > # camcontrol inquiry da0 -R > pass1: 750.000MB/s transfers, Command Queueing Enabled > ``` >=20 > The `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see any = logic set those values. >=20 > Finally I managed to get the stack trace, > ``` > _scsi_announce_periph > scsi_announce_periph_sbuf > xpt_announce_periph_sbuf > dadone_proberc > xpt_done_process > xpt_done_td > fork_exit > fork_trampoline > ``` >=20 > and noticed that the last param `cts` of `_scsi_announce_periph(struct = cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings = *cts)` > is from kernel stack and is not properly initialized, latter I found = some commits related to this, >=20 > 076686fe0703 cam: make sure to clear CCBs allocated on the stack > ec5325dbca62 cam: make sure to clear even more CCBs allocated on the = stack > 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB. > 616a676a0535 cam: clear stack-allocated CCB in the target layer >=20 > I applied them to stable/13, rebuild and reboot, now the speed of da0 = is reported correctly. I also tried to patch the pvscsi driver with few = lines and > it also works as intended. >=20 > ``` > --- a/sys/dev/vmware/pvscsi/pvscsi.c > +++ b/sys/dev/vmware/pvscsi/pvscsi.c > @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb = *ccb) > cts->proto_specific.scsi.flags =3D = CTS_SCSI_FLAGS_TAG_ENB; > cts->proto_specific.scsi.valid =3D CTS_SCSI_VALID_TQ; >=20 > + /* Prefer connection speed over sas port speed */ > + /* cts->xport_specific.sas.bitrate =3D 0; */ > + cts->xport_specific.sas.valid =3D 0; > + > ccb_h->status =3D CAM_REQ_CMP; > xpt_done(ccb); > ``` >=20 > Things come clear and I know why this weird speed happens, now it is = time to decide how to fix it. >=20 > Fixing the consumer of cam, aka pvscsi driver, is quite simple and = promising. I did a quick search it appears other consumers set = `cts->xport_specific.sas.valid` correctly. It does not convince me as = I'm quite new to cam subsystem. >=20 > Yes. sas.valid is set when the sas.bitrate and other data has been set = correctly. Setting it to 0 like this ensures it's ignored. So if you = know the speed, set sas.bitrate to that speed and sas.valid to 1. That is clear. >=20 > I'm not sure I answered the question right, but if not, please clarify = or point out what I missed and I'll try again. > =20 > Which one do you prefer, MFC commits to stable/13, or do direct fix = for pvscsi driver to stable/13 ? >=20 > [[ Sorry for the delay, I missed this all month ]] >=20 > I generally prefer a MFC, unless the code is no longer in -current. The code live in -current and all supported stable branches. > Even if there's two different fixes for this logical bug, fixing it in = current, then MFCing that (with the current hash) is fine, even if the = stable/13 changes are completely different. The bug does not exist in current and stable/14 but only in stable/13, = due to Edward's work ( commits those zero stack-allocated CCBs ) were = not MFCed into stable/13 branch. > For stable/13 I guess it matters a bit less than stable/14 since I'll = be merging to it less, but if it's a commit from -current that doesn't = need to be made to -stable because of the new commit on stable, I tend = to include the MFC hash text. Do you mean the `cherry picked from commit` commit message ? >=20 > Warner I'm preparing and testing the MFCing. Bless me to not make things messed = up :) Best regards, Zhenlei --Apple-Mail=_E704F548-0365-47C9-9E73-361C3FC8B029 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8

On Dec 28, 2024, at 6:03 AM, Warner Losh <imp@bsdimp.com> = wrote:



On Mon, = Dec 2, 2024 at 2:41=E2=80=AFAM Zhenlei Huang <zlei@freebsd.org> = wrote:
Hi Warner,

Recently I = upgraded some ESXi vms from 13.3 to 13.4 and noticed weird report for = sas speed.
The boot console has the following,

```
da0 at pvscsi0 bus 0 scbus2 = target 0 lun 0
da0: <VMware Virtual disk 2.0> Fixed = Direct Access SPC-4 SCSI device
da0: 4294967.295MB/s = transfers
```
But camcontrol report the = correct value,
```
# camcontrol inquiry da0 = -R
pass1: 750.000MB/s transfers, Command Queueing = Enabled
```

The = `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see any logic = set those values.

Finally I managed to get = the stack trace,
```
_scsi_announce_periph
scsi_announce_periph_sbuf
xpt_announce_periph_sbuf
dadone_proberc
xpt_done_process
xpt_done_td
fork_exit
fork_trampoline
```

and noticed that the last param `cts` of = `_scsi_announce_periph(struct cam_periph *periph, u_int *speed, u_int = *freq, struct ccb_trans_settings *cts)`
is from kernel = stack and is not properly initialized, latter I found some commits = related to this,

076686fe0703 cam: make = sure to clear CCBs allocated on the stack
ec5325dbca62 = cam: make sure to clear even more CCBs allocated on the stack
0f206cc91279 cam: add missing zeroing of a stack-allocated = CCB.
616a676a0535 cam: clear stack-allocated CCB in the = target layer

I applied them to stable/13, = rebuild and reboot, now the speed of da0 is reported correctly. I also = tried to patch the pvscsi driver with few lines and
it = also works as intended.

```
---= a/sys/dev/vmware/pvscsi/pvscsi.c
+++ = b/sys/dev/vmware/pvscsi/pvscsi.c
@@ -1444,6 +1444,10 @@ = pvscsi_action(struct cam_sim *sim, union ccb *ccb)
  =               cts->proto_specific.scsi.f= lags =3D CTS_SCSI_FLAGS_TAG_ENB;
      =           cts->proto_specific.scsi.v= alid =3D CTS_SCSI_VALID_TQ;

+    =            /* Prefer connection speed over = sas port speed */
+          =      /* cts->xport_specific.sas.bitrate =3D 0; */
+              =  cts->xport_specific.sas.valid =3D 0;
+
                ccb_h->status =3D = CAM_REQ_CMP;
            =     xpt_done(ccb);
```

Things come clear and I know = why this weird speed happens, now it is time to decide how to fix it.

Fixing the consumer of cam, aka pvscsi driver, = is quite simple and promising. I did a quick search it appears other = consumers set `cts->xport_specific.sas.valid` correctly. It does not = convince me as I'm quite new to cam subsystem.

Yes. sas.valid is set when the sas.bitrate and other data has = been set correctly. Setting it to 0 like this ensures it's = ignored.  So if you know the speed, set sas.bitrate to that speed = and sas.valid to 1.

That is clear.


I'm not sure = I answered the question right, but if not, please clarify or point out = what I missed and I'll try again.
 
Which one do = you prefer, MFC commits to stable/13, or do direct fix for pvscsi driver = to stable/13 ?

[[ Sorry for the delay, I missed this = all month ]]

I = generally prefer a MFC, unless the code is no longer in -current. =

The code = live in -current and all supported stable branches.

Even if there's two different = fixes for this logical bug, fixing it in current, then MFCing that (with = the current hash) is fine, even if the stable/13 changes are completely = different.

The bug does not exist in current and stable/14 = but only in stable/13, due to Edward's work ( commits those zero = stack-allocated CCBs ) were not MFCed into stable/13 = branch.

For stable/13 I guess it matters a bit less than = stable/14 since I'll be merging to it less, but if it's a commit from = -current that doesn't need to be made to -stable because of the new = commit on stable, I tend to include the MFC hash = text.

Do = you mean the `cherry picked from commit` commit message ?


Warner

I'm preparing and testing the MFCing. = Bless me to not make things messed up :)

Best regards,
Zhenlei

= --Apple-Mail=_E704F548-0365-47C9-9E73-361C3FC8B029--