From nobody Sat Oct 09 14:44:22 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id DBBF412DE7E5 for ; Sat, 9 Oct 2021 14:44:34 +0000 (UTC) (envelope-from timp87@gmail.com) Received: from mail-ed1-x52c.google.com (mail-ed1-x52c.google.com [IPv6:2a00:1450:4864:20::52c]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HRSWQ5Wvcz3GF8 for ; Sat, 9 Oct 2021 14:44:34 +0000 (UTC) (envelope-from timp87@gmail.com) Received: by mail-ed1-x52c.google.com with SMTP id p13so48293903edw.0 for ; Sat, 09 Oct 2021 07:44:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=4HYOoyyGiwLpKQmKjMGSU3LKG6jFXMM4PHeTjSbv+PE=; b=M5BFqCKXVkqkl4lHCuUTMmKbdaIF8EHMO7l8FuQFA/ofHqw2bOhJYrlVP8GNvOwnHO xXmYF2snQi5eEcV9DPiGx02B9XQruHnRjC5SH2zCP98isrneHpQ/zo5LtzGFbfbApHJK zgm6JDT629HAaBoUKhj3osVrhnPepKjy4S7mhQ3doUTyDUl1GKmTJ76ZBookgQr+Hiqo MD1Ycrla3xM0t/kCDtekIevCZAreDMbqkdb/hembrvKGDGXL+7cLSO6yE3xdzu5mWhKH 1uiU/aIYngime8ckROI+iRE65iAAV2/mK3viydnOc33vXgrZKcWqIto0+SzNX8bDXqFP ejxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=4HYOoyyGiwLpKQmKjMGSU3LKG6jFXMM4PHeTjSbv+PE=; b=UxHz7T+wb8dG9ZaTHkZy4GV1utsgOdhWk+oF95XDn0ql7k3D0osQM5hJFylv2c9ED3 BZh5L25qwk6lb/769/p8iG/xgeqSjJHiUZ0/0EPzoEKZvB9Z8uA+85yPVxW7VwdE97Cj /0qs7B6S3WREVHQXc9/b72WS+maqlZd7WCFIs/LoW64QbzDqPOyQQXTby52SVI3t2t8r 3f3aDnm59uSyBHdN5Xn6/sfQowBjEmsEE9xIzwkoA7yPtTkbTWMthrk8VniU15MY4oCu itip9Tg9oHpHOVdacjTnfQMDV4D7ON9qHMPsUuJpviwMTaffxthwxftn+Rf41vRB3WQm 6hpA== X-Gm-Message-State: AOAM531I7OIoA9iCmglEhjGNwpGxTjoVkJetwqWoLieYFSut7QUkA8r+ NrxwN4bOeBMaekeSygk+xPNuW1wwf048kybW348= X-Google-Smtp-Source: ABdhPJwjz2kIquXMfCvTzzVtxMtIrNQW0sZKJzdk1APrAW7/xdOhxfispsZJbMyj33tM0uXnxelpG1RzoRHZpHhK8F4= X-Received: by 2002:a17:906:2506:: with SMTP id i6mr12032476ejb.186.1633790673446; Sat, 09 Oct 2021 07:44:33 -0700 (PDT) List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 References: In-Reply-To: From: Pavel Timofeev Date: Sat, 9 Oct 2021 08:44:22 -0600 Message-ID: Subject: Re: Dell Latitude 7400 - nvme0: Missing interrupt To: Warner Losh Cc: Chuck Tuffli , freebsd-current Content-Type: multipart/alternative; boundary="000000000000ba6db705cdec8812" X-Rspamd-Queue-Id: 4HRSWQ5Wvcz3GF8 X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: Y --000000000000ba6db705cdec8812 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable =D0=BF=D1=82, 8 =D0=BE=D0=BA=D1=82. 2021 =D0=B3. =D0=B2 14:49, Warner Losh = : > > > On Fri, Oct 8, 2021 at 2:42 PM Pavel Timofeev wrote: > >> >> >> =D1=81=D0=B1, 21 =D0=B0=D0=B2=D0=B3. 2021 =D0=B3. =D0=B2 15:22, Warner L= osh : >> >>> >>> >>> On Sat, Aug 21, 2021 at 3:06 PM Pavel Timofeev wrote= : >>> >>>> >>>> >>>> Warner Losh : >>>> >>>>> >>>>> >>>>> On Fri, Aug 20, 2021 at 10:42 PM Pavel Timofeev >>>>> wrote: >>>>> >>>>>> Pavel Timofeev : >>>>>> >>>>>> > >>>>>> > Chuck Tuffli : >>>>>> > >>>>>> >> On Mon, Aug 16, 2021 at 7:43 PM Pavel Timofeev >>>>>> wrote: >>>>>> >> > >>>>>> >> > Hello >>>>>> >> > I've got a Dell Latitude 7400 and tried installing the latest >>>>>> >> 14.0-CURRENT >>>>>> >> > (main-n248636-d20e9e02db3) on it. >>>>>> >> > Despite other things the weird one which concerns me is >>>>>> >> > nvme0: Missing interrupt >>>>>> >> > message I get sometimes on the console. >>>>>> >> > It seems like I get it only after the reboot of the laptop, i. >>>>>> e. not >>>>>> >> > getting that message if I power cycle the laptop, at least I >>>>>> haven't >>>>>> >> seen >>>>>> >> > them for now in such cases. >>>>>> >> > So when the laptop is rebooted I can't even take advantage of >>>>>> >> > nvmecontrol(8) quickly. >>>>>> >> > Well, it still works, but it takes tens of seconds to return th= e >>>>>> output. >>>>>> >> ... >>>>>> >> > dmesg when power cycled - >>>>>> >> > >>>>>> https://drive.google.com/file/d/1dB27oB1O2CcnZy6DvOOhmFO8SN8V8SwJ >>>>>> >> > dmesg when rebooted - >>>>>> >> > >>>>>> https://drive.google.com/file/d/1DsKTMkihp_OmUcirByLaVO4o2mU38Bxh >>>>>> >> >>>>>> >> I'm sort of curious about the time stamps for the log messages in >>>>>> the >>>>>> >> failing case. Something like: >>>>>> >> >>>>>> >> $ grep "nv\(me\|d\)" /var/log/messages >>>>>> >> >>>>>> >> --chuck >>>>>> >> >>>>>> > >>>>>> > Well, I can't see timestamps in the verbose boot log. Am I missing >>>>>> some >>>>>> > configuration for that? >>>>>> > >>>>>> > $ grep "nv\(me\|d\)" /var/log/messages >>>>>> > nvme0: mem >>>>>> > 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff >>>>>> at device >>>>>> > 0.0 on pci6 >>>>>> > nvme0: attempting to allocate 5 MSI-X vectors (17 supported) >>>>>> > nvme0: using IRQs 133-137 for MSI-X >>>>>> > nvme0: CapLo: 0x140103ff: MQES 1023, CQR, TO 20 >>>>>> > nvme0: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, MPSMIN 0, MPSMAX = 0 >>>>>> > nvme0: Version: 0x00010300: 1.3 >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvme0: Missing interrupt >>>>>> > nvd0: NVMe namespace >>>>>> > GEOM: new disk nvd0 >>>>>> > nvd0: 488386MB (1000215216 512 byte sectors) >>>>>> > >>>>>> >>>>>> >>>>>> Ah, sorry, provided wrong output. >>>>>> Here is what you requested: >>>>>> $ grep "nv\(me\|d\)" /var/log/messages >>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: mem >>>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at >>>>>> device >>>>>> 0.0 on pci6 >>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: attempting to allocate 5 MSI= -X >>>>>> vectors (17 supported) >>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: using IRQs 133-137 for MSI-X >>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: CapLo: 0x140103ff: MQES 1023= , >>>>>> CQR, >>>>>> TO 20 >>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: CapHi: 0x00000030: DSTRD 0, >>>>>> NSSRS, >>>>>> CSS 1, MPSMIN 0, MPSMAX 0 >>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Version: 0x00010300: 1.3 >>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>>> Aug 21 04:34:36 nostromo kernel: nvd0: >>>>>> NVMe >>>>>> namespace >>>>>> Aug 21 04:34:36 nostromo kernel: GEOM: new disk nvd0 >>>>>> Aug 21 04:34:36 nostromo kernel: nvd0: 488386MB (1000215216 512 byte >>>>>> sectors) >>>>>> Aug 21 04:34:42 nostromo kernel: nvme0: Missing interrupt >>>>>> Aug 21 04:35:36 nostromo kernel: nvme0: Missing interrupt >>>>>> Aug 21 04:35:50 nostromo kernel: nvme0: Missing interrupt >>>>>> >>>>> >>>>> What happens if you set hw.nvme.use_nvd=3D0 and hw.cam.nda.nvd_compat= =3D1 >>>>> in the boot loader and reboot? Same thing except nda where nvd was? O= r >>>>> does >>>>> it work? >>>>> >>>>> Something weird is going on in the interrupt assignment, I think, but= I >>>>> wanted to get any nvd vs nda issues out of the way first. >>>>> >>>>> Warner >>>>> >>>> >>>> Do you mean kern.cam.nda.nvd_compat instead of hw.cam.nda.nvd_compat? >>>> kern.cam.nda.nvd_compat is 1 by default now. >>>> >>>> So I tried to set hw.nvme.use_nvd to 1 as suggested, but I still see >>>> nvme0: Missing interrupt >>>> and now also >>>> Root mount waiting for: CAM >>>> messages besides those >>>> >>> >>> OK. That all makes sense. I'd forgotten that nvd_compat=3D1 by default >>> these >>> days. >>> >>> I'll take a look on monday starting at the differences in interrupt >>> assignment that >>> are apparent when you cold boot vs reboot. >>> >>> Thanks for checking... I'd hoped this was a cheap fix, but also didn't >>> really >>> expect it to be. >>> >>> Warner >>> >>> >> I've recently upgraded to main-n249974-17f790f49f5 and it got even worse >> now. >> So clean poweron works as before. >> But if rebooted nvme drive refuses to work, while before the code upgrad= e >> it was just complaining about missing interrupts. >> >> currently dmesg show this: >> nvme0: mem >> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at dev= ice >> 0.0 on pci6 >> nvd0: NVMe namespace >> nvd0: 488386MB (1000215216 512 byte sectors) >> nvme0: mem >> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at dev= ice >> 0.0 on pci6 >> > > Why is this showing up twice? Or is everything above this line left over > from the first, working boot? > > >> nvme0: RECOVERY_START 9585870784 vs 9367036288 >> nvme0: timeout with nothing complete, resetting >> nvme0: Resetting controller due to a timeout. >> nvme0: RECOVERY_WAITING >> nvme0: resetting controller >> nvme0: aborting outstanding admin command >> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:00000000 >> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 >> nvme0: nvme_identify_controller failed! >> nvme0: waiting >> > > Clearly something bad is going on with the drive here... We looked into > the completion queues when we didn't get an interrupt and there was nothi= ng > complete there.... > > The only thing I can think of is that this means there's a phase error > between the drive and the system. I recently removed a second reset and > made it an option NVME_2X_RESET. Can you see if adding > 'options NVME_2X_RESET' to your kernel config fixes this? > > Warner > > >> nvme0: mem >> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at dev= ice >> 0.0 on pci6 >> nvme0: RECOVERY_START 9362778467 vs 9361830561 >> nvme0: timeout with nothing complete, resetting >> nvme0: Resetting controller due to a timeout. >> nvme0: RECOVERY_WAITING >> nvme0: resetting controller >> nvme0: aborting outstanding admin command >> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:00000000 >> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 >> nvme0: nvme_identify_controller failed! >> nvme0: waiting >> >> Sorry, it's showing twice due to multiple reboots. For one boot it's like: nvme0: mem 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at device 0.0 on pci6 nvme0: RECOVERY_START 9633303481 vs 9365971423 nvme0: timeout with nothing complete, resetting nvme0: Resetting controller due to a timeout. nvme0: RECOVERY_WAITING nvme0: resetting controller nvme0: aborting outstanding admin command nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:00000000 nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 nvme0: nvme_identify_controller failed! nvme0: waiting Well, neither Windows not Linux have any problems with the device. I understand they may be hiding it or workaround somehow. I'll try setting NVME_2X_RESET in the kernel config and report back in a while. --000000000000ba6db705cdec8812--