Date: Mon, 12 Feb 2024 22:03:29 -0500
From: Mark Johnston
To: Don Lewis
Cc: FreeBSD current, John Baldwin
Subject: Re: nvme controller reset failures on recent -CURRENT
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Mon, Feb 12, 2024 at 04:28:10PM -0800, Don Lewis wrote:
> I just upgraded my package build machine to:
>   FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
> from:
>   FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
> and I've had two nvme-triggered panics in the last day.
>
> nvme is being used for swap and L2ARC.  I'm not able to get a crash
> dump, probably because the nvme device has gone away and I get an error
> about not having a dump device.  It looks like a low-memory panic
> because free memory is low and zfs is calling malloc().
>
> This shows up in the log leading up to the panic:
> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.

Are you by chance using the drive mentioned here?
https://github.com/openzfs/zfs/discussions/14793

I was bitten by that and ended up replacing the drive with a different
model.  The crash manifested exactly as you describe, though I didn't
have L2ARC or swap enabled on it.

> The device looks healthy to me:
> SMART/Health Information Log
> ============================
> Critical Warning State: 0x00
>  Available spare: 0
>  Temperature: 0
>  Device reliability: 0
>  Read only: 0
>  Volatile memory backup: 0
> Temperature: 312 K, 38.85 C, 101.93 F
> Available spare: 100
> Available spare threshold: 10
> Percentage used: 3
> Data units (512,000 byte) read: 5761183
> Data units written: 29911502
> Host read commands: 471921188
> Host write commands: 605394753
> Controller busy time (minutes): 32359
> Power cycles: 110
> Power on hours: 19297
> Unsafe shutdowns: 14
> Media errors: 0
> No. error info log entries: 0
> Warning Temp Composite Time: 0
> Error Temp Composite Time: 0
> Temperature 1 Transition Count: 5231
> Temperature 2 Transition Count: 0
> Total Time For Temperature 1: 41213
> Total Time For Temperature 2: 0
>
>