From: Maxim Sobolev
Date: Thu, 7 Dec 2023 17:02:57 -0800
Subject: Re: nvme timeout issues with hardware and bhyve vm's
To: Bakul Shah
Cc: Warner Losh, Tomoaki AOKI, FreeBSD Current
In-Reply-To: <10FD2FC6-1F39-4F7D-8BA8-976ADC0AE37A@iitbombay.org>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

How quickly it heats up depends on lots of factors. Usually those
devices burn some 3-7 watts per stick at 100% load, so maybe this would
give you some idea. At least some of them support several toggleable
performance modes, which use throttling internally to limit power
consumption to a certain level (man nvmecontrol). That recently helped
me keep a system stable, which would otherwise hang with timeouts after
reaching 70-75C, until I got the chance to take it apart and attach
heatsinks to the NVMe drives. Once the temperature dropped to <= 50C
the drives became 100% stable.

-Max

On Thu, Dec 7, 2023, 4:07 PM Bakul Shah <bakul@iitbombay.org> wrote:

> On Dec 7, 2023, at 3:59 PM, Warner Losh <imp@bsdimp.com> wrote:
> >
> > > *Overheating caused hang of NVMe controller or PCI bridge on SSD, or
> >
> > Yes. Most drives' firmware resets when it overheats. There might be
> > something that the pci code can do when this happens to retrain the
> > link, reprogram the config registers, etc.
>
> How quickly can the device heat up? Can it be queried frequently
> enough to act before it overheats by throttling io?
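
The polling question quoted above can be illustrated with a small sketch. This is only one possible way to act on temperature samples, not what the FreeBSD nvme driver or any drive firmware actually does. The class name `ThermalThrottle`, the helper `run`, and both thresholds are assumptions made for illustration; the 70C and 50C values are simply borrowed from the figures reported in this message. How the samples are obtained (for example from the drive's SMART / Health Information log page) is left out.

```python
from dataclasses import dataclass


@dataclass
class ThermalThrottle:
    """Hypothetical hysteresis controller: throttle when hot, release once cool."""

    hot_c: float = 70.0   # assumed trip point (hangs were seen at 70-75C)
    cool_c: float = 50.0  # assumed release point (drives were stable at <= 50C)
    throttled: bool = False

    def update(self, temp_c: float) -> bool:
        """Feed one temperature sample; return True if I/O should be throttled."""
        if temp_c >= self.hot_c:
            self.throttled = True
        elif temp_c <= self.cool_c:
            self.throttled = False
        # Between the two thresholds the previous decision is kept,
        # which avoids flapping when the temperature hovers near one limit.
        return self.throttled


def run(samples, throttle=None):
    """Apply the controller to a sequence of samples and collect its decisions."""
    if throttle is None:
        throttle = ThermalThrottle()
    return [throttle.update(t) for t in samples]
```

Using two thresholds rather than one means a drive that has tripped at 70C stays throttled while it cools through the 50-70C band, instead of toggling on every sample.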