From: Warner Losh
Date: Tue, 29 Nov 2022 15:00:46 -0700
Subject: Re: CAM: extract HDD informations about failure/to fail?
To: Maxim Sobolev
Cc: FreeBSD User, FreeBSD CURRENT
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

Average Latency would also do the trick.
Warner

On Tue, Nov 29, 2022 at 2:20 PM Maxim Sobolev <sobomax@freebsd.org> wrote:
Perhaps if you log the r/w queue length for all 4 drives at a reasonable interval (say 1 second) under load using gstat(8), and plot all 4 as a function of time on the same graph, you should have no problem visually identifying the culprit(s). At least that's how I would do it.

-Maksym
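
A minimal Python sketch of the logging idea above, assuming the log was captured with gstat(8) in batch mode (for example `gstat -p -B -I 1s > gstat.log`) and that every data row follows the usual `L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy| Name` layout. The sample text, the device names da0..da3, the numbers, and the function names are all made up for illustration; check the column order against your own gstat output before relying on it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample: two made-up snapshots in the assumed gstat batch layout.
SAMPLE_LOG = """\
dT: 1.001s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    120     60   7680    8.1     60   7680    9.0   52.0| da0
    1    118     59   7552    8.3     59   7552    9.2   53.1| da1
    7    121     61   7808   41.5     60   7680   38.7   99.2| da2
    1    119     60   7680    8.0     59   7552    9.1   51.7| da3
dT: 1.001s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    2    117     58   7424    8.4     59   7552    9.3   54.0| da0
    1    120     60   7680    8.2     60   7680    9.0   52.8| da1
    9    119     60   7680   47.9     59   7552   44.1   99.8| da2
    1    121     61   7808    7.9     60   7680    8.9   51.2| da3
"""


def parse_gstat_batch(text):
    """Return {device: [(queue_len, ms_per_read, ms_per_write), ...]}."""
    series = defaultdict(list)
    for line in text.splitlines():
        if "|" not in line:
            continue  # skips the "dT:" lines and the column header
        numbers, name = line.rsplit("|", 1)
        fields = numbers.split()
        if len(fields) < 9:
            continue  # not a full data row in the assumed layout
        try:
            row = (int(fields[0]), float(fields[4]), float(fields[7]))
        except ValueError:
            continue  # ignore rows that do not parse as numbers
        series[name.strip()].append(row)
    return dict(series)


def summarize(series):
    """Per device: (mean queue length, mean of the read/write latencies in ms)."""
    return {
        dev: (mean(q for q, _, _ in rows),
              mean((r + w) / 2 for _, r, w in rows))
        for dev, rows in series.items()
    }


def suspect(summary):
    """Device with the highest mean latency."""
    return max(summary, key=lambda dev: summary[dev][1])


if __name__ == "__main__":
    summary = summarize(parse_gstat_batch(SAMPLE_LOG))
    for dev, (qlen, latency) in sorted(summary.items()):
        print(f"{dev}: mean L(q)={qlen:.1f}  mean latency={latency:.1f} ms")
    print("highest mean latency:", suspect(summary))
```

The per-device lists returned by `parse_gstat_batch` are already time series, so they can be handed to any plotting tool to draw all four drives on one graph. The same summary covers both the queue length and the average latency.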

On Sun, Nov 27, 2022, 7:15 AM FreeBSD User <freebsd@walstatt-de.de> wrote:
Hello,

well, the aim of my post may sound strange, but I'm serious.
Background: at home I run a 14-CURRENT based server with a ZFS volume (RAIDZ) made up of
4x 4 TB HDDs. A couple of days ago I had to replace the HGST NAS drives since one got a permanent
SMART error. All HDDs have now been replaced with four Seagate IronWolf Pro 4TB
drives. So far, so good.
Now I face a weird sound coming from one of the new HDDs. The box is supposed to be a heavy
duty poudriere build facility, so the drives are up 24/7. It seems that one (or even more)
of the drives emits a weird sound, as if the spindle motor loses power for a fraction of a second
and then spins the drive up again. Searching the net reveals that at least one Seagate
customer had the same issue, and he provided an audio file of that very weird sound, to be
found here:

Post at reddit:
https://www.reddit.com/r/techsupport/comments/sca6al/seagate_ironwolf_pro_making_weird_noise/

and here is the audio file from that post:

https://www.mediafire.com/file/x3le816qsakiff9/Hdd.mp4/file

I checked S.M.A.R.T. for any unusual data, but everything is fine. The values for

Power_Cycle_Count
Power-Off_Retract_Count
Start_Stop_Count

all seem to be within a reasonable range compared to the lifetime in hours (I did some simple
statistics); nothing looks unusual.
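
One way such a comparison could look, as a rough Python sketch: normalize each counter to events per 1000 power-on hours so the four drives can be compared side by side. This assumes the JSON output of smartmontools 7.x (`smartctl -A -j <device>`) with an `ata_smart_attributes.table` list; the sample document and its numbers are invented, the function name is hypothetical, and raw values are vendor-specific, so the raw Power_On_Hours field may need decoding on some drives.

```python
import json

# Hypothetical, heavily trimmed stand-in for `smartctl -A -j <device>` output.
SAMPLE_JSON = """
{"ata_smart_attributes": {"table": [
  {"id": 4,   "name": "Start_Stop_Count",        "raw": {"value": 12}},
  {"id": 9,   "name": "Power_On_Hours",          "raw": {"value": 400}},
  {"id": 12,  "name": "Power_Cycle_Count",       "raw": {"value": 12}},
  {"id": 192, "name": "Power-Off_Retract_Count", "raw": {"value": 10}}
]}}
"""

# The three counters named in the post.
COUNTERS = ("Power_Cycle_Count", "Power-Off_Retract_Count", "Start_Stop_Count")


def events_per_1000_hours(smart_json):
    """Map each counter to its rate per 1000 power-on hours."""
    table = json.loads(smart_json)["ata_smart_attributes"]["table"]
    raw = {row["name"]: row["raw"]["value"] for row in table}
    hours = raw["Power_On_Hours"]
    if hours <= 0:
        raise ValueError("Power_On_Hours must be positive")
    return {name: 1000.0 * raw[name] / hours
            for name in COUNTERS if name in raw}


if __name__ == "__main__":
    for name, rate in events_per_1000_hours(SAMPLE_JSON).items():
        print(f"{name}: {rate:.1f} per 1000 h")
```

A drive whose rates are far above those of its three siblings would be the one to look at first; identical rates on all four drives say little either way.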

Also, the extended view of each drive via

smartctl -x

doesn't give me any hint of a power failure as a source for the noise.

So, the big question here is: the drives are attached to an HBA, an LSI3008 based SAS9300-8i. Is it
possible to retrieve more health parameters via CAM than those gathered by SMART/smartmontools,
and if the answer is yes, how can this be achieved?
It is close to impossible to isolate the drive making the noise. My gut tells me to RMA the
supposedly faulty drive and not to wait until it dies from "spindle motor disease" or
whatever is the source of the noise.

Thanks in advance,

oh


--
O. Hartmann
