From: mike tancsa <mike@sentex.net>
To: Andrea Venturoli
Cc: freebsd-hardware@freebsd.org
Date: Sun, 17 Mar 2024 08:03:44 -0400
Subject: Re: WD Blue 510 SSD and strange write performance
Message-ID: <00cf68fe-73e2-4d28-bb49-6aad7eeaf884@sentex.net>
List-Archive: https://lists.freebsd.org/archives/freebsd-hardware

On 3/17/2024 4:32 AM, Andrea Venturoli wrote:
> On 3/15/24 19:17, mike tancsa wrote:
>
>> (da5:mpr0:0:15:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
>
> Hello.
> I know I'm probably blaming the wrong component, but is your PSU up to the task?
> How many drives do you have? Are they power-hungrier than the others you tried (Samsung ???)?
> Do you have a spare PSU to test/add?
>
> Probably this is not the cause... still, before you bid farewell to 400 bucks...


hehe, thanks Andrea :)  I don't want to be out the money either. The power supply is definitely a good thing to check. In this case, the main server chassis has a pair of redundant 1000W power supplies sized for a full load of 12 HDDs, so I'm pretty sure 6 SSDs come nowhere near stressing them. Also, I tried 2 other test boxes on the bench, and the one common variable seems to be the WDs.
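One way to check that the resets really follow the WD drives, rather than a chassis or PSU, is to tally the "UNIT ATTENTION asc:29,0" messages per device. Below is a minimal Python sketch, assuming the kernel messages land in /var/log/messages (the FreeBSD default for kernel output) and keep the `(da5:mpr0:0:15:0)` prefix quoted above; `count_resets` is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter

# Matches CAM sense messages such as:
#   (da5:mpr0:0:15:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
# The prefix is (peripheral:controller:bus:target:lun).
RESET_RE = re.compile(
    r"\((?P<periph>[a-z]+\d+):(?P<sim>[a-z]+\d+):(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\): "
    r"SCSI sense: UNIT ATTENTION asc:29,0\b"
)


def count_resets(lines):
    """Count 'power on, reset, or bus device reset' (asc 29h, ascq 0) events per peripheral."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group("periph")] += 1
    return counts
```

For example, `count_resets(open("/var/log/messages")).most_common()` lists the noisiest devices first, and `camcontrol devlist` maps names like `da5` back to drive models.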

I feel like I am digging myself deeper into a sunk cost here, but I did some more testing. My co-worker came across this interesting post:

https://forum.hddguru.com/viewtopic.php?f=10&t=43284

The very last entry says:

"For WD BLUE SA 510 there are some problems with this type of SSD. This YODA model
To fix the SSD if it is still recognized, use the firmware update tools.
And then do a secure erase or full wipe of the SSD. After this it will work well. I can give you a link to this utility if it necessary. Also possible download it from manufacture FTP.
If it is not recognized by the computer or is identified as a SSD device, there only one way, use production tools with new firmware to begin the production process by testing the controller and NAND chip and forming a translator. The SSD will be like brand new."

After I did the erase, the tests ran for a good 5 cycles and performance was MUCH smoother and more consistent. But then the drives started to fail again. So I really wonder if TRIM has something to do with it, since my test essentially writes a 250G data set of about 28 million text files, destroys the dataset, and then copies it again.
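A minimal sketch of one such cycle, written in Python so it can be dry-run first. The pool name, dataset name, source path and the use of `cp` are assumptions (the post does not say how the data was copied), and `build_cycle`/`run_cycles` are hypothetical helpers. Destroying the dataset frees all of its blocks; with `autotrim=on` ZFS trims them in the background, while `manual_trim=True` adds an explicit `zpool trim -w` so TRIM happens at a known point in each cycle, which may make it easier to see whether the failures line up with TRIM.

```python
import subprocess


def build_cycle(pool, dataset, source_dir, manual_trim=False):
    """Return the commands for one write/destroy cycle as argument lists.

    Assumes the dataset inherits the default mountpoint (/<pool>/<dataset>)
    and that a plain recursive cp is an acceptable stand-in for the copy step.
    """
    target = f"{pool}/{dataset}"
    mountpoint = f"/{pool}/{dataset}"
    commands = [
        ["zfs", "create", target],
        ["cp", "-Rp", source_dir, mountpoint],  # ~250G / ~28M small files in the original test
        ["zpool", "sync", pool],                 # flush dirty data before tearing down
        ["zfs", "destroy", "-r", target],        # frees every block written above
    ]
    if manual_trim:
        # Force an explicit TRIM and wait for it, instead of relying on autotrim.
        commands.append(["zpool", "trim", "-w", pool])
    commands.append(["zpool", "status", "-t", pool])  # show per-vdev TRIM state
    return commands


def run_cycles(pool, dataset, source_dir, cycles, manual_trim=False, dry_run=True):
    """Print each command; unless dry_run, also execute it, stopping at the first failure."""
    for cycle in range(1, cycles + 1):
        print(f"# cycle {cycle}")
        for cmd in build_cycle(pool, dataset, source_dir, manual_trim):
            print(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)
```

For example, `run_cycles("tank", "trimtest", "/data/testset", cycles=10, manual_trim=True)` only prints the commands; pass `dry_run=False` to run them. `zpool get autotrim tank` shows whether automatic TRIM is enabled on the pool.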

I noticed these 2 commits for other drives and wonder whether the WD has similar issues:

https://cgit.freebsd.org/src/commit/?h=stable/14&id=bf11fee6a5cf97102f87695185cadb63d5a2a7de
and
https://cgit.freebsd.org/src/commit/?h=stable/14&id=50aa22323424ccea00ef5d8f24e729a480cc77eb

I hope you don't mind me BCC'ing you, Andriy. I noticed you added the NCQ quirks only for CAM ata and not for CAM scsi. I am running into odd issues with some WD drives and am wondering whether these WD SA 510 drives share the same root limitation as the Samsungs. That said, in my own use of the Samsungs I have not been able to trigger these bugs so far.
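The ata/scsi split matters here because these SSDs sit behind an mpr(4) HBA and attach as da devices, so a quirk entry added only on the ATA side is never consulted for them. Below is a toy Python model of that lookup, loosely patterned on CAM's glob-matched, first-match-wins quirk tables. The flag name, table contents and helper names are all hypothetical, and the sketch says nothing about whether TRIM queuing for an mpr-attached drive can be controlled from the host at all.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical flag standing in for a "queued TRIM is broken on this model" quirk.
NCQ_TRIM_BROKEN = 0x1


@dataclass(frozen=True)
class QuirkEntry:
    vendor: str    # glob pattern
    product: str   # glob pattern
    revision: str  # glob pattern
    flags: int


# Placeholder patterns for illustration only; not copied from FreeBSD's tables.
ATA_QUIRKS = [QuirkEntry("*", "Samsung SSD 870*", "*", NCQ_TRIM_BROKEN)]
SCSI_QUIRKS: list = []  # no matching entry on the SCSI side in this sketch


def lookup(table, vendor, product, revision):
    """Return the flags of the first entry whose patterns all match (first match wins)."""
    for entry in table:
        if (fnmatchcase(vendor, entry.vendor)
                and fnmatchcase(product, entry.product)
                and fnmatchcase(revision, entry.revision)):
            return entry.flags
    return 0


def quirks_for(periph, vendor, product, revision):
    """ada devices consult the ATA table; da devices (e.g. SATA behind mpr) the SCSI one."""
    table = ATA_QUIRKS if periph.startswith("ada") else SCSI_QUIRKS
    return lookup(table, vendor, product, revision)
```

In this model the same drive gets the flag as `ada0` but not as `da5`; only adding a matching entry to `SCSI_QUIRKS` would change that.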

    ---Mike
