Re: WD Blue 510 SSD and strange write performance (update II)

From: mike tancsa <mike_at_sentex.net>
Date: Sat, 27 Apr 2024 18:47:08 UTC

On 3/21/2024 8:46 AM, mike tancsa wrote:
>
> summary: WD Blue 510 SSDs, when attached to the mpr controller, seem to
> start throwing errors on random disks in the pools (see
> https://lists.freebsd.org/archives/freebsd-hardware/2024-March/000100.html
> for examples) after copying and destroying a 200G zfs dataset with
> many small files 3 or 4 times on a set of 4 disks in raidz1. Doing a
> hard trim -f da on the disks and recreating the pool allows me to do
> the tests 3 or 4 more times before hitting the errors again.  The same
> tests with the same disks attached to a SATA controller don't show
> the errors. I also ran into the same problem with a similar LSI
> controller but using the mrsas controller/driver (<AVAGO Invader SAS
> Controller>).  It seems to be trim related?  Using Samsung SSDs on the
> mpr controller does not seem to show the issue.
>
I decided to run the same tests on the exact same hardware, but booted
into TrueNAS SCALE (the Linux variant), to see if the problem persists.
If I do a manual trim between each zfs send | zfs recv, zfs destroy
cycle, the performance is fairly consistent and there are no
crashes/resets of the drives in the pool on Linux
(6.6.20-production+truenas).

I'm not a Linux person, so it's hard to say whether Linux applies any
quirks for these disks.

root@truenas[/var/log]# hdparm -I /dev/sda | grep -i tri
            *    Data Set Management TRIM supported (limit 8 blocks)
            *    Deterministic read data after TRIM
root@truenas[/var/log]#

If I don't do the manual TRIM (i.e. zpool trim -w pool) between
send|recv loops, I get the same pattern as when I do a manual
trim -f /dev/da[x] on each disk one by one on FreeBSD: 3 full-speed
loops and, after that, super slow until a proper trim is done. On
FreeBSD I do the proper trim on the raidz1 pool by running
trim -f /dev/da[1-4] one disk at a time and resilvering.
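
For anyone wanting to reproduce the loop, here is a minimal sketch of the
copy / destroy / optional-trim cycle with per-loop timing. The dataset and
pool names (tank/src@snap, tank/copy, tank) are hypothetical placeholders,
not the ones used in these tests, and dry_run defaults to True so that it
only prints the commands instead of touching a pool.

```python
import shlex
import subprocess
import time


def loop_commands(src_snap, dst, pool, trim_between):
    """Return the shell commands for one copy/destroy iteration."""
    cmds = [
        f"zfs send {shlex.quote(src_snap)} | zfs recv {shlex.quote(dst)}",
        f"zfs destroy -r {shlex.quote(dst)}",
    ]
    if trim_between:
        # -w makes zpool trim wait until the TRIM has completed.
        cmds.append(f"zpool trim -w {shlex.quote(pool)}")
    return cmds


def run_loops(n, src_snap, dst, pool, trim_between, dry_run=True):
    """Run n iterations and return the wall-clock seconds of each one."""
    timings = []
    for _ in range(n):
        start = time.monotonic()
        for cmd in loop_commands(src_snap, dst, pool, trim_between):
            if dry_run:
                print(cmd)  # show what would be run
            else:
                subprocess.run(cmd, shell=True, check=True)
        timings.append(time.monotonic() - start)
    return timings


if __name__ == "__main__":
    # Hypothetical names; replace with a real snapshot, target and pool.
    run_loops(2, "tank/src@snap", "tank/copy", "tank", trim_between=True)
```

Running it once with trim_between=True and once with trim_between=False,
and comparing the returned timings, should show whether the slowdown after
the third loop tracks the missing trim.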

So it does seem to point to TRIM via zfs (be that manual or autotrim)
being somehow broken with this drive on FreeBSD, via both the mpr driver
and the ATA driver.

Given the output of hdparm on Linux and TRIM being limited to 8 blocks,
does anyone know of a quirk I can try on FreeBSD to maybe get TRIM
working for these SSDs?
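
As a rough sanity check on what "limit 8 blocks" means in bytes, here is a
small sketch. It assumes the usual ATA Data Set Management layout (512-byte
payload blocks holding 8-byte range entries, each with a 16-bit sector
count) and a 512-byte logical sector size, which the hdparm output above
does not state.

```python
import re

RANGE_ENTRY_BYTES = 8           # one DSM range entry: 48-bit LBA + 16-bit count
DSM_BLOCK_BYTES = 512           # size of one DSM payload block
MAX_SECTORS_PER_RANGE = 0xFFFF  # largest value of the 16-bit count field


def trim_block_limit(hdparm_line):
    """Extract N from an hdparm line containing '(limit N blocks)'."""
    m = re.search(r"limit (\d+) blocks?", hdparm_line)
    return int(m.group(1)) if m else None


def max_trim_bytes(blocks, sector_bytes=512):
    """Upper bound on the bytes one TRIM command can cover.

    sector_bytes=512 is an assumption about the logical sector size.
    """
    ranges = blocks * (DSM_BLOCK_BYTES // RANGE_ENTRY_BYTES)
    return ranges * MAX_SECTORS_PER_RANGE * sector_bytes


line = "*    Data Set Management TRIM supported (limit 8 blocks)"
blocks = trim_block_limit(line)
print(blocks, max_trim_bytes(blocks))
```

Under those assumptions, 8 blocks works out to 17179607040 bytes, which is
the same number as kern.cam.da.6.delete_max in the quoted sysctl output
below. That may mean FreeBSD already sizes a single delete request to the
same 8-block limit on the mpr-attached disk, so the limit by itself might
not be the whole story.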

Details are captured in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277992

The attachment in the PR,
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=250268, is a PNG
showing the performance when the TRIM is not done.

     ---Mike



>
> OK, some updates.  I took the same 4 disks off the mpr controller and
> put them on the motherboard SATA ports, and the problem seems to
> disappear.  If it is still related to trim, I notice that on the mpr
> controller the trim method is ATA_TRIM, and when attached to the
> motherboard SATA it's DSM_TRIM.  Not sure if there is any difference
> there? Or it's some other problem.  PR time for the mpr driver?
>
> kern.cam.ada.1.trim_ticks: 0
> kern.cam.ada.1.trim_goal: 0
> kern.cam.ada.1.flags: 0x1be3bde<CAN_48BIT,CAN_FLUSHCACHE,CAN_NCQ,CAN_DMA,WAS_OTAG,CAN_TRIM,OPEN,SCTX_INIT,CAN_POWERMGT,CAN_DMA48,CAN_LOG,CAN_WCACHE,CAN_RAHEAD,PROBED,ANNOUNCED,DIRTY,PIM_ATA_EXT,UNMAPPEDIO>
> kern.cam.ada.1.trim_lbas: 6356918872
> kern.cam.ada.1.trim_ranges: 171552
> kern.cam.ada.1.trim_count: 84205
> kern.cam.ada.1.delete_method: DSM_TRIM
>
> kern.cam.da.6.trim_ticks: 0
> kern.cam.da.6.trim_goal: 0
> kern.cam.da.6.sort_io_queue: 0
> kern.cam.da.6.unmapped_io: 1
> kern.cam.da.6.rotating: 0
> kern.cam.da.6.flags: 0x10ef40<WAS_OTAG,OPEN,SCTX_INIT,CAN_RC16,PROBED,ANNOUCNED,CAN_ATA_DMA,CAN_ATA_LOG,UNMAPPEDIO>
> kern.cam.da.6.p_type: 0
> kern.cam.da.6.error_inject: 0
> kern.cam.da.6.max_seq_zones: 0
> kern.cam.da.6.optimal_nonseq_zones: 0
> kern.cam.da.6.optimal_seq_zones: 0
> kern.cam.da.6.zone_support: None
> kern.cam.da.6.zone_mode: Not Zoned
> kern.cam.da.6.trim_lbas: 0
> kern.cam.da.6.trim_ranges: 0
> kern.cam.da.6.trim_count: 0
> kern.cam.da.6.minimum_cmd_size: 6
> kern.cam.da.6.delete_max: 17179607040
> kern.cam.da.6.delete_method: ATA_TRIM
>
> camcontrol iden doesn't show much difference really
>
>  diff -bu wd.mpr wd.ata
> --- wd.mpr      2024-03-21 08:27:02.995734000 -0400
> +++ wd.ata      2024-03-21 08:21:42.310055000 -0400
> @@ -1,5 +1,6 @@
> +# camcontrol ide ada1
>  pass6: <WD Blue SA510 2.5 1000GB 52046100> ACS-4 ATA SATA 3.x device
> -pass6: 600.000MB/s transfers, Command Queueing Enabled
> +pass6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
>
>  protocol              ACS-4 ATA SATA 3.x
>  device model          WD Blue SA510 2.5 1000GB
>
>
> Controller is
>
>  mprutil show adapter
> mpr0 Adapter:
>        Board Name: INSPUR 3008IT
>    Board Assembly: INSPUR
>         Chip Name: LSISAS3008
>     Chip Revision: ALL
>     BIOS Revision: 18.00.00.00
> Firmware Revision: 16.00.12.00
>   Integrated RAID: no
>          SATA NCQ: ENABLED
>  PCIe Width/Speed: x8 (8.0 GB/sec)
>         IOC Speed: Full
>       Temperature: 51 C
>
> PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
> 0       0001        0009       N         6.0     3.0    12     SAS Initiator
> 1       0001        0009       N         6.0     3.0    12     SAS Initiator
> 2       0001        0009       N         6.0     3.0    12     SAS Initiator
> 3       0001        0009       N         6.0     3.0    12     SAS Initiator
> 4                              N                 3.0    12     SAS Initiator
> 5                              N                 3.0    12     SAS Initiator
> 6                              N                 3.0    12     SAS Initiator
> 7                              N                 3.0    12     SAS Initiator
>
>
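
The two kern.cam sysctl dumps quoted above can also be compared
mechanically rather than by eye. The sketch below parses a subset of those
lines (pasted in as SAMPLE) and reports the delete method plus average
TRIM sizes per device; the function names are made up for illustration.
Counters of zero, as on da6 here, may simply mean no TRIM had been issued
on that device when the sysctls were captured.

```python
import re

# Subset of the sysctl output quoted above.
SAMPLE = """\
kern.cam.ada.1.trim_lbas: 6356918872
kern.cam.ada.1.trim_ranges: 171552
kern.cam.ada.1.trim_count: 84205
kern.cam.ada.1.delete_method: DSM_TRIM
kern.cam.da.6.trim_lbas: 0
kern.cam.da.6.trim_ranges: 0
kern.cam.da.6.trim_count: 0
kern.cam.da.6.delete_max: 17179607040
kern.cam.da.6.delete_method: ATA_TRIM
"""

LINE = re.compile(r"^kern\.cam\.(a?da)\.(\d+)\.(\w+):\s*(.*)$")


def parse_cam_sysctls(text):
    """Group 'kern.cam.<drv>.<unit>.<key>: value' lines by device name."""
    devices = {}
    for raw in text.splitlines():
        # Tolerate email quoting ("> ") in front of each line.
        m = LINE.match(raw.lstrip("> ").rstrip())
        if m:
            drv, unit, key, value = m.groups()
            devices.setdefault(drv + unit, {})[key] = value
    return devices


def trim_summary(dev):
    """Summarise the TRIM counters of one parsed device."""
    count = int(dev.get("trim_count", 0))
    ranges = int(dev.get("trim_ranges", 0))
    lbas = int(dev.get("trim_lbas", 0))
    return {
        "delete_method": dev.get("delete_method"),
        "ranges_per_cmd": ranges / count if count else 0.0,
        "lbas_per_range": lbas / ranges if ranges else 0.0,
    }


for name, dev in parse_cam_sysctls(SAMPLE).items():
    print(name, trim_summary(dev))
```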