FreeBSD10 Stable + ZFS + PostgreSQL + SSD performance drop < 24 hours
Caza, Aaron
Aaron.Caza at ca.weatherford.com
Sun Jun 11 16:51:18 UTC 2017
Thanks, Allan, for the suggestions. I tried gstat -d, but deletes (d/s) don't seem to be the culprit: the column stays at 0 despite vfs.zfs.trim.enabled=1.
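For reference, this is roughly the gstat invocation I'm watching (the ada device names are just illustrative):

# Refresh every second, include DELETE (TRIM) statistics, and filter to the
# two mirror members; the d/s column stays at 0 the whole time.
gstat -d -I 1s -f 'ada[01]'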
This is most likely due to the "layering" I use: for historical reasons, I have GEOM ELI set up to emulate 4k sectors regardless of the underlying media. I also do my own alignment and partition sizing, and the ZFS recordsize is set to 8k for Postgres.
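Roughly, the layering looks like this (device, pool, and dataset names are illustrative, not the exact ones from these boxes; geli keying options omitted):

# GELI providers forced to 4k sectors regardless of what the SSDs report
geli init -s 4096 /dev/ada0p2
geli attach /dev/ada0p2
geli init -s 4096 /dev/ada1p2
geli attach /dev/ada1p2
# Mirrored pool on the .eli providers; recordsize matched to Postgres' 8k pages
zpool create tank mirror ada0p2.eli ada1p2.eli
zfs create -o recordsize=8k tank/pgdata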
In gstat, the SSDs' %busy is 90-100% on startup after a reboot. Once the performance degradation hits (less than 24 hours later), I'm seeing %busy at ~10%.
#!/bin/sh
# Time a sequential scan (select count(*)) of the ~21.5 million row test table.
# psql's --password flag takes no value, so the password is supplied via the
# PGPASSWORD environment variable; -h /db points at the Unix socket directory.
PGPASSWORD=supersecret psql --username=test -h /db -d test << EOL
\timing on
select count(*) from test;
\q
EOL
Sample run of the above script after reboot, before degradation hits (Samsung 850 Pros in ZFS mirror):
Timing is on.
count
----------
21568508
(1 row)
Time: 57029.262 ms
Sample run of the above script after degradation (Samsung 850 Pros in ZFS mirror):
Timing is on.
count
----------
21568508
(1 row)
Time: 583595.239 ms
(Uptime ~1 day in this particular case.)
Any other suggestions?
Regards,
A
-----Original Message-----
From: owner-freebsd-hackers at freebsd.org [mailto:owner-freebsd-hackers at freebsd.org] On Behalf Of Allan Jude
Sent: Saturday, June 10, 2017 9:40 PM
To: freebsd-hackers at freebsd.org
Subject: [EXTERNAL] Re: FreeBSD10 Stable + ZFS + PostgreSQL + SSD performance drop < 24 hours
On 06/10/2017 12:36, Slawa Olhovchenkov wrote:
> On Sat, Jun 10, 2017 at 04:25:59PM +0000, Caza, Aaron wrote:
>
>> Gents,
>>
>> I'm experiencing an issue where iterating over a PostgreSQL table of ~21.5 million rows (select count(*)) goes from ~35 seconds to ~635 seconds on Intel 540 SSDs. This is on a FreeBSD 10 amd64 stable kernel from back in January 2017. The SSDs are 2 drives in a ZFS mirrored zpool. I'm using PostgreSQL 9.5.7.
>>
>> I've tried:
>>
>> * Using the FreeBSD10 amd64 stable kernel snapshot of May 25, 2017.
>>
>> * Tested on half a dozen machines with different models of SSDs:
>>
>> o Intel 510s (120GB) in ZFS mirrored pair
>>
>> o Intel 520s (120GB) in ZFS mirrored pair
>>
>> o Intel 540s (120GB) in ZFS mirrored pair
>>
>> o Samsung 850 Pros (256GB) in ZFS mirrored pair
>>
>> * Using bonnie++ to take Postgres out of the equation; performance does indeed still drop.
>>
>> * Rebooting the server and immediately re-running the test; performance is back to the original.
>>
>> * Tried using Karl Denninger's patch from PR187594 (which took some work to find a kernel that the FreeBSD10 patch would both apply and compile cleanly against).
>>
>> * Tried disabling ZFS lz4 compression.
>>
>> * Ran the same test on a FreeBSD9.0 amd64 system using PostgreSQL 9.1.3 with 2 Intel 520s in a ZFS mirrored pair. The system had 165 days of uptime and the test took ~80 seconds; after a reboot and re-run it was still ~80 seconds (older processor and memory in this system).
>>
>> I realize that there's a whole lot of info I'm not including (dmesg, zfs-stats -a, gstat, et cetera): I'm hoping some enlightened individual will be able to point me to a solution with only the above to go on.
>
> Just a random guess: can you try r307264 (I mean a regression in
> r307266)?
>
This sounds a bit like an issue I investigated for a customer a few months ago.
Look at gstat -d (includes DELETE operations like TRIM)
If you see a lot of that happening, try vfs.zfs.trim.enabled=0 in /boot/loader.conf and see if your issues go away.
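That is (it's a boot-time tunable, so it needs a reboot to take effect; you can confirm the current value with sysctl afterwards):

# /boot/loader.conf
vfs.zfs.trim.enabled=0

# After the reboot, verify the tunable took effect:
sysctl vfs.zfs.trim.enabled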
The FreeBSD TRIM code for ZFS basically waits until the sector has been free for a while (to avoid doing a TRIM on a block we'll immediately reuse), so your benchmark will run fine for a little while, then suddenly the TRIM will kick in.
For Postgres, fio, bonnie++, etc., make sure the ZFS dataset you are storing the data on / benchmarking against has a recordsize that matches the workload.
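For example, for PostgreSQL's 8k pages (the dataset name here is just a placeholder):

# recordsize only applies to newly written files, so set it before loading data
zfs set recordsize=8k tank/pgdata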
If you are doing a write-only benchmark and you see lots of reads in gstat, you know you are having to do read/modify/writes, and that is why your performance is so bad.
--
Allan Jude