ZFS: How to enable cache and logs.
Jeremy Chadwick
freebsd at jdc.parodius.com
Thu May 12 02:08:09 UTC 2011
On Wed, May 11, 2011 at 09:48:48PM -0400, Jason Hellenthal wrote:
> Jeremy, as always the quality of your messages is 101% spot on, and I
> always find some new information that becomes handy more often than I
> could say; there is always something to be learned.
>
> Thanks.
>
> On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote:
> > On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> > >
> > > Jeremy,
> > >
> > > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > > > > >should also keep that in mind when putting an SSD into use in this
> > > > > >fashion.
> > > > >
> > > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > > > written slowly (on purpose). Any current, or 1-2 generations back SSD
> > > > > would handle that write load without TRIM and without any performance
> > > > > degradation.
> > > > >
> > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > > > NAND should be smaller (my wild guess, current practice may differ)
> > > > > and the need for rewriting will be small. If you don't need to
> > > > > rewrite already written data, TRIM does not help. Also, as far as I
> > > > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > > > twice or more the advertised size and always write to fresh cells,
> > > > > scheduling a background erase of the 'overwritten' cells.
> > > >
> > > > AFAIK, drive manufacturers do not disclose just how much reallocation
> > > > space they keep available on an SSD. I'd rather not speculate as to how
> > > > much, as I'm certain it varies per vendor.
> > > >
> > >
> > > Let's not forget here: The size of the separate log device may be quite
> > > small. A rule of thumb is that you should size the separate log to be able
> > > to handle 10 seconds of your expected synchronous write workload. It would
> > > be rare to need more than 100 MB in a separate log device, but the
> > > separate log must be at least 64 MB.
> > >
> > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> > >
> > > So, in other words, how effective is TRIM really, given the above?
> > >
> > > Even with a high database write load hitting the disks at the full
> > > capacity of the incoming link, I would find it hard to believe that
> > > anyone could get the ZIL to even come close to 512 MB.
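The sizing rule quoted above can be sketched as a small calculation. This is an illustration of the Evil Tuning Guide's rule of thumb, not a measured recommendation; the workload figures are assumed for the example:

```python
# Sketch of the SLOG sizing rule: ~10 seconds of expected synchronous
# write throughput, with a 64 MB floor as the guide mentions.

def slog_size_bytes(sync_write_bps, seconds=10.0, floor_bytes=64 * 1024**2):
    """Suggested separate-log size for a given sync write rate (bytes/s)."""
    return max(int(sync_write_bps * seconds), floor_bytes)

# A sustained 10 MiB/s sync workload needs roughly 100 MiB of SLOG,
# consistent with "rare to need more than 100 MB":
print(slog_size_bytes(10 * 1024**2) // 1024**2)  # 100

# A light 1 MiB/s workload still gets the 64 MiB minimum:
print(slog_size_bytes(1 * 1024**2) // 1024**2)   # 64
```

Either way, the usable portion of a multi-gigabyte SSD devoted to a SLOG stays tiny, which is the point being made above.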
> >
> > In the case of an SSD being used as a log device (ZIL), I imagine it
> > would only matter the longer the drive was kept in use. I do not use
> > log devices anywhere with ZFS, so I can't really comment.
> >
> > In the case of an SSD being used as a cache device (L2ARC), I imagine it
> > would matter much more.
> >
> > In the case of an SSD being used as a pool device, it matters greatly.
> >
> > Why it matters: there are two methods of "reclaiming" blocks which were
> > used: internal SSD "garbage collection" and TRIM. For a NAND block to be
> > reclaimed, it has to be erased -- SSDs erase things in pages rather
> > than individual LBAs. With TRIM, you submit the data management command
> > via ATA with a list of LBAs you wish to inform the drive are no longer
> > used. The drive aggregates the LBA ranges, determines if an entire
> > flash page can be erased, and does it. If it can't, it makes some sort
> > of mental note that the individual LBA (in some particular page)
> > shouldn't be used.
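The aggregation step described above can be modeled roughly. This is an illustrative simplification of what firmware might do, not real firmware logic; the erase-page size is an assumption:

```python
# Rough model of TRIM bookkeeping: LBAs map onto larger erase pages, and
# a page can only be erased outright when no live data remains in it.

def erasable_pages(trimmed, live, lbas_per_page=256):
    """Given sets of TRIMed LBAs and still-live LBAs, return the page
    numbers the drive could erase immediately (assumed 256 LBAs/page)."""
    trimmed_pages = {lba // lbas_per_page for lba in trimmed}
    live_pages = {lba // lbas_per_page for lba in live}
    # Pages holding any live LBA get only a "mental note"; the rest
    # of the trimmed pages are free to be erased.
    return trimmed_pages - live_pages

# LBAs 0 and 255 fall in page 0, LBA 256 in page 1; live LBA 300 also
# sits in page 1, so only page 0 is erasable:
print(erasable_pages({0, 255, 256}, {300}))  # {0}
```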
> >
> > The "garbage collection" works when the SSD is idle. I have no idea
> > what "idle" actually means operationally, because again, vendors don't
> > disclose what the idle intervals are. 5 minutes? 24 hours? It
> > matters, but they don't tell us. (What confuses me about the "idle GC"
> > method is how it determines what it can erase -- if the OS didn't tell
> > it what it's using, how does it know it can erase the page?)
> >
> > Anyway, how all this manifests itself performance-wise is intriguing.
> > It's not speculation: there's hard evidence that, bluntly put,
> > performance on some SSDs suffers badly without TRIM.
> >
> > There's this mentality that wear levelling completely solves all of the
> > **performance** concerns -- that isn't the case at all. In fact, I'm
> > under the impression it probably hurts performance, but it depends on
> > how it's implemented within the drive firmware.
> >
> > bit-tech did an experiment using Windows 7 -- which supports and uses
> > TRIM assuming the device advertises the capability -- with different
> > models of SSDs. The testing procedure is documented at the URL below,
> > but I'll summarize it here as well:
> >
> > http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4
> >
> > Again, remember, this is done on a Windows 7 system which does support
> > TRIM if the device supports it. The testing steps, in this order:
> >
> > 1) SSD without TRIM support -- all LBAs are zeroed.
> > 2) Took read/write benchmark readings.
> > 3) SSD without TRIM support -- partitioned and formatted as NTFS
> > (cluster size unknown), copied 100GB of data to the drive, deleted all
> > the data, and repeated this method 10 times.
> > 4) Step #2 repeated.
> > 5) Upgraded SSD firmware to a version that supports TRIM.
> > 6) SSD with TRIM support -- step #1 repeated.
> > 7) Step #2 repeated.
> > 8) SSD with TRIM support -- step #3 repeated.
> > 9) Step #2 repeated.
> >
> > Without TRIM, some drives drop their read performance by more than 50%,
> > and write performance by almost 70%. I'm focusing on Intel SSDs here,
> > by the way. I do not care for OCZ or Corsair products.
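The reported drops come out of a simple before/after comparison. The throughput figures below are hypothetical, chosen only to illustrate drops of the magnitude quoted; they are not bit-tech's actual numbers:

```python
# Percentage drop from the clean-drive baseline, with made-up MB/s figures.

def degradation_pct(before_mbps, after_mbps):
    """How far throughput fell, as a percentage of the baseline."""
    return 100.0 * (before_mbps - after_mbps) / before_mbps

read_drop = degradation_pct(200.0, 95.0)   # a >50% read drop
write_drop = degradation_pct(80.0, 25.0)   # a ~69% write drop
print(round(read_drop, 1), round(write_drop, 1))  # 52.5 68.8
```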
> >
> > So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
> > TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
> > on FreeBSD will mimic (to some degree).
> >
> > Therefore, simply put, users should be concerned when using ZFS on
> > FreeBSD with SSDs. It doesn't matter to me if you're only using
> > 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
> > means degraded performance over time.
> >
> > Can you refute any of this evidence?
> >
>
> At least for the moment, NO. But I can say that, however widely
> OpenSolaris users were deploying SSDs before the Oracle reaping, I don't
> recall seeing any relevant bug reports about degradation. But like I
> said... I haven't seen them, though that may simply reflect a lack of
> use. Definitely more to look into, test, benchmark & test again.
>
> > > Given that most SSDs come in sizes greater than 32 GB, I hope this
> > > serves as an early reminder that the ZIL you are buying that disk for
> > > will only use a small percentage of it, and I hope you can justify the
> > > cost against its actual use. If you do decide to create a ZIL for your
> > > pool, then I hope you partition the disk wisely to make use of the
> > > rest of the space that would otherwise go untouched.
> > >
> > > For all other cases, if you still want a separate ZIL, I would
> > > recommend some sort of PCI->SD card adapter or USB stick, with
> > > mirroring.
> >
> > Others have pointed out this isn't effective (re: USB sticks). The read
> > and write speeds are too slow, and limit the overall performance of ZFS
> > in a very bad way. I can absolutely confirm this claim (I've tested it
> > myself, using a high-end USB flash drive as a cache device (L2ARC)).
> >
> > Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
> > *does* improve performance on older systems which have slower disk I/O
> > (e.g. ICH5-based systems).
> >
>
> Agreed. As soon as the pool's bus and write speeds exceed what USB 2.0
> can handle, any USB-based solution is useless. ICH5 and up would be
> right about the point where you would start to see this happen.
>
> With SD cards/CF cards, mileage may vary depending on the transfer
> rates. But the same situation applies: as you said, once your main
> pool's throughput outweighs the throughput of your ZIL device, it's
> probably not worth even having a ZIL or a cache device. Emphasis on the
> cache more so than the ZIL.
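The rule of thumb above reduces to a single comparison. The throughput figures are illustrative assumptions (a realistic USB 2.0 ceiling versus guessed pool speeds), not benchmarks:

```python
# A dedicated cache or log device only pays off while it can outrun the
# pool it front-ends.

USB2_MAX_MBPS = 35.0  # assumed realistic ceiling for USB 2.0 mass storage

def device_helps(device_mbps, pool_mbps):
    """True if the auxiliary device is faster than the main pool."""
    return device_mbps > pool_mbps

# In front of a modern multi-disk pool (~300 MB/s assumed), a USB 2.0
# stick only slows things down:
print(device_helps(USB2_MAX_MBPS, 300.0))  # False

# In front of an old ICH5-era single disk (~25 MB/s assumed), it can
# still be a net win, matching Alexander Leidinger's observation:
print(device_helps(USB2_MAX_MBPS, 25.0))   # True
```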
>
>
> Anyway, all good information for helping people judge whether they need
> a cache or a ZIL.
>
>
> Thanks again Jeremy. Always appreciated.
You're welcome.
It's important to note that much of what I say is stuff I've learned and
read (technical documentation usually) on my own -- which means I almost
certainly misunderstand certain pieces of technology. There are a *lot*
of people here who understand it much better than I do. (I'm looking at
you, jhb@ ;-) )
As such, I probably should have CC'd pjd@ on this thread, since he's
talked a bit about how to get ZFS on FreeBSD to work with TRIM, and when
to issue the erasing of said blocks.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |