Re: git: 2f036705f337 - main - Document the two recent newsyslog(8) change (-c option and <compress> configuration option).

From: Xin LI <delphij_at_gmail.com>
Date: Tue, 09 Jan 2024 19:29:25 UTC

Hi, Olivier,

On Tue, Jan 9, 2024 at 2:19 AM Olivier Certner <olce@freebsd.org> wrote:

> [...]
> > Sorry not to have noticed this in the review; it was only when I saw this
> > message that it sunk in that we now have *three* ways to specify
> > compression, and I'm not even sure what the precedence is.  I would have
> > thought that <compress> would replace -c.  It's a mess if the config file
> > has entries that specify J and X flags as well as none, the config file
> > has <compress> zstd, and the -c option is given as well.  We now have a
> > knob to override the knob to override a knob.  The only reason to keep -c
> > that I can think of is to specify a different compression in a single
> > invocation, but as noted, changing compression requires manual operations
> > that make it unreasonable to change it invocation by invocation.
>
> I agree.  Two possibilities that I can think of from here: Remove '-c' or
> make it enable compression regardless of the log files' individual settings.
>

I am open to removing '-c'.

Could you please clarify what you mean by "make it enable compression" --
did you mean that we mark all log files to be compressible?  (It's probably
not a good idea as some "log" files may be binary and not really
compressible).

>
> > I still think it would be much better to add an option letter to select
> > the default compression as specified by <compress>.  This would eliminate
> > the need for "legacy", and it would add the ability to have both a global
> > default and an exception.  I think the redefinition of the existing flags
> > to have different meanings if <compress> is given is messy.
>
> I didn't think about that at first.  I agree.
>
> If people want to be able to override compression settings globally, which
> I find useful, one could introduce another directive such as
> <compress_override> taking a boolean to request to apply the <compress>
> option regardless of the individual compression letters.
>
> Another possibility is just to rename "<compress>" to
> "<compress_override>" (so, this time, not a boolean) and keep its current
> behavior.  This would match one of the suggestions above about '-c', but
> then there's the question of which one takes precedence, and I think that
> the command-line specification should prevail (for practical purposes and
> POLA).
>
> > The entry for -c says that we plan to change the default to "none" in
> > 15.0.  Hopefully that would be done via <compress> and not -c.  However,
> > there was significant pushback on "none" being the default.
>
> I think the default should be "no <compress_override>", i.e., no
> directive.  This may argue for having "none" mean "don't change anything"
> (as if the directive wasn't there) and having something else to deactivate
> compression, such as "no_compression" (which is really an override).  If
> "none" is confusing, then just forgo it completely, and have 'newsyslog'
> simply fail on it (but keep "no_compression" as just described).
>
> If there is consensus, I'd then change the 'J' flag currently used for all
> log files to the new chosen flag for generic compression, and have
> <compress_override> set to "bzip2" in a first step (for POLA).  Then, it
> could be changed to something else, e.g., 'zstd'.
>
> Setting it to 'none' seems to me the worst solution (but far from being
> the end of the world).
>

Changing the meaning of all four legacy compression type letters to "file
is compressible" is part of the intention.  The goal is to discourage using
them as a way to specify a compression type, in favor of using the
administrator configured value.
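
As a rough illustration of that intent, here is a minimal Python sketch.
It is not the newsyslog(8) code: the function name is hypothetical, the
letter-to-method table is the mapping as I recall it from
newsyslog.conf(5), and the treatment of "legacy" and "none" is my
assumption about how the pieces are meant to fit together.

```python
# Hypothetical model of the intended semantics; not newsyslog's implementation.
# The four legacy per-entry letters and the method each one used to select.
LEGACY_FLAGS = {"J": "bzip2", "X": "xz", "Y": "zstd", "Z": "gzip"}


def effective_compression(entry_flags, configured="legacy"):
    """Return the compression method used for one log entry.

    entry_flags: the flags field of a newsyslog.conf entry, e.g. "JC".
    configured:  the administrator's <compress> value (assumed to be one of
                 "none", "legacy", "bzip2", "gzip", "xz", "zstd").
    """
    letters = [flag for flag in entry_flags if flag in LEGACY_FLAGS]
    if not letters:
        # No compression letter: the file is not marked as compressible.
        return "none"
    if configured == "legacy":
        # Old behavior: the letter itself picks the method.
        return LEGACY_FLAGS[letters[0]]
    # Otherwise the letter only means "compressible"; the configured value wins.
    return configured


print(effective_compression("JC", "zstd"))   # zstd
print(effective_compression("C", "zstd"))    # none
print(effective_compression("X", "legacy"))  # xz
```

In this model an entry without any of the four letters is never compressed,
which is how binary "log" files would stay untouched.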

That said, 'none' is a reasonable default in many ways, as explained
before: it makes grep'ing easier; compression is not really that helpful in
the modern world because hard drives are much larger than in the '90s; it
reduces the number of times data gets rewritten to SSDs; and it avoids
hourly CPU load bursts on busy systems.

'bzip2' could be a good second-best default (because for most
configurations it's how the log files are compressed with today's
defaults), but if the administrator has already configured their systems to
use a different method, this would break their configuration anyway.


> More deeply, I remember having seen at least two claims that using
> filesystem's compression is better, without arguments.  I don't agree with
> that in practice.  The only advantage of in-filesystem compression, besides
> the administrative simplification that you can also get with the override
> above, is to get O(1) random access to big log files, and I don't see any
> compelling and common use case for it.  You certainly want to get to the
> end of the current log quickly, but that one precisely is not handled by
> 'newsyslog' and stays uncompressed (at the application level).  When you
> want to search for strings or patterns, you have to grep the whole file
> anyway.  You may want to immediately reach the end of some historical log
> file, e.g., when manually going back in time from the current log, but this
> should have negligible latency, and if it doesn't, then just use more and
> smaller log archives.  Same thing if you have a more sophisticated setup
> with an index of log text: Jumping to a particular location in the log file
> should have negligible latency, else apply the same recipe.  If your setup
> with index requires a single, never rotated, log file, then you're not even
> using 'newsyslog' in the first place (or should not).  Although I agree
> that in this case using a compressed filesystem (or a randomly accessible
> archive) can make sense (if your index doesn't already cover the results
> expected from your searches), I very much doubt this is a common setup.
>

There are other benefits of not compressing rotated logs.  For busy
systems, the hourly newsyslog run would process larger logs and cause CPU
workload bursts.

And when logs are compressed, the data is read back and compressed data is
rewritten to disk / SSDs, causing additional wear of the flash storage, and
all that comes with no significant benefit for modern hardware.

(I don't think it's common to have log files indexed after rotation; a more
common use case would be to use [u]grep to look for a certain pattern).
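
To show what the grep use case involves once rotated logs are a mix of
plain and compressed files, here is a small Python sketch using only the
standard library.  The function name and the suffix table are hypothetical,
and zstd is left out because I am not assuming a zstd module is available.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Map a file suffix to the function that opens it; plain files use open().
OPENERS = {".bz2": bz2.open, ".gz": gzip.open, ".xz": lzma.open}


def grep_logs(paths, needle):
    """Yield (path, line) for every line that contains needle."""
    for path in map(Path, paths):
        opener = OPENERS.get(path.suffix, open)
        # "rt" makes the compressed openers decode to text like open() does.
        with opener(path, "rt", encoding="utf-8", errors="replace") as log:
            for line in log:
                if needle in line:
                    yield str(path), line.rstrip("\n")
```

With uncompressed logs the suffix table is unnecessary and a plain grep
over the files does the same job.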


> Moreover, using in-filesystem compression can lead to degrading the
> compression ratio, since the compression method on ZFS is chosen per
> dataset, which includes a bunch of other files and use cases preventing the
> administrator from choosing the best, and slowest, compression methods.  To
> avoid this problem, one can use a separate dataset for /var/log (anyone?),
> but changing this on already running systems is a greater burden than just
> changing the compression settings in the 'newsyslog' configuration files.
>

Yes, and that's not a big concern.  Achieving the maximum compression ratio
is probably never the goal for most scenarios (not limited to logs, but
also other places) where compression is used, and one always has to balance
between the cost and benefit.

If the person is distributing a release image to many thousands of users
over the Internet, it would make a lot of sense to try the best compression
for a 5% reduction in size, because that adds up in bandwidth cost and
improves the experience for users.  But it doesn't make as much sense to
save, let's say, a few MBs of disk space at the expense of spending a few
more minutes every hour: the added "bursts" of slower response time for a
server are usually undesirable in production.
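
Anyone who wants to weigh that cost against the benefit on their own
hardware could start from something like the Python sketch below.  It is
only an illustration: the sample log lines are made up, it compresses in
memory rather than via newsyslog(8), and the numbers will differ for real
logs and for the command-line tools.

```python
import bz2
import gzip
import lzma
import time

# Standard-library compressors standing in for gzip(1), bzip2(1) and xz(1).
COMPRESSORS = {"gzip": gzip.compress, "bzip2": bz2.compress, "xz": lzma.compress}


def measure(data):
    """Return {method: (compressed_size, seconds)} for one in-memory log."""
    results = {}
    for name, compress in COMPRESSORS.items():
        start = time.perf_counter()
        size = len(compress(data))
        results[name] = (size, time.perf_counter() - start)
    return results


# Synthetic, made-up log lines; real logs will compress differently.
sample = b"".join(
    b"Jan  9 19:29:%02d host daemon[%d]: request %d served\n"
    % (i % 60, 1000 + i % 7, i)
    for i in range(20000)
)

for name, (size, seconds) in measure(sample).items():
    print(f"{name}: {len(sample)} -> {size} bytes in {seconds:.3f}s")
```

Comparing the bytes saved with the seconds spent per run gives a first idea
of whether compression is worth it for a given log volume.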

Cheers,