Re: git: 2f036705f337 - main - Document the two recent newsyslog(8) change (-c option and <compress> configuration option).

From: Olivier Certner <olce_at_freebsd.org>
Date: Tue, 09 Jan 2024 10:19:48 UTC

Hi,

Sorry not to have gotten to that earlier.

I had initially said on IRC that I found the newsyslog changes great, but now, having read Mike's arguments and proposals, I have serious doubts about the current approach.

> Sorry not to have noticed this in the review; it was only when I saw this
> message that it sunk in that we now have *three* ways to specify compression,
> and I'm not even sure what the precedence is.  I would have thought that
> <compress> would replace -c.  It's a mess if the config file has entries
> that specify J and X flags as well as none, the config file has
> <compress> zstd, and the -c option is given as well.  We now have a knob
> to override the knob to override a knob. The only reason to keep -c that
> I can think of is to specify a different compression in a single invocation,
> but as noted, changing compression requires manual operations that make
> it unreasonable to change it invocation by invocation.

I agree.  Two possibilities that I can think of from here: Remove '-c', or make it enable compression regardless of the log files' individual settings.
 
> I still think it would be much better to add an option letter to select
> the default compression as specified by <compress>.  This would eliminate
> the need for "legacy", and it would add the ability to have both a global
> default and an exception.  I think the redefinition of the existing flags
> to have different meanings if <compress> is given is messy.

I didn't think about that at first.  I agree.

If people want to be able to override compression settings globally, which I find useful, one could introduce another directive such as <compress_override>, taking a boolean that requests applying the <compress> option regardless of the individual compression letters.

Another possibility is just to rename "<compress>" to "<compress_override>" (so, this time, not a boolean) and keep its current behavior.  This would match one of the suggestions above about '-c', but then there's the question of which one takes precedence, and I think that the command-line specification should prevail (for practical purposes and POLA).
 
> The entry for -c says that we plan to change the default to "none" in 15.0.
> Hopefully that would be done via <compress> and not -c.  However, there
> was significant pushback on "none" being the default.

I think the default should be "no <compress_override>", i.e., no directive.  This argues for having "none" mean "don't change anything" (as if the directive wasn't there) and having something else to deactivate compression, such as "no_compression" (which is really an override).  If "none" is confusing, then just forgo it completely, and have 'newsyslog' simply fail on it (but keep "no_compression" as just described).
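
To make the precedence I have in mind concrete, here is a rough sketch in Python.  It is not 'newsyslog' code, only an illustration of the second variant above (<compress_override> taking a method name).  The function and parameter names are made up, and the treatment of "none" on the command line (fall through to the configuration file) is my assumption of what "don't change anything" should mean there.

```python
from typing import Optional

# Per-entry compression flag letters from newsyslog.conf(5).
FLAG_METHODS = {"J": "bzip2", "X": "xz", "Y": "zstd", "Z": "gzip"}


def effective_compression(
    entry_flags: str,
    conf_override: Optional[str] = None,  # hypothetical <compress_override> value
    cli_override: Optional[str] = None,   # hypothetical value passed with -c
) -> Optional[str]:
    """Return the compression method for one log entry, or None for no compression.

    Assumed precedence, highest first: command line, then the configuration
    directive, then the entry's own flag letter.
    """
    # Start from the entry's own setting (last compression letter wins).
    method = None
    for letter in entry_flags:
        if letter in FLAG_METHODS:
            method = FLAG_METHODS[letter]

    # The command line is consulted before the configuration directive.
    for override in (cli_override, conf_override):
        if override is None or override == "none":
            # "none" behaves as if the override was not given at all.
            continue
        if override == "no_compression":
            # Explicit request to disable compression everywhere.
            return None
        return override

    return method


if __name__ == "__main__":
    print(effective_compression("J"))
    print(effective_compression("J", conf_override="zstd"))
    print(effective_compression("J", conf_override="zstd", cli_override="xz"))
    print(effective_compression("X", conf_override="none"))
    print(effective_compression("J", conf_override="no_compression"))
```

With this ordering, an entry flagged 'J' stays on bzip2 unless an override is present, and a command-line value always wins over the configuration file.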

If there is consensus, I'd then change the 'J' flag currently used for all log files to the new chosen flag for generic compression, and have <compress_override> set to "bzip2" in a first step (for POLA).  Then, it could be changed to something else, e.g., 'zstd'.

Setting it to 'none' seems to me the worst solution (but far from being the end of the world).

More fundamentally, I remember having seen at least two claims that using the filesystem's compression is better, without supporting arguments.  I don't agree with that in practice.  The only advantage of in-filesystem compression, besides the administrative simplification that you can also get with the override above, is to get O(1) random access to big log files, and I don't see any compelling and common use case for it.  You certainly want to get to the end of the current log quickly, but that one precisely is not handled by 'newsyslog' and stays uncompressed (at the application level).  When you want to search for strings or patterns, you have to grep the whole file anyway.  You may want to immediately reach the end of some historical log file, e.g., when manually going back in time from the current log, but this should have negligible latency, and if it doesn't, then just use more and smaller log archives.  Same thing if you have a more sophisticated setup with an index of log text: Jumping to a particular location in the log file should have negligible latency, else apply the same recipe.  If your setup with an index requires a single, never rotated, log file, then you're not even using 'newsyslog' in the first place (or should not).  Although I agree that in this case using a compressed filesystem (or a randomly accessible archive) can make sense (if your index doesn't already cover the results expected from your searches), I very much doubt this is a common setup.

Moreover, using in-filesystem compression can lead to degrading the compression ratio, since the compression method on ZFS is chosen per dataset, which includes a bunch of other files and use cases preventing the administrator from choosing the best, and slowest, compression methods.  To avoid this problem, one can use a separate dataset for /var/log (anyone?), but changing this on already running systems is a greater burden than just changing the compression settings in the 'newsyslog' configuration files.

I'd like people who disagree with this to present arguments for their case, if for nothing else than to share their experience and best practices on log management.

Thanks and regards.

-- 
Olivier Certner