From: Xin LI
Date: Tue, 9 Jan 2024 11:29:25 -0800
Subject: Re: git: 2f036705f337 - main - Document the two recent newsyslog(8) change (-c option and configuration option).
In-Reply-To: <2683023.poxlI1A5LX@ravel>
To: Olivier Certner
Cc: Xin LI, Mike Karels, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org

Hi, Olivier,

On Tue, Jan 9, 2024 at 2:19 AM Olivier Certner <olce@freebsd.org> wrote:
> [...]
> > Sorry not to have noticed this in the review; it was only when I saw this
> > message that it sunk in that we now have *three* ways to specify compression,
> > and I'm not even sure what the precedence is. I would have thought that
> > <compress> would replace -c. It's a mess if the config file has entries
> > that specify J and X flags as well as none, the config file has
> > <compress> zstd, and the -c option is given as well. We now have a knob
> > to override the knob to override a knob. The only reason to keep -c that
> > I can think of is to specify a different compression in a single invocation,
> > but as noted, changing compression requires manual operations that make
> > it unreasonable to change it invocation by invocation.

> I agree. Two possibilities that I can think of from here: Remove '-c' or make it enable compression regardless of the log files' individual settings.

I am open to removing '-c'.

Could you please clarify what you mean by "make it enable compression"? Did you mean that we mark all log files as compressible? (That's probably not a good idea, as some "log" files may be binary and not really compressible.)

> > I still think it would be much better to add an option letter to select
> > the default compression as specified by <compress>. This would eliminate
> > the need for "legacy", and it would add the ability to have both a global
> > default and an exception. I think the redefinition of the existing flags
> > to have different meanings if <compress> is given is messy.

> I didn't think about that at first. I agree.

> If people want to be able to override compression settings globally, which I find useful, one could introduce another directive such as <compress_override> taking a boolean to request to apply the <compress> option regardless of the individual compression letters.

> Another possibility is just to rename "<compress>" to "<compress_override>" (so, this time, not a boolean) and keep its current behavior. This would match one of the suggestions above about '-c', but then there's the question of which one takes precedence, and I think that the command-line specification should prevail (for practical purposes and POLA).

> > The entry for -c says that we plan to change the default to "none" in 15.0.
> > Hopefully that would be done via <compress> and not -c. However, there
> > was significant pushback on "none" being the default.

> I think the default should be "no <compress_override>", i.e., no directive. This may plead for having "none" mean "don't change anything" (as if the directive wasn't there) and have something else to deactivate compression, such as "no_compression" (which is really an override). If "none" is confusing, then just forgo it completely, and have 'newsyslog' plain fail on it (but keep "no_compression" as just described).

> If there is consensus, I'd then change the 'J' flag currently used for all log files to the new chosen flag for generic compression, and have <compress_override> set to "bzip2" in a first step (for POLA). Then, it could be changed to something else, e.g., 'zstd'.

> Setting it to 'none' seems to me the worst solution (but far from being the end of the world).
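To make the semantics suggested above concrete, here is a minimal C sketch of the renamed <compress_override> directive taking a method name: "none" behaves as if the directive were absent, "no_compression" turns compression off, and a method name replaces whatever method an entry's own flag selects. All names (enum comp, struct override, parse_override(), effective_method()) are hypothetical and not taken from newsyslog(8). The sketch also assumes, which the proposal leaves open, that the override never enables compression for entries that did not request it. It does not model how -c would interact.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not newsyslog's actual code. */
enum comp { COMP_NONE, COMP_GZIP, COMP_BZIP2, COMP_XZ, COMP_ZSTD };

/* Parsed value of a hypothetical <compress_override> directive. */
struct override {
	bool active;      /* false: per-entry settings apply unchanged */
	enum comp method; /* used only when active is true */
};

/*
 * "none" acts as if the directive were absent; "no_compression" and the
 * method names force a method. Unknown values return false so that the
 * caller can reject the configuration.
 */
static bool
parse_override(const char *value, struct override *out)
{
	static const struct {
		const char *name;
		enum comp method;
	} forced[] = {
		{ "no_compression", COMP_NONE },
		{ "gzip", COMP_GZIP },
		{ "bzip2", COMP_BZIP2 },
		{ "xz", COMP_XZ },
		{ "zstd", COMP_ZSTD },
	};

	if (strcmp(value, "none") == 0) {
		out->active = false;
		out->method = COMP_NONE;
		return true;
	}
	for (size_t i = 0; i < sizeof(forced) / sizeof(forced[0]); i++) {
		if (strcmp(value, forced[i].name) == 0) {
			out->active = true;
			out->method = forced[i].method;
			return true;
		}
	}
	return false;
}

/*
 * Effective method for one entry, given the method its own flag selects
 * (COMP_NONE if it requests no compression). Assumption: the override
 * never turns compression on for entries that did not ask for it.
 */
static enum comp
effective_method(const struct override *ov, enum comp entry_method)
{
	if (entry_method == COMP_NONE)
		return COMP_NONE;
	return ov->active ? ov->method : entry_method;
}
```

Because "none" is a no-op here, a configuration that says "none" and one that omits the directive behave identically. Disabling compression requires the explicit "no_compression".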

Changing the meaning of all four legacy compression type letters to "file is compressible" is part of the intention. The goal is to discourage using them as a way to specify a compression type, in favor of using the administrator-configured value.
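A minimal C sketch of that intent follows. It assumes the usual legacy letter meanings (Z for gzip, J for bzip2, X for xz, Y for zstd) and treats "legacy" as "keep the per-letter behavior". The names (enum comp, letter_method(), choose_method()) are hypothetical, and this is not the actual newsyslog(8) implementation. It also ignores the -c option.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not newsyslog's actual code. */
enum comp { COMP_NONE, COMP_GZIP, COMP_BZIP2, COMP_XZ, COMP_ZSTD, COMP_LEGACY };

/* Method selected by the first legacy compression letter in an entry's flags. */
static enum comp
letter_method(const char *flags)
{
	for (; *flags != '\0'; flags++) {
		switch (*flags) {
		case 'Z': return COMP_GZIP;
		case 'J': return COMP_BZIP2;
		case 'X': return COMP_XZ;
		case 'Y': return COMP_ZSTD;
		default:  break;  /* other flags (e.g. C) are irrelevant here */
		}
	}
	return COMP_NONE;
}

/*
 * Any of the four letters only marks the file as compressible; the
 * administrator-configured method decides how it is compressed.
 * COMP_LEGACY keeps the old per-letter meaning.
 */
static enum comp
choose_method(const char *flags, enum comp configured)
{
	enum comp legacy = letter_method(flags);
	bool compressible = (legacy != COMP_NONE);

	if (!compressible)
		return COMP_NONE;   /* e.g. binary "log" files stay as-is */
	if (configured == COMP_LEGACY)
		return legacy;
	return configured;      /* may itself be COMP_NONE */
}
```

Entries carrying none of the four letters are never compressed, so binary "log" files stay untouched whatever method the administrator configures.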

That said, 'none' is a reasonable default in many ways, as explained before. It makes grep'ing easier. Compression is not really that helpful in the modern world because hard drives are much larger than in the '90s. It reduces the number of times data gets rewritten to SSDs, and it avoids hourly CPU load bursts on busy systems.

'bzip2' could be a good second-best default (because with today's defaults that is how most configurations compress their log files), but if the administrator has already configured their systems to use a different method, this would break their configuration anyway.

> More deeply, I remember having seen at least two claims that using filesystem's compression is better, without arguments. I don't agree with that in practice. The only advantage of in-filesystem compression, besides the administrative simplification that you can also get with the override above, is to get O(1) random access to big log files, and I don't see any compelling and common use case for it. You certainly want to get to the end of the current log quickly, but that one precisely is not handled by 'newsyslog' and stays uncompressed (at the application level). When you want to search for strings or patterns, you have to grep the whole file anyway. You may want to immediately reach the end of some historical log file, e.g., when manually going back in time from the current log, but this should have negligible latency, and if it doesn't, then just use more and smaller log archives. Same thing if you have a more sophisticated setup with an index of log text: Jumping to a particular location in the log file should have negligible latency, else apply the same recipe. If your setup with index requires a single, never rotated, log file, then you're not even using 'newsyslog' in the first place (or should not). Although I agree that in this case using a compressed filesystem (or a randomly accessible archive) can make sense (if your index doesn't already cover the results expected from your searches), I very much doubt this is a common setup.

There are other benefits to not compressing rotated logs. On busy systems, the hourly newsyslog run would have to compress larger logs, causing bursts of CPU load.

And when logs are compressed, the data is read back and the compressed data is rewritten to disk / SSDs. This causes additional wear on flash storage, and all of it comes with no significant benefit on modern hardware.

(I don't think it's common to have log files indexed after rotation; a more common use case would be to use [u]grep to look for a certain pattern.)

> Moreover, using in-filesystem compression can lead to degrading the compression ratio, since the compression method on ZFS is chosen per dataset, which includes a bunch of other files and use cases preventing the administrator from choosing the best, and slowest, compression methods. To avoid this problem, one can use a separate dataset for /var/log (anyone?), but changing this on already running systems is a greater burden than just changing the compression settings in the 'newsyslog' configuration files.

Yes, and that's not a big concern. Achieving the maximum compression ratio is probably never the goal in most scenarios where compression is used (not limited to logs), and one always has to balance cost against benefit.

If someone is distributing a release image to many thousands of users over the Internet, it makes a lot of sense to use the best compression for a 5% reduction in size. That saving adds up in bandwidth cost and improves the experience for users. It doesn't make as much sense to save, say, a few MB of disk space at the expense of spending a few more minutes every hour and adding "bursts" of slower response time on a server, which is usually undesirable in production.

Cheers,