From nobody Fri Mar 25 14:27:12 2022
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
Sender: owner-freebsd-hackers@freebsd.org
MIME-Version: 1.0
References: <70B211BB-15BA-47A4-8F9C-C833AA8C1EAA@freebsd.org> <202203241519.22OFJ3Mk098649@gndrsh.dnsmgr.net> <71356.1648139436@kaos.jnpr.net> <7773a0c73c77649efaf9f748ee8bb0b4@gundo.com>
In-Reply-To: <7773a0c73c77649efaf9f748ee8bb0b4@gundo.com>
From: Warner Losh
Date: Fri, 25 Mar 2022 08:27:12 -0600
Subject: Re: What's the locale for system files (e.g. /etc/fstab)?
To: Pau Amma
Cc: FreeBSD Hackers
Content-Type: multipart/alternative; boundary="000000000000dfc90105db0bc2bc"

--000000000000dfc90105db0bc2bc
Content-Type: text/plain; charset="UTF-8"

On Fri, Mar 25, 2022, 5:10 AM Pau Amma wrote:

> (pruned cc: to just the list)
>
> On 2022-03-25 04:08, Warner Losh wrote:
> > On Thu, Mar 24, 2022 at 2:51 PM Phil Shafer wrote:
> >
> >> On 24 Mar 2022, at 15:12, Warner Losh wrote:
> >> > That is the primary reason for system files always being C.UTF-8...
> >> > There is no way to tag it as anything else... and some of these files
> >> > are often parsed from a context that can't set the locale, like the
> >> > boot loader or the kernel... also, these files have a format that was
> >> > defined back in the 7bit ascii time frame. They also don't make use of
> >> > the text in a way that isn't literal...
> >>
> >> Exactly. There's just no way to know in the current setup. And
> >> declaring it UTF-8 will break anyone currently using locale-based
> >> values. Using the symlink has the value of allowing a simple fix
> >> ("sudo ln -s $LANG /etc/locale").
> >
> > Except it's not a simple fix. Sure, you can find this value, but
> > nothing will use it, necessarily. Since there's little value and
> > little need, I think it would be more hassle than it's worth absent
> > a much more extensive audit. For system-wide things like config
> > files, we assume C.UTF-8 or the lesser ASCII-7 (or maybe ASCII-8).
>
> There's no ASCII-8. (If you meant 8859-*, there's 15 or 16, which
> essentially means "no".) Assuming ASCII (and therefore 7-bit) went out
> of style last millennium. Anything that expects or enforces something
> other than Unicode (which for all practical purposes means UTF-8) needs
> to be fixed urgently.

ASCII-8 here is just sloppy shorthand for "no multibyte character
support". All the parsing routines just look for certain fixed
single-byte separators between sequences of bytes. This will likely
never change, but if it does, a lot of work will be needed to prove
correctness, and everything that reads these files would need to change.

UTF-8 works because it mostly avoids encodings that would get in the way
of this naive code: the bytes of a multibyte sequence can't have 7-bit
ASCII values, and all the special characters are 7-bit ASCII.
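
To make that concrete, here is a minimal Python sketch of the idea. It is not the actual parsing code (that lives in C in the loader, kernel, and libc); split_fields, SEPARATORS, has_ascii_byte_in_multibyte, and the sample fstab-style line are all made up for illustration. The splitter only ever compares single bytes against ASCII separators, yet a UTF-8 mount point comes through intact, because every byte of a multibyte UTF-8 sequence is 0x80 or above.

```python
# Sketch only: a byte-oriented field splitter that knows nothing about UTF-8.
# SEPARATORS and split_fields are hypothetical names, not FreeBSD code.
SEPARATORS = b" \t"


def split_fields(line: bytes) -> list:
    """Split a line on ASCII space/tab bytes, skipping empty fields."""
    fields = []
    current = bytearray()
    for byte in line:
        if byte in SEPARATORS:
            # A separator byte ends the current field, if there is one.
            if current:
                fields.append(bytes(current))
                current.clear()
        else:
            current.append(byte)
    if current:
        fields.append(bytes(current))
    return fields


def has_ascii_byte_in_multibyte(text: str) -> bool:
    """True if any non-ASCII character encodes to a byte below 0x80."""
    return any(
        b < 0x80
        for ch in text
        if ord(ch) > 0x7F
        for b in ch.encode("utf-8")
    )


# A made-up fstab-style line with a non-ASCII mount point.
line = "/dev/ada0p2\t/mnt/données\tufs\trw\t2\t2".encode("utf-8")
fields = split_fields(line)
print([f.decode("utf-8") for f in fields])
print(has_ascii_byte_in_multibyte("données 日本語 €"))
```

The splitter never decodes anything; decoding happens only at the end for display. That is why this kind of code can stay locale-unaware as long as the separators themselves are 7-bit ASCII.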

Warner

--
> #BlackLivesMatter #TransWomenAreWomen #AccessibilityMatters
> #StandWithUkrainians
> English: he/him/his (singular they/them/their/theirs OK)
> French: il/le/lui (iel/iel and ielle/ielle OK)
> Tagalog: siya/niya/kaniya (please avoid sila/nila/kanila)
>
>

--000000000000dfc90105db0bc2bc
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


--000000000000dfc90105db0bc2bc--