From: Mark Millard
Subject: Re: find(1): I18N gone wild ?
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Date: Fri, 21 Apr 2023 12:51:55 -0700
To: Yuri, Current FreeBSD

Yuri wrote on
Date: Fri, 21 Apr 2023 18:18:21 UTC :

> Yuri wrote:
> > Mark Millard wrote:
> >> Dimitry Andric wrote on
> >> Date: Fri, 21 Apr 2023 10:38:05 UTC :
> >>
> >>> On 21 Apr 2023, at 12:01, Ronald Klop wrote:
> >>>> From: Poul-Henning Kamp
> >>>> Date: Monday, 17 April 2023 23:06
> >>>> To: current@freebsd.org
> >>>> Subject: find(1): I18N gone wild ?
> >>>> This surprised me:
> >>>>
> >>>> # mkdir /tmp/P
> >>>> # cd /tmp/P
> >>>> # touch FOO
> >>>> # touch bar
> >>>> # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
> >>>> ./FOO
> >>>> # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
> >>>> ./FOO
> >>>> ./bar
> >>>>
> >>>> Really ?!
> >>> ...
> >>>> My Mac and a Linux server only give ./FOO in both cases. Just a 2
> >>>> cents remark.
> >>>
> >>> Same here.
> >>> However, I have read that with unicode, you should *never*
> >>> use [A-Z] or [0-9], but character classes instead. That seems to give
> >>> both files on macOS and Linux with [[:alpha:]]:
> >>>
> >>> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
> >>> ./BAR
> >>> ./foo
> >>>
> >>> and only the lowercase file with [[:lower:]]:
> >>>
> >>> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
> >>> ./foo
> >>>
> >>> But on FreeBSD, these don't work at all:
> >>>
> >>> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
> >>>
> >>>
> >>> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
> >>>
> >>>
> >>> This is an interesting rabbit hole... :)
> >>
> >> FreeBSD:
> >>
> >> -name pattern
> >>     True if the last component of the pathname being examined matches
> >>     pattern. Special shell pattern matching characters (“[”, “]”,
> >>     “*”, and “?”) may be used as part of pattern. These characters
> >>     may be matched explicitly by escaping them with a backslash
> >>     (“\”).
> >>
> >> I conclude that [[:alpha:]] and [[:lower:]] were not
> >> considered "Special shell pattern"s. "man glob"
> >> indicates it is a shell specific builtin.
> >>
> >> macOS says similarly. Different shells, different
> >> pattern notations and capabilities? Well, "man bash"
> >> reports:
> > [snip]
> >> Seems like: pick your shell (as shown by echo $SHELL) and
> >> that picks the pattern match rules used. (May be controllable
> >> in the specific shell.)
> >
> > No, the pattern is not passed to shell and shell used should not matter
> > (pattern should be properly escaped).
> > The rules are here:
> >
> > https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13
> >
> > ...which in turn refers to the following link for bracket expressions:
> >
> > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
> >
> > Why we don't support all of that is different story.
>
> A bit more on this; first link applies both to find(1) and fnmatch(3),
> and find uses fnmatch() internally (which is good), but even the
> function that processes bracket expressions is called rangematch() and
> that's really all it does ignoring other bracket expression rules:
>
> https://cgit.freebsd.org/src/tree/lib/libc/gen/fnmatch.c#n234
>
> So to "fix" find we just need to implement the bracket expressions
> properly in fnmatch().

Too bad the -name documentation does not track this but
points to shell notation instead. The following confirms that
even IEEE Std 1003.1-2001, which FreeBSD's find is documented
to be based on, already specified the notations that you
reference.

FreeBSD's man page reports:

STANDARDS
     The find utility syntax is a superset of the syntax specified by the
     IEEE Std 1003.1-2001 (“POSIX.1”) standard.

     All the single character options except -H and -L as well as -amin,
     -anewer, -cmin, -cnewer, -delete, -empty, -fstype, -iname, -inum,
     -iregex, -ls, -maxdepth, -mindepth, -mmin, -not, -path, -print0, -regex,
     -sparse and all of the -B* birthtime related primaries are extensions to
     IEEE Std 1003.1-2001 (“POSIX.1”).
. . .

IEEE Std 1003.1-2001 find looks to be at:

https://pubs.opengroup.org/onlinepubs/009604499/utilities/find.html

-name pattern
    The primary shall evaluate as true if the basename of the filename
    being examined matches pattern using the pattern matching notation
    described in Pattern Matching Notation.
https://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html#tag_02_13

[
    The open bracket shall introduce a pattern bracket expression. The
    description of basic regular expression bracket expressions in the
    Base Definitions volume of IEEE Std 1003.1-2001, Section 9.3.5, RE
    Bracket Expression shall also apply to the pattern bracket expression,

https://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap09.html#tag_09_03_05

• A character class expression shall represent the union of two sets:
    • The set of single-character collating elements whose characters
      belong to the character class, as defined in the LC_CTYPE category
      in the current locale.
    • An unspecified set of multi-character collating elements.

All character classes specified in the current locale shall be
recognized. A character class expression is expressed as a character
class name enclosed within bracket-colon ( "[:" and ":]" ) delimiters.

The following character class expressions shall be supported in all
locales:

[:alnum:]   [:cntrl:]   [:lower:]   [:space:]
[:alpha:]   [:digit:]   [:print:]   [:upper:]
[:blank:]   [:graph:]   [:punct:]   [:xdigit:]

In addition, character class expressions of the form:

[:name:]

are recognized in those locales where the name keyword has been given a
charclass definition in the LC_CTYPE category.

===
Mark Millard
marklmi at yahoo.com
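
Since the thread traces the find(1) behaviour to fnmatch(3), one way to
check a given system's libc apart from find itself is to call fnmatch()
directly with the patterns quoted above. The following is only a minimal
sketch under some assumptions: it assumes a POSIX system where
ctypes.CDLL(None) exposes the libc symbols, and the names libc_fnmatch,
probe, PATTERNS and NAMES are hypothetical helpers made up for this
illustration, not anything from the FreeBSD tree. What it prints depends
entirely on the libc and on which locales are installed.

```python
import ctypes
import locale

# Handle to the running process; on POSIX systems this exposes libc symbols.
_libc = ctypes.CDLL(None)
# int fnmatch(const char *pattern, const char *string, int flags);
_libc.fnmatch.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
_libc.fnmatch.restype = ctypes.c_int

# Patterns and file names taken from the examples quoted in the thread.
PATTERNS = ["[A-Z]*", "[[:alpha:]]*", "[[:lower:]]*"]
NAMES = ["FOO", "bar"]


def libc_fnmatch(pattern, name):
    """Return True if libc's fnmatch(3) reports a match (return value 0)."""
    return _libc.fnmatch(pattern.encode(), name.encode(), 0) == 0


def probe(locale_name):
    """Map each pattern to the names it matches under locale_name."""
    # Raises locale.Error if the locale is not available on this system.
    locale.setlocale(locale.LC_ALL, locale_name)
    return {p: [n for n in NAMES if libc_fnmatch(p, n)] for p in PATTERNS}


if __name__ == "__main__":
    for loc in ("C", "C.UTF-8", "en_US.UTF-8"):
        try:
            print(loc, probe(loc))
        except locale.Error:
            print(loc, "(locale not installed)")
```

If the character class patterns come back with empty lists here while
'[A-Z]*' still matches, that would be consistent with the libc bracket
handling being the limiting factor rather than find(1) or the shell.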