From: Mark Millard
Subject: Re: find(1): I18N gone wild ?
Date: Fri, 21 Apr 2023 10:41:45 -0700
To: Dimitry Andric, Current FreeBSD

Dimitry Andric wrote on
Date: Fri, 21 Apr 2023 10:38:05 UTC :

> On 21 Apr 2023, at 12:01, Ronald Klop wrote:
> > From: Poul-Henning Kamp
> > Date: Monday, 17 April 2023 23:06
> > To: current@freebsd.org
> > Subject: find(1): I18N gone wild ?
> > This surprised me:
> >
> > # mkdir /tmp/P
> > # cd /tmp/P
> > # touch FOO
> > # touch bar
> > # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
> > ./FOO
> > # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
> > ./FOO
> > ./bar
> >
> > Really ?!
> ...
> > My Mac and a Linux server only give ./FOO in both cases. Just a 2 cents remark.
>
> Same here. However, I have read that with unicode, you should *never*
> use [A-Z] or [0-9], but character classes instead. That seems to give
> both files on macOS and Linux with [[:alpha:]]:
>
> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
> ./BAR
> ./foo
>
> and only the lowercase file with [[:lower:]]:
>
> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
> ./foo
>
> But on FreeBSD, these don't work at all:
>
> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
>
> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
>
> This is an interesting rabbit hole... :)

FreeBSD:

     -name pattern
             True if the last component of the pathname being examined
             matches pattern.  Special shell pattern matching characters
             ("[", "]", "*", and "?") may be used as part of pattern.  These
             characters may be matched explicitly by escaping them with a
             backslash ("\").

I conclude that [[:alpha:]] and [[:lower:]] were not considered among
the "Special shell pattern matching characters". "man glob" indicates
that glob is a shell-specific builtin. macOS says similarly.

Different shells, different pattern notations and capabilities? Well,
"man bash" reports:

QUOTE
   Pattern Matching
   . . .
          Within [ and ], character classes can be specified using the
          syntax [:class:], where class is one of the following classes
          defined in the POSIX standard:
                alnum alpha ascii blank cntrl digit graph lower print
                punct space upper word xdigit
          A character class matches any character belonging to that
          class.  The word character class matches letters, digits, and
          the character _.

          Within [ and ], an equivalence class can be specified using the
          syntax [=c=], which matches all characters with the same
          collation weight (as defined by the current locale) as the
          character c.

          Within [ and ], the syntax [.symbol.] matches the collating
          symbol symbol.
END QUOTE

"man zsh" does not document patterns, but:

sh-3.2$ echo $SHELL
/bin/zsh
sh-3.2$ find . -name '[[:lower:]]*' -print
./bar

% ls -Tldt /bin/*sh
-r-xr-xr-x  1 root  wheel  1326688 Feb  9 01:39:53 2023 /bin/bash
-rwxr-xr-x  2 root  wheel  1153216 Feb  9 01:39:53 2023 /bin/csh
-rwxr-xr-x  1 root  wheel   307232 Feb  9 01:39:53 2023 /bin/dash
-r-xr-xr-x  1 root  wheel  2598864 Feb  9 01:39:53 2023 /bin/ksh
-rwxr-xr-x  1 root  wheel   134000 Feb  9 01:39:53 2023 /bin/sh
-rwxr-xr-x  2 root  wheel  1153216 Feb  9 01:39:53 2023 /bin/tcsh
-rwxr-xr-x  1 root  wheel  1377616 Feb  9 01:39:53 2023 /bin/zsh

But in each of them, even bash:

% echo $SHELL
/bin/zsh

Since "find" is not part of the kernel, it may vary across Linux-based
operating systems. Picking one . . .

openSUSE Tumbleweed:

       -name pattern
              Base of file name (the path with the leading directories
              removed) matches shell pattern pattern.  Because the leading
              directories are removed, the file names considered for a
              match with -name will never include a slash, so `-name a/b'
              will never match anything (you probably need to use -path
              instead).  A warning is issued if you try to do this, unless
              the environment variable POSIXLY_CORRECT is set.  The
              metacharacters (`*', `?', and `[]') match a `.' at the start
              of the base name (this is a change in findutils-4.2.2; see
              section STANDARDS CONFORMANCE below).  To ignore a directory
              and the files under it, use -prune rather than checking every
              file in the tree; see an example in the description of that
              action.  Braces are not recognised as being special, despite
              the fact that some shells including Bash imbue braces with a
              special meaning in shell patterns.  The filename matching is
              performed with the use of the fnmatch(3) library function.
              Don't forget to enclose the pattern in quotes in order to
              protect it from expansion by the shell.

"man 3 fnmatch" says:

       The fnmatch() function checks whether the string argument matches
       the pattern argument, which is a shell wildcard pattern (see
       glob(7)).
"man 7 glob" (not shell-specific) in turn has a section on "Character
classes and internationalization" that reports:

QUOTE
   . . .
       . . . Therefore, POSIX extended the bracket notation greatly, both
       for wildcard patterns and for regular expressions.  In the above we
       saw three types of items that can occur in a bracket expression:
       namely (i) the negation, (ii) explicit single characters, and
       (iii) ranges.  POSIX specifies ranges in an internationally more
       useful way and adds three more types:

       (iii) Ranges X-Y comprise all characters that fall between X and Y
       (inclusive) in the current collating sequence as defined by the
       LC_COLLATE category in the current locale.

       (iv) Named character classes, like

       [:alnum:]  [:alpha:]  [:blank:]  [:cntrl:]
       [:digit:]  [:graph:]  [:lower:]  [:print:]
       [:punct:]  [:space:]  [:upper:]  [:xdigit:]

       so that one can say "[[:lower:]]" instead of "[a-z]", and have
       things work in Denmark, too, where there are three letters past 'z'
       in the alphabet.  These character classes are defined by the
       LC_CTYPE category in the current locale.

       (v) Collating symbols, like "[.ch.]" or "[.a-acute.]", where the
       string between "[." and ".]" is a collating element defined for the
       current locale.  Note that this may be a multicharacter element.

       (vi) Equivalence class expressions, like "[=a=]", where the string
       between "[=" and "=]" is any collating element from its equivalence
       class, as defined for the current locale.  For example, "[[=a=]]"
       might be equivalent to "[aáàäâ]", that is, to
       "[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]".
END QUOTE

# file /usr/bin/sh
/usr/bin/sh: symbolic link to bash

Seems like: pick your shell (as shown by echo $SHELL) and that picks
the pattern match rules used. (May be controllable in the specific
shell.)

===
Mark Millard
marklmi at yahoo.com