From nobody Fri Apr 21 18:18:21 2023
Date: Fri, 21 Apr 2023 20:18:21 +0200
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Sender: owner-freebsd-current@freebsd.org
Subject: Re: find(1): I18N gone wild ?
From: Yuri
To: Current FreeBSD
References: <3e473603-f384-f176-e7cb-03409e16ec9c@aetern.org>
In-Reply-To: <3e473603-f384-f176-e7cb-03409e16ec9c@aetern.org>

Yuri wrote:
> Mark Millard wrote:
>> Dimitry Andric wrote on
>> Date: Fri, 21 Apr 2023 10:38:05 UTC :
>>
>>> On 21 Apr 2023, at 12:01, Ronald Klop wrote:
>>>> From: Poul-Henning Kamp
>>>> Date: Monday, 17 April 2023 23:06
>>>> To: current@freebsd.org
>>>> Subject: find(1): I18N gone wild ?
>>>> This surprised me:
>>>>
>>>> # mkdir /tmp/P
>>>> # cd /tmp/P
>>>> # touch FOO
>>>> # touch bar
>>>> # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
>>>> ./FOO
>>>> # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
>>>> ./FOO
>>>> ./bar
>>>>
>>>> Really ?!
>>> ...
>>>> My Mac and a Linux server only give ./FOO in both cases. Just a 2 cents remark.
>>>
>>> Same here. However, I have read that with unicode, you should *never*
>>> use [A-Z] or [0-9], but character classes instead. That seems to give
>>> both files on macOS and Linux with [[:alpha:]]:
>>>
>>> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
>>> ./BAR
>>> ./foo
>>>
>>> and only the lowercase file with [[:lower:]]:
>>>
>>> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
>>> ./foo
>>>
>>> But on FreeBSD, these don't work at all:
>>>
>>> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
>>>
>>>
>>> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
>>>
>>>
>>> This is an interesting rabbit hole... :)
>>
>> FreeBSD:
>>
>> -name pattern
>> True if the last component of the pathname being examined matches
>> pattern. Special shell pattern matching characters (“[”, “]”,
>> “*”, and “?”) may be used as part of pattern. These characters
>> may be matched explicitly by escaping them with a backslash
>> (“\”).
>>
>> I conclude that [[:alpha:]] and [[:lower:]] were not
>> considered "Special shell pattern"s. "man glob"
>> indicates it is a shell specific builtin.
>>
>> macOS says similarly. Different shells, different
>> pattern notations and capabilities? Well, "man bash"
>> reports:
> [snip]
>> Seems like: pick your shell (as shown by echo $SHELL) and
>> that picks the pattern match rules used. (May be controllable
>> in the specific shell.)
>
> No, the pattern is not passed to shell and shell used should not matter
> (pattern should be properly escaped). The rules are here:
>
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13
>
> ...which in turn refers to the following link for bracket expressions:
>
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
>
> Why we don't support all of that is different story.

A bit more on this: the first link applies to both find(1) and
fnmatch(3), and find uses fnmatch() internally (which is good). However,
the function that processes bracket expressions is called rangematch(),
and matching ranges is really all it does; it ignores the other bracket
expression rules, such as the character classes tried above:

https://cgit.freebsd.org/src/tree/lib/libc/gen/fnmatch.c#n234

So to "fix" find, we just need to implement bracket expressions properly
in fnmatch().
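
To make "properly" a bit more concrete, here is a rough Python sketch of
matching one character against a bracket expression that understands
character classes as well as ranges. It is only an illustration, not the
FreeBSD rangematch() code and not a proposed patch. The names
bracket_match() and CLASSES are made up for this example, Python's
Unicode-aware str methods stand in for the locale-aware ctype functions a
libc would use, and ranges are compared by code point (roughly what the C
locale does). Equivalence classes and collating symbols are left out.

```python
# Hypothetical sketch, not FreeBSD's fnmatch.c. All names are illustrative.

# POSIX character class names mapped to Python predicates.
# Assumption: str methods approximate the locale's ctype classification.
CLASSES = {
    "alpha": str.isalpha,
    "alnum": str.isalnum,
    "digit": lambda c: c in "0123456789",
    "upper": str.isupper,
    "lower": str.islower,
    "space": str.isspace,
}


def bracket_match(expr: str, ch: str) -> bool:
    """Return True if the single character ch matches the bracket
    expression expr, written without its outer brackets
    (for example "A-Z" or "[:alpha:]_")."""
    negate = expr[:1] in ("!", "^")
    if negate:
        expr = expr[1:]
    matched = False
    i = 0
    while i < len(expr):
        # Character class such as [:alpha:].
        if expr.startswith("[:", i):
            end = expr.find(":]", i + 2)
            if end != -1:
                name = expr[i + 2:end]
                test = CLASSES.get(name)
                if test is None:
                    raise ValueError(f"unknown character class: {name}")
                if test(ch):
                    matched = True
                i = end + 2
                continue
        # Range lo-hi, compared by code point (an assumption; a real
        # implementation has to decide how the locale affects this).
        if i + 2 < len(expr) and expr[i + 1] == "-":
            if expr[i] <= ch <= expr[i + 2]:
                matched = True
            i += 3
            continue
        # Plain literal character.
        if expr[i] == ch:
            matched = True
        i += 1
    return matched != negate


if __name__ == "__main__":
    # Check the first character of the two file names from the thread.
    for name in ("FOO", "bar"):
        print(name, bracket_match("[:upper:]", name[0]))
```

With code-point ranges, "A-Z" accepts only the first character of FOO,
which is the result the original poster expected; how ranges should behave
under en_US.UTF-8 collation is a separate question from the missing
character classes.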