Message-ID: <86efedcf-e3ed-be0c-79ab-03f0d4a743af@aetern.org>
Date: Fri, 21 Apr 2023 21:36:05 +0200
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Sender: owner-freebsd-current@freebsd.org
Subject: Re: find(1): I18N gone wild?
 [[:alpha:]] not a substitute to refer 26 English letters A-Z
To: freebsd-current@freebsd.org
From: Yuri
Content-Type: text/plain; charset=UTF-8

parv/FreeBSD wrote:
> Wrote Dimitry Andric on Fri, 21 Apr 2023 10:38:05 UTC
> (via
> https://lists.freebsd.org/archives/freebsd-current/2023-April/003556.html )
>>
>> ... However, I have read that with unicode, you should *never*
>> use [A-Z] or [0-9], but character classes instead. That seems to give
>> both files on macOS and Linux with [[:alpha:]]:
> ...
>
> Subject to the locale, problem with that is "[[:alpha:]]" will match
> more than 26 English letters "A" through "Z" (besides also matching
> lower case "a" through "z") even if none of 26 * 2 English alphabets
> appear in a string.

(replying to a random recent message)

There is also some fairly recent history for fnmatch() related to
[a-z]; the same was done for regex, with the same outcome -- the attempt
to make [a-z] (and, I guess, [A-Z] as well) ranges non-collating failed.

I don't know which failures were encountered; hopefully someone
remembers:

--------
commit 5a5807dd4ca34467ac5fb458bc19f12bf62075a5
Author: Andrey A. Chernov
Date:   Sun Jul 10 03:49:38 2016 +0000

    Remove broken support for collation in [a-z] type ranges.

    Only first 256 wide chars are considered currently, all other are
    just dropped from the range. Proper implementation require reverse
    tables database lookup, since objects are really big as max UTF-8
    (1114112 code points), so just the same scanning as it was for 256
    chars will slow things down.

    POSIX does not require collation for [a-z] type ranges and does not
    prohibit it for non-POSIX locales. POSIX require collation for
    ranges only for POSIX (or C) locale which is equal to ASCII and
    binary for other chars, so we already have it.

    No other *BSD implements collation for [a-z] type ranges.

    Restore ABI compatibility with unused now __collate_range_cmp()
    which is visible from outside (will be removed later).
--------
commit 1daad8f5ad767dfe7896b8d1959a329785c9a76b
Author: Andrey A. Chernov
Date:   Thu Jul 14 08:18:12 2016 +0000

    Back out non-collating [a-z] ranges.

    Instead of changing whole course to another POSIX-permitted way
    for consistency and uniformity I decide to completely ignore missing
    regex functionality and concentrate on fixing bugs in what we have
    now, too many small obstacles instead, counting ports.
--------
commit 12eae8c8f346cb459a388259ca98faebdac47038
Author: Andrey A. Chernov
Date:   Thu Jul 14 09:07:25 2016 +0000

    1) Eliminate possibility to call __*collate_range_cmp() with
    incomplete locale (which cause core dump) by removing whole 'table'
    argument by which it passed.

    2) Restore __collate_range_cmp() in __sccl().

    3) Collating [a-z] range in regcomp() only for single bytes locales
    (we can't do it now for other ones). In previous state only first
    256 wchars are considered and all others are just silently dropped
    from the range.
--------