From nobody Wed Nov 13 22:13:12 2024
X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
    by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4Xpcx50bJmz5cfSn;
    Wed, 13 Nov 2024 22:13:13 +0000 (UTC)
    (envelope-from git@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
    client-signature RSA-PSS (4096 bits) client-digest SHA256)
    (Client CN "mxrelay.nyi.freebsd.org", Issuer "R10" (verified OK))
    by mx1.freebsd.org (Postfix) with ESMTPS id 4Xpcx46fZHz4VgX;
    Wed, 13 Nov 2024 22:13:12 +0000 (UTC)
    (envelope-from git@FreeBSD.org)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim;
    t=1731535992;
    h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
    to:to:cc:mime-version:mime-version:content-type:content-type:
    content-transfer-encoding:content-transfer-encoding;
    bh=qVqEsuB0Q9GWXemzJpIGkzFMcYogp9gSsrFbhbFORwY=;
    b=UDrz5USv68qb9T6unpIZvZxbbuF3o2urtqsrFjlMzvAYK00qfisX+E0xOUNqeCpBUJBpRf
    TinybBXnoxsFkhw0cz61Wm7v1qO2V/y0+zi8TOWAhosX2qyFJwLEWpFg17lAjm4a4DKv9g
    iAHQ9A5IdPG7rJ8sOCg9NQtXnh66gz8kUepjHRrrhkbPQJMhVk83fJk34c9z8NbCLOlcdz
    Avj6uiE6ZepCAgxS+kythPTRFr3AtO4l811C8ZEp5ZcCDM+ZQkkNO2l6n+1VJFKb1fM+nr
    K6Q5/H7YuDzSjhEOtB3fWL0Yl1lHJsTrguJSO9j6kVYFCVnDEJbYz3dv0PP8tw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org;
    s=dkim; t=1731535992;
    h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
    to:to:cc:mime-version:mime-version:content-type:content-type:
    content-transfer-encoding:content-transfer-encoding;
    bh=qVqEsuB0Q9GWXemzJpIGkzFMcYogp9gSsrFbhbFORwY=;
    b=UqrlzDNYl/T6dKQS9F4BmOUFKc+fJaCeEYDFHbPUlI0cMULTltW1bePGg4My8LkcAkLfg1
    ds1rdAym/G9Vp9hOTDOc2AQOTg5TWEPVO5jVrmnCef40Z7Ir+1+urxFN0HBH2jVlEf/DGo
    YkQkvaKQW+AFrmZMErzqoNmbnjXj+l9eH0XdDgd9Do43fZBFVnxopdMO3gZN/fS6BFqfp4
    2yyeSBZctPgb4OSBWyl0UlpgH2CUxhLRb32v4ko1D/c9lKFL0hojhkVN5cAXugIeTPGnvm
    mprt5AjtlTl5oOm6Ce1wjzbXBW6vzBVjXE3kNcgnqKm+4qYTK8X5UYcrCuF+cg==
ARC-Authentication-Results: i=1; mx1.freebsd.org; none
ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1731535992; a=rsa-sha256; cv=none;
    b=rmVFG15O5+lJBbSMJ134xaBilaHAefFRQMfwhNEEmWaB51+9INoK55vF2nklpmkYBaos3s
    04n/D9jfXBT/WCTiMF6hweyCvQ4ZoY29HoC5IKr6aT6HMXCBZGt9EHMsAh8heTMlnrrthC
    Yp4WPx7o03yiu5uo/c5i6ktPP38Q69T2Cpn7Ferv9yAY6pa+BcQ3k37+V/pt3gYUE7e5Tz
    x0IoeLFQt2Gm9i7p+tYZD/pJ6AA8HB7a4ODL604plnef5k85qZ436vEVKr59k4XGVL2CKN
    A2EfAiG0rxS8neFKK89ocSNnlvU3mtE4gt9Ah6A5qG9ShuFT4q5f7oz/MwKk2g==
Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
    (Client did not present a certificate)
    by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4Xpcx46G9tz1BqV;
    Wed, 13 Nov 2024 22:13:12 +0000 (UTC)
    (envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org ([127.0.1.44])
    by gitrepo.freebsd.org (8.18.1/8.18.1) with ESMTP id 4ADMDCWf046996;
    Wed, 13 Nov 2024 22:13:12 GMT
    (envelope-from git@gitrepo.freebsd.org)
Received: (from git@localhost)
    by gitrepo.freebsd.org (8.18.1/8.18.1/Submit) id 4ADMDC6O046993;
    Wed, 13 Nov 2024 22:13:12 GMT
    (envelope-from git)
Date: Wed, 13 Nov 2024 22:13:12 GMT
Message-Id: <202411132213.4ADMDC6O046993@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
    dev-commits-src-main@FreeBSD.org
From: Kyle Evans
Subject: git: 160c36eae41a - main - localedata: add some exceptions to utf8proc widths
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
X-BeenThere: dev-commits-src-all@freebsd.org
Sender:
owner-dev-commits-src-all@FreeBSD.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: kevans
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 160c36eae41afa3c4944ed44778c2b48db8fbb77
Auto-Submitted: auto-generated

The branch main has been updated by kevans:

URL: https://cgit.FreeBSD.org/src/commit/?id=160c36eae41afa3c4944ed44778c2b48db8fbb77

commit 160c36eae41afa3c4944ed44778c2b48db8fbb77
Author:     Kyle Evans
AuthorDate: 2024-11-13 22:12:42 +0000
Commit:     Kyle Evans
CommitDate: 2024-11-13 22:12:42 +0000

    localedata: add some exceptions to utf8proc widths

    Hangul Jamo medial vowels and final consonants are reportedly combining
    characters that won't take up any columns on their own and should be
    reported as zero-width, so add an exception for these as well to reflect
    how they work in practice. This conforms to how other implementations
    (e.g., glibc) treat these characters.

    Reviewed by:    bapt (earlier version), jkim
    Sponsored by:   Klara, Inc.
    Differential Revision:  https://reviews.freebsd.org/D47472
---
 tools/tools/locale/tools/getwidths.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/tools/locale/tools/getwidths.c b/tools/tools/locale/tools/getwidths.c
index 2790b8031912..63c62791253f 100644
--- a/tools/tools/locale/tools/getwidths.c
+++ b/tools/tools/locale/tools/getwidths.c
@@ -28,6 +28,21 @@
 
 #include
 
+static int
+width_of(int32_t wc)
+{
+
+	/*
+	 * Hangul Jamo medial vowels and final consonants are more of
+	 * a combining character, and should be considered zero-width.
+	 */
+	if (wc >= 0x1160 && wc <= 0x11ff)
+		return (0);
+
+	/* No override by default, trust utf8proc's width. */
+	return (utf8proc_charwidth(wc));
+}
+
 int
 main(void)
 {
@@ -43,9 +58,10 @@ main(void)
 		wcc = utf8proc_category(wc);
 		if (wcc == UTF8PROC_CATEGORY_CC)
 			continue;
-		wcw = utf8proc_charwidth(wc);
+		wcw = width_of(wc);
 		if (wcw == 1)
 			continue;
+
 		printf("%04X %d\n", wc, wcw);
 	}