From nobody Fri Nov 22 04:53:54 2024
Date: Fri, 22 Nov 2024 04:53:54 GMT
Message-Id: <202411220453.4AM4rssf035049@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: Kyle Evans
Subject: git: f4f4fa8d04df - stable/13 - localedata: add some exceptions to utf8proc widths
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
X-BeenThere: dev-commits-src-all@freebsd.org
Sender: owner-dev-commits-src-all@FreeBSD.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: kevans
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/13
X-Git-Reftype: branch
X-Git-Commit: f4f4fa8d04dfb1e27c4b3a82c1b032545e74e2e4
Auto-Submitted: auto-generated

The branch stable/13 has been updated by kevans:

URL: https://cgit.FreeBSD.org/src/commit/?id=f4f4fa8d04dfb1e27c4b3a82c1b032545e74e2e4

commit f4f4fa8d04dfb1e27c4b3a82c1b032545e74e2e4
Author:     Kyle Evans
AuthorDate: 2024-11-13 22:12:42 +0000
Commit:     Kyle Evans
CommitDate: 2024-11-22 04:53:43 +0000

    localedata: add some exceptions to utf8proc widths

    Hangul Jamo medial vowels and final consonants are reportedly combining
    characters that won't take up any columns on their own and should be
    reported as zero-width, so add an exception for these as well to reflect
    how they work in practice.  This conforms to how other implementations
    (e.g., glibc) treat these characters.

    Reviewed by:    bapt (earlier version), jkim
    Sponsored by:   Klara, Inc.

    (cherry picked from commit 160c36eae41afa3c4944ed44778c2b48db8fbb77)
---
 tools/tools/locale/tools/getwidths.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/tools/locale/tools/getwidths.c b/tools/tools/locale/tools/getwidths.c
index 2790b8031912..63c62791253f 100644
--- a/tools/tools/locale/tools/getwidths.c
+++ b/tools/tools/locale/tools/getwidths.c
@@ -28,6 +28,21 @@
 #include

+static int
+width_of(int32_t wc)
+{
+
+	/*
+	 * Hangul Jamo medial vowels and final consonants are more of
+	 * a combining character, and should be considered zero-width.
+	 */
+	if (wc >= 0x1160 && wc <= 0x11ff)
+		return (0);
+
+	/* No override by default, trust utf8proc's width. */
+	return (utf8proc_charwidth(wc));
+}
+
 int
 main(void)
 {
@@ -43,9 +58,10 @@ main(void)
 		wcc = utf8proc_category(wc);
 		if (wcc == UTF8PROC_CATEGORY_CC)
 			continue;
-		wcw = utf8proc_charwidth(wc);
+		wcw = width_of(wc);
 		if (wcw == 1)
 			continue;
+
 		printf("%04X %d\n", wc, wcw);
 	}
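
For readers without utf8proc installed, the override pattern in this change can be sketched on its own. The block below is only an illustration, not the committed code: `fallback_width()` is a hypothetical stand-in for `utf8proc_charwidth()`, and its single "wide" range is an assumption chosen to keep the example short. Only the 0x1160..0x11ff exception is taken from the diff.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for utf8proc_charwidth(), so the sketch builds
 * without the library.  It is NOT utf8proc's real table: it reports two
 * columns for CJK Unified Ideographs and one column for everything else.
 */
static int
fallback_width(int32_t wc)
{

	if (wc >= 0x4e00 && wc <= 0x9fff)
		return (2);
	return (1);
}

/*
 * Same shape as width_of() in the diff: check the exception range first,
 * then defer to the underlying width source.
 */
static int
width_of(int32_t wc)
{

	/* Hangul Jamo medial vowels and final consonants: zero-width. */
	if (wc >= 0x1160 && wc <= 0x11ff)
		return (0);

	/* No override, so use the fallback's answer. */
	return (fallback_width(wc));
}
```

With this stand-in, `width_of(0x1160)` and `width_of(0x11ff)` return 0, while code points just outside the range (0x115f, 0x1200) fall through to the fallback. Because the loop in `main()` skips only entries whose width is 1, the overridden code points are still printed, now with a width of 0.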