From nobody Sun Oct 22 11:57:35 2023
X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org
Date: Sun, 22 Oct 2023 11:57:35 GMT
Message-Id: <202310221157.39MBvZle015248@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
        dev-commits-src-branches@FreeBSD.org
From: Christos Margiolis
Subject: git: dbe9ba41bbd7 - releng/14.0 - tty: fix improper backspace behaviour for UTF8 characters when in canonical mode
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
Sender: owner-dev-commits-src-all@freebsd.org
X-BeenThere: dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: christos
X-Git-Repository: src
X-Git-Refname: refs/heads/releng/14.0
X-Git-Reftype: branch
X-Git-Commit: dbe9ba41bbd76e7579683cd69b3235d0b02aca1e
Auto-Submitted: auto-generated

The branch releng/14.0 has been updated by christos:

URL: https://cgit.FreeBSD.org/src/commit/?id=dbe9ba41bbd76e7579683cd69b3235d0b02aca1e

commit dbe9ba41bbd76e7579683cd69b3235d0b02aca1e
Author:     Bojan Novković
AuthorDate: 2023-10-07 18:00:11 +0000
Commit:     Christos Margiolis
CommitDate: 2023-10-22 11:56:44 +0000

    tty: fix improper backspace behaviour for UTF8 characters when in canonical mode

    This patch adds additional logic in ttydisc_rubchar() to properly handle
    backspace behaviour for UTF-8 characters.

    Currently, typing in a backspace after a UTF8 character will delete only
    one byte from the byte sequence, leaving garbled output in the tty's
    output queue. With this change all of the character's bytes are deleted.
    This change is only active when the IUTF8 flag is set (see
    19054eb6053189144aa962b2ecc1bf5087758a3e "(s)tty: add support for IUTF8
    input flag")

    The code uses the teken_wcwidth() function to properly handle character
    column widths for different code points, and adds the
    teken_utf8_bytes_to_codepoint() function that converts a UTF-8 byte
    sequence to a codepoint, as specified in RFC3629.

    Reported by:    christos
    Reviewed by:    christos, imp
    MFC after:      2 weeks
    Differential Revision:  https://reviews.freebsd.org/D42067

    (cherry picked from commit 9e589b0938579f3f4d89fa5c051f845bf754184d)

    Approved by:    re (gjb)
---
 sys/kern/tty_ttydisc.c    | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 sys/teken/teken_wcwidth.h | 30 +++++++++++++++++++
 2 files changed, 104 insertions(+)

diff --git a/sys/kern/tty_ttydisc.c b/sys/kern/tty_ttydisc.c
index 665275ee93e7..eae7162e31c0 100644
--- a/sys/kern/tty_ttydisc.c
+++ b/sys/kern/tty_ttydisc.c
@@ -43,6 +43,9 @@
 #include
 #include
 
+#include
+#include
+
 /*
  * Standard TTYDISC `termios' line discipline.
  */
@@ -78,8 +81,13 @@ SYSCTL_ULONG(_kern, OID_AUTO, tty_nout, CTLFLAG_RD,
 /* Character is alphanumeric. */
 #define CTL_ALNUM(c)	(((c) >= '0' && (c) <= '9') || \
     ((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z'))
+/* Character is UTF8-encoded. */
+#define CTL_UTF8(c) (!!((c) & 0x80))
+/* Character is a UTF8 continuation byte. */
+#define CTL_UTF8_CONT(c) (((c) & 0xc0) == 0x80)
 
 #define	TTY_STACKBUF	256
+#define	UTF8_STACKBUF	4
 
 void
 ttydisc_open(struct tty *tp)
@@ -800,6 +808,72 @@ ttydisc_rubchar(struct tty *tp)
 				ttyoutq_write_nofrag(&tp->t_outq,
 				    "\b\b\b\b\b\b\b\b", tablen);
 				return (0);
+			} else if ((tp->t_termios.c_iflag & IUTF8) != 0 &&
+			    CTL_UTF8(c)) {
+				uint8_t bytes[UTF8_STACKBUF] = { 0 };
+				int curidx = UTF8_STACKBUF - 1, cwidth = 1,
+				    nb = 0;
+				teken_char_t codepoint;
+
+				/* Save current byte. */
+				bytes[curidx] = c;
+				curidx--;
+				nb++;
+				/* Loop back through inq until we hit the
+				 * leading byte. */
+				while (CTL_UTF8_CONT(c) && nb < UTF8_STACKBUF) {
+					ttyinq_peekchar(&tp->t_inq, &c, &quote);
+					ttyinq_unputchar(&tp->t_inq);
+					bytes[curidx] = c;
+					curidx--;
+					nb++;
+				}
+				/*
+				 * Shift array so that the leading
+				 * byte ends up at idx 0.
+				 */
+				if (nb < UTF8_STACKBUF)
+					memmove(&bytes[0], &bytes[curidx + 1],
+					    nb * sizeof(uint8_t));
+				/* Check for malformed UTF8 characters. */
+				if (nb == UTF8_STACKBUF &&
+				    CTL_UTF8_CONT(bytes[0])) {
+					/*
+					 * Place all bytes back into the inq and
+					 * delete the last byte only.
+					 */
+					ttyinq_write(&tp->t_inq, bytes,
+					    UTF8_STACKBUF, 0);
+				} else {
+					/* Find codepoint and width. */
+					codepoint =
+					    teken_utf8_bytes_to_codepoint(bytes,
+						nb);
+					if (codepoint !=
+					    TEKEN_UTF8_INVALID_CODEPOINT) {
+						cwidth = teken_wcwidth(
+						    codepoint);
+					} else {
+						/*
+						 * Place all bytes back into the
+						 * inq and fall back to
+						 * default behaviour.
+						 */
+						ttyinq_write(&tp->t_inq, bytes,
+						    nb, 0);
+					}
+				}
+				tp->t_column -= cwidth;
+				/*
+				 * Delete character by punching
+				 * 'cwidth' spaces over it.
+				 */
+				if (cwidth == 1)
+					ttyoutq_write_nofrag(&tp->t_outq,
+					    "\b \b", 3);
+				else if (cwidth == 2)
+					ttyoutq_write_nofrag(&tp->t_outq,
+					    "\b\b  \b\b", 6);
 			} else {
 				/*
 				 * Remove a regular character by
diff --git a/sys/teken/teken_wcwidth.h b/sys/teken/teken_wcwidth.h
index f57a185c2433..f5a23dbc9679 100644
--- a/sys/teken/teken_wcwidth.h
+++ b/sys/teken/teken_wcwidth.h
@@ -8,6 +8,8 @@
  * Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
  */
 
+#define	TEKEN_UTF8_INVALID_CODEPOINT	-1
+
 struct interval {
 	teken_char_t first;
 	teken_char_t last;
@@ -116,3 +118,31 @@ static int teken_wcwidth(teken_char_t ucs)
       (ucs >= 0x20000 && ucs <= 0x2fffd) ||
       (ucs >= 0x30000 && ucs <= 0x3fffd)));
 }
+
+/*
+ * Converts an UTF-8 byte sequence to a codepoint as specified in
+ * https://datatracker.ietf.org/doc/html/rfc3629#section-3 . The function
+ * expects the 'bytes' array to start with the leading character.
+ */
+static teken_char_t
+teken_utf8_bytes_to_codepoint(uint8_t bytes[4], int nbytes)
+{
+
+	/* Check for malformed characters. */
+	if (bitcount(bytes[0] & 0xf0) != nbytes)
+		return (TEKEN_UTF8_INVALID_CODEPOINT);
+
+	switch (nbytes) {
+	case 1:
+		return (bytes[0] & 0x7f);
+	case 2:
+		return (bytes[0] & 0xf) << 6 | (bytes[1] & 0x3f);
+	case 3:
+		return (bytes[0] & 0xf) << 12 | (bytes[1] & 0x3f) << 6 | (bytes[2] & 0x3f);
+	case 4:
+		return (bytes[0] & 0x7) << 18 | (bytes[1] & 0x3f) << 12 |
+		    (bytes[2] & 0x3f) << 6 | (bytes[3] & 0x3f);
+	default:
+		return (TEKEN_UTF8_INVALID_CODEPOINT);
+	}
+}
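
For readers who want to try the idea outside the kernel, here is a minimal
userspace sketch of the approach the commit message describes: walk back from
the end of the input over continuation bytes (at most four bytes in total),
decode the candidate sequence per RFC 3629, and fall back to erasing a single
byte when the sequence is malformed. It works on a plain byte buffer rather
than on the tty input queue. The names utf8_seq_len(), utf8_decode(),
utf8_rub_len() and UTF8_INVALID are hypothetical and do not appear in the
patch. It is also not a line-for-line copy of teken_utf8_bytes_to_codepoint():
it derives the expected length from the run of leading one bits in the first
byte, masks five payload bits for two-byte sequences, and does not reject
overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for this sketch; not the kernel's macro. */
#define UTF8_INVALID	((uint32_t)-1)

/* Sequence length announced by a leading byte, or 0 if it is not one. */
static int
utf8_seq_len(uint8_t lead)
{
	if ((lead & 0x80) == 0x00)
		return (1);		/* 0xxxxxxx */
	if ((lead & 0xe0) == 0xc0)
		return (2);		/* 110xxxxx */
	if ((lead & 0xf0) == 0xe0)
		return (3);		/* 1110xxxx */
	if ((lead & 0xf8) == 0xf0)
		return (4);		/* 11110xxx */
	return (0);			/* continuation or invalid byte */
}

/* Decode 'nbytes' bytes starting at the leading byte. */
static uint32_t
utf8_decode(const uint8_t *bytes, int nbytes)
{
	uint32_t cp;
	int i;

	assert(bytes != NULL);
	if (nbytes < 1 || utf8_seq_len(bytes[0]) != nbytes)
		return (UTF8_INVALID);
	switch (nbytes) {
	case 1:
		return (bytes[0]);
	case 2:
		cp = bytes[0] & 0x1f;
		break;
	case 3:
		cp = bytes[0] & 0x0f;
		break;
	default:
		cp = bytes[0] & 0x07;
		break;
	}
	/* Each continuation byte contributes its low six bits. */
	for (i = 1; i < nbytes; i++) {
		if ((bytes[i] & 0xc0) != 0x80)
			return (UTF8_INVALID);
		cp = cp << 6 | (bytes[i] & 0x3f);
	}
	return (cp);
}

/* Number of trailing bytes one backspace should erase from buf[0..len). */
static size_t
utf8_rub_len(const uint8_t *buf, size_t len)
{
	size_t n = 1;

	if (len == 0)
		return (0);
	/* Step back while the byte under inspection is a continuation byte. */
	while (n < 4 && n < len && (buf[len - n] & 0xc0) == 0x80)
		n++;
	/* Malformed sequence: fall back to erasing a single byte. */
	if (utf8_decode(&buf[len - n], (int)n) == UTF8_INVALID)
		return (1);
	return (n);
}
```

With the input bytes 'a' 0xc3 0xa9 ("aé"), utf8_rub_len() reports two bytes to
erase. A caller would then look up the column width of the decoded code point,
as the patch does with teken_wcwidth(), to decide how many cells to blank.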