[Bug 272386] The iconv converter from EUC-JP to UTF-8 accepts second and third bytes outside of the valid range

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 05 Jul 2023 14:51:52 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272386

            Bug ID: 272386
           Summary: The iconv converter from EUC-JP to UTF-8 accepts
                    second and third bytes outside of the valid range
           Product: Base System
           Version: 13.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: bruno@clisp.org

The structure of the EUC-JP encoding is explained in
https://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-JP . The second byte of a
two- or three-byte sequence must be in the range 0xA1..0xFE for the sequence
to be valid, and so must the third byte of a three-byte sequence. Bytes in the
range 0x00..0x7F are therefore valid only as single-byte characters, never as
the second or third byte of a multi-byte sequence.
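
The byte ranges above can be written down as a small structural check. The
following is only a sketch under stated assumptions, not the code of any
existing converter: the function names (eucjp_is_trail, eucjp_seq_len,
eucjp_is_valid) are hypothetical, and the lead-byte classes (0x8E and
0xA1..0xFE start a two-byte sequence, 0x8F starts a three-byte sequence) are
taken from the Wikipedia description linked above. It checks structure only,
not whether a code point is actually assigned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: trail bytes must lie in 0xA1..0xFE. */
int eucjp_is_trail(unsigned char b)
{
    return b >= 0xA1 && b <= 0xFE;
}

/*
 * Hypothetical helper: return the length (1, 2, or 3) of the EUC-JP
 * sequence starting at s, or 0 if it is invalid or truncated.
 */
size_t eucjp_seq_len(const unsigned char *s, size_t len)
{
    size_t need, i;

    assert(s != NULL);
    if (len == 0)
        return 0;
    if (s[0] <= 0x7F)
        return 1;                 /* ASCII: always a single byte */
    if (s[0] == 0x8F)
        need = 3;                 /* three-byte sequence */
    else if (s[0] == 0x8E || eucjp_is_trail(s[0]))
        need = 2;                 /* two-byte sequence */
    else
        return 0;                 /* not a valid lead byte */
    if (len < need)
        return 0;                 /* truncated sequence */
    for (i = 1; i < need; i++)
        if (!eucjp_is_trail(s[i]))
            return 0;             /* e.g. a byte < 0x80 in trail position */
    return need;
}

/* Return 1 if the whole buffer is structurally valid EUC-JP, else 0. */
int eucjp_is_valid(const unsigned char *s, size_t len)
{
    size_t pos = 0, n;

    while (pos < len) {
        n = eucjp_seq_len(s + pos, len - pos);
        if (n == 0)
            return 0;
        pos += n;
    }
    return 1;
}
```

With this check, the two bytes 0xA4 0x41 are rejected, because 0x41 is below
0xA1 and so cannot be a second byte, while 0xA4 0xA2 is accepted.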

The FreeBSD 13.2 converter from EUC-JP to UTF-8 accepts bytes < 0x80 in these
positions. This is harmful: applications that detect an encoding by checking
whether a conversion succeeds can then misidentify input as EUC-JP.
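
The behaviour can also be probed on a single byte sequence with the standard
iconv(3) interface. This is a minimal sketch and not the attached
table-from.c; the function name eucjp_probe and the 16-byte input limit are
assumptions made for illustration.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical probe: feed n bytes to the EUC-JP -> UTF-8 converter.
 * Returns 1 if the converter accepts all of them, 0 if it reports an
 * error, and -1 if the converter cannot be opened on this system.
 */
int eucjp_probe(const char *bytes, size_t n)
{
    char out[64];                 /* ample for at most 16 input bytes */
    char *in = (char *)bytes;     /* iconv() takes a non-const pointer */
    char *outp = out;
    size_t inleft = n, outleft = sizeof out;
    size_t rc;
    iconv_t cd;

    assert(bytes != NULL && n <= 16);
    cd = iconv_open("UTF-8", "EUC-JP");
    if (cd == (iconv_t)-1)
        return -1;
    rc = iconv(cd, &in, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return 0;                 /* rejected, e.g. EILSEQ or EINVAL */
    return inleft == 0 ? 1 : 0;
}
```

A converter that enforces the ranges above would be expected to return 0 for
eucjp_probe("\xA4\x41", 2); going by this report, the FreeBSD 13.2 converter
would presumably return 1 instead.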

How to reproduce:
$ cc -Wall -o table-from table-from.c
$ ./table-from EUC-JP > EUC-JP.TXT

Attached are the actual and the expected EUC-JP.TXT.
