[Bug 272334] Misleading 'iconv -l' output

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 04 Jul 2023 20:34:35 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272334

--- Comment #1 from bruno@clisp.org ---
The description covers only the first of 20 issues with the 'iconv -l'
output. The remaining ones are:

2) The line
=====================================================================
ARMSCII-8 AST166-8 AST_34.002 ARMSCII-8A AST166-A AST_34.002_A
=====================================================================
should be split into two lines, because ARMSCII-8 and ARMSCII-8A are
different encodings:
=====================================================================
ARMSCII-8 AST166-8 AST_34.002
ARMSCII-8A AST166-A AST_34.002_A
=====================================================================

3) The line
=====================================================================
BIG5-E BIG5E BIG-5 BIG-FIVE BIG5 BIG5-ETEN BIG5ETEN BIGFIVE CN-BIG5 CSBIG5
=====================================================================
should be split into two lines, because BIG5-E and BIG-5 are
different encodings:
=====================================================================
BIG5-E BIG5E
BIG-5 BIG-FIVE BIG5 BIG5-ETEN BIG5ETEN BIGFIVE CN-BIG5 CSBIG5
=====================================================================

4) The line
=====================================================================
CP942 942 IBM942 942C CP942C IBM942C
=====================================================================
should be split into two lines, because CP942 and CP942C are
different encodings:
=====================================================================
CP942 942 IBM942
CP942C 942C IBM942C
=====================================================================

5) The line
=====================================================================
CP943 943 IBM943 943C CP943C IBM943C
=====================================================================
should be split into two lines, because CP943 and CP943C are
different encodings:
=====================================================================
CP943 943 IBM943
CP943C 943C IBM943C
=====================================================================

6) The line
=====================================================================
ISO646-CA CA CSA7-1 CSA_Z243.4-1985-1 ISO-IR-121 CSA7-2 CSA_Z243.4-1985-2
ISO-IR-122 ISO646-CA2
=====================================================================
should be split into two lines, because ISO646-CA and ISO646-CA2 are
different encodings:
=====================================================================
ISO646-CA CA CSA7-1 CSA_Z243.4-1985-1 ISO-IR-121
ISO646-CA2 CSA7-2 CSA_Z243.4-1985-2 ISO-IR-122
=====================================================================

7) The line
=====================================================================
ISO646-ES ES ISO-IR-17 ES2 ISO-IR-85 ISO646-ES2
=====================================================================
should be split into two lines, because ISO646-ES and ISO646-ES2 are
different encodings:
=====================================================================
ISO646-ES ES ISO-IR-17
ISO646-ES2 ES2 ISO-IR-85
=====================================================================

8) The line
=====================================================================
ISO646-FR FR ISO-IR-69 NF_Z_62-010 ISO-IR-25 ISO646-FR1 NF_Z_62-010_(1973)
=====================================================================
should be split into two lines, because ISO646-FR and ISO646-FR1 are
different encodings:
=====================================================================
ISO646-FR FR ISO-IR-69 NF_Z_62-010
ISO646-FR1 ISO-IR-25 NF_Z_62-010_(1973)
=====================================================================

9) The line
=====================================================================
ISO646-NO ISO-IR-60 NO NS_4551-1 ISO-IR-61 ISO646-NO2 NO2 NS_4551-2
=====================================================================
should be split into two lines, because ISO646-NO and ISO646-NO2 are
different encodings:
=====================================================================
ISO646-NO ISO-IR-60 NO NS_4551-1
ISO646-NO2 ISO-IR-61 NO2 NS_4551-2
=====================================================================

10) The line
=====================================================================
ISO646-PT ISO-IR-16 PT ISO-IR-84 ISO646-PT2 PT2
=====================================================================
should be split into two lines, because ISO646-PT and ISO646-PT2 are
different encodings:
=====================================================================
ISO646-PT ISO-IR-16 PT
ISO646-PT2 ISO-IR-84 PT2
=====================================================================

11) The line
=====================================================================
ISO646-SE FI ISO-IR-10 ISO646-FI SE SEN_850200_B ISO-IR-11 ISO646-SE2 SE2
SEN_850200_C
=====================================================================
should be split into two lines, because ISO646-SE and ISO646-SE2 are
different encodings:
=====================================================================
ISO646-SE FI ISO-IR-10 ISO646-FI SE SEN_850200_B
ISO646-SE2 ISO-IR-11 SE2 SEN_850200_C
=====================================================================

12) The line
=====================================================================
KOI8-R KOI8-RU
=====================================================================
should be split into two lines, because KOI8-R and KOI8-RU are
different encodings:
=====================================================================
KOI8-R
KOI8-RU
=====================================================================

13) The line
=====================================================================
MACROMAN CSMACINTOSH MAC MACINTOSH MACROMANIA MACROMANIAN
=====================================================================
should be split into two lines, because MACROMAN and MACROMANIA are
different encodings:
=====================================================================
MACROMAN CSMACINTOSH MAC MACINTOSH
MACROMANIA MACROMANIAN
=====================================================================

14) The line
=====================================================================
UTF-16 UNICODE UTF16 CSUNICODE CSUNICODE11 ISO-10646-UCS-2 UCS-2 UCS-2BE
UNICODE-1-1 UNICODEBIG UTF-16BE UTF16BE UCS-2LE UNICODELITTLE UTF-16LE UTF16LE
=====================================================================
should be split into two lines, because UTF-16BE and UTF-16LE are
different encodings:
=====================================================================
UTF-16 UNICODE UTF16 CSUNICODE CSUNICODE11 ISO-10646-UCS-2 UCS-2 UCS-2BE
UNICODE-1-1 UNICODEBIG UTF-16BE UTF16BE
UCS-2LE UNICODELITTLE UTF-16LE UTF16LE
=====================================================================

15) The line
=====================================================================
UTF-32 CSUCS4 ISO-10646-UCS-4 UCS-4 UCS-4BE UTF-32BE UTF32BE UCS-4LE UTF-32LE
UTF32LE
=====================================================================
should be split into two lines, because UTF-32BE and UTF-32LE are
different encodings:
=====================================================================
UTF-32 CSUCS4 ISO-10646-UCS-4 UCS-4 UCS-4BE UTF-32BE UTF32BE
UCS-4LE UTF-32LE UTF32LE
=====================================================================

16) The lines
=====================================================================
CP10029 10029 CP10029_MACLATIN2
MACCENTEURO MACCENTRALEUROPE
=====================================================================
should be joined into a single line, because these encodings are identical:
=====================================================================
CP10029 10029 CP10029_MACLATIN2 MACCENTEURO MACCENTRALEUROPE
=====================================================================

17) The entry ISO646-BASIC@1983 should be removed, since iconv_open returns
EINVAL for it.
Then, among the lines
=====================================================================
ISO646-BASIC:1983 ISO_646.BASIC:1983 REF REF
ISO646-BASIC:1983
=====================================================================
the second one should be removed, since it is part of the first line:
=====================================================================
ISO646-BASIC:1983 ISO_646.BASIC:1983 REF REF
=====================================================================

18) The entry ISO646-IRV@1983 should be removed, since iconv_open returns
EINVAL for it.
Then, among the lines
=====================================================================
ISO646-IRV:1983 IRV ISO-IR-2
ISO646-IRV:1983
=====================================================================
the second one should be removed, since it is part of the first line:
=====================================================================
ISO646-IRV:1983 IRV ISO-IR-2
=====================================================================

19) The entry JISX0208@1990 should be removed, since iconv_open returns EINVAL
for it.
Then, among the lines
=====================================================================
JISX0208:1990 CSISO87JISX0208 ISO-IR-87 JIS0208 JISX0208-1990 JIS_C6226-1983
JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 JIS_X0208:1990 X0208
JISX0208:1990
=====================================================================
the last one (the bare JISX0208:1990) should be removed, since it is part of
the first entry:
=====================================================================
JISX0208:1990 CSISO87JISX0208 ISO-IR-87 JIS0208 JISX0208-1990 JIS_C6226-1983
JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 JIS_X0208:1990 X0208
=====================================================================

20) The entry WINDOWS-874 occurs in two different lines:
=====================================================================
CP1162 1162 CSIBM1162 IBM-1162 IBM1162 MSCP874 WINDOWS-874
CP874 874 IBM874 WINDOWS-874
=====================================================================
It should be removed from the first line, since the WINDOWS-874 encoding is
identical to CP874 and different from CP1162:
=====================================================================
CP1162 1162 CSIBM1162 IBM-1162 IBM1162 MSCP874
CP874 874 IBM874 WINDOWS-874
=====================================================================
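
Whether two names denote the same encoding, as claimed in the items above,
could be checked roughly as follows. This is a sketch under assumptions: it
only covers single-byte encodings (so it would not settle the BIG5 or UTF-16
items), it treats unmapped bytes as equal to each other, and the function
names are hypothetical.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Decodes the single byte `b` with the open descriptor `cd` (whose target
   must be UTF-32BE). Returns the code point, or -1 if the byte is unmapped
   or does not decode to exactly one code point. */
static long decode_byte(iconv_t cd, unsigned char b)
{
    char in[1] = { (char)b };
    unsigned char out[16];
    char *inp = in;
    char *outp = (char *)out;
    size_t inleft = sizeof in, outleft = sizeof out;

    iconv(cd, NULL, NULL, NULL, NULL);      /* reset the conversion state */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return -1;
    if (inleft != 0 || outleft != sizeof out - 4)
        return -1;
    return ((long)out[0] << 24) | ((long)out[1] << 16)
         | ((long)out[2] << 8) | (long)out[3];
}

/* Returns 1 if `a` and `b` decode all 256 byte values identically, 0 if
   they differ (storing the first differing byte in *first_diff when it is
   non-NULL), and -1 if either encoding cannot be opened. */
static int same_single_byte_table(const char *a, const char *b,
                                  int *first_diff)
{
    iconv_t cda, cdb;
    int i, same = 1;

    assert(a != NULL && b != NULL);
    cda = iconv_open("UTF-32BE", a);
    if (cda == (iconv_t)-1)
        return -1;
    cdb = iconv_open("UTF-32BE", b);
    if (cdb == (iconv_t)-1) {
        iconv_close(cda);
        return -1;
    }
    for (i = 0; i < 256; i++) {
        if (decode_byte(cda, (unsigned char)i)
            != decode_byte(cdb, (unsigned char)i)) {
            if (first_diff != NULL)
                *first_diff = i;
            same = 0;
            break;
        }
    }
    iconv_close(cda);
    iconv_close(cdb);
    return same;
}
```

Given the claim in item 20, same_single_byte_table("WINDOWS-874", "CP874",
NULL) would be expected to return 1, and the same call with "CP1162" in place
of "CP874" would be expected to return 0.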

As proof, I'm attaching the encoding tables, which I obtained by running, e.g.,
./test-from WINDOWS-874 > WINDOWS-874.TXT
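
The source of test-from is not shown here. For readers without it, a table
for a single-byte encoding could be produced along these lines. This is a
hypothetical sketch, not the actual test-from program: the function names and
the output format are assumptions, and multi-byte encodings are not handled.

```c
#include <assert.h>
#include <iconv.h>
#include <stdio.h>

/* Decodes the single byte `b` with the open descriptor `cd` (whose target
   must be UTF-32BE). Returns the code point, or -1 if the byte is unmapped
   or does not decode to exactly one code point. */
static long decode_byte(iconv_t cd, unsigned char b)
{
    char in[1] = { (char)b };
    unsigned char out[16];
    char *inp = in;
    char *outp = (char *)out;
    size_t inleft = sizeof in, outleft = sizeof out;

    iconv(cd, NULL, NULL, NULL, NULL);      /* reset the conversion state */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return -1;
    if (inleft != 0 || outleft != sizeof out - 4)
        return -1;
    return ((long)out[0] << 24) | ((long)out[1] << 16)
         | ((long)out[2] << 8) | (long)out[3];
}

/* Writes one "0xXX<TAB>0xXXXX" line to `out` for every mapped byte of
   `encoding`. Returns the number of mapped bytes, or -1 if the encoding
   cannot be opened. The output format is an assumption of this sketch. */
static int dump_single_byte_table(const char *encoding, FILE *out)
{
    iconv_t cd;
    int b, mapped = 0;

    assert(encoding != NULL && out != NULL);
    cd = iconv_open("UTF-32BE", encoding);
    if (cd == (iconv_t)-1)
        return -1;
    for (b = 0; b < 256; b++) {
        long cp = decode_byte(cd, (unsigned char)b);
        if (cp >= 0) {
            fprintf(out, "0x%02X\t0x%04lX\n", (unsigned)b, (unsigned long)cp);
            mapped++;
        }
    }
    iconv_close(cd);
    return mapped;
}
```

Two tables written this way, for example for WINDOWS-874 and CP874, can then
be compared with diff.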
