[Bug 278229] iconv mapping tables for ISO 8859-2 and 8859-3 contain garbage

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 07 Apr 2024 11:03:31 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278229

            Bug ID: 278229
           Summary: iconv mapping tables for ISO 8859-2 and 8859-3 contain
                    garbage
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: eichelberg@offis.de

Created attachment 249799
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=249799&action=edit
Fixed mapping tables for ISO-8859-2, -3 and -5

The iconv mapping tables from Unicode to ISO-8859-2 and ISO-8859-3 contain
many incorrect mappings: Unicode characters not available in the ISO character
set are mapped to four-character sequences that are essentially garbage.

In the FreeBSD source tree, the source files for these mapping tables are
located in /share/i18n/csmapper/ISO-8859/. Note that the mapping tables for
ISO 8859-4 to 8859-16 are between 16 and 24 kBytes each, whereas the mapping
tables for ISO 8859-2 and ISO 8859-3 are each over a megabyte in size.

Most of the mappings that map one Unicode code position to a sequence of four
ISO characters appear to contain garbage. This issue was already present in
the initial commit for these files in February 2011 and has apparently never
been noticed.

Attached to this bug report are corrected mapping tables for ISO 8859-2 and ISO
8859-3. These retain all mappings to four-byte character sequences that are
also present in the other ISO 8859 mapping tables and remove all others. 
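
The filtering rule described above can be sketched roughly as follows. This is
only an illustration, not the script used to produce the attachment: it
assumes each table has already been parsed into a dict from Unicode code point
to target byte sequence, and it treats "also present in the other tables" as
"present in at least one reference table" (the report does not say whether any
or all is meant). The function name and the sample values are hypothetical.

```python
def filter_table(table, reference_tables, seq_len=4):
    """Drop seq_len-byte mappings that no reference table shares.

    table and each reference table are assumed to be dicts of the form
    {unicode_code_point: target_bytes}; this layout is an assumption made
    for the sketch, not the on-disk format of the csmapper sources.
    """
    kept = {}
    for code_point, target in table.items():
        if len(target) != seq_len:
            # Mappings of any other length are left untouched.
            kept[code_point] = target
        elif any(ref.get(code_point) == target for ref in reference_tables):
            # A seq_len-byte mapping survives only if an identical mapping
            # exists in at least one reference table.
            kept[code_point] = target
    return kept


# Made-up sample data, for illustration only.
suspect = {0x0104: b"\xa1", 0x1000: b"abcd", 0x1001: b"wxyz"}
reference = [{0x1000: b"abcd"}]
cleaned = filter_table(suspect, reference)
```

In this sample, the entry for 0x1001 is dropped because no reference table
contains the same four-byte mapping, while the other two entries are kept.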

Furthermore, the table for ISO 8859-5 is also attached. It currently contains
many duplicate lines, which do not cause problems when processed, but are
unnecessary and should be removed.
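
Removing duplicate lines while preserving the original order could look
roughly like the sketch below. It makes no assumption about the syntax of the
table source files and simply treats every line as an opaque string; the
function name is hypothetical, and lines that differ only in leading or
trailing whitespace are treated as duplicates here, which is a choice made for
this sketch.

```python
def drop_duplicate_lines(lines):
    """Return lines with later repeats of a non-blank line removed."""
    seen = set()
    unique = []
    for line in lines:
        key = line.strip()
        if key and key in seen:
            # Repeat of an earlier non-blank line: skip it.
            continue
        seen.add(key)
        unique.append(line)  # Blank lines are always kept.
    return unique


# Made-up sample input, for illustration only.
sample = ["a = 1\n", "b = 2\n", "a = 1\n", "\n", "\n", "b = 2\n"]
deduplicated = drop_duplicate_lines(sample)
```

Comparing the line counts before and after is a quick way to confirm how many
duplicates a given file actually contains.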
