[Bug 269127] devel/icu: Multibyte character is included in DateTimePatterns for en locale in release 72

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 24 Jan 2023 03:45:23 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269127

            Bug ID: 269127
           Summary: devel/icu: Multibyte character is included in
                    DateTimePatterns for en locale in release 72
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: Individual Port(s)
          Assignee: office@FreeBSD.org
          Reporter: tatsuki_makino@hotmail.com
             Flags: maintainer-feedback?(office@FreeBSD.org)

For example, the following JavaScript code may produce unintended results.

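(function () {
        var i, d=[], s=[];
        // Format the epoch with the en-US locale, then parse the result back.
        d[0] = new Date(0);
        s[0] = d[0].toLocaleString("en-US");
        d[1] = new Date(s[0]);
        console.log(d[0], s[0], d[1]);
        // Dump each character with its UTF-16 code unit in hex.
        for (i = 0; i < s[0].length; ++i) {
                console.log(s[0].charAt(i), s[0].charCodeAt(i).toString(16));
        }
})();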

d[1] is expected to be the same as d[0], but it is "Invalid Date" in
ICU-dependent web browsers (firefox-esr-102.7.0,1, chromium-109.0.5414.74 and...
seamonkey-2.49.4_27 :) ).
The reason is that the string returned by toLocaleString() contains
U+202F (NARROW NO-BREAK SPACE), which the Date parser does not accept.
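
As a possible stopgap on the script side, the narrow no-break space can be
replaced with an ordinary space before the string is parsed. This is only a
sketch: normalizeLocaleSpaces() and nonAsciiCodePoints() are hypothetical
helper names, and whether new Date() accepts the resulting ASCII string is
still implementation-defined.

```javascript
// Hypothetical helper (not part of any library): replace every
// U+202F NARROW NO-BREAK SPACE with an ordinary U+0020 space.
function normalizeLocaleSpaces(s) {
  return s.replace(/\u202F/g, " ");
}

// Hypothetical helper: list the non-ASCII code points found in a string,
// formatted as "U+XXXX", to see what the locale data put there.
function nonAsciiCodePoints(s) {
  var out = [];
  for (var ch of s) {
    var cp = ch.codePointAt(0);
    if (cp > 0x7f) {
      out.push("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
    }
  }
  return out;
}

var s = new Date(0).toLocaleString("en-US");
// Shows which non-ASCII characters, if any, this engine's locale data emits.
console.log(nonAsciiCodePoints(s));
// Parsing a locale string is implementation-defined even after normalizing.
console.log(new Date(normalizeLocaleSpaces(s)));
```

Even when this parses, the string is read back in local time and without
milliseconds, so it hides the symptom rather than fixing the round trip.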

One problem is that, in regions whose languages use multibyte characters
(e.g. Japan :) ), the en and en-* locales have been treated as guaranteed
never to contain multibyte characters.
That is why developers there choose this method of formatting a date with an
en locale and parsing the result back.
In fact, there are sites that display "Invalid Date" because of this.
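
For sites that only need to pass a date through a string and back, a
locale-independent round trip avoids the issue entirely. The sketch below
uses Date.prototype.toISOString(), whose output format is fixed by
ECMAScript; serializeDate() and deserializeDate() are hypothetical names
used only for illustration.

```javascript
// Hypothetical helpers: serialize with the ISO 8601 format that ECMAScript
// defines, instead of a locale-dependent display string.
function serializeDate(d) {
  return d.toISOString();
}

function deserializeDate(s) {
  return new Date(s);
}

var d0 = new Date(0);
var s0 = serializeDate(d0);     // "1970-01-01T00:00:00.000Z"
var d1 = deserializeDate(s0);

// The round trip preserves the exact time value, whatever the locale data says.
console.log(s0, d1.getTime() === d0.getTime());
```

toLocaleString() can then be reserved for text that is only shown to the
user and never parsed again.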

Another problem is that the behavior differs from browsers that do not use
ICU.
As far as I have tested, Windows10+ChromeEdge and Android+Edge return a
locale string without multibyte characters, so the code works as expected
there.
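
To compare browsers, it may help to print only the separator characters that
the formatter inserts. The sketch below uses Intl.DateTimeFormat's
formatToParts(); literalCodePoints() is a hypothetical helper name, and the
values it prints depend on the locale data of the engine that runs it.

```javascript
// Hypothetical helper: return the code points (in hex) of each "literal"
// part, i.e. the separators the formatter inserts between date/time fields.
function literalCodePoints(locale) {
  var fmt = new Intl.DateTimeFormat(locale, {
    hour: "numeric",
    minute: "2-digit",
    hour12: true
  });
  return fmt.formatToParts(new Date(0))
    .filter(function (p) { return p.type === "literal"; })
    .map(function (p) {
      return Array.from(p.value)
        .map(function (c) { return c.codePointAt(0).toString(16); })
        .join(" ");
    });
}

// An entry of "202f" means the separator is U+202F;
// "20" means it is an ordinary space.
console.log(literalCodePoints("en-US"));
```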

I think the port's distribution file already contains a prebuilt database
of the relevant data, but its source is the following file.
https://github.com/unicode-org/icu/blob/bb0e745e25c99cc57055caf45c81b95ef63b25d4/icu4c/source/data/locales/en.txt

What should be done about this?
