[Bug 272293] The mbrtoc32 and mbrtoc16 functions don't recognize the same multibyte sequences as mbrtowc
Date: Fri, 30 Jun 2023 13:58:00 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272293

Bug ID: 272293
Summary: The mbrtoc32 and mbrtoc16 functions don't recognize the same multibyte sequences as mbrtowc
Product: Base System
Version: 13.2-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: misc
Assignee: bugs@FreeBSD.org
Reporter: bruno@clisp.org
Attachment #243081 text/plain mime type:

Created attachment 243081
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=243081&action=edit
test case foo.c

It is clear from ISO C 23 (description of mbrtowc: § 7.31.6.3.2,
description of mbrtoc32: § 7.30.1.5, description of mbrtoc16: § 7.30.1.3)
that the notion of a valid multibyte character is independent of which of
these functions a program uses. When a multibyte character is valid
according to one of these functions, it should be valid according to the
other two as well.

This is not the case in FreeBSD 13.2.

Test case:
=============================== foo.c ============================
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <uchar.h>

int
main ()
{
  if (setlocale (LC_ALL, "zh_CN.GB18030") != NULL)
    {
      mbstate_t state;
      wchar_t wc = (wchar_t) 0xBADFACE;
      memset (&state, '\0', sizeof (mbstate_t));
      if (mbrtowc (&wc, "\224\071\375\067", 4, &state) == 4)
        {
          printf ("mbrtowc return value = 4\n");
          {
            char32_t c32 = (char32_t) 0xBADFACE;
            memset (&state, '\0', sizeof (mbstate_t));
            size_t ret = mbrtoc32 (&c32, "\224\071\375\067", 4, &state);
            printf ("mbrtoc32 return value = %d\n", (int) ret);
          }
          {
            char16_t c16 = (char16_t) 0xBADFACE;
            memset (&state, '\0', sizeof (mbstate_t));
            size_t ret = mbrtoc16 (&c16, "\224\071\375\067", 4, &state);
            printf ("mbrtoc16 return value = %d\n", (int) ret);
          }
        }
    }
}
==========================================================================

$ cc -Wall foo.c
$ ./a.out

Expected result (e.g. as seen on glibc 2.35):

mbrtowc return value = 4
mbrtoc32 return value = 4
mbrtoc16 return value = 4

Actual result:

mbrtowc return value = 4
mbrtoc32 return value = -2
mbrtoc16 return value = -2

I think I've seen this effect with encodings other than GB18030 as well,
but the test case above uses GB18030.
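
The remark about other encodings could be checked with a generalized form of the test case. The following is a minimal sketch and not part of the original report: `check_mb_consistency` is a hypothetical helper name, and it only compares the return values of the first call to each function, each starting from a fresh initial conversion state.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper (an assumption, not from the bug report).
   Converts the first multibyte character of S (at most N bytes) with
   mbrtowc, mbrtoc32 and mbrtoc16 in the locale NAME, resetting the
   conversion state before each call.
   Returns 1 if the three return values agree, 0 if they differ,
   and -1 if the locale cannot be set.  */
static int
check_mb_consistency (const char *name, const char *s, size_t n)
{
  mbstate_t state;
  wchar_t wc = 0;
  char32_t c32 = 0;
  char16_t c16 = 0;
  size_t r_wc, r_32, r_16;

  if (setlocale (LC_ALL, name) == NULL)
    return -1;

  memset (&state, '\0', sizeof (mbstate_t));
  r_wc = mbrtowc (&wc, s, n, &state);

  memset (&state, '\0', sizeof (mbstate_t));
  r_32 = mbrtoc32 (&c32, s, n, &state);

  memset (&state, '\0', sizeof (mbstate_t));
  r_16 = mbrtoc16 (&c16, s, n, &state);

  /* Error returns such as (size_t) -1 and (size_t) -2 are printed as
     negative numbers, following the style of foo.c.  */
  printf ("%s: mbrtowc = %ld, mbrtoc32 = %ld, mbrtoc16 = %ld\n",
          name, (long) r_wc, (long) r_32, (long) r_16);

  return r_wc == r_32 && r_32 == r_16;
}
```

For instance, `check_mb_consistency ("zh_CN.GB18030", "\224\071\375\067", 4)` exercises the same byte sequence as foo.c; given the results reported above, it would be expected to return 0 on FreeBSD 13.2 and 1 on glibc 2.35, provided the locale is installed.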