[Bug 264275] sed complaining about trailing backslash when using Umlauts
Date: Thu, 27 Oct 2022 13:43:24 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264275

Daniel Tameling <tamelingdaniel@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tamelingdaniel@gmail.com

--- Comment #1 from Daniel Tameling <tamelingdaniel@gmail.com> ---
The error comes from trying to compile the umlaut as a regex. I managed to
create a small reproducer that just calls regcomp. The error seems to come
from this snippet in the p_simp_re function in lib/libc/regex/regcomp.c:

    if ((c & BACKSL) == 0 || may_escape(p, wc))
        ordinary(p, wc);
    else
        SETERROR(REG_EESCAPE);

Both checks in the if statement are false: c has the BACKSL bit set (the
umlaut follows a backslash), and may_escape() returns false. Thus we end up
with the trailing backslash error. In may_escape, this is the return
statement that gets taken:

    if (isalpha(ch) || ch == '\'' || ch == '`')
        return (false);

ch is the wint_t representation of the umlaut, which is 0xe4. In
de_DE.ISO8859-1, the isalpha call returns true. (If I do it with a UTF-8 ä
in a UTF-8 locale, ch also becomes 0xe4, but the isalpha call returns false,
so this doesn't trigger the trailing backslash error.)

--
You are receiving this mail because:
You are the assignee for the bug.