svn commit: r286465 - projects/collation/lib/libc/regex

Baptiste Daroussin bapt at FreeBSD.org
Sat Aug 8 19:30:00 UTC 2015


Author: bapt
Date: Sat Aug  8 19:29:59 2015
New Revision: 286465
URL: https://svnweb.freebsd.org/changeset/base/286465

Log:
  The regex code does not work with multibyte codesets like UTF-8.
  In fact, it doesn't even work with single-byte codesets like ISO-8859-1.
  The comparison blows up at index 128 (the range is 0 to UCHAR_MAX (255)).
  
  As a temporary workaround, all comparisons will be done in C locale
  regardless of the environment setting.  The regex library needs to be
  updated to handle all codesets.
  
  Obtained from:	Dragonfly

Modified:
  projects/collation/lib/libc/regex/regcomp.c

Modified: projects/collation/lib/libc/regex/regcomp.c
==============================================================================
--- projects/collation/lib/libc/regex/regcomp.c	Sat Aug  8 19:24:32 2015	(r286464)
+++ projects/collation/lib/libc/regex/regcomp.c	Sat Aug  8 19:29:59 2015	(r286465)
@@ -38,6 +38,13 @@
  *	@(#)regcomp.c	8.5 (Berkeley) 3/20/94
  */
 
+/*
+ * This implementation currently only works with C locale
+ * It's definitely limited by UCHAR_MAX, but not even ISO-8859 charsets
+ * are working.  The forced changing of locale to C for the comparison
+ * is considered a workaround until a better solution is found.
+ */
+
 #if defined(LIBC_SCCS) && !defined(lint)
 static char sccsid[] = "@(#)regcomp.c	8.5 (Berkeley) 3/20/94";
 #endif /* LIBC_SCCS and not lint */
@@ -768,8 +775,9 @@ p_b_term(struct parse *p, cset *cs)
 	char c;
 	wint_t start, finish;
 	wint_t i;
+	locale_t loc = &__xlocale_C_locale;  /* see note under license */
 	struct xlocale_collate *table =
-		(struct xlocale_collate*)__get_locale()->components[XLC_COLLATE];
+		(struct xlocale_collate*)loc->components[XLC_COLLATE];
 
 	/* classify what we've got */
 	switch ((MORE()) ? PEEK() : '\0') {
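
The workaround described in the log (always collate in the C locale, whatever
the environment says) can be illustrated outside libc with a minimal sketch.
It does not reproduce the internal change above, which reads the collation
table of __xlocale_C_locale directly. Instead it assumes the public
POSIX.1-2008 calls newlocale(), strcoll_l() and freelocale(). The helper name
compare_in_c_locale is hypothetical and appears nowhere in regcomp.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <string.h>
#if defined(__APPLE__)
#include <xlocale.h>	/* locale_t and strcoll_l() are declared here on macOS */
#endif

/*
 * Hypothetical helper (not part of regcomp.c): compare two strings using
 * C-locale collation, ignoring whatever LC_COLLATE the process has set.
 * Returns <0, 0 or >0 like strcoll().
 */
static int
compare_in_c_locale(const char *a, const char *b)
{
	locale_t c_loc;
	int result;

	/* Build a locale object whose collation category is "C". */
	c_loc = newlocale(LC_COLLATE_MASK, "C", (locale_t)0);
	if (c_loc == (locale_t)0)
		return (strcmp(a, b));	/* fallback: byte-wise comparison */

	result = strcoll_l(a, b, c_loc);
	freelocale(c_loc);
	return (result);
}
```

With this helper, "B" sorts before "a" even when the environment selects a
locale that would order them the other way, which is the behaviour the
workaround forces on bracket-expression ranges.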

