Re: Grep with non-ascii
- In reply to: Tomoaki AOKI : "Re: Grep with non-ascii"
Date: Fri, 03 Feb 2023 14:26:17 UTC
On Fri, 3 Feb 2023 20:39:48 +0900, Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:
> On Fri, 3 Feb 2023 11:06:42 +0100
> Eivind Nicolay Evensen <eivinde@terraplane.org> wrote:
>
> > Hello.
> >
> > I just noticed this today:
> >
> > elg!ene[~]> printf "bø\nhei\nøl\n" | grep ø
> > grep: trailing backslash (\)
> > elg!ene[~]> echo $LC_CTYPE $LANG
> > nb_NO.ISO8859-1 nb_NO.ISO8859-1
> >
> > While I have the result I envisioned with gnugrep:
> >
> > elg!ene[~]> printf "bø\nhei\nøl\n" | ggrep ø
> > bø
> > øl
> >
> > Also, on OpenIndiana, Linux and NetBSD, grep gives the proper
> > result.
> >
> > Is lib/libc/regex the right place to look into this if I
> > find the time, or does anybody know this enough to know the
> > problem?
> >
> > Regards
> > --
> > Eivind Nicolay Evensen
>
> Possibly a locale problem, or depending on what command line shell you
> are using.
>
> Tried copy/pasting to command line, I got the result below.
>
> % printf "bø\nhei\nøl\n" | grep ø
> bø
> øl
>
> I'm using LC_ALL=ja_JP.UTF-8, LANG=ja_JP.UTF-8 as locale and
> shells/zsh as command line shell.
>
> What happens if you switch locale to nb_NO.UTF-8?
>

It does indeed seem to be a locale problem, because it works when I change it:

elg!ene[~]> grep ø
grep: trailing backslash (\)
(I select UTF-8 encoding in the xterm menu here)
elg!ene[~]> setenv LC_CTYPE nb_NO.UTF-8
elg!ene[~]> grep ø
zzz
æøå
æøå
^D

Perhaps it affects more locales than this one; I just tried these
(back to non-UTF-8 encoding in xterm):

elg!ene[~]> setenv LC_CTYPE sv_SE.ISO8859-1
elg!ene[~]> grep
grep: trailing backslash (\)

and

elg!ene[~]> setenv LC_CTYPE de_DE.ISO8859-1
elg!ene[~]> grep
grep: trailing backslash (\)
elg!ene[~]> grep
grep: trailing backslash (\)
elg!ene[~]>

--
Eivind Nicolay Evensen
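
For anyone who wants to reproduce this without depending on the xterm encoding setting, here is a minimal sketch (not from the thread) that hands grep the bytes of ø explicitly. It assumes a POSIX sh with od and tr, and that the nb_NO.ISO8859-1 and nb_NO.UTF-8 locales are installed. The helper names hex_of and try_grep are made up for illustration, and grep -c is used instead of plain grep so that the output is a simple count of matching lines.

```shell
#!/bin/sh
# Sketch only: hex_of and try_grep are hypothetical helpers, not from the thread.

# Print the bytes read from stdin as plain hex digits, e.g. "f8".
hex_of() {
    od -An -tx1 | tr -d '[:space:]'
}

# ø is one byte (0xF8) in ISO8859-1 and two bytes (0xC3 0xB8) in UTF-8.
latin1_hex=$(printf '\370' | hex_of)
utf8_hex=$(printf '\303\270' | hex_of)
echo "ISO8859-1 ø: $latin1_hex"
echo "UTF-8 ø:     $utf8_hex"

# Run the grep from the report with input and pattern given as explicit
# bytes, so the result does not depend on the terminal's encoding.
# $1 = locale name (assumed to be installed)
# $2 = printf octal escapes producing the bytes of ø in that encoding
try_grep() {
    pattern=$(printf "${2}")
    printf "b${2}\nhei\n${2}l\n" | env LC_ALL="$1" grep -c -- "$pattern"
}

# Each call prints the number of matching lines; if grep fails,
# its error message and exit status are shown instead.
try_grep nb_NO.ISO8859-1 '\370' || echo "grep exit status: $?"
try_grep nb_NO.UTF-8 '\303\270' || echo "grep exit status: $?"
```

If grep copes with the locale, each try_grep call should report the two lines containing ø; a "trailing backslash" error from the first call but not the second would match what is described above.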