Re: FreeBSD awk behavior change proposal
- In reply to: Rodney W. Grimes: "Re: FreeBSD awk behavior change proposal"
Date: Fri, 09 Jul 2021 14:36:40 UTC
On 09.07.21 at 15:21, Rodney W. Grimes wrote:
>> Greetings,
>>
>> I've posted https://reviews.freebsd.org/D31114 which eliminates the last
>> delta we have from upstream one-true-awk. This delta has basically been
>> rejected by upstream as being a really bad idea. Let me give some
>> background.
>>
>> In 2005, FreeBSD changed one-true-awk to honor the locale's collating order.
>> https://svnweb.freebsd.org/base/head/usr.bin/awk/b.c.diff?annotate=146322&pathrev=201988
>> This was billed as a temporary patch. It was also compatible with
>> the then-current behavior of gawk. That temporary patch has lasted 16
>> years now.
>>
>> However, IEEE Std 1003.1-2008 changed the behavior of ranges in regular
>> expressions outside of the "C" and "POSIX" locales to be undefined.
>>
>> Starting in 2011, gawk 4.0 stopped using the locale for the range
>> regular expressions and used the traditional behavior only. The
>> maintainer had grown weary of answering why '[A-Z]' would sometimes
>> match lower-case expressions. The details are explained here:
>> https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html
>>
>> To restore compatibility with other implementations of awk, revert this
>> patch. FreeBSD is the odd-system out. It also has the nice side effect
>> of eliminating the last of our differences with upstream one-true-awk.
>>
>> I'd like to commit the change at least to -current. Ideally, I'd like to MFC
>> the change. I believe better compatibility with gawk and other awk
>> implementations justifies this change in behavior because the current
>> behavior is outside the mainstream enough to be considered a bug.
>>
>> I'd like to solicit input before I do this, however.
>
> My only concern on this is does anything in the ports system get
> tickled by this change, I know it's a pita, but maybe have an exp
> run done?
> I reviewed and accepted the differential, and by examination
> I do not see how this could cause an issue now, so Meh give it a long
> bake in -current and things should be ok.

While possible in theory, I do not see how the ports system could be
affected in practice.

Ports are built in a C/POSIX locale on the official builders. Using a
different locale and collating sequence on a user's system could
therefore break a port, but a port should never require one.

I have checked the port Makefiles for occurrences of LANG or LC_*
outside specific command invocations (e.g. to set the locale for a
sort command). These are the results:

- ${USE_LOCALE} is used in bsd.port.mk, but the only case where a
  locale other than C or en_US.UTF-8 is specified is shells/fd, which
  has USE_LOCALE=ja (i.e. does not specify an encoding).

- ${ELIXIR_LOCALE} is used to set LANG and LC_ALL for USES=elixir. But
  ELIXIR_LOCALE is only ever set to en_US.UTF-8, AFAICT.

- print/libpaper explicitly requests LANG=C LC_ALL=C for AWK.

- The only port that requests a locale that is not en_US.UTF-8,
  en_US.ISO8859-1, or C is textproc/te-hunspell, which uses
  LANG=te_IN.utf8 LC_ALL=te_IN.utf8 to execute wordlist2hunspell. This
  applies only to that single shell script, which does not invoke AWK
  and which internally uses LC_ALL=C for sort and uniq so that they do
  not depend on an externally set locale.

All other cases where LC_* or LANG are used in port Makefiles are in
e.g. EXTRACT_CMD, TEST_ENV or in patch files, but those enforce a C or
C.UTF-8 locale (or en_US.*) and thus are not affected by the proposed
change to AWK (besides, they often only set the locale for a TAR file
extraction).

If an exp-run is planned for other reasons, the modified AWK could be
thrown in as a low-risk modification. But I do not see any possible
effect on the ports system, after performing a grep for LANG and LC_*
on the Makefiles and patch files.

Regards, STefan
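
For reference, the range behavior under discussion can be checked with
a small shell sketch. It pins the locale to C, which is the setting the
official builders are described as using above, so '[A-Z]' means the
ASCII range only. The function name match_upper is made up for this
illustration, and the sketch says nothing about what an unpatched awk
does in other locales.

```shell
# Hypothetical helper (name chosen for illustration only): print the input
# lines that consist of exactly one character in the range [A-Z].
# LC_ALL=C forces the C locale, where the range is the ASCII 'A'..'Z'.
match_upper() {
    LC_ALL=C awk '/^[A-Z]$/ { print }'
}

# Feed one lower-case, one upper-case and one more lower-case letter.
printf 'a\nB\nc\n' | match_upper
# prints: B
```

Running the same pipeline without LC_ALL=C under a non-C locale is a
quick way to see whether a given awk still honors the collating order
for ranges.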