[Bug 273245] textproc/groff: groff_mdoc(7): output from 'man 7 groff_mdoc' is badly broken
Date: Mon, 04 Sep 2023 22:07:38 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273245 --- Comment #6 from G. Branden Robinson <g.branden.robinson@gmail.com> --- Hi Wolfgang, I cloned the freebsd-src repository to have a look at the 35 cases within it that concerned you. > for i in freebsd-{src,ports,doc};do (cd $i && printf "$i "; git grep 'roff.* -man[^d]' |wc -l );done > freebsd-src 35 $ git grep -n 'roff.* -man[^d]' contrib/byacc/aclocal.m4:1047:${NROFF_NOTE} [\$](SHELL) -c "tbl [\$]*.$2 | nroff -man | col -bx" >[\$]@ contrib/byacc/aclocal.m4:1053:${GROFF_NOTE} [\$](SHELL) -c "tbl [\$]*.$2 | groff -man" >[\$]@ contrib/byacc/aclocal.m4:1056:${GROFF_NOTE} GROFF_NO_SGR=stupid [\$](SHELL) -c "tbl [\$]*.$2 | nroff -rHY=0 -Tascii -man | col -bx" >[\$]@ contrib/byacc/configure:8368:${NROFF_NOTE} \$(SHELL) -c "tbl \$*.1 | nroff -man | col -bx" >\$@ contrib/byacc/configure:8374:${GROFF_NOTE} \$(SHELL) -c "tbl \$*.1 | groff -man" >\$@ contrib/byacc/configure:8377:${GROFF_NOTE} GROFF_NO_SGR=stupid \$(SHELL) -c "tbl \$*.1 | nroff -rHY=0 -Tascii -man | col -bx" >\$@ byacc is maintained by Thomas Dickey. He's using Autoconf macros to produce output from sources that are known to be in man(7) format. https://github.com/freebsd/freebsd-src/blob/main/contrib/byacc/yacc.1#L30 `CF_MAKE_DOCS` appears to be an Autoconf macro private to the byacc distribution; I see no other occurrences in `freebsd-src`. He also has a Autoconf test to determine how to generate HTML from man(7) pages. https://github.com/freebsd/freebsd-src/blob/main/contrib/byacc/aclocal.m4#L1580 Note in particular the here document at line 1694. https://github.com/freebsd/freebsd-src/blob/main/contrib/byacc/aclocal.m4#L1694 contrib/dialog/makefile.in:145:@NROFF_NOTE@ GROFF_NO_SGR=stupid $(SHELL) -c "tbl $< | nroff -rHY=0 -Tascii -man | col -bx" >$@ contrib/dialog/makefile.in:151:@GROFF_NOTE@ $(SHELL) -c "tbl $< | groff -man" >$@ Dialog is another Thomas Dickey project. It also builds inputs it knows to be in man(7) format. $ grep -nrFw .TH contrib/dialogcontrib/dialog/dialog.3:50:.TH \*D 3 "" "$Date: 2021/01/17 18:02:44 $" contrib/dialog/dialog.1:51:.TH \*D 1 "" "$Date: 2021/01/17 17:25:01 $" contrib/dialog/configure:8028:.TH HEAD1 HEAD2 HEAD3 HEAD4 HEAD5 contrib/dialog/aclocal.m4:5919:.TH HEAD1 HEAD2 HEAD3 HEAD4 HEAD5 $ grep -nrFw .Dd contrib/dialog || echo NONE # look for mdoc(7) documents NONE Next... contrib/ee/ee.1:5:.\" nroff -man ee.1 $ head contrib/ee/ee.1 .\" .\" .\" To format this reference page, use the command: .\" .\" nroff -man ee.1 .\" .\" $Header: /home/hugh/sources/old_ae/RCS/ee.1,v 1.22 2001/12/16 04:49:27 hugh Exp $ .\" .\" .TH ee 1 "" "" "" The man page is telling us explicitly (in a comment) what macro package to use to format it, and unsurprisingly getting it right. Next... contrib/ldns/makewin.sh:243:for x in man1/*.1; do groff -man -Tascii -Z "$x" | grotty -cbu > cat1/"$(basename "$x" .1).txt"; done contrib/ldns/makewin.sh:246:for x in man3/*.3; do groff -man -Tascii -Z "$x" | grotty -cbu > cat3/"$(basename "$x" .3).txt"; done Again we have renderings of known documents. Let's see what package they use. $ find contrib/ldns -name "*.[13]" | xargs grep -nEw '\.(Dd|TH)' contrib/ldns/drill/drill.1:2:.TH drill 1 "28 May 2006" contrib/ldns/packaging/ldns-config.1:1:.TH ldns-config 1 "22 Sep 2011" So that's two more correct uses of '-man'. Next... contrib/ncurses/aclocal.m4:5607: nroff -man \$TMP >\$TMP.out contrib/ncurses/configure:14517: nroff -man \$TMP >\$TMP.out Another Thomas Dickey project. These come from his Autoconf macro `CF_MAN_PAGES`. I'll skip ahead here and note that I'm familiar with the ncurses man pages, having recently proposed patches to them. https://lists.gnu.org/archive/html/bug-ncurses/2023-09/ They are exclusively in man(7) format, not mdoc(7). Here again we have a case of a maintainer knowing what format is required, and using it. Next... contrib/tcp_wrappers/Banners.Makefile:12:# sequences as described in the hosts_access.5 manual page (`nroff -man' contrib/tcp_wrappers/CHANGES:2:configuration checker. See the `tcpdchk.8' manual page (`nroff -man' contrib/tcp_wrappers/CHANGES:349:have all rules within a single file. See "nroff -man hosts_options.5" contrib/tcp_wrappers/Makefile:575:# and hosts_options.5 manual pages (`nroff -man' format). contrib/tcp_wrappers/README:240:hosts_access.5 manual page, which is in `nroff -man' format. A later contrib/tcp_wrappers/README:257:The hosts_options.5 manual page (`nroff -man' format) documents an contrib/tcp_wrappers/README:395:documented in the hosts_options.5 document, which is in `nroff -man' contrib/tcp_wrappers/README:432:`nroff -man' format) can guide the requests to the right server. These contrib/tcp_wrappers/README:453:given in the hosts_options.5 manual page (`nroff -man' format). An contrib/tcp_wrappers/README:897:hosts_access.5, which is in `nroff -man' format. This is a lengthy contrib/tcp_wrappers/README:904:The examples in the hosts_access.5 document (`nroff -man' format) show contrib/tcp_wrappers/README:912:hosts_options.5 document (`nroff -man' format). contrib/tcp_wrappers/README:918:program is described in the tcpdchk.8 document (`nroff -man' format). contrib/tcp_wrappers/README:929:described in the tcpdmatch.8 document (`nroff -man' format). contrib/tcp_wrappers/README:967:programs. The hosts_access.3 manual page (`nroff -man' format) contrib/tcp_wrappers/options.c:4: * manual page (source file: hosts_options.5, "nroff -man" format). These are all source comments or text file contents, and do not drive construction of anything; they therefore cannot cause failures. Nevertheless, let us see what macro package is employed by "tcp_wrappers". $ find contrib/tcp_wrappers -name "*.[1-9]" | xargs grep -nEw '\.(Dd|TH)' contrib/tcp_wrappers/hosts_options.5:1:.TH HOSTS_OPTIONS 5 contrib/tcp_wrappers/tcpdmatch.8:1:.TH TCPDMATCH 8 contrib/tcp_wrappers/tcpd.8:1:.TH TCPD 8 contrib/tcp_wrappers/tcpdchk.8:1:.TH TCPDCHK 8 contrib/tcp_wrappers/hosts_access.5:1:.TH HOSTS_ACCESS 5 contrib/tcp_wrappers/hosts_access.3:1:.TH HOSTS_ACCESS 3 It would appear once again that the upstream maintainer is familiar with their own man pages. Next... contrib/tcsh/tcsh.man2html:13:# in the exact same style of nroff -man, i.e. any other manpage. This is a comment. Some context might be helpful. $ git grep -C2 -n 'roff.* -man[^d]' contrib/tcsh/ contrib/tcsh/tcsh.man2html-11-# contrib/tcsh/tcsh.man2html-12-# Designed for tcsh manpage. Guaranteed not to work on manpages not written contrib/tcsh/tcsh.man2html:13:# in the exact same style of nroff -man, i.e. any other manpage. contrib/tcsh/tcsh.man2html-14-# contrib/tcsh/tcsh.man2html-15-# Makes links FROM items which are both a) in particular sections (see Given that context "guaranteed *not* to work on [other man pages]", it does not seem fair to hold this source remark as evidence militating against groff's change. Next... contrib/tzcode/workman.sh:18:.rm }F" | nroff -man - ${1+"$@"} | perl -ne ' More context is warranted here, too. $ head -n 18 contrib/tzcode/workman.sh #! /bin/sh # Convert manual page troff stdin to formatted .txt stdout. # This file is in the public domain, so clarified as of # 2009-05-17 by Arthur David Olson. if (type nroff && type perl) >/dev/null 2>&1; then # Tell groff not to emit SGR escape sequences (ANSI color escapes). GROFF_NO_SGR=1 export GROFF_NO_SGR echo ".am TH .hy 0 .na .. .rm }H .rm }F" | nroff -man - ${1+"$@"} | perl -ne ' Your concern might, at first glance, seem warranted here; the tool does purport to be of general use. However, closer inspection reveals that it was written in ignorance or deliberate neglect of the mdoc(7) package altogether; observe how it appends to the 'TH' macro, which is unused in mdoc(7). It would be straightforward to make this script handle mdoc(7) as well; simply append the same two requests to the `Dd` macro. The removals of '}H' and '}F' strings/macros/diversions suggests a familiarity with the AT&T Unix man(7) implementation or its descendants in USG/System III/System V proprietary Unix or BSD prior to Networking Release/2 (when the Berkeley CSRG replaced Unix troff with groff). Further, if nroff or perl programs (or shell functions) are unavailable, this shell script proceeds to use mandoc anwyay. $ tail -n 6 contrib/tzcode/workman.sh elif (type mandoc && type col) >/dev/null 2>&1; then mandoc -man -T ascii "$@" | col -bx else echo >&2 "$0: please install nroff and perl, or mandoc and col" exit 1 fi Next... usr.bin/man/man.conf.5:116:NROFF_JA /usr/local/bin/groff -man -dlang=ja_JP.eucJP usr.bin/man/man.conf.5:117:TROFF_JA /usr/local/bin/groff -man -dlang=ja_JP.euc.jp This is from an example in FreeBSD's man.conf(5) page. The assignment to the "lang" string has no particular effect in groff, and to the best of my knowledge it never has. This may be evidence of jgroff, a fork of groff that was made unnecessary about in the Debian Project about 20 years ago (https://www.debian.org/security/2002/dsa-107 ) and was superseded by groff support in the 1.21 release about 13 years ago. https://lists.gnu.org/archive/html/groff/2010-12/msg00051.html (Also, is that a typo adding a period in line 117.) I suggest that the foregoing might be bitrotted. Last... usr.bin/man/man.sh:1074:NROFF='groff -S -P-h -Wall -mtty-char -man' usr.bin/man/man.sh:1078:TROFF='groff -S -man' This is a case that should certainly be addressed, if this script still fulfills the purpose claimed for it in its initial commit in 2010: "Implementaiton [sic] of man, manpath, whatis, and apropos written entirely in sh." I strongly recommend s/-man/&doc/ here so that FreeBSD users will continue to have a positive experience. Also, for what it's worth, the `-S` option has been unnecessary; "safer" mode has been the default in groff since version 1.12, released on 14 December 1999. However, 32-33 false positives in a set of 35 suggests to me that your scanning criteria could benefit from sensitivity tuning. An error rate of over 90% is generally considered unusable in serious measurement applications. Regards, Branden -- You are receiving this mail because: You are on the CC list for the bug.