svn commit: r292955 - head/lib/libmd

Bruce Evans brde at optusnet.com.au
Thu Dec 31 03:29:22 UTC 2015


On Wed, 30 Dec 2015, Jonathan T. Looney wrote:

> Log:
>  Fix a file descriptor leak in mdXhl.c (which is used by numerous hashing
>  algorithms.

This code had amazingly low quality and is still especially broken near
the main leak fixed.

> Modified: head/lib/libmd/mdXhl.c
> ==============================================================================
> --- head/lib/libmd/mdXhl.c	Wed Dec 30 17:36:34 2015	(r292954)
> +++ head/lib/libmd/mdXhl.c	Wed Dec 30 18:04:50 2015	(r292955)
> @@ -59,14 +59,18 @@ MDXFileChunk(const char *filename, char
> 	f = open(filename, O_RDONLY);
> 	if (f < 0)
> 		return 0;
> -	if (fstat(f, &stbuf) < 0)
> -		return 0;
> +	if (fstat(f, &stbuf) < 0) {
> +		i = -1;
> +		goto error;
> +	}
> 	if (ofs > stbuf.st_size)
> 		ofs = stbuf.st_size;

st_size is only valid for regular files.  The first bug is using it
here.  Usually it is 0 when it is invalid.  This code sets the offset
to 0 then.  I think this is just to avoid a false negative in the buggy
seek test.

> 	if ((len == 0) || (len > stbuf.st_size - ofs))
> 		len = stbuf.st_size - ofs;

Style bugs (extra parentheses) and second use of the invalid st_size.

> -	if (lseek(f, ofs, SEEK_SET) < 0)
> -		return 0;
> +	if (lseek(f, ofs, SEEK_SET) < 0) {
> +		i = -1;
> +		goto error;
> +	}

md5 is very broken by using this function for non-seekable files.  The
main case of a non-seekable file is a pipe.  This case works for at
least md5(1) by not using this function -- it uses MDFilter() which
uses stdio's fread() which works.
   (stdio knows better than to use st_size, but it naively trusts
   st_blksize and uses it to give a pessimized small block size for read()
   called from fread().  This code handles buffer sizing and allocation
   worse using its low quality methods:
   - its buffer used to have size BUFSIZ.  This size is not usable for
     anything except possibly buffering keyboard input and stderr.  It
     is mainly part of the broken setbuf() API.  Its value is 1024,
     perhaps to avoid changing this API.  1024 large enough for more
     uses in 1980.
   - its buffer now has size 16*1024 (spelled with a style bug like that).
     This sometimes matches and sometimes exceeds the size used by stdio.
     stdio at least attempts to choose the best size, but is defeated by
     stat() putting useless sizes in st_blksize.
   - it also tries to pessimize the i/o by asking for a misaligned buffer.
     Its buffer is just a local char[] variable.  Compilers usally mostly
     foil this attempt by aligning large variables on the stack.

     Misaligned usr buffers should only pessimize the i/o, but last time
     I looked they cause DMA errors in ahci.  This breaks bsdlabel(8).
     bsdlabel ask for a misaligned buffer and gets it at least when
     statically linked because its buffer is a global char [] variable
     and the neither the compiler nor the linker gives more alignment
     than requested.

     The DMA error would break md5 (or just cat) similarly if the buffer
     were misaligend.  But the main bug here breaks md5 on disks without
     trying any i/o.

Back to the main bug.  This seek test detects some cases of non-regular
files.  These files indeed cannot be handled by this code.  But the error
handling is broken -- it is just to set errno and return 0 (EOF).  0
means no error and errno is no set.  So all non-seekable files are
treated as empty regular files.  E.g., md5 /proc/0/* gives:

md5: /proc/0/ctl: Permission denied
MD5 (/proc/0/cmdline) = d41d8cd98f00b204e9800998ecf8427e
MD5 (/proc/0/etype) = d41d8cd98f00b204e9800998ecf8427e
MD5 (/proc/0/rlimit) = d41d8cd98f00b204e9800998ecf8427e
MD5 (/proc/0/status) = d41d8cd98f00b204e9800998ecf8427e

(/proc gives my favourite examples of irregular regular files.  It
doesn't work to use S_ISREG(&stbuf) to determine if st_size can be
trusted, since too many irregular files claim to be regular: e.g.,
file /proc/0/* gives:

/proc/0/cmdline: empty
/proc/0/ctl:     empty
/proc/0/etype:   empty
/proc/0/rlimit:  empty
/proc/0/status:  empty

wc /proc/0/* works.  md5 works like wc using a hack to avoid calling this
broken function.  E.g.,

     for i in $(ls /proc/0/*); do echo -n "$i: "; md5 <$i; done  # gives

/proc/0/cmdline: 3c5896b1ac441f4998f052e2126e8d20
/proc/0/ctl: d41d8cd98f00b204e9800998ecf8427e
/proc/0/etype: 674441960ca1ba2de08ad4e50c9fde98
/proc/0/rlimit: 67d6ad67b412e1bceb7cb508a3492197
/proc/0/status: 3ccc3067b97c3232ea2dbcb64c458fd4

> 	n = len;
> 	i = 0;
> 	while (n > 0) {

Disk device files are seekable, so the lseek test doesn't result in
mishandling them.  Instead, they are treated as empty files since
their st_size is 0.  E.g., md5 /dev/ad* gives:

MD5 (/dev/ad0) = d41d8cd98f00b204e9800998ecf8427e
MD5 (/dev/ad0s1) = d41d8cd98f00b204e9800998ecf8427e
MD5 (/dev/ad0s2) = d41d8cd98f00b204e9800998ecf8427e
MD5 (/dev/ad0s2a) = d41d8cd98f00b204e9800998ecf8427e
...

The workarounds are much the same as for non-seekable files:

     for i in /dev/ad0s2a; do echo -n "$i: "; md5 <$i; done  # gives

/dev/ad0s2a: 3b4c134a41615cbc60ef778acd188ed4

Most disk devices are too large to run md5 on, but this partition has
size 64MB.  I sometimes run md5 and other utilities like cmp on
partitions or USB disks of size up to 10GB and get annoyed by the i/o
pessimizations.  It is faster to copy everything to a regular file
and check that.  cp has reasonably good i/o buffer sizes and the disk
cache works well.  However, copying a disk to a regular file and looking
at its partitions there using md(4) works even worse.  md(4) defeats
the disk cache using IO_DIRECT.

> @@ -79,6 +83,7 @@ MDXFileChunk(const char *filename, char
> 		MDXUpdate(&ctx, buffer, i);
> 		n -= i;
> 	}
> +error:
> 	e = errno;
> 	close(f);
> 	errno = e;

I don't understand the plumbing for MDXFileChunk().  Usually the whole
file is wanted.  MDXFileChunk() cannot work with a nonzero offset on
non-seekable files.  But it is preferred to the working MDFilter() even
in the usual case of a zero offset.

MDFilter() is actually a static function in md5(1).  libmd apparently
has no working input function, since MDXFileChunk() is its only function
that uses *read().  MDXFile() is just a wrapper around MDXFileChunk().

The seek test is also broken for certain offsets.  FreeBSD supports
kseeking to negative offsets on device files.  Such offsets occur
for high addresses in /dev/kmem on 64-bit systems.  errno should be
set to 0 before the call and used instead of the return value to
determine if the syscall failed.  See dd/dd.c for a seek test that is
closer to working.  dd checks for EPIPE.  It also messes around with
S_ISCHR(), S_ISBLK() and FIODTYPE.  Its algorithm is:
- trust S_ISCHR() and S_ISBLK() and don't do the general lseek test
   for these cases.  These can be trusted more than S_ISREG().
- for device files according to this classification, abort the whole
   program if FDIOTYPE fails.  Trust it for setting ISSEEK and ISTAPE
   flags if it succeeds.
- otherwise, do a general lseek test with an EPIPE.

MD shouldn't try as hard as dd since there are probably bugs in the
complications.  But it should try to fail safe since it is more
security-related than dd.  That means bailing out on any detected
error and not trusting st_size unless it is nonzero even if the file
is regular.

MDXFileChunk()'s error handling seems to be to return 0 on all errors.
This is undocumented, and is ambiguous since 0 is also returned for
empty files and for files misclassified as empty.  The ambiguity can
sometimes be resolved by checking errno, but libmd and md5 never check
errno.  md5 ends up never checking the error and printing
d41d8cd98f00b204e9800998ecf8427e for both empty files and for all errors
that occur before any bytes are handled.

Bruce


More information about the svn-src-head mailing list