rtld: Relocation from unversioned binary matches oldest version instead of "default"

From: obiwac <obiwac_at_gmail.com>
Date: Fri, 13 May 2022 11:16:39 UTC
Wassup,

This may not be strictly speaking a bug with rtld, but it sure is
weird/awkward behaviour considering the existing information I could
gather.

When an unversioned shared object references a symbol that has
multiple versions (e.g. readdir@@FBSD_1.5 and readdir@FBSD_1.0, found
with 'readelf -s /lib/libc.so.7 | grep readdir@'), the dynamic linker
always selects the oldest version (readdir@FBSD_1.0 in this case).
But as I understand it from documents such as [1], shouldn't the
default symbol (readdir@@FBSD_1.5) be used instead? ("Default" means
"unhidden" in the context of rtld afaiu, i.e. a symbol where '!(versym
& VER_NDX_HIDDEN)'.)
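
To make the "default means unhidden" reading concrete, here is a
minimal sketch of how a versym entry splits into a version index and a
hidden bit. The sketch_* names are made up for illustration, and the
0x8000 value for the hidden bit is my assumption about what
VER_NDX_HIDDEN expands to:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for VER_NDX_HIDDEN; value assumed to be the top bit (0x8000). */
#define SKETCH_VER_NDX_HIDDEN 0x8000u

/* Version index of a versym entry, with the hidden bit masked off. */
static unsigned
sketch_verndx(uint16_t versym)
{
    return (versym & 0x7fffu);
}

/* "Default" in the sense used above: the hidden bit is clear. */
static bool
sketch_is_default(uint16_t versym)
{
    return ((versym & SKETCH_VER_NDX_HIDDEN) == 0);
}
```

Under these assumptions an entry of 0x8002 is version index 2 and
hidden (a '@' symbol), while 0x0003 is version index 3 and default (a
'@@' symbol).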

The code in question in rtld which exhibits this behaviour is in
'libexec/rtld-elf/rtld.c:matched_symbol':

    /*
     * If we are not called from dlsym (i.e. this is a normal
     * relocation from unversioned binary), accept the symbol
     * immediately if it happens to have first version after
     * this shared object became versioned. Otherwise, if
     * symbol is versioned and not hidden, remember it. If it
     * is the only symbol with this name exported by the
     * shared object, it will be returned as a match at the
     * end of the function. If symbol is global (verndx < 2)
     * accept it unconditionally.
     */
    if ((req->flags & SYMLOOK_DLSYM) == 0 && verndx == VER_NDX_GIVEN) {
        result->sym_out = symp;
        return (true);
    }
    else if (verndx >= VER_NDX_GIVEN) {
        if ((versym & VER_NDX_HIDDEN) == 0) {
            if (result->vsymp == NULL) result->vsymp = symp;
            result->vcount++;
        }
        return (false);
    }
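
To show the effect in isolation, here is a simplified stand-alone model
of that branch. It is only a sketch: 'struct sk_sym' and
'sk_lookup_unversioned' are hypothetical names, the constant values are
assumed, and the real function deals with much more (hash chains, name
comparison, versioned requests). It does reproduce the part that
matters here: a normal relocation binds to version index 2 as soon as
it is seen, while a dlsym-style lookup falls through to the single
unhidden candidate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in constants; values are assumptions, not copied from a header. */
#define SK_VER_NDX_GIVEN  2u
#define SK_VER_NDX_HIDDEN 0x8000u

/* Hypothetical, simplified symbol record. */
struct sk_sym {
    const char *label;  /* e.g. "readdir@FBSD_1.0" */
    uint16_t versym;    /* raw versym entry, including the hidden bit */
};

/*
 * Model of an unversioned request: walk same-named candidates and
 * return the one the quoted logic would bind to, or NULL.
 */
static const struct sk_sym *
sk_lookup_unversioned(const struct sk_sym *syms, size_t n, bool from_dlsym)
{
    const struct sk_sym *vsymp = NULL;
    int vcount = 0;

    for (size_t i = 0; i < n; i++) {
        uint16_t versym = syms[i].versym;
        unsigned verndx = versym & 0x7fffu;

        /* Global/unversioned symbol: accepted unconditionally. */
        if (verndx < SK_VER_NDX_GIVEN)
            return (&syms[i]);
        /* Normal relocation: the first version wins immediately. */
        if (!from_dlsym && verndx == SK_VER_NDX_GIVEN)
            return (&syms[i]);
        /* Otherwise only remember unhidden (default) candidates. */
        if ((versym & SK_VER_NDX_HIDDEN) == 0) {
            if (vsymp == NULL)
                vsymp = &syms[i];
            vcount++;
        }
    }
    /* A remembered symbol is used only if it was the sole candidate. */
    return (vcount == 1 ? vsymp : NULL);
}
```

Feeding it two candidates, readdir@@FBSD_1.5 with versym 0x0003 and
readdir@FBSD_1.0 with versym 0x8002 (indices assumed), the model
returns the FBSD_1.0 entry when from_dlsym is false and the FBSD_1.5
entry when it is true.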

I imagine the intention behind this is to avoid breaking older
unversioned shared objects when the default version of a function they
use is updated while the older version is still provided, but the
effect is that newer programs are forced to reference versioned
symbols.

This means the common way of building a shared object is effectively
incorrect and yields difficult-to-debug errors. Take readdir: a new
program is compiled against the new 'dirent' structure, but 'readdir'
is actually relocated to 'freebsd11_readdir', which assumes the use of
'freebsd11_dirent':

    % cc -g -fPIC -c lib.c -o lib.o
    % ld -shared lib.o -o liblib.so
    % readelf -sD liblib.so | grep readdir # shows readdir unversioned
    4: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND readdir
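
For illustration, a lib.c along these lines would be enough to run into
the problem. This is a hypothetical stand-in, not the file used above,
and 'lib_count_entries' is an invented name:

```c
/* lib.c: hypothetical stand-in for the library source. */
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the entries in a directory, skipping "." and "..".
 * d_name is read through the struct dirent layout that <dirent.h>
 * provided at compile time. If readdir is bound at run time to an
 * implementation that fills in an older layout, d_name may be read
 * from the wrong offset.
 */
long
lib_count_entries(const char *path)
{
    DIR *dir;
    struct dirent *ent;
    long count = 0;

    dir = opendir(path);
    if (dir == NULL)
        return (-1);
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") != 0 &&
            strcmp(ent->d_name, "..") != 0)
            count++;
    }
    closedir(dir);
    return (count);
}
```

Nothing in the source hints at the mismatch; it compiles cleanly and
only misbehaves once the relocation picks the old readdir.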

A simple fix on the user's end is to link 'liblib.so' against libc
explicitly, so that it records versioned symbol references:

    % ld -shared lib.o -o liblib.so /lib/libc.so.7
    % readelf -sD liblib.so | grep readdir # shows readdir versioned
     4: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND readdir@FBSD_1.5 (3)

But that feels a bit awkward, and I haven't seen anyone suggest doing
such a thing.

Another bit of weirdness is that the LLVM equivalents of GNU tools
(e.g. 'llvm-objdump') don't seem to have/care about the notion of a
"default" version:

    % objdump -T /lib/libc.so.7 | grep readdir
    00000000000af200 g    DF .text    00000000000000be  FBSD_1.5    readdir_r
    00000000000af3b0 g    DF .text    00000000000000ed (FBSD_1.0)   readdir_r
    00000000000af1a0 g    DF .text    0000000000000053  FBSD_1.5    readdir
    00000000000af2c0 g    DF .text    00000000000000ed (FBSD_1.0)   readdir
    % llvm-objdump -T /lib/libc.so.7 | grep readdir
    00000000000af200 g    DF .text    00000000000000be readdir_r
    00000000000af3b0 g    DF .text    00000000000000ed     readdir_r
    00000000000af1a0 g    DF .text    0000000000000053 readdir
    00000000000af2c0 g    DF .text    00000000000000ed readdir

Frustratingly, this difference in functionality is not mentioned
anywhere that I can find in llvm-objdump's documentation. Perhaps
this is an indication that unversioned binaries are deprecated and
should not be used at all going forward? I don't know, and it irks me
quite a bit that I can't find any information about this.

Currently I'm using a patched version of rtld which behaves the way I
understood it should, but I'm still asking here to clarify things,
because this has raised quite a few questions for me and I can't seem
to find very many answers.

Perhaps kib@ could help with this?

[1]: https://people.freebsd.org/~deischen/symver/library_versioning.txt