llvm & RTTI over shared libraries

From: <jbo_at_insane.engineer>
Date: Thu, 14 Apr 2022 16:36:24 UTC
Hello folks!

I'm in the middle of moving to FreeBSD as my primary development platform (desktop-wise).
As such, I am currently building various software tools I've written over the years on FreeBSD for the first time. Most of those were developed on either Linux with GCC or Windows with MinGW (so also GCC).

Today I found myself debugging a piece of software which runs fine on FreeBSD when compiled with gcc11 but not when compiled with clang14.
I managed to track down the problem but I lack the deeper understanding to resolve this properly - so here we are.

The software in question is written in C++20 and consists of:
  - An interface library (just a bunch of header files).
  - A main executable.
  - A bunch of plugins which the executable loads via dlopen().

The interface headers provide several types. Let's call them A, B, C and D, where B, C and D inherit from A.
The plugins use std::dynamic_pointer_cast() to cast an std::shared_ptr<A> (received via the plugin interface) to the derived classes such as std::shared_ptr<B>.
This is where the trouble begins.
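
To make the cast concrete, here is a minimal single-binary sketch of what the plugins do. The helper names as_b() and demo() are made up for this example, and A is assumed to be polymorphic, which std::dynamic_pointer_cast requires. It is not the actual code, just the shape of it:

```cpp
#include <cassert>
#include <memory>

// Assumption: A has at least one virtual function (here, the destructor).
struct A { virtual ~A() = default; };
struct B : A {};
struct C : A {};

// Hypothetical helper: returns a non-empty shared_ptr<B> only if `in`
// really points to a B, otherwise an empty shared_ptr.
std::shared_ptr<B> as_b(const std::shared_ptr<A>& in)
{
    return std::dynamic_pointer_cast<B>(in);
}

// Usage: inside one binary, both checks are expected to hold.
void demo()
{
    assert(as_b(std::make_shared<B>()) != nullptr);  // really a B
    assert(as_b(std::make_shared<C>()) == nullptr);  // a C is not a B
}
```

The failing case differs from this sketch only in that the object is created in the executable and the cast runs inside a dlopen()'ed plugin.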

If everything (the main executable and the plugins) is compiled using gcc11, everything works "as I expect it".
However, when compiling everything with clang14, the main executable is able to load the plugins successfully but those std::dynamic_pointer_cast() calls within the plugins always return nullptr.

After some research, my understanding is that GCC and LLVM handle RTTI across shared library boundaries differently.
This is where my understanding starts to get less solid.
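
One way such a difference could arise is in how two std::type_info objects are compared: by the address of the type_info object, or by the name string it carries. Which strategy a given C++ runtime actually uses is an assumption here and would need checking. The sketch below only illustrates the two strategies, with made-up helper names (same_by_address, same_by_name):

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

struct A { virtual ~A() = default; };
struct B : A {};

// Strategy 1 (illustrative): two type_info objects denote the same type
// only if they are literally the same object in memory.
bool same_by_address(const std::type_info& x, const std::type_info& y)
{
    return &x == &y;
}

// Strategy 2 (illustrative): compare the implementation-defined names.
bool same_by_name(const std::type_info& x, const std::type_info& y)
{
    return std::strcmp(x.name(), y.name()) == 0;
}

void demo()
{
    assert(same_by_name(typeid(B), typeid(B)));
    assert(!same_by_name(typeid(A), typeid(B)));
    assert(!same_by_address(typeid(A), typeid(B)));
}
```

If the executable and a plugin each ended up with their own copy of the type_info for B, strategy 1 would treat the two copies as different types while strategy 2 would still consider them equal.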

I read the manual page of dlopen(3). The RTLD_GLOBAL flag looks potentially relevant to me: "Symbols from this shared object [...] of needed objects will be available for resolving undefined references from all other shared objects."
The software (which "works as intended" when compiled with GCC) was so far only calling dlopen(..., RTLD_LAZY).
I'm not even sure whether this applies to my situation. My gut feeling tells me that I'm heading in the wrong direction here. After all, the main executable is able to load the plugins and to call the plugin's function which receives a std::shared_ptr<A> as a parameter just fine, even when compiled with LLVM.
Is the problem I'm experiencing related to the way that the plugin (shared library) is loaded or the way that the symbols are being exported?
In the current state, the plugins do not explicitly export any symbols.
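
For reference, the variant I am wondering about would look roughly like the sketch below. The helper names open_plugin() and demo() and the path are hypothetical; only dlopen(), dlerror() and the two flags come from dlopen(3). Whether adding RTLD_GLOBAL has any effect on the cast is exactly what I don't know:

```cpp
#include <dlfcn.h>

#include <cassert>
#include <cstdio>

// Hypothetical helper: loads a plugin with lazy binding and, unlike the
// current code, also asks for its symbols to be made globally visible.
void* open_plugin(const char* path)
{
    void* handle = dlopen(path, RTLD_LAZY | RTLD_GLOBAL);
    if (!handle) {
        // dlerror() describes the most recent dlopen() failure.
        const char* msg = dlerror();
        std::fprintf(stderr, "dlopen failed: %s\n", msg ? msg : "(unknown)");
    }
    return handle;
}

// Usage: a path that does not exist yields a null handle.
void demo()
{
    assert(open_plugin("/nonexistent/plugin1.so") == nullptr);
}
```

On some systems this needs to be linked with -ldl.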

Here's a heavily simplified version of my scenario:


=== interface.hpp ===

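#include <memory>

// A needs at least one virtual function: std::dynamic_pointer_cast
// only works on polymorphic types.
struct A { virtual ~A() = default; };
struct B : A {};
struct C : A {};
struct D : A {};

struct plugin
{
    virtual ~plugin() = default;
    virtual void do_stuff(std::shared_ptr<A> in) = 0;
};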


=== plugin1 ===

#include "interface.hpp"

struct plugin1 :
    plugin
{
    void do_stuff(std::shared_ptr<A> a) override
    {
        auto b = std::dynamic_pointer_cast<B>(a);
        if (!b)
            return;

        // GCC  ->   success
        // LLVM ->   b always nullptr
    }
};


Could you guys help me out here?


Best regards,
~ Joel