[Bug 244493] databases/lmdb: issue with MDB_USE_POSIX_MUTEX

bugzilla-noreply at freebsd.org
Fri Feb 28 10:48:43 UTC 2020


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244493

            Bug ID: 244493
           Summary: databases/lmdb: issue with MDB_USE_POSIX_MUTEX
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: Individual Port(s)
          Assignee: delphij at FreeBSD.org
          Reporter: freebsd at kempniu.pl
             Flags: maintainer-feedback?(delphij at FreeBSD.org)
 Attachment #212019 mime type: text/plain

Created attachment 212019
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=212019&action=edit
Sample program reproducing the issue

Hi there,

Ports r519246 switched LMDB from MDB_USE_POSIX_SEM to
MDB_USE_POSIX_MUTEX.  Unfortunately, it seems that there are some edge
cases in which LMDB does not play nicely with FreeBSD's process-shared
mutexes.

The particular problem I observed is that when a single process reopens
an LMDB environment (that is, an environment is opened, closed, then
opened again by the same process), then other processes trying to access
the reopened environment fail to grab the read table lock:
pthread_mutex_lock() returns EINVAL (22).  AFAICT, this happens because
libthr is unable to find the shared memory segment with the relevant
part of the LMDB lockfile mmap()'d.
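
For illustration, the open/close/reopen pattern described above can be
sketched without LMDB as below.  This is neither LMDB's code nor the
attached lmdb-mutex.c: the helper names (lockfile_open, lockfile_close,
reopen_and_lock) are hypothetical, and the sketch assumes that the
lockfile setup boils down to mmap()ing a file and initialising a
process-shared mutex inside the mapping.  It runs in a single process, so
it is not expected to trigger the EINVAL by itself; the failure described
above is seen by a second process locking the same mutex.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `path` and (re)initialise a process-shared
 * mutex at the start of the mapping.  Returns NULL on any failure. */
pthread_mutex_t *lockfile_open(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(pthread_mutex_t)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(p, sizeof(pthread_mutex_t));
        return NULL;
    }
    int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(p, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(p, sizeof(pthread_mutex_t));
        return NULL;
    }
    return p;
}

/* Unmap WITHOUT calling pthread_mutex_destroy(), mirroring the behaviour
 * this report attributes to LMDB when an environment is closed. */
void lockfile_close(pthread_mutex_t *m)
{
    munmap(m, sizeof(*m));
}

/* Open, close, reopen, then lock and unlock once.  Returns 0 on success,
 * -1 if the lockfile could not be set up, or the pthread error number. */
int reopen_and_lock(const char *path)
{
    pthread_mutex_t *m = lockfile_open(path);
    if (m == NULL)
        return -1;
    lockfile_close(m);

    m = lockfile_open(path); /* same process, same file, second time */
    if (m == NULL)
        return -1;
    int rc = pthread_mutex_lock(m);
    if (rc == 0)
        rc = pthread_mutex_unlock(m);
    lockfile_close(m);
    return rc;
}
```

Calling reopen_and_lock("/tmp/sketch.lock") from a small main() exercises
the whole sequence; the path is just an example.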

I attached the simplest test case I could come up with.  To reproduce
the problem, first compile lmdb-mutex.c:

    cc -I/usr/local/include -L/usr/local/lib lmdb-mutex.c -o lmdb-mutex -llmdb

Then, run the first instance of the program.  It should start fine and
sleep for 30 seconds.

Before the first instance exits, start a second instance of the program.
It should fail with:

    Assertion failed: (mdb_txn_begin(env, 0, MDB_RDONLY, &txn) == MDB_SUCCESS),
    function main, file lmdb-mutex.c, line 14.

The sample program works fine in the above scenario if LMDB is compiled
with MDB_USE_POSIX_SEM.  It also works fine on other operating systems
using MDB_USE_POSIX_MUTEX.

One workaround I could come up with is enabling ASLR - it causes mmap()
to return different addresses for the LMDB lockfile mapping upon each
call to mdb_env_open(), causing libthr to use different off-pages for
the read table mutexes for the "old" and "new" environment (IIUC).

Note that LMDB never calls pthread_mutex_destroy() for the read table
lock when an environment is closed, which I believe prevents the shared
memory segment for the "old" environment from being released.  But
please do not take my word for it: I do not understand
sys/kern/kern_umtx.c, libthr, and LMDB internals well enough to fully
explain what is happening (though I sure would like to find out!).

To see an example occurrence of this problem in the wild, install BIND
(e.g. dns/bind911), put the following into named.conf:

    options {
        allow-new-zones yes;
    };

and then run "named -g -c named.conf".  After it starts up, run
"named-nzd2nzf _default.nzd".  It will fail with:

    named-nzd2nzf: mdb_txn_begin: Invalid argument

My humble suggestion would be to revert the LMDB port back to
MDB_USE_POSIX_SEM for the time being, unless someone can immediately see
what the problem is and is able to fix it.

Hope this helps.  Please let me know if I can be of any further
assistance with this issue.

-- 
You are receiving this mail because:
You are the assignee for the bug.