lockf() vs. flock() -- lockf() not locking?

Mon Apr 6 21:18:14 UTC 2015

Recently an application I use switched from flock() to lockf() for advisory file locking. The change is in the code that protects a file shared and updated by multiple processes (not threads in a single process) against concurrent writes. The code seems reliable: a lock manager class opens the file and obtains the lock; the read/update method then opens the file using a separate file descriptor, reads/writes the file, and flushes and closes that second descriptor; finally the lock manager object is destroyed, which unlocks the file and closes the first descriptor.

Surprisingly, this simple change seems to have made the code unreliable: it allows concurrent writers to the file, which corrupts its contents:

-    if (flock(fd, LOCK_EX) != 0)
+    if (lockf(fd, F_LOCK, 0) != 0)
         throw std::runtime_error("Failed to get a lock of " + filename);

. . .
     if (fd != -1) {
-        flock(fd, LOCK_EX);
+        lockf(fd, F_ULOCK, 0);
         close(fd);
         fd = -1;
     }
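To make the sequence of descriptors in the lock manager pattern easier to follow, here is a minimal sketch of it in its original flock() form. The names `LockManager` and `locked_rewrite` are hypothetical (the application's real class and method names aren't shown). The update step is assumed to simply replace the file's contents, and the error messages only mirror the one in the diff.

```cpp
#include <fcntl.h>      // open, O_* flags
#include <stdexcept>    // std::runtime_error
#include <string>       // std::string
#include <sys/file.h>   // flock, LOCK_EX, LOCK_UN
#include <unistd.h>     // close, write, fsync

// Hypothetical lock manager: opens its own descriptor and holds an
// exclusive flock() on it for the lifetime of the object.
class LockManager {
public:
    explicit LockManager(const std::string& filename)
        : fd(open(filename.c_str(), O_RDWR | O_CREAT, 0644)) {
        if (fd == -1)
            throw std::runtime_error("Failed to open " + filename);
        if (flock(fd, LOCK_EX) != 0) {
            close(fd);
            throw std::runtime_error("Failed to get a lock of " + filename);
        }
    }
    ~LockManager() {
        if (fd != -1) {
            flock(fd, LOCK_UN);  // release the lock explicitly...
            close(fd);           // ...then close the locking descriptor
            fd = -1;
        }
    }
    LockManager(const LockManager&) = delete;
    LockManager& operator=(const LockManager&) = delete;

private:
    int fd;
};

// Hypothetical update step: while a LockManager is alive, replace the
// file's contents through a second, separate descriptor.
void locked_rewrite(const std::string& filename, const std::string& contents) {
    LockManager lock(filename);                           // first descriptor, locked
    int fd = open(filename.c_str(), O_WRONLY | O_TRUNC);  // second descriptor
    if (fd == -1)
        throw std::runtime_error("Failed to open " + filename);
    bool ok = write(fd, contents.data(), contents.size()) ==
                  static_cast<ssize_t>(contents.size()) &&
              fsync(fd) == 0;                             // flush before closing
    close(fd);                                            // close the second descriptor
    if (!ok)
        throw std::runtime_error("Failed to update " + filename);
}   // ~LockManager runs here: unlock, then close the first descriptor
```

In this sketch the lock on the first descriptor is held for the whole rewrite, and the second descriptor is opened and closed entirely inside that window.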

From my reading of the lockf(3) man page and a review of its implementation in lib/libc/gen/lockf.c and the corresponding code in sys/kern/kern_descrip.c, it appears that lockf() should obtain an advisory lock over the whole file, just as a successful flock() did. However, I have a stress test that quickly corrupts the target file when using lockf(), but cannot cause corruption when using flock(). I've instrumented the code, and it's clear that multiple processes are simultaneously inside the block of code after the "lockf(fd, F_LOCK, 0)" line.
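A stress test of the kind mentioned might look roughly like the following sketch. Its shape is an assumption, not the actual harness: `run_stress`, `locked_increment`, `set_lock`, `LockKind` and the decimal counter file are all hypothetical. Each forked child repeats the same descriptor sequence as the lock manager: lock through one descriptor, read/update through a second, close the second, then unlock and close the first. A lost update shows up as a final count below workers * iterations.

```cpp
#include <cassert>      // assert (precondition check)
#include <cstdlib>      // std::strtol
#include <string>       // std::string, std::to_string
#include <vector>       // std::vector
#include <fcntl.h>      // open, O_* flags
#include <sys/file.h>   // flock, LOCK_*
#include <sys/wait.h>   // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // fork, lockf, read, write, lseek, ftruncate, close, _exit

// Which locking primitive a run exercises (hypothetical switch).
enum class LockKind { Flock, Lockf };

// Take or release an exclusive lock on fd. For lockf(), a size of 0 covers
// the current offset (0 on a freshly opened fd) through the largest offset.
static bool set_lock(int fd, LockKind kind, bool lock) {
    if (kind == LockKind::Flock)
        return flock(fd, lock ? LOCK_EX : LOCK_UN) == 0;
    return lockf(fd, lock ? F_LOCK : F_ULOCK, 0) == 0;
}

// One critical section: lock via a first descriptor, increment the counter
// via a second descriptor, close the second, then unlock and close the first.
static bool locked_increment(const char* path, LockKind kind) {
    int lock_fd = open(path, O_RDWR);
    if (lock_fd == -1) return false;
    if (!set_lock(lock_fd, kind, true)) { close(lock_fd); return false; }

    bool ok = false;
    int fd = open(path, O_RDWR);
    if (fd != -1) {
        char buf[32] = {0};
        ssize_t n = read(fd, buf, sizeof buf - 1);
        if (n >= 0) {
            std::string out = std::to_string(std::strtol(buf, nullptr, 10) + 1);
            ok = lseek(fd, 0, SEEK_SET) == 0 &&
                 write(fd, out.data(), out.size()) == static_cast<ssize_t>(out.size()) &&
                 ftruncate(fd, static_cast<off_t>(out.size())) == 0;
        }
        close(fd);
    }
    set_lock(lock_fd, kind, false);
    close(lock_fd);
    return ok;
}

// Fork `workers` processes that each do `iterations` locked increments, then
// return the final counter value, or -1 if anything failed.
long run_stress(const char* path, LockKind kind, int workers, int iterations) {
    assert(workers > 0 && iterations > 0);
    int init = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);  // empty file reads as 0
    if (init == -1) return -1;
    close(init);

    std::vector<pid_t> children;
    bool all_ok = true;
    for (int w = 0; w < workers; ++w) {
        pid_t pid = fork();
        if (pid == -1) { all_ok = false; break; }
        if (pid == 0) {                       // child: hammer the counter
            for (int i = 0; i < iterations; ++i)
                if (!locked_increment(path, kind)) _exit(1);
            _exit(0);
        }
        children.push_back(pid);
    }
    for (pid_t pid : children) {              // parent: reap every child
        int status = 0;
        if (waitpid(pid, &status, 0) == -1 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            all_ok = false;
    }

    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    char buf[32] = {0};
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (!all_ok || n < 0) return -1;
    return std::strtol(buf, nullptr, 10);
}
```

For example, `run_stress("/tmp/counter", LockKind::Flock, 4, 200)` should return 800; passing `LockKind::Lockf` instead runs the same sequence through lockf() for comparison.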

Am I missing something obvious? Any ideas?

Thanks,
Guy