Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations

From: Matthew L. Dailey <Matthew.L.Dailey_at_dartmouth.edu>
Date: Wed, 21 Aug 2024 14:29:30 UTC
Hi all,

I posted messages to this list back in February and March 
(https://lists.freebsd.org/archives/freebsd-current/2024-February/005546.html) 
regarding kernel panics we were having with nfs clients doing hdf5 file 
operations. After a hiatus in troubleshooting, I had more time this 
summer and have found the cause - the vfs.nfsd.enable_locallocks sysctl.

When this is set to 1, we can induce either a panic or, more rarely, a 
hung nfs server, usually within a few hours but sometimes only after 
several days to a week. We have replicated this on 13.0 through 15.0-CURRENT 
(20240725-82283cad12a4-271360). With this set to 0 (default), we are 
unable to replicate the issue, even after several weeks of 24/7 hdf5 
file operations.
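
For anyone comparing their own servers against this report, a minimal 
Python sketch for checking the current value of the sysctl might look like 
the following. It assumes the stock sysctl(8) utility is on the PATH; the 
helper names (parse_locallocks, locallocks_enabled) are hypothetical and 
are not part of any tooling mentioned here.

```python
import subprocess
from typing import Optional

# Sysctl name taken from this report.
SYSCTL_NAME = "vfs.nfsd.enable_locallocks"


def parse_locallocks(raw: str) -> bool:
    """Interpret the output of `sysctl -n` (e.g. "1\n") as a boolean."""
    return int(raw.strip()) != 0


def locallocks_enabled() -> Optional[bool]:
    """Return True/False for the current setting, or None if it cannot be read.

    None covers hosts without sysctl(8), hosts where the nfsd sysctl does
    not exist (nfsd not loaded, or not FreeBSD), and unparsable output.
    """
    try:
        result = subprocess.run(
            ["sysctl", "-n", SYSCTL_NAME],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    try:
        return parse_locallocks(result.stdout)
    except ValueError:
        return None


if __name__ == "__main__":
    state = locallocks_enabled()
    if state is None:
        print(f"{SYSCTL_NAME}: not available on this host")
    else:
        print(f"{SYSCTL_NAME} = {int(state)}")
```

A result of 1 corresponds to the configuration in which we see the panics; 
0 is the default.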

One other side effect of these panics is that on a few occasions they 
have corrupted the root zpool beyond repair. This makes sense since kernel 
memory is getting corrupted, but obviously makes this issue more impactful.

I'm hoping this is enough information to start narrowing down this 
issue. We are specifically using this sysctl because we are also serving 
files via samba and want to ensure consistent locking.

I have provided some core dumps and backtraces previously, but am happy 
to provide more as needed. I also have a writeup of exactly how to 
reproduce this that I can send directly to anyone who is interested.

Thanks so much for any and all help with this tricky problem. I'm happy 
to do whatever I can to help get this squashed.

Best,
Matt