Possible bug in zfs send or pipe implementation?

From: Garrett Wollman <wollman_at_bimajority.org>
Date: Sun, 14 Jul 2024 02:02:17 UTC
I'm migrating an old file server to new hardware using syncoid.  Every
so often, the `zfs send` process gets stuck with the following
kstacks:

 7960 108449 zfs                 -                   mi_switch sleepq_catch_signals sleepq_wait_sig _sleep pipe_write zfs_file_write_impl zfs_file_write dump_record dmu_dump_write do_dump dmu_send_impl dmu_send_obj zfs_ioc_send zfsdev_ioctl_common zfsdev_ioctl devfs_ioctl vn_ioctl devfs_ioctl_f 
 7960 126072 zfs                 send_traverse_threa mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_cb traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp 
 7960 126074 zfs                 send_merge_thread   mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_merge_thread fork_exit fork_trampoline 
 7960 126075 zfs                 send_reader_thread  mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_reader_thread fork_exit fork_trampoline 

Near as I can tell, the first thread is trying to write serialized
data to the output pipe and is blocked.  The other threads are stuck
in bqueue_enqueue_impl() because the writing thread isn't making
progress.

The process reading from the pipe (which is just a progress meter) is
sitting in select() waiting for the pipe to become ready, so either
zfs_file_write() is doing something wrong, or the pipe implementation
has lost a selwakeup() somewhere.  (Or, possible but unlikely, the
progress meter has dropped the read end of the pipe from its read
fd_set.)  Unfortunately, neither fstat nor procstat prints any useful
information about the state of the pipe, so I can only try to deduce
what's going on from the observable behavior.
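
For reference, the behavior I would expect can be sketched from
userland.  This is only an illustration, assuming an ordinary pipe
from pipe(2); it does not exercise the in-kernel zfs_file_write()
path, and fill_pipe is a made-up helper name.  If a writer can no
longer make progress because the pipe is full, select() on the read
end should report it as readable:

```python
import os
import select


def fill_pipe(wfd):
    """Write to wfd until the pipe refuses more data; return bytes written.

    The descriptor is switched to non-blocking mode so that a full pipe
    raises BlockingIOError instead of putting us to sleep in the kernel.
    """
    os.set_blocking(wfd, False)
    chunk = b"\0" * 4096
    total = 0
    while True:
        try:
            total += os.write(wfd, chunk)
        except BlockingIOError:
            # A blocking writer would be asleep at this point.
            return total


rfd, wfd = os.pipe()
written = fill_pipe(wfd)

# Zero timeout: just poll the current state of the read end.
readable, _, _ = select.select([rfd], [], [], 0)
reader_sees_data = rfd in readable

os.close(rfd)
os.close(wfd)
```

On a healthy system reader_sees_data is true, which is why a reader
parked in select() while the writer sleeps in pipe_write looks wrong
to me.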

-GAWollman