Re: Possible bug in zfs send or pipe implementation?
Date: Sun, 14 Jul 2024 02:42:32 UTC
On Sat, Jul 13, 2024 at 7:02 PM Garrett Wollman <wollman@bimajority.org> wrote:
>
> I'm migrating an old file server to new hardware using syncoid.  Every
> so often, the `zfs send` process gets stuck with the following
> kstacks:
>
> 7960 108449 zfs - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep pipe_write zfs_file_write_impl zfs_file_write dump_record dmu_dump_write do_dump dmu_send_impl dmu_send_obj zfs_ioc_send zfsdev_ioctl_common zfsdev_ioctl devfs_ioctl vn_ioctl devfs_ioctl_f
> 7960 126072 zfs send_traverse_threa mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_cb traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp
> 7960 126074 zfs send_merge_thread mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_merge_thread fork_exit fork_trampoline
> 7960 126075 zfs send_reader_thread mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_reader_thread fork_exit fork_trampoline
>
> Near as I can tell, the first thread is trying to write
> serialized data to the output pipe and is blocked.  The other
> threads are stuck because the write process isn't making progress.
# ps axHl
should show you what wchans the processes are waiting on, and that
might give you a clue as to what is happening.

If it is easy to build a kernel from sources and boot that, you could
try defining PIPE_NODIRECT in sys/kern/sys_pipe.c and see if that
avoids the hangs.

rick

>
> The process reading from the pipe (which is just a progress meter) is
> sitting in select() waiting for the pipe to become ready, so either
> zfs_file_write() is doing something wrong, or the pipe implementation
> has lost a selwakeup() somewhere.  (Or, possibly but unlikely, the
> progress meter has lost the read end of the pipe from its read
> fd_set.)  Unfortunately, neither fstat nor procstat prints any useful
> information about the state of the pipe, so I can only try to deduce
> what's going on from the observable behavior.
>
> -GAWollman
>
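The reasoning about select() rests on one invariant: whenever a writer would block because the pipe buffer is full, select() on the read end must report it readable. A minimal sketch of that invariant, checked from user space, is below. It is written in Python only for brevity; the helper names (fill_pipe, drain_pipe, readiness, probe_pipe_readiness) are hypothetical, and it exercises only the ordinary user-space pipe path, not the in-kernel zfs_file_write() path or FreeBSD's direct-write mode, so it cannot reproduce the hang itself.

```python
import os
import select

# Hypothetical helpers: this exercises the ordinary user-space pipe path
# only, not FreeBSD's in-kernel zfs_file_write() or pipe direct writes.

CHUNK = b"x" * 4096


def fill_pipe(wfd: int) -> int:
    """Write until a non-blocking write would block; return bytes written."""
    os.set_blocking(wfd, False)
    total = 0
    while True:
        try:
            total += os.write(wfd, CHUNK)
        except BlockingIOError:
            # A blocking writer would now sleep, as pipe_write() did.
            return total


def drain_pipe(rfd: int) -> int:
    """Read until a non-blocking read would block; return bytes read."""
    os.set_blocking(rfd, False)
    total = 0
    while True:
        try:
            data = os.read(rfd, 65536)
        except BlockingIOError:
            return total
        if not data:  # EOF; cannot happen while the write end is open
            return total
        total += len(data)


def readiness(rfd: int, wfd: int) -> tuple[bool, bool]:
    """Zero-timeout select(): (read end readable, write end writable)."""
    readable, writable, _ = select.select([rfd], [wfd], [], 0)
    return rfd in readable, wfd in writable


def probe_pipe_readiness():
    rfd, wfd = os.pipe()
    try:
        filled = fill_pipe(wfd)
        when_full = readiness(rfd, wfd)   # expect (True, False)
        drained = drain_pipe(rfd)
        when_empty = readiness(rfd, wfd)  # expect (False, True)
        return filled, drained, when_full, when_empty
    finally:
        os.close(rfd)
        os.close(wfd)


if __name__ == "__main__":
    filled, drained, when_full, when_empty = probe_pipe_readiness()
    print(f"buffered {filled} bytes; full={when_full}; empty={when_empty}")
```

A writer asleep in pipe_write() alongside a reader asleep in select() on the same pipe would violate the first expectation, which is why a lost selwakeup() or a problem on the kernel writer's path are the natural suspects.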