Re: Possible bug in zfs send or pipe implementation?

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Sun, 14 Jul 2024 03:58:07 UTC
On Sat, Jul 13, 2024 at 8:50 PM Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Sat, Jul 13, 2024 at 8:19 PM Garrett Wollman <wollman@bimajority.org> wrote:
> >
> > <<On Sat, 13 Jul 2024 19:42:32 -0700, Rick Macklem <rick.macklem@gmail.com> said:
> >
> > > # ps axHl
> > > should show you what wchan's the processes are waiting on and that might
> > > give you a clue w.r.t. what is happening?
> >
> > zfs is waiting to write into the pipe and pv (the progress meter) is
> > waiting in select.
> Just to clarify it, are you saying zfs is sleeping on "pipewr"?
If I am reading the code correctly, if it is sleeping on "pipewr", it is
out of space, and that is controlled via:
kern.ipc.maxpipekva
and you can see what it is using by looking at
kern.ipc.pipekva
(Unfortunately, I don't think you can change kern.ipc.maxpipekva on the fly.
It looks like it is a loader tunable, so you'd need to reboot to make
it larger.)

Anyhow, you can take a look at the sysctls. They might help?
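
As a rough way to compare the two values, here is a minimal sketch in
Python. It assumes sysctl(8) prints its usual "name: value" lines for both
names; the helper names parse_sysctl and pipe_kva_usage are made up for
illustration and are not part of any FreeBSD tool.

```python
import subprocess


def parse_sysctl(text):
    """Parse 'name: value' lines (sysctl(8) default output) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Keep only lines with a separator and a plain non-negative integer value.
        if sep and value.isdigit():
            values[name.strip()] = int(value)
    return values


def pipe_kva_usage(values):
    """Return the fraction of kern.ipc.maxpipekva currently in use."""
    used = values["kern.ipc.pipekva"]
    limit = values["kern.ipc.maxpipekva"]
    if limit <= 0:
        raise ValueError("kern.ipc.maxpipekva is not positive")
    return used / limit


if __name__ == "__main__":
    # Only meaningful on a FreeBSD host where both sysctls exist.
    out = subprocess.run(
        ["sysctl", "kern.ipc.pipekva", "kern.ipc.maxpipekva"],
        capture_output=True, text=True, check=True,
    ).stdout
    vals = parse_sysctl(out)
    print(f"pipe KVA in use: {pipe_kva_usage(vals):.1%}")
```

If the reported fraction sits close to 100% while zfs is stuck, that would
be consistent with the pipe KVA limit being the problem, though it does not
prove it.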

There is quite a detailed comment in sys/kern/sys_pipe.c related to this.

rick

> (There is also a msleep() for "pipbww" in pipe_write().)
>
> rick
>
> >
> > > If it is easy to build a kernel from sources and boot that, you could try defining
> > > PIPE_NODIRECT in sys/kern/sys_pipe.c and see if that avoids the hangs?
> >
> > It's easy to build a kernel from sources, but not easy to reboot the
> > server -- it's being retired shortly, and because of time constraints
> > I need to get it drained before the next scheduled outage.
> >
> > -GAWollman
> >