nfsd kernel threads won't die via SIGKILL

Wed Jun 27 01:05:12 UTC 2018

Konstantin Belousov wrote:
On Mon, Jun 25, 2018 at 02:04:32AM +0000, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Sat, Jun 23, 2018 at 09:03:02PM +0000, Rick Macklem wrote:
> >> During testing of the pNFS server I have been frequently killing/restarting the nfsd.
> >> Once in a while, the "slave" nfsd process doesn't terminate and a "ps axHl" shows:
> >>   0 48889     1   0   20  0  5884  812 svcexit  D     -   0:00.01 nfsd: server
> >>   0 48889     1   0   40  0  5884  812 rpcsvc   I     -   0:00.00 nfsd: server
> >> ... more of the same
> >>   0 48889     1   0   40  0  5884  812 rpcsvc   I     -   0:00.00 nfsd: server
> >>   0 48889     1   0   -8  0  5884  812 rpcsvc   I     -   1:51.78 nfsd: server
> >>   0 48889     1   0   -8  0  5884  812 rpcsvc   I     -   2:27.75 nfsd: server
> >>
> >> You can see that the top thread (the one that was created with the process) is
> >> stuck in "D"  on "svcexit".
> >> The rest of the threads are still servicing NFS RPCs.
[lots of stuff snipped]
>Signals are put onto a signal queue between a time where the signal is
>generated until the thread actually consumes it.  I.e. the signal queue
>is a container for the signals which are not yet acted upon.  There is
>one signal queue per process, and one signal queue for each thread
>belonging to the process.  When you signal the process, the signal is
>put into some thread's signal queue, where the only criterion for the
>selection of the thread is that the signal is not blocked.  Since
>SIGKILL is never blocked, it is put anywhere.
>
>Until the signal is delivered by the cursig()/postsig() loop, typically at the
>AST handler, the only consequences of its presence are the EINTR/ERESTART
>errors returned from the PCATCH-enabled sleeps.
Ok, now I think I understand how this works. Thanks a lot for the explanation.
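
To make the explanation above concrete, here is a minimal userspace sketch
of it. This is a toy model and not kernel code: the toy_* names are
hypothetical, the process-wide queue is left out, and only EINTR is modeled
(ERESTART is kernel-internal). It only illustrates that the signal lands in
one thread's queue and that just that thread's interruptible sleep fails.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: all toy_* names are hypothetical. */
struct toy_thread {
	unsigned long blocked;	/* bitmask of signals this thread blocks */
	unsigned long pending;	/* this thread's signal queue */
};

/*
 * Queue sig on the first thread that does not block it.
 * Returns the index of the chosen thread, or -1 if every thread blocks it
 * (the real kernel would use the process-wide queue in that case).
 */
static int
toy_post_signal(struct toy_thread *td, size_t n, int sig)
{
	unsigned long bit;
	size_t i;

	assert(sig > 0 && sig < 32);
	bit = 1UL << sig;
	for (i = 0; i < n; i++) {
		if ((td[i].blocked & bit) == 0) {
			td[i].pending |= bit;
			return ((int)i);
		}
	}
	return (-1);
}

/* Model of a PCATCH-enabled sleep: a queued signal interrupts it. */
static int
toy_sleep_sig(const struct toy_thread *td)
{
	return (td->pending != 0 ? EINTR : 0);
}
```

With three threads that block nothing, posting SIGKILL in this model queues
it on the first thread only, so the other two sleeps keep returning 0. That
is the same picture as the "ps" output, where the other threads went on
servicing RPCs.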

> >Your description at the start of the message of the behaviour after
> >SIGKILL, where other threads continued to serve RPCs, exactly matches
> >above explanation. You need to add some global 'stop' flag, if it is not
I looked at the code and there is already basically a "global stop flag".
It's done by setting the sg_state variable to CLOSING for all thread groups
in a function called svc_exit().  (I missed this when I looked before, so I
didn't understand how all the threads normally terminate.)
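
Roughly, the stop flag works like the sketch below. It is a simplified
userspace model and not the actual sys/rpc code: the toy_* names are
hypothetical, and the locking and wakeup of sleeping threads are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model; the toy_* names are hypothetical. */
enum toy_state { TOY_ACTIVE, TOY_CLOSING };

struct toy_group {
	enum toy_state sg_state;	/* per-group stop flag */
};

struct toy_pool {
	struct toy_group *groups;
	size_t ngroups;
};

/* Model of svc_exit(): mark every thread group as closing. */
static void
toy_svc_exit(struct toy_pool *pool)
{
	size_t g;

	assert(pool != NULL);
	for (g = 0; g < pool->ngroups; g++)
		pool->groups[g].sg_state = TOY_CLOSING;
}

/* Loop condition that each service thread re-checks after it wakes up. */
static int
toy_keep_running(const struct toy_group *grp)
{
	return (grp->sg_state != TOY_CLOSING);
}
```

Once toy_svc_exit() has run, toy_keep_running() is false for every group,
so every thread drops out of its loop the next time it checks.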

In svc_run_internal(), there is a loop, while (state != closing), that calls
cv_wait_sig()/cv_timedwait_sig(). When these return EINTR/ERESTART, svc_exit()
is called so that all the threads return from the function.
--> The only way it can get into the broken situation I see sometimes is if the
      top thread (called "ismaster" by the code) somehow returns from
      svc_run_internal() without calling svc_exit(), so that the state isn't set to
      "closing".

      Turns out there is only one place this can happen. It's this line:
                       if (grp->sg_threadcount > grp->sg_maxthreads)
                                break;
      I wouldn't have thought that sg_threadcount would have become greater than
      sg_maxthreads, but when I looked at the output of "ps" that I pasted into
      the first message, there are 33 threads. (When I started the nfsd, I specified
      32 threads, so I think it did the "break;" at this place to get out of the loop
      and return from svc_run_internal() without calling svc_exit().)

      I think changing the above line to:
                     if (!ismaster && grp->sg_threadcount > grp->sg_maxthreads)
      will fix it.
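
      To illustrate the two variants of the check, here is a small userspace
      model of the loop. It is a sketch and not the real svc_run_internal():
      the toy_* names are hypothetical, the sleep results are passed in as an
      array instead of coming from cv_wait_sig(), and only EINTR is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace model; the toy_* names are hypothetical. */
enum toy_state { TOY_ACTIVE, TOY_CLOSING };

struct toy_group {
	enum toy_state sg_state;
	int sg_threadcount;
	int sg_maxthreads;
};

/* Model of svc_exit() for a single group. */
static void
toy_svc_exit(struct toy_group *grp)
{
	grp->sg_state = TOY_CLOSING;
}

/*
 * Model of one thread running the service loop.  sleep_errors[] holds what
 * each modeled interruptible sleep returns.  use_fix selects the proposed
 * "!ismaster &&" form of the thread count check.
 */
static void
toy_run(struct toy_group *grp, bool ismaster, bool use_fix,
    const int *sleep_errors, size_t nsleeps)
{
	size_t i = 0;

	assert(grp != NULL);
	while (grp->sg_state != TOY_CLOSING && i < nsleeps) {
		bool too_many = grp->sg_threadcount > grp->sg_maxthreads;

		/* Leaves the loop without calling toy_svc_exit(). */
		if (use_fix ? (!ismaster && too_many) : too_many)
			break;

		/* A signal interrupted the sleep: tell all threads to stop. */
		if (sleep_errors[i++] == EINTR)
			toy_svc_exit(grp);
	}
}
```

      With 33 threads against a maximum of 32, the model's master thread
      returns with the state still active under the original check. With
      the "!ismaster &&" check it reaches the interrupted sleep and sets
      the state to closing, while a non-master thread still takes the
      early "break;".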

  I'll test this and see if I can get it to fail.

Thanks again for your help, rick