sosend returning ERESTART

Thu Jan 19 22:55:52 UTC 2017

Konstantin Belousov wrote:
>On Wed, Jan 18, 2017 at 10:52:02PM +0000, Rick Macklem wrote:
>> Colin Percival wrote:
>> >On 01/18/17 02:36, Konstantin Belousov wrote:
>> >> On Wed, Jan 18, 2017 at 04:37:40AM +0000, Colin Percival wrote:
>> >>> Thanks, looks like that was exactly it -- if the TCP send buffer was full
>> >>> we would call sbwait, and if a signal arrived it would return ERESTART.
>> >>> It looks like setting the SB_NOINTR flag will prevent this; I'm testing a
>> >>> patch right now.
>> >>
>> >> Note that passing SB_NOINTR unconditionally, or even only for mounts
>> >> with the nointr (default) option, is wrong. You make the socket operation
>> >> uninterruptible, so process terminate-ability becomes dependent on an
>> >> external factor, the behaviour of the remote system.
>> >
>> >I'm not sure what you're getting at here.  The fact that "NFS mounted without
>> >the intr flag" + "unresponsive NFS server" = "unkillable processes" has been
>> >a (mis)feature of NFS for decades.
>> The case I would like to see work is the forced dismount. I need to go look at
>> what it does and see if SB_NOINTR would break it worse than it is broken now.
>> (It is currently broken when something like "umount" without -f is done, which
>>  locks up the mounted-on vnode so "umount -f" never gets to the umount(2) syscall.
>>  I do plan on a "straight to NFS" option for umount(8) to avoid this problem, but
>>  haven't gotten around to it.)
>>
>> The alternative to SB_NOINTR is looping and doing the sosend() again for the
>> case where it returns ERESTART and "intr" wasn't set on the mount.
>Note that the condition of pending signal which triggered ERESTART is
>permanent until the signal is delivered or blocked. In other words, all
>future PCATCH sleeps will fail with ERESTART/EINTR.
Right. But presumably if the TCP connection is still working, a subsequent
attempt will not have to sleep in sblock() or sbwait() in sosend() and will
succeed?
I think Colin was already testing this looping version before SB_NOINTR and
found it worked well for his case.
--> I think this does imply that it should only loop N times and then give up
      and reply RPC_CANTSEND (which is what it does the first time now).
      - The RPC_CANTSEND is what triggers the client to create a new TCP connection,
        and this is what causes grief for his mounts against the AmazonEFS server
        (which is broken, because the new TCP connection often results in an
         NFS4ERR_BAD_SESSION, which should not happen.)

Colin, have you tested the "loop on ERESTART" version of the patch?
And maybe you could add a loop counter to limit the number of iterations?
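
Something like the following is what I have in mind for the bounded loop. This
is only a userland sketch, not the actual krpc code: sk_send_fn stands in for
the sosend() call, SK_ERESTART mirrors the kernel's ERESTART value,
SK_CANTSEND stands in for RPC_CANTSEND, and SK_MAX_TRIES is a made-up limit
for "N". All of the sk_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel values; every sk_/SK_ name here is hypothetical. */
#define SK_ERESTART	(-1)	/* mirrors the kernel's ERESTART */
#define SK_MAX_TRIES	5	/* assumed iteration limit "N" */

enum sk_stat { SK_SUCCESS, SK_CANTSEND };

/* Stand-in for sosend(): returns 0 on success, else an errno-style value. */
typedef int (*sk_send_fn)(void *arg);

/*
 * Retry the send while it fails with SK_ERESTART, at most max_tries times.
 * Any other error, or running out of tries, maps to SK_CANTSEND.
 */
static enum sk_stat
sk_send_with_retry(sk_send_fn send, void *arg, int max_tries, int *tries_out)
{
	int error = SK_ERESTART;
	int tries = 0;

	assert(send != NULL && max_tries > 0);
	while (tries < max_tries) {
		tries++;
		error = send(arg);
		if (error != SK_ERESTART)
			break;		/* success or a hard error */
	}
	if (tries_out != NULL)
		*tries_out = tries;
	return (error == 0 ? SK_SUCCESS : SK_CANTSEND);
}

/* Fake sender for testing: fails with SK_ERESTART until *arg reaches 0. */
static int
sk_fake_send(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (SK_ERESTART);
	}
	return (0);
}
```

With sk_fake_send set to fail twice, the third attempt succeeds. If it keeps
failing, the loop gives up after SK_MAX_TRIES attempts and reports
SK_CANTSEND. The real code would presumably only loop when "intr" is not set
on the mount.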

rick
[stuff snipped]