svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls
Rick Macklem
rmacklem at uoguelph.ca
Wed Jul 1 22:47:31 UTC 2020
Benjamin Kaduk wrote:
>On Wed, Jul 01, 2020 at 01:23:50AM +0000, Rick Macklem wrote:
>> Rick Macklem wrote:
>> >Benjamin Kaduk wrote:
>> >>On Tue, Jun 30, 2020 at 04:20:45PM +0000, Rick Macklem wrote:
>> >>> If you happen to know how to set a timeout for SSL_connect() in the openssl
>> >>> library, I would be interested in hearing that.
>> >>
>> >>As it happens, I took a look before I wrote the initial note, and there
>> >>doesn't seem to be any intrinsic TLS (not DTLS) handshake timeouts in
>> >>libssl itself; I expect this is actually just the (kernel's!) TCP timeout.
>> >>So you'd be getting the socket fd (e.g., SSL_get_fd(), if you don't have a
>> >>reference already) and using setsockopt() to set the timeout(s).
>> >Interesting. The test case I simulated did not close the TCP socket used by
>> >SSL_connect(). The server just replied to the STARTTLS Null RPC, but did not
>> >call SSL_accept(), so the server side just isn't playing "handshake".
>> >"netstat -a" showed the connection as ESTABLISHED.
>> >During debugging, I also used the trick of putting:
>> >    while (1)
>> >        sleep(1);
>> >right after the SSL_connect() call and, when watching it via "ps",
>> >it would switch from "sbwait" to "nanoslp" after 6 minutes and
>> >a syslog() call showed that SSL_connect() had returned -1.
>> >
>> >So, if the TCP connection was "established", what caused the SSL_connect()
>> >to return with an error (-1) after 6 minutes?
>> >
>> >Now, there is a 6 minute idle timeout in the RPC code for TCP where it,
>> >by default, closes the connection when there is 6 minutes without any
>> >activity. (I have to look if waiting for a reply for the upcall implies
>> >"no activity" and if this also happens for AF_LOCAL sockets, which is what
>> >the upcalls use.)
>> Ok, I figured out what is happening for this test.
>> It is the 6 minute idle timeout, but it occurs at the server end, where the NFS server
>> end shuts down the TCP connection.
>
>Ah, that makes sense.
>
>> Now, the client cannot assume all servers will do this.
>
>Right.
>
>> I'm going to try playing around with doing a shutdown of the socket on the
>> client end after a shorter timeout on the upcall and see if that can get
>> SSL_connect() to return with a failure in the daemon.
>>
>> >Now, if that happens, a SIGPIPE would be posted to the daemon, which
>> >is SIG_IGN'd by the daemon. But maybe the SIGPIPE somehow causes
>> >SSL_connect() to return -1 by making the syscall it is doing (read/recv on the
>> >TCP socket sitting in sbwait) return EINTR, or something like that?
>> Ignore this "theory". It was bunk.
>
>Non-ignored signals would cause SSL_connect() to return, but ignored ones
>should be wholly ignored, yes.
>
>> >I can change this 6minute timeout to see if that affects it.
>> Can't be changed, since it is at the server end of the TCP connection.
>
>Can't you set a client-side (e.g., read) timeout, though?
Well, in this case it would be the read (or recv or ??) that is done inside the
SSL_connect().
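
For reference, a minimal userland sketch of the client-side read timeout
suggested above, under stated assumptions: the helper names
set_recv_timeout() and silent_peer_times_out() are hypothetical, and a
socketpair() that is never written to stands in for a server that answers
"STARTTLS" but never handshakes. In the daemon, the descriptor would
presumably come from SSL_get_fd(). How SSL_connect() reports the timed-out
read has not been verified here.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <errno.h>
#include <unistd.h>

/*
 * Set a receive timeout on a socket.  For the TLS case, fd would be the
 * descriptor returned by SSL_get_fd(ssl) (assumption: the daemon uses a
 * blocking socket for the handshake).
 */
static int
set_recv_timeout(int fd, time_t seconds)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	return (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)));
}

/*
 * Stand-in for a peer that never sends its half of the handshake: create
 * a connected socket pair, never write to it, and do a blocking recv().
 * Returns 1 if recv() failed because the timeout expired, 0 if it
 * returned some other way, and -1 if the setup failed.
 */
static int
silent_peer_times_out(time_t seconds)
{
	char buf[16];
	ssize_t n;
	int ret, sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (set_recv_timeout(sv[0], seconds) == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}

	/* Nothing is ever written to sv[1], so this can only time out. */
	n = recv(sv[0], buf, sizeof(buf), 0);
	ret = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) ? 1 : 0;

	close(sv[0]);
	close(sv[1]);
	return (ret);
}
```

Calling silent_peer_times_out(1) should block for about one second and then
report the timeout. Inside SSL_connect() the same failed read would be seen
by libssl rather than by the caller, so the error that comes back from
SSL_connect() may look different from a plain EAGAIN.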
The timer I can control is the one that times out the upcall RPC to the
userland daemon. I had set it to 10 minutes so that SSL_connect() would
time out first, but now I know that won't always happen.
This timer is now set to 15 seconds and, after it times out, the kernel
code does a soshutdown(so, SHUT_RD) in the client, which seems to be
sufficient to get SSL_connect() to return an error.
This works ok for the testing I've done.
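
The effect of the read-side shutdown can be illustrated in userland. This is
only an analogue of the kernel's soshutdown(so, SHUT_RD), not the krpc code:
the function name recv_after_local_shutdown() is hypothetical, a socketpair()
stands in for the TCP connection, and a forked child plays the role of the
upcall timeout.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

#include <unistd.h>

/*
 * Block in recv() on a socket whose peer never sends anything, while a
 * child process shuts down the read side of that same socket after
 * delay_sec seconds.  Returns what recv() returned to the blocked
 * reader, or -2 if the setup failed.
 */
static ssize_t
recv_after_local_shutdown(unsigned int delay_sec)
{
	char buf[16];
	pid_t pid;
	ssize_t n;
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-2);

	pid = fork();
	if (pid == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-2);
	}
	if (pid == 0) {
		/*
		 * Child: stands in for the timeout handler.  The inherited
		 * descriptor refers to the same socket, so the shutdown
		 * affects the parent's blocked recv().
		 */
		sleep(delay_sec);
		shutdown(sv[0], SHUT_RD);
		_exit(0);
	}

	/*
	 * Parent: sleeps here like the daemon inside SSL_connect().  The
	 * parent keeps sv[1] open, so the peer never closes and only the
	 * shutdown can end the wait.
	 */
	n = recv(sv[0], buf, sizeof(buf), 0);

	waitpid(pid, NULL, 0);
	close(sv[0]);
	close(sv[1]);
	return (n);
}
```

With a one second delay, recv_after_local_shutdown(1) should return 0 (end of
file) once the child has done the shutdown. A read that returns 0 in the
middle of a handshake is presumably what makes SSL_connect() give up and
return an error.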
15 seconds is pretty arbitrary, but I figure a timeout on the order of
seconds is reasonable for RPC upcalls to the local daemon. (I'd guess that
taking even 1 second to do an upcall would indicate something is broken.)
If others feel 15 seconds isn't an appropriate timeout, feel free to comment.
(Note that this timeout should only happen when something is broken, like
a server that sends a "STARTTLS" reply but does not do a TLS handshake.)
Thanks for the comments, rick
-Ben
> (A comment in the krpc code mentions a 5minute timeout in the client,
> but I don't see that in the code?)
>
> >When you've got upcalls and library functions both talking to sockets it
> >can get interesting.
> >
> >Thanks for the comments, rick
>
> Correcting myself, rick
>
> -Ben
>