svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls
Rick Macklem
rmacklem at uoguelph.ca
Tue Jun 30 16:20:56 UTC 2020
Benjamin Kaduk wrote:
>On Tue, Jun 30, 2020 at 7:49 AM Rick Macklem <rmacklem at freebsd.org> wrote:
>Author: rmacklem
>Date: Tue Jun 30 14:49:51 2020
>New Revision: 362798
>URL: https://svnweb.freebsd.org/changeset/base/362798
>
>Log:
> Testing when a server does not respond to TLS handshake records exposed
> a couple of problems, since the daemon would be in SSL_connect() for 6 minutes.
>
> - When the upcall timed out and was retried, the RPCTLS_SYSC_CLSOCKET syscall
> was broken and did not return an error upon a retry. It allocated a file
> descriptor for a NULL socket.
> - The socket structure in the kernel could be free'd while the daemon was
> still using it in SSL_connect().
> - Adjust the timeout and retry count so that upcalls are only attempted once
> with a 10 minute timeout.
>
>
>10 minutes seems really long! It sounds from the description like the upcall so
>that userspace can run SSL_connect() was taking 6 minutes, and you needed 10
>minutes so as to be longer than the 6 minutes that is "out of your control"?
Well, I think a long timeout here is ok, since a timeout indicates a broken daemon.
(The upcalls to the local daemon should be reliable and cannot safely be redone.
In a perfect world, the upcall mechanism would be "exactly once" instead of
"at least once". I think an upcall might fail when the mbuf pool in the kernel
is exhausted, but that should be rare.)
>I feel like there should be some sockopts available to get the SSL_connect() timeout
>down, so that the upcall timeout doesn't need to be so long, either.
Yes, 6 minutes does seem like a long time. I only discovered this yesterday when
I simulated a server that did not respond to handshake records.
I haven't yet dug into the openssl code to see if there is a way to adjust this
timeout.
I also do not know what a good timeout value for SSL_connect() might be,
even if the daemon can override the default.
In practice, this should only happen when trying to do an NFS mount on
a broken server which responds to the "STARTTLS" Null RPC, but does not
do the handshake.
Having the mount attempt stuck for 6 minutes before failing is not that serious
a problem, imho.
(When systems boot after something like a power failure, delays getting NFS
mounts done, due to the NFS server/network needing to be up, are fairly
normal. The "-b" option to put the mount attempt in background has been
around for a long time for this.)
If you happen to know how to set a timeout for SSL_connect() in the openssl
library, I would be interested in hearing that.
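
(One candidate is the sockopt route Ben mentions. On a blocking socket,
SSL_connect() ends up blocked in read()/write() on the underlying descriptor,
so setting SO_RCVTIMEO/SO_SNDTIMEO on the socket before the handshake should
bound each of those calls. The sketch below is untested against the daemon and
makes some assumptions: the helper names set_io_timeout() and
demo_silent_peer() are made up for illustration, and it only demonstrates the
socket-level timeout against a peer that never answers, using a socketpair
rather than OpenSSL itself.)

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: bound every blocking read/write on fd to secs
 * seconds. The daemon would call something like this on the socket
 * before handing it to SSL_connect().
 */
static int
set_io_timeout(int fd, int secs)
{
	struct timeval tv = { .tv_sec = secs, .tv_usec = 0 };

	assert(secs > 0);
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		return (-1);
	return (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)));
}

/*
 * Simulate a server that never sends handshake records: read from one
 * end of a socketpair while the other end stays silent.
 * Returns 1 if the read timed out, 0 if it did not, -1 on setup failure.
 */
static int
demo_silent_peer(int secs)
{
	int sv[2], timed_out;
	char buf[16];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return (-1);
	if (set_io_timeout(sv[0], secs) < 0) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	/* Without the timeout this read() would block indefinitely. */
	n = read(sv[0], buf, sizeof(buf));
	timed_out = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
	close(sv[0]);
	close(sv[1]);
	return (timed_out);
}
```

Note that these options bound each individual read or write, not the handshake
as a whole, so a server that trickles out partial records could still hold
SSL_connect() for longer than the configured value.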
rick
-Ben