NFS: rpcsec_gss with Linux clients
Attila Bogár
attila.bogar at linguamatics.com
Fri Aug 31 15:39:48 UTC 2012
Hi,
In the Wireshark trace I see that, during an NFS mount, Linux opens two
TCP connections.
Linux creates the GSS context on one TCP connection and sends a DESTROY
for the RPCSEC_GSS context, but immediately (without waiting for the
DESTROY reply) starts reusing the context on the other TCP connection.
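
Roughly, the ordering in the trace looks like this (a paraphrase of my
capture, not a verbatim dump; the connection labels are mine):

    conn 1: RPCSEC_GSS_INIT            -> GSS context established
    conn 1: RPCSEC_GSS_DESTROY call    -> server tears the context down
    conn 2: NFS PUTROOTFH with the same context handle, already in
            flight before the DESTROY reply comes back
    conn 1: RPCSEC_GSS_DESTROY reply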
I don't know whether BSD or Linux (or both) is at fault, as I haven't
spent time reading the RFCs.
This is very difficult to reproduce if the server is very fast; you
have to use an extremely fast client. With a Linux virtual machine I
couldn't reproduce it at all. Even printf's in the BSD kernel disturb
the timing and everything suddenly starts to work. This is a quantum
bug.
Look at /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c, in svc_rpc_gss(),
under case RPCSEC_GSS_DESTROY: svc_rpc_gss_validate() returns FALSE
during the DESTROY. I don't quite know why, but during the destroy,
inside svc_rpc_gss_validate(), gss_verify_mic() returns maj_stat =
GSS_S_DEFECTIVE_TOKEN, no matter which Heimdal version I use.
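
For context: svc_rpc_gss_validate() re-serializes the RPC call header
into the fixed rpchdr[] buffer and has GSSAPI check the client's MIC
over it. Roughly (my paraphrase of the code, not a verbatim quote):

    XDR xdrs;

    /* re-encode the call header (xid, prog, vers, proc, cred, ...) */
    xdrmem_create(&xdrs, rpchdr, sizeof(rpchdr), XDR_ENCODE);
    /* ... header fields encoded here ... */
    rpcbuf.value = rpchdr;
    rpcbuf.length = XDR_GETPOS(&xdrs);

    /* the verifier the client sent is the MIC to be checked */
    checksum.value = msg->rm_call.cb_verf.oa_base;
    checksum.length = msg->rm_call.cb_verf.oa_length;

    maj_stat = gss_verify_mic(&min_stat, client->cl_ctx, &rpcbuf,
        &checksum, &qop_state);

(The rpchdr[] change in the patch below is related: it's meant to rule
out the reconstructed header overflowing that buffer.)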
As a consequence, client->cl_state is marked CLIENT_STALE;
I think client locking should have been used at this point.
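
Once the client is stale, the next request that finds it is bounced
instead of executed; the effect in svc_rpc_gss() is roughly this shape
(paraphrased, the exact control flow may differ):

    if (client->cl_state == CLIENT_STALE) {
            /* reject: conn 2's PUTROOTFH gets an auth error */
            result = RPCSEC_GSS_CREDPROBLEM;
            goto out;
    }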
In the meantime the next TCP connection's NFS PUTROOTFH request is
being processed in the kernel, and this is the point where the problem
may or may not happen. At the beginning of svc_rpc_gss(),
svc_rpc_gss_timeout_clients() is called. If it happens to run before
svc_rpc_gss_validate() marks the cl_state CLIENT_STALE, the Linux
client survives.
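
So the outcome depends on which interleaving you get (my reading,
simplified):

    survives: conn 2's request passes validation while the client is
              still usable; the DESTROY on conn 1 marks it
              CLIENT_STALE only afterwards
    fails:    conn 1's DESTROY marks the client CLIENT_STALE first;
              conn 2's PUTROOTFH then finds a stale client and the
              mount gets an auth error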
Here is my patch for review. This is my first ever kernel patch.
I'm going to open a PR...
Constructive comments are welcome.
Thanks,
Attila
--- /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c.orig	2012-08-30 23:34:00.000000000 +0100
+++ /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c	2012-08-31 15:59:40.000000000 +0100
@@ -565,7 +565,8 @@
 	 */
 	client->cl_state = CLIENT_NEW;
 	client->cl_locked = FALSE;
-	client->cl_expiration = time_uptime + 5*60;
+	/* we are now more cautious */
+	client->cl_expiration = time_uptime + 4*60;
 
 	return (client);
 }
@@ -930,7 +931,11 @@
 	if (cred_lifetime == GSS_C_INDEFINITE)
 		cred_lifetime = time_uptime + 24*60*60;
 
-	client->cl_expiration = time_uptime + cred_lifetime;
+	/*
+	 * We are now more cautious;
+	 * 12 sec is just an ad hoc hack value.
+	 */
+	client->cl_expiration = time_uptime + cred_lifetime - 12;
 
 	/*
 	 * Fill in cred details in the rawcred structure.
@@ -990,7 +995,7 @@
 	gss_buffer_desc rpcbuf, checksum;
 	OM_uint32 maj_stat, min_stat;
 	gss_qop_t qop_state;
-	int32_t rpchdr[128 / sizeof(int32_t)];
+	int32_t rpchdr[2048 / sizeof(int32_t)];
 	int32_t *buf;
 
 	rpc_gss_log_debug("in svc_rpc_gss_validate()");
@@ -1024,7 +1029,12 @@
 	if (maj_stat != GSS_S_COMPLETE) {
 		rpc_gss_log_status("gss_verify_mic", client->cl_mech,
 		    maj_stat, min_stat);
-		client->cl_state = CLIENT_STALE;
+		/*
+		 * Linux nfs-utils >= 1.2.3 re-uses the GSS context on the
+		 * other TCP NFS connection after it has DESTROYED it.
+		 * The garbage collector removes the client at cl_expiration.
+		 */
+		/* client->cl_state = CLIENT_STALE; */
 		return (FALSE);
 	}
 