NFS: rpcsec_gss with Linux clients

Attila Bogár attila.bogar at
Fri Aug 31 15:39:48 UTC 2012


In the Wireshark trace I can see that during an NFS mount, Linux opens two
TCP connections.
Linux creates the GSS context on one TCP connection and sends a DESTROY
to destroy the RPCSEC_GSS context, but immediately (without waiting for
the DESTROY reply) reuses the context on the other TCP connection.

I don't know who is at fault, FreeBSD or Linux (or both), as I haven't
spent time reading the RFCs.

This is very difficult to reproduce if the server is very fast; you
have to use an extremely fast client.
With a Linux virtual machine as the client I couldn't reproduce it. Even
printf()s in the BSD kernel upset the balance, and because of the changed
timing everything suddenly starts to work. This is a quantum bug.

Look at /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c

In svc_rpc_gss(), svc_rpc_gss_validate() returns FALSE during the DESTROY.

I don't quite know why, but during the DESTROY, gss_verify_mic() within
svc_rpc_gss_validate() returns maj_stat = GSS_S_DEFECTIVE_TOKEN, no
matter which Heimdal version I use.
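As background: RFC 2203 specifies that the verifier of an RPCSEC_GSS data or destroy request is a MIC (GSS_GetMIC) computed over the RPC call header up to and including the credential, so before calling gss_verify_mic() the server has to rebuild exactly those bytes. In svc_rpc_gss_validate() the local rpchdr array appears to hold that reconstruction. The sketch below only illustrates the serialization in user space, under simplifying assumptions: struct call_hdr, put_u32() and build_signed_header() are hypothetical names, not kernel or GSS-API code, and the GSS-API call itself is left out. It takes the buffer size as a parameter and refuses to write past it.

```c
/*
 * Sketch (hypothetical names, user space): rebuild the bytes an RPCSEC_GSS
 * call's MIC covers, i.e. the ONC RPC call header up to and including the
 * credential, into a caller-supplied buffer of fixed size.
 */
#include <arpa/inet.h>	/* htonl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RPC_CALL	0u	/* msg_type CALL (RFC 5531) */
#define RPC_VERSION	2u	/* ONC RPC protocol version */
#define RPCSEC_GSS	6u	/* auth flavor number of RPCSEC_GSS (RFC 2203) */

/* Hypothetical, simplified view of an incoming call (not struct rpc_msg). */
struct call_hdr {
	uint32_t xid, prog, vers, proc;
	uint32_t cred_flavor;
	const unsigned char *cred_body;
	uint32_t cred_len;
};

/* Append one XDR unsigned int in network byte order; -1 if it won't fit. */
static int put_u32(unsigned char *buf, size_t cap, size_t *off, uint32_t v)
{
	uint32_t be = htonl(v);

	assert(*off <= cap);
	if (cap - *off < sizeof(be))
		return (-1);
	memcpy(buf + *off, &be, sizeof(be));
	*off += sizeof(be);
	return (0);
}

/* Returns the header length in bytes, or 0 if cap is too small. */
size_t build_signed_header(unsigned char *buf, size_t cap,
    const struct call_hdr *h)
{
	size_t off = 0;
	size_t padded = ((size_t)h->cred_len + 3) & ~(size_t)3; /* XDR pads to 4 */

	if (put_u32(buf, cap, &off, h->xid) ||
	    put_u32(buf, cap, &off, RPC_CALL) ||
	    put_u32(buf, cap, &off, RPC_VERSION) ||
	    put_u32(buf, cap, &off, h->prog) ||
	    put_u32(buf, cap, &off, h->vers) ||
	    put_u32(buf, cap, &off, h->proc) ||
	    put_u32(buf, cap, &off, h->cred_flavor) ||
	    put_u32(buf, cap, &off, h->cred_len))
		return (0);
	if (cap - off < padded)
		return (0);	/* credential does not fit: refuse, don't overflow */
	if (h->cred_len > 0)
		memcpy(buf + off, h->cred_body, h->cred_len);
	memset(buf + off + h->cred_len, 0, padded - h->cred_len);
	return (off + padded);
}
```

For example, with a 36-byte credential the signed header is 68 bytes (eight 4-byte XDR words plus the credential), which fits in a 128-byte buffer; a 100-byte credential needs 132 bytes and is rejected at that buffer size.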

As a consequence, client->cl_state is set to CLIENT_STALE.

I think client locking should have been used at this point.

In the meantime, the NFS PUTROOTFH request arriving on the other TCP
connection is being processed in the kernel.

This is the point where the problem may or may not happen.
At the beginning of svc_rpc_gss(), svc_rpc_gss_timeout_clients() is called.
If it is called before svc_rpc_gss_validate() has marked cl_state as
CLIENT_STALE, the Linux client survives; if it is called afterwards, the
stale client is garbage-collected and the Linux client's request fails.
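The order dependence described above can be modelled in a few lines. This is a deliberately simplified, single-threaded sketch that follows the description in this message, not the kernel code: struct gss_client, timeout_clients(), destroy_with_bad_mic(), putrootfh() and linux_client_survives() are hypothetical names, the client table is reduced to an in_table flag, and locking, reference counts and any other checks in svc_rpc_gss() are ignored. The two interleavings are run one after the other, and a patched flag mimics the patch's decision not to mark the client stale.

```c
/*
 * Hypothetical, single-threaded model of the race described above.
 * Names echo svc_rpcsec_gss.c, but this is not kernel code.
 */
#include <assert.h>
#include <stdbool.h>

enum client_state { CLIENT_NEW, CLIENT_ESTABLISHED, CLIENT_STALE };

struct gss_client {
	enum client_state cl_state;
	long cl_expiration;	/* in seconds of uptime */
	bool in_table;		/* can its context handle still be found? */
};

/* Like svc_rpc_gss_timeout_clients(): reap stale or expired clients. */
static void timeout_clients(struct gss_client *c, long now)
{
	if (c->in_table &&
	    (c->cl_state == CLIENT_STALE || now > c->cl_expiration))
		c->in_table = false;
}

/* DESTROY whose gss_verify_mic() fails (hypothetical helper). */
static void destroy_with_bad_mic(struct gss_client *c, bool patched)
{
	if (!patched)
		c->cl_state = CLIENT_STALE;	/* unpatched behaviour */
	/* patched: leave the client to expire at cl_expiration */
}

/* PUTROOTFH on the other connection: garbage collection runs first. */
static bool putrootfh(struct gss_client *c, long now)
{
	timeout_clients(c, now);
	return (c->in_table);
}

/* True if the Linux client's PUTROOTFH still finds its GSS context. */
bool linux_client_survives(bool stale_marked_first, bool patched)
{
	struct gss_client c = { CLIENT_ESTABLISHED, 300, true };
	long now = 10;
	bool ok;

	assert(now <= c.cl_expiration);	/* context has not expired yet */
	if (stale_marked_first) {
		destroy_with_bad_mic(&c, patched);
		ok = putrootfh(&c, now);
	} else {
		ok = putrootfh(&c, now);
		destroy_with_bad_mic(&c, patched);
	}
	return (ok);
}
```

In this model only the unpatched case in which the stale marking wins the race makes linux_client_survives() return false; with the patch applied the context stays usable until cl_expiration, when timeout_clients() reaps it.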

Here is my patch for review.  This is my first ever kernel patch.

I'm going to open a PR...

Constructive comments are welcome.


--- /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c.orig 2012-08-30 23:34:00.000000000 +0100
+++ /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c        2012-08-31 15:59:40.000000000 +0100
@@ -565,7 +565,8 @@
         client->cl_state = CLIENT_NEW;
         client->cl_locked = FALSE;
-       client->cl_expiration = time_uptime + 5*60;
+       /* we are now more cautious */
+       client->cl_expiration = time_uptime + 4*60;

         return (client);
@@ -930,7 +931,11 @@
                 if (cred_lifetime == GSS_C_INDEFINITE)
                         cred_lifetime = time_uptime + 24*60*60;

-               client->cl_expiration = time_uptime + cred_lifetime;
+               /*
+                * we are now more cautious
+                * 12 sec is just an adhoc hack value
+                */
+               client->cl_expiration = time_uptime + cred_lifetime - 12;

                 /*
                  * Fill in cred details in the rawcred structure.
@@ -990,7 +995,7 @@
         gss_buffer_desc          rpcbuf, checksum;
         OM_uint32                maj_stat, min_stat;
         gss_qop_t                qop_state;
-       int32_t                  rpchdr[128 / sizeof(int32_t)];
+       int32_t                  rpchdr[2048 / sizeof(int32_t)];
         int32_t                 *buf;

         rpc_gss_log_debug("in svc_rpc_gss_validate()");
@@ -1024,7 +1029,12 @@
         if (maj_stat != GSS_S_COMPLETE) {
                 rpc_gss_log_status("gss_verify_mic", client->cl_mech,
                     maj_stat, min_stat);
-               client->cl_state = CLIENT_STALE;
+               /*
+                * Linux nfs-utils>=1.2.3 is re-using GSS context
+                * on other TCP NFS connection after it DESTROYED it
+                * The garbage collector will remove client at cl_expiration
+                */
+               /* client->cl_state = CLIENT_STALE; */
                 return (FALSE);

More information about the freebsd-fs mailing list