[Bug 251347] NFS hangs on client side when mounted from outside in Jail Tree (BROKEN NFS SERVER OR MIDDLEWARE)
Date: Wed, 08 Sep 2021 00:56:28 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251347

--- Comment #13 from Rick Macklem <rmacklem@FreeBSD.org> ---
Ok, let me try to explain what the "BROKEN NFS SERVER OR MIDDLEWARE" message means.

There are certain file attributes, such as fileno (think i-node#), that should *never change*. When the NFS client receives file attributes where fileno for a given file has changed, it knows something is "badly broken".

One cause of this was a middleware box (hardware/software that sits between the NFS client and NFS server in the network infrastructure) that could fail.
- This "middleware box" cached NFS requests/replies. If it saw a Getattr request from the NFS client for a file whose attributes it had already cached, it replied to the Getattr with the cached attributes.
  --> This reduced NFS server load, since the NFS server never saw the Getattr RPC request.
Such a technology existed and would sometimes reply with bogus attributes for a different file.

What was this device called? I have no idea. The guy who told me about this gave no details w.r.t. vendor/product/... (I assumed he was under NDA and could not disclose details beyond this broken device generating the above problem.)

Since it seems that the FreeBSD server is not broken in this regard (I would see a lot more bug reports about this if it was), what else might cause this to happen? (i.e. fileno mysteriously changes)

Here are some unlikely, but possible, theories:
- Flaky memory in the NFS server that sometimes flips a bit that happens to be used to store the "fileno" attribute.
- Flaky network interface transmit side that flips a bit before calculating the network checksum, so that the network checksum succeeds.
  --> It would seem that most garbled network packets would be caught by checksum failures, but checksums are not infallible.
You may be able to dream up more, mostly within the network fabric between the client<-->server.

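To illustrate the second theory, here is a minimal Python sketch of why a bit flipped *before* the checksum is calculated goes undetected. It uses a 16-bit ones' complement checksum in the style of RFC 1071. The helper names (`send`, `verify`, `flip_bit`) and the 8-byte big-endian encoding of fileno are assumptions made up for this example; they are not taken from any real NIC or from the FreeBSD NFS code.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def send(payload: bytes, flip_before_checksum=None):
    """Hypothetical transmit path: optionally corrupt, then checksum."""
    if flip_before_checksum is not None:
        payload = flip_bit(payload, flip_before_checksum)
    return payload, internet_checksum(payload)


def verify(payload: bytes, checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare."""
    return internet_checksum(payload) == checksum


# Hypothetical fileno, encoded as 8 big-endian bytes for the example.
original = (1234).to_bytes(8, "big")

# Case 1: the bit flips before the checksum is computed.
wire, csum = send(original, flip_before_checksum=60)
print(verify(wire, csum))                           # checksum still matches
print(int.from_bytes(wire, "big") == 1234)          # but fileno has changed

# Case 2: the same bit flips after the checksum was computed.
good_csum = internet_checksum(original)
print(verify(flip_bit(original, 60), good_csum))    # mismatch is caught
```

In case 1 the receiver has no way to tell that the payload is wrong, because the checksum was computed over the already corrupted bytes. Only a higher-level consistency check, such as noticing that fileno changed, can catch it.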
Given how unlikely these latter possibilities are, you can see why the known case of the "broken middleware box" gets a mention in the message.
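The kind of check that leads to this message can be sketched roughly as follows. This is only an illustration in Python, not the FreeBSD NFS client's actual code: the class `AttrCache`, the exception `FilenoChangedError`, and keying the cache by an opaque file handle are all assumptions made for the example.

```python
class FilenoChangedError(Exception):
    """Raised when a file's fileno differs from the one seen earlier."""


class AttrCache:
    """Hypothetical client-side record of the fileno seen per file handle."""

    def __init__(self):
        self._fileno_by_handle = {}

    def update(self, file_handle: bytes, fileno: int) -> None:
        """Record attributes from a reply, rejecting a changed fileno."""
        known = self._fileno_by_handle.get(file_handle)
        if known is not None and known != fileno:
            # fileno should never change for a given file, so the reply
            # cannot be trusted: server, middleware, or network is broken.
            raise FilenoChangedError(
                f"fileno changed from {known} to {fileno}: "
                "BROKEN NFS SERVER OR MIDDLEWARE"
            )
        self._fileno_by_handle[file_handle] = fileno


cache = AttrCache()
cache.update(b"fh-1", 42)      # first Getattr reply for this handle
cache.update(b"fh-1", 42)      # same fileno again: fine
try:
    cache.update(b"fh-1", 43)  # e.g. a cache box replied for another file
except FilenoChangedError as err:
    print(err)
```

The check cannot say which component is at fault. It only establishes that two replies for the same file disagree about an attribute that must not change.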