Unstable NFS on recent CURRENT
Paul Mather
paul at gromit.dlib.vt.edu
Wed Mar 9 16:12:44 UTC 2016
On Mar 8, 2016, at 7:49 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> Paul Mather wrote:
>> On Mar 7, 2016, at 9:55 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>>
>>> Paul Mather (forwarded by Ronald Klop) wrote:
>>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather <paul at gromit.dlib.vt.edu>
>>>> wrote:
>>>>
>>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
>>>>> having trouble with NFS. I have been doing a buildworld and buildkernel
>>>>> with /usr/src and /usr/obj mounted via NFS. Recently, this process has
>>>>> resulted in the buildworld failing at some point, with a variety of
>>>>> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR"
>>>>> of /usr/src doesn't manage to complete. It errors out thus:
>>>>>
>>>>> =====
>>>>> [[...]]
>>>>> total 0
>>>>> ls: ./.svn/pristine/fe: Permission denied
>>>>>
>>>>> ./.svn/pristine/ff:
>>>>> total 0
>>>>> ls: ./.svn/pristine/ff: Permission denied
>>>>> ls: fts_read: Permission denied
>>>>> =====
>>>>>
>>>>> On the console, I get the following:
>>>>>
>>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
>>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
>>>>> MIDDLEWARE)
>>>>>
> Oh, I had forgotten this. Here's the comment related to this error.
> (about line#445 in sys/fs/nfsclient/nfs_clport.c):
> 446 * BROKEN NFS SERVER OR MIDDLEWARE
> 447 *
> 448 * Certain NFS servers (certain old proprietary filers ca.
> 449 * 2006) or broken middleboxes (e.g. WAN accelerator products)
> 450 * will respond to GETATTR requests with results for a
> 451 * different fileid.
> 452 *
> 453 * The WAN accelerator we've observed not only serves stale
> 454 * cache results for a given file, it also occasionally serves
> 455 * results for wholly different files. This causes surprising
> 456 * problems; for example the cached size attribute of a file
> 457 * may truncate down and then back up, resulting in zero
> 458 * regions in file contents read by applications. We observed
> 459 * this reliably with Clang and .c files during parallel build.
> 460 * A pcap revealed packet fragmentation and GETATTR RPC
> 461 * responses with wholly wrong fileids.
>
> If you can connect the client->server with a simple switch (or just an RJ45 cable), it
> might be worth testing that way. (I don't recall the name of the middleware product, but
> I think it was shipped by one of the major switch vendors. I also don't know if the product
> supports NFSv4?)
>
> rick
Currently, the client is connected to the server via a dumb gigabit switch, so it is already fairly direct.
As for the above error, it appeared on the console only once. (Sorry if I made it sound like it appears every time.)
I just tried another buildworld attempt via NFS and it failed again. This time, I get this on the BeagleBone Black console:
nfs_getpages: error 13
vm_fault: pager read error, pid 5401 (install)
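For reference, the "error 13" in the nfs_getpages message appears to be a plain errno value, and errno 13 is EACCES ("Permission denied"), which would match the failures ls reported earlier. A minimal sketch for decoding such a number, assuming Python is available on the machine; the helper name describe_errno is made up for illustration:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno value.

    describe_errno is a hypothetical helper, not part of any FreeBSD tool.
    """
    # errno.errorcode maps numeric values to names such as "EACCES".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for the code.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 13 is the value printed by nfs_getpages on the console.
    print(describe_errno(13))
```

This only decodes the number; it says nothing about why the server or client produced EACCES in the first place.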
The other thing I have noticed is that if I induce heavy load on the NFS server (e.g., by starting a Poudriere bulk build), the build on the client crashes much more readily. For example, I started an NFS buildworld on the BeagleBone Black, and it seemed to be chugging along nicely. The moment I kicked off a Poudriere build update of my packages on the NFS server, the buildworld on the NFS client crashed.
I have had problems with swap on FreeBSD/arm before. Swapping to a file does not appear to work for me. As a result, I switched to swapping to a partition on the SD card. Maybe this is unreliable, too?
Cheers,
Paul.