NFS home directory performance tuning for Linux client
Kaya Saman
kayasaman at gmail.com
Mon Aug 21 14:00:20 UTC 2017
Hi,
I'm testing an Arch Linux client against my FreeBSD server, which was
recently updated to 11.1. The server runs a zpool spread over 15 disks
with an SSD L2ARC; as an additional test point I am also using a
separate single-disk SSD zpool in the same server to compare and
contrast with.
For non-home NFS mounts I found version 4 to perform well. However,
after increasing the MTU to 9000 across the network (NICs, switches,
routing, etc.) I tend to see a lot of server timeouts, even with rsize
and wsize increased to 8192.
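Before tuning NFS itself, it may be worth confirming that 9000-byte frames really pass end to end; a sketch, assuming a hypothetical server IP of 192.168.1.10:

```shell
# Send an unfragmentable 8972-byte ICMP payload (9000 minus 28 bytes
# of IP + ICMP headers). If any hop drops or fragments jumbo frames,
# these pings fail while normal-sized pings still succeed.
ping -M do -s 8972 192.168.1.10   # Linux client (-M do sets DF)
ping -D -s 8972 192.168.1.10      # FreeBSD side (-D sets DF)
```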
Hard-setting the Linux clients to vers=3 in fstab brings stability (no
timeouts), with no apparent decrease in performance either.
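For reference, a minimal fstab line that pins NFSv3 in this way might look like the following (hypothetical server name and export path):

```shell
# /etc/fstab on the Linux client -- vers=3 forces NFSv3 over TCP
nfsserver:/export/data  /mnt/data  nfs  vers=3,tcp,rsize=8192,wsize=8192,noatime  0  0
```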
What is odd, however, is that FreeBSD-to-FreeBSD just works without any
issues at all, so I'm wondering whether the NFS implementation on Linux
is slightly different and is causing these issues?
On to the main question/issue: when the mount is used as an NFS home
directory, setting vers=3 on the client makes the system unusable. It
takes roughly 5-10 minutes after login for anything to appear on
screen, and then another 5-10 minutes after each click for a response.

Setting vers=4 improves things significantly, but if I use an
application like Chromium, the system still hangs for 5-10 minutes
while browsing before coming alive again.
I have set the server up as follows in rc.conf:
nfs_server_flags="-t -n 128 -h <IP>"
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
nfsuserd_flags="-domain domian.com"
rpc_statd_enable="YES"
rpc_lockd_enable="YES"
rpcbind_enable="YES"
rpcbind_flags="-h <IP>"
mountd_enable="YES"
mountd_flags="-r -n -l -h <IP>"
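After changing rc.conf, the affected daemons need a restart for the new flags to take effect; a sketch using the standard FreeBSD service scripts:

```shell
# Restart the NFS-related services so the new rc.conf flags apply
service nfsd restart
service mountd restart
service nfsuserd restart
```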
I've even tried increasing the sysctl variable vfs.nfs.iodmax from 20
to 60.
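For reference, that change can be applied at runtime and persisted like so:

```shell
# Raise the NFS I/O daemon limit immediately...
sysctl vfs.nfs.iodmax=60
# ...and add the same line to /etc/sysctl.conf to survive reboots:
# vfs.nfs.iodmax=60
```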
On the client side the fstab entry contains the following options:
vers=4,defaults,auto,tcp,retrans=10,timeo=30,rsize=8192,wsize=8192,noatime
and gets mounted on /mnt/home. I realize the 'tcp' flag doesn't need to
be there, as v4 uses TCP by default; it is there for when I test with
v3.
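One way to confirm what the client actually negotiated is nfsstat -m on Linux, which prints the effective options per NFS mount. Note that on Linux, timeo is in tenths of a second, so timeo=30 is a 3-second timeout (the default for TCP mounts is timeo=600, i.e. 60 seconds):

```shell
# Show the options the kernel is really using for each NFS mount;
# check the vers=, rsize=, wsize=, timeo= and retrans= fields.
nfsstat -m
```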
The nfsstat command on the server gives:

Client Info:
Rpc Counts:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
   139585         0    399150         0    137485         0         0         0
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
        0         0         0         0         0    138482         0    393052
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0     94496         4         0         0
Rpc Info:
 TimedOut   Invalid X Replies   Retries  Requests
        0         0         0         0   1302226
Cache Info:
Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
 16703860    139581  13000702    399150    667954    139782         0         0
BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
        0         0    116186    116459     94263         0  13744785    393052

Server Info:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
  3200367     39011     89025        41 203807584    806982       410      7656
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
     6383       101         2         6         1      2880    265953   1643074
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0     94770        30        15     18370
Server Ret-Failed
                0
Server Faults
            0
Server Cache Stats:
   Inprog      Idem  Non-idem    Misses
        0         0         0 209101544
Server Write Gathering:
 WriteOps  WriteRPC   Opsaved
   806982    806982         0
And nfsstat on the client:

Client rpc stats:
calls      retrans    authrefrsh
327404     1293       327395

Client nfs v3:
null         getattr      setattr      lookup       access
0         0% 367      49% 0         0% 3         0% 3         0%
readlink     read         write        create       mkdir
0         0% 0         0% 0         0% 0         0% 0         0%
symlink      mknod        remove       rmdir        rename
0         0% 0         0% 0         0% 0         0% 0         0%
link         readdir      readdirplus  fsstat       fsinfo
0         0% 0         0% 1         0% 363      49% 2         0%
pathconf     commit
1         0% 0         0%

Client nfs v4:
null         read         write        commit       open
0         0% 16452     5% 169214   51% 6150      1% 17026     5%
open_conf    open_noat    open_dgrd    close        setattr
11        0% 0         0% 4         0% 12719     3% 19691     6%
fsinfo       renew        setclntid    confirm      lock
12        0% 480       0% 6         0% 6         0% 11578     3%
lockt        locku        access       getattr      lookup
35        0% 10085     3% 5545      1% 24591     7% 17443     5%
lookup_root  remove       rename       link         symlink
3         0% 1612      0% 4266      1% 31        0% 15        0%
create       pathconf     statfs       readlink     readdir
105       0% 9         0% 7093      2% 4         0% 398       0%
server_caps  delegreturn  getacl       setacl       fs_locations
21        0% 0         0% 0         0% 0         0% 0         0%
rel_lkowner  secinfo      fsid_present exchange_id  create_session
2051      0% 0         0% 0         0% 0         0% 0         0%
destroy_session sequence get_lease_time reclaim_comp layoutget
0         0% 0         0% 0         0% 0         0% 0         0%
getdevinfo   layoutcommit layoutreturn secinfo_no   test_stateid
0         0% 0         0% 0         0% 0         0% 0         0%
free_stateid getdevicelist bind_conn_to_ses destroy_clientid seek
0         0% 0         0% 0         0% 0         0% 0         0%
allocate     deallocate   layoutstats  clone
0         0% 0         0% 0         0%
The server isn't loaded at all (load is around 0.40), and the network
is also fairly idle: the system has 4 NICs in a lagg with current
throughput under 10 Mb/s.
Would anyone be able to offer any advice?
Many thanks.
More information about the freebsd-questions mailing list