Trouble with NFSd under 6.1-Stable, any ideas?

Joerg Lehners Joerg.Lehners at Informatik.Uni-Oldenburg.DE
Wed May 24 09:58:13 PDT 2006


"Rong-en Fan" <grafan at gmail.com> wrote:
> On 5/14/06, Kris Kennaway <kris at obsecurity.org> wrote:
>> On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
>>>
[...]
>> Use tcpdump and related tools to find out what traffic is being sent.
>>
>> Also verify that you did not change your system configuration in any
>> way: there have been no changes to NFS since the release, so it is
>> unclear why an update would cause the problem to suddenly occur.
>>
>> Kris
>
> Hi Kris and Howard,
>
> As I posted a few days ago, I have problems similar to Howard's
> (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> and nfsd eats lots of cpu" on stable@). After bisecting
> the source tree, I found that
>
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
>
> The only commit in between is to kern/vfs_lookup.c, an MFC of revs 1.90
> and 1.91. With the 04.30 03.57 source plus a manually patched
> vfs_lookup.c rev 1.90, the same problem occurs.
[...]

Confirmed!

I can create the problem here at will.

Setup 1: NFS server 'testido' running FreeBSD 6.1-STABLE as of 15 May 2006
with sys/kern/vfs_lookup.c 1.80.2.7; NFS client 'schurks' running FreeBSD
6.1-STABLE as of 15 May 2006.

/usr/src from testido is mounted on /mnt on schurks. I ran
'cd /mnt ; du >/dev/null' twice (first right after a fresh boot of
testido, second when all served data was already in testido's memory):

joerg @ schurks> cd /mnt
joerg @ schurks> time du >/dev/null
    86.09s real     0.14s user     1.91s system
joerg @ schurks> time du >/dev/null
   205.10s real     0.20s user     1.92s system
joerg @ schurks>
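For anyone who wants to generate the same kind of load without du, a rough Python stand-in is sketched here. It is an assumption, not what was run above: it walks the tree and calls lstat() on every entry, which, like du, makes the NFS client resolve every name on the server. The default path /mnt matches the mount above; the function name walk_and_stat is hypothetical.

```python
import os
import sys
import time


def walk_and_stat(root: str) -> int:
    """Stat every entry under root without following symlinks, as du does.

    On an NFS mount each uncached name resolved here may cost the server a
    LOOKUP (and often a GETATTR). Returns the number of entries visited.
    """
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            count += 1
    return count


if __name__ == "__main__":
    # Assumed mount point; pass another path as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt"
    start = time.perf_counter()
    n = walk_and_stat(root)
    elapsed = time.perf_counter() - start
    print(f"{n} entries in {elapsed:.2f}s")
```

Timing two consecutive runs against a server with vfs_lookup.c 1.80.2.7 and then 1.80.2.6 should show the same pattern as the du runs, if the slowdown is really in server-side name lookup.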

One screenful of top output on testido AFTER both tests (testido
sometimes stopped responding to screen output, especially during the
second test):

last pid:   329;  load averages:  4.14,  2.77,  1.25    up 0+00:07:30  18:44:47
29 processes:  1 running, 28 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 8420K Active, 28M Inact, 72M Wired, 110M Buf, 880M Free
Swap: 4000M Total, 4000M Free

   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   201 root        1   4    0  1232K   792K -        4:42 116.31% nfsd
   329 joerg       1  96    0  2404K  1676K RUN      0:00  0.00% top
   168 root        1 115    0  2456K  1760K select   0:00  0.00% sshd
   313 root        1  96    0  1428K  1168K select   0:00  0.00% rlogind
   194 root        1 115    0  1556K  1256K select   0:00  0.00% mountd
   299 root        1   8    0  1720K  1436K wait     0:00  0.00% login
   314 root        1   8    0  1748K  1460K wait     0:00  0.00% login
   298 root        1  96    0  1304K  1048K select   0:00  0.00% rlogind
   199 root        1   4    0  1356K  1040K accept   0:00  0.00% nfsd
   256 root        1  96    0  2892K  1760K select   0:00  0.00% ntpd
   315 joerg       1  20    0  1448K  1020K pause    0:00  0.00% ksh
   300 root        1   5    0  1448K   996K ttyin    0:00  0.00% ksh
   158 root        1  96    0  1332K   940K select   0:00  0.00% syslogd
   163 root        1  96    0  1448K  1128K select   0:00  0.00% inetd
   176 root        1  96    0  1408K  1044K select   0:00  0.00% rpcbind
   185 root        1  96    0  1476K  1148K select   0:00  0.00% ypbind
   261 root        1 115    0  1304K   952K select   0:00  0.00% lpd

Setup 2: NFS server 'testido' running FreeBSD 6.1-STABLE as of 15 May 2006
with sys/kern/vfs_lookup.c 1.80.2.6; NFS client 'schurks' running FreeBSD
6.1-STABLE as of 15 May 2006.

Same tests as before:

joerg @ schurks> time du >/dev/null
    22.63s real     0.15s user     1.82s system
joerg @ schurks> time du >/dev/null
    16.52s real     0.17s user     1.68s system
joerg @ schurks>

One screenful of top output on testido AFTER both tests (testido
responded fine during both tests):

last pid:   329;  load averages:  0.49,  0.26,  0.10    up 0+00:01:50  18:35:30
29 processes:  1 running, 28 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 8424K Active, 28M Inact, 72M Wired, 110M Buf, 880M Free
Swap: 4000M Total, 4000M Free

   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   201 root        1   4    0  1232K   792K -        0:03  3.76% nfsd
   168 root        1 115    0  2456K  1760K select   0:00  0.00% sshd
   329 joerg       1  96    0  2404K  1676K RUN      0:00  0.00% top
   313 root        1  96    0  1428K  1168K select   0:00  0.00% rlogind
   194 root        1 115    0  1556K  1256K select   0:00  0.00% mountd
   299 root        1   8    0  1720K  1440K wait     0:00  0.00% login
   314 root        1   8    0  1748K  1464K wait     0:00  0.00% login
   298 root        1  96    0  1304K  1048K select   0:00  0.00% rlogind
   199 root        1   4    0  1356K  1040K accept   0:00  0.00% nfsd
   315 joerg       1  20    0  1448K  1020K pause    0:00  0.00% ksh
   256 root        1  96    0  2892K  1760K select   0:00  0.00% ntpd
   300 root        1   5    0  1448K   996K ttyin    0:00  0.00% ksh
   158 root        1  96    0  1332K   940K select   0:00  0.00% syslogd
   163 root        1  96    0  1448K  1128K select   0:00  0.00% inetd
   261 root        1 109    0  1304K   952K select   0:00  0.00% lpd
   176 root        1  96    0  1408K  1044K select   0:00  0.00% rpcbind
   185 root        1  96    0  1476K  1148K select   0:00  0.00% ypbind


Note the HUGE difference in consumed nfsd TIME (4:42 vs. 0:03).

The only difference between the two setups was sys/kern/vfs_lookup.c
revision 1.80.2.6 vs. 1.80.2.7.
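To put the difference in numbers, the ratios can be computed directly from the figures in this message. One caveat: top's TIME column is cumulative since boot and both snapshots were taken after both du runs, so the nfsd figure covers both runs together. The names RUNS, NFSD_TIME, slowdowns and to_seconds are illustrative only.

```python
from typing import List

# Wall-clock ("real") seconds of the two `time du >/dev/null` runs per setup.
RUNS = {
    "1.80.2.7": [86.09, 205.10],  # Setup 1
    "1.80.2.6": [22.63, 16.52],   # Setup 2
}

# Accumulated nfsd CPU TIME from top, as "minutes:seconds".
NFSD_TIME = {"1.80.2.7": "4:42", "1.80.2.6": "0:03"}


def to_seconds(mmss: str) -> int:
    """Convert top's "M:SS" TIME field to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdowns(bad: str, good: str) -> List[float]:
    """Ratio of bad to good wall-clock time, run by run."""
    return [b / g for b, g in zip(RUNS[bad], RUNS[good])]


if __name__ == "__main__":
    for i, ratio in enumerate(slowdowns("1.80.2.7", "1.80.2.6"), start=1):
        print(f"run {i}: {ratio:.1f}x slower with 1.80.2.7")
    cpu = to_seconds(NFSD_TIME["1.80.2.7"]) / to_seconds(NFSD_TIME["1.80.2.6"])
    print(f"nfsd CPU time: {cpu:.0f}x more with 1.80.2.7")
```

Run as a script, it reports about 3.8x and 12.4x slowdowns for the first and second du runs, and about 94x more nfsd CPU time with 1.80.2.7. The second ratio is the telling one: with 1.80.2.6 the cached run gets faster, while with 1.80.2.7 it gets slower.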


   Joerg
-- 
Mail: Joerg.Lehners at Informatik.Uni-Oldenburg.DE    Tel: 2198
Real: Joerg Lehners, Informatik ARBI, Uni Oldenburg, D-26111 Oldenburg
Non-words: cost cutting - profit maximization - cheap, cheap, cheap


More information about the freebsd-stable mailing list