NFS 75 second stall
alan bryan
alan.bryan at yahoo.com
Thu Jul 1 20:36:17 UTC 2010
--- On Thu, 7/1/10, Garrett Cooper <yanefbsd at gmail.com> wrote:
> From: Garrett Cooper <yanefbsd at gmail.com>
> Subject: Re: NFS 75 second stall
> To: "alan bryan" <alan.bryan at yahoo.com>
> Cc: freebsd-stable at freebsd.org
> Date: Thursday, July 1, 2010, 1:28 PM
> On Thu, Jul 1, 2010 at 1:18 PM, alan bryan <alan.bryan at yahoo.com> wrote:
> >
> >
> > --- On Thu, 7/1/10, Garrett Cooper <yanefbsd at gmail.com> wrote:
> >
> >> From: Garrett Cooper <yanefbsd at gmail.com>
> >> Subject: Re: NFS 75 second stall
> >> To: "alan bryan" <alan.bryan at yahoo.com>
> >> Cc: freebsd-stable at freebsd.org
> >> Date: Thursday, July 1, 2010, 12:23 PM
> >> On Thu, Jul 1, 2010 at 11:51 AM, alan bryan <alan.bryan at yahoo.com> wrote:
> >> >
> >> >
> >> > --- On Thu, 7/1/10, Garrett Cooper <yanefbsd at gmail.com> wrote:
> >> >
> >> >> From: Garrett Cooper <yanefbsd at gmail.com>
> >> >> Subject: Re: NFS 75 second stall
> >> >> To: "alan bryan" <alan.bryan at yahoo.com>
> >> >> Cc: freebsd-stable at freebsd.org
> >> >> Date: Thursday, July 1, 2010, 11:13 AM
> >> >> On Thu, Jul 1, 2010 at 11:01 AM, alan bryan <alan.bryan at yahoo.com> wrote:
> >> >> > Setup:
> >> >> >
> >> >> > server - FreeBSD 8-stable from today. 2 UFS dirs exported via NFS.
> >> >> > client - FreeBSD 8.0-Release. Running a test php script that copies
> >> >> > around various files to/from 2 separate NFS mounts.
> >> >> >
> >> >> > Situation:
> >> >> >
> >> >> > script is started (forked to do 20 simultaneous runs) and 20 1GB
> >> >> > files are copied to the NFS dir which works fine. When it then
> >> >> > switches to reading those files back and simultaneously writing to
> >> >> > the other NFS mount I see a hang of 75 seconds. If I do an "ls -l"
> >> >> > on the NFS mount it hangs too. After 75 seconds the client has
> >> >> > reported:
> >> >> >
> >> >> > nfs server 192.168.10.133:/usr/local/export1: not responding
> >> >> > nfs server 192.168.10.133:/usr/local/export1: is alive again
> >> >> > nfs server 192.168.10.133:/usr/local/export1: not responding
> >> >> > nfs server 192.168.10.133:/usr/local/export1: is alive again
> >> >> >
> >> >> > and then things start working again. The server was originally
> >> >> > FreeBSD 8.0-Release also but was upgraded to the latest stable to
> >> >> > see if this issue could be avoided.
> >> >> >
> >> >> > # nfsstat -s -W -w 1
> >> >> >  GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
> >> >> >       0      0      0    222    257      0      0      0
> >> >> >       0      0      0    178    135      0      0      0
> >> >> >       0      0      0     85    127      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >
> >> >> > ... for 75 rows of all zeros
> >> >> >
> >> >> >       0      0      0    272    266      0      0      0
> >> >> >       0      0      0    167    165      0      0      0
> >> >> >
> >> >> > I also tried runs with 15 simultaneous processes and 25. 15
> >> >> > processes gave only about a 5 second stall but 25 gave again the
> >> >> > same 75 second stall.
> >> >> >
> >> >> > Further, I tested with 2 mounts to the same server but from ZFS
> >> >> > filesystems with the exact same stall/timeout periods. So, it
> >> >> > doesn't appear to matter what the underlying filesystem is - it's
> >> >> > something in NFS or networking code.
> >> >> >
> >> >> > Any ideas on what's going on here? What's causing the complete
> >> >> > stall period of zero NFS activity? Any flaws with my testing
> >> >> > methods?
> >> >> >
> >> >> > Thanks for any and all help/ideas.
> >> >>
> >> >> What network driver are you using? Have you tried
> >> >> tcpdumping the packets?
> >> >> -Garrett
> >> >>
> >> >
> >> > I'm using igb currently but have also used em. I have not tried
> >> > tcpdumping the packets yet on this test. Any suggestions on things
> >> > to look out for? (I'm not that familiar with that whole process.)
> >> >
> >> > Which brings up another point - I'm using TCP connections for NFS,
> >> > not UDP.
> >>
> >> Is the net.inet.tcp.tso sysctl enabled or
> >> not? What about rxcsum and txcsum?
> >> Thanks,
> >> -Garrett
> >>
> >
> > I haven't intentionally/explicitly set any of this so it's "default":
> >
> > # sysctl net.inet.tcp.tso
> > net.inet.tcp.tso: 1
> >
> >
> > igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> >         options=13b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4>
> >         ether 00:30:48:c3:26:94
> >         inet 192.168.10.133 netmask 0xffffff00 broadcast 192.168.10.255
> >         media: Ethernet autoselect (1000baseT <full-duplex>)
> >         status: active
>
> Devise all of the available permutations that you need to use to test
> this out; there are a total of 3 on/off variables, so 8 permutations,
> but you've already `tested one', so that makes the permutation count 7.
> Example:
>
> TXCSUM=off, RXCSUM=on, TSO=on
> TXCSUM=on, RXCSUM=off, TSO=on
> TXCSUM=on, RXCSUM=off, TSO=off
>
> ...
>
> Try executing the permutations on the client first, keeping the server
> constant, then make the client constant and make the server variable,
> and finally do both to the server and client.
>
> Be sure to take measurements for each permutation to ensure that
> things make functional sense.
>
> The reason why I'm suggesting this is that there were issues with
> em(4) [and igb(4) too I think since it uses common code], with various
> hardware offload bits on 8.0-RELEASE (IIRC disabling txcsum did the
> trick, but you may have to do more than that in order to get things to
> work).
>
> Here's a similar thread with a different driver:
> http://lists.freebsd.org/pipermail/freebsd-current/2009-June/008264.html
> (just to illustrate the thought process used to determine the source
> of failure).
>
> Thanks,
> -Garrett
>
Thanks for the detailed test plan!
Is it also fair to assume that updating the NFS client machine to the latest 8-stable would fix this issue as well? (Both machines would then be running the latest 8-stable code.) Neither machine is in production, so I can test or upgrade freely.
Thanks again.
--Alan
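
For reference, the offload test matrix discussed above can be sketched as a small dry-run script. Three on/off settings (txcsum, rxcsum, tso) give 2^3 = 8 combinations, and the first one printed (everything on) matches the default configuration already tested. This is only a sketch under stated assumptions: the interface name igb0 is taken from the ifconfig output in this thread, the function names are made up for illustration, and the script only prints the ifconfig commands rather than applying them. The tso flag here is the per-interface option and is separate from the system-wide net.inet.tcp.tso sysctl.

```shell
#!/bin/sh
# Dry run: print one ifconfig command per on/off combination of
# txcsum, rxcsum and tso. Nothing is changed on the interface.
IFACE="igb0"   # assumption: interface name from the ifconfig output in this thread

# flag NAME on|off -> prints "NAME" or "-NAME"
# (ifconfig disables an option when it is given with a leading "-")
flag() {
    if [ "$2" = "on" ]; then
        printf '%s' "$1"
    else
        printf '%s' "-$1"
    fi
}

# print_matrix -> prints all 8 combinations, one command per line
print_matrix() {
    for tx in on off; do
        for rx in on off; do
            for tso in on off; do
                printf 'ifconfig %s %s %s %s\n' "$IFACE" \
                    "$(flag txcsum "$tx")" \
                    "$(flag rxcsum "$rx")" \
                    "$(flag tso "$tso")"
            done
        done
    done
}

print_matrix
```

Each printed line would be run as root on the machine under test (client first, then server, then both), followed by a rerun of the copy test while watching "nfsstat -s -W -w 1" on the server for rows of zeros.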