9.1-current disk throughput stalls ?

Mon Jun 3 20:32:09 UTC 2013

On Mon, Jun 03, 2013 at 09:38:45AM -0600, Ross Alexander wrote:
> I wonder if anyone here has insight on a disk throughput problem
> that's come up over the last week or two.  Now, I habitually run an
> 'svn up' and then rebuild world + kernel every Saturday morning on the
> home machines.  It's all scripted and logged; I've been doing this for
> years and the process is very cut and dried.  Saturday AM, I started
> it as usual - today it was still running, but only about 15% done.
> Normally it completes in 39 minutes, +/- 1 minute.
> 
> What I've noticed is that disk performance on disk-intensive stuff has
> gotten very flaky over the last two or three weeks.  A buildworld, to
> pick an example, will run nicely for three to five minutes and then
> bog down.  The disks stay busy, but forward progress slows to a crawl
> and then apparently stops.  Individual cleandirs are taking five to
> ten seconds each on an otherwise unloaded machine.  It feels like
> a VAX-11/780 with 30 users and RA-80s, if anyone here remembers those
> days :).
> 
> Here's a 'systat -vms':
> 
>     5 users    Load  0.30  0.30  0.27                  Jun  3 09:07
> 
> Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
>         Tot   Share      Tot    Share    Free           in   out     in   out
> Act   84032   13908  1949112    40736  15071k  count
> All  671192   16300 1076410k    61416          pages
> Proc:                                                            Interrupts
>   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        cow     630 total
>             113      3573   29  113  630   83   26     26 zfod        hdac1 16
>                                                           ozfod       xhci0 ehci
>  0.9%Sys   0.2%Intr  0.3%User  0.0%Nice 98.6%Idle        %ozfod       ohci0 ohci
> |    |    |    |    |    |    |    |    |    |    |       daefr    93 emu10kx0
> +                                                         prcfr   178 hpet0:t0
>                                            dtbuf      596 totfr       hdac0 259
> Namei     Name-cache   Dir-cache    329578 desvn          react   359 ahci0 260
>    Calls    hits   %    hits   %     17505 numvn          pdwak       re0 261
>      475     294  62                 14841 frevn          pdpgs
>                                                           intrn
> Disks  ada0  ada1 pass0 pass1                      796676 wire
> KB/t   5.42  5.96  0.00  0.00                       65484 act
> tps     197   192     0     0                       45332 inact
> MB/s   1.04  1.12  0.00  0.00                             cache
> %busy    74    82     0     0                    15071692 free
>                                                           buf
> 
> This was taken during the early stages of a buildworld.  The cleandir
> job steps are *crawling* along.  Rattling the keyboard (USB or serial,
> although an SSH session sometimes seems to work as well) gets the
> buildworld doing some useful work again.  Meanwhile, the application
> load (two instances of WSPR, an instance of baudline, KDE, and a
> vncserver), which is sound-card I/O bound and does little to no disk
> I/O, runs along perfectly happily.
> 
> The oldest kernel I have that shows the syndrome is -
> 
>     FreeBSD aukward.bogons 9.1-STABLE FreeBSD 9.1-STABLE #59 r250498:
>     Sat May 11 00:03:15 MDT 2013
>     toor at aukward.bogons:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> H/W info -
> 
>     hw.machine: amd64
>     hw.model: AMD Phenom(tm) II X4 965 Processor
>     hw.ncpu: 4
>     hw.physmem: 16883937280
>     hw.clockrate: 3411
>     kern.sched.name: ULE
> 
>     ahci0: <ATI IXP700 AHCI SATA controller> port 0xa000-0xa007,0x9000-0x9003,\
>         0x8000-0x8007,0x7000-0x7003,0x6000-0x600f mem 0xfe6ffc00-0xfe6fffff \
>         irq 19 at device 17.0 on pci0
>     ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
>     ahcich0: <AHCI channel> at channel 0 on ahci0
>     [...]
>     ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
>     ada0: <WDC WD1200JD-22HBC0 08.02D08> ATA-6 SATA 1.x device
>     ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
>     ada0: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
>     ada0: Previously was known as ad4
>     ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
>     ada1: <WDC WD1200JD-22HBC0 08.02D08> ATA-6 SATA 1.x device
>     ada1: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
>     ada1: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
>     ada1: Previously was known as ad8
> 
> I'm not paging, I don't have wild interrupt loads (checked with
> 'vmstat -i'), and the ZFS pool is not in the middle of a scrub, but the
> machine responds poorly even to trivial commands and buildworld never
> finishes.
> I am seeing very similar behaviour on three other 9.1-current
> machines, all of which are AHCI/SATA setups, using both Seagate and WD
> disks (of random sizes and ages).  All these boxes ran fine a month
> ago.
> 
> BTW, when I do the rattle-keyboard-to-get-disks-going trick, the NTP
> daemon reports that the system clock slews badly: machine time drops
> behind wall clock time.  Something is locking the clock update off.
> 
> (Hmmm, I see I'm running a pre-v5000 (pre-feature-flags) ZFS pool, FWIW.
> I'll run zpool upgrade, my bad.)

1. There is no such thing as 9.1-CURRENT.  Either you meant 9.1-STABLE
(what should be called stable/9) or -CURRENT (what should be called
head).

2. Is there some reason you excluded details of your ZFS setup?  "zpool
status" would be a good start.

3. Do any of your filesystems/pools have ZFS compression enabled, or
have in the past?

4. Do any of your filesystems/pools have ZFS dedup enabled, or have in
the past?

5. Does the problem go away after a reboot?

6. Can you provide smartctl -x output for both ada0 and ada1?  You will
need to install ports/sysutils/smartmontools for this.  The reason I'm
asking for this is there may be one of your disks which is causing I/O
transactions to stall for the entire pool (i.e. "single point of
annoyance").

7. Can you remove ZFS from the picture entirely (use UFS only) and
re-test?  My guess is that this is ZFS behaviour (particularly the ARC
being flushed to disk) combined with old/slow disks.  (Meaning: you have
16GB RAM and a 4-core CPU, but very old disks.)
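As a rough sanity check on the old/slow disks theory, the systat numbers quoted above can be reproduced by hand.  Assuming (inferred from the numbers themselves, not from systat documentation) that the MB/s column is KB/t times tps divided by 1024, each disk is moving only about 1 MB/s in transfers of roughly 5-6 KB while 74-82% busy.  That pattern is consistent with seek-bound small I/O rather than streaming, though it does not prove it.  The sketch uses Python purely for illustration; the dictionary layout and the throughput_mb_s() helper are hypothetical.

```python
# Figures copied from the 'systat -vms' output in the original message.
disks = {
    "ada0": {"kb_per_transfer": 5.42, "tps": 197, "busy_pct": 74, "reported_mb_s": 1.04},
    "ada1": {"kb_per_transfer": 5.96, "tps": 192, "busy_pct": 82, "reported_mb_s": 1.12},
}


def throughput_mb_s(kb_per_transfer: float, tps: float) -> float:
    """Throughput in MB/s, ASSUMING systat uses 1 MB = 1024 KB."""
    return kb_per_transfer * tps / 1024


for name, d in disks.items():
    computed = throughput_mb_s(d["kb_per_transfer"], d["tps"])
    # Small transfers at a few hundred per second leave the disk mostly
    # busy while delivering very little data.
    print(f"{name}: {computed:.3f} MB/s computed, "
          f"{d['reported_mb_s']:.2f} MB/s reported, {d['busy_pct']}% busy")
```

If the 1024 divisor is right, both computed values land within rounding of the reported 1.04 and 1.12 MB/s.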

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |