9.1-current disk throughput stalls?
Ross Alexander
rwa at athabascau.ca
Mon Jun 3 16:08:02 UTC 2013
Folks,
I wonder if anyone here has insight into a disk throughput problem
that's come up over the last week or two. I habitually run an
'svn up' and then rebuild world + kernel every Saturday morning on the
home machines. It's all scripted and logged; I've been doing this for
years and the process is very cut and dried. I started it as usual on
Saturday morning; today (Monday) it was still running, and only about
15% done. Normally it completes in 39 minutes, +/- 1 minute.
What I've noticed is that disk performance on disk-intensive work has
gotten very flaky over the last two or three weeks. A buildworld, to
pick an example, will run nicely for three to five minutes and then
bog down. The disks stay busy, but forward progress slows to a crawl
and then apparently stops. Individual cleandirs are taking five to
ten seconds each on an otherwise unloaded machine. It feels like
a VAX-11/780 with 30 users and RA80s, if anyone here remembers those
days :).
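One way to put numbers on the "runs nicely, then bogs down" pattern is to time a small write+fsync at a fixed interval while a buildworld runs and look for samples that jump by orders of magnitude. The sketch below is a hypothetical probe, not part of the original setup. The function names, the 4 KB write size and the 1-second stall threshold are all illustrative assumptions. Write+fsync is only a proxy for the metadata-heavy cleandir work. Because the clock is also reported to fall behind, latencies measured on the affected box itself may be understated.

```python
import os
import tempfile
import time


def fsync_latencies(directory, samples=10, interval=0.0, size=4096):
    """Time `samples` small write+fsync cycles on a scratch file in `directory`.

    Returns per-sample latencies in seconds (hypothetical helper, illustrative only).
    """
    payload = b"\0" * size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="stallprobe-")
    try:
        for _ in range(samples):
            start = time.monotonic()
            os.lseek(fd, 0, os.SEEK_SET)  # rewrite the same block each time
            os.write(fd, payload)
            os.fsync(fd)                  # block until the data reaches stable storage
            latencies.append(time.monotonic() - start)
            if interval > 0:
                time.sleep(interval)      # pace the probe so it adds little load
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def stalls(latencies, threshold=1.0):
    """Return (sample index, latency) for every sample at or above `threshold` seconds."""
    return [(i, t) for i, t in enumerate(latencies) if t >= threshold]
```

For example, `stalls(fsync_latencies("/usr/obj", samples=300, interval=1.0))` run alongside a buildworld would list when, and for how long, the writes hung during a five-minute window.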
Here's a 'systat -vms':
5 users Load 0.30 0.30 0.27 Jun 3 09:07
Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act   84032   13908  1949112    40736  15071k  count
All  671192   16300 1076410k    61416          pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt cow 630 total
113 3573 29 113 630 83 26 26 zfod hdac1 16
ozfod xhci0 ehci
0.9%Sys 0.2%Intr 0.3%User 0.0%Nice 98.6%Idle %ozfod ohci0 ohci
| | | | | | | | | | | daefr 93 emu10kx0
+ prcfr 178 hpet0:t0
dtbuf 596 totfr hdac0 259
Namei Name-cache Dir-cache 329578 desvn react 359 ahci0 260
Calls hits % hits % 17505 numvn pdwak re0 261
475 294 62 14841 frevn pdpgs
intrn
Disks   ada0  ada1 pass0 pass1                    796676 wire
KB/t    5.42  5.96  0.00  0.00                     65484 act
tps      197   192     0     0                     45332 inact
MB/s    1.04  1.12  0.00  0.00                           cache
%busy     74    82     0     0                  15071692 free
                                                         buf
This was taken during the early stages of a buildworld. The cleandir
job steps are *crawling* along. Rattling the keyboard (USB or serial,
although an SSH session sometimes seems to work as well) gets the
buildworld doing some useful work again. Meanwhile, the application
load (two instances of WSPR, an instance of baudline, KDE, and a
vncserver), which is soundcard-I/O bound and does little to no disk
I/O, runs along perfectly happily.
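As a sanity check on the Disks section of the systat display above, the MB/s column is consistent with KB/t × tps (taking 1 MB = 1024 KB). Dividing %busy by tps gives a rough average busy time per transfer. This is only an estimate, assuming requests are serviced mostly one at a time. The helper names below are illustrative; the figures are copied from the display.

```python
def throughput_mb_s(kb_per_transfer, transfers_per_s):
    """Throughput implied by systat's KB/t and tps columns (1 MB = 1024 KB)."""
    return kb_per_transfer * transfers_per_s / 1024


def ms_per_transfer(busy_pct, transfers_per_s):
    """Rough busy time per transfer: busy fraction of each second divided by tps."""
    return busy_pct / 100 / transfers_per_s * 1000


# (KB/t, tps, %busy) copied from the 'systat -vms' Disks section above.
disks = {"ada0": (5.42, 197, 74), "ada1": (5.96, 192, 82)}

for name, (kbt, tps, busy) in disks.items():
    print("%s: %.2f MB/s, %.1f ms busy per transfer"
          % (name, throughput_mb_s(kbt, tps), ms_per_transfer(busy, tps)))
```

This works out to about 1.04 and 1.12 MB/s, matching systat, at roughly 3.8 and 4.3 ms of busy time per transfer. In other words, both disks are 74-82% busy while moving only about 1 MB/s in 5-6 KB pieces.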
The oldest kernel I have that shows the syndrome is -
FreeBSD aukward.bogons 9.1-STABLE FreeBSD 9.1-STABLE #59 r250498:
Sat May 11 00:03:15 MDT 2013
toor at aukward.bogons:/usr/obj/usr/src/sys/GENERIC amd64
H/W info -
hw.machine: amd64
hw.model: AMD Phenom(tm) II X4 965 Processor
hw.ncpu: 4
hw.physmem: 16883937280
hw.clockrate: 3411
kern.sched.name: ULE
ahci0: <ATI IXP700 AHCI SATA controller> port 0xa000-0xa007,0x9000-0x9003,\
0x8000-0x8007,0x7000-0x7003,0x6000-0x600f mem 0xfe6ffc00-0xfe6fffff \
irq 19 at device 17.0 on pci0
ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
[...]
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD1200JD-22HBC0 08.02D08> ATA-6 SATA 1.x device
ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada0: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
ada1: <WDC WD1200JD-22HBC0 08.02D08> ATA-6 SATA 1.x device
ada1: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada1: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad8
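The capacity and link-speed figures in the ada0/ada1 probe lines can be reproduced with simple arithmetic. The "MB" capacity is whole MiB (2**20 bytes) of 512-byte sectors. The SATA 1.x "150.000MB/s" is the 1.5 Gbit/s line rate after 8b/10b encoding, which puts 10 line bits on the wire per data byte. The helper names are illustrative only.

```python
SECTOR_BYTES = 512   # sector size printed on the ada0/ada1 probe lines
SECTORS = 234441648  # sector count printed for both ada0 and ada1


def capacity_mib(sectors, sector_bytes=SECTOR_BYTES):
    """Whole MiB (2**20 bytes) in a disk of `sectors` sectors."""
    return sectors * sector_bytes // 2**20


def capacity_gb(sectors, sector_bytes=SECTOR_BYTES):
    """Decimal gigabytes (10**9 bytes), the unit drive vendors use."""
    return sectors * sector_bytes / 10**9


def sata_payload_mb_s(line_rate_mbit):
    """Payload rate of a SATA link: 8b/10b sends 10 line bits per data byte."""
    return line_rate_mbit // 10


print(capacity_mib(SECTORS), "MiB,", round(capacity_gb(SECTORS), 1), "GB")
print(sata_payload_mb_s(1500), "MB/s for a 1.5 Gbit/s SATA 1.x link")
```

Set against the systat numbers, the roughly 1 MB/s each disk is moving is well under 1% of what the 150 MB/s link can carry, so the link itself does not look like the limit.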
I'm not paging, I don't have wild interrupt loads (checked with
'vmstat -i'), and the ZFS pool is not in the middle of a scrub, but the
machine responds badly even to trivial commands and buildworld never
finishes.
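The "not paging" point can be cross-checked against numbers already in this message: hw.physmem from the H/W info and the free figure from the systat display. This assumes systat's free figure is in KB, which matches the Mem:KB header and the 15071k Free value in the VIRTUAL column. The helper names are illustrative. This only rules out a memory shortage; it says nothing about the other possible causes.

```python
PHYSMEM_BYTES = 16883937280  # hw.physmem from the H/W info above
FREE_KB = 15071692           # 'free' from the systat display above (assumed KB)


def gib(nbytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return nbytes / 2**30


def free_fraction(free_kb, physmem_bytes=PHYSMEM_BYTES):
    """Fraction of physical memory reported free (1 KB = 1024 bytes)."""
    return free_kb * 1024 / physmem_bytes


print("physmem %.2f GiB, free %.2f GiB (%.0f%%)"
      % (gib(PHYSMEM_BYTES), gib(FREE_KB * 1024), 100 * free_fraction(FREE_KB)))
```

That is about 14.4 GiB free out of 15.7 GiB, roughly 91% of RAM, so memory pressure is an unlikely explanation for the stalls.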
I am seeing very similar behaviour on three other 9.1-current
machines, all of which are AHCI/SATA setups, using both Seagate and WD
disks (of random sizes and ages). All these boxes ran fine a month
ago.
BTW, when I do the rattle-keyboard-to-get-disks-going trick, the NTP
daemon reports that the system clock slews badly: machine time drops
behind wall-clock time. Something is locking out the clock updates.
(Hmmm, I see I'm running a ZFS pool that predates version 5000/feature
flags, FWIW. I'll run 'zpool upgrade', my bad.)
regards,
Ross
--
Ross Alexander, (780) 675-6823 / (780) 689-0749, rwa at athabascau.ca
"Always do right. This will gratify some people,
and astound the rest." -- Samuel Clemens
---
More information about the freebsd-stable mailing list