Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

Dennis Kögel dk at neveragain.de
Wed Jun 19 19:44:42 UTC 2013


On 19.06.2013 at 17:16, Jeremy Chadwick <jdc at koitsu.org> wrote:
> Which model of the ARC1320 are you using (there are 2).

It has four internal connectors, so it should be the ARC-1320ix-16.

No port multipliers.

>>> Also when you see hangs can you access the disk directly or not
>>> e.g. dd if=/dev/da0 of=/dev/null bs=1m count=10 ?
>> 
>> Interesting idea. The dd then hangs right until everything else resumes as well.
>> 
>> ^T during hang says: load: 12.39  cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0% 1632k
> 
> Is this *while* you have immense amounts of ZFS write I/O going to
> those drives (your zpool iostat was showing ~250-300MB/sec to the pool)?
> [...]

It's important to note that the interrupt spikes (and the I/O hangs) happen just as frequently on an idle system.
Having a bunch of dd processes writing, plus iostat running, just makes the problem easier to see.

So, with or without actual write load: dd with if=/dev/daX (arcsas device) hangs whenever the interrupt counters for uhci0 soar during these ~10-second phases, as shown above.
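
To put numbers on such a spike, two snapshots of "vmstat -i" can be compared. The following is only a rough sketch: it assumes the usual three-column layout (name, total, rate), and the helper names parse_vmstat_i, interrupt_deltas and snapshot are made up for illustration.

```python
import subprocess


def parse_vmstat_i(text):
    """Map each interrupt source name to its cumulative count.

    Assumes lines of the form "<name> <total> <rate>"; the name itself
    may contain spaces (e.g. "irq16: uhci0"). Header lines are skipped.
    """
    totals = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # split off the two numeric columns
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            totals[parts[0].strip()] = int(parts[1])
    return totals


def interrupt_deltas(before, after):
    """Per-source increase in interrupt count between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


def snapshot():
    """Run the real command; only meaningful on the affected host."""
    return parse_vmstat_i(subprocess.check_output(["vmstat", "-i"], text=True))


# Possible usage on the host (not executed here):
#   a = snapshot(); time.sleep(1); b = snapshot()
#   print(interrupt_deltas(a, b))
```

Sampling once per second and printing the delta for the uhci0 line would show whether the counter jumps line up with the hang phases.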

Noteworthy: dd'ing from if=/dev/ada1 (onboard controller) during such a hang phase returns immediately, i.e. works fine. (ada1 is part of ZFS -- the other 'zroot' pool -- but is not an arcsas device, so a driver issue sounds more likely).
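
The same comparison can be scripted as a timed raw read. This is a minimal sketch, not what was actually run; timed_read is a hypothetical helper, and the device paths in the usage comment are only examples.

```python
import os
import time


def timed_read(path, nbytes=1024 * 1024, offset=0):
    """Read up to nbytes from path and return (bytes_read, seconds).

    A read that blocks for the length of a hang phase shows up as a
    large elapsed time, much like dd sitting in [physrd].
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.monotonic()
        data = os.pread(fd, nbytes, offset)
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return len(data), elapsed


# Possible usage as root on the affected host (not executed here):
#   for dev in ("/dev/da0", "/dev/ada1"):
#       print(dev, timed_read(dev))
```

Run in a loop against both devices, a stall that only ever shows up on the daX side would support the suspicion that the controller or its driver is involved rather than ZFS.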

> Can you please try putting this in /boot/loader.conf + reboot and
> see if the behaviour for you changes?
> 
> vfs.zfs.no_write_throttle="1"

This produces quite interesting burst numbers, but does not affect the problem behaviour at all.

On 19.06.2013 at 17:10, Steven Hartland <killing at multiplay.co.uk> wrote:
> You might want to try adding a separate disk (different type)
> to the controller which isn't used and perform the same test to
> try and eliminate disks as the source of the issue.

That's currently not an option, as the zpool already contains data; but I tried against a disk on another controller, see above.

> Also see what "gstat -d" shows during this? Do you see a big spike
> of activity either side?

The picture is pretty much the same as with zpool iostat: Healthy values, all disks from 70-100% busy; during a hang phase, every column just drops to zero -- except for L(q), which remains frozen at some low value for the duration of the hang (e.g. 4 or 10).
Sample outputs here: http://pub.neveragain.de/arcsas/gstat.txt
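
The pattern described here (requests still queued, but nothing completing) can be picked out of a series of samples mechanically. A small sketch, assuming each sample has already been reduced to a (queue length, ops per second) pair; find_stalls and the sample values in the comment are made up for illustration and are not taken from the linked output.

```python
def find_stalls(samples):
    """Return (start, end) index pairs, end exclusive, for runs of
    consecutive samples with queued requests but zero completed ops.

    samples is a sequence of (queue_len, ops_per_sec) tuples.
    A sample with an empty queue and zero ops counts as idle, not stalled.
    """
    stalls = []
    start = None
    for i, (queue_len, ops) in enumerate(samples):
        stalled = queue_len > 0 and ops == 0
        if stalled and start is None:
            start = i
        elif not stalled and start is not None:
            stalls.append((start, i))
            start = None
    if start is not None:
        stalls.append((start, len(samples)))
    return stalls


# Example with invented values:
#   find_stalls([(2, 300), (4, 0), (4, 0), (1, 280)]) gives [(1, 3)]
```

Distinguishing "queue non-empty, zero ops" from "queue empty, zero ops" is the point of the check: only the former matches the frozen L(q) seen during a hang.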

Thanks,
D.

