slowdown of zfs (tx->tx)

Ronald Klop ronald-freebsd8 at klop.yi.org
Wed Jan 9 18:36:14 UTC 2013


On Wed, 09 Jan 2013 17:26:13 +0100, Nicolas Rachinsky  
<fbsd-mas-0 at ml.turing-complete.org> wrote:

> * Artem Belevich <art at freebsd.org> [2013-01-08 12:47 -0800]:
>> On Tue, Jan 8, 2013 at 9:42 AM, Nicolas Rachinsky
>> <fbsd-mas-0 at ml.turing-complete.org> wrote:
>> >         NAME                      STATE     READ WRITE CKSUM
>> >         pool1                     DEGRADED     0     0     0
>> >           raidz2-0                DEGRADED     0     0     0
>> >             ada5                  ONLINE       0     0     0
>> >             ada8                  ONLINE       0     0     0
>> >             ada2                  ONLINE       0     0     0
>> >             ada3                  ONLINE       0     0     0
>> >             11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
>> >             ada6                  ONLINE       0     0     0
>> >             ada0                  ONLINE       0     0     1
>> >             ada7                  ONLINE       0     0     0
>> >             ada4                  ONLINE       0     0     3
>>
>> You seem to have some checksum errors, which suggests hardware
>> trouble.
>
> I somehow missed these. Is there any way to learn when these checksum
> errors happen?
>
>> For starters, check SMART info for all drives and see if they have any
>> reallocated sectors.
>
> There are some disks with reallocated sectors, but for both ada0 and
> ada4 Reallocated_Sector_Ct is 0.
>
>> Use gstat during your workload to see if any of the drives takes much
>> longer than others to handle its job.
>
> There is one disk sticking out a bit.
>
>> > There is almost no disk activity during this time.
>>
>> What kind of disk activity *is* there?
>
> What would be interesting?
>
>
>> > sync is disabled for the whole pool.
>>
>> If that's the case (assuming you're talking about sync=disabled zfs
>> property), then synchronous writes are probably not the cause of
>> slowdown. My guess would be either failing HDD or something funky with
>> cabling or sata controller.
>
> Yes, sync=disabled for pool1.
>
>
> Ok, I will start swapping hardware (sadly the machine is quite a drive
> away).
>
> Thank you very much for your help.
>
> Nicolas


If you are driving there anyway, replace this one:

>> >             11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1

If the pool is otherwise healthy, the sysadmin will notice new checksum
errors sooner.
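
One way to notice them sooner is to check the status table periodically.
The following is only a rough sketch, not a tested monitoring tool: it
assumes the device table has the same layout as the "zpool status" output
quoted above, and the function names find_problem_devices and check_pool
are made up for this example.

```python
import re
import subprocess

# One row of the device table: NAME STATE READ WRITE CKSUM.
# Counters are kept as strings because zpool may abbreviate large
# values (for example "1.2K").
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|UNAVAIL|FAULTED|OFFLINE|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def find_problem_devices(status_text):
    """Return (name, state, read, write, cksum) for every row that is
    not ONLINE or has a non-zero error counter."""
    problems = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, blank line, or free-form text
        name, state, read, write, cksum = m.groups()
        if state != "ONLINE" or (read, write, cksum) != ("0", "0", "0"):
            problems.append((name, state, read, write, cksum))
    return problems


def check_pool():
    """Run 'zpool status' and print any rows that need attention."""
    out = subprocess.run(
        ["zpool", "status"], capture_output=True, text=True, check=True
    ).stdout
    for name, state, read, write, cksum in find_problem_devices(out):
        print(f"{name}: state={state} read={read} write={write} cksum={cksum}")
```

Calling check_pool() from cron and mailing any output would show roughly
when a counter first changes. On a degraded pool such as pool1 the pool
and vdev rows are always reported, which makes new checksum errors on
single disks easier to miss.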

Ronald.

