Non-responsive 8.0-RC1

Peter Jeremy peterjeremy at acm.org
Mon Nov 30 08:13:45 UTC 2009


On 2009-Nov-29 08:56:55 +0100, Thomas Backman <serenity at exscape.org> wrote:
>
>On Nov 28, 2009, at 10:22 PM, Peter Jeremy wrote:
>
>> My main server is running 8.0/amd64 from between RC1 and RC2 and I've
>> recently had a couple of long-duration hangs on it during which time
>> processes doing I/O will stop responding.

I forgot to mention that I checked SMART state on the disks and also
did a 'zpool scrub' after the first occurrence - no problems showed up.

It actually "hung" again just after I sent the original mail.  This
time I managed to get console access and could check the kernel state.
This showed that a number of processes were blocked on ZFS locks.
The most commonly reported state was 'tx->tx_quiesce_done_cv)'.
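As a rough illustration of how the "most commonly reported state" might be tallied, here is a minimal Python sketch. It assumes the process list was captured as text from something like `ps -axo pid,wchan,comm`: a header line, then PID, wait channel and command columns, with `-` assumed to mean "not sleeping". The function name `tally_wait_channels` and the sample lines are hypothetical and are not taken from the actual hang. ps may also truncate long wait-channel names, so `tx->tx_quiesce_done_cv` could show up abbreviated.

```python
from collections import Counter


def tally_wait_channels(ps_output: str) -> Counter:
    """Count processes per wait channel in `ps -axo pid,wchan,comm`-style text.

    Assumes (hypothetically) a header row followed by whitespace-separated
    PID, WCHAN and COMMAND columns; a '-' in WCHAN is treated as "not
    sleeping" and skipped.
    """
    counts = Counter()
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # Split into at most 3 fields so a COMMAND containing spaces stays whole.
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue
        wchan = fields[1]
        if wchan != "-":
            counts[wchan] += 1
    return counts


# Hypothetical sample output, for illustration only.
sample = """  PID WCHAN  COMMAND
  101 tx->tx zfskern
  202 tx->tx cp
  303 -      boinc_client
"""

print(tally_wait_channels(sample).most_common(1))  # [('tx->tx', 2)]
```

Counting by name rather than listing every blocked process makes it easy to see at a glance which lock or condition variable most processes are piled up on.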

The server had been up for about 30 days before I noticed any problems,
and the hangs seem to have been getting more noticeable, so it is also
possible that the problem is related to uptime - either a resource leak
somewhere (though there was nothing obvious) or memory fragmentation.

>Hmm, I know there was some fix to the scheduler re: thread priority,
>and it wouldn't surprise me if it was after your revision.

After looking around in the kernel, I'm now confident that it's not
a priority-inversion issue, since the BOINC processes all appeared to
be running normally and were not holding any locks.

>My advice would be to upgrade to -RELEASE if possible. If not, at
>least check whether your build should be affected.

I have updated to a recent 8-stable and will see what happens.

-- 
Peter Jeremy


More information about the freebsd-current mailing list