file system deadlock - the whole story?
Scott Long
scottl at samsco.org
Wed Jul 19 15:27:25 UTC 2006
User Freebsd wrote:
> On Wed, 19 Jul 2006, Robert Watson wrote:
>
>> On Wed, 19 Jul 2006, User Freebsd wrote:
>>
>>>> Yes, this was going to be my next question -- if you're seeing
>>>> wedges under load and there's a common controller in use, maybe
>>>> we're looking at a driver bug. Bugs of that sort typically look a
>>>> lot like what you describe: an I/O is "lost" and so everything that
>>>> depends on the I/O wedges waiting for it, leading to a lot of
>>>> processes hanging around waiting for vnode locks, etc.
>>>
>>>
>>> 'k, but how do we debug *that*? :( If it was one, I'd suspect
>>> hardware ... but *three*, and only acting up *after* upgrading to
>>> FreeBSD 6.x, and only acting up under load ...
>>
>>
>> There are two normal approaches:
>>
>> (1) Switch controllers and see if the problem goes away, then blame the
>> controller that was replaced. :-)
>>
>> (2) Debug the driver when the system is in the wedged state. When
>> Scott Long helped me out with an identical problem with the 3ware
>> driver a few years ago, he basically added debugging output for the
>> driver in the debugger to list the state of outstanding I/Os, count
>> the number of in-bound, out-bound I/Os, etc, to try and find where
>> the missing one was leaked. My impression is that once he had
>> confirmed the presence of the problem, it was fairly easy to fix,
>> but that confirming it required quite a bit of paperwork.
>
>
> 'k, first question is will the core file provide any insight into this?
> ie. provide further confirmation that it looks like the driver vs file
> system?
>
> second question, who is currently maintaining the iir driver? I've CC'd
> Achim in this, as he's listed in the man page as being the maintainer ...
>
> Now, uranus has all the various kernel debugging enabled right now, and
> a serial console, so we're good for the debugging side of things ... and
> I believe that I can fairly easily "recreate" the issue by just moving a
> whack of vServers onto that machine to give it the load that seems to
> kill it ... *and* uranus is one of my newer machines, so the card that
> is in it is fairly new ... but, since I have a full BIOS serial console
> working on it, I should be able to get full model # and firmware
> version, which I take it will help some?
>
What exact version of FreeBSD are you dealing with?
Scott
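
For readers following the thread, the bookkeeping Robert describes in
approach (2) can be sketched roughly as below. This is a standalone
userland illustration only, not code from the iir or 3ware drivers: the
names io_tracker, tracker_submit, tracker_complete and tracker_dump, and
the MAX_REQS queue depth, are all hypothetical. The idea is simply to
record every command handed to the controller and every completion seen,
so that a dump taken while the system is wedged shows which request was
never completed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_REQS 8              /* hypothetical queue depth */

enum req_state { REQ_FREE = 0, REQ_SUBMITTED, REQ_DONE };

/* Hypothetical per-controller accounting, not a real driver structure. */
struct io_tracker {
    enum req_state state[MAX_REQS];
    unsigned submitted;         /* commands handed to the controller */
    unsigned completed;         /* completions seen */
};

static void tracker_init(struct io_tracker *t)
{
    memset(t, 0, sizeof(*t));   /* all slots start as REQ_FREE */
}

/* Claim a slot that is not in flight; returns its index, or -1 if full. */
static int tracker_submit(struct io_tracker *t)
{
    for (int i = 0; i < MAX_REQS; i++) {
        if (t->state[i] != REQ_SUBMITTED) {
            t->state[i] = REQ_SUBMITTED;
            t->submitted++;
            return i;
        }
    }
    return -1;
}

/* Record a completion for a slot that must currently be in flight. */
static void tracker_complete(struct io_tracker *t, int slot)
{
    assert(slot >= 0 && slot < MAX_REQS);
    assert(t->state[slot] == REQ_SUBMITTED);
    t->state[slot] = REQ_DONE;
    t->completed++;
}

/* Print every request still in flight; returns how many there are. */
static unsigned tracker_dump(const struct io_tracker *t)
{
    unsigned outstanding = 0;

    for (int i = 0; i < MAX_REQS; i++) {
        if (t->state[i] == REQ_SUBMITTED) {
            printf("slot %d: submitted, never completed\n", i);
            outstanding++;
        }
    }
    printf("submitted=%u completed=%u outstanding=%u\n",
        t->submitted, t->completed, outstanding);
    return outstanding;
}
```

In a real driver the dump would presumably be hooked up so it can be
called from the kernel debugger on the wedged machine; a nonzero
outstanding count on an otherwise idle controller is the "lost" I/O.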