file system deadlock - the whole story?

Scott Long scottl at samsco.org
Wed Jul 19 15:27:25 UTC 2006


User Freebsd wrote:
> On Wed, 19 Jul 2006, Robert Watson wrote:
> 
>> On Wed, 19 Jul 2006, User Freebsd wrote:
>>
>>>> Yes, this was going to be my next question -- if you're seeing 
>>>> wedges under load and there's a common controller in use, maybe 
>>>> we're looking at a driver bug.  Bugs of that sort typically look a 
>>>> lot like what you describe: an I/O is "lost" and so everything that 
>>>> depends on the I/O wedges waiting for it, leading to a lot of 
>>>> processes hanging around waiting for vnode locks, etc.
>>>
>>>
>>> 'k, but how do we debug *that*? :( If it was one, I'd suspect 
>>> hardware ... but *three*, and only acting up *after* upgrading to 
>>> FreeBSD 6.x, and only acting up under load ...
>>
>>
>> There are two normal approaches:
>>
>> (1) Switch controllers and see if the problem goes away, then blame the
>>    controller that was replaced. :-)
>>
>> (2) Debug the driver when the system is in the wedged state.  When Scott
>>    Long helped me out with an identical problem with the 3ware driver a
>>    few years ago, he basically added debugging output for the driver in
>>    the debugger to list the state of outstanding I/Os, count the number
>>    of in-bound, out-bound I/Os, etc, to try and find where the missing
>>    one was leaked. My impression is that once he had confirmed the
>>    presence of the problem, it was fairly easy to fix, but that
>>    confirming it required quite a bit of paperwork.
> 
> 
> 'k, first question is will the core file provide any insight into this? 
> ie. provide further confirmation that it looks like the driver vs file 
> system?
> 
> second question, who is currently maintaining the iir driver?  I've CC'd 
> Achim in this, as he's listed in the man page as being the maintainer ...
> 
> Now, uranus has all the various kernel debugging enabled right now, and 
> a serial console, so we're good for the debugging side of things ... and 
> I believe that I can fairly easily "recreate" the issue by just moving a 
> whack of vServers onto that machine to give it the load that seems to 
> kill it ... *and* uranus is one of my newer machines, so the card that 
> is in it is fairly new ... but, since I have a full BIOS serial console 
> working on it, I should be able to get full model # and firmware 
> version, which I take it will help some?
> 

What exact version of FreeBSD are you dealing with?

Scott
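
Approach (2) in the quoted message, listing outstanding I/Os and comparing
the count of submitted requests against completed ones to find a leaked
request, can be illustrated with a small model.  This is only a sketch of
the bookkeeping idea, not code from the iir or 3ware drivers: the
RequestTracker class, its method names, and the timeout value are all
hypothetical, and a real driver would do this in C inside the kernel.

```python
from dataclasses import dataclass, field


@dataclass
class RequestTracker:
    """Toy model of a driver's bookkeeping for in-flight I/O requests.

    Hypothetical illustration only; names do not come from any real driver.
    """
    submitted: int = 0
    completed: int = 0
    # Maps request id -> time at which the request was handed to the controller.
    outstanding: dict = field(default_factory=dict)

    def submit(self, req_id, now):
        """Record a request going out to the controller."""
        self.submitted += 1
        self.outstanding[req_id] = now

    def complete(self, req_id):
        """Record a completion coming back from the controller."""
        if req_id not in self.outstanding:
            # A completion for an unknown request is itself a bookkeeping bug.
            raise KeyError(f"completion for unknown request {req_id}")
        del self.outstanding[req_id]
        self.completed += 1

    def stuck(self, now, timeout):
        """Return ids of requests outstanding for longer than `timeout`."""
        return sorted(r for r, t in self.outstanding.items() if now - t > timeout)
```

For example, if requests 1, 2 and 3 are submitted and only 1 and 3 ever
complete, submitted minus completed stays at 1 and stuck() names request 2
as the one to chase.  Everything waiting on that request would wedge, which
matches the symptom described earlier in the thread.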



More information about the freebsd-stable mailing list