FreeBSD 6.3 deadlock (vm_map?) with DDB output
James Gritton
jamie at gritton.org
Mon Jun 23 19:16:47 UTC 2008
John Baldwin wrote:
> On Thursday 19 June 2008 11:57:51 am James Gritton wrote:
>
>> John Baldwin wrote:
>>
>>> On Sunday 15 June 2008 07:23:19 am Stef Walter wrote:
>>>
>>>
>>>> I've been trying to track down a deadlock on some newish production
>>>> servers running FreeBSD 6.3-RELEASE-p2. The deadlock occurs on a
>>>> specific (although mundane) hardware configuration, and each of several
>>>> servers running this hardware deadlocks about once per week.
>>>>
>>>> I suspect that this is not hardware related, though, judging from a
>>>> (naive) perusal of the attached stack traces.
>>>>
>>>> Forgive me if my interpretation of this is all wrong, but I'm pretty
>>>> desperate for help. So here's my basic understanding of the deadlock:
>>>>
>>>> These processes seem to be waiting on the page queue mutex:
>>>> sendmail (in vm_mmap > vm_map_find > vm_map_insert > vm_map_pmap_enter)
>>>> bsnmpd (in malloc, uma_large_malloc > page_alloc > kmem_malloc)
>>>> httpd (in trap > trap_pfault > vm_fault)
>>>> [g_up] (in g_vfs_done > bufdone)
>>>>
>>>> The page queue mutex is held by rsync process:
>>>> rsync (in trap > trap_pfault > vm_fault > pmap_enter)
>>>>
>>>> Was the rsync process (in pmap_enter) interrupted while holding the
>>>> page queue lock?
>>>>
>>>>
>>>> Giant is enabled in loader.conf due to the needs of the pf firewall when
>>>> dealing with user credentials lookups. I do not believe that Giant plays
>>>> into this deadlock. Kernel config attached.
>>>>
>>>> Any and all help or info is welcome. Thanks in advance.
>>>>
>>>>
>>> Try this change:
>>>
>>> jhb 2007-10-27 22:07:40 UTC
>>>
>>> FreeBSD src repository
>>>
>>> Modified files:
>>> sys/kern sched_4bsd.c
>>> Log:
>>> Change the roundrobin implementation in the 4BSD scheduler to trigger a
>>> userland preemption directly from hardclock() via sched_clock() when a
>>> thread uses up a full quantum instead of using a periodic timeout to cause
>>> a userland preemption every so often. This fixes a potential deadlock
>>> when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held
>>> by a thread pinned or bound to another CPU. The current thread on that
>>> CPU will never be preempted while softclock is blocked.
>>>
>>> Note that ULE already drives its round-robin userland preemption from
>>> sched_clock() as well and always enables IPI_PREEMPT.
>>>
>>> MFC after: 1 week
>>>
>>> Revision Changes Path
>>> 1.108 +8 -29 src/sys/kern/sched_4bsd.c
>>>
>>> We use it at work on 6.x. W/o this fix, round-robin stops working on 4BSD
>>> when softclock() (swi4: clock) blocks on a lock like Giant.
>>>
>>>
>> I've been seeing similar troubles on 6.2 and I'll have to give this a
>> try as we upgrade to 6.3. I notice "MFC after: 1 week" in the log; it's
>> been a week - any chance of seeing this fix rolled into 6.x?
>>
>
> If people confirm it fixes issues I will MFC it. There was some pushback when
> I first committed it so I waited on the MFC.
I can confirm that on 6.3 I can recreate the deadlock without the patch,
and can't recreate it with the patch.
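
A minimal user-space sketch of the idea in the quoted commit log, assuming a
fixed quantum counted in clock ticks: the per-tick hook itself decides when the
running thread has used up its quantum, so no periodic timeout run by softclock
is needed. All names here (toy_thread, toy_sched_clock, toy_run_ticks,
TOY_QUANTUM_TICKS) are hypothetical, and this is an illustration rather than
the code in sched_4bsd.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical quantum length in clock ticks; not the kernel's value. */
#define TOY_QUANTUM_TICKS 10

/* Toy stand-in for per-thread scheduler state. */
struct toy_thread {
    int  ticks_used;    /* ticks consumed in the current quantum */
    bool need_resched;  /* set when the thread should be preempted */
};

/*
 * Called once per simulated clock tick for the running thread.
 * The quantum check happens here, in the tick path, rather than in a
 * separate timeout handler that could be stuck waiting on a lock.
 */
static void toy_sched_clock(struct toy_thread *td)
{
    assert(td != NULL);
    if (++td->ticks_used >= TOY_QUANTUM_TICKS) {
        td->ticks_used = 0;
        td->need_resched = true;
    }
}

/* Deliver n ticks and count how many preemption requests were raised. */
static int toy_run_ticks(struct toy_thread *td, int n)
{
    int preemptions = 0;

    for (int i = 0; i < n; i++) {
        toy_sched_clock(td);
        if (td->need_resched) {
            td->need_resched = false;  /* pretend the thread was switched out */
            preemptions++;
        }
    }
    return preemptions;
}
```

In this model a preemption request is raised on every TOY_QUANTUM_TICKS-th
tick regardless of what any other thread is blocked on, which is the property
the commit log relies on.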