xorg-dev + intel driver + KMS

Fri Sep 9 22:10:59 UTC 2011

Hi, Konstantin,

On 09.09.2011 02:38, Kostik Belousov wrote:
> If you are not interested in the story, just try the 9.1 patch.
>
> If you are, please stay with me. Apparently, your pagedaemon is sleeping
> in the 915unm state, which worried me a great deal. I did not understand
> how this could happen, because I thought that it was caused by the
> pagedaemon dropping the last reference on the gem object device pager.
> And the pagedaemon must not see pages belonging to device pagers; such
> pages must not appear on any queue.
>
> I added assertions to force a panic if a fictitious page is found on
> the queues, and they did not fire. But I was able to reproduce the
> pagedaemon hang by running gem_stress and performing active swapping
> in parallel. I had forgotten that I finally implemented the low
> memory handler for gem, which is called from the pagedaemon and which
> also purges the gem buffers.
>
> After that, it was relatively easy to track the issue. See the comment
> at the beginning of i915_gem_pager_fault() about interaction with
> i915_gem_release_mmap() which describes the cause of the hang:
>
> 	/*
> 	 * Remove the placeholder page inserted by vm_fault() from the
> 	 * object before dropping the object lock. If
> 	 * i915_gem_release_mmap() is active in parallel on this gem
> 	 * object, then it owns the drm device sx and might find the
> 	 * placeholder already. Then, since the page is busy,
> 	 * i915_gem_release_mmap() sleeps waiting for the busy state
> 	 * of the page cleared. We will be not able to acquire drm
> 	 * device lock until i915_gem_release_mmap() is able to make a
> 	 * progress.
> 	 */
>
> For me, the patched driver survived while doing 'sort /dev/zero' and
> gem_stress in parallel.

Great!
I confirm that with all.9.1.patch the system remains stable even under
high memory pressure.

I tried your test (thanks, it is exactly what I had been looking for
for quite a long time: exact steps to reproduce the issue). Running
"gem_stress" and "sort /dev/zero" in parallel made my system unusable
within less than 10 seconds. I repeated the test 3 times in a row, and
the outcome was the same every time: the X server hung, and a reset was
the only way to get the machine operational again.

After applying all.9.1.patch I ran the same test again, and the system
remained stable and even pretty responsive. Both gem_stress and
"sort /dev/zero" ran for a while, and after a couple of minutes the sort
process was killed by the system (with an "out of swap space" error).

I will keep an eye on it and let you know if I notice any more issues.
Thanks! I really appreciate it!

--
WBR,
Andrey Kosachenko