VirtualBox 4.2.4 on FreeBSD 9.1-PRERELEASE problem: VMs behave very differently when pinned to different cores
Alex Chistyakov
alexclear at gmail.com
Fri Nov 23 22:18:00 UTC 2012
On Fri, Nov 23, 2012 at 9:06 PM, Andriy Gapon <avg at freebsd.org> wrote:
>
> I've cc-ed Alexander who is deeply familiar with both the scheduler and the timer
> code.
> I think that it would be nice to get ktr(4) information suitable for use with
> schedgraph (please google for these keywords).
I collected two samples and put them here: http://1888.spb.ru/samples.zip
sched-cpu0.ktr is for a VM running on CPU #0, and sched-cpu1.ktr is
for a VM running on CPU #1.
They seem to be very different.
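For anyone who wants to reproduce such traces, a rough sketch of the
usual ktr(4)/schedgraph recipe follows; the kernel option names and the
ktrdump flags are written from memory, so please double-check them
against ktr(4) and the comments at the top of schedgraph.py:

    # kernel config: compile KTR in with scheduler tracing enabled
    options KTR
    options KTR_ENTRIES=262144        # ring buffer size, power of two
    options KTR_COMPILE=(KTR_SCHED)
    options KTR_MASK=(KTR_SCHED)

    # at run time: let the buffer fill while the VM misbehaves, then
    # dump it with CPU numbers and timestamps and load it in schedgraph
    ktrdump -ct > sched-cpuN.ktr
    python /usr/src/tools/sched/schedgraph.py sched-cpuN.ktr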
> Also, version of your kernel,
kern.version: FreeBSD 9.1-PRERELEASE #4: Fri Nov 23 22:38:47 MSK 2012
Sources were grabbed on Nov 16.
> output of sysctls kern.eventtimer and kern.sched.
kern.eventtimer.choice: LAPIC(600) HPET(550) HPET1(440) HPET2(440)
i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 7
kern.eventtimer.et.LAPIC.frequency: 50002806
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.HPET.flags: 7
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.quality: 550
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.quality: 440
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.quality: 440
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.activetick: 1
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.sched.cpusetsize: 8
kern.sched.preemption: 1
kern.sched.topology_spec: <groups>
kern.sched.steal_thresh: 2
kern.sched.steal_idle: 1
kern.sched.balance_interval: 127
kern.sched.balance: 1
kern.sched.affinity: 1
kern.sched.idlespinthresh: 16
kern.sched.idlespins: 10000
kern.sched.static_boost: 152
kern.sched.preempt_thresh: 80
kern.sched.interact: 30
kern.sched.slice: 12
kern.sched.quantum: 94488
kern.sched.name: ULE
I tried kern.eventtimer.periodic=1 and
kern.timecounter.hardware=ACPI-fast, but that did not help.
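Both of those are plain sysctls, so (just as a generic sketch, not
necessarily the exact steps taken here) they can be flipped at run time
or made persistent in /etc/sysctl.conf:

    # at run time
    sysctl kern.eventtimer.periodic=1
    sysctl kern.timecounter.hardware=ACPI-fast

    # or persistently, in /etc/sysctl.conf
    kern.eventtimer.periodic=1
    kern.timecounter.hardware=ACPI-fast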
> BTW, do you use the default ULE scheduler?
Yep.
I tried SCHED_4BSD and the situation became much better, but still not
ideal: %si was around 3-7% on the guest, and I had to boot the guest
with noacpi and disable its tickless kernel to lower it.
At least I was able to run a VM on CPU #0, and all cores became equal.
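To spell that out: the scheduler switch means rebuilding the host
kernel with the other option, and the guest-side flags below assume a
Linux guest, so they are only a sketch and may differ from the exact
options used:

    # host kernel config: replace ULE with the 4BSD scheduler, rebuild
    #options SCHED_ULE
    options  SCHED_4BSD

    # guest (Linux, assumed) kernel command line: disable ACPI and the
    # tickless (dynticks) mode
    acpi=off nohz=off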
> Also, is your kernel DTrace enabled?
Yep.
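For reference, "DTrace enabled" here means the usual handbook-style
kernel options for a 9.x kernel (listed from memory, so verify against
the DTrace chapter of the FreeBSD handbook):

    options         KDTRACE_HOOKS
    options         DDB_CTF
    makeoptions     DEBUG=-g
    makeoptions     WITH_CTF=1
    # on amd64, additionally:
    options         KDTRACE_FRAME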
Thank you!
--
SY,
Alex
>
> on 23/11/2012 17:52 Alex Chistyakov said the following:
>> On Fri, Nov 23, 2012 at 6:20 PM, Bernhard Fröhlich <decke at freebsd.org> wrote:
>>> On Fri, Nov 23, 2012 at 2:15 PM, Alex Chistyakov <alexclear at gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I am back with another problem. As I discovered previously, setting
>>>> CPU affinity explicitly helps to get decent performance on guests, but
>>>> the problem is that guest performance is very different on core #0 and
>>>> cores #5 or #7. Basically, when I use 'cpuset -l 0 VBoxHeadless -s
>>>> "Name" -v on' to start the VM, it is barely usable at all. The best
>>>> performance results are on cores #4 and #5 (I believe they are the
>>>> same physical core due to HT). #7 and #8 are twice as slow as #5, #0
>>>> and #1 are the slowest, and the other cores lie somewhere in between.
>>>> If I disable the tickless kernel on a guest running on #4 or #5, it
>>>> becomes as slow as a guest running on #7, so I suspect this is a
>>>> timer-related issue.
>>>> I also discovered that there are quite a lot of system interrupts on
>>>> slow guests (%si is about 10-15) but Munin does not render them on its
>>>> CPU graphs for some reason.
>>>> All my VMs are on cores #4 and #5 right now, but I want to utilize
>>>> the other cores too. I am not sure what to do next; this looks like a
>>>> VirtualBox bug. What can be done to solve this?
>>>
>>> I do not want to sound ignorant, but what do you expect? Each VBox
>>> VM consists of somewhere around 15 threads, and some of them are the
>>> vCPUs. You bind them all to the same CPU, so they will fight for CPU
>>> time on that single core, and both latency and performance will become
>>> unpredictable. And then you add more and more craziness by running
>>> it on cpu0 and an HT-enabled CPU ...
>>
>> Your point regarding HTT is perfectly valid, so I just disabled it in
>> the BIOS. Unfortunately, it did not help.
>> When I run a single VM on CPU #0 I get the following load pattern on the host:
>>
>> last pid: 2744; load averages: 0.93, 0.63, 0.31
>>
>> up 0+00:05:25 19:37:17
>> 368 processes: 8 running, 344 sleeping, 16 waiting
>> CPU 0: 14.7% user, 0.0% nice, 85.3% system, 0.0% interrupt, 0.0% idle
>> CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> CPU 4: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> CPU 5: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> Mem: 410M Active, 21M Inact, 921M Wired, 72K Cache, 60G Free
>> ARC: 136M Total, 58M MRU, 67M MFU, 272K Anon, 2029K Header, 8958K Other
>> Swap: 20G Total, 20G Free
>>
>> And when I run it on CPU #4 the situation is completely different:
>>
>> last pid: 2787; load averages: 0.05, 0.37, 0.31
>>
>> up 0+00:11:45 19:43:37
>> 368 processes: 9 running, 343 sleeping, 16 waiting
>> CPU 0: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> CPU 4: 1.8% user, 0.0% nice, 11.0% system, 0.0% interrupt, 87.2% idle
>> CPU 5: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>> Mem: 412M Active, 20M Inact, 1337M Wired, 72K Cache, 60G Free
>> ARC: 319M Total, 136M MRU, 171M MFU, 272K Anon, 2524K Header, 9340K Other
>> Swap: 20G Total, 20G Free
>>
>> Regarding pinning the VM to a certain core - yes, I agree with you,
>> it's better not to pin VMs explicitly, but I was forced to do this. If
>> I do not pin the VM explicitly, it gets scheduled to the "bad" core
>> sooner or later and the whole VM becomes unresponsive. And I was able
>> to run as many as 6 VMs on HTT cores #4/#5 quite successfully. These
>> VMs were staging machines without much load on them, but I wanted to
>> put some production resources on this host too - that's why I wanted
>> to know how to utilize the other cores safely.
>
>
> --
> Andriy Gapon