hwpmc: NMI cpuxx ... going to debugger
shreyank amartya
shreyankfbsd at gmail.com
Tue Sep 3 18:03:47 UTC 2019
Hi,
I'm running a profiler which utilizes hwpmc and am frequently running into
this situation: after a process context switch out I get an NMI. The NMI
handler is unable to service it due to the fact that the pmc hardware is
marked as free at the end of the process_csw_out() (hwpmc_mod.c) routine.
Here are some logs I've collected when this happens:
1750 231 49360676470980 MDP:CFG:1: cpu=231 ri=0 pm=0xfffff801e7b2ea00
1749 231 49360676470360 CSW:SWI:1: cpu=231 proc=0xfffff813e80ff000
(5930, java) pp=0xfffff81077179000
1748 249 49360676421260 CSW:SWO:3: cpu=249 ri=33
1748 249 49360676421260 CSW:SWO:3: cpu=249 ri=33
1747 249 49360676420320 MDP:SWO:1: pc=0xfffff8010a06c400
pp=0xfffff81077179000 enable-msr=0
1746 249 49360676419000 CSW:SWO:1: cpu=249 proc=0xfffff813e80ff000
(5930, java) pp=0xfffff81077179000
1745 231 49360675789440 MDP:INT:2: retval=0 isnull=16 ovrflw=0
1744 231 49360675789240 MDP:INT:1: cpu=231 tf=0xfffffe0001577f30 um=0
1743 231 49360675788260 CSW:SWO:3: cpu=231 ri=33
1742 231 49360675788080 MDP:SWO:1: pc=0xfffff80106fe1400
pp=0xfffff81077179000 enable-msr=0
1741 231 49360675787880 MDP:CFG:1: cpu=231 ri=0 pm=0x0
1740 231 49360675787020 CSW:SWO:2: cpu=231 ri=17 val=-164 (samp)
1739 231 49360675786940 MDP:REA:2: amd-read (post-munge) id=0 -> -164
1738 231 49360675786880 MDP:REA:2: amd-read (pre-munge) id=0 -> 164
1737 231 49360675786720 MDP:REA:1: amd-read id=0 class=2
1736 231 49360675785180 MDP:STO:1: amd-stop ri=0
1735 231 49360675784160 CSW:SWO:1: cpu=231 proc=0xfffff813e80ff000
(5930, java) pp=0xfffff81077179000
1734 231 49360675617080 MDP:INT:2: retval=1 isnull=0 ovrflw=0
1733 231 49360675615240 MDP:INT:1: cpu=231 tf=0xfffffe0001577f30 um=0
1732 231 49360675505160 MDP:SWI:1: pc=0xfffff80106fe1400
pp=0xfffff81077179000 enable-msr=0
1731 231 49360675503600 MDP:STA:2: amd-start config=0x530076
1730 231 49360675503060 MDP:STA:1: amd-start cpu=231 ri=0
1729 231 49360675500720 MDP:WRI:1: amd-write cpu=231 ri=0
v=fffffffffffd97d4
1728 231 49360675500240 CSW:SWI:1: cpu=231 ri=17 new=157740
1727 231 49360675498680 MDP:CFG:1: cpu=231 ri=0 pm=0xfffff801e7b2ea00
1726 231 49360675498060 CSW:SWI:1: cpu=231 proc=0xfffff813e80ff000
(5930, java) pp=0xfffff81077179000
1725 154 49360675363180 CSW:SWO:3: cpu=154 ri=33
1724 154 49360675362020 MDP:SWO:1: pc=0xfffff80106d59600
pp=0xfffff81077179000 enable-msr=0
1723 154 49360675360220 CSW:SWO:1: cpu=154 proc=0xfffff813e80ff000
(5930, java) pp=0xfffff81077179000
1722 231 49360675009200 CSW:SWO:3: cpu=231 ri=33
1721 231 49360675009020 MDP:SWO:1: pc=0xfffff80106fe1400
pp=0xfffff81077179000 enable-msr=0
1720 231 49360675008320 MDP:CFG:1: cpu=231 ri=0 pm=0x0
1719 231 49360675006220 CSW:SWO:2: cpu=231 ri=17 val=157740 (samp)
1718 231 49360675005700 MDP:REA:2: amd-read (post-munge) id=0 -> 157740
1717 231 49360675005240 MDP:REA:2: amd-read (pre-munge) id=0 ->
281474976552916
1716 231 49360675004140 MDP:REA:1: amd-read id=0 class=2
1715 231 49360675002660 MDP:STO:1: amd-stop ri=0
1714 231 49360675001460 CSW:SWO:1: cpu=231 proc=0xfffff813e80ff000
(5930, java) pp=0xfffff81077179000
1713 231 49360674941580 MDP:INT:2: retval=1 isnull=0 ovrflw=0
1712 231 49360674939500 MDP:INT:1: cpu=231 tf=0xfffffe0001577f30 um=0
1711 231 49360674793500 MDP:SWI:1: pc=0xfffff80106fe1400
pp=0xfffff81077179000 enable-msr=0
1710 231 49360674792680 MDP:STA:2: amd-start config=0x530076
1709 220 49360674792340 CSW:SWO:3: cpu=220 ri=33
1708 220 49360674791060 MDP:SWO:1: pc=0xfffff80106f8a600
pp=0xfffff81077179000 enable-msr=0
1707 231 49360674790840 MDP:STA:1: amd-start cpu=231 ri=0
The stray NMI is entry 1744 & 1745 (isnull=16). In entry 1741 when the
context switch out occurs, the hardware is marked free (pm=0x0). The NMI
handler is unable to find the hardware and ends up returning 0. Entry 1740
reports the val=-164, I'm not sure about the significance but the value
always seems to be negative when this problem occurs.
Once i get a stray NMI, I do not get any further interrupts which is
results in loss of samples in the generated pmclog.
I'm using hwpmc from the most recent stable/12 branch which is same as
current (with a stable/12 kernel). I'm running it on a java application
with a single event in sampling mode.
Any idea on how to resolve/avoid this problem?
Any pointers or suggestion are appreciated.
Thanks
Shreyank
More information about the freebsd-hackers
mailing list