Memory allocation performance
Max Laier
max at love2party.net
Sat Feb 2 15:07:29 PST 2008
Am Sa, 2.02.2008, 23:05, schrieb Alexander Motin:
> Robert Watson wrote:
>> Hence my request for drilling down a bit on profiling -- the question
>> I'm asking is whether profiling shows things running or taking time that
>> shouldn't be.
>
> I have not yet understood why does it happend, but hwpmc shows huge
> amount of "p4-resource-stall"s in UMA functions:
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 45.2 2303.00 2303.00 0 100.00% uma_zfree_arg [1]
> 41.2 4402.00 2099.00 0 100.00% uma_zalloc_arg [2]
> 1.4 4472.00 70.00 0 100.00%
> uma_zone_exhausted_nolock [3]
> 0.9 4520.00 48.00 0 100.00% ng_snd_item [4]
> 0.8 4562.00 42.00 0 100.00% __qdivrem [5]
> 0.8 4603.00 41.00 0 100.00% ether_input [6]
> 0.6 4633.00 30.00 0 100.00% ng_ppp_prepend [7]
>
> Probably it explains why "p4-global-power-events" shows many hits into
> them
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 20.0 37984.00 37984.00 0 100.00% uma_zfree_arg [1]
> 17.8 71818.00 33834.00 0 100.00% uma_zalloc_arg [2]
> 4.0 79483.00 7665.00 0 100.00% ng_snd_item [3]
> 3.0 85256.00 5773.00 0 100.00% __mcount [4]
> 2.3 89677.00 4421.00 0 100.00% bcmp [5]
> 2.2 93853.00 4176.00 0 100.00% generic_bcopy [6]
>
> , while "p4-instr-retired" does not.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 11.1 5351.00 5351.00 0 100.00% ng_apply_item [1]
> 7.9 9178.00 3827.00 0 100.00%
> legacy_pcib_alloc_msi [2]
> 4.1 11182.00 2004.00 0 100.00% init386 [3]
> 4.0 13108.00 1926.00 0 100.00% rn_match [4]
> 3.5 14811.00 1703.00 0 100.00% uma_zalloc_arg [5]
> 2.6 16046.00 1235.00 0 100.00% SHA256_Transform
> [6]
> 2.2 17130.00 1084.00 0 100.00% ng_add_hook [7]
> 2.0 18111.00 981.00 0 100.00% ng_rmhook_self [8]
> 2.0 19054.00 943.00 0 100.00% em_encap [9]
>
> For this moment I have invent two possible explanation. One is that due
> to UMA's cyclic block allocation order it does not fits CPU caches and
> another that it is somehow related to critical_exit(), which possibly
> can cause context switch. Does anybody have better explanation how such
> small and simple in this part function can cause such results?
I didn't see bzero accounted for in any of the traces in this thread -
makes me wonder if that might mean that it's counted within uma_zalloc?
Maybe we are calling it twice by accident? I wasn't quite able to figure
out the logic of M_ZERO vs. UMA_ZONE_MALLOC etc. ... just a crazy idea.
--
/"\ Best regards, | mlaier at freebsd.org
\ / Max Laier | ICQ #67774661
X http://pf4freebsd.love2party.net/ | mlaier at EFnet
/ \ ASCII Ribbon Campaign | Against HTML Mail and News
More information about the freebsd-performance
mailing list