Memory allocation performance/statistics patches
Robert Watson
rwatson at FreeBSD.org
Sun Apr 17 07:30:56 PDT 2005
Attached please find three patches:
(1) uma.diff, which modifies the UMA slab allocator to use critical
sections instead of mutexes to protect per-CPU caches.
(2) malloc.diff, which modifies the malloc memory allocator to use
critical sections and per-CPU data instead of mutexes to store
per-malloc-type statistics, coalescing them when the sysctl used to
generate vmstat -m output runs.
(3) mbuf.diff, which modifies the mbuf allocator to use per-CPU data and
critical sections for statistics, instead of the current unsynchronized
statistics, which can become substantially inconsistent on SMP systems.
These changes are facilitated by John Baldwin's recent re-introduction of
critical section optimizations that permit critical sections to be
implemented "in software", rather than using the hardware interrupt
disable mechanism, which is quite expensive on modern processors
(especially Xeon P4 CPUs). While not identical, this is similar to the
softspl behavior in 4.x and to Linux's preemption-disable mechanism (and
those of various other post-VAX systems :-)).
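To make the pattern concrete, here is a minimal sketch -- not code from
the attached patches; "foo" and its mutex are purely illustrative -- that
contrasts the old mutex-protected global counter with the new
critical-section-protected per-CPU counter, using the stock
critical_enter()/critical_exit() and PCPU_GET(cpuid) interfaces:

    /*
     * Sketch only: "foo" is a made-up statistic.  The patches apply
     * this pattern to the UMA, malloc(9), and mbuf statistics.
     */
    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/pcpu.h>
    #include <sys/proc.h>
    #include <sys/smp.h>

    static struct mtx foo_mtx;
    MTX_SYSINIT(foo_mtx_init, &foo_mtx, "foo stats", MTX_DEF);
    static u_long foo_count;                /* Old: global, mutex-protected. */

    static u_long foo_count_pcpu[MAXCPU];   /* New: one counter per CPU. */

    static void
    foo_count_old(void)
    {

            mtx_lock(&foo_mtx);
            foo_count++;
            mtx_unlock(&foo_mtx);
    }

    static void
    foo_count_new(void)
    {

            /* No preemption or migration until critical_exit(). */
            critical_enter();
            foo_count_pcpu[PCPU_GET(cpuid)]++;
            critical_exit();
    }

The per-CPU path avoids both lock contention and the atomic operations of
a mutex acquire; the cost is that a global view now requires summing the
per-CPU slots, which the patches do in the relevant sysctl handlers.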
The reason this is interesting is that it allows synchronization of
per-CPU data to be performed at a much lower cost than previously, and
consistently across UP and SMP systems. Prior to these changes, the use
of critical sections and per-CPU data as an alternative to mutexes would
lead to an improvement on SMP, but not on UP. So, that said, here's what
I'd like us to look at:
- Patches (1) and (2) are intended to improve performance by reducing the
overhead of maintaining cache consistency and statistics for UMA and
malloc(9), and may have a universal (if small) performance impact due to
the breadth of their use throughout the kernel.
- Patch (3) is intended to restore consistency to statistics in the
presence of SMP and preemption, at the possible cost of some
performance.
I'd like to confirm that, for the first two patches, performance generally
improves for interesting workloads, and that stability doesn't degrade.
For the third patch, I'd like to quantify the cost of the changes for
interesting workloads, and likewise confirm no loss of stability.
Because these changes will have a relatively small impact, a fair amount
of caution is required in testing. We may be talking about a difference
of a percent or two, maybe four, in benchmark performance, and many
benchmarks have a higher variance than that.
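To help with that, here is a tiny, illustrative userland helper (not part
of the patches) that reads one benchmark result per line and reports the
mean and sample standard deviation, so you can judge whether a 1-4% delta
is above your benchmark's noise; running each configuration many times
and comparing is strongly encouraged:

    /* Illustrative helper; compile with cc -o bstat bstat.c -lm. */
    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
            double x, sum = 0.0, sumsq = 0.0, mean, sd;
            int n = 0;

            while (scanf("%lf", &x) == 1) {
                    sum += x;
                    sumsq += x * x;
                    n++;
            }
            if (n < 2) {
                    fprintf(stderr, "need at least two samples\n");
                    return (1);
            }
            mean = sum / n;
            /* Sample standard deviation. */
            sd = sqrt((sumsq - n * mean * mean) / (n - 1));
            printf("n=%d mean=%g sd=%g (%.2f%% of mean)\n",
                n, mean, sd, 100.0 * sd / mean);
            return (0);
    }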
A couple of observations for those interested:
- The INVARIANTS panic with UMA seen in some earlier patch versions is
believed to be corrected.
- Right now, because I use arrays of foo[MAXCPU], I'm concerned that
different CPUs will be writing to the same cache line, as the entries are
adjacent in memory. Moving to per-CPU chunks of memory to hold this
stuff is desirable, but I think we first need to identify a model by
which to do that cleanly (one possible stop-gap layout is sketched after
this list). I'm not currently enamored of the 'struct pcpu' model, since
it makes us very sensitive to ABI changes, and it offers no model by
which modules can register new per-CPU data cleanly. I'm also
inconsistent about how I dereference into the arrays, and intend to move
to using 'curcpu' throughout.
- Because mutexes are no longer used in UMA, nor in the other allocators,
stats read from different CPUs and then coalesced may be slightly
inconsistent. I'm not all that concerned about it, but it's worth
thinking about.
- Malloc stats for realloc() are still broken if you apply this patch.
- High watermarks are no longer maintained for malloc, since they require
a global notion of "high" that is tracked continuously (i.e., at each
change), and there's no longer a global view except when the observer
kicks in (via sysctl). You can imagine various models to restore some
notion of a high watermark (one is sketched after this list), but I'm not
currently sure which is best. The high watermark notion is desirable,
though.
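For the per-CPU layout, coalescing, and high watermark points above, here
is one possible approach, sketched only -- nothing below is in the
attached patches, and foo_stats_pcpu, foo_highwater, and the 128-byte
cache line constant are all assumptions: pad each CPU's slot to its own
cache line to avoid false sharing, sum the slots lock-free at sysctl
time, clamp racy negative deltas to zero, and approximate the high
watermark as the largest in-use value the observer has ever seen:

    /* Sketch only; not from the attached patches. */
    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/pcpu.h>
    #include <sys/proc.h>
    #include <sys/smp.h>

    #define FOO_CACHE_LINE  128             /* Assumed line size. */

    struct foo_stats_pcpu {
            u_long  fsp_allocs;
            u_long  fsp_frees;
    } __aligned(FOO_CACHE_LINE);            /* One cache line per CPU. */

    static struct foo_stats_pcpu foo_stats_pcpu[MAXCPU];
    static u_long foo_highwater;            /* Maintained by the observer. */

    static void
    foo_stats_alloc(void)
    {

            critical_enter();
            foo_stats_pcpu[PCPU_GET(cpuid)].fsp_allocs++;
            critical_exit();
    }
    /* A foo_stats_free() twin would increment fsp_frees the same way. */

    /* Sysctl-time observer; assumes a single reader (e.g., sysctl lock). */
    static u_long
    foo_stats_coalesce(void)
    {
            u_long allocs, frees, inuse;
            int cpu;

            allocs = frees = 0;
            for (cpu = 0; cpu <= mp_maxid; cpu++) {
                    if (CPU_ABSENT(cpu))
                            continue;
                    allocs += foo_stats_pcpu[cpu].fsp_allocs;
                    frees += foo_stats_pcpu[cpu].fsp_frees;
            }
            /* Unlocked reads can race; clamp rather than go negative. */
            inuse = (allocs > frees) ? allocs - frees : 0;
            if (inuse > foo_highwater)
                    foo_highwater = inuse;
            return (inuse);
    }

Note that this "high watermark" is only the highest value seen at an
observation point, not a continuously tracked maximum, which is exactly
the trade-off described above.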
So this is a request for:
(1) Stability testing of these patches. Put them on a machine, make them
hurt. If things go south, try applying the patches one by one until
it's clear which one is the source.
(2) Performance testing of these patches, subject to the challenges in
testing them described above. If you are interested, please test each
patch separately to evaluate its impact on your system, then apply all
three together and see how it evens out. You may find that the cost of
the mbuf allocator patch outweighs the benefits of the other two
patches; if so, that is interesting and something to work on!
I've done some micro-benchmarking using tools like netblast,
syscall_timing, etc., but I'm particularly interested in the impact on
macrobenchmarks.
Thanks!
Robert N M Watson
-------------- next part --------------
--- //depot/vendor/freebsd/src/sys/vm/uma_core.c 2005/02/24 06:30:36
+++ //depot/user/rwatson/percpu/sys/vm/uma_core.c 2005/04/06 10:33:02
@@ -1,4 +1,5 @@
/*-
+ * Copyright (c) 2004-2005 Robert N. M. Watson
* Copyright (c) 2004, 2005,
* Bosko Milekic <bmilekic at FreeBSD.org>. All rights reserved.
* Copyright (c) 2002, 2003, 2004, 2005,
@@ -119,9 +120,6 @@
/* This mutex protects the keg list */
static struct mtx uma_mtx;
-/* These are the pcpu cache locks */
-static struct mtx uma_pcpu_mtx[MAXCPU];
-
/* Linked list of boot time pages */
static LIST_HEAD(,uma_slab) uma_boot_pages =
LIST_HEAD_INITIALIZER(&uma_boot_pages);
@@ -384,48 +382,19 @@
zone_timeout(uma_zone_t zone)
{
uma_keg_t keg;
- uma_cache_t cache;
u_int64_t alloc;
- int cpu;
keg = zone->uz_keg;
alloc = 0;
/*
- * Aggregate per cpu cache statistics back to the zone.
- *
- * XXX This should be done in the sysctl handler.
- *
- * I may rewrite this to set a flag in the per cpu cache instead of
- * locking. If the flag is not cleared on the next round I will have
- * to lock and do it here instead so that the statistics don't get too
- * far out of sync.
- */
- if (!(keg->uk_flags & UMA_ZFLAG_INTERNAL)) {
- for (cpu = 0; cpu <= mp_maxid; cpu++) {
- if (CPU_ABSENT(cpu))
- continue;
- CPU_LOCK(cpu);
- cache = &zone->uz_cpu[cpu];
- /* Add them up, and reset */
- alloc += cache->uc_allocs;
- cache->uc_allocs = 0;
- CPU_UNLOCK(cpu);
- }
- }
-
- /* Now push these stats back into the zone.. */
- ZONE_LOCK(zone);
- zone->uz_allocs += alloc;
-
- /*
* Expand the zone hash table.
*
* This is done if the number of slabs is larger than the hash size.
* What I'm trying to do here is completely reduce collisions. This
* may be a little aggressive. Should I allow for two collisions max?
*/
-
+ ZONE_LOCK(zone);
if (keg->uk_flags & UMA_ZONE_HASH &&
keg->uk_pages / keg->uk_ppera >= keg->uk_hash.uh_hashsize) {
struct uma_hash newhash;
@@ -613,6 +582,10 @@
/*
* Drains the per cpu caches for a zone.
*
+ * NOTE: This may only be called while the zone is being torn down, and not
+ * during normal operation. This is necessary in order that we do not have
+ * to migrate CPUs to drain the per-CPU caches.
+ *
* Arguments:
* zone The zone to drain, must be unlocked.
*
@@ -626,12 +599,20 @@
int cpu;
/*
- * We have to lock each cpu cache before locking the zone
+ * XXX: It is safe to not lock the per-CPU caches, because we're
+ * tearing down the zone anyway. I.e., there will be no further use
+ * of the caches at this point.
+ *
+ * XXX: It would be good to be able to assert that the zone is being
+ * torn down to prevent improper use of cache_drain().
+ *
+ * XXX: We lock the zone before passing into bucket_cache_drain() as
+ * it is used elsewhere. Should the tear-down path be made special
+ * there in some form?
*/
for (cpu = 0; cpu <= mp_maxid; cpu++) {
if (CPU_ABSENT(cpu))
continue;
- CPU_LOCK(cpu);
cache = &zone->uz_cpu[cpu];
bucket_drain(zone, cache->uc_allocbucket);
bucket_drain(zone, cache->uc_freebucket);
@@ -644,11 +625,6 @@
ZONE_LOCK(zone);
bucket_cache_drain(zone);
ZONE_UNLOCK(zone);
- for (cpu = 0; cpu <= mp_maxid; cpu++) {
- if (CPU_ABSENT(cpu))
- continue;
- CPU_UNLOCK(cpu);
- }
}
/*
@@ -828,7 +804,8 @@
&flags, wait);
if (mem == NULL) {
if (keg->uk_flags & UMA_ZONE_OFFPAGE)
- uma_zfree_internal(keg->uk_slabzone, slab, NULL, 0);
+ uma_zfree_internal(keg->uk_slabzone, slab, NULL,
+ SKIP_NONE);
ZONE_LOCK(zone);
return (NULL);
}
@@ -1643,10 +1620,6 @@
#ifdef UMA_DEBUG
printf("Initializing pcpu cache locks.\n");
#endif
- /* Initialize the pcpu cache lock set once and for all */
- for (i = 0; i <= mp_maxid; i++)
- CPU_LOCK_INIT(i);
-
#ifdef UMA_DEBUG
printf("Creating slab and hash zones.\n");
#endif
@@ -1793,6 +1766,9 @@
uma_cache_t cache;
uma_bucket_t bucket;
int cpu;
+#ifdef INVARIANTS
+ int count;
+#endif
int badness;
/* This is the fast path allocation */
@@ -1827,12 +1803,33 @@
}
}
+ /*
+ * If possible, allocate from the per-CPU cache. There are two
+ * requirements for safe access to the per-CPU cache: (1) the thread
+ * accessing the cache must not be preempted or yield during access,
+ * and (2) the thread must not migrate CPUs without switching which
+ * cache it accesses. We rely on a critical section to prevent
+ * preemption and migration. We release the critical section in
+ * order to acquire the zone mutex if we are unable to allocate from
+ * the current cache; when we re-acquire the critical section, we
+ * must detect and handle migration if it has occurred.
+ */
+#ifdef INVARIANTS
+ count = 0;
+#endif
zalloc_restart:
+ critical_enter();
cpu = PCPU_GET(cpuid);
- CPU_LOCK(cpu);
cache = &zone->uz_cpu[cpu];
zalloc_start:
+#ifdef INVARIANTS
+ count++;
+ KASSERT(count < 10, ("uma_zalloc_arg: count == 10"));
+#endif
+#if 0
+ critical_assert();
+#endif
bucket = cache->uc_allocbucket;
if (bucket) {
@@ -1845,12 +1842,12 @@
KASSERT(item != NULL,
("uma_zalloc: Bucket pointer mangled."));
cache->uc_allocs++;
+ critical_exit();
#ifdef INVARIANTS
ZONE_LOCK(zone);
uma_dbg_alloc(zone, NULL, item);
ZONE_UNLOCK(zone);
#endif
- CPU_UNLOCK(cpu);
if (zone->uz_ctor != NULL) {
if (zone->uz_ctor(item, zone->uz_keg->uk_size,
udata, flags) != 0) {
@@ -1880,7 +1877,33 @@
}
}
}
+ /*
+ * The attempt to retrieve the item from the per-CPU cache has failed, so
+ * we must go back to the zone. This requires the zone lock, so we
+ * must drop the critical section, then re-acquire it when we go back
+ * to the cache. Since the critical section is released, we may be
+ * preempted or migrate. As such, make sure not to maintain any
+ * thread-local state specific to the cache from prior to releasing
+ * the critical section.
+ */
+ critical_exit();
ZONE_LOCK(zone);
+ critical_enter();
+ cpu = PCPU_GET(cpuid);
+ cache = &zone->uz_cpu[cpu];
+ bucket = cache->uc_allocbucket;
+ if (bucket != NULL) {
+ if (bucket != NULL && bucket->ub_cnt > 0) {
+ ZONE_UNLOCK(zone);
+ goto zalloc_start;
+ }
+ bucket = cache->uc_freebucket;
+ if (bucket != NULL && bucket->ub_cnt > 0) {
+ ZONE_UNLOCK(zone);
+ goto zalloc_start;
+ }
+ }
+
/* Since we have locked the zone we may as well send back our stats */
zone->uz_allocs += cache->uc_allocs;
cache->uc_allocs = 0;
@@ -1904,8 +1927,8 @@
ZONE_UNLOCK(zone);
goto zalloc_start;
}
- /* We are no longer associated with this cpu!!! */
- CPU_UNLOCK(cpu);
+ /* We are no longer associated with this CPU. */
+ critical_exit();
/* Bump up our uz_count so we get here less */
if (zone->uz_count < BUCKET_MAX)
@@ -2228,10 +2251,10 @@
uma_bucket_t bucket;
int bflags;
int cpu;
- enum zfreeskip skip;
+#ifdef INVARIANTS
+ int count;
+#endif
- /* This is the fast path free */
- skip = SKIP_NONE;
keg = zone->uz_keg;
#ifdef UMA_DEBUG_ALLOC_1
@@ -2240,25 +2263,50 @@
CTR2(KTR_UMA, "uma_zfree_arg thread %x zone %s", curthread,
zone->uz_name);
+ if (zone->uz_dtor)
+ zone->uz_dtor(item, keg->uk_size, udata);
+#ifdef INVARIANTS
+ ZONE_LOCK(zone);
+ if (keg->uk_flags & UMA_ZONE_MALLOC)
+ uma_dbg_free(zone, udata, item);
+ else
+ uma_dbg_free(zone, NULL, item);
+ ZONE_UNLOCK(zone);
+#endif
/*
* The race here is acceptable. If we miss it we'll just have to wait
* a little longer for the limits to be reset.
*/
-
if (keg->uk_flags & UMA_ZFLAG_FULL)
goto zfree_internal;
- if (zone->uz_dtor) {
- zone->uz_dtor(item, keg->uk_size, udata);
- skip = SKIP_DTOR;
- }
-
+#ifdef INVARIANTS
+ count = 0;
+#endif
+ /*
+ * If possible, free to the per-CPU cache. There are two
+ * requirements for safe access to the per-CPU cache: (1) the thread
+ * accessing the cache must not be preempted or yield during access,
+ * and (2) the thread must not migrate CPUs without switching which
+ * cache it accesses. We rely on a critical section to prevent
+ * preemption and migration. We release the critical section in
+ * order to acquire the zone mutex if we are unable to free to the
+ * current cache; when we re-acquire the critical section, we must
+ * detect and handle migration if it has occurred.
+ */
zfree_restart:
+ critical_enter();
cpu = PCPU_GET(cpuid);
- CPU_LOCK(cpu);
cache = &zone->uz_cpu[cpu];
zfree_start:
+#ifdef INVARIANTS
+ count++;
+ KASSERT(count < 10, ("uma_zfree_arg: count == 10"));
+#endif
+#if 0
+ critical_assert();
+#endif
bucket = cache->uc_freebucket;
if (bucket) {
@@ -2272,15 +2320,7 @@
("uma_zfree: Freeing to non free bucket index."));
bucket->ub_bucket[bucket->ub_cnt] = item;
bucket->ub_cnt++;
-#ifdef INVARIANTS
- ZONE_LOCK(zone);
- if (keg->uk_flags & UMA_ZONE_MALLOC)
- uma_dbg_free(zone, udata, item);
- else
- uma_dbg_free(zone, NULL, item);
- ZONE_UNLOCK(zone);
-#endif
- CPU_UNLOCK(cpu);
+ critical_exit();
return;
} else if (cache->uc_allocbucket) {
#ifdef UMA_DEBUG_ALLOC
@@ -2304,9 +2344,32 @@
*
* 1) The buckets are NULL
* 2) The alloc and free buckets are both somewhat full.
+ *
+ * We must go back to the zone, which requires acquiring the zone lock,
+ * which in turn means we must release and re-acquire the critical
+ * section. Since the critical section is released, we may be
+ * preempted or migrate. As such, make sure not to maintain any
+ * thread-local state specific to the cache from prior to releasing
+ * the critical section.
*/
-
+ critical_exit();
ZONE_LOCK(zone);
+ critical_enter();
+ cpu = PCPU_GET(cpuid);
+ cache = &zone->uz_cpu[cpu];
+ if (cache->uc_freebucket != NULL) {
+ if (cache->uc_freebucket->ub_cnt <
+ cache->uc_freebucket->ub_entries) {
+ ZONE_UNLOCK(zone);
+ goto zfree_start;
+ }
+ if (cache->uc_allocbucket != NULL &&
+ (cache->uc_allocbucket->ub_cnt <
+ cache->uc_freebucket->ub_cnt)) {
+ ZONE_UNLOCK(zone);
+ goto zfree_start;
+ }
+ }
bucket = cache->uc_freebucket;
cache->uc_freebucket = NULL;
@@ -2328,8 +2391,8 @@
cache->uc_freebucket = bucket;
goto zfree_start;
}
- /* We're done with this CPU now */
- CPU_UNLOCK(cpu);
+ /* We are no longer associated with this CPU. */
+ critical_exit();
/* And the zone.. */
ZONE_UNLOCK(zone);
@@ -2353,27 +2416,9 @@
/*
* If nothing else caught this, we'll just do an internal free.
*/
-
zfree_internal:
+ uma_zfree_internal(zone, item, udata, SKIP_DTOR);
-#ifdef INVARIANTS
- /*
- * If we need to skip the dtor and the uma_dbg_free in
- * uma_zfree_internal because we've already called the dtor
- * above, but we ended up here, then we need to make sure
- * that we take care of the uma_dbg_free immediately.
- */
- if (skip) {
- ZONE_LOCK(zone);
- if (keg->uk_flags & UMA_ZONE_MALLOC)
- uma_dbg_free(zone, udata, item);
- else
- uma_dbg_free(zone, NULL, item);
- ZONE_UNLOCK(zone);
- }
-#endif
- uma_zfree_internal(zone, item, udata, skip);
-
return;
}
@@ -2655,7 +2700,7 @@
slab->us_flags = flags | UMA_SLAB_MALLOC;
slab->us_size = size;
} else {
- uma_zfree_internal(slabzone, slab, NULL, 0);
+ uma_zfree_internal(slabzone, slab, NULL, SKIP_NONE);
}
return (mem);
@@ -2666,7 +2711,7 @@
{
vsetobj((vm_offset_t)slab->us_data, kmem_object);
page_free(slab->us_data, slab->us_size, slab->us_flags);
- uma_zfree_internal(slabzone, slab, NULL, 0);
+ uma_zfree_internal(slabzone, slab, NULL, SKIP_NONE);
}
void
@@ -2743,6 +2788,7 @@
int cachefree;
uma_bucket_t bucket;
uma_cache_t cache;
+ u_int64_t alloc;
cnt = 0;
mtx_lock(&uma_mtx);
@@ -2766,15 +2812,9 @@
LIST_FOREACH(z, &zk->uk_zones, uz_link) {
if (cnt == 0) /* list may have changed size */
break;
- if (!(zk->uk_flags & UMA_ZFLAG_INTERNAL)) {
- for (cpu = 0; cpu <= mp_maxid; cpu++) {
- if (CPU_ABSENT(cpu))
- continue;
- CPU_LOCK(cpu);
- }
- }
ZONE_LOCK(z);
cachefree = 0;
+ alloc = 0;
if (!(zk->uk_flags & UMA_ZFLAG_INTERNAL)) {
for (cpu = 0; cpu <= mp_maxid; cpu++) {
if (CPU_ABSENT(cpu))
@@ -2784,9 +2824,12 @@
cachefree += cache->uc_allocbucket->ub_cnt;
if (cache->uc_freebucket != NULL)
cachefree += cache->uc_freebucket->ub_cnt;
- CPU_UNLOCK(cpu);
+ alloc += cache->uc_allocs;
+ cache->uc_allocs = 0;
}
}
+ alloc += z->uz_allocs;
+
LIST_FOREACH(bucket, &z->uz_full_bucket, ub_link) {
cachefree += bucket->ub_cnt;
}
@@ -2797,7 +2840,7 @@
zk->uk_maxpages * zk->uk_ipers,
(zk->uk_ipers * (zk->uk_pages / zk->uk_ppera)) - totalfree,
totalfree,
- (unsigned long long)z->uz_allocs);
+ (unsigned long long)alloc);
ZONE_UNLOCK(z);
for (p = offset + 12; p > offset && *p == ' '; --p)
/* nothing */ ;
--- //depot/vendor/freebsd/src/sys/vm/uma_int.h 2005/02/16 21:50:29
+++ //depot/user/rwatson/percpu/sys/vm/uma_int.h 2005/03/15 19:57:24
@@ -342,16 +342,6 @@
#define ZONE_LOCK(z) mtx_lock((z)->uz_lock)
#define ZONE_UNLOCK(z) mtx_unlock((z)->uz_lock)
-#define CPU_LOCK_INIT(cpu) \
- mtx_init(&uma_pcpu_mtx[(cpu)], "UMA pcpu", "UMA pcpu", \
- MTX_DEF | MTX_DUPOK)
-
-#define CPU_LOCK(cpu) \
- mtx_lock(&uma_pcpu_mtx[(cpu)])
-
-#define CPU_UNLOCK(cpu) \
- mtx_unlock(&uma_pcpu_mtx[(cpu)])
-
/*
* Find a slab within a hash table. This is used for OFFPAGE zones to lookup
* the slab structure.
-------------- next part --------------
--- //depot/vendor/freebsd/src/sys/kern/kern_mbuf.c 2005/02/16 21:50:29
+++ //depot/user/rwatson/percpu/sys/kern/kern_mbuf.c 2005/04/15 11:11:26
@@ -1,6 +1,7 @@
/*-
- * Copyright (c) 2004, 2005,
- * Bosko Milekic <bmilekic at FreeBSD.org>. All rights reserved.
+ * Copyright (c) 2004, 2005 Bosko Milekic <bmilekic at FreeBSD.org>
+ * Copyright (c) 2005 Robert N. M. Watson
+ * All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -31,6 +32,9 @@
#include "opt_mac.h"
#include "opt_param.h"
+/* Need mbstat_percpu definition from mbuf.h. */
+#define WANT_MBSTAT_PERCPU
+
#include <sys/param.h>
#include <sys/mac.h>
#include <sys/malloc.h>
@@ -39,6 +43,7 @@
#include <sys/domain.h>
#include <sys/eventhandler.h>
#include <sys/kernel.h>
+#include <sys/proc.h>
#include <sys/protosw.h>
#include <sys/smp.h>
#include <sys/sysctl.h>
@@ -79,7 +84,18 @@
*/
int nmbclusters;
+
+/*
+ * mbstat is the mbuf statistics structure exposed to userspace.
+ *
+ * mbstat_percpu is the per-CPU statistics structure in which many of the
+ * mbstat measurements are gathered before being combined for exposure to
+ * userspace. mbstat_percpu is read lockless, so subject to small
+ * consistency races. It is modified holding a critical section to avoid
+ * read-modify-write races in the presence of preemption.
+ */
struct mbstat mbstat;
+struct mbstat_percpu mbstat_percpu[MAXCPU];
static void
tunable_mbinit(void *dummy)
@@ -91,11 +107,13 @@
}
SYSINIT(tunable_mbinit, SI_SUB_TUNABLES, SI_ORDER_ANY, tunable_mbinit, NULL);
+static int sysctl_kern_ipc_mbstat(SYSCTL_HANDLER_ARGS);
+
SYSCTL_DECL(_kern_ipc);
SYSCTL_INT(_kern_ipc, OID_AUTO, nmbclusters, CTLFLAG_RW, &nmbclusters, 0,
"Maximum number of mbuf clusters allowed");
-SYSCTL_STRUCT(_kern_ipc, OID_AUTO, mbstat, CTLFLAG_RD, &mbstat, mbstat,
- "Mbuf general information and statistics");
+SYSCTL_PROC(_kern_ipc, OID_AUTO, mbstat, CTLFLAG_RD, NULL, 0,
+ sysctl_kern_ipc_mbstat, "", "Mbuf general information and statistics");
/*
* Zones from which we allocate.
@@ -170,8 +188,69 @@
mbstat.m_mcfail = mbstat.m_mpfail = 0;
mbstat.sf_iocnt = 0;
mbstat.sf_allocwait = mbstat.sf_allocfail = 0;
+
+ /* mbstat_percpu is zero'd by BSS. */
}
+static int
+sysctl_kern_ipc_mbstat(SYSCTL_HANDLER_ARGS)
+{
+ struct mbstat_percpu *mbp, mbp_local;
+ u_char cpu;
+
+ bzero(&mbp_local, sizeof(mbp_local));
+ for (cpu = 0; cpu < MAXCPU; cpu++) {
+ mbp = &mbstat_percpu[cpu];
+ mbp_local.mbp_mbuf_allocs += mbp->mbp_mbuf_allocs;
+ mbp_local.mbp_mbuf_frees += mbp->mbp_mbuf_frees;
+ mbp_local.mbp_mbuf_fails += mbp->mbp_mbuf_fails;
+ mbp_local.mbp_mbuf_drains += mbp->mbp_mbuf_drains;
+ mbp_local.mbp_clust_allocs += mbp->mbp_clust_allocs;
+ mbp_local.mbp_clust_frees += mbp->mbp_clust_frees;
+
+ mbp_local.mbp_copy_fails += mbp->mbp_copy_fails;
+ mbp_local.mbp_pullup_fails += mbp->mbp_pullup_fails;
+
+ mbp_local.sfp_iocnt += mbp->sfp_iocnt;
+ mbp_local.sfp_alloc_fails += mbp->sfp_alloc_fails;
+ mbp_local.sfp_alloc_waits += mbp->sfp_alloc_waits;
+ }
+
+ /*
+ * If, due to races, the number of frees for mbufs or clusters is
+ * greater than the number of allocs, adjust alloc stats to 0. This
+ * isn't quite accurate, but for the time being, we consider the
+ * performance win of races worth the occasional inaccuracy.
+ */
+ if (mbp_local.mbp_mbuf_allocs > mbp_local.mbp_mbuf_frees)
+ mbstat.m_mbufs = mbp_local.mbp_mbuf_allocs -
+ mbp_local.mbp_mbuf_frees;
+ else
+ mbstat.m_mbufs = 0;
+
+ if (mbp_local.mbp_clust_allocs > mbp_local.mbp_clust_frees)
+ mbstat.m_mclusts = mbp_local.mbp_clust_allocs -
+ mbp_local.mbp_clust_frees;
+ else
+ mbstat.m_mclusts = 0;
+
+ mbstat.m_drain = mbp_local.mbp_mbuf_drains;
+ mbstat.m_mcfail = mbp_local.mbp_copy_fails;
+ mbstat.m_mpfail = mbp_local.mbp_pullup_fails;
+
+ mbstat.sf_iocnt = mbp_local.sfp_iocnt;
+ mbstat.sf_allocfail = mbp_local.sfp_alloc_fails;
+ /*
+ * sf_allocwait is protected by per-architecture mutex sf_buf_lock,
+ * which is held whenever sf_allocwait is updated, so don't use the
+ * per-cpu version here
+ *
+ * mbstat.sf_allocwait = mbp_local.sfp_alloc_waits;
+ */
+
+ return (SYSCTL_OUT(req, &mbstat, sizeof(mbstat)));
+}
+
/*
* Constructor for Mbuf master zone.
*
@@ -212,7 +291,10 @@
#endif
} else
m->m_data = m->m_dat;
- mbstat.m_mbufs += 1; /* XXX */
+
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_mbuf_allocs++;
+ critical_exit();
return (0);
}
@@ -227,7 +309,9 @@
m = (struct mbuf *)mem;
if ((m->m_flags & M_PKTHDR) != 0)
m_tag_delete_chain(m, NULL);
- mbstat.m_mbufs -= 1; /* XXX */
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_mbuf_frees++;
+ critical_exit();
}
/* XXX Only because of stats */
@@ -235,12 +319,16 @@
mb_dtor_pack(void *mem, int size, void *arg)
{
struct mbuf *m;
+ u_char cpu;
m = (struct mbuf *)mem;
if ((m->m_flags & M_PKTHDR) != 0)
m_tag_delete_chain(m, NULL);
- mbstat.m_mbufs -= 1; /* XXX */
- mbstat.m_mclusts -= 1; /* XXX */
+ critical_enter();
+ cpu = curcpu;
+ mbstat_percpu[cpu].mbp_mbuf_frees++;
+ mbstat_percpu[cpu].mbp_clust_frees++;
+ critical_exit();
}
/*
@@ -263,7 +351,9 @@
m->m_ext.ext_size = MCLBYTES;
m->m_ext.ext_type = EXT_CLUSTER;
m->m_ext.ref_cnt = NULL; /* Lazy counter assign. */
- mbstat.m_mclusts += 1; /* XXX */
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_clust_allocs++;
+ critical_exit();
return (0);
}
@@ -271,7 +361,10 @@
static void
mb_dtor_clust(void *mem, int size, void *arg)
{
- mbstat.m_mclusts -= 1; /* XXX */
+
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_clust_frees++;
+ critical_exit();
}
/*
@@ -288,7 +381,9 @@
uma_zalloc_arg(zone_clust, m, how);
if (m->m_ext.ext_buf == NULL)
return (ENOMEM);
- mbstat.m_mclusts -= 1; /* XXX */
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_clust_frees++;
+ critical_exit();
return (0);
}
@@ -304,7 +399,9 @@
m = (struct mbuf *)mem;
uma_zfree_arg(zone_clust, m->m_ext.ext_buf, NULL);
m->m_ext.ext_buf = NULL;
- mbstat.m_mclusts += 1; /* XXX */
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_clust_allocs++;
+ critical_exit();
}
/*
@@ -320,6 +417,7 @@
#endif
int flags;
short type;
+ u_char cpu;
m = (struct mbuf *)mem;
args = (struct mb_args *)arg;
@@ -348,8 +446,11 @@
return (error);
#endif
}
- mbstat.m_mbufs += 1; /* XXX */
- mbstat.m_mclusts += 1; /* XXX */
+ critical_enter();
+ cpu = curcpu;
+ mbstat_percpu[cpu].mbp_mbuf_allocs++;
+ mbstat_percpu[cpu].mbp_clust_allocs++;
+ critical_exit();
return (0);
}
@@ -369,7 +470,9 @@
WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK | WARN_PANIC, NULL,
"mb_reclaim()");
- mbstat.m_drain++;
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_mbuf_drains++;
+ critical_exit();
for (dp = domains; dp != NULL; dp = dp->dom_next)
for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
if (pr->pr_drain != NULL)
--- //depot/vendor/freebsd/src/sys/kern/uipc_mbuf.c 2005/03/17 19:35:19
+++ //depot/user/rwatson/percpu/sys/kern/uipc_mbuf.c 2005/04/15 10:55:44
@@ -36,6 +36,9 @@
#include "opt_param.h"
#include "opt_mbuf_stress_test.h"
+/* Need mbstat_percpu definition from mbuf.h. */
+#define WANT_MBSTAT_PERCPU
+
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
@@ -44,8 +47,10 @@
#include <sys/mac.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
+#include <sys/pcpu.h>
#include <sys/sysctl.h>
#include <sys/domain.h>
+#include <sys/proc.h>
#include <sys/protosw.h>
#include <sys/uio.h>
@@ -428,13 +433,18 @@
m = m->m_next;
np = &n->m_next;
}
- if (top == NULL)
- mbstat.m_mcfail++; /* XXX: No consistency. */
+ if (top == NULL) {
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_copy_fails++;
+ critical_exit();
+ }
return (top);
nospace:
m_freem(top);
- mbstat.m_mcfail++; /* XXX: No consistency. */
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_copy_fails++;
+ critical_exit();
return (NULL);
}
@@ -497,7 +507,9 @@
return top;
nospace:
m_freem(top);
- mbstat.m_mcfail++; /* XXX: No consistency. */
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_copy_fails++;
+ critical_exit();
return (NULL);
}
@@ -600,7 +612,9 @@
nospace:
m_freem(top);
- mbstat.m_mcfail++; /* XXX: No consistency. */
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_copy_fails++;
+ critical_exit();
return (NULL);
}
@@ -762,7 +776,9 @@
return (m);
bad:
m_freem(n);
- mbstat.m_mpfail++; /* XXX: No consistency. */
+ critical_enter();
+ mbstat_percpu[curcpu].mbp_pullup_fails++;
+ critical_exit();
return (NULL);
}
--- //depot/vendor/freebsd/src/sys/kern/uipc_syscalls.c 2005/03/31 04:35:16
+++ //depot/user/rwatson/percpu/sys/kern/uipc_syscalls.c 2005/04/15 10:55:44
@@ -39,6 +39,9 @@
#include "opt_ktrace.h"
#include "opt_mac.h"
+/* Need mbstat_percpu definition from mbuf.h. */
+#define WANT_MBSTAT_PERCPU
+
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
@@ -1926,7 +1929,9 @@
vm_page_io_finish(pg);
if (!error)
VM_OBJECT_UNLOCK(obj);
- mbstat.sf_iocnt++;
+ critical_enter();
+ mbstat_percpu[curcpu].sfp_iocnt++;
+ critical_exit();
}
if (error) {
@@ -1954,7 +1959,9 @@
* but this wait can be interrupted.
*/
if ((sf = sf_buf_alloc(pg, SFB_CATCH)) == NULL) {
- mbstat.sf_allocfail++;
+ critical_enter();
+ mbstat_percpu[curcpu].sfp_alloc_fails++;
+ critical_exit();
vm_page_lock_queues();
vm_page_unwire(pg, 0);
if (pg->wire_count == 0 && pg->object == NULL)
--- //depot/vendor/freebsd/src/sys/sys/mbuf.h 2005/03/17 19:35:19
+++ //depot/user/rwatson/percpu/sys/sys/mbuf.h 2005/04/15 10:55:44
@@ -243,6 +243,29 @@
#define MT_NTYPES 16 /* number of mbuf types for mbtypes[] */
/*
+ * Per-CPU mbuf allocator statistics, which are collated to construct the
+ * global statistics. They are read lockless, but written to while in a
+ * critical section to prevent read-modify-write races.
+ *
+ * XXXRW: As with comments below, maybe sendfile stats should be elsewhere.
+ */
+struct mbstat_percpu {
+ u_long mbp_mbuf_allocs; /* mbufs alloc'd on CPU. */
+ u_long mbp_mbuf_frees; /* mbufs freed on CPU. */
+ u_long mbp_mbuf_fails; /* mbuf alloc failures on CPU. */
+ u_long mbp_mbuf_drains; /* mbuf drains on CPU. */
+ u_long mbp_clust_allocs; /* clusters alloc'd on CPU. */
+ u_long mbp_clust_frees; /* clusters freed on CPU. */
+
+ u_long mbp_copy_fails; /* mbuf copy failures on CPU. */
+ u_long mbp_pullup_fails; /* mbuf pullup failures on CPU. */
+
+ u_long sfp_iocnt; /* sendfile I/O's on CPU. */
+ u_long sfp_alloc_fails; /* sendfile alloc failures on CPU. */
+ u_long sfp_alloc_waits; /* sendfile alloc waits on CPU. */
+};
+
+/*
* General mbuf allocator statistics structure.
*/
struct mbstat {
@@ -550,6 +573,15 @@
extern struct mbstat mbstat; /* General mbuf stats/infos */
extern int nmbclusters; /* Maximum number of clusters */
+/*
+ * Avoid exposing PERCPU definition outside of a very limited set of files,
+ * so that the compile-time value of PERCPU doesn't become part of the
+ * exposed kernel ABI.
+ */
+#ifdef WANT_MBSTAT_PERCPU
+extern struct mbstat_percpu mbstat_percpu[MAXCPU];
+#endif
+
struct uio;
void m_adj(struct mbuf *, int);
--- //depot/vendor/freebsd/src/sys/sys/pcpu.h 2005/01/07 02:32:16
+++ //depot/user/rwatson/percpu/sys/sys/pcpu.h 2005/04/15 10:55:44
@@ -81,6 +81,7 @@
extern struct cpuhead cpuhead;
#define CURPROC (curthread->td_proc)
+#define curcpu (curthread->td_oncpu)
#define curkse (curthread->td_kse)
#define curksegrp (curthread->td_ksegrp)
#define curproc (curthread->td_proc)
-------------- next part --------------
--- //depot/vendor/freebsd/src/sys/kern/kern_malloc.c 2005/04/12 23:55:38
+++ //depot/user/rwatson/percpu/sys/kern/kern_malloc.c 2005/04/14 22:38:16
@@ -1,4 +1,5 @@
/*-
+ * Copyright (c) 2005 Robert N. M. Watson
* Copyright (c) 1987, 1991, 1993
* The Regents of the University of California. All rights reserved.
*
@@ -44,6 +45,7 @@
#include <sys/mutex.h>
#include <sys/vmmeter.h>
#include <sys/proc.h>
+#include <sys/sbuf.h>
#include <sys/sysctl.h>
#include <sys/time.h>
@@ -133,6 +135,33 @@
{0, NULL},
};
+/*
+ * Two malloc type structures are present: malloc_type, which is used by a
+ * type owner to declare the type, and malloc_type_internal, which holds
+ * malloc-owned statistics and other ABI-sensitive fields, such as the set of
+ * malloc statistics indexed by the compile-time MAXCPU constant.
+ *
+ * The malloc_type ks_next field is protected by malloc_mtx. Other fields in
+ * malloc_type are static after initialization so unsynchronized.
+ *
+ * Statistics in malloc_type_stats are written only when holding a critical
+ * section, but read lock-free resulting in possible (minor) races, which the
+ * monitoring app should take into account.
+ */
+struct malloc_type_stats {
+ u_long mts_memalloced; /* Bytes allocated on CPU. */
+ u_long mts_memfreed; /* Bytes freed on CPU. */
+ u_long mts_numallocs; /* Number of allocates on CPU. */
+ u_long mts_numfrees; /* Number of frees on CPU. */
+ u_long mts_size; /* Bitmask of sizes allocated on CPU. */
+};
+
+struct malloc_type_internal {
+ struct malloc_type_stats mti_stats[MAXCPU];
+};
+
+uma_zone_t mt_zone;
+
#ifdef DEBUG_MEMGUARD
u_int vm_memguard_divisor;
SYSCTL_UINT(_vm, OID_AUTO, memguard_divisor, CTLFLAG_RD, &vm_memguard_divisor,
@@ -197,41 +226,48 @@
* Add this to the informational malloc_type bucket.
*/
static void
-malloc_type_zone_allocated(struct malloc_type *ksp, unsigned long size,
+malloc_type_zone_allocated(struct malloc_type *type, unsigned long size,
int zindx)
{
- mtx_lock(&ksp->ks_mtx);
- ksp->ks_calls++;
+ struct malloc_type_internal *mti;
+ struct malloc_type_stats *mts;
+ u_char cpu;
+
+ critical_enter();
+ cpu = curthread->td_oncpu;
+ mti = (struct malloc_type_internal *)(type->ks_handle);
+ mts = &mti->mti_stats[cpu];
+ mts->mts_memalloced += size;
+ mts->mts_numallocs++;
if (zindx != -1)
- ksp->ks_size |= 1 << zindx;
- if (size != 0) {
- ksp->ks_memuse += size;
- ksp->ks_inuse++;
- if (ksp->ks_memuse > ksp->ks_maxused)
- ksp->ks_maxused = ksp->ks_memuse;
- }
- mtx_unlock(&ksp->ks_mtx);
+ mts->mts_size |= 1 << zindx;
+ critical_exit();
}
void
-malloc_type_allocated(struct malloc_type *ksp, unsigned long size)
+malloc_type_allocated(struct malloc_type *type, unsigned long size)
{
- malloc_type_zone_allocated(ksp, size, -1);
+
+ malloc_type_zone_allocated(type, size, -1);
}
/*
* Remove this allocation from the informational malloc_type bucket.
*/
void
-malloc_type_freed(struct malloc_type *ksp, unsigned long size)
+malloc_type_freed(struct malloc_type *type, unsigned long size)
{
- mtx_lock(&ksp->ks_mtx);
- KASSERT(size <= ksp->ks_memuse,
- ("malloc(9)/free(9) confusion.\n%s",
- "Probably freeing with wrong type, but maybe not here."));
- ksp->ks_memuse -= size;
- ksp->ks_inuse--;
- mtx_unlock(&ksp->ks_mtx);
+ struct malloc_type_internal *mti;
+ struct malloc_type_stats *mts;
+ u_char cpu;
+
+ critical_enter();
+ cpu = curthread->td_oncpu;
+ mti = (struct malloc_type_internal *)type->ks_handle;
+ mts = &mti->mti_stats[cpu];
+ mts->mts_memfreed += size;
+ mts->mts_numfrees++;
+ critical_exit();
}
/*
@@ -351,9 +387,6 @@
}
#endif
- KASSERT(type->ks_memuse > 0,
- ("malloc(9)/free(9) confusion.\n%s",
- "Probably freeing with wrong type, but maybe not here."));
size = 0;
slab = vtoslab((vm_offset_t)addr & (~UMA_SLAB_MASK));
@@ -405,6 +438,11 @@
if (addr == NULL)
return (malloc(size, type, flags));
+ /*
+ * XXX: Should report free of old memory and alloc of new memory to
+ * per-CPU stats.
+ */
+
#ifdef DEBUG_MEMGUARD
/* XXX: CHANGEME! */
if (type == M_SUBPROC) {
@@ -543,6 +581,13 @@
uma_startup2();
+ mt_zone = uma_zcreate("mt_zone", sizeof(struct malloc_type_internal),
+#ifdef INVARIANTS
+ mtrash_ctor, mtrash_dtor, mtrash_init, mtrash_fini,
+#else
+ NULL, NULL, NULL, NULL,
+#endif
+ UMA_ALIGN_PTR, UMA_ZONE_MALLOC);
for (i = 0, indx = 0; kmemzones[indx].kz_size != 0; indx++) {
int size = kmemzones[indx].kz_size;
char *name = kmemzones[indx].kz_name;
@@ -562,127 +607,142 @@
}
void
-malloc_init(void *data)
+malloc_init(void *type)
{
- struct malloc_type *type = (struct malloc_type *)data;
+ struct malloc_type_internal *mti;
+ struct malloc_type *mt;
- mtx_lock(&malloc_mtx);
- if (type->ks_magic != M_MAGIC)
- panic("malloc type lacks magic");
+ KASSERT(cnt.v_page_count != 0, ("malloc_register before vm_init"));
- if (cnt.v_page_count == 0)
- panic("malloc_init not allowed before vm init");
+ mt = type;
+ mti = uma_zalloc(mt_zone, M_WAITOK | M_ZERO);
+ mt->ks_handle = mti;
- if (type->ks_next != NULL)
- return;
-
- type->ks_next = kmemstatistics;
+ mtx_lock(&malloc_mtx);
+ mt->ks_next = kmemstatistics;
kmemstatistics = type;
- mtx_init(&type->ks_mtx, type->ks_shortdesc, "Malloc Stats", MTX_DEF);
mtx_unlock(&malloc_mtx);
}
void
-malloc_uninit(void *data)
+malloc_uninit(void *type)
{
- struct malloc_type *type = (struct malloc_type *)data;
- struct malloc_type *t;
+ struct malloc_type_internal *mti;
+ struct malloc_type *mt, *temp;
+ mt = type;
+ KASSERT(mt->ks_handle != NULL, ("malloc_deregister: cookie NULL"));
mtx_lock(&malloc_mtx);
- mtx_lock(&type->ks_mtx);
- if (type->ks_magic != M_MAGIC)
- panic("malloc type lacks magic");
-
- if (cnt.v_page_count == 0)
- panic("malloc_uninit not allowed before vm init");
-
- if (type == kmemstatistics)
- kmemstatistics = type->ks_next;
- else {
- for (t = kmemstatistics; t->ks_next != NULL; t = t->ks_next) {
- if (t->ks_next == type) {
- t->ks_next = type->ks_next;
- break;
- }
+ mti = mt->ks_handle;
+ mt->ks_handle = NULL;
+ if (mt != kmemstatistics) {
+ for (temp = kmemstatistics; temp != NULL;
+ temp = temp->ks_next) {
+ if (temp->ks_next == mt)
+ temp->ks_next = mt->ks_next;
}
- }
- type->ks_next = NULL;
- mtx_destroy(&type->ks_mtx);
+ } else
+ kmemstatistics = mt->ks_next;
mtx_unlock(&malloc_mtx);
+ uma_zfree(mt_zone, type);
}
static int
sysctl_kern_malloc(SYSCTL_HANDLER_ARGS)
{
+ struct malloc_type_stats *mts, mts_local;
+ struct malloc_type_internal *mti;
+ long temp_allocs, temp_bytes;
struct malloc_type *type;
int linesize = 128;
- int curline;
+ struct sbuf sbuf;
int bufsize;
int first;
int error;
char *buf;
- char *p;
int cnt;
- int len;
int i;
cnt = 0;
+ /* Guess at how much room is needed. */
mtx_lock(&malloc_mtx);
for (type = kmemstatistics; type != NULL; type = type->ks_next)
cnt++;
+ mtx_unlock(&malloc_mtx);
- mtx_unlock(&malloc_mtx);
bufsize = linesize * (cnt + 1);
- p = buf = (char *)malloc(bufsize, M_TEMP, M_WAITOK|M_ZERO);
+ buf = (char *)malloc(bufsize, M_TEMP, M_WAITOK|M_ZERO);
+ sbuf_new(&sbuf, buf, bufsize, SBUF_FIXEDLEN);
+
mtx_lock(&malloc_mtx);
- len = snprintf(p, linesize,
+
+ sbuf_printf(&sbuf,
"\n Type InUse MemUse HighUse Requests Size(s)\n");
- p += len;
-
for (type = kmemstatistics; cnt != 0 && type != NULL;
type = type->ks_next, cnt--) {
- if (type->ks_calls == 0)
+ mti = type->ks_handle;
+ bzero(&mts_local, sizeof(mts_local));
+ for (i = 0; i < MAXCPU; i++) {
+ mts = &mti->mti_stats[i];
+ mts_local.mts_memalloced += mts->mts_memalloced;
+ mts_local.mts_memfreed += mts->mts_memfreed;
+ mts_local.mts_numallocs += mts->mts_numallocs;
+ mts_local.mts_numfrees += mts->mts_numfrees;
+ mts_local.mts_size |= mts->mts_size;
+ }
+ if (mts_local.mts_numallocs == 0)
continue;
- curline = linesize - 2; /* Leave room for the \n */
- len = snprintf(p, curline, "%13s%6lu%6luK%7luK%9llu",
- type->ks_shortdesc,
- type->ks_inuse,
- (type->ks_memuse + 1023) / 1024,
- (type->ks_maxused + 1023) / 1024,
- (long long unsigned)type->ks_calls);
- curline -= len;
- p += len;
+ /*
+ * Due to races in per-CPU statistics gather, it's possible to
+ * get a slightly negative number here. If we do, approximate
+ * with 0.
+ */
+ if (mts_local.mts_numallocs > mts_local.mts_numfrees)
+ temp_allocs = mts_local.mts_numallocs -
+ mts_local.mts_numfrees;
+ else
+ temp_allocs = 0;
+
+ /*
+ * Ditto for bytes allocated.
+ */
+ if (mts_local.mts_memalloced > mts_local.mts_memfreed)
+ temp_bytes = mts_local.mts_memalloced -
+ mts_local.mts_memfreed;
+ else
+ temp_bytes = 0;
+
+ sbuf_printf(&sbuf, "%13s%6lu%6luK%7luK%9lu",
+ type->ks_shortdesc,
+ temp_allocs,
+ (temp_bytes + 1023) / 1024,
+ 0L, /* XXX: Not available currently. */
+ mts_local.mts_numallocs);
first = 1;
for (i = 0; i < sizeof(kmemzones) / sizeof(kmemzones[0]) - 1;
i++) {
- if (type->ks_size & (1 << i)) {
+ if (mts_local.mts_size & (1 << i)) {
if (first)
- len = snprintf(p, curline, " ");
+ sbuf_printf(&sbuf, " ");
else
- len = snprintf(p, curline, ",");
- curline -= len;
- p += len;
-
- len = snprintf(p, curline,
- "%s", kmemzones[i].kz_name);
- curline -= len;
- p += len;
-
+ sbuf_printf(&sbuf, ",");
+ sbuf_printf(&sbuf, "%s",
+ kmemzones[i].kz_name);
first = 0;
}
}
-
- len = snprintf(p, 2, "\n");
- p += len;
+ sbuf_printf(&sbuf, "\n");
}
+ sbuf_finish(&sbuf);
+ mtx_unlock(&malloc_mtx);
- mtx_unlock(&malloc_mtx);
- error = SYSCTL_OUT(req, buf, p - buf);
+ error = SYSCTL_OUT(req, sbuf_data(&sbuf), sbuf_len(&sbuf));
+ sbuf_delete(&sbuf);
free(buf, M_TEMP);
return (error);
}
@@ -696,6 +756,7 @@
sysctl_kern_mprof(SYSCTL_HANDLER_ARGS)
{
int linesize = 64;
+ struct sbuf sbuf;
uint64_t count;
uint64_t waste;
uint64_t mem;
@@ -704,7 +765,6 @@
char *buf;
int rsize;
int size;
- char *p;
int len;
int i;
@@ -714,34 +774,30 @@
waste = 0;
mem = 0;
- p = buf = (char *)malloc(bufsize, M_TEMP, M_WAITOK|M_ZERO);
- len = snprintf(p, bufsize,
+ buf = (char *)malloc(bufsize, M_TEMP, M_WAITOK|M_ZERO);
+ sbuf_new(&sbuf, buf, bufsize, SBUF_FIXEDLEN);
+ sbuf_printf(&sbuf,
"\n Size Requests Real Size\n");
- bufsize -= len;
- p += len;
-
for (i = 0; i < KMEM_ZSIZE; i++) {
size = i << KMEM_ZSHIFT;
rsize = kmemzones[kmemsize[i]].kz_size;
count = (long long unsigned)krequests[i];
- len = snprintf(p, bufsize, "%6d%28llu%11d\n",
- size, (unsigned long long)count, rsize);
- bufsize -= len;
- p += len;
+ sbuf_printf(&sbuf, "%6d%28llu%11d\n", size,
+ (unsigned long long)count, rsize);
if ((rsize * count) > (size * count))
waste += (rsize * count) - (size * count);
mem += (rsize * count);
}
-
- len = snprintf(p, bufsize,
+ sbuf_printf(&sbuf,
"\nTotal memory used:\t%30llu\nTotal Memory wasted:\t%30llu\n",
(unsigned long long)mem, (unsigned long long)waste);
- p += len;
+ sbuf_finish(&sbuf);
- error = SYSCTL_OUT(req, buf, p - buf);
+ error = SYSCTL_OUT(req, sbuf_data(&sbuf), sbuf_len(&sbuf));
+ sbuf_delete(&sbuf);
free(buf, M_TEMP);
return (error);
}
--- //depot/vendor/freebsd/src/sys/sys/malloc.h 2005/01/07 02:32:16
+++ //depot/user/rwatson/percpu/sys/sys/malloc.h 2005/04/14 12:54:00
@@ -50,25 +50,51 @@
#define M_MAGIC 877983977 /* time when first defined :-) */
+/*
+ * ABI-compatible version of the old 'struct malloc_type', only all stats are
+ * now malloc-managed in malloc-owned memory rather than in caller memory, so
+ * as to avoid ABI issues. The ks_next pointer is reused as a pointer to the
+ * internal data handle.
+ *
+ * XXXRW: Why is this not ifdef _KERNEL?
+ *
+ * XXXRW: Use of ks_shortdesc has leaked out of kern_malloc.c.
+ */
struct malloc_type {
- struct malloc_type *ks_next; /* next in list */
- u_long ks_memuse; /* total memory held in bytes */
- u_long ks_size; /* sizes of this thing that are allocated */
- u_long ks_inuse; /* # of packets of this type currently in use */
- uint64_t ks_calls; /* total packets of this type ever allocated */
- u_long ks_maxused; /* maximum number ever used */
- u_long ks_magic; /* if it's not magic, don't touch it */
- const char *ks_shortdesc; /* short description */
- struct mtx ks_mtx; /* lock for stats */
+ struct malloc_type *ks_next; /* Next in global chain. */
+ u_long _ks_size; /* No longer used. */
+ u_long _ks_inuse; /* No longer used. */
+ uint64_t _ks_calls; /* No longer used. */
+ u_long _ks_maxused; /* No longer used. */
+ u_long ks_magic; /* Detect programmer error. */
+ const char *ks_shortdesc; /* Printable type name. */
+
+ /*
+ * struct malloc_type was terminated with a struct mtx, which is no
+ * longer required. For ABI reasons, continue to flesh out the full
+ * size of the old structure, but reuse the _lo_class field for our
+ * internal data handle.
+ */
+ void *ks_handle; /* Priv. data, was lo_class. */
+ const char *_lo_name;
+ const char *_lo_type;
+ u_int _lo_flags;
+ void *_lo_list_next;
+ struct witness *_lo_witness;
+ uintptr_t _mtx_lock;
+ u_int _mtx_recurse;
};
#ifdef _KERNEL
-#define MALLOC_DEFINE(type, shortdesc, longdesc) \
- struct malloc_type type[1] = { \
- { NULL, 0, 0, 0, 0, 0, M_MAGIC, shortdesc, {} } \
- }; \
- SYSINIT(type##_init, SI_SUB_KMEM, SI_ORDER_SECOND, malloc_init, type); \
- SYSUNINIT(type##_uninit, SI_SUB_KMEM, SI_ORDER_ANY, malloc_uninit, type)
+#define MALLOC_DEFINE(type, shortdesc, longdesc) \
+ struct malloc_type type[1] = { \
+ { NULL, 0, 0, 0, 0, M_MAGIC, shortdesc, NULL, NULL, \
+ NULL, 0, NULL, NULL, 0, 0 } \
+ }; \
+ SYSINIT(type##_init, SI_SUB_KMEM, SI_ORDER_SECOND, malloc_init, \
+ type); \
+ SYSUNINIT(type##_uninit, SI_SUB_KMEM, SI_ORDER_ANY, \
+ malloc_uninit, type);
#define MALLOC_DECLARE(type) \
extern struct malloc_type type[1]
@@ -112,6 +138,7 @@
int flags);
void *reallocf(void *addr, unsigned long size, struct malloc_type *type,
int flags);
+
#endif /* _KERNEL */
#endif /* !_SYS_MALLOC_H_ */