[Patch] Kgmon/Gprof On/Off Switch
Bruce Evans
brde at optusnet.com.au
Wed Jul 7 00:40:37 UTC 2010
On Tue, 6 Jul 2010, Sean Bruno wrote:
> On Thu, 2010-07-01 at 16:46 -0700, Sean Bruno wrote:
>> Found this lying around the Yahoo tree this week. Basically it allows
>> you to activate, reset and deactivate profiling with the '-f" flag.
I want something like this, but this implementation has too many style
bugs and obscure or missing locking.
>> Kind of nice to have if you want the ability to turn on profiling for
>> debugging live systems.
We already have this via `kgmon -hBb', but this patch implements
something different. Normal profiling requires configuring the kernel
with "config -p[p]", but that adds a lot of overhead, even when profiling
is not turned on. The patch implements dynamic configuration of flat
profiling only (since dynamic call graphs are too hard to implement,
but this restriction is not mentioned anywhere in the patch or the
accompanying mail.
Userland profiling has the same lossage -- you have to configure
programs specially by compiling with -pg, but that adds a lot of
overhead, even when profiling is not turned on. Moreover, in userland
there is no library support for turning profiling on and off or for
dumping and clearing the statistics except on program start and
completion, so the much larger overhead of always maintaining the call
graph is normally always present.
In FreeBSD-1, I (ab)used a bug in the implementation to turn on flat
profiling (only) in the kernel. This avoids the large call graph
overhead (it still costs a function call and some other branches on
every call and maybe on every return, but the branches for this are
predictable so the overhead is not very large. This should be made
a standard feature, and perhaps this patch implements it as a bug
without really trying, as in FreeBSD-1 (implementing it just takes
setting gp->state to a value that gives flat profiling while keeping
mcount() null if full profiling is configured.
% Index: usr.sbin/kgmon/kgmon.8
% ===================================================================
% --- usr.sbin/kgmon/kgmon.8 (revision 209745)
% +++ usr.sbin/kgmon/kgmon.8 (working copy)
% @@ -70,6 +70,9 @@
% Dump the contents of the profile buffers into a
% .Pa gmon.out
% file.
% +.It Fl f
% +Free profiling buffer.
% +Also stops profiling.
% .It Fl r
% Reset all the profile buffers.
% If the
Freeing the profiling buffer alone isn't very useful. The memory wastage
from always allocating it at boot time and never freeing it would be
rarely noticed, and would fix some races or simplify avoidance of races.
However, apart from the race problems, ordinary statically configured
profiling should free things too, since unlike in userland it is normal
to have profiling turned off most of the time even when it is statically
configured.
The above doesn't really describe what -f does. In normal profiling,
there are multiple profiling buffers and freeing just the flat profiling
one makes no sense. -f in fact frees the profiling buffer only if the
kernel is _not_ configured for profiling (as is required to not corrupt
the set of profiling buffers if the kernel is configured for profiling).
The profiling buffer is automatically allocated on first use iff it
is not already allocated, and of course -f has no effect if no buffer
is allocated. Perhaps -r should automatically deallocate, so -f would
not be needed. Kernel profiling has this feature of allowing multiple
dumps of the buffer(s) before reset, so reset is a good time to deallocate.
Style bugs in the above: f is unsorted between p and r.
% Index: usr.sbin/kgmon/kgmon.c
% ===================================================================
% --- usr.sbin/kgmon/kgmon.c (revision 209745)
% +++ usr.sbin/kgmon/kgmon.c (working copy)
% @@ -72,7 +72,7 @@
% struct gmonparam gpm;
% };
%
% -int Bflag, bflag, hflag, kflag, rflag, pflag;
% +int Bflag, bflag, hflag, kflag, rflag, pflag, fflag;
Style bugs: now f is unsorted after r and p. p was already unsorted after r.
% int debug = 0;
% int getprof(struct kvmvars *);
% int getprofhz(struct kvmvars *);
% @@ -82,6 +82,7 @@
% void dumpstate(struct kvmvars *kvp);
% void reset(struct kvmvars *kvp);
% static void usage(void);
% +void freebuf(struct kvmvars *kvp);
Style bugs: f is unsorted after u; all the old declarations are indented
with 1 tab while the new one is indented with spaces.
%
% int
% main(int argc, char **argv)
% @@ -93,7 +94,7 @@
% seteuid(getuid());
% kmemf = NULL;
% system = NULL;
% - while ((ch = getopt(argc, argv, "M:N:Bbhpr")) != -1) {
% + while ((ch = getopt(argc, argv, "M:N:Bbfhpr")) != -1) {
% switch((char)ch) {
%
% case 'M':
% @@ -113,6 +114,10 @@
% bflag = 1;
% break;
%
% + case 'f':
% + fflag = 1;
% + break;
% +
% case 'h':
% hflag = 1;
% break;
Now f is correctly sorted in the above 2 places. But it is missing from
the usage message.
% @@ -158,6 +163,10 @@
% dumpstate(&kvmvars);
% if (rflag)
% reset(&kvmvars);
% + if (fflag) {
% + freebuf(&kvmvars);
% + disp = GMON_PROF_OFF;
% + }
% if (accessmode == O_RDWR)
% setprof(&kvmvars, disp);
% (void)fprintf(stdout, "kgmon: kernel profiling is %s.\n",
% @@ -403,36 +412,44 @@
% /*
% * Write out the arc info.
% */
% - if ((froms = (u_short *)malloc(kvp->gpm.fromssize)) == NULL)
% - errx(8, "cannot allocate froms space");
% - if (kflag) {
% - i = kvm_read(kvp->kd, (u_long)kvp->gpm.froms, (void *)froms,
% - kvp->gpm.fromssize);
% - } else {
% - mib[2] = GPROF_FROMS;
% - i = kvp->gpm.fromssize;
% - if (sysctl(mib, 3, froms, &i, NULL, 0) < 0)
% - i = 0;
% - }
% - if (i != kvp->gpm.fromssize)
% - errx(9, "read froms: read %lu, got %ld: %s",
% - kvp->gpm.fromssize, (long)i,
% - kflag ? kvm_geterr(kvp->kd) : strerror(errno));
% - if ((tos = (struct tostruct *)malloc(kvp->gpm.tossize)) == NULL)
% - errx(10, "cannot allocate tos space");
% - if (kflag) {
% - i = kvm_read(kvp->kd, (u_long)kvp->gpm.tos, (void *)tos,
% - kvp->gpm.tossize);
% - } else {
% - mib[2] = GPROF_TOS;
% - i = kvp->gpm.tossize;
% - if (sysctl(mib, 3, tos, &i, NULL, 0) < 0)
% - i = 0;
% - }
% - if (i != kvp->gpm.tossize)
% - errx(11, "read tos: read %lu, got %ld: %s",
% - kvp->gpm.tossize, (long)i,
% - kflag ? kvm_geterr(kvp->kd) : strerror(errno));
% + if (kvp->gpm.fromssize > 0) {
% + if ((froms = (u_short *)malloc(kvp->gpm.fromssize)) == NULL)
% + errx(8, "cannot allocate froms space");
% + if (kflag) {
% + i = kvm_read(kvp->kd, (u_long)kvp->gpm.froms,
% + (void *)froms,
% + kvp->gpm.fromssize);
% + } else {
% + mib[2] = GPROF_FROMS;
% + i = kvp->gpm.fromssize;
% + if (sysctl(mib, 3, froms, &i, NULL, 0) < 0)
% + i = 0;
% + }
% + if (i != kvp->gpm.fromssize)
% + errx(9, "read froms: read %lu, got %ld: %s",
% + kvp->gpm.fromssize, (long)i,
% + kflag ? kvm_geterr(kvp->kd) : strerror(errno));
% + }
% + if (kvp->gpm.tossize > 0) {
% + if ((tos = (struct tostruct *)malloc(kvp->gpm.tossize)) ==
% + NULL)
% + errx(10, "cannot allocate tos space");
% + if (kflag) {
% + i = kvm_read(kvp->kd, (u_long)kvp->gpm.tos,
% + (void *)tos,
% + kvp->gpm.tossize);
% + } else {
% + mib[2] = GPROF_TOS;
% + i = kvp->gpm.tossize;
% + if (sysctl(mib, 3, tos, &i, NULL, 0) < 0)
% + i = 0;
% + }
% + if (i != kvp->gpm.tossize)
% + errx(11, "read tos: read %lu, got %ld: %s",
% + kvp->gpm.tossize, (long)i,
% + kflag ? kvm_geterr(kvp->kd) : strerror(errno));
% + }
% +
The above is unreadable due to extra indentation and/or massive tab lossage.
Most but not all tabs are corrupted to spaces.
I think it requires the extra indentation as written, since `to' and `from'
sizes of 0 now become non-errors, so they are skipped over by putting them
in a conditional clause instead of aborted on. These cases would also
happen with full profiling configured but only flat profiling enabled. I'm
not sure how my FreeBSD-1 hack handled this.
% if (debug)
% warnx("lowpc 0x%lx, textsize 0x%lx",
% (unsigned long)kvp->gpm.lowpc, kvp->gpm.textsize);
% @@ -509,11 +526,13 @@
% if (kvm_write(kvp->kd, (u_long)kvp->gpm.kcount, zbuf,
% kvp->gpm.kcountsize) != kvp->gpm.kcountsize)
% errx(13, "tickbuf zero: %s", kvm_geterr(kvp->kd));
% - if (kvm_write(kvp->kd, (u_long)kvp->gpm.froms, zbuf,
% - kvp->gpm.fromssize) != kvp->gpm.fromssize)
% + if (kvp->gpm.fromssize > 0 &&
% + kvm_write(kvp->kd, (u_long)kvp->gpm.froms, zbuf,
% + kvp->gpm.fromssize) != kvp->gpm.fromssize)
% errx(14, "froms zero: %s", kvm_geterr(kvp->kd));
% - if (kvm_write(kvp->kd, (u_long)kvp->gpm.tos, zbuf,
% - kvp->gpm.tossize) != kvp->gpm.tossize)
% + if (kvp->gpm.tossize > 0 &&
% + kvm_write(kvp->kd, (u_long)kvp->gpm.tos, zbuf,
% + kvp->gpm.tossize) != kvp->gpm.tossize)
% errx(15, "tos zero: %s", kvm_geterr(kvp->kd));
% return;
% }
Now the indentation is not extra but tab lossage is still massive.
% @@ -524,11 +543,33 @@
% if (sysctl(mib, 3, NULL, NULL, zbuf, kvp->gpm.kcountsize) < 0)
% err(13, "tickbuf zero");
% mib[2] = GPROF_FROMS;
% - if (sysctl(mib, 3, NULL, NULL, zbuf, kvp->gpm.fromssize) < 0)
% + if (kvp->gpm.fromssize > 0 &&
% + sysctl(mib, 3, NULL, NULL, zbuf, kvp->gpm.fromssize) < 0)
% err(14, "froms zero");
% mib[2] = GPROF_TOS;
% - if (sysctl(mib, 3, NULL, NULL, zbuf, kvp->gpm.tossize) < 0)
% - err(15, "tos zero");
% + if (kvp->gpm.tossize > 0 &&
% + sysctl(mib, 3, NULL, NULL, zbuf, kvp->gpm.tossize) < 0)
Tab lossage on every changed line, as usual.
% (void)seteuid(getuid());
% free(zbuf);
% }
% +
% +/*
% + * Free the kernel profiling date structures.
% + */
% +void
% +freebuf(kvp)
% + struct kvmvars *kvp;
% +{
% + int mib[3];
% +
% + setprof(kvp, GMON_PROF_OFF);
% + if (kflag)
% + return;
% + (void)seteuid(0);
% + mib[0] = CTL_KERN;
% + mib[1] = KERN_PROF;
% + mib[2] = GPROF_FREEBUF;
% + if (sysctl(mib, 3, NULL, 0, NULL, 0) < 0)
% + err(16, "Freeing profiling buffer failed");
% + (void)seteuid(getuid());
% +}
No tab in sight in new code.
Error message are not capitalized in KNF. This rule was followed in all
the old error messages in this program.
% Index: sys/kern/kern_clock.c
% ===================================================================
% --- sys/kern/kern_clock.c (revision 209745)
% +++ sys/kern/kern_clock.c (working copy)
% @@ -68,9 +68,7 @@
% #include <sys/limits.h>
% #include <sys/timetc.h>
%
% -#ifdef GPROF
% #include <sys/gmon.h>
% -#endif
%
% #ifdef HWPMC_HOOKS
% #include <sys/pmckern.h>
% @@ -714,10 +712,8 @@
% profclock(int usermode, uintfptr_t pc)
% {
% struct thread *td;
% -#ifdef GPROF
% struct gmonparam *g;
% uintfptr_t i;
% -#endif
%
% td = curthread;
% if (usermode) {
% @@ -730,7 +726,6 @@
% if (td->td_proc->p_flag & P_PROFIL)
% addupc_intr(td, pc, 1);
% }
% -#ifdef GPROF
% else {
% /*
% * Kernel statistics are just like addupc_intr, only easier.
% @@ -743,7 +738,6 @@
% }
% }
% }
% -#endif
% }
%
% /*
Seems to be correct. Everything except initialization is here, and the
changes for it are remarkably simple and low overhead.
% Index: sys/kern/subr_prof.c
% ===================================================================
% --- sys/kern/subr_prof.c (revision 209745)
% +++ sys/kern/subr_prof.c (working copy)
% @@ -44,7 +44,6 @@
%
% #include <machine/cpu.h>
%
% -#ifdef GPROF
% #include <sys/malloc.h>
% #include <sys/gmon.h>
% #undef MCOUNT
% @@ -136,7 +135,46 @@
% free(cp, M_GPROF);
% }
%
% +#ifndef GPROF
% +/* Allocate kcount buffer and initialize state */
Style bugs:
- missing "."
- no description (except the ifdef) that this only applies to the unusual
case of non-static configuration.
% +static int
% +prof_alloc_kcountbuf(void)
The prof_ namespace is not used by profiling code.
% +{
% + char *cp;
% + struct gmonparam *p = &_gmonparam;
% + int error = 0;
Style bug: initialization in declaration. Old profiling code has this
for p but not for errno.
% +
% + mtx_lock(&Giant);
There are lots of race problems in old and new profiling initialization
and reinitialization code, and this has little effect on them. Higher
level profiling code in kern_clock.c is mostly locked by time_lock,
and the parts that use time_lock are correct AFAIK. However, this
locking is not used by profclock(). profclock() used to use sched_lock,
but now uses no explicit locking whatsoever. It is just called from
the low-level timer interrupt handler, and that is required to be a
fast interrupt handler so that it has interrupts disabled on the current
CPU, and there is the pseudo-locking that profclock() is only called
on 1 CPU at intervals of 1/profhz, so the calls shouldn't overlap.
time_lock is a spinlock, so acquiring it locks out profclock() due to
the bugfeature that spinlocks lock out all fast interrupt handlers.
% + if (p->kcount == NULL) {
% + cp = (char *)malloc(p->kcountsize, M_GPROF, M_NOWAIT);
Style bug: useless cast copied from old code. This is not C++.
The locking shouldn't be to use a spinlock, since calling malloc() while
holding a spinlock is or should be a panic. For more ideas on correct
locking, see below.
% + if (cp == 0) {
% + printf("No memory for profiling.\n");
Style bugs:
- null pointer constants are not spelled "0" in the kernel
- kernel error messages are not capitalized
- kernel error messages are not terminated by a ".".
% + error = ENOMEM;
% + goto out;
% + }
% + bzero(cp, p->kcountsize);
% + p->kcount = (HISTCOUNTER *)cp;
More excessive casting, consistent with old code (*blush*).
% + }
% +out:
% + mtx_unlock(&Giant);
% + return error;
Style bug: missing parentheses.
% +}
Locking is not so critical for allocation. You can simply rely on the
profiling buffer not being used while it is an in-between state for the
same reason that it wasn't used before it existed at all -- that the
state variable prevents its use. You just need locking to prevent the
state variable changing underneath you. Locking in the above is neither
necessary nor sufficient for this, no matter what it is (without further
changes to access the state variable in the above). No locking at all
is required in the above.
% +
% static void
% +prof_free_kcountbuf(void)
% +{
% + struct gmonparam *p = &_gmonparam;
% +
% + mtx_lock(&Giant);
% + if (p->kcount != NULL) {
% + free(p->kcount, M_GPROF);
% + p->kcount = (HISTCOUNTER *)NULL;
% + }
% + mtx_unlock(&Giant);
% +}
As above for locking and casts, etc. Now it is critical to change the
state variable to "off" before calling the above (this is done) and to
keep it off (this is not done).
% +#endif
The above has massive tab lossage, as usual.
Old locking bugs:
- note that the boot-time initialization (kmstartup()) uses no locking
except for a couple of critical_enter()s, and that these are broken.
This depends on the state being "off" initially and nothing external
turning it on, which still works as intended AFAIK.. The critical_exit()s
were correct when I wrote them initially as disable_intr()'s. They
are supposed to prevent _all_ interrupts so as to get accurate and
no interference from interrupt routines. critical_enter() only stops
preemption -- it allows all fast interrupt handlers and all low-level
code for normal interrupts to run, so using it here is very broken.
Even if the initialization is so early as to prevent any interrupts,
this is unclear.
- dynamic reinitialization to support modules (kmupetext()) uses Giant
and critical_enter() locking. This depends on things that aren't
true. Giant locking has no effect on profiling (though it may have
had a small effect when it was committed, since IIRC the places that
use time_lock used to use Giant locking; anyway, this effect is almost
useless here). It used to use disable_intr() instead of critical_enter().
This originally worked for UP systems, but for SMP systems it didn't
lock out profclock(). Now, critical_enter() doesn't even lock out
profclock() on UP systems, so changing to using it is even more broken
here.
It is necessary to set the state variable to "off" and keep it off,
similarly to this change.
kmupetext() also uses M_WAITOK malloc()s. This is simpler and
probably good enough, but using M_NOWAIT as in this patch is safer.
Now I wonder what is the size of the largest M_WAITOK malloc() that
can be done by a privileged user. If it is larger than the flat
profiling buffer size in this patch then there is no point in not
waiting in this patch.
% +
% +static void
% kmstartup(dummy)
% void *dummy;
% {
% @@ -152,6 +190,7 @@
% int nullfunc_loop_profiled_time;
% uintfptr_t tmp_addr;
% #endif
% + int bufsize;
%
% /*
% * Round lowpc and highpc to multiples of the density we're using
% @@ -164,6 +203,7 @@
% p->textsize, (uintmax_t)p->lowpc, (uintmax_t)p->highpc);
% p->kcountsize = p->textsize / HISTFRACTION;
% p->hashfraction = HASHFRACTION;
% +#ifdef GPROF
% p->fromssize = p->textsize / HASHFRACTION;
% p->tolimit = p->textsize * ARCDENSITY / 100;
% if (p->tolimit < MINARCS)
% @@ -171,13 +211,30 @@
% else if (p->tolimit > MAXARCS)
% p->tolimit = MAXARCS;
% p->tossize = p->tolimit * sizeof(struct tostruct);
% - cp = (char *)malloc(p->kcountsize + p->fromssize + p->tossize,
% - M_GPROF, M_WAITOK | M_ZERO);
% + bufsize = p->kcountsize + p->fromssize + p->tossize;
Hmm, no tab lossage in small sections.
% +#else
% + p->fromssize = 0;
% + p->tolimit = 0;
% + p->tossize = 0;
% + bufsize = 0;
% + p->kcount = (HISTCOUNTER *)NULL;
Massive tab lossage resumes.
I think p is statically initialized to 0 and thus all of the code in thos
`#else' clause is dead.
% +#endif
% + cp = NULL;
% + if (bufsize > 0)
% + cp = (char *)malloc(bufsize, M_GPROF, M_WAITOK | M_ZERO);
% +#ifdef GPROF
% p->tos = (struct tostruct *)cp;
% cp += p->tossize;
% +#else
% + p->tos = (struct tostruct *)NULL;
% +#endif
% p->kcount = (HISTCOUNTER *)cp;
% cp += p->kcountsize;
% +#ifdef GPROF
% p->froms = (u_short *)cp;
% +#else
% + p->froms = (u_short *)NULL;
% +#endif
% p->histcounter_type = FUNCTION_ALIGNMENT / HISTFRACTION * NBBY;
%
% #ifdef GUPROF
% @@ -351,6 +408,12 @@
% } else if (state == GMON_PROF_ON) {
% gp->state = GMON_PROF_OFF;
% stopguprof(gp);
I don't see any locking here. Are all sysctls still locked by Giant?
Then we have almost enough locking here, and the Giant locking in the
alloc/free code is null. We are missing something to propagate the
state change to other CPUs, even for the current "working" version,
except possibly in the GUPROF case, if stopguprof() does it accidentally
(but it doesn't). This doesn't matter much if all the buffers have
been allocated and won't be freed or moved, but for the patch and for
kmupetext() we are freeing or moving (kmupetext() actually doesn't
even change the state to off on the current CPU like this does).
Giant or almost any locking is enough once we have propagated the change
here: only code near here is allowed to change the state, so Giant or
whatever locking prevents it changing. Later when the state is changed
to on, we should be careful to propagate the change of all the pointers
in *gp to other CPUs before setting the state to on.
% +#ifndef GUPROF
% + /* Allocate kcount buffer and initialize state */
Comment with style bug repeated ad nauseum.
% + error = prof_alloc_kcountbuf();
% + if (error)
% + return error;
Style bug: missing parentheses.
Sorry I made the code and critical locking changes unreadable by pointing
out the style bugs.
% +#endif
% gp->profrate = profhz;
% PROC_LOCK(&proc0);
% startprofclock(&proc0);
% @@ -368,15 +431,24 @@
% } else if (state != gp->state)
% return (EINVAL);
% return (0);
Many examples of KNF with non-missing parentheses in nearby code.
% +#ifndef GPROF
% + if (gp->state != GMON_PROF_OFF)
% + return EINVAL;
Style bug: missing parentheses.
Now we depend on a previous sysctl having set the state to off, and need
locking to keep it off while we are changing things.
% + /* Free kcount buffer */
Now the comment with style bug is not repeated, but it is still of
negative use since the function name is too verbose and says more than
the comment.
% + prof_free_kcountbuf();
% +#endif
% + return 0;
Style bug: missing parentheses.
After return, and hopefully not before, another sysctl may change the
state to on, and we just need something to propagate the pointers in
*gp to other CPUs. This is more critical for alloc. Any locking for
the whole sysctl guarantees this. So does the Giant locking in the
alloc/free functions (and similarly for propagating the state change),
but I think it is an obfuscation of using nested Giant locking for its
side affect of memory ordering (locking of the whole operation is still
needed for exclusive access).
% case GPROF_COUNT:
% return (sysctl_handle_opaque(oidp,
% gp->kcount, gp->kcountsize, req));
% +#ifdef GPROF
% case GPROF_FROMS:
% return (sysctl_handle_opaque(oidp,
% gp->froms, gp->fromssize, req));
% case GPROF_TOS:
% return (sysctl_handle_opaque(oidp,
% gp->tos, gp->tossize, req));
% +#endif
% case GPROF_GMONPARAM:
% return (sysctl_handle_opaque(oidp, gp, sizeof *gp, req));
% default:
% @@ -386,7 +458,6 @@
% }
%
% SYSCTL_NODE(_kern, KERN_PROF, prof, CTLFLAG_RW, sysctl_kern_prof, "");
% -#endif /* GPROF */
%
% /*
% * Profiling system call.
% Index: sys/sys/gmon.h
% ===================================================================
% --- sys/sys/gmon.h (revision 209745)
% +++ sys/sys/gmon.h (working copy)
% @@ -197,6 +197,7 @@
% #define GPROF_FROMS 2 /* struct: from location hash bucket */
% #define GPROF_TOS 3 /* struct: destination/count structure */
% #define GPROF_GMONPARAM 4 /* struct: profiling parameters (see above) */
Examples of normal style in nearby code: tab after #define, and tabs to indent.
% +#define GPROF_FREEBUF 5 /* int: free flat profiling buffer */
Style bugs:
- space after #define
- massive tab lossage to indent, as usual.
%
% #ifdef _KERNEL
%
Bruce
More information about the freebsd-hackers
mailing list