Firefox crash during dtrace attach under -CURRENT

Mark Johnston markj at freebsd.org
Sun Oct 27 19:53:14 UTC 2013


On Fri, Oct 25, 2013 at 03:59:56PM +0100, symbolics at gmx.com wrote:
> On Fri, Oct 25, 2013 at 11:47:06AM +0100, symbolics at gmx.com wrote:
> > On Wed, Oct 23, 2013 at 10:59:02PM -0400, Mark Johnston wrote:
> > > On Wed, Oct 23, 2013 at 09:30:09PM +0100, symbolics at gmx.com wrote:
> > > > Hi,
> > > > 
> > > > http://dtrace.org/blogs/brendan/2011/02/11/dtrace-pid-provider-arguments/
> > > > 
> > > > I tried to follow some of the examples but I crash the Firefox process
> > > > each time. Sometimes DTrace manages to collect a little data before the
> > > > death.
> > > > 
> > > > [...]
> > > > 
> > > > Is this a known problem or should I send a PR?
> > > 
> > > Thanks for reporting this: I was able to reproduce the crash and managed
> > > to find a nasty pair of bugs. Could you test the patch below and let me
> > > know if it fixes the problem for you as well? If you see more crashes,
> > > please include the backtrace and signo from gdb again; it would likely
> > > be a different problem that needs to be debugged and fixed separately.
> > 
> > Hi Mark,
> > 
> > This helps but there still may be some issues. First time I used this
> > I found that when I killed the DTrace process Firefox went down too
> > with a SIGTRAP. I have a possibly unhelpful core from this:
> > 
> 
> Another data point. I attached to mutt and reviewed some of the calls it
> was making. Subsequently I killed DTrace, went to look at other
> things and a while later went back to check my mail. On attempting to
> change into a different mail folder mutt died with a SIGTRAP. It seems
> like DTrace isn't tidying up after itself?
> 
> (gdb) bt
> #0  0x0000000800722541 in r_debug_state (rd=0x802425480, m=0x7fffffff6c28)
>     at /usr/home/dm/git/freebsd/libexec/rtld-elf/rtld.c:3491
> #1  0x0000000000000000 in ?? ()

Ok, I think I've figured out this one too. As you note, dtrace(1) isn't
cleaning up some of its breakpoints properly when it detaches. In
particular, it's not stopping the victim process before it tries to
remove breakpoints using ptrace(2); however, ptrace requires the target
process to be stopped, else it will return EBUSY. So the breakpoint in
the rtld gets left behind, and it turns out that r_debug_state() is called
every time a process tries to dlopen() a shared object.

mutt was a good example since it seems to dlopen() iconv-related stuff
as I scan through my inbox. One can inspect this with DTrace itself :)
using something like

	'pid$target::dlopen:entry {trace(copyinstr(arg0));}'

With this observation it becomes easy to reproduce the problem using a
test program that does something like

	while (1) {
		dlopen("/lib/libnonexistent.so.100", RTLD_LAZY);
		sleep(1);
	}
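
For anyone who wants to try this, a fuller version of that loop might
look like the sketch below. The function name dlopen_repeatedly() and
its count/delay parameters are made up for illustration; only the
dlopen() call and the library path come from the snippet above. It
counts failed dlopen() calls so the caller can check that the loop
really ran.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: call dlopen() on `path` `count` times, sleeping
 * `delay` seconds between attempts. Returns the number of attempts
 * that failed (i.e. dlopen() returned NULL).
 */
static int
dlopen_repeatedly(const char *path, int count, unsigned int delay)
{
	int failures = 0;

	for (int i = 0; i < count; i++) {
		void *handle = dlopen(path, RTLD_LAZY);

		if (handle == NULL)
			failures++;
		else
			(void) dlclose(handle);	/* don't leak a successful load */

		if (delay > 0)
			(void) sleep(delay);
	}
	return (failures);
}
```

Calling dlopen_repeatedly("/lib/libnonexistent.so.100", 1000, 1) from
main() should give enough time to attach dtrace(1) to the process,
kill dtrace, and see whether the next dlopen() takes a SIGTRAP.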

A somewhat crude patch which fixes this for me is below; it just adds
code to send SIGSTOP to the target process before trying to remove
breakpoints. Does anyone see any problems with this? Perhaps it should
be libproc's responsibility to ensure that the victim process is stopped
before trying a ptrace(PT_IO, ...) to add/remove breakpoints?
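
To make the stop/wait/continue handshake concrete outside of libdtrace,
here is a minimal standalone sketch. It is not the patch: it assumes
the target is a direct child of the caller (so plain waitpid(2) with
WUNTRACED can observe the stop), whereas dtrace(1) sees the stop
because it is attached as the tracer. The names stop_and_wait() and
spawn_idle_child() are hypothetical.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Send SIGSTOP to `pid` and block until the stop is reported.
 * Returns 0 if the process is now stopped, -1 otherwise (for
 * example if it exited before the stop took effect).
 */
static int
stop_and_wait(pid_t pid)
{
	int status;

	if (kill(pid, SIGSTOP) != 0)
		return (-1);
	if (waitpid(pid, &status, WUNTRACED) == -1)
		return (-1);
	return (WIFSTOPPED(status) ? 0 : -1);
}

/*
 * Hypothetical stand-in for the victim process: fork a child that
 * just sleeps in pause(2). Returns the child's pid in the parent,
 * or -1 if fork(2) failed.
 */
static pid_t
spawn_idle_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			(void) pause();
	}
	return (pid);
}
```

Once stop_and_wait() returns 0 it is safe to do whatever requires a
stopped target, then resume it with kill(pid, SIGCONT). If it returns
-1 the caller should not assume the target is stopped, which is the
case the dt_dprintf() branch in the patch reports.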

Thanks,
-Mark

diff --git a/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c b/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c
index d40a0ae..6ed78e4 100644
--- a/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c
+++ b/cddl/contrib/opensolaris/lib/libdtrace/common/dt_proc.c
@@ -505,7 +505,7 @@ dt_proc_control(void *arg)
 	dt_proc_t *dpr = datap->dpcd_proc;
 	dt_proc_hash_t *dph = dpr->dpr_hdl->dt_procs;
 	struct ps_prochandle *P = dpr->dpr_proc;
-	int pid = dpr->dpr_pid;
+	int pid = dpr->dpr_pid, status;
 
 #if defined(sun)
 	int pfd = Pctlfd(P);
@@ -702,7 +702,22 @@ pwait_locked:
 	 */
 	(void) pthread_mutex_lock(&dpr->dpr_lock);
 
+#if defined(__FreeBSD__)
+	/*
+	 * On FreeBSD, the victim process must be stopped before ptrace(2) can
+	 * be used to remove breakpoints.
+	 */
+	if (kill(dpr->dpr_pid, SIGSTOP) == 0 &&
+	    wait4(dpr->dpr_pid, &status, WSTOPPED | WEXITED, NULL) != -1 &&
+	    WIFSTOPPED(status)) {
+		dt_proc_bpdestroy(dpr, B_TRUE);
+		kill(dpr->dpr_pid, SIGCONT);
+	} else
+		dt_dprintf("pid %d: failed to remove breakpoints\n",
+		    dpr->dpr_pid);
+#else
 	dt_proc_bpdestroy(dpr, B_TRUE);
+#endif
 	dpr->dpr_done = B_TRUE;
 	dpr->dpr_tid = 0;

