Kernel/Compiler bug

Fri Oct 3 14:12:54 UTC 2014

On Fri, Oct 03, 2014 at 10:35:17AM +0300, Konstantin Belousov wrote:
> On Thu, Oct 02, 2014 at 09:54:56PM -0400, Larry Baird wrote:
> > > The easiest thing to do is to record the stack depth for kernel mode
> > > on entry into interrupt.  Interrupt handlers are usually well written
> > > and do not consume a lot of stack.
> > > 
> > > Look at the intr_event_handle(), which is the entry point. The mode can
> > > be deduced from trapframe passed. The kernel stack for the thread is
> > > described by td->td_kstack (base, i.e. bottom) and td->td_kstack_pages
> > > (size), so the top of the stack is at td_kstack + td_kstack_size [*].
> > > The current stack consumption could be taken from reading %rsp register,
> > > or you may take the address of any local variable as well.
> > > 
> > > * - there are pcb and usermode fpu save area at the top of the stack, and
> > > actual kernel stack top is right below fpu save area.  This should not
> > > be important for your measurements, since you are looking at how close
> > > the %rsp gets to the bottom.
> > 
> > This idea worked very well.  Booting a GENERIC 10.1-BETA3 kernel I get a
> > maximum stack used of 5247 bytes. This was with a minimal virtual box
> > configuration. It would be interesting to hear about users with more exotic
> > hardware and or configurations. Not sure if I have the KASSERT correct.
> > 
> I have several notes.  Mostly, it comes from my desire to make the patch
> committable.
>
> A global one is that the profiling of the stack use should be hidden
> under some kernel config option.
My initial interest last night when I whipped up the code, was too prove that
kernel threads have some head room before 10.1 is released.  Now that I have
a better feeling about that, lets see about getting the code cleaned up.

Last night I briefly thought about a kernel config option.  My first thought
was DIAGNOSTIC, but DIAGNOSTIC brings in too many other bits. How does
DEBUG_KERNEL_THREAD_STACK sound for option?

> > 
> > Index: kern_intr.c
> > ===================================================================
> > --- kern_intr.c	(revision 44897)
> > +++ kern_intr.c	(working copy)
> > @@ -1386,6 +1386,12 @@
> >  	}
> >  }
> >  
> > +static int max_kern_thread_stack;
> Add 'usage' somewhere in the name of the var and sysctl ?
I was initially confused by this comment, but now I think I see what you are
asking. How about changing vaiable name to "max_kernal_thread_stack_used"?

> > +
> > +SYSCTL_INT(_kern, OID_AUTO, max_kern_thread_stack, CTLFLAG_RD,
> > +    &max_kern_thread_stack, 0,
> > +    "Maxiumum stack used by a kernel thread");
> > +
> >  /*
> >   * Main interrupt handling body.
> >   *
> > @@ -1407,6 +1413,22 @@
> >  
> >  	td = curthread;
> >  
> > +	/*
> > +	 * Check for maximum stack used bya kernel thread.
> > +	 */
> > +	if (!TRAPF_USERMODE(frame)) {
> Just a note, the test for interruption of the usermode is not strictly
> needed, it only optimizes the execution, since interrupt from usermode
> would have only the trap frame on the stack.  Might be, this should
> be commented.
> 
> > +	    char *top = (char *)(td->td_kstack + td->td_kstack_pages *
> > +		PAGE_SIZE - 1);
> > +	    char *current = (char *)&ih;
> Use the address of top ?  It should be deeper in the stack, and account
> for the normal current function stack use, assuming compiler did not
> flatten out the frame.  Anyway, it assumes that the stack grows down.
I was thinking it as top of memory used by stack verse bottom of stack.

> Also, there are some situations, where the hardware might switch to
> dedicated stack for interrupt handling.  It is impossible right now
> on amd64 and hw interrupts, but might become used in future, or on
> other arches.  It makes sense to check that current value falls into
> the td stack region, before using it.
> 
> > +	    int used = top - current;
> > +
> > +	    if (used > max_kern_thread_stack) {
> > +		max_kern_thread_stack = used;
> This should be a loop with atomic cas, to not loose update from other
> thread.
I briefly thought about using atomic update, but it was getting late and I
wanted to get something out the door. I'll update logic.

> > +		KASSERT(max_kern_thread_stack < KSTACK_PAGES * PAGE_SIZE,
> > +		    "Maximum kernel thread stack exxceeded");
> Assert is not needed, we put a guard page below the thread stack to
> catch the overflow.  You have seen it already with double fault on x86.
Double fault didn't really show where issue was.  I was hoping an assert
would slap a developer upside their head when their code exceeded the stack. (-:

> > +	    }
> > +	}
> > +
> >  	/* An interrupt with no event or handlers is a stray interrupt. */
> >  	if (ie == NULL || TAILQ_EMPTY(&ie->ie_handlers))
> >  		return (EINVAL);

I'll cleanup the code over the weekend and repost.

Larry

-- 
------------------------------------------------------------------------
Larry Baird
Global Technology Associates, Inc. 1992-2012 	| http://www.gta.com
Celebrating Twenty Years of Software Innovation | Orlando, FL
Email: lab at gta.com                 		| TEL 407-380-0220