Boot hang on Xen after r318347/(310418)
Adam McDougall
mcdouga9 at egr.msu.edu
Thu May 25 13:28:57 UTC 2017
On Thu, May 25, 2017 at 10:41:03AM +0100, Roger Pau Monné wrote:
> On Wed, May 24, 2017 at 06:33:07PM -0400, Adam McDougall wrote:
> > Hello,
> >
> > Recently I made a new build of 11-STABLE but encountered a boot hang
> > at this stage:
> > http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png
> >
> > It is easy to reproduce: I can just boot from any 11 or 12 ISO that
> > contains the commit.
>
> I have just tested the latest HEAD (r318861) and stable/11 (r318854),
> and they both work fine in my environment (a VM with 4 vCPUs and 2GB of
> RAM on OSS Xen 4.9). I'm also adding Colin in case he has some input;
> he has been doing some tests on HEAD and AFAIK he hasn't seen any
> issues.
>
> > I compiled various svn revisions to confirm that r318347 caused the
> > issue and r318346 is fine. With r318347 or later including the latest
> > 11-STABLE, the system will only boot with one virtual CPU in XenServer.
> > With any more CPUs, it hangs. I also tried a 12 kernel from head this
> > afternoon and I have the same hang. I had this issue on XenServer 7
> > (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I
> > also did much of my testing with a GENERIC kernel to try to rule out
> > kernel configuration mistakes. When it hangs, the performance
> > monitoring in Xen tells me at least one CPU is pegged. r318674 boots
> > fine on physical hardware without Xen involved.
> >
> > Looking at r318347, which mentions EARLY_AP_STARTUP, and later seeing
> > r318763, which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to
> > my kernel, but that turned the hang into a panic, which now happens with
> > any number of CPUs:
> > http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png
>
> I guess this is on stable/11, right? The panic looks easier to debug
> than the hang, so let's start with this one. Can you enable the serial
> console and kernel debug options in order to get a trace? With just
> this, it's almost impossible to know what went wrong.
Yes, this was on stable/11 amd64.
> If you still have that kernel around (and its debug symbols), can you
> do:
>
> $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80793344
>
> (The address is the instruction pointer on the crash image, I think I
> got it right)
I'll reproduce this soon and get the results from that command.
> In order to compile a stable/11 kernel with full debugging support you
> will have to add:
>
> # For full debugger support use (turn off in stable branch):
> options BUF_TRACKING # Track buffer history
> options DDB # Support DDB.
> options FULL_BUF_TRACKING # Track more buffer history
> options GDB # Support remote GDB.
> options DEADLKRES # Enable the deadlock resolver
> options INVARIANTS # Enable calls of extra sanity checking
> options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
> options WITNESS # Enable checks to detect deadlocks and cycles
> options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
> options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
>
> To your kernel config file.
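One way to apply the option list above without editing GENERIC itself is a small custom config that pulls in GENERIC with `include` and adds the debug options on top. The sketch below only writes that file; the config name DEBUG-XEN is hypothetical, and it assumes an amd64 stable/11 source tree in /usr/src. The build steps in the trailing comments are the usual buildkernel/installkernel targets, but they are untested here against this particular Xen setup.

```shell
#!/bin/sh
# Sketch: generate a debug kernel config layered on GENERIC.
# Assumptions: amd64, stable/11 sources in /usr/src; the name DEBUG-XEN is hypothetical.
set -eu

KERNCONF=${KERNCONF:-DEBUG-XEN}   # hypothetical config name
OUT_DIR=${OUT_DIR:-.}             # written locally; copy into /usr/src/sys/amd64/conf afterwards

mkdir -p "$OUT_DIR"
cat > "$OUT_DIR/$KERNCONF" <<EOF
include GENERIC
ident   $KERNCONF

# For full debugger support use (turn off in stable branch):
options BUF_TRACKING            # Track buffer history
options DDB                     # Support DDB.
options FULL_BUF_TRACKING       # Track more buffer history
options GDB                     # Support remote GDB.
options DEADLKRES               # Enable the deadlock resolver
options INVARIANTS              # Enable calls of extra sanity checking
options INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
options WITNESS                 # Enable checks to detect deadlocks and cycles
options WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
EOF

echo "wrote $OUT_DIR/$KERNCONF"

# Then, on the FreeBSD machine (as root):
#   cp DEBUG-XEN /usr/src/sys/amd64/conf/
#   cd /usr/src && make buildkernel KERNCONF=DEBUG-XEN && make installkernel KERNCONF=DEBUG-XEN
```

Layering on GENERIC keeps the test kernel close to the GENERIC kernel that was used to rule out configuration mistakes, so any change in behaviour should come from the debug options rather than from other config differences.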
I'll work on that soon too when I get a chance, thanks.
>
> Just to be sure, this is an amd64 kernel right?
yes
>
> Roger.
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>