FreeBSD 8.0-BETA2/amd64 crashes on SMP under load
Anton Shterenlikht
mexas at bristol.ac.uk
Tue Jul 28 16:13:07 UTC 2009
On Tue, Jul 28, 2009 at 08:34:52AM -0700, Marcel Moolenaar wrote:
>
> On Jul 28, 2009, at 7:45 AM, Anton Shterenlikht wrote:
>
> > On Tue, Jul 28, 2009 at 02:22:50PM +0000, O. Hartmann wrote:
> >> Anton Shterenlikht wrote:
> >>> On Mon, Jul 27, 2009 at 10:04:28PM +0100, Anton Shterenlikht wrote:
> >>>> On Mon, Jul 27, 2009 at 09:55:12PM +0200, O. Hartmann wrote:
> >>>>> Kamigishi Rei wrote:
> >>>>>> O. Hartmann wrote:
> >>>>>>> I have the problem of crashing FreeBSD 8.0-BETA2/amd64 under load on
> >>>>>>> all of our SMP boxes. Is there an issue known at the moment? If not, I
> >>>>>>> will prepare the kernel for witnessing and provide more information,
> >>>>>>> if you wish.
> >>>>>> A quick question: what is in the crash message, i.e. the backtrace?
> >>>>>> And what kind of crash is it - a panic() or a fatal trap?
> >>>>> On the 8-core server box, I sometimes see :
> >>>>>
> >>>>> Fatal trap 12: page fault while in kernel mode
> >>>>> fault code = supervisor read, page not present
> >>>> Not sure if it's related, but on ia64 SMP (2 cpus) with 8.0-current and
> >>>> later with 8.0-beta1 (I haven't built beta2 yet) I'm getting crashes
> >>>> under load every so often. E.g. buildworld -j8 is likely to crash the
> >>>> box. No messages, just a sudden freeze, no backtrace or panic,
> >>>> and then reboot.
> >>>>
> >>>> If load is less heavy, e.g. fewer processes and some idle time, the
> >>>> problem doesn't seem to appear.
> >>>>
> >>>> I'm happy to do any further testing, if suggested.
> >>>
> >>> my ia64 8.0-beta1 SMP box died again on
> >>> make -j8 buildworld
> >>> with no panic or log entries.
> >>>
> >>> Is it possible that some kernel variable needs to
> >>> be increased? E.g. kern.maxproc, kern.maxfiles, etc.
> >>> Or perhaps I'm talking complete rubbish..
> >>>
> >>
> >> I suggest you try again with a UP kernel - a suggestion from a
> >> kernel-noob, sorry. My SMP boxes work now with a UP kernel, but they are
> >> really slowish although they have modern Intel C2D/Penryn cores.
> >
> > I need SMP for OpenMP codes. It's a shame if SMP is buggy, but
> > I guess all is down to small user base..
>
> I have no problems with SMP. If you don't have a panic, then
> you may have a hardware problem.

yes.. I thought of this myself. I guess I ought to check
the Event Logs available from MP on rx2600. But those messages
are so cryptic..

> Check for MCA records.

# mca
mca: no error records found
# sysctl hw.mca
hw.mca.last: 0
hw.mca.first: 0
hw.mca.count: 0

Faulty DIMMs, as you've suggested, would explain a lot of
my problems..

many thanks

--
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 928 8233
Fax: +44 (0)117 929 4423