Diagnosing reboot under load

Alex Zbyslaw xfb52 at dial.pipex.com
Mon Nov 7 19:03:31 GMT 2005


Micah wrote:

> I'm really beginning to doubt it's the PSU.  Why?  I cannot get the 
> output voltage to drop no matter what load I throw at it.  I plugged 
> in four additional hard drives and ran a system stress test and still 
> the voltages remained rock steady at the values I stated earlier.  I 
> ran it for an hour with the high-low monitor on a Fluke multimeter.  
> The +5 stayed near 5.1 with 5.08 as the bottom, and the +12 stayed 
> near 11.89 with 11.84 as the minimum.  I even had one of the "random 
> segfaults" and the +12 voltage never dropped below 11.84.  I'm not 
> sure how I can get the load any higher without using resistors which 
> most certainly does not simulate the load I'm generating while compiling.
>
> That leaves memory, CPU or mobo.  I ran memtest86+ and it reported no 
> errors.  I'll run it again for an extended period of time while I'm at 
> school to see if it reports anything.  That leaves CPU and mobo.  
> Anyone got any ideas how to test those?  The only system test I can 
> run that does report an error is Lucifer 1.0 (on the ultimate boot 
> cd).  The mprime test and cpuburn do not find any errors.
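
On the voltage readings quoted above: as far as I know the ATX spec allows 
+/-5% on the +5 and +12 rails, i.e. 4.75-5.25 V and 11.40-12.60 V, so both 
minimums are comfortably inside the window.  A quick sketch of the 
arithmetic (the 5% figure and the function name are my assumptions, not 
something from the PSU's data sheet):

```python
def within_tolerance(nominal, measured, tolerance=0.05):
    """Return True if measured lies within nominal * (1 +/- tolerance)."""
    return abs(measured - nominal) <= nominal * tolerance


# Minimum readings from the quoted message, keyed by nominal rail voltage.
readings = {5.0: 5.08, 12.0: 11.84}

for nominal, minimum in readings.items():
    deviation = (minimum - nominal) / nominal * 100
    ok = within_tolerance(nominal, minimum)
    print(f"+{nominal:g} V rail: minimum {minimum} V "
          f"({deviation:+.1f}%), within +/-5%: {ok}")
```

Note that a multimeter's min/max mode may miss very short dips, so this 
only says the slow-moving voltage is fine.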

The usual advice is to run memtest86 overnight, but I'm not convinced it 
will find a fault related to either temperature or load, since memtest 
seems to cause neither.  Still, worth a try.

When I was arsing around with overclocking, I could reliably crash the 
machine (IIRC) like this:
    run cpuburn
    run mprime (or was it a pi generator?  can't recall now...)
    wait for temp to hit max
    kill cpuburn!
    wait < 5 mins for either the machine to crash or the prime/pi test to report an error

I was fairly convinced at the time that it was the memory which didn't 
cope.  This is possibly not far off what happens in a big series of 
stressful compiles.
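
If you want to repeat the steps above without babysitting them, something 
like this rough sketch would do.  Everything specific in it is an 
assumption on my part: the function name is made up, you have to supply 
your own read_temp (however your board exposes temperature), and it 
assumes the test program exits when it hits an error.  If yours only logs 
the error and carries on, watch its output instead.

```python
import subprocess
import time


def stress_then_release(burn_cmd, test_cmd, read_temp, max_temp,
                        watch_seconds=300, poll_seconds=5):
    """Heat the machine with two loads, drop one, and watch the other.

    Returns "test-failed" if the test program exits before the watch
    window closes, or "survived" if it is still running at the end.
    """
    burn = subprocess.Popen(burn_cmd)   # heat source, e.g. cpuburn
    test = subprocess.Popen(test_cmd)   # self-checking load, e.g. mprime
    try:
        # Wait for the temperature to reach the chosen maximum.
        while read_temp() < max_temp:
            if test.poll() is not None:
                return "test-failed"
            time.sleep(poll_seconds)

        # Kill the heat source so the load changes abruptly.
        burn.kill()
        burn.wait()

        # Watch the self-checking test for a few minutes.
        deadline = time.monotonic() + watch_seconds
        while time.monotonic() < deadline:
            if test.poll() is not None:
                return "test-failed"
            time.sleep(poll_seconds)
        return "survived"
    finally:
        # Clean up whatever is still running.
        for proc in (burn, test):
            if proc.poll() is None:
                proc.kill()
                proc.wait()
```

You would call it with something like ["burnP6"] and ["mprime", "-t"] as 
the two commands (check the names and flags against what is actually 
installed).  If the machine itself crashes, the script obviously never 
gets to report anything, which is an answer in its own right.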

As for diagnosing faults, you may be down to replacing components one at 
a time and seeing if it makes a difference.  That's easier when the 
machine crashes quickly, so if you can find something which reliably 
crashes it, that's good.  If you have >1 memory stick and the machine 
will run with a single stick, try each stick in turn.  You could also 
try deliberately running the memory below its rated speed and see if 
that makes it reliable.  Was the memory you got on the compatibility 
list for the mobo?

Hope that helps,

--Alex




