Spontaneous reboots on Intel i5 and FreeBSD 9.0
Warren Block
wblock at wonkity.com
Fri Jan 18 20:23:11 UTC 2013
On Fri, 18 Jan 2013, kpneal at pobox.com wrote:
> On Fri, Jan 18, 2013 at 09:48:05AM -0700, Ian Lepore wrote:
>> I tend to agree, a machine that starts rebooting spontaneously when
>> nothing significant changed and it used to be stable is usually a sign
>> of a failing power supply or memory.
>
> Agreed.
>
>> But I disagree about memtest86. It's probably not completely without
>> value, but to me its value is only negative: if it tells you memory is
>> bad, it is. If it tells you it's good, you know nothing. Over the
>> years I've had 5 DIMMs fail. memtest86 found the error in one of them,
>> but said all the others were fine in continuous 48-hour tests. I even
>> tried running the tests on multiple systems.
>>
>> The thing that always reliably finds bad memory for me
>> is /usr/ports/math/mprime run in test/benchmark mode. It often takes 24
>> or more hours of runtime, but it will find your bad memory.
>
> I've had "good" luck with gcc showing bad memory. If compiling a new kernel
> produces seg faults then I know I have a hardware problem. I've seen
> compilers at work failing due to bad memory as well.
>
> Some problems only happen with particular access patterns. So if a compiler
> works fine then, like memtest86, it doesn't say anything about the health
> of the hardware.

Most test tools are like that. They can diagnose something as bad,
but they often can't prove it is good. SMART has a reputation for
reporting no problems on disks that are failing, and capacitors that
aren't swollen or leaking may still have failed.

But diagnostic tools can at least give a hint. In my case, memtest
indicated a problem--a big problem. I removed one DIMM at random (there
were only two), and both the problems and the memtest errors went away.
I put the DIMM back, and both returned.
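
To illustrate the point about access patterns, here is a minimal sketch
of a pattern test in Python. It is not how memtest86 or mprime work, and
the names (fill, mismatches, run_patterns, PATTERNS) are made up for
this example. It writes a few byte patterns into a buffer and reads them
back. A user-space script like this only touches whatever pages the OS
hands it, so an empty result says very little, which is the same
limitation discussed above: a mismatch means something is wrong, while a
clean pass proves nothing.

```python
# Hypothetical illustration only: a tiny write/read-back pattern check.
# A clean result does NOT show the memory is good.

PATTERNS = [b"\x00", b"\xff", b"\xaa", b"\x55", b"\xaa\x55"]


def fill(buf, pattern):
    """Write a repeating byte pattern across the whole buffer."""
    for i in range(len(buf)):
        buf[i] = pattern[i % len(pattern)]


def mismatches(buf, pattern):
    """Return the offsets where the buffer no longer matches the pattern."""
    return [i for i in range(len(buf)) if buf[i] != pattern[i % len(pattern)]]


def run_patterns(size, patterns):
    """Fill and verify one buffer with each pattern; map pattern -> bad offsets."""
    buf = bytearray(size)
    failures = {}
    for pattern in patterns:
        fill(buf, pattern)
        bad = mismatches(buf, pattern)
        if bad:
            failures[pattern] = bad
    return failures


if __name__ == "__main__":
    failures = run_patterns(1 << 16, PATTERNS)
    if failures:
        print("mismatches found: suspect the hardware")
    else:
        print("no mismatches: this proves nothing about the hardware")
```

Flipping a bit by hand (buf[10] ^= 0x01 after a fill) shows what a
detected fault looks like. Real faults often show up only under
particular timing, temperature, or address patterns that a simple loop
like this never exercises.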