7.2-STABLE i386 box crashing -- clues?

David Wolfskill david at catwhisker.org
Thu Nov 12 12:59:05 UTC 2009


On Thu, Nov 12, 2009 at 05:27:09PM +1100, Peter Jeremy wrote:
> I can't offer any solutions but I have some more questions...

I appreciate the help!

> ...
> >Every once in a while, it just crashes -- hard.  It loses video output
> >at that point; Ctl+Alt+Esc doesn't appear to change anything; entering
> >(say) "reset" blindly at that point has no apparent effect.
> 
> Roughly how often?

For the current month:

albert(7.2-S)[8] last reboot shutdown
reboot           ~                         Thu Nov 12 03:04
reboot           ~                         Wed Nov 11 20:06
reboot           ~                         Wed Nov 11 14:42
shutdown         ~                         Wed Nov 11 14:40
reboot           ~                         Wed Nov 11 14:35
reboot           ~                         Wed Nov 11 10:05
reboot           ~                         Wed Nov 11 09:09
reboot           ~                         Wed Nov 11 04:25
reboot           ~                         Tue Nov 10 12:49
reboot           ~                         Mon Nov  9 14:52
reboot           ~                         Sun Nov  8 17:42
reboot           ~                         Sat Nov  7 04:22
reboot           ~                         Fri Nov  6 21:43
reboot           ~                         Fri Nov  6 19:00
reboot           ~                         Fri Nov  6 16:20
shutdown         ~                         Fri Nov  6 16:17
reboot           ~                         Fri Nov  6 16:03
reboot           ~                         Fri Nov  6 13:07
reboot           ~                         Fri Nov  6 09:46
reboot           ~                         Thu Nov  5 16:41
reboot           ~                         Thu Nov  5 13:32
reboot           ~                         Thu Nov  5 12:59
reboot           ~                         Thu Nov  5 10:17
reboot           ~                         Thu Nov  5 04:26
reboot           ~                         Wed Nov  4 20:32
reboot           ~                         Wed Nov  4 15:48
reboot           ~                         Wed Nov  4 10:37
reboot           ~                         Tue Nov  3 13:15
reboot           ~                         Tue Nov  3 10:55
reboot           ~                         Tue Nov  3 04:16
reboot           ~                         Mon Nov  2 18:13
reboot           ~                         Sun Nov  1 20:03
shutdown         ~                         Sun Nov  1 20:01
reboot           ~                         Sun Nov  1 17:10
reboot           ~                         Sun Nov  1 13:51
shutdown         ~                         Sun Nov  1 13:48

wtmp begins Sun Nov  1 05:08:18 PST 2009
albert(7.2-S)[9] 

The "solo reboots" are crashes; those paired with "shutdown" entries are
controlled.
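
To put a number on "how often", the listing above could be tallied with a short script. This is only a rough sketch in Python: the function name classify_reboots and the sample excerpt are illustrative, and it assumes the newest-first ordering that last(1) prints, treating a "reboot" whose next-older entry is a "shutdown" as controlled and any other "reboot" as a crash.

```python
def classify_reboots(lines):
    """Split `last reboot shutdown` output into crashes and controlled reboots.

    `lines` is expected newest-first, as last(1) prints it.  Lines that do
    not start with "reboot" or "shutdown" (shell prompt, "wtmp begins ...")
    are ignored.
    """
    entries = [ln.split() for ln in lines]
    entries = [e for e in entries if e and e[0] in ("reboot", "shutdown")]

    crashes, controlled = [], []
    for i, entry in enumerate(entries):
        if entry[0] != "reboot":
            continue
        # The next line in the listing is the next-older event.
        older = entries[i + 1] if i + 1 < len(entries) else None
        stamp = " ".join(entry[2:])  # e.g. "Wed Nov 11 14:42"
        if older is not None and older[0] == "shutdown":
            controlled.append(stamp)
        else:
            # Assumption: a reboot with no recorded shutdown before it
            # (including the oldest entry in the log) counts as a crash.
            crashes.append(stamp)
    return crashes, controlled


# Short excerpt from the listing above, used only as sample input.
SAMPLE = """\
reboot           ~                         Wed Nov 11 20:06
reboot           ~                         Wed Nov 11 14:42
shutdown         ~                         Wed Nov 11 14:40
reboot           ~                         Wed Nov 11 14:35
"""

crashes, controlled = classify_reboots(SAMPLE.splitlines())
print(len(crashes), "crashes;", len(controlled), "controlled")
```

Feeding it the full output (for example, saved to a file and read with splitlines()) would give the per-month totals; grouping the timestamps by day would be a small extension.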

> Has anything unusual happened lately?  Brownout, blackout, power surge,
> lightning, heatwave, ...

Nothing linked to the crashes.  I pulled the UPS out of service
some weeks ago because it needs new batteries; I need to get those
ordered.  But the crashes were happening before that, in any case.

> >accordingly, had attached a SCSI host adaptor via PCI riser card.  Since
> >I had nothing actually connected to the card, I pulled it out of the
> >machine before bringing it back up.
> 
> Did you also pull the riser card?  Riser cards don't have a spectacularly
> high reputation.

That's actually what I pulled.  The SCSI card itself is still physically
in the chassis, merely with an air gap between itself and the system
board (because the riser card is now in a closet).

> > (I also felt around for
> >excessively warm spots; nothing.  All fans spin up, as well.)
> 
> I don't suppose you also studied the capacitors on the motherboard.
> Are any showing any signs of bulges?

I'll take another look for those; I recall that electrolytics exhibit
that as a sign of failure -- thanks for the reminder.

> Have you tried reseating everything?

The memory, yeah (even before replacing it); also swapped the DIMMs.
Only other thing that can be re-seated (desktop system board, so most
everything is built-in) would be the CPU, and I'm not quite sure how
that heat sink works.  I did re-seat some power connectors.

> >Flaky CPU?  Flaky power supply?  How might I tell?
> 
> CPU shouldn't go flaky unless it's been overheated.  In my experience,
> PSUs are the least reliable part of consumer-grade hardware but about
> the only way to check is to swap it.

:-}

> If you've got a DMM, you could check all the rails but there are
> lots of failure modes that won't show up that way.

Yeah, I kinda figured that.  I do have a DMM (used to have a VTVM), but
figured the meter wouldn't show transient dips or whatever too well.

> Have you checked the voltage/temperature screen in the BIOS?  Does
> anything look abnormal?

Did a couple of reality checks that way as detours during some of the
reboots.  Nothing interesting there at all.  (And I have seen a case in
the past -- though with a 1U box -- where that test definitely showed
something wrong: CPU temp climbing about 1C every 30 seconds, IIRC.)

> Are you using a PS/2 or USB keyboard?

PS/2 via KVM.  I don't have any USB keyboards.  :-}

> Are you running X?

Yes; the machine is configured to start xdm on transition to
multi-user, as my spouse used to use it as a desktop.  (She's gone back
to using its predecessor, a 4.11-STABLE machine, in frustration.)

> At this stage, my suggestion would be to try swapping the PSU.

Thanks.  I'll discuss it with the "family CFO."

Peace,
david
-- 
David H. Wolfskill				david at catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.


More information about the freebsd-hardware mailing list