12.1 RELEASE General Protection Fault (Trap 9)

Christos Chatzaras chris at cretaforce.gr
Wed Jan 22 16:48:25 UTC 2020



> On 22 Jan 2020, at 18:36, matthew at freebsd.org <matthew at FreeBSD.org> wrote:
> 
> On 22/01/2020 15:12, Jason Van Patten wrote:
>> Since some time before Christmas (as far as I know), my NAS has been crashing at random, rebooting, and saving cores in /var/crash.  It was doing this with 12.0 and now with 12.1.  My gut tells me it's hardware-related, but I'm not quite sure.  The various bits and pieces are:
> 
> Given that the crashes do not appear to be associated with any particular activity, I think you're on the money with your diagnosis that it is hardware-related.
> 
> Did you change any of the hardware on this system recently?  If you've added more disks or such, then you may have overloaded the PSU.  If the PSU can't produce voltages in spec, then you will see random crashes, although in that case I doubt you'd always see 'General Protection Fault'.  Unless this is a new machine or you've changed some of the hardware, this is unlikely to be the diagnosis.
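
One way to check whether the panics really are always 'General Protection Fault' is to tally the panic strings that savecore(8) recorded. The following is a minimal Python sketch. It assumes each dump left an info.N file in /var/crash containing a line that begins "Panic String:". Adjust the pattern if your files differ. The function names are illustrative only.

```python
import glob
import os
import re
from collections import Counter

# Assumption: each dump saved by savecore(8) has an info.N file whose
# header contains a line such as "  Panic String: general protection fault".
PANIC_RE = re.compile(r"^[ \t]*Panic String:[ \t]*(.*?)[ \t]*$", re.MULTILINE)
INFO_NAME_RE = re.compile(r"info\.\d+")


def panic_strings(texts):
    """Yield the panic string from each info file's text, skipping files without one."""
    for text in texts:
        match = PANIC_RE.search(text)
        if match:
            yield match.group(1)


def tally_panics(crash_dir="/var/crash"):
    """Count how often each distinct panic string occurs in crash_dir."""
    texts = []
    for path in sorted(glob.glob(os.path.join(crash_dir, "info.*"))):
        # Only read numbered files, so that a convenience link such as
        # info.last cannot count the newest dump twice.
        if INFO_NAME_RE.fullmatch(os.path.basename(path)):
            with open(path, errors="replace") as f:
                texts.append(f.read())
    return Counter(panic_strings(texts))


if __name__ == "__main__":
    for panic, count in tally_panics().most_common():
        print(f"{count:4d}  {panic}")
```

By the reasoning above, a wide mix of panic strings would fit an out-of-spec PSU. The same 'general protection fault' every time makes that explanation less likely.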
> 
> Otherwise, suspect hardware problems.  In rough order of expense, least to most:
> 
>   * Bad heatsink, failed case fan, CPU thermal paste not up to snuff,
>     or any other cause that may lead to your system overheating
> 
>   * Bad memory
> 
>   * Bad CPU
> 
> The first of these is relatively cheap and easy to handle: make sure you're getting unimpeded airflow through the chassis -- clean any filters, make sure fans are spinning correctly, and check that heatsinks have good thermal contact, renewing the thermal paste if necessary. Monitoring the CPU temperature will help here -- if you see the CPU temperature increasing just before everything goes kaput, that's a fairly solid diagnostic. For an i7, you should be able to use the coretemp(4) kernel module and read off the temperature from the dev.cpu.%d.temperature sysctls.
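
To put the temperature advice into practice, here is a minimal Python sketch that logs the per-CPU readings at a fixed interval. It assumes coretemp(4) is loaded (for example with "kldload coretemp") and that "sysctl -n dev.cpu.N.temperature" prints a value such as "45.0C". Each line is flushed and fsync'ed, so the last readings before a panic should survive on disk. The log path, the 10-second interval and the function names are illustrative choices, not anything FreeBSD provides.

```python
import os
import re
import subprocess
import sys
import time

# Assumed output of "sysctl -n dev.cpu.N.temperature" with coretemp(4)
# loaded, e.g. "45.0C"; surrounding whitespace/newline is tolerated.
TEMP_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)C\s*$")


def parse_temperature(text):
    """Convert sysctl output such as '45.0C' to degrees Celsius."""
    match = TEMP_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected temperature format: {text!r}")
    return float(match.group(1))


def read_cpu_temperatures():
    """Return one Celsius reading per CPU by calling sysctl(8)."""
    temps = []
    for cpu in range(os.cpu_count() or 1):
        result = subprocess.run(
            ["sysctl", "-n", f"dev.cpu.{cpu}.temperature"],
            capture_output=True, text=True, check=True,
        )
        temps.append(parse_temperature(result.stdout))
    return temps


def format_line(stamp, temps):
    """Build one log line: a timestamp followed by each CPU's temperature."""
    return stamp + " " + " ".join(f"{t:.1f}" for t in temps)


def log_temperatures(path, interval=10):
    """Append a reading every `interval` seconds until interrupted."""
    with open(path, "a") as log:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(format_line(stamp, read_cpu_temperatures()) + "\n")
            # Flush and fsync so the latest lines reach disk before a panic.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)


if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    # Hypothetical default log path; pass another as the first argument.
    log_temperatures(sys.argv[1] if len(sys.argv) > 1 else "/var/log/cputemp.log")
```

Run it with permission to write the chosen log file. After the next crash, the tail of the log shows whether the temperatures were climbing beforehand.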
> 
> Memory problems can frequently be diagnosed with a memory checker like sysutils/memtest86+ -- if it says you have a problem, then you do have a problem.  However, it may not catch every possible memory problem, so it can wrongly give you an 'all clear'.  In practice, though, it's pretty accurate.  A more definitive test is to swap out any suspect RAM modules and see if the problem goes away.
> 
> The worst case is a bad CPU.  memtest86+ will diagnose some CPU faults, but it is less effective on CPU problems.  If there is a CPU problem, it will be a pretty subtle one, since the typical symptoms of a CPU problem are that the system won't boot and the BIOS makes horrible beeping noises when you try.
> 
> Even so, this isn't a definitive list.  I've heard tales of someone trying to diagnose this sort of problem who swapped out, bit by bit, every component of a system except the case, and the problem still occurred.  It turned out the case was slightly bent, which put enough stress on the motherboard to cause intermittent electrical connection faults.
> 
> 	Cheers,
> 
> 	Matthew

I had similar crashes and it was bad RAM.

I recommend checking the RAM with the userland memtester if downtime is not an option.

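Keep in mind that memtest86+ is still the better choice, because it can check all of the RAM. memtester can only test the memory it manages to allocate while the system is running.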
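
Because memtester runs in userland, it can only test memory it is able to allocate on a live system. This minimal Python sketch sizes a single run from the memory that is currently free. It makes several assumptions: memtester is installed and on the PATH, and the vm.stats.vm.v_free_count and hw.pagesize sysctls report the free page count and the page size. The 80% fraction and the helper names are illustrative choices. It should probably be run as root, so that memtester is allowed to lock that much memory.

```python
import subprocess
import sys

# Assumption: leave ~20% of the currently free memory for the running
# services; this fraction is an arbitrary, illustrative choice.
DEFAULT_FRACTION = 0.8


def sysctl_int(name):
    """Read an integer sysctl via the sysctl(8) command."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(result.stdout)


def memtester_size_mb(free_pages, page_size, fraction=DEFAULT_FRACTION):
    """Megabytes to hand to memtester: a fraction of the free memory."""
    return int(free_pages * page_size * fraction) // (1 << 20)


def build_memtester_command(size_mb, loops=1):
    """memtester takes an amount with a unit suffix, then a loop count."""
    return ["memtester", f"{size_mb}M", str(loops)]


def main():
    free_pages = sysctl_int("vm.stats.vm.v_free_count")
    page_size = sysctl_int("hw.pagesize")
    cmd = build_memtester_command(memtester_size_mb(free_pages, page_size))
    print("running:", " ".join(cmd))
    # A non-zero exit status means a failed test or a setup problem such
    # as being unable to lock memory; memtester's own output says which.
    return subprocess.run(cmd).returncode


if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    sys.exit(main())
```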


More information about the freebsd-questions mailing list