Re: MCE: Does this look possibly like a slot issue?
- In reply to: Larry Rosenman : "Re: MCE: Does this look possibly like a slot issue?"
Date: Tue, 21 Jun 2022 20:27:53 UTC
On 2022-06-21 12:23, Larry Rosenman wrote:
> On 06/21/2022 1:23 pm, Chris wrote:
>> On 2022-06-20 17:23, Larry Rosenman wrote:
>>> I'm seeing them constantly:
>> FWIW it looks like a sync(ing) problem between your
>> RAM && CPU cache. Are your clocks set correctly
>> for your CPU && RAM? Is your CPU too hot? Is the CPU
>> cache ECC?
>>>
>>> root@freenas[~]# mcelog --dmi
>
> [snip]
>
> Hrm. IIRC all the BIOS parameters are default (I could be mistaken). It's a
> SuperMicro X8DTN+ motherboard with:
> CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (2400.22-MHz K8-class CPU)
> Origin="GenuineIntel" Id=0x206c2 Family=0x6 Model=0x2c Stepping=2
>
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>
> Features2=0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI>
> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> AMD Features2=0x1<LAHF>
> Structured Extended Features3=0x9c000000<IBPB,STIBP,L1DFL,SSBD>
> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> TSC: P-state invariant, performance statistics
> real memory = 77309411328 (73728 MB)
> avail memory = 75186962432 (71703 MB)
> (2 packages, 6 core, 12-threads each) and 18 4GB sticks.
> this ONE slot seems to be a problem.
>
> How would you recommend looking for an issue modulo pulling the 2 cpu
> packages?

When I ran into these errors it turned out to be a hot CPU, as I recall.
While I'm familiar with the hardware you're using, I have no history with
*your* equipment. The first 2 things I'd do, given that ECC is so sensitive,
are to swap the PSU for a known good one and to re-seat && re-grease the
CPU(s). Do the fans operate as intended? At that point, a long session with
sysutils/memtest86 or a buildworld session should tell you if everything is
AOK.
Frankly, as to testing memory: working with a single stick at a time would
be more conclusive, and should get you to a conclusion sooner.

HTH

Chris
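
As a quick sanity check on the figures quoted above, the reported "real memory" can be compared against the stated 18 sticks. This is only a minimal sketch: it assumes "4GB" means 4 GiB per stick, and the function name `detected_sticks` is made up for illustration. A matching total only suggests that every stick is being counted; it says nothing about whether the suspect slot is healthy.

```python
# Figures copied from the quoted boot output in this thread.
REAL_MEMORY_BYTES = 77309411328   # "real memory = 77309411328 (73728 MB)"
STICK_COUNT = 18                  # "18 4GB sticks"
STICK_SIZE_BYTES = 4 * 1024**3    # assumption: "4GB" means 4 GiB per stick


def detected_sticks(real_bytes: int, stick_bytes: int) -> float:
    """Return how many sticks of the given size the reported total accounts for."""
    return real_bytes / stick_bytes


if __name__ == "__main__":
    sticks = detected_sticks(REAL_MEMORY_BYTES, STICK_SIZE_BYTES)
    status = "matches" if sticks == STICK_COUNT else "does NOT match"
    print(f"reported memory covers {sticks:g} sticks; {status} the {STICK_COUNT} installed")
```

If the count came up short, that would point at a stick or slot not being detected at all, which is a different failure from a slot that is detected but throws MCEs.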