Re: MCE: Does this look possibly like a slot issue?
Date: Tue, 21 Jun 2022 18:23:24 UTC
On 2022-06-20 17:23, Larry Rosenman wrote:
> I'm seeing them constantly:

FWIW it looks like a sync(ing) problem between your RAM && CPU cache. Are your clocks set correctly for your CPU && RAM? Is your CPU too hot? Is the CPU cache ECC?

>
> root@freenas[~]# mcelog --dmi
> Hardware event. This is not a software error.
> MCE 0
> CPU 22 BANK 8 TSC 20aab486464a
> MISC ac29890200046444 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 44
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 1
> CPU 22 BANK 8 TSC 296dfcc82582
> MISC ac29890200041381 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 81
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 2
> CPU 22 BANK 8 TSC 2a5604a6a070
> MISC ac29890200044281
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory ECC error occurred during scrub
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 81
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 88000040000200cf MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> Hardware event. This is not a software error.
> MCE 3
> CPU 22 BANK 8 TSC 31e141418eb8
> MISC ac29890200046a4a ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 4a
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 4
> CPU 22 BANK 8 TSC 3a014afee106
> MISC ac29890200046646 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 46
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 5
> CPU 22 BANK 8 TSC 41d1dbef1a6a
> MISC ac29890200046141 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 41
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 6
> CPU 22 BANK 8 TSC 4a1b1ecef446
> MISC ac29890200046a4a ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 4a
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 7
> CPU 22 BANK 8 TSC 527bc27db776
> MISC ac29890200040386 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 86
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 8
> CPU 22 BANK 8 TSC 5aa4ecdd795a
> MISC ac29890200046646 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 46
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> root@freenas[~]#
>
> and I replaced the DIMM yesterday :(
>
> On 06/20/2022 7:19 pm, Ultima wrote:
>
>> Hey Larry,
>>
>> It is possible it's the motherboard itself, but it's rare. The way I
>> would determine this is to swap the DIMM module with another
>> populated slot on the motherboard and see if the error migrated
>> to the new slot or not. Also, this error doesn't necessarily mean
>> there is a problem that needs to be addressed. If you have been
>> running the system for many months and you see ECC errors a
>> handful of times, it can probably be safely ignored.
>>
>> Best regards,
>> Richard Gallamore
>>
>> On Mon, Jun 20, 2022 at 3:14 PM Larry Rosenman <ler@lerctr.org> wrote:
>>
>>> I've gotten a BUNCH of these on my TrueNAS server. I've replaced this
>>> DIMM a couple of times, and still the MCE's continue.
>>> Is it possible it's Motherboard slot issue?
>>>
>>> Hardware event. This is not a software error.
>>> MCE 8
>>> CPU 22 BANK 8 TSC 5aa4ecdd795a
>>> MISC ac29890200046646 ADDR ee2f6e800
>>> TIME 1655762472 Mon Jun 20 17:01:12 2022
>>> MCG status:
>>> Memory read ECC error
>>> Memory corrected error count (CORE_ERR_CNT): 1
>>> Memory transaction Tracker ID (RTId): 46
>>> Memory DIMM ID of error: 0
>>> Memory channel ID of error: 1
>>> Memory ECC syndrome: ac298902
>>> STATUS 8c0000400001009f MCGSTATUS 0
>>> MCGCAP 1c09 APICID 34 SOCKETID 0
>>> CPUID Vendor Intel Family 6 Model 44 Step 2
>>> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>>> Device Locator: P2-DIMM2C
>>> Bank Locator: BANK14
>>> Manufacturer: Hyundai
>>> Serial Number: 40F3C20F
>>> Asset Tag:
>>> Part Number: HMT151R7BFR4C-H9
>>>
>>> --
>>> Larry Rosenman http://www.lerctr.org/~ler
>>> Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
>>> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
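The raw STATUS values in these logs can be cross-checked by hand. Below is a minimal Python sketch that does this. It assumes the IA32_MCi_STATUS bit layout and the memory-controller compound error code format (`000F 0000 1MMM CCCC`) described in the machine-check architecture chapter of the Intel SDM. The helper names `decode_status` and `tally_locations` are hypothetical and are not part of mcelog. `tally_locations` assumes raw, unquoted `mcelog` text in which each record starts with an `MCE <n>` line. It counts records per (socket, channel, DIMM ID), which is one way to run the slot-swap test suggested above: compare the counts before and after moving the module.

```python
import re
from collections import Counter

# Bits 6:4 ("MMM") of a memory-controller compound error code, per the
# Intel SDM machine-check chapter (assumed table; 101-111 are reserved).
MEM_TRANSACTION = {
    0b000: "generic undefined request",
    0b001: "memory read",
    0b010: "memory write",
    0b011: "address/command",
    0b100: "memory scrubbing",
}


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status: int) -> dict:
    """Decode architectural fields of an IA32_MCi_STATUS value (hypothetical helper)."""
    mca_code = status & 0xFFFF  # bits 15:0: MCA error code
    info = {
        "valid": bit(status, 63),        # VAL
        "overflow": bit(status, 62),     # OVER: an earlier error was lost
        "uncorrected": bit(status, 61),  # UC: clear means corrected
        "misc_valid": bit(status, 59),   # MISCV: MCi_MISC holds data
        "addr_valid": bit(status, 58),   # ADDRV: MCi_ADDR holds an address
        "pcc": bit(status, 57),          # processor context corrupt
        # Bits 52:38: corrected error count (15 bits)
        "corrected_count": (status >> 38) & 0x7FFF,
        "mca_code": mca_code,
    }
    # Memory-controller errors match 000F 0000 1MMM CCCC; mask out F (bit 12).
    if (mca_code & 0xEF80) == 0x0080:
        info["transaction"] = MEM_TRANSACTION.get((mca_code >> 4) & 0x7, "reserved")
        channel = mca_code & 0xF
        info["channel"] = None if channel == 0xF else channel  # 1111 = not specified
    return info


def tally_locations(mcelog_text: str) -> Counter:
    """Count records per (socket, channel, DIMM ID) in raw mcelog text (hypothetical helper)."""
    counts = Counter()
    # Each record begins with a line "MCE <n>"; skip text before the first one.
    for record in re.split(r"^MCE \d+", mcelog_text, flags=re.MULTILINE)[1:]:
        socket = re.search(r"SOCKETID (\d+)", record)
        channel = re.search(r"channel ID of error: (\d+)", record)
        dimm = re.search(r"DIMM ID of error: (\d+)", record)
        if socket and channel and dimm:
            key = (int(socket.group(1)), int(channel.group(1)), int(dimm.group(1)))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # The two distinct STATUS values that appear in this thread.
    for value in (0x8C0000400001009F, 0x88000040000200CF):
        d = decode_status(value)
        print(f"{value:#018x}: {d['transaction']}, uncorrected={d['uncorrected']}, "
              f"addr_valid={d['addr_valid']}, count={d['corrected_count']}")
```

On the values above, 8c0000400001009f decodes as a corrected memory-read error with a valid address. 88000040000200cf decodes as a corrected scrub error with no logged address. Both match mcelog's own decoding, including why the MCE 2 record has no ADDR field. In both cases the channel bits are 1111 ("not specified"), so the channel/DIMM attribution comes from mcelog's model-specific decoding rather than from the architectural code. If the per-location counts follow the module to its new slot, the DIMM is suspect. If they stay with the original slot, the slot or board is.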