Re: MCE: Does this look possibly like a slot issue?
Date: Tue, 21 Jun 2022 00:23:56 UTC
I'm seeing them constantly:

root@freenas[~]# mcelog --dmi
Hardware event. This is not a software error.
MCE 0
CPU 22 BANK 8 TSC 20aab486464a
MISC ac29890200046444 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 44
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 1
CPU 22 BANK 8 TSC 296dfcc82582
MISC ac29890200041381 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 2
CPU 22 BANK 8 TSC 2a5604a6a070
MISC ac29890200044281
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory ECC error occurred during scrub
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 88000040000200cf MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 3
CPU 22 BANK 8 TSC 31e141418eb8
MISC ac29890200046a4a ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 4a
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 4
CPU 22 BANK 8 TSC 3a014afee106
MISC ac29890200046646 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 46
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 5
CPU 22 BANK 8 TSC 41d1dbef1a6a
MISC ac29890200046141 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 41
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 6
CPU 22 BANK 8 TSC 4a1b1ecef446
MISC ac29890200046a4a ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 4a
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 7
CPU 22 BANK 8 TSC 527bc27db776
MISC ac29890200040386 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 86
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 8
CPU 22 BANK 8 TSC 5aa4ecdd795a
MISC ac29890200046646 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 46
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
root@freenas[~]#

and I replaced the DIMM yesterday :(

On 06/20/2022 7:19 pm, Ultima wrote:
> Hey Larry,
>
> It is possible it's the motherboard itself, but it's rare. The way I
> would determine this is to swap the DIMM module with another
> populated slot on the motherboard and see if the error migrates
> to the new slot or not. Also, this error doesn't necessarily mean
> there is a problem that needs to be addressed. If you have been
> running the system for many months and you see ECC errors a
> handful of times, it can probably be safely ignored.
>
> Best regards,
> Richard Gallamore
>
> On Mon, Jun 20, 2022 at 3:14 PM Larry Rosenman <ler@lerctr.org> wrote:
>
>> I've gotten a BUNCH of these on my TrueNAS server. I've replaced this
>> DIMM a couple of times, and still the MCEs continue.
>> Is it possible it's a motherboard slot issue?
>>
>> Hardware event. This is not a software error.
>> MCE 8
>> CPU 22 BANK 8 TSC 5aa4ecdd795a
>> MISC ac29890200046646 ADDR ee2f6e800
>> TIME 1655762472 Mon Jun 20 17:01:12 2022
>> MCG status:
>> Memory read ECC error
>> Memory corrected error count (CORE_ERR_CNT): 1
>> Memory transaction Tracker ID (RTId): 46
>> Memory DIMM ID of error: 0
>> Memory channel ID of error: 1
>> Memory ECC syndrome: ac298902
>> STATUS 8c0000400001009f MCGSTATUS 0
>> MCGCAP 1c09 APICID 34 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 44 Step 2
>> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> Device Locator: P2-DIMM2C
>> Bank Locator: BANK14
>> Manufacturer: Hyundai
>> Serial Number: 40F3C20F
>> Asset Tag:
>> Part Number: HMT151R7BFR4C-H9
>>
>> --
>> Larry Rosenman http://www.lerctr.org/~ler
>> Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
>> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
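One way to run the slot-swap test Richard describes is to save the `mcelog --dmi` output before and after moving the module and compare where the corrected errors land. The Python sketch below is hypothetical (the `tally_mce` name and the file-argument usage are not part of mcelog). It assumes the plain-text record layout shown in this thread, with each record opening on "Hardware event.", and counts records per SMBIOS Device Locator, per ADDR, and per memory-controller (channel, DIMM ID) pair.

```python
import re
import sys
from collections import Counter


# Hypothetical helper: tallies `mcelog --dmi` records so you can compare
# where corrected errors land before and after swapping DIMMs between slots.
def tally_mce(text):
    """Return three Counters: per SMBIOS Device Locator, per ADDR, and per
    memory-controller (channel, DIMM ID) pair.

    Assumes each record begins with 'Hardware event.', as in the output
    quoted in this thread; this is not a documented, stable format.
    """
    by_locator = Counter()
    by_addr = Counter()
    by_slot = Counter()
    # Everything before the first banner is preamble, so drop it.
    for rec in text.split("Hardware event.")[1:]:
        loc = re.search(r"Device Locator:[ \t]*(\S+)", rec)
        addr = re.search(r"\bADDR[ \t]+([0-9a-f]+)", rec)
        chan = re.search(r"Memory channel ID of error:[ \t]*(\d+)", rec)
        dimm = re.search(r"Memory DIMM ID of error:[ \t]*(\d+)", rec)
        # Scrub-detected errors (like MCE 2 in the log) carry no ADDR or DMI block.
        by_locator[loc.group(1) if loc else "(no DMI data)"] += 1
        by_addr[addr.group(1) if addr else "(no ADDR)"] += 1
        if chan and dimm:
            by_slot[(int(chan.group(1)), int(dimm.group(1)))] += 1
    return by_locator, by_addr, by_slot


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: mcelog --dmi > mce.txt && python3 tally_mce.py mce.txt
    with open(sys.argv[1]) as f:
        locs, addrs, slots = tally_mce(f.read())
    for name, counts in (("locator", locs), ("ADDR", addrs), ("channel/DIMM", slots)):
        for key, n in counts.most_common():
            print(f"{name} {key}: {n}")
```

If, after the module is moved, the errors still report channel 1 / DIMM ID 0, that points at the slot or board; if they follow the module to its new channel/DIMM ID, the module is the more likely culprit. Since mcelog itself warns that SMBIOS data is often unreliable, the channel/DIMM ID pair from the machine-check record is probably a safer key than the Device Locator.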