Re: MCE: Does this look possibly like a slot issue?
- Reply: Larry Rosenman : "Re: MCE: Does this look possibly like a slot issue?"
- In reply to: Ultima : "Re: MCE: Does this look possibly like a slot issue?"
Date: Tue, 21 Jun 2022 00:59:52 UTC
SuperMicro X8DTN+
2 Processors, 6-core/12-Thread.
CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (2400.20-MHz K8-class CPU)

I'll bring it down and swap DIMMS around

On 06/20/2022 7:57 pm, Ultima wrote:
> Hey Larry,
>
> One red flag I am seeing is that the error is being produced on
> the same CPU/bank with each error you have provided so far.
>
> Can you try and follow my original recommendation and swap
> currently installed DIMM with the problem DIMM slot and see
> if anything changes?
>
> Can you also provide the motherboard model? Also, do you
> have multiple CPUs installed in this system?
>
> Best regards,
> Richard Gallamore
>
> On Mon, Jun 20, 2022 at 5:41 PM Larry Rosenman <ler@lerctr.org> wrote:
>
> Yes and Yes.
>
> On 06/20/2022 7:37 pm, Ultima wrote:
>
> Are you sure that the module you replaced it with was good?
> Are you sure you replaced the correct module?
>
> Best regards,
> Richard Gallamore
>
> On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman <ler@lerctr.org> wrote:
>
> I'm seeing them constantly:
>
> root@freenas[~]# mcelog --dmi
> Hardware event. This is not a software error.
> MCE 0
> CPU 22 BANK 8 TSC 20aab486464a
> MISC ac29890200046444 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 44
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 1
> CPU 22 BANK 8 TSC 296dfcc82582
> MISC ac29890200041381 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 81
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 2
> CPU 22 BANK 8 TSC 2a5604a6a070
> MISC ac29890200044281
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory ECC error occurred during scrub
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 81
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 88000040000200cf MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> Hardware event. This is not a software error.
> MCE 3
> CPU 22 BANK 8 TSC 31e141418eb8
> MISC ac29890200046a4a ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 4a
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 4
> CPU 22 BANK 8 TSC 3a014afee106
> MISC ac29890200046646 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 46
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 5
> CPU 22 BANK 8 TSC 41d1dbef1a6a
> MISC ac29890200046141 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 41
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 6
> CPU 22 BANK 8 TSC 4a1b1ecef446
> MISC ac29890200046a4a ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 4a
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 7
> CPU 22 BANK 8 TSC 527bc27db776
> MISC ac29890200040386 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 86
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 8
> CPU 22 BANK 8 TSC 5aa4ecdd795a
> MISC ac29890200046646 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 46
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> root@freenas[~]#
>
> and I replaced the DIMM yesterday :(
>
> On 06/20/2022 7:19 pm, Ultima wrote:
>
> Hey Larry,
>
> It is possible it's the motherboard itself, but it's rare. The way I
> would determine this is to swap the DIMM module with another
> populated slot on the motherboard and see if the error migrated
> to the new slot or not. Also, this error doesn't necessarily mean
> there is a problem that needs to be addressed. If you have been
> running the system for many months and you see ECC errors a
> handful of times, it can probably be safely ignored.
>
> Best regards,
> Richard Gallamore
>
> On Mon, Jun 20, 2022 at 3:14 PM Larry Rosenman <ler@lerctr.org> wrote:
> I've gotten a BUNCH of these on my TrueNAS server. I've replaced this
> DIMM a couple of times, and still the MCE's continue.
> Is it possible it's Motherboard slot issue?
>
> Hardware event. This is not a software error.
> MCE 8
> CPU 22 BANK 8 TSC 5aa4ecdd795a
> MISC ac29890200046646 ADDR ee2f6e800
> TIME 1655762472 Mon Jun 20 17:01:12 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 46
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
>
> --
> Larry Rosenman http://www.lerctr.org/~ler
> Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
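
Ultima's observation that every record reports the same CPU and bank can be checked mechanically rather than by eye. What follows is only a minimal Python sketch. It assumes raw mcelog output with one field per line, as the tool prints it, not the ">"-quoted form in this thread. The helper names (split_records, location, tally_locations) and the trimmed SAMPLE text are illustrative and not part of mcelog; the sample values are copied from MCE 0 and MCE 2 in this thread.

```python
import re
from collections import Counter

# Fields that together say where an error was observed.
# The patterns need a space before the number, so "CPUID Vendor"
# and "Bank Locator: BANK14" are not picked up by mistake.
FIELD_PATTERNS = {
    "cpu": re.compile(r"\bCPU (\d+)"),
    "bank": re.compile(r"\bBANK (\d+)"),
    "channel": re.compile(r"Memory channel ID of error: (\d+)"),
    "dimm": re.compile(r"Memory DIMM ID of error: (\d+)"),
}

# Trimmed sample: only the lines this sketch looks at.
SAMPLE = """\
MCE 0
CPU 22 BANK 8 TSC 20aab486464a
MISC ac29890200046444 ADDR ee2f6e800
Memory DIMM ID of error: 0
Memory channel ID of error: 1
MCE 2
CPU 22 BANK 8 TSC 2a5604a6a070
MISC ac29890200044281
Memory DIMM ID of error: 0
Memory channel ID of error: 1
"""


def split_records(text):
    """Split mcelog output into one chunk per 'MCE <n>' header line."""
    parts = re.split(r"(?m)^[ \t]*MCE \d+[ \t]*$", text)
    # parts[0] is whatever preceded the first header; skip it.
    return [p for p in parts[1:] if p.strip()]


def location(record):
    """Return (cpu, bank, channel, dimm) as strings; None if a field is absent."""
    values = []
    for pattern in FIELD_PATTERNS.values():
        match = pattern.search(record)
        values.append(match.group(1) if match else None)
    return tuple(values)


def tally_locations(text):
    """Count how many records were reported at each location."""
    return Counter(location(r) for r in split_records(text))


for loc, count in tally_locations(SAMPLE).items():
    print(dict(zip(FIELD_PATTERNS, loc)), "records:", count)
```

Run against logs captured before and after moving the DIMM: if one (CPU, bank, channel, DIMM) tuple still accounts for every record after the move, the slot, channel, or memory controller looks more suspect than the module; if the tuple follows the DIMM, the module does.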