Need Help With MCA Code
Tim Daneliuk
tundra at tundraware.com
Fri Jan 31 17:49:06 UTC 2014
On 01/31/2014 11:22 AM, John Baldwin wrote:
> On Wednesday, January 29, 2014 6:49:21 pm Tim Daneliuk wrote:
>> Resending in hopes that people on one of the other lists will have some insight here:
>>
>> On 01/27/2014 10:50 PM, Tim Daneliuk wrote:
>>> I am running 9.2 stable i386 r261207. As noted earlier:
>>>
>>>> I just replaced mobo/CPU on FBSD server (Gigabyte Z-87-D3HP with
>>>> an Intel i3-4130). I am not overclocking ... but I continue to see this sort of thing:
>>>
>>>> MCA: CPU 0 COR (1) internal parity error
>>>
>>> Dmesg shows:
>>>
>>>> MCA: Vendor "GenuineIntel", ID 0x306c3, APIC ID 0
>>>> MCA: CPU 0 COR (1) internal parity error
>>>> MCA: Bank 0, Status 0x90000040000f0005
>>>> MCA: Global Cap 0x0000000000000c07, Status 0x0000000000000000
>>>
>>> I've swapped CPUs (i5). I've fiddled with an endless supply of
>>> mobo settings. I've switched power supplies. I've moved mem
>>> sticks around .... No joy.
>>>
>>> So, I dug through the sources and found this:
>>>
>>>
>>>
>>> mca_log(const struct mca_record *rec)
>>> {
>>>     uint16_t mca_error;
>>>
>>>     printf("MCA: Bank %d, Status 0x%016llx\n", rec->mr_bank,
>>>         (long long)rec->mr_status);
>>>     printf("MCA: Global Cap 0x%016llx, Status 0x%016llx\n",
>>>         (long long)rec->mr_mcg_cap, (long long)rec->mr_mcg_status);
>>>     printf("MCA: Vendor \"%s\", ID 0x%x, APIC ID %d\n", cpu_vendor,
>>>         rec->mr_cpu_id, rec->mr_apic_id);
>>>     printf("MCA: CPU %d ", rec->mr_cpu);
>>>     if (rec->mr_status & MC_STATUS_UC)
>>>         printf("UNCOR ");
>>>     else {
>>>         printf("COR ");
>>>         if (rec->mr_mcg_cap & MCG_CAP_CMCI_P)
>>>             printf("(%lld) ", ((long long)rec->mr_status &
>>>                 MC_STATUS_COR_COUNT) >> 38);
>>>     }
>>>
>>>
>>> It looks like the trailing else clause is kicking out the error, but I am
>>> unclear what the error means, beyond the fact that it appears to be a parity
>>> error somewhere within the CPU's internal memory (cache?). Is this error
>>> getting corrected? Is this benign? Should I get a different mobo?
>>>
>>> Um .... Haaaaalp :)
>>
>>
>> I have now tried different motherboards, CPUs, memory, and power supplies and
>> this error is still showing up now and then.
>>
>> This points strongly to either FreeBSD bogus reporting, or these errors being
>> benign. It's hard to believe that the exact same error might occur with
>> completely different hardware ... unless it's being caused by the case.
>
> Are they all the same model CPU? Since it is a corrected error you can
> probably ignore it, but it is not bogus reporting. FreeBSD only reports
> these errors because they show up in registers on your CPU.
>

It's looking like this is an artifact of running 9.2-STABLE i386 on that hardware.
I just installed 10-STABLE x64 and am beating the hardware to death and have yet
to see an MCA check.

It *is* possible the 9.2 install is boogered up (I went to grad school to learn how
to say that), so I am pursuing a full rebuild of the server. While painful, this
will also finally move this machine to x64 which is long overdue.
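
For anyone who wants to read the Status value from the dmesg output by hand,
here is a minimal sketch of the decoding. It assumes the architectural
IA32_MCi_STATUS and IA32_MCG_CAP bit layouts (valid bit 63, uncorrected bit 61,
corrected-error count in bits 52:38, MCA error code in bits 15:0, CMCI
capability in bit 10 of the global cap). The names mci_decode, mci_print,
mcg_bank_count and mcg_has_cmci are made up for illustration and are not the
kernel's own functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed architectural bit positions in IA32_MCi_STATUS. */
#define MCI_STATUS_VAL   (1ULL << 63)  /* register holds a valid error */
#define MCI_STATUS_OVER  (1ULL << 62)  /* an earlier error was overwritten */
#define MCI_STATUS_UC    (1ULL << 61)  /* error was NOT corrected */
#define MCI_STATUS_EN    (1ULL << 60)  /* reporting was enabled */
#define MCI_COR_SHIFT    38            /* corrected count: bits 52:38 */
#define MCI_COR_MASK     0x7fffULL

/* Hypothetical holder for the decoded fields. */
struct mci_decoded {
    int      valid, overflow, uncorrected, enabled;
    unsigned cor_count;    /* corrected-error count */
    uint16_t model_error;  /* model-specific code, bits 31:16 */
    uint16_t mca_error;    /* architectural MCA code, bits 15:0 */
};

/* Split a raw status word into its fields. */
static struct mci_decoded
mci_decode(uint64_t status)
{
    struct mci_decoded d;

    d.valid       = (status & MCI_STATUS_VAL) != 0;
    d.overflow    = (status & MCI_STATUS_OVER) != 0;
    d.uncorrected = (status & MCI_STATUS_UC) != 0;
    d.enabled     = (status & MCI_STATUS_EN) != 0;
    d.cor_count   = (unsigned)((status >> MCI_COR_SHIFT) & MCI_COR_MASK);
    d.model_error = (uint16_t)(status >> 16);
    d.mca_error   = (uint16_t)status;
    return d;
}

/* Number of reporting banks: low 8 bits of the global cap. */
static unsigned
mcg_bank_count(uint64_t cap)
{
    return (unsigned)(cap & 0xff);
}

/* Non-zero if the CPU advertises CMCI (bit 10 of the global cap). */
static int
mcg_has_cmci(uint64_t cap)
{
    return (int)((cap >> 10) & 1);
}

/* Print the decoded fields in a human-readable form. */
static void
mci_print(const struct mci_decoded *d)
{
    assert(d != NULL);
    printf("valid=%d over=%d %s en=%d count=%u model=0x%04x mca=0x%04x\n",
        d->valid, d->overflow, d->uncorrected ? "UNCOR" : "COR",
        d->enabled, d->cor_count, d->model_error, d->mca_error);
}
```

Feeding it 0x90000040000f0005 gives a valid, corrected error with a count of 1
and MCA error code 0x0005, which is the code the kernel prints as "internal
parity error". The Global Cap value 0xc07 decodes to 7 banks with CMCI present,
which is why the "(1)" count shows up in the log line.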
--
----------------------------------------------------------------------------
Tim Daneliuk tundra at tundraware.com
PGP Key: http://www.tundraware.com/PGP/