HP DL 585 / ACPI ID / ECC Memory / Panic
Nikolaj Hansen
nikolaj.hansen at barnabas.dk
Thu May 12 15:18:48 UTC 2016
Hi,
I recently added a ZFS disk array to my old HP DL585 G1 server.
Kernel panics started immediately, and I have spent quite a bit of time
figuring out what was really wrong.
The system has 4 CPU cards with dual-core Opteron processors. Each
card holds 4 x 2 GB of memory, so 4 cards x 4 DIMMs x 2 GB = 32 GB of
total system memory. The memory is DDR400 ECC.
The panic was very easily reproducible. I just had to issue enough reads
to the system until the faulty memory was accessed.
Strangely, I can run memtest86+ with the DDR setting on and it finds no
errors whatsoever.
Adding
hint.lapic.2.disabled=1
to /boot/loader.conf immediately mitigates the error on FreeBSD. So here is my conclusion:
If the system can be made stable by disabling one core on one CPU card:
1) The other cards and their memory must be OK.
2) The mainboard must be OK, since the other core on that CPU is still
running without panicking.
3) The CPU core with APIC ID 2 is probably also OK: it is on the same chip
as a core that is not disabled.
4) The fault is most likely a bad DIMM.
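For anyone wanting to try the same workaround: the loader hint mentioned above should be appended to /boot/loader.conf rather than redirected over it with a single ">", which would wipe the rest of the file. A minimal sketch, assuming a helper I have named add_loader_hint (hypothetical, not part of FreeBSD) that only appends the line if it is not already present:

```shell
#!/bin/sh
# add_loader_hint FILE LINE
# Hypothetical helper (not a FreeBSD tool): append LINE to FILE
# unless an identical line is already there.
add_loader_hint() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (run as root; takes effect at the next boot):
#   add_loader_hint /boot/loader.conf 'hint.lapic.2.disabled=1'
```

The 2 in the hint is the ID of the core that was disabled on this particular machine; on other hardware the number would differ.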
Instead of blindly swapping DIMMs to find the culprit, I would
really like to identify the CPU, card and memory module from the OS.
Info here:
http://pastebin.com/jqufNKck
Thank you for your time and help.
--
Med venlig hilsen / with regards
Nikolaj Hansen