ECC support
Bob Bishop
rb at gid.co.uk
Wed Sep 16 07:52:07 UTC 2015
Hi,
Arriving late to this thread, a few observations:
- Obviously the more RAM you have, the more errors you are going to see. In other words, ECC makes increasing sense as RAM sizes get larger. All server-class hardware should have it.
- DRAM has to be refreshed. In sensible designs, ECC scrub is integrated with refresh to minimise overhead. A full scrub pass doesn’t have to be very frequent, maybe once every 24 hours.
- On server-class hardware, the platform management (BMC or whatever) should be picking up, logging, and possibly alarming on ECC errors regardless of the OS.
- You might think that as memory density increases (i.e. bit cell size shrinks), error rates would increase. Apparently this wasn’t so, up to 2009 at least; see:
http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
which reports on a study of these issues across Google’s estate at the time. I don’t know of any more recent similar work.
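To put the first point in rough numbers, here is a minimal sketch. It assumes every bit fails independently at the same constant rate, which is a simplification and not something the study claims. The function name and the rate argument are made up for illustration; plug in whatever per-Mbit rate you believe applies to your hardware.

```python
# Hypothetical illustration only: under a constant, independent per-bit
# error rate, the expected number of memory errors scales linearly with
# the amount of RAM installed.

HOURS_PER_YEAR = 24 * 365


def expected_errors_per_year(ram_gib: float, fit_per_mbit: float) -> float:
    """Expected memory errors per year for ram_gib GiB of RAM.

    fit_per_mbit is an ASSUMED error rate, expressed as errors per
    10^9 device-hours per Mbit. It is a free parameter here, not a
    measured value.
    """
    mbits = ram_gib * 1024 * 8  # GiB -> MiB -> Mbit
    return mbits * fit_per_mbit * HOURS_PER_YEAR / 1e9


if __name__ == "__main__":
    assumed_rate = 100.0  # hypothetical rate, for illustration only
    for size_gib in (8, 64, 512):
        errors = expected_errors_per_year(size_gib, assumed_rate)
        print(f"{size_gib:4d} GiB: about {errors:.1f} expected errors/year")
```

Whatever rate you assume, doubling the RAM doubles the expected error count, which is why ECC matters more as memory sizes grow.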
--
Bob Bishop
rb at gid.co.uk
More information about the freebsd-hardware mailing list