mpr(4) SAS3008 Repeated Crashing

Borja Marcos borjam at sarenet.es
Fri Mar 4 08:02:35 UTC 2016


> On 03 Mar 2016, at 18:09, Scott Long <scott4long at yahoo.com> wrote:
> 
> 
> SYNC CACHE seems to have been involved this time, and while it’s sometimes a source of trouble with SATA disks, I’m very hesitant to blame it.  Given the seemingly random nature of your problems, I’m not as certain anymore to rule out a fault of the disk enclosure.  This looks to be a different disk than your last report, and your statement that a sibling system exhibits no problems is very interesting.  Maybe there’s an issue with the power supply, and the disks are getting under-voltage conditions periodically.  If you can run smartctl against the disks, the output might be useful.  Also, if you’re able, could you make sure that both this system and the one that is working well are being fed with sufficient and similar AC power?  And if the power supply modules in your enclosures are swappable, maybe swap them between systems and see if the problem follows the module?  If that doesn’t fix it then I’ll think of ways to provide more instrumentation.

The affected disks are completely random. I didn’t paste many instances, to avoid cluttering the thread, but each time it’s a different disk.

Both systems are in the same datacenter, and yes, the power infrastructure is working. I can swap the modules if
the dealer sends us a spare one, because I prefer not to mess with a working system.

The fact that it’s a different disk each time, and that the other system works perfectly, is what makes me quite certain that it’s a hardware problem: either some trouble
with the backplane or a power problem.

I am tempted to go the oscilloscope route (monitoring the internal power rails). But if the problem is in the power distribution of the backplane itself,
I’ll need to destroy a broken disk to build a backplane power probe :)




Borja.



More information about the freebsd-scsi mailing list