Botched NCQ on SSD - cannot disable?

Warner Losh imp at bsdimp.com
Thu May 21 18:54:32 UTC 2015


> On May 21, 2015, at 12:42 PM, Neffi <nefftd at gmail.com> wrote:
> 
> I was discussing this issue in freenode/#freebsd and I was recommended to shoot an email to you fellows about it.
> 
> I've got a Samsung 840 EVO SSD (model MZ-7TE250BW), which uses Samsung's own controller from what I can gather. I had issues of mass data corruption when used under Linux, and several programs crashing unexpectedly when used under FreeBSD. I've gone through 2 drives under warranty with the same issue before customer service suggested to disable drive queuing.
> 
> After some research it seems as though this drive (and several other common SSDs) report that they support NCQ, but in fact are botched and will have all sorts of problems with NCQ enabled ranging from poor performance, to I/O stalls to data corruption.
> 
> Sure enough the logs on Linux spit out something along the lines of:
> 
> > ata1: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x10 frozen
> > ata1.00: failed command: READ FPDMA QUEUED
> 
> This happens several times when used on Linux, in the few hours leading up to total filesystem corruption.
> 
> The recommendation in the Linux world is to disable NCQ on these drives, for which there is an easy boot-time tunable. This fixes the issue. No more data corruption.
> 
> There doesn't seem to be a tunable for this anywhere on FreeBSD. camcontrol(8) mentions setting the tags used, but only between some hardcoded limits, with a default of 2 -- not sufficient to disable NCQ on the drive. It looks like presently the only option is to manually patch the quirks for this drive in the kernel and recompile before I can even install the system to the drive.

One option is to use drives that don’t suck so bad.

If you are using the AHCI controller, it has quirks for some cards that don’t properly fill in the NCQ tags, but so far that’s a tiny list of mostly older gear. What’s the host controller you are using?

Also, just because the command that hung on the drive is an NCQ command, that doesn’t mean disabling NCQ commands will keep you safe. That’s just the first one that’s issued after the firmware wedges (or could be: that’s a very common scenario for this kind of failure mode).
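
As a side note on reading the quoted Linux log: the SAct field there is, as far as I understand libata's error output, a bitmask of the NCQ tags that were outstanding when the port froze. The sketch below decodes it under that assumption; the helper name active_ncq_tags is made up for illustration, and the log line is the one quoted above.

```python
def active_ncq_tags(sact: int) -> list[int]:
    """Return the NCQ tag numbers whose bits are set in an SAct bitmask."""
    # NCQ allows at most 32 outstanding commands, one bit per tag.
    return [tag for tag in range(32) if sact & (1 << tag)]


# Log line copied from the report quoted above.
line = "ata1: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x10 frozen"

# The hex value follows the literal "SAct" token.
fields = line.split()
sact = int(fields[fields.index("SAct") + 1], 16)

tags = active_ncq_tags(sact)
print(f"SAct={sact:#x}: {len(tags)} queued command(s) outstanding, tags {tags}")
```

This only shows how many queued commands were in flight at the time of the exception. It says nothing about which of them, if any, triggered the wedge.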

There’s a quirk for the 840 EVO, but that’s just to force 4k sector size.

While I haven’t used this generation of Samsung SSDs, I’d be highly surprised if this issue was really a problem in the drive instead of some cabling issue, or other environmental issue leading to the wedge.

It’s true there’s no way to totally disable NCQ, but if the drive is hanging with NCQ depth of 2, I’d be highly surprised if it is actually NCQ causing this...

Warner


More information about the freebsd-hackers mailing list