One-line executive summary: If you face slowness, hangs, filesystem corruption and/or lots of dmesg errors about ata, try adding "libata.force=noncq" to your Linux kernel boot options.
I recently bought a Crucial branded 128 GB solid state disk (SSD) model CT128M225 of the M225 range because they’re the hot new thing to get.
The disk would work well with my copy of Windows 7 RC but to my dismay Linux would spew a lot of disk error messages as you can see here: http://pastebin.ubuntu.com/347122/
I tried the usual STFW-ing and asking around in the Linux, Crucial.com and a few tech forums but to no avail. Apparently I’m one of the first to run an SSD on Linux, no way! There was some info about SMART errors with SSDs but the suggested workaround(s) didn’t work for me.
I’d almost resigned myself to the idea that either my SSD model would not work with current Linux versions or *shock* perhaps my particular disk was defective, but then I took another, closer look at the logs.
I searched again, but this time for one of the error messages, "failed command READ FPDMA QUEUED", and saw references to NCQ (native command queuing). I did some more digging around and hit some info about the Linux kernel’s ATA library option "libata.force=noncq", which disables NCQ. When I tried it, it seemed to resolve these issues!
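To double-check that the option really made it onto the boot command line, something like the following rough C sketch could be used. It assumes the documented libata.force format of a comma-separated list of values, each with an optional "ID:" prefix; the function name cmdline_has_noncq is my own invention for illustration and is not part of the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if a kernel command line contains a
 * libata.force= option whose comma-separated value list includes "noncq".
 * Each value may carry an optional "ID:" prefix, e.g. "1.00:noncq". */
bool cmdline_has_noncq(const char *cmdline)
{
    static const char key[] = "libata.force=";
    const size_t key_len = sizeof(key) - 1;
    const char *p = cmdline;

    while (*p) {
        /* Skip whitespace between options, then measure the next option. */
        p += strspn(p, " \t\n");
        size_t tok_len = strcspn(p, " \t\n");

        if (tok_len > key_len && strncmp(p, key, key_len) == 0) {
            const char *v = p + key_len;
            const char *end = p + tok_len;

            while (v < end) {
                /* Length of one comma-separated value. */
                size_t val_len = 0;
                while (v + val_len < end && v[val_len] != ',')
                    val_len++;

                /* Drop the optional "ID:" prefix, if present. */
                const char *name = v;
                size_t name_len = val_len;
                for (size_t i = 0; i < val_len; i++) {
                    if (v[i] == ':') {
                        name = v + i + 1;
                        name_len = val_len - i - 1;
                        break;
                    }
                }

                if (name_len == 5 && strncmp(name, "noncq", 5) == 0)
                    return true;

                v += val_len;
                if (v < end)
                    v++; /* skip the comma */
            }
        }
        p += tok_len;
    }
    return false;
}
```

In practice you would read the contents of /proc/cmdline into a buffer and pass it to this function; a plain "cat /proc/cmdline" and a look by eye works just as well.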
Some other users (of Kingston, OCZ and Intel SSDs) mentioned they did not face such issues, and according to online docs Intel apparently has a good, solid NCQ implementation.
Curious as to whether the root cause of the problems was a defect in my SSD’s firmware or a bug in the Linux kernel I visited the online linux cross reference site here http://lxr.linux.no/ and did some browsing. Found the file drivers/ata/libata-core.c where I noticed a line referencing OCZ SSD and a parameter called ATA_HORKAGE_NONCQ.
It didn’t take too long to put two and two together, and I spent the rest of the day patching three lines of code into the kernel to automatically detect my SSD model range and disable NCQ, so as to avoid these problems in the first place. Hopefully it will benefit other unsuspecting or less tech-savvy users for whom this could be a serious problem.
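For the curious, the idea behind such a patch can be sketched in standalone C as below. This is a simplified model of a device blacklist lookup, not the actual libata code and not my actual patch: the struct, the function names, the flag value and both model strings are assumptions made for illustration, and the string a real drive reports may differ.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's ATA_HORKAGE_NONCQ flag; the value here is
 * arbitrary and only meaningful inside this sketch. */
#define SKETCH_HORKAGE_NONCQ (1UL << 2)

/* Simplified blacklist entry: a model pattern plus quirk flags. */
struct blacklist_entry {
    const char *model_pattern;  /* a trailing '*' means prefix match */
    unsigned long horkage;
};

/* Illustrative entries only; the model strings are assumptions. */
static const struct blacklist_entry blacklist[] = {
    { "OCZ CORE_SSD",       SKETCH_HORKAGE_NONCQ },
    { "CRUCIAL_CT128M225*", SKETCH_HORKAGE_NONCQ },
};

/* Exact match, or prefix match when the pattern ends in '*'. */
static int model_matches(const char *pattern, const char *model)
{
    size_t len = strlen(pattern);

    if (len > 0 && pattern[len - 1] == '*')
        return strncmp(pattern, model, len - 1) == 0;
    return strcmp(pattern, model) == 0;
}

/* Returns the quirk flags for a model string, or 0 if it is not listed. */
unsigned long lookup_horkage(const char *model)
{
    for (size_t i = 0; i < sizeof(blacklist) / sizeof(blacklist[0]); i++) {
        if (model_matches(blacklist[i].model_pattern, model))
            return blacklist[i].horkage;
    }
    return 0;
}

/* NCQ is allowed unless the device carries the no-NCQ quirk. */
int ncq_allowed(const char *model)
{
    return (lookup_horkage(model) & SKETCH_HORKAGE_NONCQ) == 0;
}
```

With a table like this, calling ncq_allowed() with a listed model string returns 0 and the driver would fall back to non-queued commands, while any unlisted drive keeps NCQ enabled. That is why a fix of this kind only needs a new table entry rather than new logic.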
It was a coincidence that I had recently upgraded my slow 512 kbps DSL connection to 8 Mbps in preparation for attending some online Ubuntu classroom sessions about bug fixing etc., called "Ubuntu Developer Week". This allowed me to quickly download all the development tools and source code, make the patch, build the source and binary packages and upload them.
I sent an email to the Linux kernel mailing list with the trivial patch, but it looks like it’s going to be tough to get these hardcore developers to accept it: http://lkml.org/lkml/2010/1/26/185
Anyway, I visited my nearly 5-year-old Ubuntu Launchpad account at https://launchpad.net/~vishalrao which was gathering dust, signed the Ubuntu Code of Conduct, added my OpenPGP signing key, created a "kernels" PPA (personal package archive) and uploaded my patched kernel source. Let’s see if it builds or not! Check back here: https://launchpad.net/~vishalrao/+archive/kernels