Zero to Driver: Continued

Over the weekend I kept tinkering with this NVME driver, because sometimes I am not good at work/life balance.

Tidying up and an optimization

Removed the no-longer-used nvme_io_txn() and the dependencies on the hexdump utility library, both leftovers from early tests. Adjusted completion queue processing to ring the doorbell that tells the hardware the queue head has advanced only once, after draining the queue, rather than once per entry (thank you, Doug Gale, for pointing out this inefficiency in a comment on the CL).
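In outline, the batching looks something like this. This is a minimal, self-contained sketch rather than the actual driver code: cqe_t, cq_t, and cq_drain() are made-up names, the MMIO doorbell write is modeled as a plain struct field, and a counter is added just to show the write savings.

```c
#include <stdint.h>

#define CQ_ENTRIES 16

// Simplified completion queue entry; only the one field this sketch needs.
// In the real layout, bit 0 of the status word is the phase tag.
typedef struct {
    uint16_t status;
} cqe_t;

typedef struct {
    cqe_t entries[CQ_ENTRIES];
    uint16_t head;             // next entry to consume
    uint8_t phase;             // expected phase tag; starts at 1 after reset
    uint32_t doorbell;         // stands in for the MMIO CQ head doorbell
    unsigned doorbell_writes;  // counts MMIO writes, to show the savings
} cq_t;

// Drain every completion the controller has posted, then ring the head
// doorbell once for the whole batch instead of once per entry.
static unsigned cq_drain(cq_t *cq) {
    unsigned completed = 0;
    while ((cq->entries[cq->head].status & 1) == cq->phase) {
        // ...hand the completion to the waiting transaction here...
        if (++cq->head == CQ_ENTRIES) {
            cq->head = 0;
            cq->phase ^= 1;  // the expected phase flips each wrap
        }
        completed++;
    }
    if (completed) {
        cq->doorbell = cq->head;  // single MMIO write after the batch
        cq->doorbell_writes++;
    }
    return completed;
}
```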

Display more information, tidy the code that does it

NVME controllers are very flexible, and the driver dumps a bunch of information about how whatever controller it is talking to is configured. Now we dump even more, tidy up the code that does it, make the formatting a bit more consistent, and prepare to stash some of that information in the driver's nvme_device_t structure so that code wanting to take advantage of optional features can reference it later.

Here’s some typical output from the driver, for a Corsair Force MP500 M.2 NVME SSD:

nvme: version 1.2.0
nvme: page size: (MPSMIN): 4096 (MPSMAX): 4096
nvme: doorbell stride: 4
nvme: timeout: 65536 ms
nvme: boot partition support (BPS): N
nvme: supports NVM command set (CSS:NVM): Y
nvme: subsystem reset supported (NSSRS): N
nvme: weighted-round-robin (AMS:WRR): Y
nvme: vendor-specific arbitration (AMS:VS): N
nvme: contiguous queues required (CQR): Y
nvme: maximum queue entries supported (MQES): 65536
nvme: model:         'Force MP500'
nvme: serial number: '17457994000122410152'
nvme: firmware:      'E7FM04.5'
nvme: max outstanding commands: 0
nvme: max namespaces: 1
nvme: scatter gather lists (SGL): N 00000000
nvme: max data transfer: 2097152 bytes
nvme: sanitize caps: 0
nvme: abort command limit (ACL): 4
nvme: asynch event req limit (AERL): 4
nvme: firmware: slots: 1 reset: Y slot1ro: N
nvme: host buffer: min/preferred: 0/0 pages
nvme: capacity: total/unalloc: 0/0
nvme: volatile write cache (VWC): Y
nvme: atomic write unit (AWUN)/(AWUPF): 256/1 blks
nvme: feature: FORMAT_NVM
nvme: feature: SECURITY_SEND_RECV
nvme: feature: SAVE_SELECT_NONZERO
nvme: ns: atomic write unit (AWUN)/(AWUPF): 256/1 blks
nvme: ns: NABSN/NABO/NABSPF/NOIOB: 255/0/0/0
nvme: ns: LBA FMT 00: RP=1 LBADS=2^9b MS=0b
nvme: ns: LBA FMT 01: RP=0 LBADS=2^12b MS=0b
nvme: ns: LBA FMT #0 active
nvme: ns: data protection: caps/set: 0x00/0
nvme: ns: size/cap/util: 234441648/234441648/234441648 blks

Later this will be much reduced and only displayed if verbose debug chatter is requested.
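Most of the first block of lines above comes straight out of the controller's 64-bit CAP register. Decoding it looks roughly like the sketch below, which follows the spec's field layout; nvme_caps_t and nvme_decode_cap() are illustrative names, not the driver's. Note that MQES is a 0-based field in the register, which may be why the log shows 65536 rather than 65535.

```c
#include <stdint.h>

// Field layout of the 64-bit controller CAP register, per the NVME spec:
// MQES [15:0], CQR [16], AMS [18:17], TO [31:24] (units of 500 ms),
// DSTRD [35:32], NSSRS [36], CSS [44:37], BPS [45],
// MPSMIN [51:48], MPSMAX [55:52].
typedef struct {
    uint32_t mqes;             // max queue entries (register field is 0-based)
    int cqr;                   // contiguous queues required
    uint32_t timeout_ms;       // worst-case ready timeout
    uint32_t doorbell_stride;  // bytes between doorbell registers
    uint32_t mpsmin;           // minimum memory page size, bytes
    uint32_t mpsmax;           // maximum memory page size, bytes
} nvme_caps_t;

static nvme_caps_t nvme_decode_cap(uint64_t cap) {
    nvme_caps_t c;
    c.mqes = (uint32_t)(cap & 0xffff) + 1;            // convert from 0-based
    c.cqr = (int)((cap >> 16) & 1);
    c.timeout_ms = (uint32_t)((cap >> 24) & 0xff) * 500;
    c.doorbell_stride = 4u << ((cap >> 32) & 0xf);    // 2^(2 + DSTRD) bytes
    c.mpsmin = 4096u << ((cap >> 48) & 0xf);          // 2^(12 + MPSMIN)
    c.mpsmax = 4096u << ((cap >> 52) & 0xf);
    return c;
}
```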

Some cleanup, make Plextor devices work

Added code to cancel in-flight transactions when the driver shuts down in nvme_release(). Added some (disabled) code to do a SHUTDOWN operation before RESET – a Plextor SSD was erroring out when configuring the IO completion queue, and my initial theory was that maybe it didn’t like the abrupt reset. It turned out I was erroneously setting the namespace ID in the CREATE QUEUE command – neither Qemu nor six other NVME controllers cared about this, but the Plextor controller is more particular about spec adherence here!

So this change also tidies up the various setup commands, leaving the namespace ID at 0 for the several commands where the spec says it should be zero.
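The fix amounts to leaving NSID zeroed when building the CREATE IO COMPLETION QUEUE submission entry. A hedged sketch of what that command looks like – the 64-byte entry layout and CDW10/CDW11 encoding follow the spec, but the struct and the build_create_iocq() helper are illustrative, not the driver's:

```c
#include <stdint.h>
#include <string.h>

// Simplified 64-byte admin submission queue entry; field names follow the
// NVME spec, but this layout is for illustration only.
typedef struct {
    uint8_t opcode;
    uint8_t flags;
    uint16_t cid;
    uint32_t nsid;   // must be 0 for queue-management commands
    uint64_t rsvd;
    uint64_t mptr;
    uint64_t prp1;   // physical address of the (contiguous) queue memory
    uint64_t prp2;
    uint32_t cdw10;  // queue id [15:0], queue size - 1 [31:16]
    uint32_t cdw11;  // PC bit 0, IEN bit 1, interrupt vector [31:16]
    uint32_t cdw12, cdw13, cdw14, cdw15;
} nvme_sqe_t;

#define NVME_ADMIN_OP_CREATE_IOCQ 0x05

static void build_create_iocq(nvme_sqe_t *sqe, uint16_t qid,
                              uint16_t entries, uint64_t paddr, uint16_t iv) {
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = NVME_ADMIN_OP_CREATE_IOCQ;
    sqe->nsid = 0;  // the Plextor lesson: a nonzero NSID here gets rejected
    sqe->prp1 = paddr;
    sqe->cdw10 = ((uint32_t)(entries - 1) << 16) | qid;  // size is 0-based
    sqe->cdw11 = ((uint32_t)iv << 16) | (1u << 1) /* IEN */ | 1u /* PC */;
}
```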


I still haven’t sorted out why the legacy PCI IRQs are not working properly with Qemu. I put together a patch for Qemu to support PCI MSI interrupts, which works around that problem from the other side, removed the polling hack I had in the driver for use on Qemu, and filed a bug against the owner of our PCI subsystem so he can investigate the legacy IRQ interaction when there’s time.

The patch to Qemu (which we should tidy up and send upstream as we’ve done with other Qemu patches) is currently applied to our local Qemu tree over here: