Hello (Again), Blog?

Okay, this thing didn’t get a whole lot of use after those NVME posts.

Let’s update it to a new, simpler, completely JavaScript-free theme, so it won’t unexpectedly consume 100% CPU in people’s browsers (sorry, people!).

It’s a year later and I still blame Joe.

Zero to Driver: Continued

Over the weekend I kept tinkering with this NVME driver, because sometimes I am not good at work/life balance.

Tidying up and an optimization

Removed the no-longer-used nvme_io_txn() and the dependency on the hexdump utility library, both left over from early tests. Adjusted completion queue processing to ring the doorbell (which tells the hardware that the queue head has advanced) only once, after draining the queue. Thank you, Doug Gale, for pointing out this inefficiency in a comment on the CL.
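A minimal sketch of the drain-then-ring idea, not the driver’s actual code. The cpl_t and cq_t types and the cq_drain() name are made up for illustration. The 16-byte entry layout and the phase tag in bit 0 of the status word follow the NVME spec, and the doorbell here is just a pointer to a 32-bit word:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical 16-byte completion entry (names are illustrative).
typedef struct {
    uint32_t cmd_specific;
    uint32_t reserved;
    uint16_t sq_head;
    uint16_t sq_id;
    uint16_t cmd_id;
    uint16_t status;  // bit 0: phase tag, bits 15:1: status field
} cpl_t;

// Hypothetical host-side view of one completion queue.
typedef struct {
    volatile cpl_t* entries;      // ring shared with the controller
    uint16_t count;               // number of entries in the ring
    uint16_t head;                // next entry the host will consume
    uint16_t phase;               // phase tag that marks a new entry (starts at 1)
    volatile uint32_t* doorbell;  // completion queue head doorbell
} cq_t;

// Consume every pending completion, then ring the doorbell once.
// Returns the number of completions processed.
// A real driver also needs memory barriers and cache handling, omitted here.
static unsigned cq_drain(cq_t* cq, void (*handle)(uint16_t cmd_id, uint16_t status)) {
    assert(cq->count > 1);
    unsigned n = 0;
    while ((cq->entries[cq->head].status & 1) == cq->phase) {
        volatile cpl_t* c = &cq->entries[cq->head];
        if (handle) {
            handle(c->cmd_id, c->status >> 1);
        }
        if (++cq->head == cq->count) {
            cq->head = 0;
            cq->phase ^= 1;  // expected phase flips on every wrap
        }
        n++;
    }
    if (n > 0) {
        *cq->doorbell = cq->head;  // one doorbell write for the whole batch
    }
    return n;
}
```

Ringing once per batch instead of once per entry saves an MMIO write for every completion after the first.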

Display more information, tidy the code that does it

NVME controllers are very flexible, and the driver dumps a bunch of information about how the controller it’s talking to is configured. Now it dumps even more, with the code that does it tidied up a bit and the formatting made more consistent. This also prepares to stash some of that information in the driver’s nvme_device_t structure, so that code wanting to take advantage of optional features can reference it in the future.

Here’s some typical output from the driver, for a Corsair M.2 NVME SSD:

nvme: version 1.2.0
nvme: page size: (MPSMIN): 4096 (MPSMAX): 4096
nvme: doorbell stride: 4
nvme: timeout: 65536 ms
nvme: boot partition support (BPS): N
nvme: supports NVM command set (CSS:NVM): Y
nvme: subsystem reset supported (NSSRS): N
nvme: weighted-round-robin (AMS:WRR): Y
nvme: vendor-specific arbitration (AMS:VS): N
nvme: contiquous queues required (CQR): Y
nvme: maximum queue entries supported (MQES): 65536
nvme: model:         'Force MP500'
nvme: serial number: '17457994000122410152'
nvme: firmware:      'E7FM04.5'
nvme: max outstanding commands: 0
nvme: max namespaces: 1
nvme: scatter gather lists (SGL): N 00000000
nvme: max data transfer: 2097152 bytes
nvme: sanitize caps: 0
nvme: abort command limit (ACL): 4
nvme: asynch event req limit (AERL): 4
nvme: firmware: slots: 1 reset: Y slot1ro: N
nvme: host buffer: min/preferred: 0/0 pages
nvme: capacity: total/unalloc: 0/0
nvme: volatile write cache (VWC): Y
nvme: atomic write unit (AWUN)/(AWUPF): 256/1 blks
nvme: feature: FORMAT_NVM
nvme: feature: SECURITY_SEND_RECV
nvme: feature: SAVE_SELECT_NONZERO
nvme: ns: atomic write unit (AWUN)/(AWUPF): 256/1 blks
nvme: ns: NABSN/NABO/NABSPF/NOIOB: 255/0/0/0
nvme: ns: LBA FMT 00: RP=1 LBADS=2^9b MS=0b
nvme: ns: LBA FMT 01: RP=0 LBADS=2^12b MS=0b
nvme: ns: LBA FMT #0 active
nvme: ns: data protection: caps/set: 0x00/0
nvme: ns: size/cap/util: 234441648/234441648/234441648 blks

Later this will be much reduced and only displayed if verbose debug chatter is requested.
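Several of the values at the top of that dump (page size, doorbell stride, timeout, MQES, CQR) come out of the controller’s 64-bit CAP register. Here is a rough sketch of that decoding, using the field positions from the NVME spec. The nvme_cap_info_t struct and the function names are invented for this example and are not the driver’s:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical container for a few decoded CAP fields.
typedef struct {
    uint32_t max_queue_entries;  // MQES is 0-based, so this is MQES + 1
    int contiguous_queues;       // CQR
    uint32_t timeout_ms;         // TO is reported in 500 ms units
    uint32_t doorbell_stride;    // bytes: 2^(2 + DSTRD)
    uint32_t page_size_min;      // bytes: 2^(12 + MPSMIN)
    uint32_t page_size_max;      // bytes: 2^(12 + MPSMAX)
} nvme_cap_info_t;

// Extract `width` bits starting at bit `lo`.
static uint64_t cap_bits(uint64_t cap, unsigned lo, unsigned width) {
    assert(width > 0 && width < 64 && lo + width <= 64);
    return (cap >> lo) & ((1ULL << width) - 1);
}

static nvme_cap_info_t nvme_decode_cap(uint64_t cap) {
    nvme_cap_info_t info;
    info.max_queue_entries = (uint32_t)cap_bits(cap, 0, 16) + 1;
    info.contiguous_queues = (int)cap_bits(cap, 16, 1);
    info.timeout_ms = (uint32_t)cap_bits(cap, 24, 8) * 500;
    info.doorbell_stride = 4u << cap_bits(cap, 32, 4);
    info.page_size_min = 1u << (12 + cap_bits(cap, 48, 4));
    info.page_size_max = 1u << (12 + cap_bits(cap, 52, 4));
    return info;
}
```

With MQES = 0xFFFF, CQR = 1, and DSTRD, MPSMIN, and MPSMAX all zero, this yields 65536 queue entries, contiguous queues required, a 4-byte doorbell stride, and 4096-byte pages.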

Some cleanup, make Plextor devices work

Added code to cancel in-flight transactions when the driver shuts down in nvme_release(). Added some (disabled) code to do a SHUTDOWN operation before RESET: a Plextor SSD was erroring out when configuring the IO completion queue, and my initial theory was that maybe it didn’t like the abrupt reset. It turned out I was erroneously setting the namespace ID in the CREATE QUEUE command. Neither Qemu nor six other NVME controllers cared about this, but the Plextor controller is more particular about spec adherence here!

So this change also tidies up the various setup commands and leaves the namespace ID as 0 for the several commands where it should be zero.
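For illustration, here is a sketch of how the two CREATE QUEUE admin commands can be built with the namespace ID left at zero. The nvme_cmd_t layout and the opcodes follow the NVME spec, but the struct, macro, and function names are invented for this example and are not taken from the driver:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical 64-byte submission queue entry.
typedef struct {
    uint32_t cdw0;  // opcode in bits 7:0, command id in bits 31:16
    uint32_t nsid;  // namespace id; must be 0 for the queue commands
    uint32_t cdw2, cdw3;
    uint64_t mptr;
    uint64_t prp1;
    uint64_t prp2;
    uint32_t cdw10, cdw11, cdw12, cdw13, cdw14, cdw15;
} nvme_cmd_t;

_Static_assert(sizeof(nvme_cmd_t) == 64, "submission entries are 64 bytes");

#define OP_CREATE_IOSQ 0x01u
#define OP_CREATE_IOCQ 0x05u

// Create IO Completion Queue: physically contiguous ring, interrupts enabled.
static void cmd_create_iocq(nvme_cmd_t* cmd, uint16_t cid, uint16_t qid,
                            uint32_t entries, uint64_t ring_pa, uint16_t vector) {
    assert(qid != 0 && entries >= 2 && entries <= 65536);
    memset(cmd, 0, sizeof(*cmd));  // leaves nsid == 0
    cmd->cdw0 = OP_CREATE_IOCQ | ((uint32_t)cid << 16);
    cmd->prp1 = ring_pa;
    cmd->cdw10 = ((entries - 1) << 16) | qid;                // QSIZE (0-based), QID
    cmd->cdw11 = ((uint32_t)vector << 16) | (1u << 1) | 1u;  // IV, IEN, PC
}

// Create IO Submission Queue, bound to completion queue `cqid`.
static void cmd_create_iosq(nvme_cmd_t* cmd, uint16_t cid, uint16_t qid,
                            uint32_t entries, uint64_t ring_pa, uint16_t cqid) {
    assert(qid != 0 && entries >= 2 && entries <= 65536);
    memset(cmd, 0, sizeof(*cmd));  // leaves nsid == 0
    cmd->cdw0 = OP_CREATE_IOSQ | ((uint32_t)cid << 16);
    cmd->prp1 = ring_pa;
    cmd->cdw10 = ((entries - 1) << 16) | qid;  // QSIZE (0-based), QID
    cmd->cdw11 = ((uint32_t)cqid << 16) | 1u;  // CQID, PC
}
```

Zeroing the whole command first means any field that isn’t explicitly set, the namespace ID included, stays at the value a strict controller expects.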


I still haven’t sorted out why the legacy PCI IRQs are not working properly with Qemu. I put together a patch for Qemu to support PCI MSI interrupts which works around that problem from the other side, removed the polling hack I had in the driver for use on Qemu, and filed a bug against the owner of our PCI subsystem so he can investigate the legacy IRQ interaction when there’s time.

The patch to Qemu (which we should tidy up and send upstream as we’ve done with other Qemu patches) is currently applied to our local Qemu tree over here:

Zero to Driver

This week I wrote a minimal NVME storage driver for Zircon. As usual, I used Gerrit as a place to back up my work-in-progress from time to time. The end results are a (possibly interesting) window into how I go from zero to a functional driver. The first section presents the very first shell of a driver I checked in. Each following section shows the diffs from the preceding version to that version of the driver. These focus on nvme.c, the guts of the driver, but nvme-hw.h (the header file with registers and structures and such) and rules.mk (the Zircon build system file) are also present.

NVME M.2 modules

Where to start?

I like to read the documentation with an editor open to a header file where I type up constants and structures and macros and such for register access, data structures, and whathaveyou. It helps me wrap my head around the hardware.

A minimal shell of a driver that simply dumps parameters from the device and resets it. Useful to start getting some data from real hardware (since NVME has a lot of controller-specific parameters to look at):

Make it do something, anything

First interaction with the hardware! Submit an IDENTIFY command to the Admin Submission Queue and observe a reply from the Admin Completion Queue. Hexdump the results for inspection:

Start making it a little more real

Factor Admin Queue processing out into dedicated functions, provide a convenience function for transactions, wire up interrupts so we don’t have to spin on the completion status. Decode some of the information from the IDENTIFY command and display it. Issue an IDENTIFY NAMESPACE command as well. Actually publish a device instead of just failing.
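To make the admin queue flow concrete, here is a hedged sketch of building an IDENTIFY command and placing it on a submission ring. The sq_t type and both function names are hypothetical. The opcode (0x06), the CNS values (1 for controller, 0 for namespace), and the dword positions are from the NVME spec:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// A submission entry viewed as 16 raw dwords (64 bytes).
typedef struct {
    uint32_t dw[16];
} nvme_cmd_t;

// Hypothetical host-side view of a submission queue.
typedef struct {
    nvme_cmd_t* entries;          // ring shared with the controller
    uint16_t count;               // number of entries in the ring
    uint16_t tail;                // next free slot
    volatile uint32_t* doorbell;  // submission queue tail doorbell
} sq_t;

#define OP_IDENTIFY 0x06u
#define CNS_NAMESPACE 0u
#define CNS_CONTROLLER 1u

// Build an IDENTIFY command. buf_pa is the physical address of a 4096-byte
// buffer; nsid is 0 for IDENTIFY CONTROLLER.
static void cmd_identify(nvme_cmd_t* cmd, uint16_t cid, uint32_t cns,
                         uint32_t nsid, uint64_t buf_pa) {
    memset(cmd, 0, sizeof(*cmd));
    cmd->dw[0] = OP_IDENTIFY | ((uint32_t)cid << 16);
    cmd->dw[1] = nsid;
    cmd->dw[6] = (uint32_t)buf_pa;  // PRP1, low half
    cmd->dw[7] = (uint32_t)(buf_pa >> 32);  // PRP1, high half
    cmd->dw[10] = cns;
}

// Copy a command into the ring and tell the controller about the new tail.
// This sketch does not check for a full queue; a real driver tracks the
// head position reported in completions before reusing a slot.
static void sq_submit(sq_t* sq, const nvme_cmd_t* cmd) {
    assert(sq->count > 1);
    sq->entries[sq->tail] = *cmd;
    if (++sq->tail == sq->count) {
        sq->tail = 0;
    }
    *sq->doorbell = sq->tail;
}
```

A convenience function for transactions would wrap these two steps and then wait for the matching command id to show up on the completion queue.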

Time to stop polling and actually use interrupts

Set up an IO submission and completion queue as well (in preparation for doing actual disk IO) and fiddle with IRQ setup a bit while trying to figure out why IRQs work on HW but not in Qemu.
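Each new queue gets its own doorbell register, and where that register lives depends on the doorbell stride the controller reports. A small sketch of the offset arithmetic from the NVME spec follows; the function and macro names are made up here:

```c
#include <assert.h>
#include <stdint.h>

// Doorbell registers start at this offset from the controller's MMIO base.
#define NVME_DOORBELL_BASE 0x1000u

// Stride in bytes between doorbells, from the CAP.DSTRD field.
static uint64_t doorbell_stride_bytes(uint32_t dstrd) {
    assert(dstrd <= 15);
    return 4ULL << dstrd;
}

// Offset of the tail doorbell for submission queue `qid`.
static uint64_t sq_tail_doorbell(uint16_t qid, uint32_t dstrd) {
    return NVME_DOORBELL_BASE + (2ULL * qid) * doorbell_stride_bytes(dstrd);
}

// Offset of the head doorbell for completion queue `qid`.
static uint64_t cq_head_doorbell(uint16_t qid, uint32_t dstrd) {
    return NVME_DOORBELL_BASE + (2ULL * qid + 1) * doorbell_stride_bytes(dstrd);
}
```

With a DSTRD of 0 (a 4-byte stride), queue 0 (the admin queue) uses offsets 0x1000 and 0x1004, and IO queue 1 uses 0x1008 and 0x100C.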

Now things get a bit more complicated

Some #if 0'd code down in nvme_init() where I experimented with IO READ ops to verify that I understood how the command structure and PRP list worked. Added a QEMU_IRQ_HACK to use polling instead of IRQs so I could test with Qemu as well. Start sketching out IO operation processing, with the concept of breaking iotxns down into utxns that are 1:1 with NVME IO operations. Introduce #defines for a bunch of magic numbers, some more comments, and an IO processing thread. Wire up the device ops nvme_get_size(), nvme_ioctl(), and nvme_queue_iotxn() which will be needed for this to act as a real block device.
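For readers who haven’t met PRPs before, here is a sketch of how the two PRP fields of a command get filled for a page-aligned transfer, following the rules in the NVME spec. It assumes 4096-byte pages and a PRP list that fits in a single page. The prp_fill() name and its signature are hypothetical, not the driver’s:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NVME_PAGE_SIZE 4096u
#define PRP_LIST_MAX (NVME_PAGE_SIZE / sizeof(uint64_t))  // entries per list page

// pages:   physical addresses of the page-aligned pages being transferred
// list:    CPU-visible pointer to one page reserved for the PRP list
// list_pa: physical address of that same page
// Returns 0 on success, -1 if the transfer cannot be described this way.
static int prp_fill(const uint64_t* pages, size_t npages,
                    uint64_t* list, uint64_t list_pa,
                    uint64_t* prp1, uint64_t* prp2) {
    assert(pages && list && prp1 && prp2);
    if (npages == 0 || npages - 1 > PRP_LIST_MAX) {
        return -1;
    }
    *prp1 = pages[0];  // PRP1 always points at the first page
    if (npages == 1) {
        *prp2 = 0;  // unused
    } else if (npages == 2) {
        *prp2 = pages[1];  // PRP2 points directly at the second page
    } else {
        // Three or more pages: PRP2 points at a list holding pages 1..n-1.
        for (size_t i = 1; i < npages; i++) {
            list[i - 1] = pages[i];
        }
        *prp2 = list_pa;
    }
    return 0;
}
```

One list page holds 512 entries, which is enough for a 2 MB transfer made of 4096-byte pages.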

Make it actually work

Until now, anything trying to open the driver or interact with it would fail or hang. It was a bunch of code that poked at the hardware when loaded but didn’t do anything beyond that.

Build out the IO processing with io_process_cpls() to handle completion messages from the HW and io_process_txns() to handle subdividing iotxns into utxns and issuing IO commands to the hardware. Not done yet, and not code reviewed, but the iochk multithreaded disk io exerciser runs against devices published by this driver without failing or causing the driver to crash, so yay!
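The subdividing step can be sketched roughly like this: one large request is cut into pieces no bigger than the controller’s maximum transfer, each of which maps to a single NVME IO command. The utxn_t struct and split_txn() below are illustrative stand-ins, not the driver’s real types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical description of one hardware-sized piece of a request.
typedef struct {
    uint64_t lba;     // starting block
    uint32_t blocks;  // number of blocks in this piece
} utxn_t;

// Split [lba, lba + blocks) into pieces of at most max_blocks each.
// Returns the number of pieces written to out, or 0 if out is too small
// (or if there was nothing to split).
static size_t split_txn(uint64_t lba, uint64_t blocks, uint32_t max_blocks,
                        utxn_t* out, size_t out_max) {
    // A single NVME read or write can cover at most 65536 blocks.
    assert(max_blocks > 0 && max_blocks <= 65536);
    size_t n = 0;
    while (blocks > 0) {
        if (n == out_max) {
            return 0;
        }
        uint32_t chunk = blocks > max_blocks ? max_blocks : (uint32_t)blocks;
        out[n].lba = lba;
        out[n].blocks = chunk;
        n++;
        lba += chunk;
        blocks -= chunk;
    }
    return n;
}
```

For a controller reporting a 2097152-byte max data transfer with 512-byte blocks, max_blocks would be 4096, so a 10000-block request becomes three commands of 4096, 4096, and 1808 blocks.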

Some clean up

Fix a bug where the io thread would spin instead of wait when there was no io pending. Add some simple stat counters (which helped detect this bug).

The specifications do not necessarily reflect the truth on the ground

Especially in the case where the peripheral is complex and has a bunch of optional features, tunables, etc., it’s worth exploring what actual hardware is capable of before depending on a feature nobody supports. For example, NVME allows the queues for submitting commands to be physically discontiguous, but no hardware I’ve seen so far supports that. Similarly, it supports a (required) simple scatter/gather page-list (PRP) and an (optional) fancier scatter/gather format that’s much more flexible (SGL). Turns out no hardware I’ve seen supports SGLs either.

I’ve been collecting the parameters that different NVME controllers report. They’re useful to see, because if you just go by what the spec says is possible, you get a very different picture of what hardware might be like…

NVME device features

If you want to learn more…

The NVME specs themselves live over here: http://nvmexpress.org/resources/specifications/


Hello, Blog?

Well, first post, I guess!

I’m giving Hugo a try as a way to host my occasional ramblings.

It’s not perfect, but the basic static site generator thing makes sense to me and is pretty straightforward. I edit markdown locally, preview with its built-in web server, and then push the generated static content to my machine in the cloud.

I notice that the template (I think) is referencing JavaScript and crap from various CDNs, which I’m not thrilled about, but I’ll sort out how to fix that later.

Why this? Why now? I blame my friend Joe:

<nebkor> this is totally selfish, and I understand, but I wish you had a regular blog instead of g+
<swetland> me too!
<swetland> g+ is terrible for this
<swetland> but everything else is terrible too. and then I go down the rathole of writing a CMS
<nebkor> *sigh*
<nebkor> yeah
<nebkor> I mean
<geist> geocities man