NVMe

From OSDev Wiki

Overview

  • NVMe controllers can be found as PCI devices with class code 1 and subclass code 8.
  • The controller's registers are accessible through BAR0, which should be a 64-bit memory-mapped BAR.
  • The controller processes commands submitted to it from "submission queues". The driver prepares commands in the queue's circular buffer in memory, and then updates the tail pointer register for the queue.
  • The controller may process commands in any order it likes.
  • When the controller has finished processing a command, it appends an entry to a "completion queue". The completion queue to use is specified when a submission queue is created. The controller sends an interrupt when a completion queue has new entries. The driver processes all new entries in the queue's circular buffer, and then updates the head pointer register for the queue.
  • At reset, only one submission queue and one completion queue exist. These are the admin queues. The driver sets their base addresses in the ASQ and ACQ registers.
  • The admin queues can process admin commands, such as creating IO queues (used to submit IO commands, like read/write sectors), and querying information about the controller and the drives (called "namespaces") connected to it.
  • The admin queues have identifiers of 0.

BAR0 registers

Offset Name Description
0x00-0x07 CAP Controller capabilities.
0x08-0x0B VS Version.
0x0C-0x0F INTMS Interrupt mask set.
0x10-0x13 INTMC Interrupt mask clear.
0x14-0x17 CC Controller configuration.
0x1C-0x1F CSTS Controller status.
0x24-0x27 AQA Admin queue attributes.
0x28-0x2F ASQ Admin submission queue.
0x30-0x37 ACQ Admin completion queue.
0x1000+(2X)*Y SQxTDBL Submission queue X tail doorbell.
0x1000+(2X+1)*Y CQxHDBL Completion queue X head doorbell.

Y is the doorbell stride in bytes. It is calculated as 4 << CAP.DSTRD, where DSTRD is bits 32-35 of the controller capabilities register.
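
The doorbell offsets in the table above can be computed as in the following sketch. The function names are illustrative and not part of any existing driver; the only inputs are the 64-bit CAP value read from BAR0 and the queue identifier.

```c
#include <stdint.h>

/* CAP.DSTRD occupies bits 32-35 of the 64-bit CAP register. */
static inline uint32_t nvme_cap_dstrd(uint64_t cap)
{
    return (uint32_t)((cap >> 32) & 0xF);
}

/* Doorbell stride in bytes: Y = 4 << CAP.DSTRD. */
static inline uint32_t nvme_doorbell_stride(uint64_t cap)
{
    return 4u << nvme_cap_dstrd(cap);
}

/* BAR0 offset of the tail doorbell for submission queue `qid`. */
static inline uint32_t nvme_sq_tail_doorbell(uint64_t cap, uint16_t qid)
{
    return 0x1000u + (2u * qid) * nvme_doorbell_stride(cap);
}

/* BAR0 offset of the head doorbell for completion queue `qid`. */
static inline uint32_t nvme_cq_head_doorbell(uint64_t cap, uint16_t qid)
{
    return 0x1000u + (2u * qid + 1u) * nvme_doorbell_stride(cap);
}
```

With DSTRD equal to 0, the admin doorbells land at 0x1000 and 0x1004, and the doorbells of IO queue 1 at 0x1008 and 0x100C.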

Data structures

Submission queue entry

A submission queue entry - a command - is 64 bytes, arranged in 16 DWORDs.

DWORD Contents
0 Command DWORD 0 (see below)
1 NSID (namespace identifier). If n/a, set to 0.
2-3 Reserved.
4-5 Metadata pointer.
6-9 Data pointer. 2 PRPs (see next section).
10-15 Command specific.

Format of the Command DWORD 0:

Bits Contents
0-7 Opcode.
8-9 Fused operation. 0 indicates normal operation.
10-13 Reserved.
14-15 PRP or SGL selection. 0 indicates PRPs.
16-31 Command identifier. This is put in the completion queue entry.
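
As a rough illustration, the entry layout described above could be expressed as a C structure like the one below. The structure and function names are made up for this example, and a little-endian host is assumed.

```c
#include <stdint.h>

/* One 64-byte submission queue entry (16 DWORDs). */
struct nvme_sqe {
    uint32_t cdw0;        /* DWORD 0: opcode, fused, PRP/SGL, command ID */
    uint32_t nsid;        /* DWORD 1: namespace identifier */
    uint32_t reserved[2]; /* DWORDs 2-3 */
    uint64_t mptr;        /* DWORDs 4-5: metadata pointer */
    uint64_t prp1;        /* DWORDs 6-7: first PRP */
    uint64_t prp2;        /* DWORDs 8-9: second PRP */
    uint32_t cdw10;       /* DWORDs 10-15: command specific */
    uint32_t cdw11;
    uint32_t cdw12;
    uint32_t cdw13;
    uint32_t cdw14;
    uint32_t cdw15;
};

_Static_assert(sizeof(struct nvme_sqe) == 64, "SQ entry must be 64 bytes");

/* Command DWORD 0 for a normal (non-fused) command that uses PRPs:
 * bits 8-15 stay zero, the command ID goes in bits 16-31. */
static inline uint32_t nvme_cdw0(uint8_t opcode, uint16_t cid)
{
    return (uint32_t)opcode | ((uint32_t)cid << 16);
}
```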

PRP

A PRP (physical region page) is a 64-bit physical memory address. It must be DWORD aligned. A list of PRPs is used in a data transfer to specify where in memory data is transferred from or to. A PRP list is subject to the following rules:

  • The size of the region specified by a given PRP is the minimum of: the amount of data that can be transferred without crossing a page boundary; and the amount of data remaining to be transferred.
  • Only the first entry in a PRP list can be page misaligned.
  • If a PRP list is not long enough to cover the entire transfer, then the last entry chains to a page containing more PRP entries.
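
The rules above can be sketched as follows for the simplest case. This example assumes a 4 KiB page size, a buffer that is physically contiguous, and a transfer small enough that a single page of PRP entries suffices (no chaining). All names are hypothetical; a real driver would translate each virtual page to its physical address instead of assuming contiguity.

```c
#include <stdint.h>
#include <stddef.h>

#define NVME_PAGE_SIZE 4096u /* assumed host page size */

/* Fills the two PRP fields of a command for a buffer at physical address
 * `phys` spanning `len` bytes. `list` is a page-sized scratch array located
 * at physical address `list_phys`, used only if more than two pages are
 * touched. Returns the number of entries written to `list`, or -1 if the
 * transfer would need a chained PRP list (not handled in this sketch). */
static inline int nvme_build_prps(uint64_t phys, size_t len,
                                  uint64_t *list, uint64_t list_phys,
                                  uint64_t *prp1, uint64_t *prp2)
{
    /* Bytes covered by the first PRP: up to the next page boundary. */
    size_t first = NVME_PAGE_SIZE - (size_t)(phys & (NVME_PAGE_SIZE - 1));

    *prp1 = phys; /* only this entry may be page misaligned */
    *prp2 = 0;
    if (len <= first)
        return 0;

    len -= first;
    phys += first; /* page aligned from here on */
    if (len <= NVME_PAGE_SIZE) {
        *prp2 = phys; /* the second PRP points directly at the data */
        return 0;
    }

    size_t n = (len + NVME_PAGE_SIZE - 1) / NVME_PAGE_SIZE;
    if (n > NVME_PAGE_SIZE / sizeof(uint64_t))
        return -1; /* would need chaining */
    for (size_t i = 0; i < n; i++)
        list[i] = phys + (uint64_t)i * NVME_PAGE_SIZE;
    *prp2 = list_phys; /* the second PRP points at the PRP list */
    return (int)n;
}
```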

Completion queue entry

A completion queue entry is 16 bytes.

Bits Contents
0-31 Command specific.
32-63 Reserved.
64-79 Submission queue head pointer.
80-95 Submission queue identifier.
96-111 Command identifier.
112 Phase bit. Toggled when entry written.
113-127 Status field. 0 on success.

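New entries can be distinguished from old ones by inspecting the phase bit. The controller inverts the phase value it writes on each pass through the queue, so the driver reads entries from its head position until it finds one whose phase bit does not match the value expected for the current pass.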
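
One possible C representation of the entry is shown below. The names are illustrative, and a little-endian host is assumed so that the 16-bit fields line up with the bit ranges in the table.

```c
#include <stdint.h>
#include <stdbool.h>

/* One 16-byte completion queue entry. */
struct nvme_cqe {
    uint32_t result;   /* bits 0-31: command specific */
    uint32_t reserved; /* bits 32-63 */
    uint16_t sq_head;  /* bits 64-79: submission queue head pointer */
    uint16_t sq_id;    /* bits 80-95: submission queue identifier */
    uint16_t cid;      /* bits 96-111: command identifier */
    uint16_t status;   /* bit 112: phase, bits 113-127: status field */
};

_Static_assert(sizeof(struct nvme_cqe) == 16, "CQ entry must be 16 bytes");

/* Phase bit of the entry. */
static inline bool nvme_cqe_phase(const struct nvme_cqe *e)
{
    return (e->status & 1) != 0;
}

/* 15-bit status field; 0 means success. */
static inline uint16_t nvme_cqe_status(const struct nvme_cqe *e)
{
    return (uint16_t)(e->status >> 1);
}
```

In a real driver the queue memory is written by the device, so entries would normally be read through a volatile pointer.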

Commands

Admin commands

Create IO submission queue

  • Opcode is 0x01.
  • The base address of the queue should be put in DWORDs 6 and 7 of the command.
  • Command DWORD 10 contains the queue identifier in the low word, and the queue size in the high word. The queue size should be given as one less than the actual value.
  • Command DWORD 11 contains flags in the low word, and the completion queue identifier in the high word (where completion entries for this submission queue will be posted). Flag (1 << 0) indicates the queue is physically contiguous (recommended; non-contiguous are not supported by all controllers).

Create IO completion queue

  • Opcode is 0x05.
  • The base address of the queue should be put in DWORDs 6 and 7 of the command.
  • Command DWORD 10 contains the queue identifier in the low word, and the queue size in the high word. The queue size should be given as one less than the actual value.
  • Command DWORD 11 contains flags in the low word, and the interrupt vector in the high word. Flag (1 << 0) indicates the queue is physically contiguous (recommended; non-contiguous are not supported by all controllers), and flag (1 << 1) enables interrupts.
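
The two queue creation commands could be assembled as in this sketch, which treats a command as an array of 16 DWORDs so that the indices match the descriptions above. The function names and parameters are assumptions for the example. Both queues are created as physically contiguous, and the completion queue has interrupts enabled. Because a submission queue refers to its completion queue by identifier, the completion queue has to be created first.

```c
#include <stdint.h>
#include <string.h>

/* Create IO completion queue (opcode 0x05).
 * `entries` is the actual queue size; the command carries size - 1. */
static inline void nvme_cmd_create_io_cq(uint32_t cmd[16], uint16_t cid,
                                         uint64_t base_phys, uint16_t qid,
                                         uint16_t entries, uint16_t vector)
{
    memset(cmd, 0, 16 * sizeof(uint32_t));
    cmd[0]  = 0x05u | ((uint32_t)cid << 16);
    cmd[6]  = (uint32_t)base_phys;
    cmd[7]  = (uint32_t)(base_phys >> 32);
    cmd[10] = qid | ((uint32_t)(uint16_t)(entries - 1) << 16);
    /* Flags: bit 0 = physically contiguous, bit 1 = interrupts enabled. */
    cmd[11] = 0x3u | ((uint32_t)vector << 16);
}

/* Create IO submission queue (opcode 0x01), posting completions to `cqid`. */
static inline void nvme_cmd_create_io_sq(uint32_t cmd[16], uint16_t cid,
                                         uint64_t base_phys, uint16_t qid,
                                         uint16_t entries, uint16_t cqid)
{
    memset(cmd, 0, 16 * sizeof(uint32_t));
    cmd[0]  = 0x01u | ((uint32_t)cid << 16);
    cmd[6]  = (uint32_t)base_phys;
    cmd[7]  = (uint32_t)(base_phys >> 32);
    cmd[10] = qid | ((uint32_t)(uint16_t)(entries - 1) << 16);
    /* Flags: bit 0 = physically contiguous. */
    cmd[11] = 0x1u | ((uint32_t)cqid << 16);
}
```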

Identify

  • Opcode is 0x06.
  • The base address of the output (a single page) should be put in the DWORDs 6 and 7 of the command.
  • The low byte of command DWORD 10 indicates what is to be identified: 0 - a namespace, 1 - the controller, 2 - the namespace list.
  • If identifying a namespace, set DWORD 1 to the namespace ID.
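
A minimal sketch of building the identify command, again using an array of 16 DWORDs, might look like this. The names are hypothetical. The second helper counts the entries in a returned namespace list, relying on the list being an array of up to 1024 32-bit namespace IDs in which unused entries are zero.

```c
#include <stdint.h>
#include <string.h>

/* Values for the low byte of command DWORD 10. */
enum {
    NVME_IDENTIFY_NAMESPACE  = 0,
    NVME_IDENTIFY_CONTROLLER = 1,
    NVME_IDENTIFY_NS_LIST    = 2
};

/* Identify (opcode 0x06). `out_phys` is the physical address of a single
 * page that receives the result. `nsid` is used when identifying a
 * namespace; this sketch passes 0 otherwise. */
static inline void nvme_cmd_identify(uint32_t cmd[16], uint16_t cid,
                                     uint64_t out_phys, uint8_t what,
                                     uint32_t nsid)
{
    memset(cmd, 0, 16 * sizeof(uint32_t));
    cmd[0]  = 0x06u | ((uint32_t)cid << 16);
    cmd[1]  = nsid;
    cmd[6]  = (uint32_t)out_phys;
    cmd[7]  = (uint32_t)(out_phys >> 32);
    cmd[10] = what;
}

/* Counts the namespace IDs in a returned namespace list page. */
static inline unsigned nvme_count_nsids(const uint32_t list[1024])
{
    unsigned n = 0;
    while (n < 1024 && list[n] != 0)
        n++;
    return n;
}
```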

IO commands

Read

  • Opcode is 0x02.
  • DWORD 1 contains the NSID.
  • DWORDs 6-9 contain the two PRPs for the data transfer.
  • DWORDs 10-11 contain the starting LBA.
  • The low word of DWORD 12 contains the number of blocks to transfer. This should be given as one less than the actual value.

Write

  • Opcode is 0x01.
  • DWORD 1 contains the NSID.
  • DWORDs 6-9 contain the two PRPs for the data transfer.
  • DWORDs 10-11 contain the starting LBA.
  • The low word of DWORD 12 contains the number of blocks to transfer. This should be given as one less than the actual value.
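
Since read and write differ only in their opcode, one helper can build both. The following is a sketch with made-up names; `blocks` is the actual block count (1 to 65536) and the PRP values are assumed to have been prepared already.

```c
#include <stdint.h>
#include <string.h>

/* Builds a read (opcode 0x02) or write (opcode 0x01) command. */
static inline void nvme_cmd_rw(uint32_t cmd[16], int is_write, uint16_t cid,
                               uint32_t nsid, uint64_t prp1, uint64_t prp2,
                               uint64_t lba, uint32_t blocks)
{
    memset(cmd, 0, 16 * sizeof(uint32_t));
    cmd[0]  = (is_write ? 0x01u : 0x02u) | ((uint32_t)cid << 16);
    cmd[1]  = nsid;
    cmd[6]  = (uint32_t)prp1;
    cmd[7]  = (uint32_t)(prp1 >> 32);
    cmd[8]  = (uint32_t)prp2;
    cmd[9]  = (uint32_t)(prp2 >> 32);
    cmd[10] = (uint32_t)lba;         /* starting LBA, low half */
    cmd[11] = (uint32_t)(lba >> 32); /* starting LBA, high half */
    cmd[12] = (blocks - 1) & 0xFFFFu; /* block count, minus one */
}
```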

Checklist

Initialisation

  • Find a PCI function with class code 0x01 and subclass code 0x08.
  • Enable interrupts, bus-mastering DMA, and memory space access in the PCI configuration space for the function.
  • Map BAR0.
  • Check the controller version is supported.
  • Check the capabilities register for support of the NVMe command set.
  • Check the capabilities register for support of the host's page size.
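  • Reset the controller: clear CC.EN and wait for CSTS.RDY to become 0.
  • Set the controller configuration, the admin queue sizes (AQA) and the admin queue base addresses (ASQ and ACQ).
  • Start the controller: set CC.EN and wait for CSTS.RDY to become 1.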
  • Enable interrupts and register a handler.
  • Send the identify command to the controller. Check it is an IO controller. Record the maximum transfer size.
  • Reset the software progress marker, if implemented.
  • Create the first IO completion queue, and the first IO submission queue.
  • Identify active namespace IDs, and then identify individual namespaces. Record their block size, capacity and whether they are read-only.
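
The register values involved in the capability checks and controller configuration steps can be sketched as below. The helper names are invented for this example, and the MMIO accesses and the polling of CSTS.RDY are left out. The bit positions used are: CAP.CSS bit 37 (NVM command set supported), CAP.MPSMIN bits 48-51, CAP.MPSMAX bits 52-55, CC.EN bit 0, CC.CSS bits 4-6, CC.MPS bits 7-10, CC.IOSQES bits 16-19, CC.IOCQES bits 20-23, AQA.ASQS bits 0-11 and AQA.ACQS bits 16-27.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if the controller supports the NVM command set. */
static inline bool nvme_cap_supports_nvm(uint64_t cap)
{
    return ((cap >> 37) & 1) != 0;
}

/* True if a host page size of (1 << page_shift) bytes is supported.
 * Supported sizes range from 2^(12 + MPSMIN) to 2^(12 + MPSMAX). */
static inline bool nvme_cap_supports_page_shift(uint64_t cap, unsigned page_shift)
{
    unsigned min = 12 + (unsigned)((cap >> 48) & 0xF);
    unsigned max = 12 + (unsigned)((cap >> 52) & 0xF);
    return page_shift >= min && page_shift <= max;
}

/* CC value that enables the controller with the NVM command set,
 * 64-byte submission entries (2^6) and 16-byte completion entries (2^4). */
static inline uint32_t nvme_cc_value(unsigned page_shift)
{
    return (1u << 0)                         /* EN */
         | (0u << 4)                         /* CSS: NVM command set */
         | (((page_shift - 12) & 0xFu) << 7) /* MPS */
         | (6u << 16)                        /* IOSQES */
         | (4u << 20);                       /* IOCQES */
}

/* AQA value; both sizes are stored as one less than the entry count. */
static inline uint32_t nvme_aqa_value(uint16_t sq_entries, uint16_t cq_entries)
{
    return ((sq_entries - 1u) & 0xFFFu) | (((cq_entries - 1u) & 0xFFFu) << 16);
}
```

For 4 KiB pages and 64-entry admin queues this gives a CC value of 0x00460001 and an AQA value of 0x003F003F.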

Shutdown

  • Delete IO queues.
  • Inform the controller of shutdown by setting CC.SHN.
  • Wait until CSTS.SHST indicates that shutdown processing is complete.
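
In terms of register bits, the shutdown notification lives in CC.SHN (bits 14-15, where 01b requests a normal shutdown) and its progress in CSTS.SHST (bits 2-3, where 10b means shutdown processing is complete). A small sketch with hypothetical helper names:

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns the CC value to write in order to request a normal shutdown,
 * given the current CC value. */
static inline uint32_t nvme_cc_request_shutdown(uint32_t cc)
{
    return (cc & ~(3u << 14)) | (1u << 14);
}

/* True once CSTS.SHST reports that shutdown processing is complete. */
static inline bool nvme_shutdown_complete(uint32_t csts)
{
    return ((csts >> 2) & 3u) == 2u;
}
```

A driver would write the new CC value and then poll CSTS with `nvme_shutdown_complete`, ideally with a timeout.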

Submitting a command

  • Build PRP lists.
  • Wait for space in the submission queue. The controller indicates its internal head pointer in completion queue entries.
  • Set up the command.
  • Update the queue tail doorbell register.
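
The queue handling part of these steps could look like the sketch below. The structure and its fields are assumptions made for this example; `head` is expected to be refreshed from the submission queue head pointer reported in completion entries. The queue is treated as full when advancing the tail would make it equal to the head, so a queue of N slots holds at most N - 1 commands.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

struct nvme_sq {
    uint32_t (*entries)[16];          /* ring of 64-byte commands */
    uint16_t size;                    /* number of slots in the ring */
    uint16_t tail;                    /* next slot the driver will fill */
    uint16_t head;                    /* last head reported by the controller */
    volatile uint32_t *tail_doorbell; /* mapped SQxTDBL register */
};

/* Copies `cmd` into the ring and rings the doorbell.
 * Returns false without side effects if the queue is full. */
static inline bool nvme_sq_submit(struct nvme_sq *sq, const uint32_t cmd[16])
{
    uint16_t next = (uint16_t)((sq->tail + 1) % sq->size);

    if (next == sq->head)
        return false; /* no free slot yet; wait for completions */

    memcpy(sq->entries[sq->tail], cmd, 64);
    sq->tail = next;
    /* On real hardware a write barrier belongs here, so that the command
     * is visible in memory before the doorbell write reaches the device. */
    *sq->tail_doorbell = sq->tail;
    return true;
}
```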

IRQ handler

  • For each completion queue, read all entries where the phase bit has been toggled.
  • Check the status of the commands.
  • Use the submission queue ID and command ID to work out which submitted command corresponds to this completion entry.
  • Update the completion queue head doorbell register.
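
A sketch of the completion processing for a single queue follows. Entries are accessed as arrays of four DWORDs, so the phase bit is bit 16 of DWORD 3. The names are hypothetical, the expected phase is assumed to start at 1 with the queue memory zeroed beforehand, and matching each completion to its pending request is left to the caller.

```c
#include <stdint.h>
#include <stdbool.h>

struct nvme_cq {
    volatile uint32_t (*entries)[4];  /* ring of 16-byte entries */
    uint16_t size;                    /* number of slots in the ring */
    uint16_t head;                    /* next slot to examine */
    uint8_t phase;                    /* phase expected for new entries */
    volatile uint32_t *head_doorbell; /* mapped CQxHDBL register */
};

struct nvme_completion {
    uint16_t sq_id, cid, status, sq_head;
};

/* Fetches one new entry, if any, and advances the head. */
static inline bool nvme_cq_pop(struct nvme_cq *cq, struct nvme_completion *out)
{
    uint32_t dw3 = cq->entries[cq->head][3];
    if (((dw3 >> 16) & 1u) != cq->phase)
        return false; /* entry not yet written on this pass */

    uint32_t dw2 = cq->entries[cq->head][2];
    out->sq_head = (uint16_t)(dw2 & 0xFFFFu);
    out->sq_id   = (uint16_t)(dw2 >> 16);
    out->cid     = (uint16_t)(dw3 & 0xFFFFu);
    out->status  = (uint16_t)(dw3 >> 17);

    if (++cq->head == cq->size) {
        cq->head = 0;
        cq->phase ^= 1; /* the controller flips the phase on each wrap */
    }
    return true;
}

/* Collects up to `max` new entries, then updates the head doorbell. */
static inline unsigned nvme_cq_drain(struct nvme_cq *cq,
                                     struct nvme_completion *out, unsigned max)
{
    unsigned n = 0;
    while (n < max && nvme_cq_pop(cq, &out[n]))
        n++;
    if (n)
        *cq->head_doorbell = cq->head;
    return n;
}
```

An interrupt handler would call `nvme_cq_drain` for each completion queue, check `status` in each result, and use `sq_id` and `cid` to find the request to complete.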

