Accessing Hardware in Rust

Jonathan

When it comes to accessing hardware, what should a good API look like?

    Overview

    At Ferrous Systems, we write a lot of Rust code that runs on bare-metal - the code that is first to run when the processor comes out of reset, and code that has no higher power to call upon for assistance. Some of this we do as example code to discuss in our trainings, some we write for our clients, and some we publish as open-source - either through the Rust Embedded Devices Working Group, or through our own Knurling Project.

    One thing that we've observed is that there isn't a great deal of consistency when it comes to the APIs for interacting with underlying hardware. That's not necessarily a problem - different architectures and different peripherals may require different approaches - but I wanted to just jot down a few thoughts and observations I had about what makes a good API, and conversely some of the choices that can add friction when people are discovering Rust on a particular platform.

    Accessing the Hardware

    Rust lets us easily interact with values in memory. That is, we can create values from the fundamental types (the integers, the floats, bool, etc), and we can design our own types that combine those things together (structs, enums, etc). But none of this will actually get our machine to do anything - creating a variable like let led_on = true; does not, sadly, make an LED turn on. To get our machine to do something beyond storing/loading values to/from RAM, we need to head into unsafe Rust. This will let us perform operations that act on data that lies outside of the Rust compiler's model of our program, commanding the hardware (or an operating system kernel) to act. Unfortunately hardware can appear to the processor in different ways, and the right kind of unsafe operation will depend entirely upon the hardware you are trying to interact with. We'll look at three common examples next.

    I/O Read/Write

    Back in the early days of the IBM PC, and with the 8080-based CP/M machines that came before it, the processors had two address spaces - one for data and one for I/O. Almost all programming was done in the data address space, but when you wanted to talk to the hardware, you could use some special I/O instructions that could read from or write to the I/O address space. Those of you who've been doing this as long as I have might remember magical numbers from the MS-DOS days, like 0x220 (the default I/O address of a Creative Labs Sound Blaster card), or 0x3F8 (the default I/O address of Serial Port COM1). These are addresses in I/O space, and are also known as ports.

    Phil Oppermann's excellent blog, Writing an OS in Rust, is a great example of bare-metal code for x86-64 using I/O reads and writes. Phil does this using the x86_64 crate and its Port abstraction, with code that looks like this:

    // in src/main.rs
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    #[repr(u32)]
    pub enum QemuExitCode {
        Success = 0x10,
        Failed = 0x11,
    }
    
    pub fn exit_qemu(exit_code: QemuExitCode) {
        use x86_64::instructions::port::Port;
    
        unsafe {
            let mut port = Port::new(0xf4);
            port.write(exit_code as u32);
        }
    }
    

    Inside the Port type, we see that it uses inline assembly to produce the appropriate OUT instruction when a user wants to write to the I/O port:

    impl PortWrite for u8 {
        #[inline]
        unsafe fn write_to_port(port: u16, value: u8) {
            unsafe {
                core::arch::asm!("out dx, al", in("dx") port, in("al") value, options(nomem, nostack, preserves_flags));
            }
        }
    }
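
    The same pattern can be modelled without an x86 machine. In the sketch below, a hypothetical IO_LOG records each (port, value) pair where the real code would execute an OUT instruction, and Port<T> is a simplified stand-in for the x86_64 crate's type rather than its actual definition. The point is the shape of the API: the port number is fixed when the handle is created, the type parameter fixes the access width, and only the write itself is unsafe. It uses std purely so the mock can run on a desktop machine.

```rust
use std::cell::RefCell;
use std::marker::PhantomData;

thread_local! {
    // Hypothetical stand-in for the I/O bus: records every (port, value) written
    static IO_LOG: RefCell<Vec<(u16, u32)>> = RefCell::new(Vec::new());
}

/// One impl per access width, in the shape of a PortWrite trait.
/// On real hardware each body would be an `out` instruction of matching size.
pub trait PortWrite: Sized {
    /// # Safety
    /// Writing to an I/O port can have arbitrary side-effects on the hardware.
    unsafe fn write_to_port(port: u16, value: Self);
}

impl PortWrite for u8 {
    unsafe fn write_to_port(port: u16, value: u8) {
        IO_LOG.with(|log| log.borrow_mut().push((port, u32::from(value))));
    }
}

impl PortWrite for u32 {
    unsafe fn write_to_port(port: u16, value: u32) {
        IO_LOG.with(|log| log.borrow_mut().push((port, value)));
    }
}

/// A port whose number is fixed at creation and whose type parameter
/// fixes the access width, so a Port<u8> can never issue a 32-bit write.
pub struct Port<T> {
    port: u16,
    _width: PhantomData<T>,
}

impl<T: PortWrite> Port<T> {
    /// Creating the handle is harmless; only the write is unsafe
    pub const fn new(port: u16) -> Self {
        Port { port, _width: PhantomData }
    }

    /// # Safety
    /// The caller must know what the device at this port does with the value.
    pub unsafe fn write(&mut self, value: T) {
        unsafe { T::write_to_port(self.port, value) }
    }
}
```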
    

    The IBM PC architecture also supports memory-mapped I/O (the video RAM on the original MDA video card was mapped to address 0xB0000, in addition to the I/O registers for configuration). In time though, and especially with the advent of the PCI bus, PC hardware switched over to using memory-mapped I/O for both configuration and data.

    Newer RISC architectures like Arm and RISC-V skipped the idea of I/O instructions entirely - they mainly use memory-mapped I/O, but also have system registers for functions closely tied to the processor.

    System Registers

    A system register is a bit like an I/O port, except that instead of having an address, it has some other unique identifier. It is also functionally part of the processor rather than being outside of it as a separate peripheral.

    A good example is the 32-bit Arm architecture version 7 for Real-Time systems (or Armv7-R to its friends). This has the idea of a co-processor, which has numbered registers, and we can move a value from a co-processor register into a normal processor register using the MRC instruction (MCR is the one that goes the other way):

    let value: usize;
    unsafe {
      // Read the MPIDR (*Multiprocessor Affinity Register*)
      core::arch::asm!("mrc p15, 0, {r}, c0, c0, 5", r = out(reg) value, options(nomem, nostack));
    }
    println!("MPIDR contains: {value:08x}");
    

    The co-processor isn't really a co-processor in the traditional sense of a second chip on your mainboard sitting next to your main processor. Arm is just using that mechanism to allow the processor to do things outside the normal Arm architecture - like answer the question "Which processor in a multi-processor system are you?" (as per the example above).

    The magic numbers (OP1 = 0, CRn = c0, CRm = c0, and OP2 = 5 in this case) aren't exactly memorable, and come described in huge tables as part of the Architecture Reference Manual. But, we can again build abstractions to make them easier to work with. The aarch32-cpu crate does something like this:

    pub struct Mpidr(pub u32);
    
    impl SysReg for Mpidr {
        const CP: u32 = 15;
        const CRN: u32 = 0;
        const OP1: u32 = 0;
        const CRM: u32 = 0;
        const OP2: u32 = 5;
    }
    
    impl SysRegRead for Mpidr {}
    
    impl Mpidr {
        #[inline]
        /// Reads MPIDR (*Multiprocessor Affinity Register*)
        pub fn read() -> Mpidr {
            // Safety: reading this co-processor register is always allowed, and
            // has no side-effects
            let value = unsafe { <Self as SysRegRead>::read_raw() };
            Self(value)
        }
    }
    
    // We expose a nice, safe, API
    let id: Mpidr = Mpidr::read();
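
    To make the mapping from those constants to the instruction a little more concrete, here is a sketch that re-declares a SysReg-like trait and uses it only to print the MRC instruction each register implies. The trait here is a hypothetical re-declaration for illustration, and mrc_syntax is not part of the aarch32-cpu crate; a real implementation has to get these constants into inline assembly rather than into a string.

```rust
/// The five numbers that identify an Armv7-R co-processor register
/// (hypothetical re-declaration, shaped like the SysReg trait above)
pub trait SysReg {
    const CP: u32;
    const CRN: u32;
    const OP1: u32;
    const CRM: u32;
    const OP2: u32;
}

pub struct Mpidr(pub u32);

impl SysReg for Mpidr {
    const CP: u32 = 15;
    const CRN: u32 = 0;
    const OP1: u32 = 0;
    const CRM: u32 = 0;
    const OP2: u32 = 5;
}

/// Render the MRC instruction that reads register R into a core register Rt,
/// showing where each constant lands: MRC p<cp>, <op1>, Rt, c<crn>, c<crm>, <op2>
pub fn mrc_syntax<R: SysReg>() -> String {
    format!(
        "mrc p{}, {}, <Rt>, c{}, c{}, {}",
        R::CP,
        R::OP1,
        R::CRN,
        R::CRM,
        R::OP2
    )
}
```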
    

    AArch64 systems also rely on the idea of system registers, but in this architecture Arm chose to give every system register a unique name instead:

    let value: usize;
    unsafe {
      // Read the MPIDR (*Multiprocessor Affinity Register*) for Exception Level 1
      core::arch::asm!("mrs {r}, MPIDR_EL1", r = out(reg) value, options(nomem, nostack));
    }
    println!("MPIDR contains: {value:08x}");
    

    I think we can agree that that is much easier to read than the 32-bit version above.

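    RISC-V also uses system registers, called Control and Status Registers (CSRs). Each has a name, but the instruction encodes the register's unique 12-bit numeric ID, and here we use that numeric ID directly in the assembly code (0xF14 is mhartid, the Hart ID register):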

    let value: usize;
    unsafe {
      // Read the Hart ID with a *CSR Atomic Read and Set Bits* operation
      core::arch::asm!("csrrs {r}, 0xF14, x0", r = out(reg) value, options(nomem, nostack));
    }
    println!("HartID is: {value:08x}");
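
    The numeric ID is not just an arbitrary label. The RISC-V privileged specification uses its top bits to encode access rules: bits 11:10 are 0b11 for a read-only CSR, and bits 9:8 give the lowest privilege level that may access it. The sketch below decodes those bits; the Privilege, CsrInfo and decode_csr names are illustrative.

```rust
/// Lowest privilege level allowed to access a CSR (CSR address bits 9:8)
#[derive(Debug, PartialEq, Eq)]
pub enum Privilege {
    User,
    Supervisor,
    Hypervisor,
    Machine,
}

/// Access rules implied by a CSR's 12-bit address
pub struct CsrInfo {
    pub read_only: bool,
    pub min_privilege: Privilege,
}

pub fn decode_csr(csr: u16) -> CsrInfo {
    // Bits 11:10 == 0b11 mark a read-only CSR
    let read_only = ((csr >> 10) & 0b11) == 0b11;
    // Bits 9:8 give the lowest privilege level that may access it
    let min_privilege = match (csr >> 8) & 0b11 {
        0b00 => Privilege::User,
        0b01 => Privilege::Supervisor,
        0b10 => Privilege::Hypervisor,
        _ => Privilege::Machine,
    };
    CsrInfo { read_only, min_privilege }
}
```

    For 0xF14 this reports a read-only, machine-mode register. That is also why the csrrs above passes x0 as its source register: setting no bits means the instruction only reads the CSR and never writes it.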
    

    In general, working with system registers means getting a usize-sized integer into or out of the hardware. How we choose to break up that integer into smaller components is an important topic, but I want to come back to that later.

    Memory-Mapped I/O

    As noted above, I/O operations and system register read/writes only get you a single unit of data at a time - typically an integer of machine word size (a usize in Rust parlance). However, if you want to deal with high-speed I/O interfaces, or large amounts of video memory, this quickly becomes a bottleneck. Instead, most modern computer systems simply present their I/O devices within the same address space as their memory - so-called memory-mapped I/O.

    The problem is that Rust only likes you to use variables that it knows about and it knows the location of. So if there's a Serial Port Transmit FIFO living at memory address 0xE020_5000, how do we tell Rust that writing here is OK? Well, we can use unsafe pointer operations, but we must be careful. What does this code do?

    pub fn write_string_to_uart(s: &str) {
        const UART0_TRANSMIT_FIFO: *mut u32 = 0xE020_5000 as *mut u32;
        for byte in s.bytes() {
            // Safety: This is where the UART FIFO lives
            unsafe { UART0_TRANSMIT_FIFO.write(byte as u32) };
        }
    }
    

    It writes out the string to the UART, right? Let's ask Godbolt's Compiler Explorer (with annotations from me):

    write_string_to_uart:
            cmp     r1, #0           // is string length zero?
            addne   r0, r0, r1       // if not, r0 = string start + string length
            movwne  r1, #20480       //         r1 = 0x5000
            movtne  r1, #57376       //         r1 |= 0xE020_0000
            ldrbne  r0, [r0, #-1]    //         load one byte from r0 - 1
            strne   r0, [r1]         //         write byte to UART FIFO
            bx      lr               // exit the function
    

    OK, first, shout-out to Arm's condition codes meaning that we don't need to waste space on a branch instruction because we can just mark every instruction with "only do this if the last compare was Not Equal". But also, where is my loop? This function only writes a single byte to my UART and I very clearly asked Rust to write all the bytes in my string.

    Well, the write method on *mut u32 believes you are writing to RAM. And what happens if you write 10 values to the same location in RAM? The first nine are over-written, and you only keep the last value. So the optimiser has helped us out! It spotted that we wrote to RAM in a loop, and it threw away the loop and kept only the final write. This is an excellent performance optimisation - but only if we are writing to memory. We actually want the writes to occur because they have side-effects. This is not just "putting a value in memory" but, "writing to this address causes a byte of data to appear on my UART's transmit pin". Well, there's a method for that. Sort of.

    pub fn write_string_to_uart(s: &str) {
        const UART0_TRANSMIT_FIFO: *mut u32 = 0xE020_5000 as *mut u32;
        for byte in s.bytes() {
            // Safety: This is where the UART FIFO lives
            unsafe { UART0_TRANSMIT_FIFO.write_volatile(byte as u32) };
            //                           ^^^^^^^^^^^^^^ - this is now a volatile operation
        }
    }
    

    That compiles to:

    write_string_to_uart:
        cmp     r1, #0           // is string length zero?
        bxeq    lr               // if it is, return from function
        movw    r2, #20480       // r2 = 0x5000
        movt    r2, #57376       // r2 |= 0xE020_0000
    .LBB1_2:
        ldrb    r3, [r0], #1     // load byte into r3 from address in r0, and increment r0 by 1
        subs    r1, r1, #1       // decrement remaining string length by 1
        str     r3, [r2]         // write byte to UART FIFO
        bne     .LBB1_2          // if remaining string length is not zero, loop back to label .LBB1_2
        bx      lr               // exit from function
    

    By using write_volatile, we say "these writes are important, so please do not optimise them away". This works OK, but there's an important caveat - we must only create pointers to MMIO addresses, and never references. That is, this Rust code is unsound:

    #[repr(C)]
    struct Uart {
        // write here to write to TX FIFO, read here to read from RX FIFO
        fifo: u32,
        // control the UART here
        control: u32,
        // get the status here
        status: u32,
    }
    
    let uart_ref: &Uart = unsafe { &*(0xE020_5000 as *mut Uart) };
    

    This is because references in Rust are known to LLVM to be dereferenceable, and anything dereferenceable by LLVM can be dereferenced whenever LLVM feels like it, and not just when you expressly ask it to. This is a problem, because reading from that memory address (which is what dereferencing it would involve) has side-effects, and we do not want LLVM to do this whenever it feels like it - we'd be randomly throwing away characters from our UART FIFO. In practice, we observe few issues, but it's generally agreed that References to MMIO Address Space are Unsound and should be avoided.

    You might be thinking "well just use pointers then?". The problem is, Rust doesn't have a good syntax for saying "get me the pointer to this struct field, given a pointer to the start of the struct". You have to write something like:

    #[repr(C)]
    struct Uart {
        fifo: u32,
        control: u32,
        status: u32,
    }
    
    // casting an integer to a pointer is safe; only using the pointer needs unsafe
    let uart_ptr: *mut Uart = 0xE020_5000 as *mut Uart;
    // this does not create a temporary reference, but the syntax is awful
    let fifo_ptr = unsafe { &raw mut (*uart_ptr).fifo };
    unsafe { fifo_ptr.write_volatile(0x00) };
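
    One way to live with that syntax is to write it once per register, inside a handle type that only ever holds a raw pointer. The UartHandle type below is a hypothetical sketch, not any particular crate's API; it uses core::ptr::addr_of_mut! and addr_of!, which do the same job as &raw mut and &raw const and also work on older compilers. On real hardware, base would be built from 0xE020_5000; a test can instead point it at an ordinary Uart value in RAM.

```rust
use core::ptr::{addr_of, addr_of_mut, NonNull};

#[repr(C)]
pub struct Uart {
    pub fifo: u32,
    pub control: u32,
    pub status: u32,
}

/// Hypothetical handle that holds only a raw pointer to the registers,
/// so no &Uart pointing into MMIO space is ever created.
pub struct UartHandle {
    base: NonNull<Uart>,
}

impl UartHandle {
    /// # Safety
    /// base must point to a valid, suitably aligned UART register block, and
    /// nothing else may access those registers while this handle exists.
    pub const unsafe fn new(base: NonNull<Uart>) -> UartHandle {
        UartHandle { base }
    }

    /// Write one word to the transmit FIFO
    pub fn write_fifo(&mut self, value: u32) {
        // Project from struct pointer to field pointer without a reference
        let fifo = unsafe { addr_of_mut!((*self.base.as_ptr()).fifo) };
        // Safety: validity is guaranteed by the contract of new()
        unsafe { fifo.write_volatile(value) };
    }

    /// Read the status register
    pub fn read_status(&self) -> u32 {
        let status = unsafe { addr_of!((*self.base.as_ptr()).status) };
        unsafe { status.read_volatile() }
    }
}
```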
    

    We'll see some neat abstractions for MMIO when we talk about specific crates later on, because there are several different solutions you can pick from to solve this issue.

    Bitfields within Registers

    The three approaches above (I/O Read/Write, System Registers and Memory-Mapped I/O) all get us single usize-sized units of data. However, hardware designers consider these registers to be a precious resource, and using a whole 32-bit value to simply record "Is this peripheral On or Off right now" is quite wasteful. That kind of information only requires a single bit, and there are 32 (or 64) bits in an integer. So, the designers like to pack as many different small values into a single integer as possible. Here's an example - the Interrupt FIFO Level Select Register (UARTIFLS) of the Arm PL011 UART:

    Bits   Name       Description
    31:6   -          Reserved, do not modify, read as zero.
    5:3    RXIFLSEL   Receive interrupt FIFO level select.
    2:0    TXIFLSEL   Transmit interrupt FIFO level select.

    The possible values for RXIFLSEL are:

    • 0b000 = Receive FIFO becomes ≥ 1/8 full
    • 0b001 = Receive FIFO becomes ≥ 1/4 full
    • 0b010 = Receive FIFO becomes ≥ 1/2 full
    • 0b011 = Receive FIFO becomes ≥ 3/4 full
    • 0b100 = Receive FIFO becomes ≥ 7/8 full
    • 0b101-0b111 = reserved.

    The possible values for TXIFLSEL are:

    • 0b000 = Transmit FIFO becomes ≤ 1/8 full
    • 0b001 = Transmit FIFO becomes ≤ 1/4 full
    • 0b010 = Transmit FIFO becomes ≤ 1/2 full
    • 0b011 = Transmit FIFO becomes ≤ 3/4 full
    • 0b100 = Transmit FIFO becomes ≤ 7/8 full
    • 0b101-0b111 = reserved.

    So this single 32-bit register contains two 3-bit values plus a block of reserved bits (usually we write zeroes to reserved bits and ignore anything we read from them, but your hardware's technical documentation will tell you what to do). Of the eight possible values for each 3-bit field, five have assigned meanings and three are reserved.

    The C Programming Language has the idea of 'bitfields', and in that language we could represent such a register with something like:

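    #include <stdint.h>
    
    /* Bitfield layout is implementation-defined in C. On little-endian Arm
       (AAPCS) the first member occupies the least-significant bits. */
    struct uartifls {
      uint32_t ifls_txifsel : 3;   /* bits 2:0 */
      uint32_t ifls_rxifsel : 3;   /* bits 5:3 */
      uint32_t ifls_reserved : 26; /* bits 31:6 */
    };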
    

    Unfortunately, Rust has no such feature at the language level. Instead, we would typically provide methods to access the fields within a given register, using shifts and masks:

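    /// The contents of the UARTIFLS register
    pub struct Uartifls(u32);
    
    /// The defined values of the RXIFSEL field (bits 5:3)
    #[allow(non_camel_case_types)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Rxifsel {
        _1_8_full = 0b000,
        _1_4_full = 0b001,
        _1_2_full = 0b010,
        _3_4_full = 0b011,
        _7_8_full = 0b100,
    }
    
    impl Uartifls {
        /// Returns None if the field holds one of the reserved values
        pub fn get_rxifsel(&self) -> Option<Rxifsel> {
            match (self.0 >> 3) & 0b111 {
                0b000 => Some(Rxifsel::_1_8_full),
                0b001 => Some(Rxifsel::_1_4_full),
                0b010 => Some(Rxifsel::_1_2_full),
                0b011 => Some(Rxifsel::_3_4_full),
                0b100 => Some(Rxifsel::_7_8_full),
                _ => None,
            }
        }
    }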
    

    The set_rxifsel method is left as an exercise for the reader, but it's worth discussing what a 'modify' method might look like. When modifying a register, we need to do three things:

    • Read the existing register contents
    • Modify some of the bitfields, clearing out the old bits and OR'ing in the new bits
    • Write the updated register back to the hardware

    I like a closure-based API for this:

    impl Uart {
        pub fn modify_ifls<F>(&mut self, f: F) where F: FnOnce(&mut Uartifls) {
            let mut value = self.read_ifls();
            f(&mut value);
            self.write_ifls(value);
        }
    }
    
    fn set_fifo_levels(uart: &mut Uart) {
        uart.modify_ifls(|r| {
            r.set_rxifsel(Rxifsel::_7_8_full);
            r.set_txifsel(Txifsel::_1_2_full);
        });
    }
    

    Note here there are two types - one (Uart) represents the peripheral (and all its registers), and the other (Uartifls) represents the contents of one particular register (the UARTIFLS register).

    The closure also has the advantage that it's impossible to do the read portion of the modify and yet forget to do the write portion (e.g. by adding an early return between the two).

    fn set_fifo_levels(uart: &mut Uart) -> Result<(), Error> {
        let mut ifls = uart.read_ifls();
        ifls.set_rxifsel(Rxifsel::_7_8_full);
        // someone adds this later on - note the early return via `?`
        do_other_operation()?;
        // so now the write-back might never happen
        ifls.set_txifsel(Txifsel::_1_2_full);
        uart.write_ifls(ifls);
        Ok(())
    }
    

    The Rust optimiser does a very good job of making closures go away at compile time, leaving you with pretty optimal machine code. This is what Rust people call a 'Zero Cost Abstraction'. Here's the assembly output of that closure-based set_fifo_levels function, taken from a larger program that does MMIO based access to the peripheral. I've annotated it manually.

    set_fifo_levels:
            push    {fp, lr}          ; Save state
            mov     fp, sp            ; Adjust frame pointer
            ldr     r0, [r0]          ; Get the register pointer from the Uart object
            mov     r2, #34           ; Our two fifo values packed as a single integer
            ldr     r1, [r0]          ; Read the register
            bfi     r1, r2, #0, #6    ; Modify the bottom 6 bits of the value
            str     r1, [r0]          ; Write to the register
            pop     {fp, pc}          ; Pop state to return from function
    

    No closures in sight! This is pretty much what I'd write if I was hand-writing it in assembly language, yet writing it in Rust was a lot less error-prone than actually writing assembly language would have been. The abstraction really was zero-cost.
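
    For anyone who wants to experiment with this on a desktop machine, here is a self-contained sketch of the same closure-based API. An ordinary u32 field stands in for the MMIO register (on real hardware, read_ifls and write_ifls would be volatile accesses), and the set_rxifsel and set_txifsel bodies are one possible answer to the exercise above. Setting 7/8 and 1/2 should give 0b100_010, the #34 we saw in the assembly.

```rust
/// The contents of the UARTIFLS register
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Uartifls(u32);

#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy)]
pub enum Rxifsel { _1_8_full = 0b000, _1_4_full = 0b001, _1_2_full = 0b010, _3_4_full = 0b011, _7_8_full = 0b100 }

#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy)]
pub enum Txifsel { _1_8_full = 0b000, _1_4_full = 0b001, _1_2_full = 0b010, _3_4_full = 0b011, _7_8_full = 0b100 }

impl Uartifls {
    /// Bits 5:3 - clear the old value, then OR in the new one
    pub fn set_rxifsel(&mut self, v: Rxifsel) {
        self.0 = (self.0 & !(0b111 << 3)) | ((v as u32) << 3);
    }

    /// Bits 2:0 - clear the old value, then OR in the new one
    pub fn set_txifsel(&mut self, v: Txifsel) {
        self.0 = (self.0 & !0b111) | (v as u32);
    }
}

/// Stand-in peripheral: a plain field plays the part of the MMIO register
pub struct Uart {
    ifls: u32,
}

impl Uart {
    fn read_ifls(&self) -> Uartifls {
        Uartifls(self.ifls)
    }

    fn write_ifls(&mut self, value: Uartifls) {
        self.ifls = value.0;
    }

    /// Read, let the caller modify, write back - the write cannot be skipped
    pub fn modify_ifls<F>(&mut self, f: F)
    where
        F: FnOnce(&mut Uartifls),
    {
        let mut value = self.read_ifls();
        f(&mut value);
        self.write_ifls(value);
    }
}
```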

    In terms of the methods exported from the API, a lot of constants are required, with a bunch of shifting and masking to get the right bits. These methods are therefore often auto-generated, either ahead of time or using proc-macros, because getting all the shifts and masks correct is a fiddly business.
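
    As a rough illustration of what that generation involves, the hypothetical bitfield! macro below stamps out a raw getter and setter from nothing more than a field's lowest bit and width, so the shift-and-mask arithmetic is written exactly once. It is a declarative macro_rules! macro rather than a proc-macro, to keep it self-contained, and real generators emit far more (typed enums, documentation, reset values).

```rust
/// Hypothetical helper: generate a getter and setter for a bitfield,
/// given the field's least-significant bit and its width in bits.
macro_rules! bitfield {
    ($get:ident, $set:ident, $lsb:expr, $width:expr) => {
        /// Read the raw value of this field
        pub fn $get(&self) -> u32 {
            (self.0 >> $lsb) & ((1u32 << $width) - 1)
        }

        /// Replace this field, leaving all other bits untouched
        pub fn $set(&mut self, value: u32) {
            let mask: u32 = ((1u32 << $width) - 1) << $lsb;
            // clear the old bits, then OR in the new value truncated to the field width
            self.0 = (self.0 & !mask) | ((value << $lsb) & mask);
        }
    };
}

/// The contents of the UARTIFLS register
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Uartifls(pub u32);

impl Uartifls {
    bitfield!(txifsel, set_txifsel, 0, 3);
    bitfield!(rxifsel, set_rxifsel, 3, 3);
}
```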

    Documentation

    With all that set-up out of the way, what I really wanted to talk about was - how does the user discover these APIs? That is, upon viewing the documentation for some crate that provides an abstract interface to some hardware (built locally with cargo doc, or hosted on https://docs.rs), how quickly can they answer the following questions:

    • How can I see a list of hardware this crate can manage for me?
    • How can I create an object that represents my specific piece of hardware (e.g. an Arm PL011 UART peripheral at some MMIO memory address)?
    • How can I read, write, or modify specific registers on that hardware (e.g. the IFLS register)?
    • How can I read or write specific bitfields within those registers (e.g. the TXIFLSEL bitfield)?

    When developing these kinds of libraries, I like to do two things:

    1. Enable #![deny(missing_docs)] to remind me to document all my types and functions
    2. Run cargo doc --open after every API change, to see the documentation as my users will see it

    This is because whilst we develop the crates in our editor, looking at the source code, most users don't actually open the source code for their dependencies. They rely on the documentation, often as hosted on https://docs.rs. To me it's therefore vital that as a developer I inspect the documentation that is generated just as much as I test the machine code that the code compiles down to.
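
    As a small sketch of documentation written with that in mind: the module docs say where to start, and each type uses rustdoc intra-doc links to point at the methods a user needs next, so the answer to "how do I read a bitfield?" is one click away. The UART, its Status register and the bit positions are hypothetical, and deny(missing_docs) is applied to the module rather than the whole crate only to keep the example self-contained.

```rust
#[deny(missing_docs)]
pub mod uart {
    //! Registers and fields of a hypothetical three-register UART.
    //!
    //! Start at [`Status`] to see what the status register contains.

    /// The contents of the UART status register.
    ///
    /// Create one from a raw register value with [`Status::from_bits`],
    /// then query its fields with [`Status::tx_ready`] and [`Status::rx_ready`].
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct Status(u32);

    impl Status {
        /// Wrap a raw value read from the status register.
        pub const fn from_bits(raw: u32) -> Status {
            Status(raw)
        }

        /// `true` if the transmit FIFO can accept another byte (bit 0, assumed).
        pub const fn tx_ready(self) -> bool {
            (self.0 & 0b01) != 0
        }

        /// `true` if the receive FIFO holds at least one byte (bit 1, assumed).
        pub const fn rx_ready(self) -> bool {
            (self.0 & 0b10) != 0
        }
    }
}
```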

    The Key Players

    Let's look at some of the existing hardware abstraction toolkits in this space, and view them through the lens of Documentation. To help do this, I've come up with a very simple fictional piece of hardware - a UART with only three registers. I've then implemented a driver for it using each of the toolkits described below.

    The documentation for each of these packages is hosted at https://registry.ferrocene.dev/.

    svd2rust

    The svd2rust tool is a program that generates MMIO-based Rust source code from an XML description of the hardware in Arm's System View Description (SVD) format. These XML files describe all the Peripherals in a system (and the MMIO addresses they exist at), the Registers within those Peripherals, and the Bitfields within those Registers. Where bitfields have a well-defined set of values, it produces enum types that cover that set, and it offers read, write and modify functions for each register. Rather than generating code for one peripheral at a time, svd2rust generates an entire crate covering all the peripherals described in the SVD file, along with its interrupt vectors and other details. A crate generated with svd2rust is generally called a Peripheral Access Crate or 'PAC', and this tool (or others like it) is pretty standard for using Cortex-M based MCUs in Rust.

    The source code is at github.com/ferrous-systems/handling-system-registers/tree/main/svd2rust-example and the documentation is at registry.ferrocene.dev/docs/svd2rust-example.

    On the front page, we see a generic module and one for our single peripheral, called Uart. There is one structure called Peripherals and it contains a single instance for every peripheral in the system. The memory addresses for each of these peripherals are hard-coded into the crate, having been taken from the SVD file. At the bottom we see our UART driver, a type alias called Uart.

    Clicking through to Uart, we see one associated-const, and two methods - neither of which are particularly useful to the average user. It's not immediately apparent how we access any of the registers in our UART - but it turns out you need to click through to Uart::RegisterBlock, as linked from the type alias at the top.

    We now see three methods, one for each register. Clicking through the Uart::Status return type for the pub const fn status(&self) -> &Status function, the docs at the top tell us we can call read() and get a value of type Uart::status::R. I actually added these docs to svd2rust some time ago, because without them it was very difficult to find that R type.

    Clicking through we see that the R type has methods for tx_ready() and rx_ready(), and the return types of these functions have methods is_yes() and is_no() - this is all as per the SVD file that I wrote.

    To use this API in practice, you'd write something like:

    /// Represents a UART
    pub struct Uart {
        regs: svd2rust_example::Uart,
    }
    
    impl Uart {
        /// Create a UART driver, from the given low-level object
        pub const fn new(regs: svd2rust_example::Uart) -> Uart {
            Uart { regs }
        }
    
        /// Enable the UART
        pub fn enable(&mut self) {
            self.regs.control().modify(|_r, w| {
                w.en().set_bit();
                w
            });
        }
    
        /// Transmit a byte
        ///
        /// Blocks until space available
        pub fn transmit(&mut self, byte: u8) {
            while self.regs.status().read().tx_ready().is_no() {
                core::hint::spin_loop();
            }
            self.regs.data().write(|w| unsafe { w.byte().bits(byte) });
        }
    }
    
    /// Example program
    pub fn main() {
        let p = unsafe { svd2rust_example::Peripherals::steal() };
        let mut uart = Uart::new(p.Uart);
        uart.enable();
        uart.transmit(b'X');
    }
    

    Here we see that the write() method on a register has that closure-based API we discussed earlier. The data register's byte field has no enumerated values or write constraints in the SVD file, so svd2rust cannot check the value we pass, and we must unsafely write raw bits into it. Fields that carry more information get nicer, safe methods - as we saw with the en bit of the control register.
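
    One detail that is easy to miss is how write differs from modify. In svd2rust, write starts from the register's reset value, so any field you don't set goes back to its reset value, while modify starts from the value just read from the hardware. The toy model below shows the consequence; Reg, W and bit are simplified stand-ins rather than svd2rust's generated types (which pass a reader, not a u32, to the modify closure), and a reset value of zero is assumed.

```rust
use core::cell::Cell;

/// Hypothetical model of one register; a Cell stands in for the MMIO location
pub struct Reg {
    value: Cell<u32>,
}

/// Writer proxy handed to the closures, accumulating the bits to write
pub struct W {
    bits: u32,
}

impl W {
    /// Set or clear bit n (standing in for generated methods like en().set_bit())
    pub fn bit(&mut self, n: u32, set: bool) -> &mut W {
        if set {
            self.bits |= 1 << n;
        } else {
            self.bits &= !(1 << n);
        }
        self
    }
}

impl Reg {
    /// Value the hardware holds after reset (assumed to be zero here)
    pub const RESET: u32 = 0;

    pub const fn new(initial: u32) -> Reg {
        Reg { value: Cell::new(initial) }
    }

    pub fn read(&self) -> u32 {
        self.value.get()
    }

    /// Like svd2rust's write: start from the reset value, not the current one
    pub fn write<F>(&self, f: F)
    where
        F: FnOnce(&mut W) -> &mut W,
    {
        let mut w = W { bits: Self::RESET };
        f(&mut w);
        self.value.set(w.bits);
    }

    /// Like svd2rust's modify: start from the current value
    pub fn modify<F>(&self, f: F)
    where
        F: FnOnce(u32, &mut W) -> &mut W,
    {
        let current = self.value.get();
        let mut w = W { bits: current };
        f(current, &mut w);
        self.value.set(w.bits);
    }
}
```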

    To summarise, because svd2rust uses SVD files, it is only suitable for MMIO based peripherals that have an SVD file. We have seen that svd2rust generates code that is fairly comprehensive (assuming the SVD input is correct and complete), and that whilst code using the API is fairly readable, it can be quite difficult to navigate the documentation to find the names of the fields and the registers.

    tock-registers

    The tock-registers crate was designed for use in TockOS, a real-time operating system written in Rust and focussed on executing multiple mutually-untrusted applications in a safe and reliable manner. Rather than feeding an SVD file into a tool which generates Rust, tock-registers provides macros (register_structs! and register_bitfields!) that let you define your peripherals inside your Rust source code.

    The source code is at github.com/ferrous-systems/handling-system-registers/tree/main/tock-registers-example and the documentation is at registry.ferrocene.dev/docs/tock-registers-example.

    Looking at the documentation we see structures called UartRegisters and Uart. These are similar to the RegisterBlock and the Uart types from the svd2rust API. Because the crate was hand-written, I've added the high-level new and transmit functions directly on the Uart type. Looking at UartRegisters we see the three fields clearly enough, and clicking through to the ReadWrite type used in those fields, we see methods for read, write, set etc.

    One difference with this API is that read and write take a value that represents which field you want to read or write (like status.read(Status::tx_ready)) - whereas svd2rust would always read the whole register and then let you access specific fields within it (like status.read().tx_ready()). This read-once-access-many mode of operation is available in tock-registers (using the extract method), but most examples I see access one bitfield at a time. Either style works, it's just something you have to be expecting so you know whether the read method wants an argument or not.

    To use this API in practice, you'd write something like:

    /// Represents a UART
    pub struct Uart {
        // UartRegisters is the type tock-registers has generated
        regs: &'static mut UartRegisters,
    }
    
    impl Uart {
        /// Create a UART driver, with the UART at the given address
        ///
        /// # Safety
        ///
        /// The pointer `addr` must point to a valid UART structure, with
        /// appropriate alignment.
        pub const unsafe fn new(addr: *mut UartRegisters) -> Uart {
            Uart {
                regs: unsafe { &mut *addr },
            }
        }
    
        /// Configure the UART
        pub fn configure(&mut self, enabled: bool, baud: u32, stop_bits: Control::stop_bits::Value) {
            use tock_registers::interfaces::{ReadWriteable};
    
            self.regs.control.modify(
                Control::stop_bits.val(stop_bits as u32)
                    + Control::enable.val(enabled as u32)
                    + Control::baud_rate.val(baud),
            );
        }
    
        /// Transmit a byte
        ///
        /// Blocks until space available
        pub fn transmit(&mut self, byte: u8) {
            use tock_registers::interfaces::{Readable, Writeable};
    
            while self.regs.status.read(Status::tx_ready) == 0 {
                core::hint::spin_loop();
            }
            self.regs.fifo.set(byte as u32);
        }
    }
    

    We can see that the 'modify' function in tock-registers takes a single FieldValue, which you can create by adding together different FieldValues. I found the syntax quite hard to get right, and auto-complete couldn't really help. In particular, each field creates both a module and a const of type Field, and if you pick the wrong one in the auto-complete pop-up in your editor, you don't see the methods you are looking for.
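
    It may help to see what a field value actually holds. The following is a simplified model, not tock-registers' real FieldValue (which is generic over the register width and tied to a particular register): each value carries a mask of the bits it owns and the new contents of those bits, adding two values merges both, and modify clears the masked bits of the old register contents before ORing in the new ones.

```rust
use core::ops::Add;

/// Simplified model of a field value: which bits a field occupies (mask)
/// and the new contents for those bits (value, already shifted into place)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FieldValue {
    mask: u32,
    value: u32,
}

impl FieldValue {
    /// A field width bits wide starting at bit shift, set to val
    pub const fn new(shift: u32, width: u32, val: u32) -> FieldValue {
        let mask = ((1u32 << width) - 1) << shift;
        FieldValue { mask, value: (val << shift) & mask }
    }

    /// Apply to the old register contents: clear the masked bits, OR in the new ones
    pub const fn modify(self, old: u32) -> u32 {
        (old & !self.mask) | self.value
    }
}

/// a + b means "change the bits owned by a and the bits owned by b"
impl Add for FieldValue {
    type Output = FieldValue;

    fn add(self, rhs: FieldValue) -> FieldValue {
        FieldValue {
            mask: self.mask | rhs.mask,
            value: self.value | rhs.value,
        }
    }
}
```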

    The final issue with tock-registers is that it relies on creating references to MMIO addresses. As noted earlier, this is considered unsound, and although the code generally works in practice, hopefully a solution is forthcoming.

    In summary, it's nice to be able to create peripherals one at a time rather than having a whole crate auto-generated. The definition code is quite readable, but it can be tricky to find the syntax to read, write and modify registers and their bitfields. The documentation could give more guidance here too.

    safe-mmio

    Our next candidate is safe-mmio from Google. This crate is specifically designed to solve the "no references to MMIO address space" problem, by using structs to hold *mut MyPeripheral pointers, and function-like macros (like field!) to do the conversion from pointer-to-peripheral into pointer-to-peripheral-register without intermediate references. Unlike tock-registers, it doesn't handle bitfields - only register level access. The crates I've seen using safe-mmio typically combine it with the bitflags crate to provide support for individual bitfields, so that's what I've done here.

    The source code is at github.com/ferrous-systems/handling-system-registers/tree/main/safe-mmio-example and the documentation is at registry.ferrocene.dev/docs/safe-mmio-example.

    Starting with the documentation, as we did before, we've got a repr(C) structure called UartRegisters which very clearly sets out the registers we have available. This actually looks very much like tock-registers. Clicking through to the Control type, we have constants for the bitfields within our register, and some methods which talk about unions and intersections, but there's no obvious method for how to modify a bitfield. It turns out you create a Control value for each bitfield within the register, then OR them together (with |) and write the combined value out to the register. The example code looks like this:

    /// UART parity
    pub enum Parity {
        /// No Parity
        None,
        /// Odd Parity
        Odd,
        /// Even Parity
        Even,
    }
    
    /// Represents a UART
    pub struct Uart<'a> {
        regs: UniqueMmioPointer<'a, UartRegisters>,
    }
    
    impl<'a> Uart<'a> {
        /// Create a UART driver, with the UART at the given address
        pub const fn new(regs: UniqueMmioPointer<'a, UartRegisters>) -> Uart<'a> {
            Uart { regs }
        }
    
        /// Configure the UART
        pub fn configure(&mut self, enabled: bool, baud: u32, parity: Parity) {
            let p = match parity {
                Parity::None => Control::empty(),
                Parity::Odd => Control::PARITY_ENABLE,
                Parity::Even => Control::PARITY_ENABLE | Control::PARITY_EVEN,
            };
            let en = if enabled {
                Control::ENABLE
            } else {
                Control::empty()
            };
            let baud = Control::from_bits((baud << 1) & Control::BAUD_RATE.bits()).unwrap();
            field!(self.regs, control).write(p | en | baud);
        }
    
        /// Transmit a byte
        ///
        /// Blocks until space available
        pub fn transmit(&mut self, byte: u8) {
            while !field!(self.regs, status).read().contains(Status::TX_READY) {
                core::hint::spin_loop();
            }
            field!(self.regs, fifo).write(byte as u32);
        }
    }
    

    Again, I found this a bit fiddly to write - it wasn't obvious how to set the baud rate field, for example. I also had to make a custom enum Parity because the bitflags macro didn't create an enum for me. Registers are accessed through code like field!(self.regs, fifo).write(byte as u32), which can be hard to read, but it's not too bad to write once you know the syntax. One major issue I find, though, is creating the UniqueMmioPointer handle that refers to the peripheral (it's basically a *mut UartRegisters but with added ownership semantics). The UniqueMmioPointer::new function wants a core::ptr::NonNull, which is reasonable enough, but to create one of those you have to jump through some hoops…

    // With tock-registers, I can write:
    let mut uart = unsafe { Uart::new(0x4000_0000 as *mut UartRegisters) };
    // With safe-mmio, I have to write:
    let mut uart = Uart::new(unsafe {
        UniqueMmioPointer::new(NonNull::new_unchecked(0x4000_0000 as *mut UartRegisters))
    });
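    One way to take some of the sting out of this is a small helper that does the null check for you. The sketch below is hypothetical (mmio_nonnull is not part of safe-mmio or core); it swaps NonNull::new_unchecked for the checked NonNull::new, so a zero address panics rather than being undefined behaviour:

```rust
use core::ptr::NonNull;

/// Hypothetical helper (not part of safe-mmio): turn a fixed MMIO base
/// address into a `NonNull<T>`, panicking if the address is zero instead
/// of relying on `NonNull::new_unchecked`.
pub fn mmio_nonnull<T>(addr: usize) -> NonNull<T> {
    NonNull::new(addr as *mut T).expect("MMIO base address must not be zero")
}
```

    You would still need unsafe around UniqueMmioPointer::new itself, because no helper can prove that the address really is your UART, or that nothing else holds a handle to it: let mut uart = Uart::new(unsafe { UniqueMmioPointer::new(mmio_nonnull(0x4000_0000)) });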
    

    One additional benefit of safe-mmio is that it avoids the ptr::write_volatile and ptr::read_volatile APIs. Those APIs do work, but on Arm architectures LLVM will sometimes choose an instruction encoding that does register writeback: a single assembly instruction that performs a load/store at the address held in a register and also updates that register, for example by adding four to it.

    // post-indexed store: write "w9" to (address in x0), *and then* set x0 = (x0 + 4)
    str w9, [x0], #4
    

    These instructions are quite handy - one instruction is smaller and faster than two - however, there's an issue. When code is running on the AArch64 architecture on top of a hypervisor, and the memory region is set to trap into the hypervisor (say, because it's a virtual UART that the hypervisor is emulating on behalf of the guest OS), the hypervisor gets told about the load or store, and the address, but doesn't get told about the writeback. This then leads to misexecution of the program, because the register isn't updated as it should be. The workaround safe-mmio uses is to provide its own functions for volatile reads and writes, implemented with inline assembly using instructions that do not perform register writeback.
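    To give a feel for that workaround, here is a minimal sketch of a writeback-free 32-bit volatile read and write. It is not safe-mmio's actual code, and the function names are hypothetical. On AArch64 the assembly template is a plain ldr/str with base-register addressing, so there is no writeback; on other targets it falls back to ptr::read_volatile/ptr::write_volatile, which also lets you exercise it on a host machine:

```rust
/// Hypothetical writeback-free MMIO store (not safe-mmio's API).
///
/// # Safety
/// `addr` must be valid for a 4-byte-aligned volatile write.
#[cfg(target_arch = "aarch64")]
pub unsafe fn mmio_write_u32(addr: *mut u32, value: u32) {
    unsafe {
        // Plain `str wN, [xM]`: base-register addressing, no writeback.
        core::arch::asm!(
            "str {value:w}, [{addr}]",
            addr = in(reg) addr,
            value = in(reg) value,
            options(nostack, preserves_flags),
        );
    }
}

/// Hypothetical writeback-free MMIO load (not safe-mmio's API).
///
/// # Safety
/// `addr` must be valid for a 4-byte-aligned volatile read.
#[cfg(target_arch = "aarch64")]
pub unsafe fn mmio_read_u32(addr: *const u32) -> u32 {
    let value: u32;
    unsafe {
        // Plain `ldr wN, [xM]`: base-register addressing, no writeback.
        core::arch::asm!(
            "ldr {value:w}, [{addr}]",
            addr = in(reg) addr,
            value = out(reg) value,
            options(nostack, preserves_flags),
        );
    }
    value
}

/// Fallback for other architectures: ordinary volatile write.
#[cfg(not(target_arch = "aarch64"))]
pub unsafe fn mmio_write_u32(addr: *mut u32, value: u32) {
    unsafe { core::ptr::write_volatile(addr, value) }
}

/// Fallback for other architectures: ordinary volatile read.
#[cfg(not(target_arch = "aarch64"))]
pub unsafe fn mmio_read_u32(addr: *const u32) -> u32 {
    unsafe { core::ptr::read_volatile(addr) }
}
```

    Because the instruction text is fixed, LLVM cannot fold an address increment into the access, whatever the surrounding code looks like.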

    Overall, I like what safe-mmio is trying to do, even if the constructors for the MMIO handle types are a little onerous. I'm less impressed with bitflags, and I would look for an alternative crate if my registers had bitfields wider than a single bit.

    derive-mmio

    So, funny story. I wrote derive-mmio at almost exactly the same time as Google wrote safe-mmio. If I'd seen their crate beforehand, I might not have written it, and perhaps vice versa. However, I think it's interesting to note where we decided to do the same things, and where we took a different path.

    As with tock-registers and safe-mmio, we create a repr(C) struct to describe our peripheral. However, instead of using special types to mark each register as read-write or read-only, we use annotations that are understood by the derive-mmio macro.

    The source code is at github.com/ferrous-systems/handling-system-registers/tree/main/derive-mmio-example and the documentation is at registry.ferrocene.dev/docs/derive-mmio-example.

    Opening the documentation, we see two types: MmioUartRegisters, which is an MMIO handle (like a UniqueMmioPointer from safe-mmio), and UartRegisters (our repr(C) struct). Clicking through to the struct, we see the fields clearly set out and documented. You can also click through and see details about the type of each register. For the register types, I have opted to use the bitbybit crate. We can see that the Control type has methods for each bitfield, like fn baud_rate(&self) -> u23 and fn set_enable(&mut self, field_value: bool). Note that u23 is not a typo: the bitbybit crate uses arbitrary-int to provide integers which can be any number of bits in width.
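    If arbitrary-width integers are new to you, the idea can be sketched in a few lines. The U23 type below is a hypothetical stand-in written for illustration, not arbitrary-int's implementation: it stores the value in a u32 and refuses anything that doesn't fit in 23 bits, which is what stops an out-of-range baud rate from silently spilling into neighbouring bits (compare the (baud << 1) & mask trick in the safe-mmio version, which just truncates):

```rust
/// Hypothetical stand-in for arbitrary-int's `u23`: a 23-bit unsigned
/// integer stored in a `u32`. Illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct U23(u32);

impl U23 {
    /// Width of the type in bits.
    pub const BITS: u32 = 23;
    /// Largest representable value, 2^23 - 1.
    pub const MAX: u32 = (1 << Self::BITS) - 1;

    /// Returns `None` if `value` does not fit in 23 bits.
    pub const fn try_new(value: u32) -> Option<Self> {
        if value <= Self::MAX {
            Some(Self(value))
        } else {
            None
        }
    }

    /// Panics if `value` does not fit in 23 bits.
    pub const fn new(value: u32) -> Self {
        match Self::try_new(value) {
            Some(v) => v,
            None => panic!("value does not fit in 23 bits"),
        }
    }

    /// The underlying value, always <= `MAX`.
    pub const fn value(self) -> u32 {
        self.0
    }
}
```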

    Looking at the MmioUartRegisters type, we see methods to read, write and/or modify each register according to the attributes given to it in the structure definition. That is, we have auto-generated methods (with documentation) rather than making users use a field! macro, as safe-mmio does.
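    To make that concrete, here is a hand-written approximation of what such a handle could look like. This is not derive-mmio's generated code: the register layout is assumed, the registers are plain u32 values rather than bitbybit types, and only a few methods are shown. The key points are that the handle stores a NonNull pointer rather than a reference to the registers, every access is a volatile read or write of one whole register, and modify is a read, a closure call, and a write:

```rust
use core::marker::PhantomData;
use core::ptr::{addr_of, addr_of_mut, NonNull};

/// Assumed register layout for illustration (plain u32 registers).
#[repr(C)]
pub struct UartRegisters {
    pub control: u32,
    pub status: u32,
    pub fifo: u32,
}

/// Hand-written approximation of an MMIO handle. It holds a raw pointer,
/// never a `&` or `&mut` to the registers, and is tied to lifetime `'a`.
pub struct MmioUartRegisters<'a> {
    ptr: NonNull<UartRegisters>,
    _marker: PhantomData<&'a mut UartRegisters>,
}

impl<'a> MmioUartRegisters<'a> {
    /// # Safety
    /// `ptr` must point to a register block valid for volatile access,
    /// and no other handle may access it during `'a`.
    pub const unsafe fn new(ptr: NonNull<UartRegisters>) -> Self {
        Self { ptr, _marker: PhantomData }
    }

    /// Volatile read of the whole control register.
    pub fn read_control(&self) -> u32 {
        // SAFETY: validity is guaranteed by the contract of `new`.
        unsafe { addr_of!((*self.ptr.as_ptr()).control).read_volatile() }
    }

    /// Volatile write of the whole control register.
    pub fn write_control(&mut self, value: u32) {
        unsafe { addr_of_mut!((*self.ptr.as_ptr()).control).write_volatile(value) }
    }

    /// Read-modify-write: read, pass the value to `f`, write back the result.
    pub fn modify_control(&mut self, f: impl FnOnce(u32) -> u32) {
        let value = self.read_control();
        self.write_control(f(value));
    }

    /// Volatile read of the status register.
    pub fn read_status(&self) -> u32 {
        unsafe { addr_of!((*self.ptr.as_ptr()).status).read_volatile() }
    }

    /// Volatile write to the FIFO register.
    pub fn write_fifo(&mut self, value: u32) {
        unsafe { addr_of_mut!((*self.ptr.as_ptr()).fifo).write_volatile(value) }
    }
}
```

    Because the handle holds a pointer rather than a reference, it sidesteps the MMIO reference types that svd2rust and tock-registers rely on.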

    Using the API looks like this:

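    use arbitrary_int::u23;
    // MmioUartRegisters and Parity come from the register definitions (not shown).
    
    /// Represents a UART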
    pub struct Uart<'a> {
        regs: MmioUartRegisters<'a>,
    }
    
    impl<'a> Uart<'a> {
        /// Create a UART driver, with the UART at the given address
        pub const fn new(regs: MmioUartRegisters<'a>) -> Uart<'a> {
            Uart { regs }
        }
    
        /// Configure the UART
        pub fn configure(&mut self, enabled: bool, baud: u32, parity: Parity) {
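            self.regs.modify_control(|w| {
                // bitbybit's with_* methods return an updated copy rather than
                // changing `w` in place, so chain them and return the result.
                w.with_baud_rate(u23::from_u32(baud))
                    .with_enable(enabled)
                    .with_parity(parity)
            });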
        }
    
        /// Transmit a byte
        ///
        /// Blocks until space available
        pub fn transmit(&mut self, byte: u8) {
            while !self.regs.read_status().tx_ready() {
                core::hint::spin_loop();
            }
            self.regs.write_fifo(byte as u32);
        }
    }
    

    The closure syntax is inspired by the one from svd2rust, and it lends itself fairly well to auto-completion. The read APIs also return complete registers by default, rather than the tock-registers approach of reading only specific fields.

    There is also a neat constructor, generated on UartRegisters, which just takes a usize address and returns an MmioUartRegisters:

    let regs = unsafe { UartRegisters::new_mmio_at(0x4000_0000) };
    

    However, at this time, derive-mmio does rely on ptr::read_volatile and ptr::write_volatile, and so carries the risk that LLVM will choose a writeback load/store instruction that an AArch64 hypervisor cannot emulate correctly if the I/O operation is trapped. Writeback-free accessors are something I think I will add in the future, because AArch64 support will be increasingly important for embedded systems as Cortex-R82-based devices hit the market.

    If you want to use this API style with a co-processor register, you can use the bitbybit types and write a struct with read and write methods that perform the appropriate inline assembly operations.
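    As a rough sketch of that pattern, the example below hand-writes the bitfield type instead of using bitbybit (so it has no dependencies), and wraps the AArch64 virtual timer control register, CNTV_CTL_EL0, whose bit 0 is ENABLE, bit 1 is IMASK and bit 2 is the read-only ISTATUS. The register choice and all type names are illustrative assumptions. The mrs/msr accessors only exist when building for AArch64, and are unsafe because whether the current exception level may access the register depends on how the system is configured:

```rust
/// Hand-written bitfield view of CNTV_CTL_EL0 (virtual timer control),
/// standing in for a bitbybit-generated type. Illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CntvCtl(u64);

impl CntvCtl {
    const ENABLE: u64 = 1 << 0; // bit 0: timer enabled
    const IMASK: u64 = 1 << 1; // bit 1: timer interrupt masked
    const ISTATUS: u64 = 1 << 2; // bit 2: timer condition met (read-only)

    pub const fn new_with_raw_value(raw: u64) -> Self {
        Self(raw)
    }

    pub const fn raw_value(self) -> u64 {
        self.0
    }

    /// Set or clear the bits in `mask`, returning an updated copy.
    const fn with_bits(self, mask: u64, on: bool) -> Self {
        if on { Self(self.0 | mask) } else { Self(self.0 & !mask) }
    }

    pub const fn enable(self) -> bool { (self.0 & Self::ENABLE) != 0 }
    pub const fn with_enable(self, on: bool) -> Self { self.with_bits(Self::ENABLE, on) }
    pub const fn imask(self) -> bool { (self.0 & Self::IMASK) != 0 }
    pub const fn with_imask(self, masked: bool) -> Self { self.with_bits(Self::IMASK, masked) }
    /// Read-only status bit, so there is deliberately no `with_istatus`.
    pub const fn istatus(self) -> bool { (self.0 & Self::ISTATUS) != 0 }
}

/// Zero-sized accessor for the system register itself.
pub struct CntvCtlEl0;

#[cfg(target_arch = "aarch64")]
impl CntvCtlEl0 {
    /// # Safety
    /// Must run at an exception level permitted to access CNTV_CTL_EL0.
    pub unsafe fn read() -> CntvCtl {
        let raw: u64;
        unsafe {
            core::arch::asm!("mrs {}, CNTV_CTL_EL0", out(reg) raw, options(nostack, preserves_flags));
        }
        CntvCtl::new_with_raw_value(raw)
    }

    /// # Safety
    /// As for `read`; writing also enables or disables the virtual timer.
    pub unsafe fn write(value: CntvCtl) {
        unsafe {
            core::arch::asm!("msr CNTV_CTL_EL0, {}", in(reg) value.raw_value(), options(nostack, preserves_flags));
        }
    }
}
```

    A read-modify-write then reads naturally: unsafe { CntvCtlEl0::write(CntvCtlEl0::read().with_enable(true).with_imask(false)) }.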

    In summary, I've designed this approach to be documentation-friendly as well as readable. It leans on auto-generated methods (with docs) rather than trait implementations (which only get documented at the trait level, not the implementation level). It borrows ideas from svd2rust, whilst also letting you build your driver peripheral by peripheral, as tock-registers has done for many years. One thing I might change is moving the UART register types into a registers sub-module, to keep them distinct from the types that represent the peripheral as a whole.

    Rounding Up

    We've talked about three different ways to access hardware - I/O addressing, Co-Processor (or System) Registers, and Memory-Mapped I/O. We then looked at the four major approaches for building MMIO APIs - svd2rust, tock-registers, safe-mmio + bitflags and derive-mmio + bitbybit.

    There is no perfect solution, and each of these has its merits. You might like svd2rust (or one of its derivatives like chiptool) because it generates all the low-level drivers for your whole MCU at once. But if you have an automotive SoC with a 10,000 page datasheet and no SVD file, that approach doesn't work. It would also be nice to see svd2rust finally move away from MMIO reference types, something that also applies to tock-registers. I like how tock-registers lets you define your drivers one peripheral at a time; however, I struggle to find the documentation I need for any given field or register. I like that safe-mmio solves the AArch64 hypervisor problem at the same time as solving the MMIO reference problem, but I would probably combine it with bitbybit rather than bitflags, because the bitbybit APIs just work better where fields are enumerations or values wider than simple booleans.

    Finally, and fairly obviously because it's the one I wrote, I would urge you to look at derive-mmio. I would also urge you to run cargo doc on your own software a little more often, and ask yourself, "How will my users be able to use this documentation to answer their questions?"