Embedded Rust pattern - Zero Sized References

A "Zero Sized Reference" (ZSR) sounds impossible, given that mem::size_of returns a non-zero value for references to Zero Sized Types (ZSTs) like &(). But ZSRs can in fact be constructed, and they can improve both the performance and the correctness of your embedded application.
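
The premise is easy to check on a host machine. The sketch below is only an illustration: `Token` is a hypothetical unit struct standing in for the kind of proxy type this post builds up to.

```rust
use core::mem::size_of;

/// Hypothetical zero sized "proxy" type; it carries no data at all
struct Token;

/// Size of a reference to a Zero Sized Type: still one full pointer
fn size_of_ref_to_zst() -> usize {
    size_of::<&'static ()>()
}

/// Size of the proxy type itself: zero bytes
fn size_of_proxy() -> usize {
    size_of::<Token>()
}
```

A reference to `()` still occupies a full pointer, whereas the unit struct occupies nothing.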

In this post, we'll introduce you to this pattern, which is already used in many embedded crates, though many developers may not have given it much attention.

Let's start by motivating the need for zero sized references.

Automatic deallocation for memory pools

Some embedded applications are simple enough that a memory pool that manages memory blocks of the same size (e.g. 128 bytes) – instead of a full blown general-purpose memory allocator – is sufficient to satisfy their dynamic memory management needs.
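
To make "a pool of same-sized blocks" concrete, here is a toy sketch. It makes several assumptions that do not hold for real pools: `BlockPool` is a hypothetical name, it is single-threaded (`&mut self`) rather than lock-free, it hands out block indices instead of pointers, and it does not guard against double frees.

```rust
/// Toy single-threaded pool of `N` blocks of 128 bytes each
/// (hypothetical; not lock-free, no double-free protection)
struct BlockPool<const N: usize> {
    blocks: [[u8; 128]; N],
    free: [usize; N], // stack of free block indices
    free_len: usize,  // number of valid entries in `free`
}

impl<const N: usize> BlockPool<N> {
    fn new() -> Self {
        // initially every block is free
        let mut free = [0usize; N];
        for (i, slot) in free.iter_mut().enumerate() {
            *slot = i;
        }
        Self { blocks: [[0; 128]; N], free, free_len: N }
    }

    /// Pops a free block index; `None` when the pool is exhausted
    fn alloc(&mut self) -> Option<usize> {
        if self.free_len == 0 {
            return None;
        }
        self.free_len -= 1;
        Some(self.free[self.free_len])
    }

    /// Pushes a block index back onto the free stack
    fn dealloc(&mut self, index: usize) {
        self.free[self.free_len] = index;
        self.free_len += 1;
    }

    /// Access to the memory of an allocated block
    fn block_mut(&mut self, index: usize) -> &mut [u8; 128] {
        &mut self.blocks[index]
    }
}
```

Allocation and deallocation are a single pop or push on the free stack, which is why such pools are attractive when all blocks share one size.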

The heapless crate provides a lock-free memory pool for such cases. The core of the Pool API is shown below:

// module: heapless::pool

/// Lock-free memory pool that manages blocks of the same
/// size (`size_of::<T>`)
pub struct Pool<T> { /* .. */ }

/// An owning pointer into a block managed by `Pool<T>`
pub struct Box<T> { data: NonNull<Node<T>> }

/// LIFO linked list (AKA stack) node
struct Node<T> { /* .. */ }

impl<T> Pool<T> {
    /// Creates an empty memory pool
    pub const fn empty() -> Self { /* .. */ }

    /// Returns `None` when the pool is observed as exhausted
    // (if you are wondering why there's no lifetime relationship
    //  between `self` and the returned value, that's because
    //  `Pool` manages "statically allocated" (`&'static mut`)
    //  memory)
    pub fn alloc(&self) -> Option<Box<T>> { /* .. */ }

    /// Returns the memory `block` to the pool
    pub fn dealloc(&self, block: Box<T>) { /* .. */ }

    // omitted: API to give an initial chunk of memory to the pool
}

Notably, Box<T> does not implement the Drop trait. This means a block is not automatically returned to the memory pool when it goes out of scope.

static P: Pool<[u8; 128]> = Pool::empty();

fn main() {
    let x = P.alloc().expect("OOM");
    // do stuff with `x`

    // oops, this leaks memory -- the memory block is gone forever
    drop(x);

    // to avoid leaking memory you have to call this:
    // P.dealloc(x);
}

Let's try to fix that!

Implementing Drop

One may come up with this solution:

pub struct Box<T> {
    data: NonNull<Node<T>>,
    // added this ...
    pool: &'static Pool<T>,
}

impl<T> Pool<T> {
    // omitted: empty, alloc and dealloc methods

    // ... and this ...
    unsafe fn dealloc_raw(&self, p: NonNull<Node<T>>) { /* .. */ }
}

// ... so we can implement this
impl<T> Drop for Box<T> {
    fn drop(&mut self) {
        unsafe {
            // run T's destructor
            ptr::drop_in_place(self.data.as_ptr());

            // dealloc memory block
            self.pool.dealloc_raw(self.data);
        }
    }
}

This gets the job done: Boxes are returned to their Pool when they go out of scope. However, this solution also doubles the size of pool::Box<T> (e.g. from 4 bytes to 8 bytes on a 32-bit architecture like ARMv7-M), which regresses the performance of moving Boxes around.

If doubling the size of Box doesn't sound like a big issue to you I shall tell you that most error handling libraries go out of their way (even tapping into the unsafe arts) to make their Error type a thin pointer, instead of a fat pointer (e.g. Box<dyn std::error::Error>), because the perf gains are significant.
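
The size claims above can be checked on a host with a few lines. This is a sketch with stand-in types: `Pool`, `ThinBox` and `FatBox` are hypothetical and only mirror the field layout of the two `Box<T>` versions discussed so far.

```rust
use std::error::Error;
use std::mem::size_of;
use std::ptr::NonNull;

/// Stand-in for `Pool<T>`; its contents do not matter here
struct Pool;

/// One pointer: the shape of the original `Box<T>`
struct ThinBox {
    data: NonNull<u8>,
}

/// Two pointers: the shape of `Box<T>` after adding the `pool` field
struct FatBox {
    data: NonNull<u8>,
    pool: &'static Pool,
}

/// (thin, fat, boxed trait object) sizes in bytes
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<ThinBox>(),
        size_of::<FatBox>(),
        size_of::<Box<dyn Error>>(),
    )
}
```

On any target the second and third values are twice the first: the extra reference costs a full pointer even though `Pool` itself is zero sized in this sketch, and a boxed trait object is a fat pointer (data pointer plus vtable pointer).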

Going back to the task at hand: can we implement Drop without increasing the size of Box? Yes!

Why use references when you can use ZST?!

We can actually keep the two-field Box implementation from the last section and simply shrink the pool field from 4 (or 8) bytes to 0 bytes by using a Zero Sized Type instead of a reference. We'll need to introduce a Pool trait into the mix to make things work out. Here's the revised version:

// module: heapless::pool

// put the revised API in a new module
pub mod singleton {
    /// A memory pool
    pub trait Pool {
        // the type of the memory block managed by this pool
        type Data;
        // ^ this is the `T` in the original `Box<T>` version

        /* Public API */
        fn alloc() -> Option<Box<Self>>;

        /* Implementation details */
        #[doc(hidden)]
        unsafe fn __dealloc_raw(p: NonNull<Node<Self::Data>>);
    }

    pub struct Box<P: Pool> {
        data: NonNull<Node<P::Data>>,
        _pool: PhantomData<P>, // zero sized type, not a reference
    }

    impl<P: Pool> Drop for Box<P> {
        fn drop(&mut self) {
            unsafe {
                ptr::drop_in_place(self.data.as_ptr());
                // NOTE static method call: no receiver (`&self`)
                P::__dealloc_raw(self.data);
            }
        }
    }
}

// the original `Pool<T>` + `Box<T>` implementation still lives here

This may compile but how does one even use this new API? How do you implement the Pool trait? The answer is: you don't implement the Pool trait yourself; you use the pool! macro provided by the heapless crate:

pool!(P: [u8; 128]);

fn main() {
    // note that `alloc` is a static method (no receiver: `&self`)
    let x: Box<P> = P::alloc().expect("OOM");
    //         ^ NOTE `P`, not `[u8; 128]`
    // do stuff with `x`
    drop(x); // <- this returns the memory block to the pool :tada:
}

The pool! macro will expand to something like this:

// expansion of pool!(P: [u8; 128]);

use heapless::pool;

pub struct P; // <- the memory pool is a unit struct

impl pool::singleton::Pool for P {
    type Data = [u8; 128];

    fn alloc() -> Option<Box<P>> {
        // omitted: conversion from `pool::Box` to
        // `pool::singleton::Box`
        P::__impl().alloc().map(/* something */)
    }

    unsafe fn __dealloc_raw(p: NonNull<Node<[u8; 128]>>) {
        // forward to the `dealloc_raw` method of the hidden `Pool<T>`
        P::__impl().dealloc_raw(p)
    }
}

impl P {
    fn __impl() -> &'static pool::Pool<[u8; 128]> {
        static POOL: pool::Pool<[u8; 128]> = pool::Pool::empty();
        &POOL
    }
}

Singleton-ish

What the pool! macro is actually doing is creating some sort of global singleton where all instances of, for example, P are handles to the same static variable POOL, which is hidden from the user. In other words, P is a Zero Sized version of a &'static Pool Reference that points into the hidden POOL variable.

That's the basic idea behind Zero Sized References (ZSRs): a zero sized proxy type that's equivalent to a shared (&'static) reference. ZSRs come in different flavors: pool! creates a ZSR of the "shared" kind (&'static T) but there's also an "owned" variant, which we'll look into later.
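
To see the idea in isolation, here is a minimal host-side sketch. It is not the heapless implementation: the `Pool` trait below is a simplified, hypothetical stand-in that only counts free blocks instead of managing memory, and `Handle` plays the role of `singleton::Box`.

```rust
use core::marker::PhantomData;
use core::mem::size_of;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, simplified `Pool` trait: it only tracks
/// how many blocks are free
trait Pool {
    /// Hidden state; note there is no `&self` receiver
    fn free_blocks() -> &'static AtomicUsize;
}

/// Stand-in for `singleton::Box<P>`
struct Handle<P: Pool> {
    _pool: PhantomData<P>, // zero sized type, not a reference
}

impl<P: Pool> Handle<P> {
    /// Takes one block if any is left
    fn alloc() -> Option<Self> {
        P::free_blocks()
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .ok()
            .map(|_| Handle { _pool: PhantomData })
    }
}

impl<P: Pool> Drop for Handle<P> {
    fn drop(&mut self) {
        // static method call: the type `P` alone locates the pool
        P::free_blocks().fetch_add(1, Ordering::AcqRel);
    }
}

/// Unit struct that acts as the Zero Sized Reference
struct P;

impl Pool for P {
    fn free_blocks() -> &'static AtomicUsize {
        // the hidden static that every `P` refers to
        static FREE: AtomicUsize = AtomicUsize::new(2);
        &FREE
    }
}

/// Size of the handle in bytes
fn handle_size() -> usize {
    size_of::<Handle<P>>()
}
```

Because `Handle<P>` stores nothing but a `PhantomData`, it is zero sized in this sketch; the Box shown earlier additionally stores the `data` pointer, so it stays pointer sized.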

But first, let's look into the properties of the pool! abstraction.

Different types, same interface

In your embedded application you may end up using different memory pools, each one associated with a different part of your HAL, like the radio interface or the USB interface. Some of these memory pools may end up managing memory blocks of the same size. If you are using the non-pool! version of Pool, it may not be obvious which pool a memory block should be returned to. See below:

// two pools that manage blocks of the same size
static A: Pool<[u8; 128]> = Pool::empty();
static B: Pool<[u8; 128]> = Pool::empty();

fn do_stuff(boxed: Box<[u8; 128]>) {
    // ..

    // which one to call? (`dealloc` consumes `boxed`,
    // so only one of these calls can stay)
    A.dealloc(boxed);
    B.dealloc(boxed);
}

The boxed argument could come from either pool A or B. If you pick A.dealloc above you may end up exhausting pool B (all of B's blocks get transferred to pool A); if you pick B.dealloc you may exhaust pool A.

With the pool! version it's simply not possible to return a memory block to the wrong pool because boxes that belong to a pool are uniquely typed. Each pool! invocation creates a new static variable and gives that static variable a different type. See below:

pool!(A: [u8; 128]);
pool!(B: [u8; 128]);

fn do_stuff(boxed: Box<A>) { // <- box managed by pool A
    // ..

    // `dealloc` is not a method of a `Pool` trait
    // but if it were this would be rejected at compile time
    B::dealloc(boxed); //~ error: expected type `B`; found type `A`
}

Even though both pools, A and B, are proxies to static variables with the same type Pool<[u8; 128]> each pool is exposed to the user as a different type. This lets you track in the type system the association between a memory block and a pool.
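
The "new static variable, new type" part can be sketched with a small `macro_rules!` macro. Everything here is hypothetical and much simpler than the real pool! macro: `toy_pool!`, the `Pool` trait and `Tagged` only count returned blocks, which is enough to show that each handle finds its way back to its own pool.

```rust
use core::marker::PhantomData;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical pool interface: counts blocks that were returned
trait Pool {
    fn returned() -> &'static AtomicUsize;
}

/// Box-like handle tagged with the pool it belongs to
struct Tagged<P: Pool> {
    _pool: PhantomData<P>,
}

impl<P: Pool> Tagged<P> {
    fn new() -> Self {
        Tagged { _pool: PhantomData }
    }
}

impl<P: Pool> Drop for Tagged<P> {
    fn drop(&mut self) {
        P::returned().fetch_add(1, Ordering::Relaxed);
    }
}

/// Toy counterpart of `pool!`: a fresh type plus a fresh hidden static
macro_rules! toy_pool {
    ($name:ident) => {
        struct $name;

        impl Pool for $name {
            fn returned() -> &'static AtomicUsize {
                // a separate static per macro invocation
                static RETURNED: AtomicUsize = AtomicUsize::new(0);
                &RETURNED
            }
        }
    };
}

toy_pool!(A);
toy_pool!(B);

/// Only accepts handles from pool `A`;
/// passing a `Tagged<B>` is a type error
fn consume_a(handle: Tagged<A>) {
    drop(handle);
}
```

Both pools have identical hidden state, yet `Tagged<A>` and `Tagged<B>` are unrelated types, so the compiler keeps them apart.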

If for some reason you need to write a function that must work with boxes managed by different pools, you can write generic code:

// generic function ...
fn zero_before_dealloc<P, T>(boxed: Box<P>)
where
    P: Pool<Data = T>,
    T: Zeroable, // unsafe marker: implies no destructor, etc.
{
    let p: NonNull<Node<T>> = Box::into_raw(boxed);
    // zeroes the block, even in presence of compiler optimizations
    T::zero(p);
    unsafe {
        P::__dealloc_raw(p); // return it to the pool
    }
}

fn discard(a: Box<A>, b: Box<B>) {
    // ... that works with boxes from pool A and B
    zero_before_dealloc(a);
    zero_before_dealloc(b);
}

Zero Sized "Owned" References

The pool! macro creates a static variable proxy with global (static-variable-like) visibility, but this is not a strict requirement of the ZSR pattern. Here we present an "owned" variant that does not have global visibility:

/* Public API */
pub struct Proxy { // NOTE Zero Sized Type
    _marker: PhantomData<&'static mut Impl>,
}

impl Proxy {
    /// Returns the `Some` variant only once
    pub fn claim() -> Option<Proxy> {
        static CLAIMED: AtomicBool = AtomicBool::new(false);

        if CLAIMED
            .compare_exchange(false, true, SeqCst, SeqCst)
            .is_ok()
        {
            // not yet claimed
            // NOTE(unsafe) this branch is executed at most once

            unsafe { Self::__impl().write(Impl::new()) }

            // do not move the previous 'write' beyond this fence
            atomic::compiler_fence(SeqCst);

            Some(Self { _marker: PhantomData })
        } else {
            // already claimed
            None
        }
    }

    /// Frobs `arg`
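    pub fn frob(&mut self, arg: SomeArg) {
        // NOTE(unsafe) `IMPL` was initialized in `claim` and
        // `&mut self` guarantees exclusive access to it
        unsafe { (*Self::__impl()).frob(arg) }
    }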

    /* Implementation detail */
    fn __impl() -> *mut Impl {
        static mut IMPL: MaybeUninit<Impl> = MaybeUninit::uninit();
        unsafe { IMPL.as_mut_ptr() }
    }
}

/* Private API */
struct Impl { /* data */ }

impl Impl {
    // NOTE constructor does not need to be `const`
    fn new() -> Self { /* .. */ }
    fn frob(&mut self, arg: SomeArg) { /* .. */ }
}

Usage looks like this:

fn main() {
    // before claim: `IMPL` static variable is *un*initialized
    let mut proxy = Proxy::claim().expect("already claimed");
    // after claim: `IMPL` static variable is initialized

    // further calls to `claim` (from any thread) will return None
    assert!(Proxy::claim().is_none());

    // all methods will operate on an initialized static variable
    proxy.frob(SomeArg);
}

Proxy is a Zero Sized Reference that behaves like a normal variable and it's subject to the usual ownership semantics. We say this is an "owned" variant because Proxy is equivalent to a mutable reference with static lifetime (&'static mut T), which has the same move semantics as alloc::Box<T>.

So, where would you use this "owned" variant?

Spotted in the wild: MMIO registers

This "owned" variant of the ZSR pattern is widely used in the embedded Rust ecosystem. The API generated by svd2rust uses this pattern to:

  • make references to registers zero sized, for the same perf reasons as wanting to keep pool::Box pointer sized.

  • let you control access to peripherals via ownership, which gives you better encapsulation than C, where peripherals can be accessed from pretty much anywhere.

For reference, the svd2rust API as of version 0.17.0 looks like this:

#![no_std]

fn main() {
    // `take` is the same as `claim`: it returns
    // the `Some` variant only once
    let peripherals = nrf51::Peripherals::take().expect("already claimed");
    //                ^^^^^ Peripheral Access Crate (PAC)
    //                      generated by svd2rust

    // "the GPIO peripheral"
    // this handle grants exclusive access to the peripheral
    let gpio = peripherals.GPIO;
}

Unlike the general version of claim, Peripherals::take performs no initialization of static variables. These Zero Sized References are not pointing into static variables (RAM) but into Memory Mapped I/O registers (which have known addresses).
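
A host-side sketch of that shape is shown below. It is not the code svd2rust generates: the `Gpio` and `Peripherals` types, the use of an `AtomicBool` and the base address are all assumptions made for illustration, and the address is never dereferenced.

```rust
use core::mem::size_of;
use core::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical zero sized handle to a GPIO register block
struct Gpio {
    _private: (),
}

impl Gpio {
    /// Hypothetical base address; never dereferenced in this sketch
    const BASE: usize = 0x5000_0000;

    /// The address comes from the type, not from a stored pointer
    fn ptr(&self) -> *const u32 {
        Self::BASE as *const u32
    }
}

/// Bundle of all the (zero sized) peripheral handles
#[allow(non_snake_case)]
struct Peripherals {
    GPIO: Gpio,
}

impl Peripherals {
    /// Returns the `Some` variant only once
    fn take() -> Option<Self> {
        static TAKEN: AtomicBool = AtomicBool::new(false);

        // `swap` returns the previous value:
        // `false` only for the first caller
        if TAKEN.swap(true, Ordering::SeqCst) {
            None
        } else {
            // nothing to initialize: the registers already exist
            Some(Peripherals { GPIO: Gpio { _private: () } })
        }
    }
}

/// (size of `Peripherals`, size of `Gpio`) in bytes
fn handle_sizes() -> (usize, usize) {
    (size_of::<Peripherals>(), size_of::<Gpio>())
}
```

Moving `peripherals.GPIO` into a driver hands over the only handle to that register block, and doing so copies zero bytes.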

In recent work: filesystems

Recently, we came up with a Filesystem API based on the ZSR pattern for our friends at Iqlusion who are building armistice – a hardware private key storage for next-generation cryptography – as part of the development of a Hardware Abstraction Layer for the USB Armory Mk II development board.

We used the Zero Sized References pattern to implement "close on drop" semantics for files without storing a pointer to its filesystem in the file handle.

As with memory pools, you'll want a trait (Filesystem) that provides a common interface to different filesystem types.

/// NOTE do NOT implement this yourself;
/// use the `filesystem!` macro
pub unsafe trait Filesystem: Copy {
    /// Where does the filesystem live? RAM? on-chip FLASH?
    type StorageDevice: Storage;
    //                  ^^^^^^^
    // interface between the FS and the storage device

    /* Public API */
    fn mount(
        storage: Self::StorageDevice,
        format_before_mounting: bool,
    ) -> io::Result<Self>;

    /* Implementation details */
    #[doc(hidden)]
    unsafe fn __close_in_place(
        &self,
        f: &mut File<Self>, // not consumed by the method
    ) -> io::Result<()>;
}

/// A handle to an open file that lives in filesystem `FS`
pub struct File<FS: Filesystem> {
    // omitted fields: state, buffers, caches, etc.
    fs: FS, // Zero Sized Reference to the filesystem
}

impl<FS: Filesystem> File<FS> {
    /// NOTE must present a handle to the FS to "prove"
    /// it has been mounted
    pub fn create(
        fs: FS,
        path: impl AsRef<Path>,
    ) -> io::Result<Self> {
        // ..
    }

    pub fn write_all(
        &mut self,
        data: &[u8],
    ) -> io::Result<()> { /* .. */ }

    pub fn close(self) -> io::Result<()> { /* .. */ }
}

/// NOTE this panics if I/O errors occur while closing the file.
/// Use the `File::close` method, which returns a `Result`, to
/// handle those I/O errors
impl<FS: Filesystem> Drop for File<FS> {
    fn drop(&mut self) {
        let fs = self.fs; // make a copy
        // NOTE `self` is already a `&mut File<FS>`
        unsafe { fs.__close_in_place(self) }.expect("I/O error");
        // omitted: deallocate buffers
    }
}

You would use the API like this:

filesystem!(F, StorageDevice = uSD);

// get handle to micro SD card subsystem
let usd = uSD::claim()?;
let format = true;
let fs = F::mount(usd, format)?;

let mut f: File<F> = File::create(fs, "foo.txt")?;
f.write_all(b"Hello, file!")?;

// file will be closed automatically at the end of the scope
// (implicit `drop`)
// drop(f);

// OR you can close the file manually to handle errors
f.close()?;

Note how the API also prevents operations like "creating a file on a filesystem that has not yet been mounted" at compile time: File::create requires a handle to the filesystem and that handle can only exist after the filesystem has been mounted.
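
One detail the API listing leaves open is how an explicit `close(self)` coexists with a `Drop` implementation that also closes the file. The sketch below shows one possible approach, which is our assumption and not necessarily what the real implementation does: `close` skips the destructor with `mem::forget`. The trait is a stripped-down, hypothetical version of `Filesystem`, and `Ram` merely counts closed files.

```rust
use core::mem;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, stripped-down filesystem interface
trait Filesystem: Copy {
    /// Closes `file` without consuming it
    fn close_in_place(self, file: &mut File<Self>) -> Result<(), ()>;
}

struct File<FS: Filesystem> {
    fs: FS, // zero sized handle to the filesystem
}

impl<FS: Filesystem> File<FS> {
    fn create(fs: FS) -> Self {
        File { fs }
    }

    /// Explicit close: reports the error instead of panicking
    fn close(mut self) -> Result<(), ()> {
        let fs = self.fs; // make a copy
        let result = fs.close_in_place(&mut self);
        // skip `Drop`, otherwise the file would be closed twice
        mem::forget(self);
        result
    }
}

impl<FS: Filesystem> Drop for File<FS> {
    fn drop(&mut self) {
        let fs = self.fs; // make a copy
        fs.close_in_place(self).expect("I/O error");
    }
}

/// Toy filesystem that only counts how many files were closed
#[derive(Clone, Copy)]
struct Ram;

impl Ram {
    fn mount() -> Self {
        Ram
    }

    fn closed_files() -> &'static AtomicUsize {
        static CLOSED: AtomicUsize = AtomicUsize::new(0);
        &CLOSED
    }
}

impl Filesystem for Ram {
    fn close_in_place(self, _file: &mut File<Self>) -> Result<(), ()> {
        Self::closed_files().fetch_add(1, Ordering::Relaxed);
        Ok(())
    }
}
```

With this arrangement each file is closed exactly once, whether it is closed explicitly or simply goes out of scope.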

If you use the filesystem! macro to create one more filesystem, say one backed by RAM (so tmpfs-like), then that filesystem gets its own type, let's say R. All files include the filesystem they belong to in their type, so File<F> lives in FS F whereas File<R> lives in FS R. Having different types means that you can rule out, at compile time, operations like closing (committing) a file to the wrong filesystem:

filesystem!(F, StorageDevice = uSD); // non-volatile storage
filesystem!(R, StorageDevice = RAM); // volatile storage
// ..

let f = F::mount(usd, format)?;
let r = R::mount(ram_block, format)?;

let mut file: File<R> = File::create(r, "foo.txt")?;
file.write_all(b"Hello, file!")?;

// wrong filesystem!
F::close(file)?; //~ error: expected type `F`; found type `R`

This Filesystem implementation uses the littlefs C library under the hood to do the bulk of the work. Creating a safe Rust wrapper around that C library proved rather challenging – and probably deserves its own blog post – but at the end of the day Rust features let us defuse the traps in the C API so that consumers of the Rust wrapper won't run into them.

Food for thought: capabilities

If you are writing single-process embedded applications the above Filesystem API lets you restrict, at compile time, which parts of your application can use the FS.

Consider the following functions. You can tell which ones use the filesystem, and which filesystem, by just looking at their signatures. Or, in other words, the function signatures reflect the capabilities of the subroutine.

// filesystem stored in internal (on-chip) Flash memory
filesystem!(Internal, StorageDevice = FLASH);

// filesystem stored in external (on-board) SPI-NOR Flash
filesystem!(External, StorageDevice = SPI_NOR);

// can use either filesystem
fn create_lockfile(f: impl Filesystem) -> io::Result<()> {
    // ..
}

fn uses_the_internal_fs(f: Internal) -> io::Result<()> {
    // ..
}

fn uses_the_external_fs(f: External) -> io::Result<()> {
    // ..
}

fn cannot_use_any_fs(/* because no arguments */) { /* .. */ }

If you are structuring your embedded application as a set of tasks, either async tasks or reactive tasks, you can similarly control the capabilities of each task by moving, or not, a Filesystem handle into the task. With async tasks that may look like this:

#![no_std]

fn main() -> ! {
    let fs = Fs::mount(/* .. */).expect("mount failed");

    // tasks
    let first = uses_the_fs(fs);
    let second = cannot_use_the_fs();

    // run asynchronous tasks concurrently
    executor::run!(first, second)
}

async fn uses_the_fs(fs: Fs) -> ! { /* .. */ }

async fn cannot_use_the_fs() -> ! { /* .. */ }

If we had opted for a globally available fs API, like the one provided in the Rust standard library, it would be much harder to identify which parts of the application may perform FS operations.

Conclusion

This is one of the patterns we use to build correct-by-construction software for our clients. In this post we have covered some of the possible uses of this pattern but, for brevity, have left out (important) details like the concurrency (thread / interrupt) safety of the ZSR proxies and the extra mileage you can get out of the run-once section in the ZSR::claim constructor. Still, we hope that the post motivates you to try out this pattern in your embedded code!


Ferrous Systems is a Rust consultancy based in Berlin. Need to strengthen your Rust project with external development support? Looking for advice on how to get the best out of Rust features? Get in touch with us! Want to get up to speed with the language and its tooling? We also offer remote training on basic Rust, advanced topics and embedded Rust; to receive our updated training program, subscribe to our newsletter.