At Ferrous Systems, we've worked on a wide range of software in Rust, from embedded systems to highly concurrent asynchronous applications. While the environments that these systems operate in are wildly different, both boil down to the same problem statement:
How can I process multiple events coming from outside of my own system in a way that is both efficient at runtime and clear to develop and understand?
When you step back a little bit, server applications and embedded systems are not so different. They spend their time waiting for events to occur, whether that is a button press from a user, or a connection request from a browser. They then must handle many of these events concurrently and expediently, all the while waiting for the next event in a complex sequence of states to occur.
While a server might wait for a transaction with a database to complete so it can respond to a request, an embedded system might wait for a DMA transaction with an SD card to finish writing a log message. These events both take a relatively huge amount of time compared to operations within the CPU itself, and neither of these systems should stop the world when these events occur; they can be doing something much better with their limited resources (or be sleeping to save energy)!
We think that libraries like async-std have made it easier to let the compiler generate the necessary state machines for people writing highly concurrent and efficient server and desktop applications. But what if we could have these same benefits when writing Embedded Rust?
Why is this hard?
Currently, any use of async fn or .await in a #![no_std] crate will result in rather unhelpful compiler errors:
#![no_std]

use core::future::Future;

async fn a(f: impl Future) {
    f.await;
}
error[E0433]: failed to resolve: could not find `poll_with_tls_context` in `future`
error[E0433]: failed to resolve: could not find `from_generator` in `future`
To understand why this happens, we need to look under the hood of the async/await language feature. It turns out that the compiler will replace both of these constructs by simpler Rust code, since async/await is essentially just syntactic sugar. The function a in the example above is translated (or "desugared") to code that is morally equivalent to the following:
fn a(mut f: impl Future) -> impl Future {
    ::core::future::from_generator(move || {
        loop {
            match ::core::future::poll_with_tls_context(unsafe {
                ::core::pin::Pin::new_unchecked(&mut f)
            }) {
                ::core::task::Poll::Ready(result) => break result,
                ::core::task::Poll::Pending => {}
            }
            yield;
        }
    })
}
(We say "morally equivalent" because the actual desugaring contains some other code that is not relevant to this explanation.)
The desugared code makes use of generators (indicated by the yield expression), which are currently an experimental language feature.
The code also ends up calling core::future::from_generator and core::future::poll_with_tls_context, which showed up in the error message we got. It turns out that these functions only exist in libstd, not libcore, explaining the error we see. But why is that?
This has to do with the Future::poll method that is implemented by all futures, including the ones created by defining an async fn. In addition to a self argument, this function takes a &mut task::Context, and it turns out that while generators let you yield values out of them, they don't yet offer a way to pass any additional data into them when resuming them. The standard library works around that issue by storing the context in thread-local storage (TLS), and pulling it back out when another Future is awaited. Unfortunately, TLS is not available when #![no_std] is used, since it is provided by the OS.
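To make the workaround more concrete, here is a simplified sketch of the TLS round-trip, written against std on purpose since that is where thread-local storage lives. It is a model of the mechanism described above, not the standard library's source: the helper name set_task_context, the Restore guard, the NoopWake waker, and the tls_round_trip function are all illustrative assumptions.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::ptr::NonNull;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    // Pointer to the context of the task currently being polled, if any.
    static TLS_CX: Cell<Option<NonNull<Context<'static>>>> = Cell::new(None);
}

/// Puts the previous TLS entry back when dropped, even during a panic.
struct Restore(Option<NonNull<Context<'static>>>);

impl Drop for Restore {
    fn drop(&mut self) {
        TLS_CX.with(|slot| slot.set(self.0.take()));
    }
}

/// Stashes `cx` in TLS while `f` runs (hypothetical helper name).
fn set_task_context<R>(cx: &mut Context<'_>, f: impl FnOnce() -> R) -> R {
    // SAFETY: the lifetime is erased only while `f` runs; `Restore` removes
    // the pointer again before this function returns.
    let cx = unsafe {
        std::mem::transmute::<&mut Context<'_>, &mut Context<'static>>(cx)
    };
    let previous = TLS_CX.with(|slot| slot.replace(Some(NonNull::from(cx))));
    let _restore = Restore(previous);
    f()
}

/// Pulls the context back out of TLS and polls `fut` with it.
fn poll_with_tls_context<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    // Take the pointer so a nested call cannot alias the `&mut Context`.
    let taken = TLS_CX.with(|slot| slot.replace(None));
    let _restore = Restore(taken);
    let mut cx = taken.expect("no task context set in TLS");
    // SAFETY: the pointer was stored by `set_task_context`, whose borrow of
    // the context is still alive while its closure is running.
    fut.poll(unsafe { cx.as_mut() })
}

/// Waker that does nothing; enough to build a `Context` for the demo.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls an already-finished future through the TLS round-trip.
fn tls_round_trip() -> Poll<u32> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::future::ready(7u32);
    set_task_context(&mut cx, || poll_with_tls_context(Pin::new(&mut fut)))
}
```

Every piece of this relies on thread_local!, which is exactly the part that has no equivalent in libcore.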
There are various workarounds that have been explored since, but they all result in a less-than-ideal user experience by requiring custom annotations or even compiling your own libcore replacement. We will explore a few projects like this below, since they have done some impressive exploration of the design space despite these limitations.
A more holistic solution to this problem would be to allow generators to take an argument when resumed, so that's what we set out to implement. The nice thing about this is that it doesn't just allow fixing async/await on #![no_std], but makes generators much more powerful in general.
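Since generators are experimental, the idea of a resume argument is easiest to show on stable Rust with a hand-written state machine. The sketch below is only a model: the ResumeWith trait, the Step enum, and the RunningTotal type are hypothetical names made up for illustration and are not the compiler's generator API.

```rust
/// Result of one `resume` call: either an intermediate value or the final one.
#[derive(Debug, PartialEq)]
enum Step<Y, R> {
    Yielded(Y),
    Complete(R),
}

/// Hypothetical stand-in for a generator that accepts an argument on resume.
trait ResumeWith<Arg> {
    type Yield;
    type Return;

    fn resume(&mut self, arg: Arg) -> Step<Self::Yield, Self::Return>;
}

/// Hand-written state machine: yields a running total of the values passed
/// in, and completes with the final total once it is resumed with zero.
struct RunningTotal {
    total: i32,
}

impl ResumeWith<i32> for RunningTotal {
    type Yield = i32;
    type Return = i32;

    fn resume(&mut self, arg: i32) -> Step<i32, i32> {
        if arg == 0 {
            Step::Complete(self.total)
        } else {
            // Data flows *into* the state machine on every resume.
            self.total += arg;
            Step::Yielded(self.total)
        }
    }
}

/// Drives the state machine three times and records each step.
fn running_total_demo() -> [Step<i32, i32>; 3] {
    let mut machine = RunningTotal { total: 0 };
    [machine.resume(2), machine.resume(5), machine.resume(0)]
}
```

Without a resume argument, the only way to get the 2 and the 5 into the state machine would be through some side channel, which is the role TLS plays for the task context.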
Generator Resume Arguments
We won't go into much implementation detail here, but while the work it took to get resume arguments working was tedious (mostly due to the breakage introduced in a lot of tests), it was also surprisingly uncomplicated for such in-depth compiler work, thanks to Rust's type system pointing out most of the places affected by any change. The initial Pull Request was opened after around a week of on-off work.
Unfortunately, this initially regressed compile-times of the await-call-tree benchmark by around 10%. Since this could affect lots of programs using async/await, we decided that this wasn't acceptable, and set out to fix it. This resulted in #68606 and, after a hint by the original author of the generator code, #68672, each cutting off a significant portion of compile-time of the await-call-tree benchmark.
These optimizations had the nice side-effect of also benefitting existing async/await-heavy code, reducing the cargo build time of the async-std project by around 20% [1].
Next Steps
The Pull Request adding generator resume arguments has just been merged, so the next step is to modify the desugaring of async/await to pass the task context using resume arguments instead of TLS. This will also include a move of all the libstd-internal APIs to libcore, enabling async/await in #![no_std] code. We will make sure to post an update once that has happened!
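To give a feel for what passing the context explicitly looks like, here is a hand-written approximation of the state machine behind the function a from earlier, using only items from core. It is a sketch of the shape such code can take, not the desugaring the compiler will emit; the struct A, the no-op waker helper, and poll_a_demo are assumptions made for this example.

```rust
use core::future::Future;
use core::pin::Pin;
use core::ptr;
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hand-written stand-in for `async fn a(f: impl Future) { f.await; }`.
struct A<F> {
    f: F,
}

impl<F: Future> Future for A<F> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // SAFETY: `f` is never moved out of `self`, so projecting the pin
        // from the struct to its field is sound here.
        let f = unsafe { self.map_unchecked_mut(|this| &mut this.f) };
        // The context arrives as a plain argument and is handed straight
        // to the inner future; no thread-local storage is involved.
        match f.poll(cx) {
            Poll::Ready(_) => Poll::Ready(()),
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Builds a waker that does nothing, without needing an allocator.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(ptr::null(), &VTABLE)
}

/// Polls `A` once around a future that never finishes and once around a
/// future that is already finished.
fn poll_a_demo() -> (Poll<()>, Poll<()>) {
    // SAFETY: every vtable function ignores its data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut stalled = A { f: core::future::pending::<u8>() };
    let mut finished = A { f: core::future::ready(7u8) };
    (
        Pin::new(&mut stalled).poll(&mut cx),
        Pin::new(&mut finished).poll(&mut cx),
    )
}
```

Nothing in this sketch depends on libstd, which is what makes the approach attractive for #![no_std] targets.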
Of course, the journey doesn't end there. We hope that this will make it easy for everyone to try out different ways of using async/await on #![no_std] platforms.
Prior Art
Despite the current obstacles, some projects have already explored the space of #![no_std] async programming using various workarounds.
Perhaps the most well-known project targeting embedded systems is Embrio. It targets microcontrollers such as the nRF51, and features an executor that is woken up by hardware interrupts. Usage of async/await is enabled by using an Embrio-specific procedural macro on async fns (#[embrio_async]), which performs its own custom desugaring.
The firmware for the Polymer mechanical keyboard was implemented using Embrio, demonstrating that it is already useful for building real embedded applications.
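The general idea of an executor that sleeps until something wakes it can be sketched with core alone. The following is not Embrio's implementation or API; it is a minimal single-task model in which the WOKEN flag, block_on, and the Countdown future are hypothetical, and in which the future wakes itself where real firmware would rely on an interrupt handler.

```rust
use core::future::Future;
use core::pin::{pin, Pin};
use core::ptr;
use core::sync::atomic::{AtomicBool, Ordering};
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Set by the waker; on real hardware an interrupt handler would do this.
static WOKEN: AtomicBool = AtomicBool::new(false);

fn flag_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        flag_raw_waker()
    }
    fn wake(_: *const ()) {
        WOKEN.store(true, Ordering::Release);
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake, noop);
    RawWaker::new(ptr::null(), &VTABLE)
}

/// Runs a single future to completion without an allocator.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    // SAFETY: every vtable function ignores its data pointer.
    let waker = unsafe { Waker::from_raw(flag_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    // Make sure the future gets polled at least once.
    WOKEN.store(true, Ordering::Release);
    loop {
        if WOKEN.swap(false, Ordering::Acquire) {
            if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
                return out;
            }
        } else {
            // A real target would sleep here (for example with a
            // wait-for-interrupt instruction) instead of spinning.
            core::hint::spin_loop();
        }
    }
}

/// Test future: stays pending `remaining` times, then reports how many
/// times it was polled in total.
struct Countdown {
    remaining: u32,
    polls: u32,
}

impl Future for Countdown {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        self.polls += 1;
        if self.remaining == 0 {
            Poll::Ready(self.polls)
        } else {
            self.remaining -= 1;
            // Stands in for an interrupt firing some time later.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```

Calling block_on(Countdown { remaining: 3, polls: 0 }) polls the future four times before it completes. A single global flag only supports one task; an executor for several tasks would need one wake-up bit per task or a queue.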
Meanwhile, yaar explores an interesting approach of storing tasks using intrusive collections. While this crate does not by itself enable async/await, this approach could allow creation of tasks without needing an allocator, which can be desirable on some embedded systems.
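As a rough illustration of why intrusive collections avoid the allocator, the sketch below keeps the queue link inside each task node, so a run queue can be built from nodes that live on the stack or in statics. It is not yaar's API: TaskNode, RunQueue, and run_queue_demo are hypothetical, and the Cell-based links assume single-threaded use without interrupts touching the queue.

```rust
use core::cell::Cell;

/// A task record that carries its own queue link (the "intrusive" part).
struct TaskNode<'a> {
    id: u32,
    next: Cell<Option<&'a TaskNode<'a>>>,
}

impl<'a> TaskNode<'a> {
    const fn new(id: u32) -> Self {
        TaskNode { id, next: Cell::new(None) }
    }
}

/// LIFO run queue that only stores a reference to the first node.
struct RunQueue<'a> {
    head: Cell<Option<&'a TaskNode<'a>>>,
}

impl<'a> RunQueue<'a> {
    const fn new() -> Self {
        RunQueue { head: Cell::new(None) }
    }

    /// Links `node` in front of the current head; no memory is allocated.
    fn push(&self, node: &'a TaskNode<'a>) {
        node.next.set(self.head.get());
        self.head.set(Some(node));
    }

    /// Unlinks and returns the most recently pushed node, if any.
    fn pop(&self) -> Option<&'a TaskNode<'a>> {
        let node = self.head.get()?;
        self.head.set(node.next.take());
        Some(node)
    }
}

/// Queues three stack-allocated nodes and returns the order they pop in.
fn run_queue_demo() -> [u32; 3] {
    let a = TaskNode::new(1);
    let b = TaskNode::new(2);
    let c = TaskNode::new(3);
    let queue = RunQueue::new();
    queue.push(&a);
    queue.push(&b);
    queue.push(&c);

    let mut order = [0; 3];
    for slot in order.iter_mut() {
        *slot = queue.pop().map_or(0, |node| node.id);
    }
    order
}
```

A queue shared with interrupt handlers would need critical sections or atomics around the link updates, which this sketch leaves out.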
The SunriseOS project, which aims to reimplement the microkernel used on the Nintendo Switch, uses a libcore wrapper called core-futures-tls, which makes use of TLS built into the ELF binary format itself, and relies on the operating system's loader. A fork of this library uses a static mut instead, which allows using it on single-core microcontrollers (although the soundness of this approach might be a bit questionable).