Background
Ferrous Systems was contracted by the Internet Security Research Group (ISRG) to work on rustls, a modern TLS library in Rust, in 2023.
We worked on adding no-std support to rustls, as well as designing and implementing an alternative API to the Connection
API that lets the user manage the buffers used by rustls.
We also did some performance improvements on the side.
You can view ISRG's rustls work plan on their website.
I'd like to walk through some of the work I did on the performance side as I ran into some interesting borrow checker problems and I think it would be instructive to share how I dealt with them.
I'll cover three borrow checker problems, my solutions to those problems and the limitations of my solutions. The content will be covered in 3 posts; one borrow checker problem per post.
NOTE: most snippets shown here are simplified versions of the rustls codebase as of version v0.22.0. Links are provided to the relevant parts of the rustls codebase should you wish to look at the full version.
Processing incoming TLS data
First, let's cover at a high level how rustls (v0.22.0) handles incoming TLS data.
The Connection
API has an internal contiguous buffer of bytes that it uses to read and buffer TLS data from an IO object, usually a network socket.
Let's call this internal buffer incoming_tls_data
.
As a first approximation, you can think of this buffer as being a Vec<u8>
.
When you call read_tls
, data is read from the IO object and appended to this internal buffer.
// first approximation
struct Connection {
incoming_tls_data: Vec<u8>,
// ..
}
After reading data with read_tls
, the end user will call process_new_packets
to complete the handshake process and to decrypt any application data that read_tls
may have read from the network socket.
process_new_packets
does many things but it starts by extracting TLS records out of the incoming_tls_data
buffer.
TLS data is framed in records. Each record has a header that indicates what kind of payload it contains – for example, the payload could be a certificate – and how big the payload is. In addition to this, the payload of some TLS record types is encrypted; this is the case with records that carry application data.
<--------------------- TLS data --------------------->
<----- record 1 -----> <----- record 2 ----->
+----------+-----------+----------+-----------+------+
| Header 1 | Payload 1 | Header 2 | Payload 2 | (..) |
+----------+-----------+----------+-----------+------+
encr?| no | no | no | yes |
size | 5B | 1B | 5B | 122B |
type | ChangeCipherSpec | EncryptedExtensions |
Figure 1. ASCII diagram of TLS records as they appear in a TLS data stream. Diagram indicates the record types; header and payload sizes in bytes; and whether the payload is encrypted or not.
The object used to extract records, and potentially decrypt their payloads, from this linear buffer is called the MessageDeframer
.
A second, better approximation of the Connection
object then becomes:
// NOTE: the types that sit between `MessageDeframer` and
// `Connection` have been omitted
struct Connection {
incoming_tls: MessageDeframer,
// ..
}
struct MessageDeframer {
buffer: Vec<u8>
// ..
}
MessageDeframer
's most important method is pop
.
This method extracts one record out of its internal buffer
and decrypts its payload if it's encrypted.
impl MessageDeframer {
fn pop(
&mut self,
/* .. */,
) -> Result<Option<Deframed>, /* .. */> {
// ..
}
}
struct Deframed {
message: PlainMessage,
// ..
}
struct PlainMessage {
payload: Payload,
// ..
}
struct Payload(Vec<u8>);
PlainMessage
is a TLS record with decrypted Payload
.
deframer.buffer =
+----------+-----------+----------+-----------+-----+
| Header 1 | Payload 1 | Header 2 | 0xd4e48.. | Hea |
+----------+-----------+----------+-----------+-----+
deframer.pop() ->
+----------+-----------+
| Header 1 | Payload 1 |
+----------+-----------+
deframer.pop() ->
+----------+-----------+
| Header 2 | Payload 2 |
+----------+-----------+
deframer.pop() -> None
Figure 2. ASCII diagram of the operation of the pop
method.
The first record has an unencrypted payload; the second, an encrypted payload; the third record is incomplete.
The main takeaway here is that PlainMessage
is an "owning" object:
it does not borrow data from the MessageDeframer
's buffer.
This means that data is eagerly copied out of the deframer's buffer and placed into a fresh heap allocation.
That's the kind of heap allocation we want to eliminate so
the first step is to tweak PlainMessage
internals so it can borrow from MessageDeframer
's buffer instead of cloning part of it.
Inside MessageDeframer::pop
As I mentioned above, MessageDeframer::pop
performs two operations: it extracts one record out of a linear buffer and then decrypts the payload of said record, if necessary.
The first operation is handled by an object called the codec::Reader
; the second, by an object called the RecordLayer
.
If we look at how MessageDeframer
uses codec::Reader
, we see that codec::Reader
is initialized with a slice into MessageDeframer
's buffer: good, no allocation or cloning there.
Then, MessageDeframer
uses the OpaqueMessage::read
method to get one OpaqueMessage
message out of the Reader
.
impl MessageDeframer {
fn pop(/* .. */) -> /* .. */ {
// ..
let mut rd = codec::Reader::init(
&self.buffer[start..self.used],
);
let m = match OpaqueMessage::read(&mut rd) {
// ..
};
let msg = match record_layer.decrypt_incoming(m) {
// ..
};
// ..
}
}
struct OpaqueMessage {
payload: Payload,
// ..
}
buffer =
+--------------+----------------+----+
| 0x160304007a | 0xd4e4877bac.. |(..)|
+--------------+----------------+----+
rd = codec::Reader::init(&buffer[some_range]) ->
+--------------+----------------+
| 0x160304007a | 0xd4e4877bac.. |
+--------------+----------------+
m = OpaqueMessage::read(rd) ->
+--------+----------------------+
| Header | 0xd4e4877bac978b79.. |
+--------+----------------------+
if m.has_encrypted_payload() then
msg = RecordLayer::::decrypt(m) ->
+--------+----------------------+
| Header | b"plaintext payload" |
+--------+----------------------+
Figure 3. ASCII diagram that depicts the operations that happen inside MessageDeframer::pop
.
OpaqueMessage::read
decodes the raw bytes that make up the header; RecordLayer::decrypt
decrypts the encrypted payload.
OpaqueMessage
is a version of PlainMessage
that contains a potentially encrypted payload; it is also an owning object.
This means that the OpaqueMessage::read
method is performing the heap allocation we want to remove.
If we trace down where OpaqueMessage
is converted into a PlainMessage
, we arrive at the MessageDecrypter::decrypt
method:
trait MessageDecrypter {
fn decrypt(
&mut self,
msg: OpaqueMessage,
/* .. */
) -> Result<PlainMessage /* .. */>;
}
There is not a lot of detail there as that's a trait method declaration.
If we look at actual implementations of this trait (for example, the ring implementation), we discover that they perform the decryption "in-place".
This means OpaqueMessage
's payload is mutated as part of the decryption process and then the OpaqueMessage
object is effectively "casted" (zero-cost converted) into a PlainMessage
object.
No heap allocation occurs in this process; that is good.
There's an important takeaway from looking at these MessageDecrypter
implementations:
OpaqueMessage
's payload needs to be mutable.
Mutable codec::Reader
Based on what we have learned, we need to make these changes:
OpaqueMessage
needs to hold a mutable slice intoMessageDeframer
's bufferPlainMessage
needs to hold an immutable slice intoMessageDeframer
's bufferOpaqueMessage::read
must be tweaked to not allocate / clone
For now, let's ignore the Payload
newtype and assume that the new types will look like this:
struct OpaqueMessage<'a> { // now has a lifetime parameter
payload: &'a mut [u8],
// ..
}
struct PlainMessage<'a> { // same here
payload: &'a [u8],
// ..
}
Next, let's look at the updated OpaqueMessage::read
signature:
impl<'a> OpaqueMessage<'a> {
// ^^ ^^
fn read(
r: &mut codec::Reader<'a>,
// ^^
) -> Result<Self, /* .. */> {
// ..
}
}
Those would be the right lifetime contraints but there's a problem:
codec::Reader
contains an immutable slice from which we cannot create the mutable slice that we need to put in OpaqueMessage
.
struct Reader<'a> {
buffer: &'a [u8],
cursor: usize,
}
We'll deal with the issue by creating a mutable version of Reader
– we'll call it ReaderMut
– that contains a mutable slice.
Only OpaqueMessage::read
will use this new type so not a lot of churn elsewhere in the codebase with this approach.
struct ReaderMut<'a> {
buffer: &'a mut [u8],
cursor: usize,
}
impl<'a> OpaqueMessage<'a> {
fn read(
r: &mut codec::ReaderMut<'a>,
) -> Result<Self, /* .. */> {
// ..
}
}
The reborrow problem
As I copied over methods from Read
to ReadMut
I hit a borrow checker problem in the codec::Reader::take
method:
impl<'a> ReaderMut<'a> {
// lifetime 's made explicit for clarity
fn take<'s>(
&'s mut self,
length: usize,
) -> Option<&'a mut [u8]> {
if self.left() < length {
return None;
}
let current = self.cursor;
self.cursor += length;
Some(&mut self.buffer[current..current + length])
//^~ error: lifetime may not live long enough
}
}
The error message is: "method (take
) was supposed to return data with lifetime 'a
but it is returning data with lifetime 's
".
Because the self
variable has type &'s ReaderMut<'a>
, mutably accessing its buffer
field through self
limits the slice re-borrow to &'s mut [u8]
.
Let's desugar the last line to make that clearer:
self.cursor += length;
// NOTE: 's, not 'a
let reborrow: &'s mut [u8] = &mut self.buffer;
let retval: &'s mut [u8] =
reborrow.index_mut(current..current + length);
Some(retval)
This had me thinking for a bit: "why is this not a problem in the immutable case?" because this works fine:
impl<'a> Reader<'a> {
fn take(
&mut self,
length: usize
) -> Option<&'a [u8]> {
if self.left() < length {
return None;
}
let current = self.cursor;
self.cursor += length;
Some(&self.buffer[current..current + length])
}
}
"Isn't the same constrained reborrow happening in this case?" The answer is "no" because immutable slices are Copy
so the compiler can copy the buffer
field instead of borrowing it.
In the Reader
case, the compiler can desugar the last line into:
self.cursor += length;
// NOTE: 'a! this coerce-copies the `buffer` field
let copy: &'a [u8] = self.buffer;
let retval: &'a [u8] = copy.index(current..current + length);
Some(retval)
This last desugaring is not possible in the mutable case because it would require moving the non-Copy
buffer
field out of self
and
that can't happen through the mutable reference specified in the function signature of take
(&mut self
).
Solution: mem::take
We can't move buffer
out of self
but we can mem::take
it like this:
let buffer: &'a mut [u8] = core::mem::take(&mut self.buffer);
and now we have the 'a
lifetime we wanted!
The side effect is that self.buffer
becomes an empty mutable slice.
Leaving the field empty like that will prevent us from using the take
method again so more changes are needed to preserve the semantics of Reader::take
.
The final version looks like this:
struct ReaderMut<'a> {
buffer: &'a mut [u8],
// renamed because it has a different role now
used: usize,
}
impl<'a> ReaderMut<'a> {
fn take<'s>(
&'s mut self,
length: usize,
) -> Option<&'a mut [u8]> {
if self.left() < length {
return None;
}
// `self.buffer` becomes `&mut []`
let buffer = mem::take(&mut self.buffer);
let (taken, rest) = buffer.split_at_mut(length);
self.used += length;
self.buffer = rest; // <- repopulate the field
Some(taken)
}
}
A closer look
I think it's important to highlight that take
does not freeze ReaderMut
.
You can tell from the method signature: the input parameter &'s mut self
and the output parameter Option<&'a mut [u8]>
have different lifetime parameters.
Because ReaderMut
is not frozen, you can call take
again while the value returned by take
is in scope:
let mut buffer = [0, 1, 2, 3];
let reader = ReaderMut::init(&mut buffer);
let first = reader.take(2).unwrap(); // start of 'first
// borrow checker acceps this while `first` is alive
let second = reader.take(2).unwrap();
drop(first); // end of 'first
Another insight is that the buffer
field of ReaderMut
is modified by the take
method.
What take
does is chop length
bytes from the front of buffer
and returns that.
Let's have a closer look using a hypothetical dbg2!
macro that preserves more type information (e.g. mutability) than dbg!
:
let mut buffer = [0, 1, 2, 3];
let reader = ReaderMut::init(&mut buffer);
let first = reader.take(2).unwrap();
dbg2!(first, reader);
That prints:
first = &mut [0, 1]
reader = ReaderMut {
buffer: &mut [2, 3],
used: 2,
}
There you can see that there are two mutable, non-overlapping slices. The fact that they are non-overlapping is important to adhere to Rust's aliasing rules.
Aliasing rules are why the first version of ReaderMut::take
was rejected.
That version of take
left buffer
unchanged and did not freeze ReaderMut
.
If that version had been accepted by the compiler then the dbg2!
snippet would print:
first = &mut [0, 1]
reader = ReaderMut {
buffer: &mut [0, 1, 2, 3],
used: 2,
}
There you can see that first
and reader.buffer
are mutable slices that overlap and that breaks Rust's aliasing rules.
On the other hand, the Reader::take
implementation does not modify the buffer
field and that produces overlapping immutable slices:
let buffer = [0, 1, 2, 3];
let reader = Reader::init(&buffer);
let first = reader.take(2).unwrap();
dbg2!(first, reader);
first = &[0, 1]
reader = Reader {
buffer: &[0, 1, 2, 3],
taken: 2,
}
and that's fine as it's in line with Rust's aliasing rules.
Alternative: the Option
dance
mem::take
was not there in Rust 1.0; in fact, it was only stabilized in Rust 1.40.
So, what did people do to deal with this problem before mem::take
came into existence?
They did "the Option
dance".
I'll now demonstrate the Option
dance (🕺) to the new Rustaceans that do not know about it.
struct ReaderMut<'a> {
buffer: Option<&'a mut [u8]>, // <- wrap in Option
used: usize,
}
impl<'a> ReaderMut<'a> {
fn take<'s>(
&'s mut self,
length: usize,
) -> Option<&'a mut [u8]> {
if self.left() < length {
return None;
}
let buffer = self.buffer.take().unwrap(); // <-
let (taken, rest) = buffer.split_at_mut(length);
self.used += length;
self.buffer = Some(rest); // <-
Some(taken)
}
}
Not too different except for the addition of some unwelcome unwrap
s.
Even accessing buffer
without the intention of taking it requires an unwrap
;
here's the Option
dance version of the ReaderMut::left
method.
fn left(&self) -> usize {
self.buffer.as_ref().unwrap().len()
}
Although mem::take
is the better option (no pun intended), the Option
dance pattern still has its place.
mem::take<T>(dest: &mut T)
requires that the value of type T
being taken implements the Default
trait so that a default value can be left behind in dest
.
If you are working with types that do not implement this trait then dancing is your only option (pun intended).
Wrap-up
That concludes the first blog post of this series. Thus far, we have removed one heap allocation from the process of deframing (decoding) and decrypting TLS records out of a linear buffer. As all TLS records go through that process, that's one less heap allocation per processed record.
In the next post, we'll continue working our way through the rustls pipeline that processes incoming TLS records. The next borrow checker problem will involve lifetime parameters in traits and Higher-Rank Trait Bounds (HRTB).
If the borrow checker has you in a bind or if you want to optimize the memory usage of your project by switching from heap allocated objects to references, our consulting service can help you with that! Get in touch through our contact form.