
Allocation-free decoding with traits and higher-ranked trait bounds

This is the second post of a series where we go through the performance work we did in rustls, a modern TLS library in Rust. Today we implement allocation-free decoding using traits and higher-ranked trait bounds.


    Recap of the first post

    In the first post, we started working through rustls' pipeline that decodes and decrypts TLS records, removing the heap allocations we came across. After the changes in that first post, the pipeline we covered looks like this:

    • The input to the pipeline is a contiguous byte buffer that contains TLS data.
    • A mutable chunk of bytes – the type is &'buffer mut [u8] – is split off from the front of buffer.
    • The codec::ReaderMut API is used to decode the mutable slice into an OpaqueMessage.
    • OpaqueMessage consists of a decoded header – you can think of the header as a struct – and a potentially encrypted payload, which is represented as a mutable slice.
    • The RecordLayer::decrypt API takes the OpaqueMessage, decrypts the payload (if needed) and returns a PlainMessage.

    All these steps happen in the MessageDeframer::pop method.

    buffer = 
        +---------------+----------------+----+
        | 0x160304007a  | 0xd4e4877bac.. |(..)|
        +---------------+----------------+----+
    
    rd = codec::ReaderMut::init(&mut buffer[some_range]) ->
        +---------------+----------------+
        | 0x160304007a  | 0xd4e4877bac.. |
        +---------------+----------------+
    
    m = OpaqueMessage::read(rd) ->
        +---------------+----------------+
        | Header { .. } | 0xd4e4877bac.. |
        +---------------+----------------+
    
    if m.has_encrypted_payload() then
      msg = RecordLayer::decrypt(m) ->
        +---------------+----------------------+
        | Header { .. } | b"plaintext payload" |
        +---------------+----------------------+
    

    Figure 1. ASCII diagram that depicts the operations that happen inside MessageDeframer::pop.

    In the original rustls codebase (v0.22.0), the involved types looked roughly like this:

    // in mod codec
    struct Reader<'a> {
        buffer: &'a [u8], // NOTE immutable slice
        // ..
    }
    
    struct OpaqueMessage {
        payload: Vec<u8>,
        // ..
    }
    

    Note that OpaqueMessage's payload is a heap-allocated buffer (Vec), so OpaqueMessage::read needs to allocate that Vec and copy the data from Reader.buffer into it.

    To remove that allocation, we wanted to change the payload type from Vec into a slice but we learned that the RecordLayer::decrypt operation happens "in place" and needs to mutate OpaqueMessage.payload. Therefore, OpaqueMessage needs to hold a payload with type &mut [u8]. However, Reader cannot create a mutable slice from its buffer field because that's an immutable slice.

    To deal with that, we introduced a mutable version of codec::Reader, ReaderMut, that operates on mutable slices and is able to return the mutable slice that'll be stored in OpaqueMessage.payload. As we implemented ReaderMut, we ran into a borrow checker problem which we solved using the core::mem::take API.

    Side note: a reader pointed out that the pattern of using [T]::split_at_mut, after core::mem::take, has been referred to as "splitting borrows" in the rustonomicon. It's great to have names for patterns as it makes it easier to refer to them in discussions so here I'm helping spread the name 📢.
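
    As a refresher, here is a minimal, self-contained sketch of that pattern. The ReaderMut below is a simplified stand-in for rustls' type, not its actual implementation:

```rust
// Simplified stand-in for rustls' `ReaderMut` (hypothetical; the real
// type has more fields and methods).
struct ReaderMut<'a> {
    buffer: &'a mut [u8],
}

impl<'a> ReaderMut<'a> {
    /// Splits off the first `n` bytes, keeping the rest in `self`.
    fn take(&mut self, n: usize) -> Option<&'a mut [u8]> {
        if n > self.buffer.len() {
            return None;
        }
        // `mem::take` swaps in an empty slice, so we briefly own the full
        // `&'a mut [u8]` and can split it without upsetting the borrow
        // checker ("splitting borrows").
        let buffer = core::mem::take(&mut self.buffer);
        let (front, rest) = buffer.split_at_mut(n);
        self.buffer = rest;
        Some(front)
    }
}

fn main() {
    let mut data = [1u8, 2, 3, 4, 5];
    let mut rd = ReaderMut { buffer: &mut data };
    let front = rd.take(2).unwrap();
    front[0] = 9; // the split-off slice is still mutable
    assert_eq!(*front, [9, 2]);
    assert_eq!(*rd.buffer, [3, 4, 5]);
}
```

    Note that both `front` and the remainder held by `rd` are live, mutable borrows at the same time; the split guarantees they are disjoint.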

    At the end of the first post, our types looked like this:

    // mod codec
    struct ReaderMut<'a> { // NOTE different type
        buffer: &'a mut [u8], // NOTE mutable slice
    }
    
    struct OpaqueMessage<'a> {
        payload: &'a mut [u8], // NOTE also mutable slice
        // ..
    }
    

    After the decrypt operation, the payload does not need to be modified so we coerce it into an immutable slice when we store it in the PlainMessage struct. The updated PlainMessage struct looks like this:

    struct PlainMessage<'a> {
        payload: &'a [u8],
    
        // this field is part of the header
        typ: ContentType,
    
        // .. other header information ..
    }
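
    To see that coercion in isolation, here is a minimal sketch. The field name follows the post, but the struct and the into_plain helper are trimmed-down illustrations, not rustls' actual code:

```rust
// Trimmed-down `PlainMessage` with only the payload field (the real
// struct also carries header information).
struct PlainMessage<'a> {
    payload: &'a [u8],
}

// After decryption the payload no longer needs mutation, so the
// `&'a mut [u8]` coerces into the `&'a [u8]` stored in the struct.
fn into_plain<'a>(decrypted: &'a mut [u8]) -> PlainMessage<'a> {
    PlainMessage { payload: decrypted }
}

fn main() {
    let mut buf = *b"plaintext payload";
    let msg = into_plain(&mut buf);
    assert_eq!(msg.payload, b"plaintext payload");
}
```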
    

    One allocation down, many more to go

    By making OpaqueMessage::read not allocate, we have removed one allocation from the pipeline, but there are more allocations down the line. It turns out that the PlainMessage.payload field needs further decoding after being decrypted. The PlainMessage.typ field indicates which type the payload bytes should be decoded into.

    The decoding itself is done using the Codec trait (interface) and the codec::Reader object:

    trait Codec: Sized {
        // `Reader`'s lifetime parameter has been made explicit
        fn read<'r>(
            r: &mut Reader<'r>,
        ) -> Result<Self, /* .. */>;
        // ..
    }
    

    There are multiple types in the codebase that implement this trait. The catch is that the implementations perform heap allocations because those types are of the owning kind. For example, have a look at CertificateRequestPayloadTls13:

    struct CertificateRequestPayloadTls13 {
        context: PayloadU8, // <- owning type
        extensions: Vec<CertReqExtension>,
    }
    
    impl Codec for CertificateRequestPayloadTls13 {
        fn read(r: &mut Reader) -> Result<Self, /* .. */> {
            Ok(Self {
                context: PayloadU8::read(r)?,
                extensions: Vec::read(r)?,
            })
        }
        // ..
    }
    
    /// Arbitrary, unknown-content, u8-length-prefixed payload
    struct PayloadU8(Vec<u8>); // <- owning type
    
    impl Codec for PayloadU8 {
        fn read(r: &mut Reader) -> Result<Self, /* .. */> {
            let len = u8::read(r)? as usize;
            let mut sub = r.sub(len)?;
            let body = sub.rest().to_vec(); // <- allocation
            Ok(Self(body))
        }
        // ..
    }
    

    To remove those allocations, we'll need to switch from owning types, like Vec<u8>, to borrowed types, like &[u8]. The interesting parts are the changes that need to be made to the Codec trait and the ramifications of those changes.

    Consider this new, borrowed version of PayloadU8 and the desired Codec implementation:

    struct PayloadU8<'a>(&'a [u8]); 
    
    impl<'a> Codec for PayloadU8<'a> {
    //   ^^
        fn read(
            r: &mut Reader<'a>,
            //             ^^
        ) -> Result<Self, /* ... */> { //~ error
            // ..
        }
        // ..
    }
    

    That's what we want to write, but note that the impl-level generic lifetime parameter 'a appears in the type of the input parameter r, so in this implementation the read method is not generic over the Reader's lifetime. That does not match the declaration of the generic read method in the Codec trait, so the compiler throws an error:

    error: method not compatible with trait
    note: expected signature `fn(&mut Reader<'r>) -> Result<_, _>`
             found signature `fn(&mut Reader<'a>) -> Result<_, _>`
    

    A borrowing Codec

    What we want is a way to tie the lifetime of the Reader input parameter to the return type of the read method, Self. One way to achieve that is to add a lifetime parameter to the trait itself, like this:

    trait Codec<'r>: Sized {
        //      ^^ new addition
        fn read(_: &mut Reader<'r>) -> Result<Self, /* .. */>;
        //     ^ no generic lifetime parameter here!
    }
    

    By doing so, the read method no longer has a generic lifetime parameter.

    With this new Codec trait, the PayloadU8 implementation becomes:

    impl<'a> Codec<'a> for PayloadU8<'a> {
    //   ^^        ^^                ^^
        fn read(r: &mut Reader<'a>) -> Result<Self, /* ... */> { 
            //                 ^^
            let len = u8::read(r)? as usize;
            let mut sub = r.sub(len)?;
            let body = sub.rest(); // <- no allocation!
            Ok(Self(body))
        }
        // ..
    }
    

    Now we are able to write decoding functions that do not require heap allocations!
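
    Putting the pieces together, here is a self-contained sketch of the borrowing trait in action. The Reader, its take method, and the error type are simplified stand-ins for rustls' actual API:

```rust
// Simplified `Reader` over an immutable input slice.
struct Reader<'a> {
    buffer: &'a [u8],
}

impl<'a> Reader<'a> {
    fn init(buffer: &'a [u8]) -> Self {
        Reader { buffer }
    }

    /// Splits off the first `n` bytes of the borrowed input.
    fn take(&mut self, n: usize) -> Result<&'a [u8], ()> {
        if n > self.buffer.len() {
            return Err(());
        }
        let (front, rest) = self.buffer.split_at(n);
        self.buffer = rest;
        Ok(front)
    }
}

// The trait-level lifetime ties the input to the decoded `Self`.
trait Codec<'r>: Sized {
    fn read(r: &mut Reader<'r>) -> Result<Self, ()>;
}

/// Borrowed `PayloadU8`: a u8-length-prefixed byte slice.
struct PayloadU8<'a>(&'a [u8]);

impl<'a> Codec<'a> for PayloadU8<'a> {
    fn read(r: &mut Reader<'a>) -> Result<Self, ()> {
        let len = r.take(1)?[0] as usize;
        let body = r.take(len)?; // borrows from the input; no allocation
        Ok(Self(body))
    }
}

fn main() {
    let wire = [3u8, b'a', b'b', b'c', 0xff];
    let mut rd = Reader::init(&wire);
    let payload = PayloadU8::read(&mut rd).unwrap();
    assert_eq!(payload.0, b"abc");
    assert_eq!(rd.buffer, &[0xffu8][..]);
}
```

    The decoded PayloadU8 is just a view into wire; nothing is copied onto the heap.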

    By the way, adding a lifetime parameter to the trait is the same approach serde::Deserialize uses to be able to deserialize into slices and other borrowed data structures. It can be hard to realize that, however, because serde has more generics (the Deserializer trait lets you deserialize not just from byte slices but also from strings, etc.) and a much larger API surface (it uses the visitor pattern to ease the implementation work done by downstream users).

    Fewer allocations, more problems

    After changing the Codec interface, all the implementations needed to be updated. Those updates were not unlike the change in the PayloadU8 implementation, so I won't list them here; they don't add anything new.

    However, there was code outside of those implementations that broke and deserves some attention. The easiest code to analyze is the test_enum8 function, which is a test helper:

    fn test_enum8<T: Codec>(first: T, last: T) {
        // ..             ^ lifetime parameter is missing
    
        let mut buf = Vec::new();
        // ..
    
        let t = T::read_bytes(&buf).unwrap();
        // ..
    }
    
    trait Codec<'r>: Sized {
        fn read(r: &mut Reader<'r>) -> Result<Self, /* .. */>;
    
        // showing this method because its signature matters
        fn read_bytes(bytes: &'r [u8]) -> Result<Self, /* .. */> {
            //                ^^
            let mut reader = Reader::init(bytes);
            Self::read(&mut reader)
        }
    
        // ..
    }
    

    This unmodified version of the function won't compile because the Codec trait in the trait bound of T is missing its lifetime parameter. The compiler helpfully provides two alternatives to fix the issue:

    help: consider making the bound lifetime-generic with a new `'a` lifetime
       |
    NN | fn test_enum8<T: for<'a> Codec<'a>>(first: T, last: T) {
       |                  +++++++      ++++
    help: consider introducing a named lifetime parameter
       |
    NN | fn test_enum8<'a, T: Codec<'a>>(first: T, last: T) {
       |               +++         ++++
    

    If you use the trial and error approach, you'll find out that the first alternative works and the second doesn't, but why is that?

    Take 1: adding a lifetime parameter to test_enum8

    If we try the second alternative:

    fn test_enum8<'a, T: Codec<'a>>(first: T, last: T) {
        // ..     ^^           ^^
    
        let mut buf = Vec::new(); // start of 'buf
        // ..
    
        let t = T::read_bytes(&buf).unwrap();
        //                    ^^^^ error 
        // ..
        // end of 'buf
    }
    

    We get this compiler error: "borrowed value does not live long enough". The rest of the error message tells us that read_bytes requires that &buf has lifetime 'a but buf has a smaller lifetime.

    We used the lifetime parameter 'a in the trait bound T: Codec<'a>, and the declaration of Codec<'a>::read_bytes (shown below) requires that its input parameter has type &'a [u8]. Those two facts create the requirement that &buf must be coercible into &'a [u8].

    trait Codec<'r>: Sized {
        //      ^^
        fn read_bytes(bytes: &'r [u8]) -> Result<Self, /* .. */> {
        // ..                 ^^
    }
    

    But why is that requirement a problem? test_enum8's generic lifetime parameter 'a is unconstrained because it does not appear in any of the input parameters of the function. Therefore, the caller of the function is free to pick any value for it. For example, the caller could instantiate 'a to 'static:

    test_enum8::<'static, _>(first, last);
    

    &buf certainly won't have 'static lifetime because its lifetime lies within the span of the test_enum8 function. Now you see the problem: &buf must satisfy the lifetime constraint 'buf: 'a; that is, 'buf must outlive an arbitrary 'a chosen by the caller, and that can't be satisfied because buf is destroyed within the body of test_enum8.

    That being said, with a tweak in the function body you can make this alternative work:

    fn test_enum8<'a, T: Codec<'a>>(first: T, last: T) {
        // ..
    
        let mut buf = Vec::new(); 
        // ..
    
        let boxed_buf: Box<[u8]> = buf.into_boxed_slice();
        let static_buf: &'static mut [u8] = Box::leak(boxed_buf);
        let t = T::read_bytes(static_buf).unwrap();
        // ..
    }
    

    By leaking buf we turn it into a slice with static lifetime, static_buf, and then pass that into read_bytes. This will certainly satisfy the constraint 'static: 'a where 'a is an arbitrary lifetime because the "longest" a lifetime can ever be is 'static.

    However, even if this is just test code, we should not be leaking memory just to appease the borrow checker.

    Solution: HRTB in test_enum8

    How come the first alternative works? What is this for<'a> syntax and what are its limitations?

    fn test_enum8<T: for<'a> Codec<'a>>(first: T, last: T) {
        // ..            ^^        ^^
    
        let mut buf = Vec::new(); // start of 'buf
        // ..
    
        let t = T::read_bytes(&buf).unwrap();
        // ..
    }
    

    T: for<'a> Codec<'a> is a Higher-Ranked Trait Bound (HRTB), and it says that the type T implements the Codec<'a> trait for each and every possible value of the generic lifetime parameter 'a.

    How is this different from the previous option? If T implements Codec<'a> for any value of 'a, then it certainly implements Codec<'buf>, 'buf being the concrete (non-generic) lifetime of the local variable buf. This time there's no constraint on what the lifetime of buf must be in order to use read_bytes, so the borrow checker won't complain.
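
    Here is a compilable sketch of that reasoning, again with simplified stand-ins for Reader and Codec; the decode_local function is a hypothetical analogue of test_enum8:

```rust
struct Reader<'a> {
    buffer: &'a [u8],
}

trait Codec<'r>: Sized {
    fn read(r: &mut Reader<'r>) -> Result<Self, ()>;

    fn read_bytes(bytes: &'r [u8]) -> Result<Self, ()> {
        Self::read(&mut Reader { buffer: bytes })
    }
}

#[derive(Debug, PartialEq)]
enum AlertLevel {
    Warning,
    Fatal,
}

// Implemented for *every* lifetime, so `AlertLevel: for<'a> Codec<'a>` holds.
impl<'a> Codec<'a> for AlertLevel {
    fn read(r: &mut Reader<'a>) -> Result<Self, ()> {
        match r.buffer {
            [1, ..] => Ok(AlertLevel::Warning),
            [2, ..] => Ok(AlertLevel::Fatal),
            _ => Err(()),
        }
    }
}

// Thanks to the HRTB, `T` implements `Codec<'a>` for *any* `'a`, including
// the short, concrete lifetime of the local `buf` below.
fn decode_local<T: for<'a> Codec<'a>>(byte: u8) -> Result<T, ()> {
    let buf = vec![byte]; // local buffer, destroyed at the end of this fn
    T::read_bytes(&buf)
}

fn main() {
    assert_eq!(decode_local::<AlertLevel>(1), Ok(AlertLevel::Warning));
    assert_eq!(decode_local::<AlertLevel>(2), Ok(AlertLevel::Fatal));
    assert_eq!(decode_local::<AlertLevel>(9), Err(()));
}
```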

    Limitations

    This HRTB alternative is good enough for how this generic function is used, because the function is only ever instantiated with C-like enum types like the one shown below.

    enum AlertLevel {
        Warning,
        Fatal,
    }
    
    impl Codec<'_> for AlertLevel {
        // ..
    }
    
    // NOTE: `'_` is short-hand for this
    // impl<'a> Codec<'a> for AlertLevel { /* .. */ }
    

    Importantly, these types implement the Codec trait for every possible lifetime parameter; that's what the impl Codec<'_> syntax means.

    The limitation of this version of test_enum8 is that it cannot be instantiated with types that contain borrowed data. Consider this made-up generic structure and its accompanying Codec implementation:

    struct Slice<'a> {
        bytes: &'a [u8],
    }
    
    impl<'a> Codec<'a> for Slice<'a> {
        // ..
    }
    

    A concrete instance of this type implements the Codec trait for that single, concrete lifetime. For example, the type Slice<'static> only satisfies the bound Slice<'static>: Codec<'static>. Such a concrete type won't satisfy the bound T: for<'a> Codec<'a> and therefore cannot be used with the test_enum8 function:

    fn fail(first: Slice<'static>, last: Slice<'static>) {
        test_enum8(first, last);
        //~^ error implementation of `Codec` is not general enough
    }
    

    The more detailed error message says

    Codec<'0> would have to be implemented for the type Slice<'_>, for any lifetime '0 but Codec<'1> is actually implemented for the type Slice<'1>, for some specific lifetime '1

    In the message '1 refers to the concrete lifetime 'static and '0 refers to the "any" lifetime 'a in test_enum8's trait bound: T: for<'a> Codec<'a>.

    Wrap-up

    That concludes the second blog post of this series.

    In the first one, we removed one allocation from the first decoding stage of the pipeline, the stage that all records go through and that may involve decryption.

    In this second one, we removed allocations from the second decoding stage that the decrypted payloads have to go through.

    In the next and final post, we'll look into storing borrowed data in trait objects (e.g. Box<dyn MyTrait>) and using Cow-like enums to avoid and/or delay heap allocations.


    If the borrow checker has you in a bind or if you want to optimize the memory usage of your project by switching from heap allocated objects to references, our consulting service can help you with that! Get in touch through our contact form.