SBOM support in Rust

Sebastian

A collaboration with our client Stackable on an STF-funded project to improve SBOM support in the Rust eco-system.

    Introduction

    Last year Lars Francke, the co-founder and CTO of Stackable, approached us with a project proposal to improve the generation of Software Bill of Materials (SBOM) in the Rust eco-system. One particular task was to find a way to expand cargo, Rust's package manager, to provide all essential build information to generate SBOM files. Another tool we were asked to work on was the cyclonedx-rust-cargo crate, a plugin for cargo to generate BOM files for Rust projects using the CycloneDX format.

    The project itself was funded by the Sovereign Tech Fund (STF), a program of the German Federal Ministry for Economic Affairs and Climate Action (website). The STF aims to fund Open Source projects to promote and secure fundamental technologies in software. At Ferrous Systems we have worked on other STF-funded projects, such as Rustls in collaboration with Prossimo, and currently hickory-dns with ISRG.

    The work was made possible by being selected for two rounds of funding by STF's Contribute Back Challenges. The first round started in September 2023, with the goal to improve cargo support for SBOMs and to expand support for version 1.4 of CycloneDX in the cyclonedx-rust-cargo crate. Thankfully another collaborator, Sergey "Shnatsel" Davidoff, maintainer of the cargo-auditable crate and leader of the Rust Secure Code Working Group, joined the collaboration as well.

    We were really excited about this project. In short, the collaboration worked well and we achieved nearly all tasks. During the process we learned a lot about the cargo project and its RFC process: what worked, and which steps were necessary to improve the situation for SBOMs. Even though we did not complete every task by the end of the second round, the impact of the work became visible almost immediately. Thanks to Lars's efforts, the topic of SBOM support was brought directly to the Cargo team. The continued discussion partially informed a Pre-RFC, which was published by someone from the community in November of 2023.

    Not being able to complete all tasks was also expected, as we had to follow the RFC process in the Cargo project. In some cases the RFC process can take anywhere from a few months to several years.

    Toward the end of round one, we applied for round two of STF's Contribute Back Challenges. With the learnings from round one we were able to set out further tasks for round two. Thankfully the application was selected shortly after and we were able to continue the work in January of this year.

    SBOM & CycloneDX

    What is an SBOM and why is it important? A Software Bill of Materials (or SBOM) declares, among other things, the inventory of all components used to build the software artifacts, as part of the software supply chain. This information can help detect vulnerabilities and security issues in the software, or identify conflicts between the licenses in use. A major reason to provide SBOMs for software in Germany is that the Federal Office for Information Security highly recommends them as part of their technical guidelines for Cyber Resilience (see PDF for details).

    In recent years several pieces of legislation have been introduced to improve cybersecurity. For example, the US issued an Executive Order on Improving the Nation's Cybersecurity. In Europe, the EU has proposed the Cyber Resilience Act (CRA) to improve cybersecurity and cyber resilience. These efforts are a response to the increased number of cyber attacks in recent years.

    Currently, there are two popular competing specifications for SBOMs, CycloneDX & SPDX. SPDX 2.2.1 is an ISO standard and CycloneDX 1.6 is an Ecma standard on track to be standardized as an ISO standard as well.

    The Work

    The work was split into two rounds, with a handful of tasks in each round. The following sections do not list all the work done, but should give a good overview of what has been achieved in our collaboration.

    Ownership & Governance

    The main project we worked on was the cyclonedx-rust-cargo project, a cargo plugin to generate BOM files for Rust projects. The project was in an abandoned state mid-2023. In addition, its library was not able to produce valid CycloneDX SBOM files. Unfortunately the original author did not respond to any inquiries regarding the project. Thankfully the OWASP foundation, the developer of the CycloneDX specification, stepped in and was able to help.

    Lars and Sergey got maintainer access to the repository and fixed the first technical issues until the project was able to generate SBOM files. The project had support for version 1.3 of the CycloneDX specification at this point. The documentation was updated, and a contributing guide was added according to OWASP CycloneDX guidelines. Thanks to Lars & Sergey the project is now actively maintained and has even seen contributions from the community since.

    SBOM support in cargo

    One of our tasks was to evaluate the current state of cargo, Rust's package manager, and specifically how to extract relevant information from a Rust project to build SBOM files. cargo has a number of commands to expose information about a project, most importantly cargo metadata. This command collects information on workspaces, the resolved dependencies and the current package. Unfortunately, the command has several limitations when it comes to SBOM generation:

    • it provides no hashes or checksums to check integrity / authenticity
    • it does not expose the final set of resolved dependencies, e.g. when using features, which can result in overreporting dependencies
    • different cargo commands may resolve to different dependencies
    • the user needs to specify the target platform manually
    • it does not provide information on generated artifacts
    • it does not record the build configuration
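
    To make the manual target selection from the list above concrete, here is a minimal sketch of how a tool might assemble the cargo metadata invocation. The helper name metadata_command is hypothetical; --format-version and --filter-platform are existing cargo metadata flags. The sketch only builds the command and does not run it.

```rust
use std::process::Command;

/// Builds (but does not run) a `cargo metadata` invocation.
///
/// `target_triple` has to be supplied by the caller: cargo does not
/// pick the platform on its own when reporting dependencies.
pub fn metadata_command(target_triple: Option<&str>) -> Command {
    let mut cmd = Command::new("cargo");
    // `--format-version 1` pins the JSON output format.
    cmd.args(["metadata", "--format-version", "1"]);
    if let Some(triple) = target_triple {
        // Restrict the dependency graph to a single target platform.
        cmd.args(["--filter-platform", triple]);
    }
    cmd
}
```

    Calling .output() on the returned command inside a Rust project would run cargo and capture the JSON document from stdout. That document still lacks the checksum, artifact and build configuration details listed above.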

    One naive approach early on was to simply expose the existing checksum via the cargo metadata command (see issue GH 12818). Even though this was not merged, it contributed to the discussion with the Cargo team, which Lars had initiated earlier. We used the Cargo Office Hours to meet with the Cargo team a few times. A big thanks to Lars, who raised awareness & highlighted the importance of SBOM support in Rust. This round-one task was not completed, but it gave us a lot of insight and became a top priority for the second round of the project.

    Rust and Cargo follow an RFC process where substantial changes require a proposal, so that the community can discuss & accept any changes. Interestingly, other Cargo members were already discussing SBOM support in cargo, so we wanted to know if a collaboration was feasible. In particular, Arlo Siemsen (arlosi) compiled a draft Pre-RFC to gather initial feedback, which was published in November 2023 on the Rust Internals forum. In early January of this year, Arlo published a refined version of the RFC to get more specific feedback. Shortly after, we met with them to discuss what a proof of concept could look like. With the timely RFC proposal in place, we were able to come up with a more concrete plan for round two on how to add SBOM support in cargo based on the RFC.

    The main goal of the RFC is to provide a way to emit internal build information as precursor files alongside compiled artifacts. Each compiled artifact then has an accompanying JSON file that contains information on resolved dependencies, build configuration, environment variables, the compiler, etc. These generated precursor files are not complete SBOMs in themselves, but they provide important information that external tools can use to build full SBOM files. cargo currently does not expose all information necessary to build these complete SBOM files. Therefore, the new logic proposed in the RFC would expose all this information directly from rustc, the Rust compiler, as JSON files created at build time. The mechanism is similar to how Cargo emits dep-info (.d) files.
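
    As a rough illustration of the "one JSON file next to each artifact" idea, the sketch below derives a sidecar path from an artifact path. The function name sidecar_path and the .sbom.json suffix are assumptions made for this example only; they are not the naming defined by the RFC or the proof of concept.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Returns the path of a sidecar file that sits next to `artifact`.
///
/// The suffix is appended to the full file name instead of replacing the
/// extension, so `libfoo.so` and `libfoo.a` would not collide.
pub fn sidecar_path(artifact: &Path, suffix: &str) -> PathBuf {
    let mut name: OsString = artifact
        .file_name()
        .map(|n| n.to_os_string())
        .unwrap_or_default();
    name.push(suffix);
    artifact.with_file_name(name)
}
```

    With the hypothetical suffix, target/debug/app would be accompanied by target/debug/app.sbom.json, in the same way a dep-info file sits beside the artifact it describes.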

    The proof of concept resulted in a PR, which received positive feedback. At the time of writing, it has not been merged. The new SBOM feature is marked as experimental. Once the PR is merged, users will have to use the nightly version of cargo and opt in to enable the feature. The purpose is to gain real-world experience with the new feature before stabilising it, since stabilisation would limit the freedom to improve it without breaking a large number of existing projects. Once it is deemed useful enough, the work does not necessarily stop; follow-up efforts are required to improve the feature and stabilize it.

    SBOM support in cyclonedx-rust-cargo

    When we started work on the cyclonedx-rust-cargo project it provided support for CycloneDX version 1.3 only. One task in round one was to add support for version 1.4. Thanks to the community member "tokcum", there was already a Pull Request open, which we incorporated into the repository with slight changes in a separate PR. The main difference is that we omitted a few huge files that added little benefit. A few minor things were missing to fully support version 1.4, but these have been amended over time, for example by adding signatures.

    Thanks to this community addition, support for version 1.4 was fairly easy to manage. A task we inherited from round one was to provide support for version 1.5 in round two as well. The specification for version 1.5 was released in June 2023. The CycloneDX specification is being developed by OWASP with a release cycle of about 10-12 months. The specification does not follow SemVer, which brought its own challenges. For example, a new version of the specification could mean that some parts become deprecated, that fields need to represent multiple different types, that new fields are added, or that existing enum types are expanded with new values. This raised the question of how to organize the code inside the repository and how to cope with the complexity of adding a new, slightly incompatible version.

    The general structure of the cyclonedx-rust-cargo repository is that the workspace consists of two major crates:

    • The Cargo plugin cargo-cyclonedx that adds a new cargo command
    • cyclonedx-bom, which implements parsers and types for the specification

    Before we added support for version 1.5, we recognized that simply adding another version would increase the amount of duplicated code. The idea was to share common code between all versions, while being able to support new versions in the future. The bigger the difference between versions of the specification, the more difficult this approach becomes, because then fewer types can be shared.

    For the second round, we set out to reduce code duplication on the one hand, while making it easier to support multiple versions in the future on the other. The internal structure of the cyclonedx-bom crate contains a sub folder for each supported version, e.g. src/specs/v1_4 for version 1.4. Each spec folder contains all types that are related to this version. The parser logic for all the different versions transforms the input, either XML or JSON, into a more general representation of the BOM. All spec-related types are later converted into models (in src/models), and the important property of these models is that they need to be able to represent every supported version of the specification. The models are a kind of intermediate format and can be serialized into different versions. But this also makes adding new versions more complex each time, because each new version may add, remove or change existing types. Even though we completed the task to support version 1.5, we also consider this one of the major sources of issues for future versions.
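
    The following sketch illustrates the tension described above: a single model type has to cover every version, so serializing to an older version can fail. All type and function names are hypothetical and do not mirror the actual cyclonedx-bom code; the mapping of variants to specification versions is an assumption made for the example.

```rust
/// Supported specification versions; the derived ordering follows
/// the declaration order, so V1_3 < V1_4 < V1_5.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SpecVersion {
    V1_3,
    V1_4,
    V1_5,
}

/// A version-independent model type: it must hold every value that
/// any supported version allows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComponentKind {
    Application,
    Library,
    MachineLearningModel,
}

impl ComponentKind {
    /// First version that knows this value (assumed for illustration).
    pub fn introduced_in(self) -> SpecVersion {
        match self {
            ComponentKind::Application | ComponentKind::Library => SpecVersion::V1_3,
            ComponentKind::MachineLearningModel => SpecVersion::V1_5,
        }
    }

    pub fn as_str(self) -> &'static str {
        match self {
            ComponentKind::Application => "application",
            ComponentKind::Library => "library",
            ComponentKind::MachineLearningModel => "machine-learning-model",
        }
    }
}

/// Error returned when a model value has no representation in the target version.
#[derive(Debug, PartialEq, Eq)]
pub struct Unsupported {
    pub kind: ComponentKind,
    pub target: SpecVersion,
}

/// Serializes the model value for one specific target version.
pub fn serialize_kind(
    kind: ComponentKind,
    target: SpecVersion,
) -> Result<&'static str, Unsupported> {
    if kind.introduced_in() <= target {
        Ok(kind.as_str())
    } else {
        Err(Unsupported { kind, target })
    }
}
```

    Every enum value or field that a new specification version adds needs this kind of bookkeeping in the shared model, which is why each additional version increases the complexity.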

    When we investigated existing crates to help us share common types, we did not find a suitable solution that worked for this project. Therefore we followed a different approach to minimize code duplication. The new cyclonedx-bom-macros helper crate introduces a versioned macro that annotates existing code to declare which types, fields and logic exist in which versions. The macro's goal was to reduce the amount of code we had to write. It still generates the code under the hood, but it keeps all types in their own version-specific namespace, e.g. v1_3 or v1_4. Check the crate's README for further details on usage.
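
    To give a feeling for the idea of generating one namespace per version from a single definition, here is a small stand-in written as a declarative macro_rules! macro. It is not the versioned macro from cyclonedx-bom-macros and does not use its syntax; the macro name, the struct and the signature field are hypothetical.

```rust
/// Generates one module per version from a single shared definition.
/// Fields listed in the braces exist only in that version.
macro_rules! versioned_component {
    ($( $module:ident { $( $extra:ident : $ty:ty ),* $(,)? } ),+ $(,)?) => {
        $(
            pub mod $module {
                #[derive(Debug, Clone, PartialEq, Default)]
                pub struct Component {
                    // Fields shared by all versions are written once.
                    pub name: String,
                    pub version: String,
                    // Version-specific additions.
                    $( pub $extra: $ty, )*
                }
            }
        )+
    };
}

versioned_component! {
    v1_3 {},
    v1_4 { signature: Option<String> },
}
```

    The expansion yields two distinct types, v1_3::Component and v1_4::Component, that share the common fields without repeating them in the source. The duplication still exists in the generated code, which matches the trade-off described above.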

    Using the versioned macro worked pretty well for types that share a lot of code, but each significant change between versions of the CycloneDX specification required us to re-evaluate the existing code. A side effect was that annotating code with the macro worked fine for two versions initially, but adding support for the next version could easily add significant code & mental overhead.

    In the end, it turned out to be a trade-off between reduced code duplication & increased mental complexity to understand which part of the code is interacting with which version of the specification. We still believe it's a good option to reduce code duplication. It is definitely logic that needs to be re-evaluated over time when more versions are supported. It may become more difficult to write code, and we also realized rust-analyzer had a harder time & did not work properly.

    Validation

    The cyclonedx-bom crate can be used to parse SBOM files in JSON or XML format. The CycloneDX specification repository provides schemas for both formats. The two formats differ, and the code has to be structured to accommodate these differences. JSON serialization is handled by the serde_json crate and is fairly straightforward: all types & fields are annotated appropriately. XML serialization, on the other hand, accounts for a huge chunk of the code base. The cyclonedx-bom crate uses xml-rs, a low-level library to read & write XML documents. We consider the XML serialization code another candidate for improvement, as it currently contains a lot of similar patterns & boilerplate code.

    After parsing SBOM files, the resulting BOM model can be validated, with all errors aggregated into an inspectable list. For example, validations check that an enum field only contains a set of allowed values or that strings match a specific pattern. During round one, we recognized that the validation logic was not designed to support multiple versions because, at that time, only version 1.3 was supported. Another drawback of the existing validation logic was that each validate call built its own representation of where in the BOM tree a validation failed, which resulted in a lot of repetitive code. Lastly, the validation API returned a Result<...> type, even though all validation errors were aggregated and Err was never returned. Previously, when a BOM was validated, the final result was a list of failed validations, sometimes with wrong information about where in the BOM a failure happened, or lacking details about the cause.

    One of the proposed tasks for the second round was to refactor the validation logic and design a more appropriate API that would take the specification version into account and would require less effort to locate a failed validation in the BOM. We wanted to reduce the repetitive code, while providing the means to build better validation diagnostics. The result can be found in this PR. The existing validation code already provided all the necessary information. The crucial part was not only to refactor the validation logic but also to update the test code, which turned out to be the bigger task. The test code now checks the BOM hierarchy more consistently and in a cleaner way. We consider the refactor a success, because all existing high-level tests passed and the reduced amount of code turned out to be easier to read and understand.
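
    The sketch below shows one way to track the location inside the BOM tree in a single place instead of rebuilding it in every validate call. It is a simplified illustration, not the API of the cyclonedx-bom crate; Validator, ValidationError, Component and validate_components are hypothetical names, and the two checks are made up for the example.

```rust
/// One failed check, together with its location in the BOM tree.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ValidationError {
    pub path: String,
    pub message: String,
}

/// Collects errors and keeps track of the current position in the tree.
#[derive(Debug, Default)]
pub struct Validator {
    path: Vec<String>,
    errors: Vec<ValidationError>,
}

impl Validator {
    pub fn new() -> Self {
        Self::default()
    }

    /// Runs `f` one level deeper in the tree, then restores the path.
    pub fn within<F: FnOnce(&mut Self)>(&mut self, segment: &str, f: F) {
        self.path.push(segment.to_string());
        f(self);
        self.path.pop();
    }

    /// Records an error at the current path if `ok` is false.
    pub fn check(&mut self, ok: bool, message: &str) {
        if !ok {
            self.errors.push(ValidationError {
                path: self.path.join("."),
                message: message.to_string(),
            });
        }
    }

    /// Returns all aggregated errors; an empty list means the input is valid.
    pub fn finish(self) -> Vec<ValidationError> {
        self.errors
    }
}

pub struct Component {
    pub name: String,
    pub version: String,
}

pub fn validate_components(components: &[Component]) -> Vec<ValidationError> {
    let mut v = Validator::new();
    v.within("components", |v| {
        for (i, c) in components.iter().enumerate() {
            v.within(&i.to_string(), |v| {
                v.within("name", |v| v.check(!c.name.trim().is_empty(), "must not be empty"));
                v.within("version", |v| {
                    v.check(!c.version.contains(char::is_whitespace), "must not contain whitespace")
                });
            });
        }
    });
    v.finish()
}
```

    Because the path is maintained by within, each individual check only states what it verifies, and the reported location (for example components.1.name) is derived consistently from the nesting.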

    As a last piece of the puzzle, we wanted to check that all generated SBOM files pass validation against their associated schema files. This turned out to be fairly straightforward for the JSON schemas (see this PR). We were really happy when we realized that all generated JSON files validated successfully against their schemas. The situation looked different for the XML format, because there is currently no good solution available in the Rust eco-system to validate XML files against their schema. Nevertheless, users can use the API to validate generated JSON files against the associated schema to diagnose errors.

    Error Handling

    Another task we proposed for round two was to see how to improve error handling & detection. As mentioned above, the CycloneDX specification repository provides schema files for both the XML and JSON formats and for all versions. At the time of writing the latest version is CycloneDX 1.6, released in April of this year.

    The CycloneDX specification repository not only contains all schema files, but also provides a lot of sample files for both formats that are expected to be either valid or invalid. They are somewhat hidden in the repository; for example, check the folder with samples for version 1.5. This turned out to be a huge benefit when adding a new version. While adding support for version 1.5, we copied the sample files to the appropriate test folder. Thankfully the cyclonedx-rust-cargo crate already had tests in place to check that all files starting with invalid-* fail validation, while all valid-* prefixed files pass. We achieved the goal of passing validation for all existing sample files & versions. During work on round two, we discovered a few cases where the samples did not fully cover the validation logic, and added further sample files.
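
    The prefix convention can be sketched as a small helper that derives the expected outcome from a file name. This is an illustration under assumptions, not the test code from the repository; the names Expectation, expectation_for and sample_behaves are hypothetical.

```rust
/// What a sample file is expected to do when validated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Expectation {
    Valid,
    Invalid,
}

/// Derives the expectation from the file name prefix.
/// Returns `None` for files that do not follow the convention.
pub fn expectation_for(file_name: &str) -> Option<Expectation> {
    if file_name.starts_with("valid-") {
        Some(Expectation::Valid)
    } else if file_name.starts_with("invalid-") {
        Some(Expectation::Invalid)
    } else {
        None
    }
}

/// Compares the actual validation outcome with the expected one.
/// Returns `None` if the file is not a sample following the convention.
pub fn sample_behaves<E>(file_name: &str, outcome: &Result<(), E>) -> Option<bool> {
    expectation_for(file_name).map(|expected| match expected {
        Expectation::Valid => outcome.is_ok(),
        Expectation::Invalid => outcome.is_err(),
    })
}
```

    A test could iterate over the sample folder, run the parser and validation on each file, and assert that sample_behaves returns Some(true) for every file that follows the naming convention.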

    cargo auditable

    We would like to highlight another important contribution from Sergey that was part of the funded project. Sergey is the maintainer of the cargo-auditable crate. Over time, changes in cyclonedx-rust-cargo and Cargo have the chance to significantly improve the tooling around security and SBOMs in the Rust eco-system. Sergey is also the author of the RFC that allows Cargo to embed the dependency versions into the compiled binary directly.

    The cargo auditable crate allows embedding project-specific build information into the resulting binary artifacts. Other tools such as cargo audit or trivy can then read this part of the binary. These tools can determine the specific crate versions used to build the artifact and scan binaries for known bugs or security vulnerabilities listed in official security databases, while the size overhead for the binaries is negligible.

    Thanks to Sergey for his feedback, guidance and the numerous code reviews in the cyclonedx-rust-cargo repository. It really helped us gain a better understanding of the project and the impact on the tooling around security in the Rust eco-system.

    Conclusion

    The second round officially ended in April. Most of the tasks have been completed, with only a few small PRs left to complete version 1.5 support. Due to the RFC process in the Cargo project, the PR with the proof of concept implementation has not been merged yet, but reception has been good and the feedback was very helpful. The main goal for this PR is to ensure that the code integrates all feedback, that it follows coding guidelines, and that it gets to a state where it can be merged.

    The cyclonedx-rust-cargo repository is now actively maintained thanks to Lars & Sergey. It has also seen contributions from other users, either as new issues or pull requests. The library has been adopted by other projects as well. As one example, the genealogos project, a tool that generates BOM files from Nix evaluation, uses the cyclonedx-bom crate as part of their tooling.

    Working on these Open Source crates has been quite gratifying. The tasks set out in both rounds have been met with one exception, and the changes since last year have transformed the cyclonedx-bom generator crate into a production-ready library that has seen some adoption in other tools. The cyclonedx-rust-cargo project is currently the only viable option for CycloneDX support in the Rust community.

    Thanks to Lars for getting this project funded and trusting us with the implementation. Thanks to Sergey for his feedback, input, code reviews and his expertise in general. Thanks to the Sovereign Tech Fund for funding this type of project. Regular sync calls every other week helped us plan new work and provided status updates, while asynchronous communication over text chat helped us to quickly address particular issues and implementation details.

    A special shout-out to Ferrous Systems' project manager, Leslie, who did an excellent job scheduling work for the engineering team while balancing individual responsibilities and involvements with other projects. Last but not least, kudos to the software engineers working on the project: Tshepang Mbambo in round one, Christian Poveda in round two, and Sebastian Ziebell, who worked on the project throughout the entire funding period.