Optimizing Rust Binary Size
I develop and maintain a git extension called git-req. It enables developers to check out pull requests from GitHub and GitLab by their number instead of branch name. It initially started out as a bash script that invoked Python for harder tasks (e.g., JSON parsing). It worked well enough, but I wanted to add functionality that would have been painful to implement in bash. Additionally, one of my goals was to make it as portable as possible, and requiring that a Python distribution be available ran counter to that. That meant that I needed to distribute this as a binary instead of a script, so I set about finding a programming language to use. After surveying what was available, and determining what would be the best addition to my toolbox, I selected Rust.
The programming language has a steep learning curve, but has been fun to learn and immerse myself within. The community is great, and I'm excited to find more opportunities to use Rust in the future.
The rewrite took a while to accomplish, but when all was said and done, everything worked, and worked well. I was able to implement some snazzy new features as well as polish some rough edges. However, for how "simple" I felt the underlying program to be, it clocked in at 13 megabytes. That felt like a lot. So, I decided to see what could be done.
For those playing along at home, the starting binary size was: 13535712 bytes (12.9MB).
Phase 1: Building
The first thread I pulled was ensuring that the compiler would output code in
such a way that it prioritized disk space over speed. I'm fine with the build
taking slightly more time, as well as with the program being slightly slower,
since most commands incur network traffic and a few extra milliseconds of
CPU time are nothing in comparison. I found that two simple additions to my
Cargo.toml got me all I needed:
1. Optimization Level
The optimization level instructs the compiler as to what trade-offs it should
make at compile time. One can opt for longer compile times and larger file size
in exchange for faster run times, or instead request a smaller file size for
longer compile times and slightly slower run times. To turn this knob, add the
following to Cargo.toml:

[profile.release]
opt-level = "s"

The possible values are:
0: no optimizations
1: basic optimizations
2: some optimizations
3: all optimizations
"s": optimize for binary size
"z": optimize for binary size, but also turn off loop vectorization
The docs encourage experimentation, and I strongly suggest heeding that guidance.
My initial guess was "z", which seemed like the most extreme option. After
testing all possible values, it turned out that "s" resulted in the smallest
binary.
New binary size: 12832464 bytes (12.2MB).
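Testing every opt-level value means building once per value and comparing file sizes. As a minimal sketch (the helper name smallest_build and the idea of keeping one copy of the binary per opt-level are my own assumptions, not part of git-req), the comparison itself needs only the standard library:

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Returns the path and size in bytes of the smallest file among `candidates`,
/// or `None` if the slice is empty. Ties go to the earliest candidate.
/// Hypothetical helper: each candidate is assumed to be the same program
/// built with a different opt-level.
fn smallest_build(candidates: &[PathBuf]) -> io::Result<Option<(PathBuf, u64)>> {
    let mut sized = Vec::with_capacity(candidates.len());
    for path in candidates {
        // fs::metadata follows symlinks; len() is the file size in bytes.
        let size = fs::metadata(path)?.len();
        sized.push((path.clone(), size));
    }
    // min_by_key returns the first element among equal minimums.
    Ok(sized.into_iter().min_by_key(|(_, size)| *size))
}
```

In practice you might copy the release binary aside after each build (to paths of your choosing) and pass those copies in; a missing file surfaces as an error rather than being silently skipped.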
2. Link Time Optimization (LTO)
Link Time Optimization is an optimization phase that the compiler carries out
where it assesses the entire program (instead of an individual file) to
determine if there are optimizations to be made (e.g., removing dead code). To
enable it, add the following to Cargo.toml:

[profile.release]
lto = true
codegen-units = 1
This instructs the Rust compiler to apply a "full" set of optimizations only
when building for release. Possible values for lto are:
false: "thin local" LTO, performed only across the local crate's codegen units.
true or "fat": LTO across all crates in the dependency graph.
"thin": similar to "fat", but faster to run while offering similar gains to "fat".
"off": no LTO.
The codegen-units setting limits how many pieces the compiler may split the
crate into in order to parallelize the build. One of the great things
Rust's borrow checker enables is fearless parallelization, which the compiler
and its tooling exploit. By setting this value to 1, I ensured that code
generation would not be split into parallel units and would instead consider
the whole crate at once, giving the optimizer the full picture (at the expense
of longer compile times).
New binary size: 8338640 bytes (8.0MB).
62% of the original binary size. Nice, but why stop there?
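The MB figures in this post are mebibytes (1,048,576 bytes), and the percentage is simply the new size divided by the original. A small sketch of that arithmetic, with hypothetical helper names:

```rust
/// One mebibyte. The "MB" figures in this post are computed with this divisor.
const MIB: f64 = 1024.0 * 1024.0;

/// Converts a byte count to mebibytes.
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / MIB
}

/// Expresses `new_size` as a percentage of `original_size`.
fn percent_of(new_size: u64, original_size: u64) -> f64 {
    new_size as f64 / original_size as f64 * 100.0
}
```

With the numbers above, format!("{:.1}MB", to_mib(8_338_640)) gives "8.0MB", and format!("{:.0}%", percent_of(8_338_640, 13_535_712)) gives "62%".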
Phase 2: Trimming the Fat
Now that we've made the compiler play nicely, where else can we get some gains? Since Rust ships with a fairly minimal standard library, developers rely on its robust package ecosystem for things like JSON serialization and HTTP requests. One issue with this is that external dependencies are often the primary source of bloat in an application. If only there were a way to measure such bloat in a Rust application...
... oh wait, there is: cargo-bloat.
Running it against git-req with the
--release --crates flags outputs:
 File  .text     Size Crate
 4.4%  11.3% 360.4KiB reqwest
 3.8%   9.6% 306.0KiB std
 3.3%   8.4% 267.5KiB clap
 3.2%   8.1% 259.5KiB regex
 2.9%   7.5% 237.9KiB regex_syntax
 2.6%   6.5% 208.3KiB [Unknown]
 2.4%   6.1% 193.3KiB rustls
 1.3%   3.2% 103.1KiB goblin
 1.2%   3.2% 101.3KiB backtrace
 1.2%   3.2% 100.4KiB libgit2_sys
 1.1%   2.9%  93.3KiB yaml_rust
 1.1%   2.7%  86.6KiB git_req
 1.1%   2.7%  85.8KiB ring
 1.0%   2.7%  84.6KiB unicode_normalization
 0.9%   2.3%  74.0KiB object
 0.9%   2.2%  70.0KiB h2
 0.7%   1.7%  55.0KiB hyper
 0.5%   1.3%  41.6KiB http
 0.5%   1.3%  41.5KiB duct
 0.5%   1.3%  40.1KiB term
 4.4%  11.3% 361.4KiB And 79 more crates. Use -n N to show more.
39.1% 100.0%   3.1MiB .text section size, the file size is 8.0MiB
Wow: git-req (git_req) accounts for only 1.1% of the file, with a long tail
of crates bringing up the rear. More interestingly, a few crates
dominate the file size. Let's tackle the biggest one: reqwest.
As someone who does a lot of Python development, when I first started with Rust I wanted something that mimicked the ergonomics of the popular Requests library. The phonetically similar reqwest offers just that. Unfortunately, it was the largest single contributor to the binary's .text section, and there appeared to be a lot of the library that I wasn't using, nor planning to use. Given both points, I recognized that this library was a prime candidate for replacement.
I started with a write-up that discussed the merits of the various HTTP crates
available to developers. Of importance to me were those that had rustls support,
serde support, minimal use of unsafe, and a sane API. Based on those criteria,
ureq checked all the boxes.
Replacing reqwest with ureq was fairly straightforward. Let's see how this manifests in file size...
 File  .text     Size Crate
 3.9%  10.5% 267.5KiB clap
 3.8%  10.1% 258.2KiB regex
 3.5%   9.3% 237.2KiB regex_syntax
 3.4%   9.1% 231.3KiB std
 3.1%   8.2% 208.2KiB [Unknown]
 2.7%   7.2% 182.4KiB rustls
 1.5%   4.0% 103.1KiB goblin
 1.5%   4.0% 101.4KiB backtrace
 1.5%   3.9% 100.4KiB libgit2_sys
 1.4%   3.8%  95.9KiB yaml_rust
 1.3%   3.6%  91.6KiB git_req
 1.2%   3.3%  84.6KiB unicode_normalization
 1.2%   3.2%  82.3KiB ureq
 1.1%   2.9%  74.0KiB object
 1.1%   2.8%  71.8KiB ring
 0.6%   1.6%  41.6KiB duct
 0.6%   1.6%  40.0KiB term
 0.6%   1.6%  39.9KiB url
 0.6%   1.5%  37.7KiB time
 0.4%   1.0%  26.7KiB rustc_demangle
 2.3%   6.2% 158.7KiB And 43 more crates. Use -n N to show more.
37.4% 100.0%   2.5MiB .text section size, the file size is 6.6MiB
Wow: HTTP, previously the biggest offender, now doesn't even make the top 10. ureq accounts for 82.3KiB of .text, versus 360.4KiB for reqwest.
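Eyeballing two cargo-bloat listings works, but the comparison can also be scripted. The sketch below is hypothetical (parse_size_kib, parse_crate_sizes, and removed_crates are names I made up) and assumes each data row has exactly four columns: file %, .text %, size, and crate name. Because cargo-bloat prints only the top rows by default, a crate reported as "removed" may simply have dropped out of the listing.

```rust
use std::collections::BTreeMap;

/// Converts a cargo-bloat size such as "360.4KiB", "3.1MiB" or "512B" to KiB.
fn parse_size_kib(s: &str) -> Option<f64> {
    if let Some(n) = s.strip_suffix("MiB") {
        n.parse::<f64>().ok().map(|v| v * 1024.0)
    } else if let Some(n) = s.strip_suffix("KiB") {
        n.parse::<f64>().ok()
    } else if let Some(n) = s.strip_suffix('B') {
        n.parse::<f64>().ok().map(|v| v / 1024.0)
    } else {
        None
    }
}

/// Maps crate name -> .text size in KiB. The header row, the "And N more
/// crates" row, and the totals row are skipped because they either don't
/// have exactly four columns or don't have a parseable size.
fn parse_crate_sizes(listing: &str) -> BTreeMap<String, f64> {
    let mut sizes = BTreeMap::new();
    for line in listing.lines() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        if cols.len() != 4 {
            continue;
        }
        if let Some(kib) = parse_size_kib(cols[2]) {
            sizes.insert(cols[3].to_string(), kib);
        }
    }
    sizes
}

/// Crates listed in `before` but not in `after`, in alphabetical order.
fn removed_crates(before: &BTreeMap<String, f64>, after: &BTreeMap<String, f64>) -> Vec<String> {
    before
        .keys()
        .filter(|name| !after.contains_key(*name))
        .cloned()
        .collect()
}
```

Fed the two listings above, removed_crates would report reqwest along with the other crates (such as hyper and h2) that no longer appear in the second listing.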
New binary size: 6967472 bytes (6.6MB).
Phase 3: Things I'm Not Comfortable Doing Yet
In my research, I stumbled upon some optimizations that I wasn't comfortable applying yet - mostly because I want to spend some time to ensure there won't be any runtime implications for git-req.
Compiled binaries typically carry symbol tables and debugging information that the program doesn't need in order to run. Post-processing can strip these out, and the strip tool is one of the more popular utilities that does this. Applying it to git-req shrinks the binary substantially, to 4718240 bytes (4.5MB)! Why wouldn't I want to ship this immediately? One word: backtraces.
When a release-grade Rust application panics, if the
environment variable is set to
1, the application will print out a backtrace
before it dies. This is immensely useful for debugging, and, given how much
the environments this application runs in vary, I feel that playing file-size
golf at the expense of supportability is out of the question... for now.
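Rust's standard library reads the RUST_BACKTRACE environment variable to decide whether its default panic hook prints a backtrace. Stripping symbols undermines exactly this: the frames are still printed, but they largely lose their function names. The sketch below mirrors my understanding of how the value is interpreted on most platforms (unset or "0" means off, "full" means every frame, anything else means a short backtrace). The enum and function names are hypothetical; a helper like this might, for instance, back a diagnostic message asking users to rerun with backtraces enabled.

```rust
use std::env;

/// How the default panic hook is expected to print a backtrace.
#[derive(Debug, PartialEq)]
enum BacktraceStyle {
    Off,
    Short,
    Full,
}

/// Interprets a RUST_BACKTRACE value: unset or "0" disables backtraces,
/// "full" prints every frame, and any other value prints a short backtrace.
fn backtrace_style(value: Option<&str>) -> BacktraceStyle {
    match value {
        None | Some("0") => BacktraceStyle::Off,
        Some("full") => BacktraceStyle::Full,
        Some(_) => BacktraceStyle::Short,
    }
}

/// Reads RUST_BACKTRACE from the current process environment.
fn current_backtrace_style() -> BacktraceStyle {
    backtrace_style(env::var("RUST_BACKTRACE").ok().as_deref())
}
```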
Rust has the concept of feature flags, which allow developers to explicitly
include or exclude parts of the application at compile time. In the case of
git-req, the color-backtrace library is especially useful to me, the program's
author, because scrutinizing backtraces is a regular part of my workflow.
End-users should hopefully run into backtraces far less often, so whatever
benefit they get from it is minimal at best. I could hide the library behind a
feature flag, enabling me to not ship it. Given it isn't in the top 100
contributors to bloat in git-req, I consider it not worth the effort to
implement and maintain.
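For the curious, a feature-gated version might look like the sketch below. The feature name pretty-backtraces and the manifest lines in the comments are assumptions for illustration, not git-req's actual configuration. The point is that #[cfg(feature = ...)] removes the color-backtrace call (and, with an optional dependency, the crate itself) from builds that don't enable the feature.

```rust
// Hypothetical Cargo.toml wiring (names and version are placeholders):
//
// [dependencies]
// color-backtrace = { version = "<version>", optional = true }
//
// [features]
// pretty-backtraces = ["color-backtrace"]

/// With the feature enabled, install color-backtrace's panic hook.
#[cfg(feature = "pretty-backtraces")]
fn install_panic_hook() -> &'static str {
    color_backtrace::install();
    "color-backtrace"
}

/// With the feature disabled, color-backtrace is never compiled in, and Rust's
/// default panic hook (which still honors RUST_BACKTRACE) stays in place.
#[cfg(not(feature = "pretty-backtraces"))]
fn install_panic_hook() -> &'static str {
    "default"
}
```

A development build could then opt in with cargo build --release --features pretty-backtraces, while the default release build keeps Rust's stock panic hook.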
Question everything: there are gains to be had. Check out