In a Rust project, I'd like to reduce the amount of LLVM IR generated, to speed up compile times (and for other benefits, e.g. better icache utilization). To help guide my efforts, I would like to know which source-code functions correspond to the largest amounts of LLVM IR (or, as a proxy, actual assembly/machine code).
One big factor here is monomorphization (alongside other factors like inlining). It's hard for me to guess whether more LLVM IR comes from a single large non-generic function, from a smaller polymorphic function that produces many monomorphizations, or from a medium-sized generic function with a handful of monomorphizations.
Therefore, I'd like some way to know how much each source-code function contributes to LLVM IR and/or final binary size (or any similar metric). Is this possible?
Not gonna lie... depending on how deep you want to dig, it's going to get complicated.
First pass
First, try to identify which part of the compilation process is actually slow.
The following command will give you an overview of where the time is actually spent:
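As a minimal sketch, assuming a nightly toolchain (the -Z flags are unstable, and the output format may vary between compiler versions):

    # Force a rebuild of the crate (use src/lib.rs for a library crate),
    # then ask rustc to print how long each of its passes took.
    $ touch src/main.rs
    $ cargo +nightly rustc -- -Z time-passes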
From there you can see whether the time is spent in the front-end or the back-end, for example, and thus focus your investigation.
LLVM: First order approximation
There are generally tools which allow analyzing a binary (executable, DLL, ...), and extracting symbol sizes is well supported.
On Linux, where binaries use the ELF format, it's relatively easy to extract:
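For instance, a sketch using nm from GNU binutils (the binary path is a placeholder, and rustfilt is an optional demangler you would have to install separately):

    # List symbols with their sizes in decimal, sorted by size,
    # then demangle the Rust symbol names.
    $ nm --print-size --size-sort --radix=d target/release/my_binary | rustfilt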
This would point you to large functions, but it won't tell you why they are large... and there things get tricky.
LLVM: Second order approximation
Optimizers optimize, and this may either increase or decrease binary size.
Dead Code Elimination will remove code. It's great for the final binary, but it may remove code that took a while to generate, so it'll hide that cost.
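If you want to see what was generated before LLVM had a chance to remove it, one option is to dump the IR itself; a sketch (the .ll files land under target/debug/deps/, with a hash in the file name):

    # Emit the crate's LLVM IR as .ll files.
    $ cargo rustc -- --emit=llvm-ir
    # Rough measure: count IR lines per emitted module.
    $ wc -l target/debug/deps/*.ll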
Inlining will inline the code of a function into another, which may lead you to misattribute the cost if you stick to just the size of symbols. You'd need to switch to attributing the cost per assembly instruction via DWARF debug information -- hoping the mapping is complete.
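A rough sketch of what that attribution could look like, assuming GNU binutils and a binary built with debug info (e.g. debug = true in the release profile; the binary path is a placeholder):

    # Disassemble with source file/line annotations from DWARF, so each
    # instruction can be traced back to a source location.
    $ objdump -d -l target/release/my_binary | rustfilt | less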
This gets quite complicated, quite fast.
Back to compilation times?
Clang supports the -ftime-trace option (introduced in 9.0), and therefore there is support in LLVM to report exactly where the time is being spent (see timeTraceProfilerInitialize). Unfortunately, GitHub refuses to return code search results at the moment (not logged in), so I cannot check whether rustc is wired to make use of this facility.
A casual search indicates that you may have some luck combining the unstable (nightly-only, thus) -Zself-profile option with the tools from the measureme repository. Most notably, it should be possible to generate a flamegraph out of the profiling data. This may or may not use the aforementioned timeTraceProfilerInitialize facility.
I have not tested those myself, however, so I make no promise.
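As a sketch of what that workflow might look like (untested; the tool names come from the measureme repository, and the exact profile file names and invocations may differ between versions):

    # Record self-profiling data for this crate (nightly only); the profile
    # files are written into the current directory.
    $ cargo +nightly rustc -- -Z self-profile
    # Install the analysis tools from the measureme repository (assumed names).
    $ cargo install --git https://github.com/rust-lang/measureme summarize crox flamegraph
    # Summarize the recorded data, or turn it into a flamegraph SVG
    # (the argument is a placeholder for the generated profile's name).
    $ summarize summarize <crate_name>-<pid>
    $ flamegraph <crate_name>-<pid>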