My involvement with LLVM 19

LLVM 19 will soon be released. This post provides a summary of my contributions in this release cycle to record my learning progress.

LLVM binary utilities

Hashing

I optimized the bit mixer used by llvm::DenseMap<std::pair<X, Y>> and llvm::DenseMap<std::tuple<X...>>. llvm/ADT/Hashing.h, used by StringRef hashing and DenseMap, was supposed to be non-deterministic. Despite this, a lot of code relied on a specific iteration order. I made multiple fixes across the code base and landed [Hashing] Use a non-deterministic seed if LLVM_ENABLE_ABI_BREAKING_CHECKS to improve test coverage (e.g. assertion builds) and ensure future flexibility to replace the algorithm.

The change yields a noticeable code size reduction:

# old
movq _ZN4llvm7hashing6detail19fixed_seed_overrideE@GOTPCREL(%rip), %rax
movq (%rax), %rax
testq %rax, %rax
movabsq $-49064778989728563, %rcx # imm = 0xFF51AFD7ED558CCD
cmoveq %rcx, %rax

# new
movabsq $-49064778989728563, %rcx

... and significant compile time improvement.
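To illustrate why relying on iteration order is fragile once the seed can vary, here is a minimal sketch using std::unordered_map rather than llvm::DenseMap. SeededHash, make_map, and sorted_keys are hypothetical names, and the mixing step is a placeholder, not the mixer in Hashing.h. The only point is that output derived from sorted keys stays stable under any seed, while raw iteration order may not.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical seeded hasher (an assumption for illustration only).
// The multiply/shift below is a generic mixing step, not LLVM's.
struct SeededHash {
  std::uint64_t seed;
  std::size_t operator()(int key) const {
    std::uint64_t x = static_cast<std::uint64_t>(key) ^ seed;
    x *= 0x9e3779b97f4a7c15ULL;
    x ^= x >> 32;
    return static_cast<std::size_t>(x);
  }
};

using Map = std::unordered_map<int, std::string, SeededHash>;

// Build the same logical map with a caller-chosen seed.
Map make_map(std::uint64_t seed) {
  Map m(16, SeededHash{seed});
  for (int k : {7, 3, 42, 19, 1})
    m.emplace(k, "v" + std::to_string(k));
  return m;
}

// Order-independent traversal: collect the keys, then sort them,
// instead of emitting entries in whatever order iteration yields.
std::vector<int> sorted_keys(const Map &m) {
  std::vector<int> keys;
  keys.reserve(m.size());
  for (const auto &kv : m)
    keys.push_back(kv.first);
  std::sort(keys.begin(), keys.end());
  return keys;
}
```

Calling sorted_keys(make_map(s)) gives the same sequence for every seed s, which is the property that output-producing code needs if the hash is allowed to change between runs or releases.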

I optimized DenseMap::{find,erase}, yielding compile time improvement.
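As a rough illustration of why lookup and erase can take a leaner path than insertion in an open-addressing table, here is a toy set that uses sentinel keys and triangular probing, loosely in the style of DenseMap. ToySet, kEmpty, and kTombstone are hypothetical names, and this is a sketch of the general idea under those assumptions, not a reproduction of the actual patch. Insertion has to remember the first tombstone it passes so the slot can be reused, whereas contains and erase only need to stop at the key or at an empty bucket.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sentinel keys (assumption: real keys are non-negative ints).
constexpr int kEmpty = -1;
constexpr int kTombstone = -2;

// Toy fixed-capacity set; the bucket count must be a power of two >= 2.
class ToySet {
  std::vector<int> buckets_;
  std::size_t empty_; // number of buckets still holding kEmpty

  static std::size_t hash(int key) {
    return static_cast<std::size_t>(key) * 37u;
  }

public:
  explicit ToySet(std::size_t pow2_buckets = 16)
      : buckets_(pow2_buckets, kEmpty), empty_(pow2_buckets) {}

  // Insert tracks the first tombstone seen so it can be reused.
  bool insert(int key) {
    const std::size_t mask = buckets_.size() - 1;
    std::size_t idx = hash(key) & mask;
    bool have_tomb = false;
    std::size_t tomb = 0;
    for (std::size_t probe = 1;; ++probe) {
      const int b = buckets_[idx];
      if (b == key)
        return false; // already present
      if (b == kEmpty) {
        if (have_tomb) {
          buckets_[tomb] = key; // reuse the tombstone slot
        } else {
          // Keep one empty bucket so every probe loop terminates.
          if (empty_ == 1)
            throw std::length_error("ToySet is full");
          buckets_[idx] = key;
          --empty_;
        }
        return true;
      }
      if (b == kTombstone && !have_tomb) {
        have_tomb = true;
        tomb = idx;
      }
      idx = (idx + probe) & mask; // triangular probing
    }
  }

  // Lookup needs no tombstone bookkeeping: a tombstone is just
  // another non-matching, non-empty bucket to step over.
  bool contains(int key) const {
    const std::size_t mask = buckets_.size() - 1;
    std::size_t idx = hash(key) & mask;
    for (std::size_t probe = 1;; ++probe) {
      const int b = buckets_[idx];
      if (b == key)
        return true;
      if (b == kEmpty)
        return false;
      idx = (idx + probe) & mask;
    }
  }

  // Erase follows the same lean probe, then leaves a tombstone.
  bool erase(int key) {
    const std::size_t mask = buckets_.size() - 1;
    std::size_t idx = hash(key) & mask;
    for (std::size_t probe = 1;; ++probe) {
      const int b = buckets_[idx];
      if (b == key) {
        buckets_[idx] = kTombstone;
        return true;
      }
      if (b == kEmpty)
        return false;
      idx = (idx + probe) & mask;
    }
  }
};
```

In this toy version the contains and erase loops carry two comparisons per probe and no extra state, while insert carries three comparisons plus the tombstone variables. Whether that difference matters in practice depends on the key type and on what the compiler can fold away.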

Optimizations to the bit mixer in Hashing.h and the DenseMap code have yielded significant benefits, reducing both compile time and code size. This suggests there's further potential for improvement in this area.

However, the code size reduction also highlights how much code size could grow if we adopted faster unordered map implementations such as boost::unordered_flat_map, Abseil's Swiss Table, or Folly's F14. While these libraries may offer better performance, they often come with a significant increase in code complexity and size.

Introducing a new container alongside DenseMap to selectively replace performance-critical instances could lead to substantial code modifications. This approach requires careful consideration to balance potential performance gains with the additional complexity.

NumericalStabilitySanitizer

NumericalStabilitySanitizer is a new feature for the 19.x releases. I have made many changes to the compiler-rt part.

Clang

Driver maintenance

Options used by the LLVM integrated assembler are currently handled in an ad-hoc way, and the handling is duplicated between the LTO and non-LTO code paths. Eventually we might want to adopt TableGen for these -Wa, options.

Others:

Code review

I reviewed a wide range of patches, spanning areas such as ADT/Support, binary utilities, MC, lld, clangDriver, LTO, sanitizers, LoongArch, and RISC-V, as well as new features like NumericalStabilitySanitizer and RealTimeSanitizer.

To quantify my involvement, a search for patches I commented on (repo:llvm/llvm-project is:pr -author:MaskRay commenter:MaskRay created:>2024-01-23) yields 780 results.

Link: My involvement with LLVM 18