With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody. — Hyrum's Law
In a compiler, the most common form of Hyrum's Law is dependence on
unspecified behavior — hash bucket order, the order of equal
elements after std::sort, padding offsets. The same framing
covers a few cases that are technically undefined behavior (use of an
invalidated iterator) or plain incidental properties (ABI struct layout,
ELF section offsets).
When the compiler itself harbors such a dependency, the symptom is
usually output that varies build-to-build: an unstable sort that lands
differently after the standard library changes, a hash map whose
iteration order shifts when the hash function does. Occasionally the
variation is run-to-run within a single build —
DenseMap<void *, X> keys with an ASLR-derived seed
reorder buckets each invocation. Either way, reproducible builds,
bisection, and bug reports all assume same input → same output, and a
stealth Hyrum dependency breaks that.
This post surveys some mechanisms that perturb the contract's blind spots so dependencies cannot quietly form.
Hash seed perturbation
The first line of defense is the hash function itself.
llvm/include/llvm/ADT/Hashing.h:
```cpp
inline uint64_t get_execution_seed() {
  // …
```
The seed XORed into every llvm::hash_value is the
runtime address of install_fatal_error_handler — under
ASLR, different every process. The header comment is explicit:
the seed is non-deterministic per process (address of a function in LLVMSupport) to prevent having users depend on the particular hash values.
Every hash_combine / hash_integer_value
call picks up the seed, and every DenseMap<K, V>
keyed by a hash_value-using type then reorders its buckets
per run. MD5, BLAKE3, SHA1, SHA256 stay byte-stable — those are the
right tools when you actually want a digest.
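The idea is small enough to sketch outside LLVM. Everything below is a hypothetical stand-in for the real Hashing.h machinery — `anchor_fn`, `execution_seed_sketch`, and the toy mixer are mine, not LLVM's:

```cpp
#include <cstdint>

// Hypothetical stand-in for "the address of a function in LLVMSupport":
// under ASLR the binary's image base moves, so this address, and therefore
// the seed, differs from process to process.
static void anchor_fn() {}

inline uint64_t execution_seed_sketch() {
  return reinterpret_cast<uint64_t>(&anchor_fn);
}

// Toy mixer standing in for hash_combine picking up the seed: the same key
// hashes to a different value in a different process, so nothing can
// quietly depend on a particular hash value.
inline uint64_t seeded_hash_sketch(uint64_t key) {
  return (key * 0x9e3779b97f4a7c15ULL) ^ execution_seed_sketch();
}
```

Within one process the seed is fixed, so hashing stays self-consistent; only cross-process comparisons of raw hash values break, which is the point.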
My commit
ce80c80dca45 introduced the seed in 2024.
Container iteration order
Code can grow dependencies on the iteration order.
LLVM_ENABLE_REVERSE_ITERATION walks hash containers
backwards to flag violations.
llvm/include/llvm/Support/ReverseIteration.h:
```cpp
template <class T = void *> constexpr bool shouldReverseIterate() {
  // …
```
DenseMap flips its BucketItTy to
std::reverse_iterator<pointer>;
SmallPtrSet swaps begin() and
end(); StringMap bitwise-NOTs the hash before
bucket selection — the only thing that perturbs StringMap,
since its hash bypasses get_execution_seed.
Unlike the hash seed, reverse iteration isn't auto-on with
assertions; -DLLVM_REVERSE_ITERATION=ON opts in explicitly.
2026 has already seen fixes it triggered: 7f703cabf728
(MLIR SSA-value completion order), 0b3afd35c41d
(MLIR SROA alloca order), and f5e2c5ddcec7
(a clang test).
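The bug class is easy to reproduce outside LLVM. A sketch using std::unordered_set as a stand-in for DenseMap/SmallPtrSet (the helper names are mine, not LLVM's):

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Fragile: output order is whatever the hash table's bucket layout happens
// to be; this is exactly the surface reverse iteration perturbs in LLVM's
// own containers.
std::vector<std::string> keys_fragile(const std::unordered_set<std::string> &s) {
  return std::vector<std::string>(s.begin(), s.end());
}

// Robust: impose a defined order before anything downstream can observe it.
std::vector<std::string> keys_stable(const std::unordered_set<std::string> &s) {
  std::vector<std::string> v(s.begin(), s.end());
  std::sort(v.begin(), v.end());
  return v;
}
```

Reverse iteration is the automated version of spotting the first function in review: flip the walk direction, and any test pinned to one bucket layout fails.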
Iterator invalidation
Orthogonal to iteration order: what happens to an existing iterator
after a mutation. llvm/include/llvm/ADT/EpochTracker.h:
```cpp
class DebugEpochBase {
  // …
```
DenseMap and friends inherit from
DebugEpochBase. Mutations bump the epoch; iterators capture
it at construction and assert on mismatch. The destructor bumps too, so
stale iterators into destroyed containers assert rather than read freed
memory.
Without it, mutate-during-iteration "happens to work" depending on
bucket layout — and bucket layout is what the hash seed and reverse
iteration above perturb. The epoch check turns the latent bug into a
clean assert regardless of which "lucky" layout the run lands on.
The whole mechanism collapses to a no-op under NDEBUG.
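A minimal sketch of the pattern, condensed from what EpochTracker.h does (class and member names simplified; the real DebugEpochBase nests the handle type and compiles away under NDEBUG):

```cpp
#include <cstdint>

// Container side: every mutation, and destruction, bumps a counter.
class EpochedContainer {
  uint64_t Epoch = 0;

public:
  void mutate() { ++Epoch; }        // insert/erase/rehash would call this
  ~EpochedContainer() { ++Epoch; }  // stale handles see a bumped epoch
  uint64_t epoch() const { return Epoch; }
};

// Iterator side: capture the epoch at construction; a mismatch later means
// the iterator may have been invalidated, so assert instead of proceeding.
class EpochedHandle {
  const EpochedContainer *C;
  uint64_t EpochAtCreation;

public:
  explicit EpochedHandle(const EpochedContainer &Cont)
      : C(&Cont), EpochAtCreation(Cont.epoch()) {}
  bool isInSync() const { return C->epoch() == EpochAtCreation; }
};
```

The check costs one integer compare per iterator use, which is why it can afford to be on in every assertions-enabled build.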
Pre-shuffling unstable sorts
The same defensive pattern shows up twice in the monorepo, in different sub-projects, years apart.
llvm::sort under EXPENSIVE_CHECKS
llvm/include/llvm/ADT/STLExtras.h:
std::sort and qsort are unstable; code
observing the order of equal elements is depending on undocumented
behavior. Pre-shuffling makes that observation different every run. commit
5a3d47fabcb6 added the wrapper in 2018, motivated by PR35135.
LLVM also ships its own llvm::shuffle rather than
calling std::shuffle, "so that LLVM behaves the same when
using different standard libraries." A reproducibility tool whose
reproducibility depends on the host stdlib is worse than no tool — and
the linker section below relies on this.
llvm::stable_sort deliberately does not pre-shuffle; it
is the explicit opt-in for code that legitimately needs ordering of
equal elements.
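The wrapper's shape, reduced to a sketch (assumptions: plain std::shuffle/std::mt19937 instead of the host-independent llvm::shuffle, and always-on instead of gated behind EXPENSIVE_CHECKS):

```cpp
#include <algorithm>
#include <random>

// Shuffle first, then run the unstable sort. The sorted result is the same
// multiset either way, but the relative order of *equal* elements now
// varies run to run, so code observing that order fails loudly instead of
// passing by luck.
template <class RandomIt, class Compare>
void checked_sort(RandomIt first, RandomIt last, Compare comp) {
  std::mt19937 rng(std::random_device{}());
  std::shuffle(first, last, rng);
  std::sort(first, last, comp);
}
```

Note the asymmetry this preserves: sorted order by the comparator is still guaranteed; only the unpromised tiebreak order is perturbed.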
libc++: _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY
libc++ has a near-perfect parallel mechanism, designed for downstream
users rather than the project's own internals.
libcxx/include/__debug_utils/randomize_range.h:
```cpp
template <class _AlgPolicy, class _Iterator, class _Sentinel>
// …
```
Three callsites:

- std::sort — pre-shuffles the input.
- std::partial_sort — pre-shuffles the input and re-shuffles the unsorted tail afterward.
- std::nth_element — pre-shuffles, then re-shuffles each side of the partition.
Seed handling rhymes with get_execution_seed: ASLR or
static std::random_device for per-process variation, with
_LIBCPP_RANDOMIZE_UNSPECIFIED_STABILITY_SEED=<n> as a
fixed-seed escape hatch. Off by default; C++11 and later only.
libcxx/docs/DesignDocs/UnspecifiedBehaviorRandomization.rst
explains the motivation:
Google has measured couple of thousands of tests to be dependent on the stability of sorting and selection algorithms. As we also plan on updating (or least, providing under flag more) sorting algorithms, this effort helps doing it gradually and sustainably.
It cites PR20837 — a
worst-case O(n²) std::sort — as the upgrade
libc++ specifically wanted to ship. The shuffle is the gating tool: if
downstream tests pass with it enabled, they will pass after the
algorithm change too.
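What "which sub-range is unspecified" means is easiest to see with partial_sort. Plain standard C++, no libc++ internals (the helper is mine):

```cpp
#include <algorithm>
#include <vector>

// After partial_sort(first, middle, last), only [first, middle) has a
// specified order; [middle, last) holds the remaining elements in an
// unspecified order. That tail is the region libc++'s debug shuffle
// re-randomizes afterward, so tests cannot latch onto one layout.
std::vector<int> top3(std::vector<int> v) {
  std::partial_sort(v.begin(), v.begin() + 3, v.end());
  return v;  // v[0..2]: the three smallest, sorted; v[3..]: unspecified
}
```

A test asserting on `v[3]` would be exactly the kind of dependency the macro exists to flush out.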
Comparing the two is more interesting than either alone:

- llvm::sort's wrapper is internal hygiene: LLVM is its own primary user, so the shuffle lives in STLExtras.h behind a build flag with no docs.
- libc++'s wrapper is user-facing — a DesignDocs/ page, a public macro, a public seed override, an explicit "Patches welcome." invitation. It has to be: libc++'s users are not libc++, and the contract being defended is the C++ standard itself.
- libc++ generalizes the primitive: __debug_randomize_range applies at three callsites, each declaring which sub-range the algorithm leaves unspecified. LLVM's wrapper only covers the simpler equal-element case.
- Hashed containers — std::unordered_* iteration order — are unspecified in both, but libc++ does not randomize them. LLVM-the-library does; on this one surface LLVM is ahead of its own stdlib.
Linker output: --shuffle-sections and --randomize-section-padding
Two ELF-only lld knobs perturb layout details that no contract covers.
--shuffle-sections=<glob>=<seed>
lld/ELF/Writer.cpp:
```cpp
for (const auto &patAndSeed : ctx.arg.shuffleSections) {
  // …
```
Three regimes in one option:

- seed = -1 — deterministic reverse, stable even as new sections appear. Glob .init_array* to -1, rebuild, run the test suite: anything that breaks is a real static-init-order bug. One flag, no Frankenstein link script.
- seed > 0 — deterministic random shuffle, reproducible across runs and hosts (because llvm::shuffle is host-independent). Useful in CI without breaking bisection.
- seed = 0 — std::random_device()-seeded. Fresh nondeterminism every link.
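A hypothetical invocation of the =-1 recipe (assumes a clang driver using lld; the object files are placeholders):

```shell
# Reverse .init_array* deterministically; leave every other section alone.
# Any test failure after relinking is a genuine static-init-order bug.
clang++ -fuse-ld=lld -Wl,--shuffle-sections='.init_array*=-1' \
    a.o b.o -o app

# CI variant: a fixed positive seed shuffles sections reproducibly across
# runs and hosts, so a failure still bisects cleanly.
clang++ -fuse-ld=lld -Wl,--shuffle-sections='*=12345' a.o b.o -o app
```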
History: 423cb321dfae
introduced the =-1 reverse mode; 16c30c3c23ef
generalized to per-glob seeds, which is what makes the
.init_array*=-1 recipe possible; c135a68d426f
fixed a bug where the feature itself produced an invalid dynamic
relocation order — even Hyrum mitigations have correctness traps.
--randomize-section-padding=<seed>
The sister option perturbs section offsets by inserting
padding between input sections and at segment starts
(lld/ELF/Writer.cpp):
```cpp
static void randomizeSectionPadding(Ctx &ctx) {
  // …
```
Callers grow dependencies on padding-induced offsets the linker never promised — profile-guided pipelines, side-channel research, exploit toolchains pinning to specific addresses. A seeded perturbation makes those dependencies visible.
Both options are ELF-only; MachO and COFF ports have nothing equivalent.
ABI break detection
llvm/include/llvm/Config/abi-breaking.h.cmake:
Every TU including the header takes a weak reference to
EnableABIBreakingChecks or
DisableABIBreakingChecks depending on its own build flag.
Mixing the two against the same libLLVM produces an
unresolved symbol at link time. MSVC gets the same guarantee via
#pragma detect_mismatch.
Out-of-tree users routinely compile against headers from one tree and
link against a different libLLVM. Without this gate,
whichever struct layout the link happens to pick silently miscompiles;
with it, the link fails.
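The gate is small enough to sketch in one file. The symbol names below are the real ones; the single-file arrangement is mine, since the whole point of the mechanism is that the two halves normally live in different binaries:

```cpp
// Condensed sketch: in reality the definition comes from libLLVM and the
// reference comes from each client TU; both live here so the sketch links.
int DisableABIBreakingChecks;  // the library's marker (checks-OFF build)

extern int DisableABIBreakingChecks;                     // header declaration
static int *VerifyDisabled = &DisableABIBreakingChecks;  // header reference

// A TU built with checks ON would instead reference EnableABIBreakingChecks,
// a symbol this "library" never defines, and the link would fail with an
// unresolved-symbol error rather than silently mixing struct layouts.
```

The trick is that the mismatch is detected by the linker's ordinary symbol resolution; no extra tooling has to run.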
What LLVM is not doing
The mechanisms above all target surfaces no stable consumer should care about: bucket order, equal-element sort order, init-array order. Everything else stays deterministic by design: debuggers, profilers, sanitizers, and reproducible-build infrastructure consume those outputs and need them stable.
In some cases a stronger guarantee is available only behind explicit options. For example, bitcode and textual IR preserve use-list order only under -preserve-bc-uselistorder / -preserve-ll-uselistorder.
A near-cousin: clang's -frandomize-layout-seed /
__attribute__((randomize_layout)). Mechanically the same —
seeded std::shuffle on struct fields — and it does
coincidentally invalidate offsetof dependencies. But the
intent is exploit mitigation, cribbed from GrSecurity's Randstruct GCC
plugin: per-build kernel hardening, not a developer tool.