Weak AVL Tree

tl;dr: Weak AVL trees are replacements for AVL trees and red-black trees.

The 2014 paper Rank-Balanced Trees (Haeupler, Sen, Tarjan) presents a framework that uses ranks and rank differences to describe balanced binary search trees.

  • Each node has a non-negative integer rank r(x). Null nodes have rank -1.
  • The rank difference of a node x with parent p(x) is r(p(x)) − r(x).
  • A node is i,j if its children have rank differences i and j (unordered), e.g., a 1,2 node has children with rank differences 1 and 2.
  • A node is an i-child if its rank difference is i, e.g., a 1-child has rank difference 1.
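
To make these definitions concrete, here is a minimal sketch that stores explicit integer ranks. The struct and function names are only for illustration; the implementations discussed later encode rank differences in one or two bits instead of storing ranks.

#include <algorithm>
#include <utility>

struct Node {
  Node *ch[2] = {nullptr, nullptr};
  int rank = 0; // leaves have rank 0
};

int rank_of(const Node *x) { return x ? x->rank : -1; } // null nodes have rank -1

// Rank difference of child x with respect to its parent p: r(p) - r(x).
int rank_diff(const Node *p, const Node *x) { return p->rank - rank_of(x); }

// The (unordered) pair of rank differences that classifies p as an i,j node.
std::pair<int, int> node_type(const Node *p) {
  int a = rank_diff(p, p->ch[0]), b = rank_diff(p, p->ch[1]);
  return std::minmax(a, b);
}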

Several balanced trees fit this framework:

  • AVL tree: Ranks are defined as heights. Every node is a 1,1 or 1,2 node (rank differences of its children).
  • Red-Black tree: All rank differences are 0 or 1, and no parent of a 0-child is a 0-child. (red: 0-child; black: 1-child; null nodes are black)
  • Weak AVL tree (new tree described by this paper): All rank differences are 1 or 2, and every leaf has rank 0.
    • A weak AVL tree without 2,2 nodes is an AVL tree.
AVL trees ⫋ weak AVL trees ⫋ red-black trees

Weak AVL Tree

Weak AVL trees are replacements for AVL trees and red-black trees. A single insertion or deletion operation requires at most two rotations (forming a double rotation when two are needed). In contrast, AVL deletion requires O(log n) rotations, and red-black deletion requires up to three.

Without deletions, a weak AVL tree is exactly an AVL tree. With deletions, its height remains at most that of an AVL tree with the same number of insertions but no deletions.

The rank rules imply:

  • Null nodes have rank -1, leaves have rank 0, and unary nodes have rank 1 (the null child forces the rank to be at most 1, while the real child, a leaf of rank 0, forces it to be at least 1).
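
As a sanity check, these rules can be verified recursively. This hedged sketch reuses the explicit-rank Node from the sketch above and simply asserts the weak AVL invariants:

#include <cassert>

// Returns the rank of the subtree rooted at x if it satisfies the weak AVL
// rules (all rank differences are 1 or 2, every leaf has rank 0); asserts otherwise.
int check(const Node *x) {
  if (!x) return -1; // null nodes have rank -1
  int dl = x->rank - check(x->ch[0]);
  int dr = x->rank - check(x->ch[1]);
  assert((dl == 1 || dl == 2) && (dr == 1 || dr == 2));
  if (!x->ch[0] && !x->ch[1])
    assert(x->rank == 0);
  return x->rank;
}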

Insertion

The new node x has rank 0, replacing a null node of rank -1. There are three cases.

  • If the tree was previously empty, the new node becomes the root.
  • If the parent of the new node was previously a unary node (1,2 node), it is now a 1,1 binary node.
  • If the parent of the new node was previously a leaf (1,1 node), it is now a 0,1 unary node, leading to a rank violation.

When the tree was previously non-empty, x has a parent node. To handle the second and third cases, we call the following subroutine with x denoting the new node.

The subroutine handles the rank increase of x; it breaks out of the loop once there is no more rank violation, i.e., we are done.

The 2014 paper isn't very clear about the conditions.

// Assume that x's rank has just increased by 1 and rank_diff(x) has been updated.

p = x->parent;
if (rank_diff(x) == 1) {
  // x was previously a 2-child before increasing x->rank. Done.
} else {
  for (;;) {
    // Otherwise, p is a 0,1 node (previously a 1,1 node before increasing x->rank).
    // x being a 0-child is a violation.

    Promote p.
    // Since we have promoted both x and p, it's as if rank_diff(x's sibling) is flipped.
    // p is now a 1,2 node.

    x = p;
    p = p->parent;
    // x is a 1,2 node.
    if (!p) break;
    d = p->ch[1] == x;

    if (rank_diff(x) == 1) break;
    // Otherwise, x is a 0-child, leading to a new rank violation.

    auto sib = p->ch[!d];
    if (rank_diff(sib) == 2) { // p is a 0,2 node
      auto y = x->ch[d^1];
      if (y && rank_diff(y) == 1) {
        // y is a 1-child. y must be the previous `x` in the last iteration.
        Perform a double rotation involving `p`, `x`, and `y`.
      } else {
        // Otherwise, y is null or a 2-child.
        Perform a single rotation involving `p` and `x`.
        // x is now a 1,1 node and there is no more violation.
      }
      break;
    }

    // Otherwise, p is a 0,1 node. Go to the next iteration.
  }
}

Insertion never introduces a 2,2 node, so insertion-only sequences produce AVL trees.

Deletion

TODO: Describe deletion

if (!was_2 && !x && !p->ch[0] && !p->ch[1] && p->rp()) {
// p was unary and becomes 2,2. Demote it.
}

Implementation

Since valid rank differences can only be 1 or 2, ranks can be encoded efficiently using bit flags. There are three approaches:

  • Store two bits representing the rank differences to each child. Bit 0: rank difference to left child (1 = diff is 2, 0 = diff is 1). Bit 1: rank difference to right child
  • Store a single bit representing the parity (even/odd) of the node's absolute rank. The rank difference to a child is computed by comparing parities. Same parity → rank difference of 2. Different parity → rank difference of 1
  • Store a 1-bit rank difference parity in each node.
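
As an illustration of the second approach, here is a hedged sketch with made-up names (PNode, parity), storing the parity of the node's absolute rank:

// `parity` is the low bit of the node's absolute rank; null nodes have rank -1,
// i.e., odd parity.
struct PNode {
  PNode *ch[2];
  bool parity; // rank & 1
};

// Same parity as the parent means an even rank difference, which must be 2;
// different parity means 1.
int rank_diff(const PNode *p, const PNode *c) {
  bool cp = c ? c->parity : true;
  return p->parity == cp ? 2 : 1;
}

// Promoting or demoting a node by 1 toggles only its own bit; with the first
// approach, the parent's flag bit for this node must be updated as well.
void promote_or_demote(PNode *x) { x->parity = !x->parity; }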

FreeBSD's sys/tree.h (https://reviews.freebsd.org/D25480, 2020) uses the first approach. The rb_ prefix remains, as it can also be read as Rank-Balanced. :) Note: its insertion operation can be further optimized, as the following code demonstrates.

https://github.com/pvachon/wavl_tree and https://crates.io/crates/wavltree use the second approach.

The third approach is less efficient because a null node can be either a 1-child (parent is binary) or a 2-child (parent is unary), requiring the sibling node to be probed to determine the rank difference: int rank_diff(Node *p, int d) { return p->ch[d] ? p->ch[d]->par_and_flg & 1 : p->ch[!d] ? 2 : 1; }

https://maskray.me/blog/2025-12-14-weak-avl-tree is a C++ implementation covering both approaches, supporting the following operations:

  • insert: insert a node
  • remove: remove a node
  • rank: count elements less than a key
  • select: find the k-th smallest element (0-indexed)
  • prev: find the largest element less than a key
  • next: find the smallest element greater than a key
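
The order-statistics queries rely only on the size augmentation. Below is a hedged, stripped-down sketch (not necessarily how the linked implementation works), using a node with just ch, i, and size:

struct Node {
  Node *ch[2];
  long i;   // key
  int size; // number of nodes in the subtree
};

int size_of(const Node *x) { return x ? x->size : 0; }

// rank: count elements whose key is less than `key`.
int rank(const Node *x, long key) {
  int r = 0;
  while (x) {
    if (key <= x->i) {
      x = x->ch[0];
    } else {
      r += size_of(x->ch[0]) + 1; // left subtree and x itself are less than key
      x = x->ch[1];
    }
  }
  return r;
}

// select: find the k-th smallest element (0-indexed), or null if out of range.
const Node *select(const Node *x, int k) {
  while (x) {
    int ls = size_of(x->ch[0]);
    if (k < ls) {
      x = x->ch[0];
    } else if (k == ls) {
      return x;
    } else {
      k -= ls + 1;
      x = x->ch[1];
    }
  }
  return nullptr;
}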

Node structure:

  • ch[2]: left and right child pointers.
  • par_and_flg: packs the parent pointer with 2 flag bits in the low bits. Bit 0 indicates whether the left child has rank difference 2; bit 1 indicates whether the right child has rank difference 2. A cleared bit means rank difference 1.
  • i: the key value.
  • sum, size: augmented data maintained by mconcat for order statistics operations.

Helper methods:

  • rd2(d): returns true if child d has rank difference 2.
  • flip(d): toggles the rank difference of child d between 1 and 2.
  • clr_flags(): sets both children to rank difference 1 (used after rotations to reset a node to 1,1).
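
Putting the above together, the node might look roughly like the following. This is a sketch based on the description above, not the exact code from the linked post; the mconcat definition in particular is only a plausible guess.

#include <cstdint>

struct Node {
  Node *ch[2];           // left and right child pointers
  uintptr_t par_and_flg; // parent pointer | 2 flag bits (bit d set: ch[d] is a 2-child)
  long i;                // key
  long sum;              // augmented data
  int size;              // augmented data

  Node *parent() const { return (Node *)(par_and_flg & ~(uintptr_t)3); }
  unsigned flags() const { return par_and_flg & 3; }
  bool rd2(int d) const { return par_and_flg >> d & 1; } // child d has rank difference 2?
  void flip(int d) { par_and_flg ^= (uintptr_t)1 << d; } // toggle between 1 and 2
  void clr_flags() { par_and_flg &= ~(uintptr_t)3; }     // reset to a 1,1 node
};

// mconcat recomputes a node's augmented data from its children (sketched here as
// subtree size and key sum).
void mconcat(Node *x) {
  x->size = 1;
  x->sum = x->i;
  for (Node *c : x->ch)
    if (c) {
      x->size += c->size;
      x->sum += c->sum;
    }
}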

Invariants:

  • Leaves always have flags() == 0, meaning both null children are 1-children (null nodes have rank -1, leaf has rank 0).
  • After each insertion or deletion, mconcat is called along the path to the root to update augmented data.

Rotations:

The rotate(x, d) function rotates node x in direction d. It lifts x->ch[d] to replace x, and updates the augmented data for x. The caller is responsible for updating rank differences.
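
Building on the Node sketch above, rotate(x, d) might look like the following. set_parent is an assumed helper that rewrites the parent portion of par_and_flg while preserving the two flag bits.

// Assumed helper: replace the parent pointer, keep the flag bits.
void set_parent(Node *x, Node *p) {
  x->par_and_flg = (uintptr_t)p | (x->par_and_flg & 3);
}

// Lift y = x->ch[d] into x's place. Rank-difference flags are left untouched;
// as described above, the caller fixes them (and y's augmented data) afterwards.
void rotate(Node *x, int d) {
  Node *y = x->ch[d], *p = x->parent();
  x->ch[d] = y->ch[!d];       // y's inner subtree moves under x
  if (x->ch[d])
    set_parent(x->ch[d], x);
  y->ch[!d] = x;              // x moves below y
  if (p)
    p->ch[p->ch[1] == x] = y; // splice y into x's old position
  set_parent(y, p);
  set_parent(x, y);
  mconcat(x);                 // x's subtree changed; refresh its augmented data
}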

Misc

Visualization: https://tjkendev.github.io/bst-visualization/avl-tree/bu-weak.html

Stack walking: space and time trade-offs

On most Linux platforms (except AArch32, which uses .ARM.exidx), DWARF .eh_frame is required for C++ exception handling and stack unwinding to restore callee-saved registers. While .eh_frame can be used for call trace recording, it is often criticized for its runtime overhead. As an alternative, developers can enable frame pointers, or adopt SFrame, a newer format designed specifically for profiling. This article examines the size overhead of enabling non-DWARF stack walking mechanisms when building several LLVM executables.

Runtime performance analysis will be added in a future update.

Read More

Remarks on SFrame

SFrame is a new stack walking format for userspace profiling, inspired by Linux's in-kernel ORC unwind format. While SFrame eliminates some .eh_frame CIE/FDE overhead, it sacrifices functionality (e.g., personality, LSDA, callee-saved registers) and flexibility, and its stack offsets are less compact than .eh_frame's bytecode-style CFI instructions. In the llvm-project executables I've tested on x86-64, the .sframe section is 20% larger than .eh_frame. It also remains significantly larger than highly compact schemes like Windows ARM64 unwind codes.

SFrame describes three elements for each function:

  • Canonical Frame Address (CFA): The base address for stack frame calculations
  • Return address
  • Frame pointer

An .sframe section follows a straightforward layout:

  • Header: Contains metadata and offset information
  • Auxiliary header (optional): Reserved for future extensions
  • Function Descriptor Entries (FDEs): Array describing each function
  • Frame Row Entries (FREs): Arrays of unwinding information per function

Read More

Benchmarking compression programs

tl;dr https://gist.github.com/MaskRay/74cdaa83c1f44ee105fcebcdff0ba9a7 is a single-file Ruby program that downloads and compiles multiple compression utilities, then benchmarks their compression and decompression performance on a specified input file, and finally generates an HTML file with scatter charts. Scroll to the end to view example HTML pages.

Compression algorithms can be broadly categorized into three groups based on their typical compression ratio and decompression speed:

  • Low ratio, high speed: lz4, snappy, Oodle Selkie.
  • Medium ratio, medium speed: zlib, zstd, brotli, Oodle Kraken.
  • High ratio, low speed: LZMA, bzip2, bzip3, bsc, zpaq, kanzi, Oodle Leviathan.

Low ratio Codecs in this category prioritize speed above all else; their compression and decompression speeds are comparable. They are designed to decompress so quickly that they don't introduce a noticeable delay when reading data from storage such as solid-state drives. These codecs typically produce byte-aligned output and often skip the final entropy-encoding step, which, while crucial for high compression, is computationally intensive. They are excellent choices for applications where latency is critical, such as kernel features like zswap.

Medium ratio This is the sweet spot for many tasks. These codecs achieve a better compression ratio by employing entropy encoding, usually Huffman coding.

zstd has emerged as a clear leader, gaining popularity and effectively supplanting older codecs like the venerable DEFLATE (zlib).

High ratio These codecs are designed to squeeze every last bit of redundancy out of the data, often at the cost of significantly longer compression and decompression times and larger memory usage. They are perfect for archival purposes or data distribution, where files are compressed once and decompressed infrequently. Such codecs typically have three important components:

  • Transforms: Codecs typically implement strong transforms to increase redundancy, even very specific ones like branch/call/jump filters for machine code.
  • Prediction model: This model anticipates the next piece of data based on what has already been processed.
  • Entropy encoding: Traditional codecs use an arithmetic encoder, which is being replaced by the more efficient range variant of Asymmetric Numeral Systems (rANS).

Some projects apply neural network models, such as recurrent neural networks, long short-term memory, and Transformers, to the prediction model. They are usually very slow.


This categorization is loose. Many modern programs offer a wide range of compression levels that essentially let them span multiple categories. For example, at a high compression level, zstd can achieve a ratio comparable to xz (a high-compression codec) by using more RAM and CPU. While zstd's compression speed or ratio generally remains lower, its decompression speed is often much faster than that of xz.

Benchmarking

I want to benchmark the single worker performance of a few compression programs:

  • lz4: Focuses on speed over compression ratio. Memory usage is extremely low. It seems Pareto superior to Google's Snappy.
  • zstd: Gained significant traction and obsoleted many existing codecs. Its LZ77 variant uses three recent match offsets like LZX. For entropy encoding, it employs Huffman coding for literals and 2-way interleaved Finite State Entropy for Huffman weights, literal lengths, match lengths, and offset codes. The large alphabet of literals makes Huffman a good choice, as compressing them with FSE provides little gain for a speed cost. However, other symbols have a small range, making them a sweet spot for FSE. zstd works on multiple streams at the same time to utilize instruction-level parallelism. zstd is supported by the Accept-Encoding: zstd HTTP header. Decompression memory usage is very low.
  • brotli: Uses a combination of LZ77, 2nd order context model, Huffman coding, and static dictionary. The decompression speed is similar to gzip with a higher ratio. At lower levels, its performance is overshadowed by zstd. Compared with DEFLATE, it employs a larger sliding window (from 16KiB-16B to 16MiB-16B) and a smaller minimum match length (2 instead of 3). It has a predefined dictionary that works well for web content (but feels less elegant) and supports 120 transforms. brotli is supported by the Accept-Encoding: br HTTP header. Decompression memory usage is quite low.
  • bzip3: Combines BWT, RLE, and LZP and uses arithmetic encoder. Memory usage is large.
  • xz: LZMA2 with a few filters. The filters must be enabled explicitly.
  • lzham: Provides a compression ratio similar to LZMA but with faster decompression. Compression is slightly slower while memory usage is larger. The build system is not well-polished for Linux. I have forked it, fixed stdint.h build errors, and installed lzhamtest. The command line program lzhamtest should really be renamed to lzham.
  • zpaq: Functions as a command-line archiver supporting multiple files. It combines context mixing with arithmetic encoder but operates very slowly.
  • kanzi: It offers a wide variety of transforms and entropy encoders, which is unusual for a compression program. For the compression speed of enwik8, it's Pareto superior to xz, but decompression is slower. Levels 8 and 9 belong to the PAQ8 family and consume substantial memory.

I'd like to test lzham (not updated for a few years), but I'm having trouble getting it to compile due to a cstdio header issue.

Many modern compressors are parallel by default. I have to disable this behavior with options like -T1. Even then, zstd uses a worker thread for I/O overlap, but I don't bother with --single-thread.

To ensure fairness, each program is built with consistent compiler optimizations, such as -O3 -march=native.

Below is a Ruby program that downloads and compiles multiple compression utilities, then compresses and decompresses a specified input file. It collects performance metrics including execution time, memory usage, and compression ratio, and finally generates an HTML file with scatter charts visualizing the results. The program has several notable features:

  • Adding new compressors is easy: just modify COMPRESSORS.
  • Benchmark results are cached in files named cache_$basename_$digest.json, allowing reuse of previous runs for the same input file.
  • Adding a new compression level does not invalidate existing benchmark results for other levels.
  • The script generates an HTML file with interactive scatter charts. Each compressor is assigned a unique, deterministic color based on a hash of its name (using the hsl function in CSS).

The single file Ruby program is available at https://gist.github.com/MaskRay/74cdaa83c1f44ee105fcebcdff0ba9a7

Limitation

A single run might not be representative.

Running the executable incurs initialization overhead, which would be amortized in a library setup. However, library setup would make updating libraries more difficult.

Demo

ruby bench.rb enwik8
# The first iframe below

ruby bench.rb clang
# The second iframe below

Many programs exhibit a stable decompression speed (uncompressed size / decompression time). There is typically a slightly higher decompression speed at higher compression levels. If you think of the compressed content as a form of "byte code", a more highly compressed file means there are fewer bytes for the decompression algorithm to process, resulting in faster decompression. Some programs, like zpaq and kanzi, use different algorithms that can result in significantly different decompression speeds.

xz -9 doesn't use parallelism on the two files under ~100 MiB because their uncompressed size is smaller than the default block size for level 9.

From install/include/lzma/container.h

For each thread, about 3 * block_size bytes of memory will be allocated. This may change in later liblzma versions. If so, the memory usage will probably be reduced, not increased.

Understanding alignment - from source to object file

Alignment refers to the practice of placing data or code at memory addresses that are multiples of a specific value, typically a power of 2. This is usually done to meet the requirements of the programming language, ABI, or the underlying hardware. Misaligned memory accesses might be expensive or might even cause traps on certain architectures.

This blog post explores how alignment is represented and managed as C++ code is transformed through the compilation pipeline: from source code to LLVM IR, assembly, and finally the object file. We'll focus on alignment for both variables and functions.

Read More

LLVM integrated assembler: Engineering better fragments

In my previous assembler posts, I've discussed improvements on expression resolving and relocation generation. Now, let's turn our attention to recent refinements within section fragments. Understanding how an assembler utilizes these fragments is key to appreciating the improvements we've made. At a high level, the process unfolds in three main stages:

  • Parsing phase: The assembler constructs section fragments. These fragments represent sequences of regular instructions or data, span-dependent instructions, alignment directives, and other elements.
  • Section layout phase: Once fragments are built, the assembler assigns offsets to them and finalizes the span-dependent content.
  • Relocation decision phase: In the final stage, the assembler evaluates fixups and, if necessary, updates the content of the fragments.

Read More

GCC 13.3.0 miscompiles LLVM

For years, I've been involved in updating LLVM's MC layer. A recent journey led me to eliminate the FK_PCRel_ fixup kinds:

MCFixup: Remove FK_PCRel_

The generic FK_Data_ fixup kinds handle both absolute and PC-relative
fixups. ELFObjectWriter sets IsPCRel to true for `.long foo-.`, so the
backend has to handle PC-relative FK_Data_.

However, the existence of FK_PCRel_ encouraged backends to implement it
as a separate fixup type, leading to redundant and error-prone code.

Removing FK_PCRel_ simplifies the overall fixup mechanism.

Read More