Call relocation types

Most architectures encode direct branch/call instructions with a PC-relative displacement. This post discusses a specific category of branch relocations: those used for direct function calls and tail calls. Some architectures use two ELF relocation types for a call instruction:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# i386, x86-64
call foo # R_386_PC32, R_X86_64_PC32
call foo@plt # R_386_PLT32, R_X86_64_PLT32

# m68k
bsr.l foo # R_68K_PC32
bsr.l foo@plt # R_68K_PLT32

# s390/s390x
brasl %r14, foo # R_390_PC32DBL
brasl %r14, foo@plt # R_390_PLT32DBL

# sparc
call foo, 0 # not PIC: R_SPARC_WDISP30
call foo, 0 # gas -KPIC: R_SPARC_WPLT30

Read More

lld 22 ELF changes

For those unfamiliar, lld is the LLVM linker, supporting PE/COFF, ELF, Mach-O, and WebAssembly ports. These object file formats differ significantly, and each port must follow the conventions of the platform's system linker. As a result, the ports share limited code (diagnostics, memory allocation, etc) and have largely separate reviewer groups.

With LLVM 22.1 releasing soon, I've added some notes to the https://github.com/llvm/llvm-project/blob/release/22.x/lld/docs/ReleaseNotes.rst as an lld/ELF maintainer. As usual, I've reviewed almost all the patches not authored by me.

For the first time, I used an LLM agent (Claude Code) to help look through commits (git log release/21.x..release/22.x -- lld/ELF) and draft the release notes. Despite my request to only read lld/ELF changes, Claude Code also crafted notes for other ports, which I retained since their release notes had been quite sparse for several releases. Changes back ported to the 21.x release are removed (git log --oneline llvmorg-22-init..llvmorg-21.1.8 -- lld).

I'll delve into some of the key changes.

Read More

Long branches in compilers, assemblers, and linkers

Branch instructions on most architectures use PC-relative addressing with a limited range. When the target is too far away, the branch becomes "out of range" and requires special handling.

Consider a large binary where main() at address 0x10000 calls foo() at address 0x8010000-over 128MiB away. On AArch64, the bl instruction can only reach ±128MiB, so this call cannot be encoded directly. Without proper handling, the linker would fail with an error like "relocation out of range." The toolchain must handle this transparently to produce correct executables.

This article explores how compilers, assemblers, and linkers work together to solve the long branch problem.

  • Compiler (IR to assembly): Handles branches within a function that exceed the range of conditional branch instructions
  • Assembler (assembly to relocatable file): Handles branches within a section where the distance is known at assembly time
  • Linker: Handles cross-section and cross-object branches discovered during final layout

Read More

2025年总结

TODO

一如既往,主要在工具链领域耕耘。但由于工作忙碌在open source社区投入的时间减少了。

Blogging

不包括这篇总结,一共写了18篇文章。

llvm-project

Linux kernel

贡献了两个commits,被引用了一次。

ccls

  • clang.prependArgs
  • 支持了LLVM 21和22

ELF specification

尝试推进compact section header table,没有取得共识。 一些成员希望采用general compression (like zstd)的方式,像SHF_COMPRESSED那样压缩section header table。包括我在内的另一些人不喜欢采用general compression。

Misc

Reported 6 feature requests or bugs to binutils.

旅行

  • 第一次去:台南、西安、兰州、天水、Sacramento、Puerto Vallarta, Jalisco, Mexico、Mazatlán, Sinaloa, Mexico
  • 曾经去过:台北(上一次是近11年前)、北京

Weak AVL Tree

tl;dr: Weak AVL trees are replacements for AVL trees and red-black trees.

The 2014 paper Rank-Balanced Trees (Haeupler, Sen, Tarjan) presents a framework using ranks and rank differences to define binary search trees.

  • Each node has a non-negative integer rank r(x). Null nodes have rank -1.
  • The rank difference of a node x with parent p(x) is r(p(x)) − r(x).
  • A node is i,j if its children have rank differences i and j (unordered), e.g., a 1,2 node has children with rank differences 1 and 2.
  • A node is called 1-node if its rank difference is 1.

Several balanced trees fit this framework:

  • AVL tree: Ranks are defined as heights. Every node is 1,1 or 1,2 (rank differences of children)
  • Red-Black tree: All rank differences are 0 or 1, and no parent of a 0-child is a 0-child. (red: 0-child; black: 1-child; null nodes are black)
  • Weak AVL tree (new tree described by this paper): All rank differences are 1 or 2, and every leaf has rank 0.
    • A weak AVL tree without 2,2 nodes is an AVL tree.
1
AVL trees ⫋ weak AVL trees ⫋ red-black trees

Weak AVL Tree

Weak AVL trees are replacements for AVL trees and red-black trees. A single insertion or deletion operation requires at most two rotations (forming a double rotation when two are needed). In contrast, AVL deletion requires O(log n) rotations, and red-black deletion requires up to three.

Without deletions, a weak AVL tree is exactly an AVL tree. With deletions, its height remains at most that of an AVL tree with the same number of insertions but no deletions.

The rank rules imply:

  • Null nodes have rank -1, leaves have rank 0, unary nodes have rank 1.

Insertion

The new node x has a rank of 0, changed from the null node of rank -1. There are three cases.

  • If the tree was previously empty, the new node becomes the root.
  • If the parent of the new node was previously a unary node (1,2 node), it is now a 1,1 binary node.
  • If the parent of the new node was previously a leaf (1,1 node), it is now a 0,1 binary node, leading to a rank violation.

When the tree was previously non-empty, x has a parent node. We call the following subroutine with x indicating the new node to handle the second and third cases.

The following subroutine handles the rank increase of x. We call break if there is no more rank violation, i.e. we are done.

The 2014 paper isn't very clear about the conditions.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
// Assume that x's rank has just increased by 1 and rank_diff(x) has been updated.

p = x->parent;
if (rank_diff(x) == 1) {
// x was previously a 2-child before increasing x->rank.
// Done.
} else {
for (;;) {
// Otherwise, p is a 0,1 node (previously a 1,1 node before increasing x->rank).
// x being a 0-child is a violation.

Promote p.
// Since we have promoted both x and p, it's as if rank_diff(x's sibling) is flipped.
// p is now a 1,2 node.

x = p;
p = p->parent;
// x is a 1,2 node.
if (!p) break;
d = p->ch[1] == x;

if (rank_diff(x) == 1) { break; }
// Otherwise, x is a 0-child, leading to a new rank violation.

auto sib = p->ch[!d];
if (rank_diff(sib) == 2) { // p is a 0,2 node
auto y = x->ch[d^1];
if (y && rank_diff(y) == 1) {
// y is a 1-child. y must the previous `x` in the last iteration.
Perform a double rotation involving `p`, `x`, and `y`.
} else {
// Otherwise, y is null or a 2-child.
Perform a single rotation involving `p` and `x`.
x is now a 1,1 node and there is no more violation.
}
break;
}

// Otherwise, p is a 0,1 node. Goto the next iteration.
}
}

Insertion never introduces a 2,2 node, so insertion-only sequences produce AVL trees.

Deletion

TODO: Describe deletion

1
2
3
if (!was_2 && !x && !p->ch[0] && !p->ch[1] && p->rp()) {
// p was unary and becomes 2,2. Demote it.
}

Implementation

Since valid rank differences can only be 1 or 2, ranks can be encoded efficiently using bit flags. There are three approaches:

  • Store two bits representing the rank differences to each child. Bit 0: rank difference to left child (1 = diff is 2, 0 = diff is 1). Bit 1: rank difference to right child
  • Store a single bit representing the parity (even/odd) of the node's absolute rank. The rank difference to a child is computed by comparing parities. Same parity → rank difference of 2. Different parity → rank difference of 1
  • Store a 1-bit rank difference parity in each node.

FreeBSD's sys/tree.h (https://reviews.freebsd.org/D25480, 2020) uses the first approach. The rb_ prefix remains as it can also indicate Rank-Balanced:) Note: its insertion operation can be futher optimized as the following code demonstrates.

https://github.com/pvachon/wavl_tree and https://crates.io/crates/wavltree use the second approach.

The third approach is less efficient because a null node can be either a 1-child (parent is binary) or a 2-child (parent is unary), requiring the sibling node to be probed to determine the rank difference: int rank_diff(Node *p, int d) { return p->ch[d] ? p->ch[d]->par_and_flg & 1 : p->ch[!d] ? 2 : 1; }

https://maskray.me/blog/2025-12-14-weak-avl-tree is a C++ implementation covering both approaches, supporting the following operations:

  • insert: insert a node
  • remove: remove a node
  • rank: count elements less than a key
  • select: find the k-th smallest element (0-indexed)
  • prev: find the largest element less than a key
  • next: find the smallest element greater than a key

Node structure:

  • ch[2]: left and right child pointers.
  • par_and_flg: packs the parent pointer with 2 flag bits in the low bits. Bit 0 indicates whether the left child has rank difference 2; bit 1 indicates whether the right child has rank difference 2. A cleared bit means rank difference 1.
  • i: the key value.
  • sum, size: augmented data maintained by mconcat for order statistics operations.

Helper methods:

  • rd2(d): returns true if child d has rank difference 2.
  • flip(d): toggles the rank difference of child d between 1 and 2.
  • clr_flags(): sets both children to rank difference 1 (used after rotations to reset a node to 1,1).

Invariants:

  • Leaves always have flags() == 0, meaning both null children are 1-children (null nodes have rank -1, leaf has rank 0).
  • After each insertion or deletion, mconcat is called along the path to the root to update augmented data.

Rotations:

The rotate(x, d) function rotates node x in direction d. It lifts x->ch[d] to replace x, and updates the augmented data for x. The caller is responsible for updating rank differences.

Misc

Visualization: https://tjkendev.github.io/bst-visualization/avl-tree/bu-weak.html

Stack walking: space and time trade-offs

On most Linux platforms (except AArch32, which uses .ARM.exidx), DWARF .eh_frame is required for C++ exception handling and stack unwinding to restore callee-saved registers. While .eh_frame can be used for call trace recording, it is often criticized for its runtime overhead. As an alternative, developers can enable frame pointers, or adopt SFrame, a newer format designed specifically for profiling. This article examines the size overhead of enabling non-DWARF stack walking mechanisms when building several LLVM executables.

Runtime performance analysis will be added in a future update.

Read More

Remarks on SFrame

SFrame is a new stack walking format for userspace profiling, inspired by Linux's in-kernel ORC unwind format. While SFrame eliminates some .eh_frame CIE/FDE overhead, it sacrifices functionality (e.g., personality, LSDA, callee-saved registers) and flexibility, and its stack offsets are less compact than .eh_frame's bytecode-style CFI instructions. In llvm-project executables I've tested on x86-64, .sframe section is 20% larger than .eh_frame. It also remains significantly larger than highly compact schemes like Windows ARM64 unwind codes.

SFrame describes three elements for each function:

  • Canonical Frame Address (CFA): The base address for stack frame calculations
  • Return address
  • Frame pointer

An .sframe section follows a straightforward layout:

  • Header: Contains metadata and offset information
  • Auxiliary header (optional): Reserved for future extensions
  • Function Descriptor Entries (FDEs): Array describing each function
  • Frame Row Entries (FREs): Arrays of unwinding information per function

Read More