Most architectures encode direct branch/call instructions with a PC-relative displacement. This post discusses a specific category of branch relocations: those used for direct function calls and tail calls. Some architectures use two ELF relocation types for a call instruction:
1 | # i386, x86-64 |
This post describes why I think this happened.
Static linking: one type suffices
In the static linking model, all symbols are resolved at link time:
every symbol is either defined in a relocatable object file or an
undefined weak symbol. A branch instruction with a PC-relative
displacement—x86 call, m68k bsr.l, s390
brasl—can reuse the same PC-relative data relocation type
used for data references.
- i386:
R_386_PC32for bothcall fooand.long foo - . - x86-64:
R_X86_64_PC32for bothcall fooand.long foo - . - m68k:
R_68K_PC32forbsr.l foo,move.l var,%d0, and.long foo - . - s390x:
R_390_PC32DBLforbrasl %r14, foo,larl %r1, var, and.long foo - .
No separate "call" relocation type is needed. The linker simply patches the displacement to point to the symbol address.
Dynamic linking changes the picture
With System V Release 4 style shared libraries, variable access and function calls diverge.
For variables and function addresses, a reference
from one component to a symbol defined in another cannot use a plain
PC-relative relocation, because the distance between the two components
is not known at link time. The Global Offset
Table was introduced for this purpose, along with GOT-generating
relocation types. (Additionally, copy
relocations are a workaround for external data symbols from
-fno-pic relocatable files.) To satisfy the pointer
equality requirement, a PC-relative data relocation in an
executable must resolve to the same address as its counterpart in a
shared object—this is why GOT indirection is used for symbols not known
at compile time to be preemptible.
For direct function calls, the situation is
different. A call instruction has "transfer control there by any means"
semantics - the caller usually doesn't care how the callee is
reached, only that it gets there. This allows the linker to interpose a
PLT
stub when the target is in another component, without any special
code sequence at the call site. Alternatively, some architectures
support an indirect call sequence that bypasses PLT entirely: -fno-plt
on x86 and -mno-plt
on MIPS (o32/n32 non-PIC).
Variable accesses do not have the same semantics - so the PC-relative data relocation type cannot be reused on a call instruction.
This is why separate branch relocation types were introduced:
R_386_PLT32, R_68K_PLT32,
R_390_PLT32DBL, and so on. The relocation type carries the
semantic information: "this is a function call that can use PLT
indirection."
Misleading names
The @plt notation in assembly and the PLT32
relocation type names are misleading. They suggest that a PLT entry is
involved, but that is often not the case - when the callee is defined in
the same component, the linker resolves the branch directly—no PLT entry
is created.
R_386_CALL32 and R_X86_64_CALL32 would have
been a better name.
In addition, the @plt notation itself is problematic as
a relocation
specifier.
Architecture comparison
Single type (clean design). Some architectures recognized from the start that one call relocation type is sufficient. The linker can decide whether a PLT stub is needed based on the symbol's binding and visibility.
- AArch64:
R_AARCH64_CALL26forblandR_AARCH64_JUMP26forb. - PowerPC64 ELFv2:
R_PPC64_REL24forbl.
These architectures never had the naming confusion—there is no "PLT" in the relocation name, and no redundant pair.
Redundant pairs (misguided). Some architectures introduced separate "PLT" and "non-PLT" call relocation types, creating a distinction without a real difference.
- SPARC:
R_SPARC_WPLT30alongsideR_SPARC_WDISP30. The assembler decides at assembly time based on PIC mode and symbol preemptivity, when ideally the linker should make these decisions. - PPC32:
R_PPC_REL24(non-PIC) andR_PPC_PLTREL24(PIC) have genuinely different semantics (the addend ofR_PPC_PLTREL24encodes the r30 GOT pointer setup). However,R_PPC_LOCAL24PCis entirely useless—all occurrences can be replaced withR_PPC_REL24. - RISC-V:
R_RISCV_CALL_PLTalongside the now-removedR_RISCV_CALL. The community recognized that only one relocation is needed.R_RISCV_CALL_PLTis kept (despite the name, does not mandate a PLT entry).
x86-64 started with R_X86_64_PC32 for
call foo (inherited from the static-linking mindset) and
R_X86_64_PLT32 for call foo@plt (symbols not
compile-time known to be non-preemptible). In 2018, binutils https://sourceware.org/bugzilla/show_bug.cgi?id=22791
switched to R_X86_64_PLT32 for call foo. LLVM
integrated assembler followed suit.
This means R_X86_64_PC32 is now effectively reserved for
data references, and R_X86_64_PLT32 marks all calls—a clean
separation achieved by convention.
However, GNU Assembler still produces R_X86_64_PC32 when
call foo references an STB_LOCAL symbol. I've
sent a patch to fix this: [PATCH
v2] x86: keep PLT32 relocation for local symbols instead of converting
to PC32.
GCC's s390 port seems to always generate @plt (even for
hidden visibility functions), leading to R_390_PLT32DBL
relocations.
Range extension thunks
When a branch target is out of range, some architectures allow the linker to insert a range extension thunks: On AArch64 and PowerPC64, this is well established.
On x86-64, the ±2GiB range of call/jmp has
been sufficient so far, but as executables grow, relocation
overflow becomes a concern. There are proposals to add range
extension thunks to x86-64, which would require the linker to identify
call sites-a PC-relative data relocation like R_X86_64_PC32
would not be suitable (due to pointer equality requirement), making
consistent use of R_X86_64_PLT32 for calls all the more
important.
Recommendation for future architectures
For a specific instruction or pseudo instruction for function calls
and tail calls, use a single call relocation type—no "PLT" vs. "non-PLT"
distinction. The assembler should emit the same relocation, and the
linker, which knows whether the symbol is preemptible, decides whether a
PLT stub is needed. Optionally enable range extension thunks for the
relocation type. AArch64's R_AARCH64_CALL26 and PowerPC64
ELFv2's R_PPC64_REL24 demonstrate this approach well.
This discussion does not apply to intra-function branches, which target local labels.
See also
- All about Procedure Linkage Table for how PLT works
- All about Global Offset Table for the data-access counterpart
- Copy relocations, canonical PLT entries and protected visibility for the variable-access complications
- Relocation generation in assemblers for how assemblers decide which relocations to emit
- Long branches in compilers, assemblers, and linkers for range extension thunks