Most architectures encode branch/call instructions with a PC-relative displacement. This post discusses a specific category of branch relocations: those used for function calls and tail calls. Some architectures use two ELF relocation types for a call instruction:
1 | # i386, x86-64 |
This post describes why I think this happened.
Static linking: one type suffices
In the static linking model, all symbols are resolved at link time:
every symbol is either defined in a relocatable object file or an
undefined weak symbol. A branch instruction with a PC-relative
displacement—x86 call, m68k bsr.l, s390
brasl—can reuse the same PC-relative data relocation type
used for data references.
- i386:
R_386_PC32for bothcall fooand.long foo - . - x86-64:
R_X86_64_PC32for bothcall fooand.long foo - . - m68k:
R_68K_PC32forbsr.l foo,move.l var,%d0, and.long foo - . - s390x:
R_390_PC32DBLforbrasl %r14, foo,larl %r1, var, and.long foo - .
No separate "call" relocation type is needed. The linker simply patches the displacement to point to the symbol address.
Dynamic linking changes the picture
With System V Release 4 style shared libraries, variable access and function calls diverge.
For variables and function addresses, a reference
from one component to a symbol defined in another cannot use a plain
PC-relative relocation, because the distance between the two components
is not known at link time. The Global Offset
Table was introduced for this purpose, along with GOT-generating
relocation types. (Additionally, copy
relocations are a workaround for external data symbols from
-fno-pic relocatable files.) To satisfy the address
uniqueness requirement, a PC-relative data relocation in an executable
must resolve to the same address as its counterpart in a shared
object—this is why GOT indirection is used for symbols not known to be
preemptible.
For function calls, the situation is different. A call instruction has "transfer control there by any means" semantics - the caller usually doesn't care how the callee is reached, only that it gets there. This allows the linker to interpose a PLT stub when the target is in another component, without any special code sequence at the call site.
Variable accesses do not have the same semantics - so the PC-relative data relocation type cannot be reused on a call instruction.
This is why separate branch relocation types were introduced:
R_386_PLT32, R_68K_PLT32,
R_390_PLT32DBL, and so on. The relocation type carries the
semantic information: "this is a function call that can use PLT
indirection."
Misleading names
The @plt notation in assembly and the PLT32
relocation type names are misleading. They suggest that a PLT entry is
involved, but that is often not the case - when the callee is defined in
the same component, the linker resolves the branch directly—no PLT entry
is created.
R_386_CALL32 and R_X86_64_CALL32 would have
been a better name.
In addition, the @plt notation itself is problematic as
a relocation
specifier.
Architecture comparison
Single type (clean design). Some architectures recognized from the start that one call relocation type is sufficient. The linker can decide whether a PLT stub is needed based on the symbol's binding and visibility.
- AArch64:
R_AARCH64_CALL26for bothblandb. - PowerPC64 ELFv2:
R_PPC64_REL24forbl.
These architectures never had the naming confusion—there is no "PLT" in the relocation name, and no redundant pair.
Redundant pairs (misguided). Some architectures introduced separate "PLT" and "non-PLT" call relocation types, creating a distinction without a real difference.
- SPARC:
R_SPARC_WPLT30alongsideR_SPARC_WDISP30. The assembler decides at assembly time based on PIC mode and symbol preemptivity, when ideally the linker should make these decisions. - PPC32:
R_PPC_REL24(non-PIC) andR_PPC_PLTREL24(PIC) have genuinely different semantics (the addend ofR_PPC_PLTREL24encodes the r30 GOT pointer setup). However,R_PPC_LOCAL24PCis entirely useless—all occurrences can be replaced withR_PPC_REL24. - RISC-V:
R_RISCV_CALL_PLTalongside the now-removedR_RISCV_CALL. The community recognized that only one relocation is needed.R_RISCV_CALL_PLTis kept (despite the name, does not mandate a PLT entry).
x86-64 started with R_X86_64_PC32 for
call foo (inherited from the static-linking mindset) and
R_X86_64_PLT32 for call foo@plt (symbols not
compile-time known to be non-preemptible). In 2018, binutils https://sourceware.org/bugzilla/show_bug.cgi?id=22791
switched to R_X86_64_PLT32 for call foo. LLVM
integrated assembler followed suit.
This means R_X86_64_PC32 is now effectively reserved for
data references, and R_X86_64_PLT32 marks all calls—a clean
separation achieved by convention.
However, GNU Assembler still produces R_X86_64_PC32 when
call foo references an STB_LOCAL symbol. I've
sent a patch to fix this: [PATCH
v2] x86: keep PLT32 relocation for local symbols instead of converting
to PC32.
GCC's s390 port seems to always generate @plt (even for
hidden visibility functions), leading to R_390_PLT32DBL
relocations.
Range extension thunks
A dedicated call relocation type also enables range extension thunks: when the target is out of the branch instruction's range, the linker can insert a thunk without the compiler or assembler needing to know. This works precisely because the relocation marks the site as a branch—the linker knows it can redirect through a thunk.
On AArch64 and PowerPC64, this is well established. On x86-64, the
±2GiB range of call/jmp has been sufficient so
far, but as executables grow, relocation
overflow becomes a concern. There are proposals to add range
extension thunks to x86-64, which would rely on the linker being able to
identify call sites—another reason to consistently use
R_X86_64_PLT32 rather than R_X86_64_PC32 for
calls.
Recommendation for future architectures
For a specific instruction or pseudo instruction for function calls
and tail calls, use a single call relocation type—no "PLT" vs. "non-PLT"
distinction. The assembler should emit the same relocation, and the
linker, which knows whether the symbol is preemptible, decides whether a
PLT stub or range extension thunk is needed. AArch64's
R_AARCH64_CALL26 and PowerPC64 ELFv2's
R_PPC64_REL24 demonstrate this approach well.
This discussion does not apply to intra-function branches, which target local labels.
See also
- All about Procedure Linkage Table for how PLT works
- All about Global Offset Table for the data-access counterpart
- Copy relocations, canonical PLT entries and protected visibility for the variable-access complications
- Relocation generation in assemblers for how assemblers decide which relocations to emit
- Long branches in compilers, assemblers, and linkers for range extension thunks