This post explores how the GNU assembler and the LLVM integrated assembler generate relocations, an important step in producing a relocatable file. Relocations identify parts of instructions or data that cannot be fully determined during assembly because they depend on the final memory layout, which is only established at link time or load time. These are essentially placeholders that will be filled in (typically with absolute addresses or PC-relative offsets) during the linking process.
Relocation generation: the basics
Symbol references are the primary candidates for relocations. For instance, in the x86-64 instruction movl sym(%rip), %eax (GNU syntax), the assembler calculates the displacement between the program counter (PC) and sym. This distance affects the instruction's encoding and typically triggers an R_X86_64_PC32 relocation, unless sym is a local symbol defined within the current section.
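To make the placeholder idea concrete, here is a minimal Python sketch of what a linker later does with such a relocation. It is not how either assembler is implemented. It assumes the usual encoding of movl sym(%rip), %eax (8b 05 followed by a 4-byte displacement), the standard R_X86_64_PC32 formula S + A - P, and an addend of -4 because the displacement is measured from the end of the instruction. The helper name apply_pc32 and both addresses are hypothetical, chosen only for illustration.

```python
import struct

def apply_pc32(code: bytearray, r_offset: int, section_addr: int,
               sym_addr: int, addend: int) -> None:
    """Patch a 32-bit PC-relative field in place using value = S + A - P."""
    place = section_addr + r_offset      # P: run-time address of the patched field
    value = sym_addr + addend - place    # S + A - P
    if not -2**31 <= value < 2**31:
        raise OverflowError("R_X86_64_PC32 value does not fit in 32 bits")
    struct.pack_into("<i", code, r_offset, value)

# movl sym(%rip), %eax as the assembler would emit it: opcode 8b, ModRM 05,
# then a zeroed 4-byte displacement that the relocation will fill in.
code = bytearray(b"\x8b\x05\x00\x00\x00\x00")

# Hypothetical layout: the instruction lands at 0x401000 and sym at 0x404010.
# The field starts 2 bytes into the instruction; the addend -4 accounts for
# the 4 bytes between the field and the next instruction, which is what
# %rip refers to.
apply_pc32(code, r_offset=2, section_addr=0x401000, sym_addr=0x404010, addend=-4)
print(code.hex())  # 8b050a300000
```

The patched displacement 0x300a equals 0x404010 minus 0x401006, the address of the next instruction. When sym is a local symbol defined in the same section, the assembler already knows this distance and can encode it directly, which is why no relocation is needed in that case.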
Both the GNU assembler and the LLVM integrated assembler use multiple passes during assembly, with several key phases relevant to relocation generation: