In an executable or shared object (called a component in ELF), a text section may need the absolute virtual address of a symbol (e.g. a function or a variable). The reference arises from an address-taken operation or a PLT entry. The address may be:
- a link-time constant
- the load base plus a link-time constant
- dependent on runtime computation by ld.so
For the first case, this component must be a position-dependent executable: a link-time address equals its virtual address at run-time. The text section can hold the absolute virtual address directly or use PC-relative addressing.
(For an FDPIC ABI for MMU-less Linux, the compiler may add an offset to the FDPIC register instead.)
Load base plus constant
For the second case, this component is either a position-independent executable or a shared object. The difference between the link-time addresses of two symbols equals their virtual address difference at run-time. The first byte of the program image, the ELF header, is loaded at the load base. The text section can obtain the current program counter, then add the distance from the PC to the symbol (PC-relative address), to compute the run-time virtual address.
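The arithmetic behind the second case can be sketched in a few lines of C. This is only an illustration: the function names and the example addresses are made up, and the sketch assumes the image is linked at address 0 and loaded at a 4 KiB aligned base.

```c
#include <assert.h>
#include <stdint.h>

/* Case 2: run-time address = load base + link-time address.
   Assumes the image is linked at address 0, as is typical for PIE and
   shared objects, so a link-time address is an offset from the ELF header. */
uint64_t runtime_address(uint64_t load_base, uint64_t link_time_addr) {
  assert((load_base & 0xfff) == 0); /* assumption: 4 KiB page-aligned base */
  return load_base + link_time_addr;
}

/* PC-relative addressing: the distance from the instruction to the symbol
   is a link-time constant, so the code never needs the load base itself. */
uint64_t pc_relative_address(uint64_t runtime_pc, uint64_t link_time_pc,
                             uint64_t link_time_sym) {
  return runtime_pc + (link_time_sym - link_time_pc);
}
```

Both functions yield the same run-time address for a symbol; the PC-relative form is what the text section actually executes, since it does not have to know the load base.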
Runtime computation by ld.so
For the third case, we need help from the runtime loader (abbreviated as ld.so). The linker emits a dynamic relocation to let the runtime loader perform a symbol lookup to determine the associated symbol value at runtime.
The symbol is either potentially defined in another component or is a STT_GNU_IFUNC symbol. See GNU indirect function.
If the text section holds the address which is relocated by the dynamic relocation, this is called a text relocation.
movl 0x0, %eax # relocated by R_386_32
Relocation section '.rel.dyn' at offset 0x190 contains 1 entry:
Offset Info Type Sym. Value Symbol's Name
00001199 00000101 R_386_32 00000000 var
More commonly, the address is stored in the Global Offset Table (abbreviated as GOT). The compiler emits code which uses position-independent addressing to extract the absolute virtual address from the GOT. The relocations (e.g. R_X86_64_REX_GOTPCRELX) are called GOT-generating. The linker will create entries in the Global Offset Table.
Global Offset Table
The Global Offset Table (usually consisting of .got and .got.plt) holds the symbol addresses which are referenced by text sections. The table holds link-time constant entries and entries which are relocated by a dynamic relocation.
.got.plt holds symbol addresses used by PLT entries.
.got holds everything else.
Why do we need a GOT entry for a link-time constant? Well, at compile time it is probably undecided whether the entry may resolve to another component. The compiler may emit a GOT-generating relocation and use an indirection in a conservative manner. At link time the linker may find that the value is a constant.
-shared, it is possible to get a
link-time constant by utilizing
Life of a .got.plt entry
TODO: link to my future article about PLT.
Life of a .got entry
Defined symbols generally belong to the first and second cases.
However, on ELF, a non-local default visibility symbol in a shared
object is preemptible by default. For
-fpic code, the third
case is used: since such a definition may be interposed by
another definition at runtime, the compiler conservatively uses GOT
# -fno-pic or -fpie
Using the C/C++ internal linkage (
namespace) or protected/hidden visibility can avoid the indirection for
See Copy relocations, canonical PLT entries and protected visibility for why GCC protected data uses (unneeded) indirection.
If the symbol has the default visibility, the definition may be in a different component. For position-independent code (-fpic), the compiler uses GOT indirection:
extern int ext_var;
movq ext_var@GOTPCREL(%rip), %rax
For position dependent code (
the compiler optimizes for statically linked executables and uses direct
addressing (usually absolute relocations). How does it work if the symbol is actually defined in a shared object? To avoid text relocations, there are copy relocations and canonical PLT entries. They essentially change the third case (symbol lookup) to the first two cases. See Copy relocations, canonical PLT entries and protected visibility for details.
If the symbol has a non-default visibility, the symbol must be defined in the component. The compiler can safely assume the address is either a link-time constant or the load base plus a constant.
movl ext_hidden_var(%rip), %eax
A GOT-generating relocation references a symbol. When the linker sees such a referenced symbol for the first time, it reserves an entry in the GOT. For subsequent GOT-generating relocations referencing the same symbol, the linker just reuses this entry. The address of the GOT entry is insignificant.
Technically the linker can use multiple entries for one symbol. That just wastes space in the majority of cases, but some awful ABIs do use multi-GOT, e.g. mips and ppc32.
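The reserve-once, reuse-afterwards behavior can be sketched as a toy table. The struct and function names below are hypothetical, and the linear scan stands in for the hash map a real linker would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_GOT_ENTRIES 64

/* A toy GOT: one slot per distinct symbol, in first-reference order. */
struct got_table {
  const char *symbols[MAX_GOT_ENTRIES];
  size_t count;
};

/* Return the GOT index for `name`, reserving a new entry on first use.
   Returns (size_t)-1 if the toy table is full. */
size_t got_index_for(struct got_table *got, const char *name) {
  assert(got != NULL && name != NULL);
  for (size_t i = 0; i < got->count; i++)
    if (strcmp(got->symbols[i], name) == 0)
      return i; /* a later GOT-generating relocation reuses the entry */
  if (got->count == MAX_GOT_ENTRIES)
    return (size_t)-1;
  got->symbols[got->count] = name;
  return got->count++;
}
```

Calling got_index_for once per GOT-generating relocation gives every reference to the same symbol the same slot, no matter how many relocations mention it.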
The entry needs a dynamic relocation or is a link-time constant.
An R_*_GLOB_DAT relocation is identical to an absolute relocation of the word size (e.g. R_X86_64_64). ld.so performs a symbol lookup and fills the location with the virtual address.
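What ld.so does for such a relocation can be modeled roughly as follows. This is a toy model, not glibc's code: toy_symbol, toy_lookup and apply_glob_dat are hypothetical names, and the flat array stands in for the real symbol search scope, where the first definition found wins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_symbol {
  const char *name;
  uint64_t address; /* run-time virtual address of the definition */
};

/* Hypothetical stand-in for ld.so's symbol lookup: scan the search scope
   in order and return the first definition, or 0 if there is none. */
uint64_t toy_lookup(const struct toy_symbol *scope, size_t n, const char *name) {
  for (size_t i = 0; i < n; i++)
    if (strcmp(scope[i].name, name) == 0)
      return scope[i].address;
  return 0;
}

/* GLOB_DAT-style relocation: store the symbol value into the GOT slot. */
void apply_glob_dat(uint64_t *got_slot, uint64_t symbol_value) {
  assert(got_slot != NULL);
  *got_slot = symbol_value;
}
```

Because the first match wins, a definition that appears earlier in the scope interposes a later one, which is why the compiler cannot assume a preemptible symbol resolves to the local definition.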
GOT indirection to PC-relative
When the symbol associated with a GOT entry is non-preemptible, the third case effectively becomes the first or the second case. The code sequence nevertheless has a load from the GOT entry. Why don't we optimize the code sequence?
Some psABI (Processor Specific Application Binary Interface) documents do define such an optimization.
For example, x86-64's R_X86_64_REX_GOTPCRELX optimization can transform a load/store via GOT to a direct load/store, and a GOT-indirect call/jump to a direct call/jump. R_X86_64_GOTPCRELX applies to an instruction without a REX prefix while R_X86_64_REX_GOTPCRELX applies to an instruction with a REX prefix.
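One of the rewrites the x86-64 psABI allows turns the GOT load `movq foo@GOTPCREL(%rip), %reg` into `leaq foo(%rip), %reg`. At the byte level only the opcode changes. The function below is a simplified sketch with a hypothetical name; a real linker also verifies that the symbol is non-preemptible and in range, handles other instruction forms, and then computes the displacement against the symbol instead of the GOT entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* `disp` points at the 4-byte displacement that the relocation applies to.
   For `mov foo@GOTPCREL(%rip), %reg`, disp[-2] is the opcode (0x8b) and
   disp[-1] is the ModRM byte. Rewriting the opcode to 0x8d turns the load
   into `lea foo(%rip), %reg`; ModRM and any REX prefix stay unchanged. */
bool relax_mov_to_lea(uint8_t *disp) {
  assert(disp != NULL);
  if (disp[-2] != 0x8b)
    return false; /* not a mov load: keep the GOT indirection */
  if ((disp[-1] & 0xc7) != 0x05)
    return false; /* ModRM is not RIP-relative (mod=00, rm=101) */
  disp[-2] = 0x8d; /* mov (load from memory) becomes lea (compute address) */
  return true;
}
```

The rewrite keeps the instruction length, so no other code has to move, which is what makes this kind of relaxation cheap for the linker.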
PowerPC64 ELFv2's TOC-indirect to TOC-relative optimization:
We have a pair of
R_PPC64_TOC16_LO_DS relocations. Instruction rewriting is
safe because the relocation types provide a strong guarantee:
interleaved instructions cannot use intermediate values of the modified
On Mach-O, ld64's arm64 port defines some GOT optimization as well.
For a regular adrp+ldr+ldr code sequence loading the value of a variable through GOT indirection, either the first two instructions (adrp+ldr) can be optimized (computing the GOT address by PC-relative addressing), or the three instructions can be optimized as a whole (loading the variable directly via LDR (literal)).
LC_LINKER_OPTIMIZATION_HINT may be related to the fact
that Mach-O supports only 16 relocation types. The relocation types
cannot encode more information. On ELF, there is a plethora of relocation
types and a separate section would not be needed.
Combining .got and .got.plt
The x86-64 psABI defines a minor linker optimization: if a symbol
the linker can keep just
GLOB_DAT. This optimization is
only implemented by GNU ld's x86 port and mold.
For an applicable symbol, its PLT entry is placed in
.plt.got and the associated GOTPLT entry is placed in
ld.bfd -z now) or
ld.bfd -z lazy). The GOTPLT entry is relocated by
Let's use x86-64 to demonstrate how this optimization works.
R_X86_64_64, is always
eagerly resolved by ld.so. For eager binding PLT, an
R_X86_64_JUMP_SLOT relocation has the same behavior as an
R_X86_64_GLOB_DAT. The existence of
R_X86_64_GLOB_DAT means that an eager symbol lookup exists,
regardless of what we do with
Therefore, the two entries can be combined to one single
ld.lld does not implement this optimization: https://bugs.llvm.org/show_bug.cgi?id=32938. I think the optimization has low value but high linker complexity.
cat > a.c <<e
Disassembly of section .plt:
As stated previously, some GOT entries are for non-preemptible
-shared links, they need
relative relocations. Recording
is a bit expensive, so mips optimizes them out by reordering GOT entries
to the start. The linker emits
linker applies relative relocation operations on the first
DT_MIPS_LOCAL_GOTNO GOT entries.
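The implicit relative relocation on the local part of the GOT amounts to a simple loop. The sketch below is a simplification with hypothetical names: the `reserved` parameter models the header entries at the start of the GOT that a real loader skips, and 32-bit entries are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the load base to the local GOT entries, which hold link-time
   addresses of non-preemptible symbols. Entries [0, reserved) are header
   entries and entries at index >= local_gotno are left for symbol lookup. */
void relocate_local_got(uint32_t *got, size_t reserved, size_t local_gotno,
                        uint32_t load_base) {
  assert(got != NULL && reserved <= local_gotno);
  for (size_t i = reserved; i < local_gotno; i++)
    got[i] += load_base;
}
```

No per-entry relocation record is needed: the count from the dynamic table is enough to tell the loader which entries to adjust.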
A regular REL format relocation costs 2 words. mips does a micro-optimization
here again by using just one word for
DT_MIPS_SYMTABNO-DT_MIPS_GOTSYM GOT entries which are
otherwise relocated by
Optimizing the Performance of Dynamically-Linked Programs mentions that IRIX implemented a further optimization called Quickstart. If the shared libraries used at run-time are the same as those used at link time and all libraries are mapped into their preassigned locations (like System V release 3 static shared libraries), symbol lookup can be skipped. Other systems do not seem to implement Quickstart. Such prelink schemes appear to be a bad idea with the focus on security, e.g. Address Space Layout Randomization (2003).
Hey, this seems clever, doesn't it? No, it's awful.
There is a more useful technique which can speed up symbol lookup:
DT_GNU_HASH. Both mips and
the dynamic symbol table, but in a different way, so
DT_GNU_HASH is incompatible on mips. To overcome this
shortcoming, some folks added
DT_MIPS_XHASH support to
binutils and glibc. Their scheme adds another table to the GNU hash
table, giving back some space they saved.
Sorry to be blunt, but let me add more arguments why mips was
shortsighted. Relative relocations have a much better size saving
DT_RELR. If an
R_X86_64_REX_GOTPCRELX like GOT optimization technique is
used, many non-preemptible GOT entries will not be needed at all.
If someone tries to add
DT_MIPS_XHASH support to LLVM,
I'd definitely be sad.
For future architectures, GOT optimization is somewhat useful. When designing relocation types, make sure GOT optimization can be retroactively added.
Adding new relocation types requires bleeding-edge toolchain support, while overloading old GOT-generating relocations requires care with the semantics. Instruction rewriting can easily break the program if done carelessly.
The AArch64 ABI has added a GOT
indirection to PC-relative addressing optimization. Since the
optimization was retroactively added and the compiler does not guarantee
R_AARCH64_LD64_GOT_LO12_NC are adjacent, the linker
optimization has to check the relocation pair is adjacent.
More about the linker-loader protocol
GNU ld defines the symbol relative to the Global Offset Table.
- The aarch64, arm, mips, ppc, and riscv ports define the symbol at
the start of
- The x86 port defines the symbol at the start of
Code can use the symbol to access GOT entries.
IMO only ancient (badly designed) architectures reference
_GLOBAL_OFFSET_TABLE_ directly. Modern architectures use
With GOT optimization, a GOT entry can be suppressed. If
_GLOBAL_OFFSET_TABLE_ is referenced directly, the linker
needs to define it even if it is otherwise unused.
SunOS 4.x introduced dynamic shared library support. It stored the
link-time address of
In 1993, Paul Kranenburg (pk) added shared library support to NetBSD
flavored a.out binary format. NetBSD followed the
_GLOBAL_OFFSET_TABLE_ scheme. Some ports used two
In the same year, FreeBSD ported the NetBSD ld and rtld code.
In 1995, Roland McGrath added shared library support to glibc. glibc
_DYNAMIC (one dash instead of two) at
_GLOBAL_OFFSET_TABLE_ scheme. glibc ported this hack (to
find the load address of the loader) to more architectures. See Build glibc with LLD 13
ELF Application Binary Interface s390x Supplement defines
_GLOBAL_OFFSET_TABLE_ = _DYNAMIC.
TODO: link to my future article about PLT.
DT_PLTGOT is defined as the address of
The linker reserves the first 3 entries of
.plt usually starts with a header which calls
.got.plt with an argument
other arch-specific arguments.
ld.so puts a descriptor into
.got.plt and the address
of the lazy PLT resolver into
.got.plt. The lazy PLT
resolver identifies the caller object with the descriptor and uses other
arguments to figure out the to-be-called function.
.got.plt have the
SHF_WRITE flag. Traditionally they are always writable,
which is considered bad from the security perspective. GNU invented the
PT_GNU_RELRO program header.
The idea is that
.got only contains relocations which
should be eagerly resolved. With
-z relro, the linker
PT_GNU_RELRO. At runtime,
after ld.so has resolved relocations for an object, it calls
mprotect(relro_start, relro_size, PROT_READ) to mark the
.got region read-only. This is sometimes called "partial RELRO".
(I reported https://sourceware.org/bugzilla/show_bug.cgi?id=24769 that GNU ld's riscv port doesn't implement partial RELRO correctly.)
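Since mprotect works on whole pages, the loader has to turn the PT_GNU_RELRO segment into a page-aligned range first. The sketch below shows one way to compute that range; the struct and function names are hypothetical, and rounding both ends down is an assumption modeled on common loader behavior, so a trailing partial page stays writable.

```c
#include <assert.h>
#include <stdint.h>

struct relro_range {
  uintptr_t start; /* page-aligned start address to protect */
  uintptr_t size;  /* number of bytes to protect; 0 means nothing to do */
};

/* Compute the range to pass to mprotect(..., PROT_READ) for a
   PT_GNU_RELRO segment described by p_vaddr and p_memsz. */
struct relro_range relro_protect_range(uintptr_t load_base, uintptr_t p_vaddr,
                                       uintptr_t p_memsz, uintptr_t page_size) {
  /* The page size must be a power of two for the masking below. */
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  uintptr_t start = (load_base + p_vaddr) & ~(page_size - 1);
  uintptr_t end = (load_base + p_vaddr + p_memsz) & ~(page_size - 1);
  struct relro_range r = {start, end - start};
  return r;
}
```

A loader would then call mprotect((void *)r.start, r.size, PROT_READ) when r.size is non-zero. This also hints at why the linker has to lay out the RELRO region carefully: anything sharing a page with it either becomes read-only too or forces part of the region to stay writable.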
-z relro -z now, the linker additionally places
PT_GNU_RELRO. At runtime, ld.so
.got.plt relocations eagerly and then calls
mprotect. This scheme disables lazy binding PLT. It is
sometimes called "full RELRO". When the program has many
R_*_JUMP_SLOT relocations, there may be significant startup
In 2006, c0ntex introduced the GOT hijacking attack in How to hijack the Global Offset Table with pointers for root shells.
Non-address GOT entries
GOT has some reserved entries at the start of
.got.plt. Most remaining entries are symbol addresses. The
tls_index objects (module ID and offset from
dtv[m] to the symbol for general-dynamic/local-dynamic TLS models), TLS
descriptors, and TP offsets.
PowerPC64 ELFv2 TOC
TODO: Move this to a future PowerPC64 article.
Somehow PowerPC64 ELFv2 decided to reinvent GOT. They call it TOC (table of contents).
extern int var0;
addis 3, 2, .LC0@toc@ha
.got .o files do not reference
.got directly, the TOC scheme makes
explicit in .o files. Therefore the TOC layout is under control of the
compiler and presumably the compiler can leverage better information to
optimize the layout for locality. Well, I disagree with this point. The
compiler does not know the global information. A linker is better placed
to do such link-time optimization.
Let's look at a jump table example.
void puts(const char *);
A::foo is not optimized out, Clang emits:
A .toc entry (not in a group) incorrectly references
.rodata._ZN1A3fooEi in a COMDAT group. This violates an ELF
specification rule when
non-prevailing and therefore discarded:
A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.
Unfortunately this is difficult to fix. We cannot place
.toc in the group. If we do, loading the address of a
weak/global symbol in a COMDAT will break similarly.
GNU ld works around the issue by garbage collecting
entries. Reliance on garbage collection for correctness is a bad design.
For ld.lld, I simply let ld.lld ignore a
referencing a discarded symbol. D63182
Well, the above can be fixed by changing
.LC0 to a
STB_GLOBAL symbol, but we will
get a useless symbol in
.symtab. So PowerPC64 ELFv2's
.toc is prettier than ppc32
.got2, but that is
the pot calling the kettle black.
In "Runtime computation", I mentioned that GOT is not the only approach allowing addresses dependent on runtime computation. The text relocation technique is another. The name is derived from the fact that dynamic relocations apply to text sections.
Traditionally code and read-only data are placed in the same segment,
which is called the text segment. The linker uses the criterion
!(sh_flags & SHF_WRITE) to check whether a dynamic
relocation is a text relocation. When the output needs text relocations,
the linker adds a flag
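The criterion is easy to express in C. In this sketch the constant is defined locally (with the same value as SHF_WRITE in the ELF specification) so that the code does not depend on a platform `elf.h`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SHF_WRITE 0x1u /* same value as SHF_WRITE in the ELF specification */

/* A dynamic relocation is a text relocation if the section it applies to
   is not writable. Note that this covers read-only data, not just code. */
bool is_text_relocation(uint64_t sh_flags) {
  return !(sh_flags & TOY_SHF_WRITE);
}

/* Count the text relocations, given the sh_flags of the section that each
   dynamic relocation applies to. */
size_t count_text_relocations(const uint64_t *section_flags, size_t n) {
  assert(section_flags != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (is_text_relocation(section_flags[i]))
      count++;
  return count;
}
```

With flags 0x6 (alloc + exec, like .text) and 0x2 (alloc only, like .rodata) the check is true, while 0x3 (alloc + write, like .data) is not a text relocation.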
Linker/loader developers often frowned upon text relocations. In https://lore.kernel.org/lkml/CAFP8O3LZ3ZtpkF=RdyDyyXn40oYeDkqgY6NX7YRsBWeVnmPv1A@mail.gmail.com/, I collected some evidence.
Runtime pseudo relocations
On x86, the MinGW runtime supports runtime pseudo relocations, which are conceptually the same as text relocations.
Myth: Position-dependent code doesn't use GOT.
Not true. To avoid copy relocations and canonical PLT entries, GOT
indirection can be used. See
-fno-direct-access-external-data in the copy relocations
article. That said, the option is not common yet.
There is a way to convert a symbol lookup (the third case in the very beginning) to the first two cases.
Position-dependent code typically uses direct access relocations to reference a symbol. If the symbol is not defined by the executable,
On Windows, an undefined symbol is by default similar to a protected visibility symbol on ELF. Direct access is used.
// Like __attribute__((visibility("protected"))) on ELF.
movl var(%rip), %eax
which is like an unconditional GOT entry.
__declspec(dllimport) extern int ext_var;
To avoid explicit
__declspec(dllimport), MinGW invented
.rdata$.refptr.var. This is like an enabled-by-default
can be defined in the same linked image or another DLL.
movq .refptr.var(%rip), %rax