Updated in 2023-08.
(In celebration of my 2800th llvm-project commit) Happy Halloween!
This article describes relative relocations and how the RELR format can greatly decrease file sizes.
An ELF linker performs the following steps to process an absolute
relocation type whose width equals the word size (e.g.
R_AARCH64_ABS64
, R_X86_64_64
).
if (undefined_weak || (!preemptible && (no_pie || is_shn_abs)))
Note: in FDPIC ABIs, there is no single base address. The known FDPIC ABIs do not use relative relocations. This article does not discuss FDPIC.
In -pie
or -shared
mode, the linker
produces a relative relocation (R_*_RELATIVE
) if the symbol
is non-preemptible. The dynamic relocation is called a relative
relocation. -no-pie
mode does not produce relative
relocations.
.section .meta.foo,"a",@progbits
Another source of R_*_RELATIVE relocations is GOT entries. See All about Global Offset Table.
if (preemptible)
  emit an R_*_GLOB_DAT // the third case
else if (!pic || is_shn_abs)
  link-time constant // the first case
else
  emit a relative relocation // the second case
The linker can produce an R_*_RELATIVE
relocation in
some other circumstances, but they are rare, e.g. the unfortunate
R_386_GOT32X/R_386_TLS_IE
, PowerPC64's position-independent
long branch thunks.
Some architectures have R_*_RELATIVE
variants. On x32
(the ILP32 ABI for x86-64), R_X86_64_RELATIVE
applies to a word32 location
while R_X86_64_RELATIVE64
applies to a word64 location.
Itanium seems to have multiple relative relocation types for different
endianness.
On AArch64, when PAuth is enabled,
R_AARCH64_AUTH_RELATIVE
may be produced instead of
R_AARCH64_RELATIVE
.
Representation
ELF has two relocation formats, REL and RELA. 64-bit capability and RELA are improvements on previous formats used by a.out and COFF.
typedef struct {
RELA is nice for static relocations on RISC architectures but is very
size inefficient for dynamic relocations. REL is 33% more efficient but
is still bloated when encoding relative relocations. For a relative
relocation, the symbol index is 0, but we have to pay a word for
r_info
.
The dynamic loader performs
*(Elf_Addr*)(base+r_offset) = base + addend;
. For the REL
format (used by arm and x86-32), addend
is an implicit
addend read from the to-be-relocated location. For the RELA format (used
by most architectures), addend
is r_addend
stored in the relocation record.
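Both formats end up storing base + addend; they differ only in where the addend comes from. The following is a minimal sketch, not any loader's actual code. It assumes a 64-bit target, and it models the module as a byte buffer `image` whose index 0 corresponds to link-time address 0, so a real loader's `base + r_offset` becomes `image + r_offset`. The type and function names (`Rel64`, `Rela64`, `apply_relative_rel`, `apply_relative_rela`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified 64-bit records; field names follow the ELF specification. */
typedef struct { uint64_t r_offset, r_info; } Rel64;
typedef struct { uint64_t r_offset, r_info; int64_t r_addend; } Rela64;

/* REL: the addend is whatever the linker left in the target word. */
void apply_relative_rel(unsigned char *image, size_t size, uint64_t base,
                        const Rel64 *rel, size_t n) {
  for (size_t i = 0; i < n; i++) {
    uint64_t addend;
    assert(size >= sizeof addend && rel[i].r_offset <= size - sizeof addend);
    memcpy(&addend, image + rel[i].r_offset, sizeof addend);
    uint64_t value = base + addend;
    memcpy(image + rel[i].r_offset, &value, sizeof value);
  }
}

/* RELA: the addend is stored in the record; the old word content is ignored. */
void apply_relative_rela(unsigned char *image, size_t size, uint64_t base,
                         const Rela64 *rela, size_t n) {
  for (size_t i = 0; i < n; i++) {
    uint64_t value = base + (uint64_t)rela[i].r_addend;
    assert(size >= sizeof value && rela[i].r_offset <= size - sizeof value);
    memcpy(image + rela[i].r_offset, &value, sizeof value);
  }
}
```

In both functions `r_info` is never read: for a relative relocation it carries no information beyond the type, which is the waste discussed above.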
Relative relocations have great locality. On a 64-bit platform, it's common to have a consecutive sequence of locations which are all relocated, e.g.
{r_offset = 0x2000, r_info = (0 << 32) | R_X86_64_RELATIVE},
Sources of relative relocations
Before we jump into compressing relative relocations, let's think of the sources of relative relocations and whether we can decrease their number.
In a position-independent executable, R_*_RELATIVE
relocations typically dominate: they easily take 90% of the total
dynamic relocations. The ratio is larger if there are fewer
R_*_JUMP_SLOT
relocations. For a mostly statically linked
position-independent executable, the ratio can be as large as 99%.
A symbolic shared object has a similar distribution of dynamic
relocations. But non-symbolic shared objects are much more common and
R_*_RELATIVE
relocations usually take a very small portion
of their dynamic relocations. (See ELF interposition
and -Bsymbolic about symbol interposition.)
// R_*_RELATIVE if non-preemptible
For string literal arrays, if setting a capacity wastes more space
(const char a[][CAP] = {"a", "b", ...};
), then there is no
elegant way in C. If you are not concerned about making the code less
readable, you can leverage the trick from musl's strerror
implementation:
// __strerror.h
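The general idea behind this kind of trick is to replace an array of pointers (one relative relocation per element in PIC) with a single string blob plus a table of integer offsets, which needs no relocations at all. The sketch below illustrates only that idea and is not musl's code. The names `names`, `name_off`, and `name_at` are hypothetical, and the offsets are written by hand here, whereas a real implementation would generate them.

```c
#include <assert.h>
#include <stddef.h>

/* All strings packed into one array, separated by NUL bytes. */
static const char names[] = "alpha\0beta\0gamma";

/* Start offset of each string within `names`. These are plain integers,
   so the table needs no dynamic relocations. */
static const unsigned short name_off[] = {0, 6, 11};

/* Return the i-th string by adding its offset to the blob's address. */
const char *name_at(size_t i) {
  assert(i < sizeof name_off / sizeof name_off[0]);
  return names + name_off[i];
}
```

The address computation in `name_at` is PC-relative on most modern architectures, so the cost moves from load time (relocations) to a small addition at each use.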
For C++, virtual tables can contribute many R_*_RELATIVE
relocations through function pointers. Fuchsia folks contributed
-fexperimental-relative-c++-abi-vtables
to Clang which is
also available on Linux. (Under the hood, an i64 sub (lhs, rhs) LLVM IR constant expression is lowered to a PC-relative relocation.) This can
make a large portion of the memory image read-only and save a lot of
space (32-bit PC-relative offsets instead of 64-bit absolute addresses),
but is difficult to deploy in practice because of the ABI change.
Some programming languages (e.g. Jai, Odin) provide relative pointers. If the construct can be used with global objects, we can avoid many relative relocations.
For some modern architectures (AArch64, RISC-V, x86-64), PIC does not
have a size penalty on text sections compared to non-PIC. The number of
R_*_RELATIVE
relocations is the most significant source of
code size bloat when changing from non-PIC to PIC.
I believe that for many applications relative relocations can be decreased but that may come with costs to readability or ABI issues. Sometimes you can't have it both ways.
Compressing REL/RELA relocations
One intuitive idea is to omit r_info
. We will need a new
section but the size has been cut in half compared to REL.
.section naive,"a"
Next, we can think of delta encoding and narrower entries. But note that ELF tries to avoid unaligned/packed structures. In addition, delta encoding, if not designed carefully, can make medium/large code models hard.
For a new format, the minimum is linker support and dynamic loader support. Debuggers and binary manipulation tools may need to understand the format as well.
Over the years, there have been multiple attempts at compressing the ELF relocation formats.
In 2010, Mike Hommey added "elfhack" to Firefox (https://bugzilla.mozilla.org/show_bug.cgi?id=606145). Improving libxul
startup I/O by hacking the ELF format is a write-up. It appeared to
move most relative relocations from .rel.dyn/.rela.dyn
into
a custom section. The section basically has multiple pairs of a base
offset and a count of subsequent consecutive relocations. The savings
are quite significant.
In 2015, Android bionic got
DT_ANDROID_REL
/DT_ANDROID_RELA
. This is
somewhat over-engineered but does not optimize relative relocations
well.
RELR relative relocation format
In 2017, Cary Coutant proposed a prototype of the RELR relative relocation format https://sourceware.org/legacy-ml/gnu-gabi/2017-q2/msg00003.html. Rahul Chaudhry started a thread on the generic-abi forum: Proposal for a new section type SHT_RELR. Ali Bahrami refined it to the format we have today: an even entry indicates an address while an odd entry indicates a bitmap.
In 2018, Rahul Chaudhry added ld.lld/llvm-readelf support for RELR and added patches in Chrome OS's binutils-gdb and glibc repositories.
Android and Fuchsia followed up and adopted DT_RELR
.
These operating systems use ld.lld --pack-dyn-relocs=relr
and a DT_RELR
capable loader.
In 2019, I fixed an ld.lld
bug which could cause the size of .relr.dyn
to
oscillate between 2 numbers.
In 2021-10, I added DT_RELR
support to FreeBSD libexec/rtld-elf
.
kib@ reviewed/committed it and made it available to the releng/13.1
branch.
In 2021, I attempted to bring the format to the GNU toolchain
community. I reimplemented some binutils-gdb and glibc patches and
managed to submit the
GNU readelf change to the official repository in November and the
BFD change was submitted in December 2021. The BFD change is used by
gdb and objcopy. gdb before
11.2 does not include the change and gives diagnostics for RELR
binaries:
BFD: $file: unknown type [0x13] section `.relr.dyn'
. This
is not cosmetic: it seems to affect debuggability as well.
On 2022-01-11, HJ Lu implemented -z pack-relative-relocs
for GNU ld's x86 port. See the -z pack-relative-relocs discussion under Symbol versioning below. Alan
Modra implemented -z pack-relative-relocs
for GNU ld's
PowerPC64 port. Both changes are released in binutils 2.38.
We have Cary Coutant's agreement that after he converts generic-abi to Markdown and makes it open, RELR will be applied.
In 2022-04, DT_RELR
changes finally landed in glibc
(milestone: 2.36). The main ones are Add
GLIBC_ABI_DT_RELR for DT_RELR support and elf:
Support DT_RELR relative relocation format [BZ #27924]. (The change
failed to make it to 2.35, but this was not too bad. See Can
DT_RELR catch up glibc 2.35?)
In 2022-08, my musl change was merged: ldso: support DT_RELR relative relocation format. This was a retry after my upstream attempt in 2019-03 https://www.openwall.com/lists/musl/2019/03/06/3 (at that time, the BDFL said RELR was a nice improvement but not a critical change blocking anything, so the fate of the feature was unclear).
(Pre)standard wording
From Proposal for a new section type SHT_RELR
Description
SHT_RELR: The section holds an array of relocation entries, used to encode relative relocations that do not require explicit addends or other information. Array elements are of type Elf32_Relr for ELFCLASS32 objects, and Elf64_Relr for ELFCLASS64 objects. SHT_RELR sections are for dynamic linking, and may only appear in object files of type ET_EXEC or ET_DYN. An object file may have multiple relocation sections. See ``Relocation'' below for details.
[...]
The format is best described by code. When
--pack-dyn-relocs=relr
is specified (the feature is
enabled), ld.lld creates .relr.dyn
(of type
SHT_RELR
) which holds an array of relocation entries, used
to encode relative relocations that do not require explicit addends.
Regular R_*_RELATIVE
from .rel.dyn/.rela.dyn
are removed.
In the .relr.dyn
section,
- An even entry indicates a location which needs a relocation and sets up where for subsequent odd entries.
- An odd entry indicates a bitmap encoding up to 63 locations following where (63 for ELFCLASS64; 31 for ELFCLASS32).
- Odd entries can be chained.
relrlim = (const Elf_Relr *)((const char *)obj->relr + obj->relrsize);
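The decoding rules can be sketched as a standalone function. This is a minimal illustration for ELFCLASS64, not the loader code quoted above. Instead of patching memory it collects the offsets a loader would relocate, which makes the even/odd rules easy to check. The function name `relr_decode64` and its output-buffer interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 64-bit RELR array into the list of relocated offsets.
   Returns the total number of offsets; stores at most `cap` of them in `out`. */
size_t relr_decode64(const uint64_t *relr, size_t n, uint64_t *out, size_t cap) {
  size_t count = 0;
  uint64_t where = 0;
  /* A bitmap is only meaningful after an address entry has set `where`. */
  assert(n == 0 || (relr[0] & 1) == 0);
  for (size_t i = 0; i < n; i++) {
    uint64_t entry = relr[i];
    if ((entry & 1) == 0) {
      /* Even entry: an address. It is relocated itself, and `where`
         moves to the word right after it. */
      if (count < cap) out[count] = entry;
      count++;
      where = entry + 8;
    } else {
      /* Odd entry: bit k (k = 1..63) marks the word at where + (k-1)*8. */
      for (unsigned k = 1; k < 64; k++) {
        if ((entry >> k) & 1) {
          if (count < cap) out[count] = where + (uint64_t)(k - 1) * 8;
          count++;
        }
      }
      /* Chaining: the next bitmap covers the following 63 words. */
      where += 63 * 8;
    }
  }
  return count;
}
```

For example, the two entries `{0x2000, 0x7}` decode to the offsets 0x2000, 0x2008, and 0x2010: the low bit of 0x7 is the bitmap marker, and the two remaining set bits select the first two words after 0x2000.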
RELR can typically encode the same information in
.rela.dyn
in less than 3% space.
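To see where the savings come from, here is a minimal encoder sketch for ELFCLASS64. It follows the even/odd rules described above and is not taken from any linker's source. It assumes the input offsets are sorted, unique, and even. The function name `relr_encode64` and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode sorted, unique offsets into 64-bit RELR entries.
   Returns the number of entries; stores at most `cap` of them in `out`. */
size_t relr_encode64(const uint64_t *offs, size_t n, uint64_t *out, size_t cap) {
  size_t count = 0, i = 0;
  while (i < n) {
    /* Start a group with an address entry, which must be even. */
    assert(offs[i] % 2 == 0);
    if (count < cap) out[count] = offs[i];
    count++;
    uint64_t where = offs[i] + 8;
    i++;
    /* Emit chained bitmaps for as long as each one covers an offset. */
    for (;;) {
      uint64_t bitmap = 0;
      while (i < n) {
        uint64_t delta = offs[i] - where;
        if (delta >= 63 * 8 || delta % 8 != 0)
          break; /* outside this bitmap's window, or not word-aligned */
        bitmap |= (uint64_t)1 << (delta / 8);
        i++;
      }
      if (bitmap == 0)
        break;
      if (count < cap) out[count] = (bitmap << 1) | 1; /* low bit marks a bitmap */
      count++;
      where += 63 * 8;
    }
  }
  return count;
}
```

With 100 consecutive 8-byte words starting at 0x2000, this produces 3 entries (one address and two bitmaps), i.e. 24 bytes, where 100 Elf64_Rela records would take 2400 bytes. Scattered relocations compress less well because each isolated location costs a full address entry.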
q() { file "$1" | grep -q ELF || return; local s=$(stat -c %s "$1"); printf '%s\t%s\t' "$1" "$s"; bc <<< "scale=2; $(readelf -Wr "$1" | grep -c _RELATIVE)*24 * 100 / $s" | sed 's/^\./0./'; }
On my Arch Linux, among /usr/bin/*
executables, relative
relocations take 7.9% of the total file size. These large executables
spend 10+% in the .rela.dyn
section: as
,
audacity
, fzf
, ndisasm
,
objdump
, ocaml*
, perf_*
,
qemu-system-*
, pdftosrc
, php*
,
strace
, virt-*
. Among
/usr/lib/lib*.so*
shared objects, relative relocations take
4.92% of the total file size.
I get similar savings on my Debian Linux.
Time travel compatibility
Programs with new features run the risk of mysterious crashes or malfunction on old systems. Ali Bahrami calls new objects on old systems "time travel compatibility". For Solaris: "And yet, our experience is that although we don't go to great effort to catch time traveling objects, they very rarely cause problems for us. People seem to understand that they can't do that."
DT_RELR objects will immediately segfault on an old glibc. Some groups (ChromeOS, FreeBSD, Fuchsia, NetBSD, and Solaris) just don't support the "time travel compatibility" development model and can accept that new objects may have poor diagnostics.
However, the glibc community tends to care more about providing a nice diagnostic. Note: such a "time travel compatibility" mechanism is not consistently applied. See When to prevent execution of new binaries with old glibc for a summary.
Back in December 2021, my idea was that such a diagnostic should not
be a blocker for the DT_RELR
feature.
- We did not flip the default for GCC/Clang.
- GNU ld did not support RELR yet. For most Linux distributions, there was no impact to their packages.
- Users who opted in to the feature accepted the poor diagnostic if they took the resulting binaries back to older systems.
- If we merged the glibc patch early, we would make the future flip less of a problem.
I wished that the glibc community could listen to what Ali suggested in the generic-abi post (January 2018):
My free advice (worth what you paid for it) is to roll out the support, and then wait a bit before turning on the use widely, so that the support is in place before it is needed, and to not complicate things with a way to catch time travelers. The window of time where this can be a problem is finite, and once you're past it, you'll be glad to have a simpler system.
This advice applies to many things, including a GCC 5 regression for x86-64 -fpie that I am trying to fix (still pending: [PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC) and copy relocations on protected data.
Anyway, since there is interest in time travel compatibility, let's discuss potential solutions.
EI_ABIVERSION
Some might ask whether we could use EI_ABIVERSION
.
The System V ABI says:
Byte e_ident[EI_ABIVERSION] identifies the version of the ABI to which the object is targeted. This field is used to distinguish among incompatible versions of an ABI. The interpretation of this version number is dependent on the ABI identified by the EI_OSABI field. If no values are specified for the EI_OSABI field by the processor supplement or no version values are specified for the ABI determined by a particular value of the EI_OSABI byte, the value 0 shall be used for the EI_ABIVERSION byte; it indicates unspecified.
EI_ABIVERSION
is dependent on EI_OSABI
.
Operating systems decide their EI_ABIVERSION
. We need to
discuss ELFOSABI_NONE
/ELFOSABI_GNU
separately.
For ELFOSABI_GNU
(alias: ELFOSABI_LINUX
),
different architectures may have different EI_ABIVERSION
values. I know that mips may use EI_ABIVERSION==1
for
(e_eflags & (EF_MIPS_PIC | EF_MIPS_CPIC)) == EF_MIPS_CPIC
position-dependent executables. For glibc, we would need to bump
LIBC_ABI_MAX
.
ELFOSABI_NONE
is the domain of the generic ABI. Many
Linux executables don't use
STB_GNU_UNIQUE
/STT_GNU_IFUNC
and therefore use
ELFOSABI_NONE
. Solaris folks have ruled out the possibility
of bumping EI_ABIVERSION
because this is not a development
model they need. I think Android bionic/Chrome OS/FreeBSD fall in the
same boat.
Even if EI_ABIVERSION
could be bumped: as of glibc 2.34,
ld.so does not check EI_ABIVERSION
for kernel mapped
objects. It does check ld.so mapped objects (including all
DT_NEEDED
shared objects), though.
% r2 -nwqc 'wx 01 @ 8' a # Change EI_ABIVERSION to 1
% ./a
% /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ./a
./a: error while loading shared libraries: ./a: ELF file ABI version invalid
For ELFOSABI_GNU (r2 -nwqc 'wx 03 @ 7'), changing EI_ABIVERSION to 4 or above triggers the failure with ld.so mapped objects but not kernel mapped objects.
In addition, bumping the ABI version can immediately lock out some
ELF utilities which only deal with
e_ident[EI_ABIVERSION] == 0
objects. For example,
elflint
from elfutils reports an error.
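The behavior observed above is consistent with a check of roughly the following shape. This is a hedged sketch, not glibc's source: the function name `abi_version_ok` and the `abi_max` parameter (standing in for a limit such as LIBC_ABI_MAX) are assumptions, and only the `e_ident` indices and OSABI values come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>

/* e_ident indices and OSABI values as defined by the ELF specification. */
enum { EI_OSABI = 7, EI_ABIVERSION = 8, EI_NIDENT = 16 };
enum { ELFOSABI_NONE = 0, ELFOSABI_GNU = 3 };

/* Return 1 if the object would be accepted, 0 if it would be rejected.
   `abi_max` is a hypothetical exclusive upper bound for ELFOSABI_GNU. */
int abi_version_ok(const unsigned char *e_ident, size_t len, unsigned abi_max) {
  assert(e_ident != NULL);
  if (len < EI_NIDENT)
    return 0;
  unsigned osabi = e_ident[EI_OSABI];
  unsigned version = e_ident[EI_ABIVERSION];
  if (osabi != ELFOSABI_NONE && osabi != ELFOSABI_GNU)
    return 0;
  /* Version 0 is always fine; nonzero versions are only defined for
     ELFOSABI_GNU and must stay below the loader's limit. */
  return version == 0 || (osabi == ELFOSABI_GNU && version < abi_max);
}
```

With `abi_max` set to 4, an ELFOSABI_NONE object with version 1 is rejected and an ELFOSABI_GNU object is rejected from version 4 upward, matching the two experiments. A check like this only helps for objects the loader maps itself, which is why the kernel-mapped main executable slips through.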
Synthesized undefined dynamic symbol
We could let the linker synthesize an undefined symbol in the
dynamic symbol table (.dynsym
) to indicate the usage of
DT_RELR
, then let ld.so define the symbol to indicate
feature availability. (See the
_dl_have_relr
idea by Michael Matz.)
In glibc, an unreferenced unversioned undefined dynamic symbol does
not trigger an undefined symbol
error, so the linker would
need to synthesize a fake relocation. R_*_NONE
is typically
ignored so we would need a relocation which does something, e.g.
R_X86_64_64
. Then the linker has to synthesize a
.data.rel.ro
section to accommodate the relocation.
As an alternative, the linker can make the undefined dynamic symbol
versioned, then a diagnostic will be reported even if the symbol is
unreferenced. A versioned undefined symbol can lead to a link-time error
even for -z undefs
(unfortunate default for
ld -shared
).
If GNU ld adds, say, --pack-dyn-relocs=relr-glibc
or
-z relr-glibc
, with the functionality, I will probably just
add a compatibility alias to ld.lld but not actually add the symbol (https://sourceware.org/pipermail/libc-alpha/2021-October/132460.html):
- Some users don't need "time travel compatibility". --pack-dyn-relocs=relr is quite popular on some platforms now.
- --pack-dyn-relocs=relr would still be usable to bypass the relr-glibc restriction on glibc.
- I don't want users to migrate away from --pack-dyn-relocs=relr (churn) just because glibc has a different development model.
The DT_CRITICAL_*
proposal
See Critical program headers and dynamic tags.
Solaris does not support the development model, so this proposal is unlikely to enter the generic ABI.
AFAIK this is not in the GNU ABI and glibc does not support this feature. It is not suitable to do time travel compatibility with this proposal.
Symbol versioning
glibc 2.34 added __libc_start_main@@GLIBC_2.34
.
A *crt1.o
file references
__libc_start_main
. A new program linking against new
libc.so
cannot run on glibc older than 2.34.
*crt1.o
is for executables, but a similar trick can be
used in crti.o
or crtbegin*.o
to catch both
executables and shared objects.
This mechanism is coarse in that it does not check whether
DT_RELR
is used.
On 2022-01-11, HJ Lu added
-z pack-relative-relocs
to GNU ld. If the linked output has
a GLIBC_2.*
version dependency on a shared object named
libc.so.*
, GNU ld will add an extra version dependency
named GLIBC_ABI_DT_RELR
. In the glibc 2.36
DT_RELR
implementation, glibc reports an error if a
dynamically linked executable/shared object using DT_RELR
does not have the GLIBC_ABI_DT_RELR
version dependency.
RELR adoption among Linux distributions
I have filed tickets for several large Linux distributions to bring the size saving to their attention.
- Arch Linux: https://bugs.archlinux.org/task/72433
- Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996598
- Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=2014699. Obsoleted by https://bugzilla.redhat.com/show_bug.cgi?id=2218018
- Gentoo: https://bugs.gentoo.org/818376
Other object file formats
Mach-O
Mach-O __LINKEDIT,__rebase
serves a similar purpose as
ELF R_*_RELATIVE
relocations. It uses a bytecode encoding
to decrease space consumption. The bytecode scheme uses run-length
encoding and encodes the delta between two offsets to save bytes on
encoding the offset. Empirically I think it is inferior to ELF RELR in
the majority of cases.
PE/COFF
On Windows, .reloc
contains all base relocations in the image. This format is more
efficient than ELF's REL/RELA but less efficient than RELR (when
encoding relative relocations). When the base relocation type is (32-bit
architecture) IMAGE_REL_BASED_HIGHLOW
or (64-bit
architecture) IMAGE_REL_BASED_DIR64
, it is like a relative
relocation on ELF.