ELF's design emphasizes natural size and alignment guidelines for its control structures. This principle, outlined in Proceedings of the Summer 1990 USENIX Conference, ELF: An Object File to Mitigate Mischievous Misoneism, promotes ease of random access for structures like program headers, section headers, and symbols.
All data structures that the object file format defines follow the "natural" size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of four, etc. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary within the file. Other classes would have appropriately scaled definitions. To illustrate, the 64-bit class would define Elf64 Addr as an 8-byte object, aligned on an 8-byte boundary. Following the strictest alignment for each object allows the format to work on any machine in a class. That is, all ELF structures on all 32-bit machines have congruent templates. For portability, ELF uses neither bit-fields nor floating-point values, because their representations vary, even among pro- cessors with the same byte order. Of course the programs in an ELF file may use these types, but the format itself does not.
While beneficial for many control structures, the natural size guideline presents significant drawbacks for relocations. Since relocations are typically processed sequentially, they don't gain the same random-access advantages. The large 24-byte Elf64_Rela structure highlights the drawback. For a detailed comparison of relocation formats, see Exploring object file formats#Relocations.
Furthermore, Elf32_Rel and Elf32_Rela
sacrifice flexibility to maintain a smaller size, limiting relocation
types to a maximum of 255. This constraint has become noticeable for
AArch32 and RISC-V, and especially when platform-specific relocations
are needed. While the 24-bit symbol index field is less elegant, it
hasn't posed significant issues in real-world use cases.
In contrast, the WebAssembly object file format uses LEB128 encoding for relocations and other constrol structures, offering a significant size advantage over ELF.
Inspired by WebAssembly, I will explore real-world scenarios where relocation size is critical and propose an alternative format (RELLEB) that addresses ELF's limitations.
Use cases
Dynamic relocations
A substantial part of position-independent executables (PIEs) and dynamic shared objects (DSOs) is occupied by dynamic relocations. While RELR (a compact relative relocation format) offers size-saving benefits for relative relocations, other dynamic relocations can benefit from a compact relocation format. There are a few properties:
- There are much fewer relocation types.
- The offsets are often adjacent by
Elf_Addr. No two dynamic relocations can share the same offset. - Each symbol is associated with very few dynamic relocations,
typically 1 or 2 (
R_*_JUMP_SLOTandR_*_GLOB_DAT). When a symbol is associated with more dynamic relocations, it is typically a base class function residing in multiple C++ virtual tables.-fexperimental-relative-c++-abi-vtableswould eliminate such dynamic relocations.
Android's packed relocation format (linker implementation:
ld.lld --pack-dyn-relocs=android) was an earlier design
that applies to all dynamic relocations at the cost of complexity.
Additionally, Apple linkers and dyld use LEB128 encoding for bind opcodes.
Marker relocations
Marker relocations are utilized to indicate certain linker
optimization/relaxation is applicable. While many marker relocations are
used scarcely, RISC-V relocatable files are typically filled up with
R_RISCV_RELAX relocations. Their size contribution is quite
substantial.
.llvm_addrsig
On many Linux targets, Clang emits a special section called
.llvm_addrsig (type SHT_LLVM_ADDRSIG, LLVM
address-significance table) by default to allow
ld.lld --icf=safe. The .llvm_addrsig section
stores symbol indexes in ULEB128 format, independent of relocations.
Consequently, tools like ld -r and objcopy risk invalidate
the section due to symbol table modifications.
Ideally, using relocations would allow certain operations. However,
the size concern of REL/RELA in ELF hinders this approach. In contrast,
lld's Mach-O port chose
a relocation-based representation for
__DATA,__llvm_addrsig.
.llvm.call-graph-profile
LLVM leverages a special section called
.llvm.call-graph-profile (type
SHT_LLVM_CALL_GRAPH_PROFILE) for both instrumentation- and
sample-based profile-guided optimization (PGO). lld utilizes
this information ((from_symbol, to_symbol, weight) tuples) to
optimize section ordering within an input section description, enhancing
cache utilization and minimizing TLB thrashing.
Similar to .llvm_addrsig, the
.llvm.call-graph-profile section initially faced the symbol
index invalidation problem, which was solved by switching to
relocations. I opted for REL over RELA to reduce code size.
DWARF sections
DWARF v5 accelerated name-based access with the introduction of the
.debug_names section. However, in a
clang -g -gsplit-dwarf -gpubnames generated relocatable
file, the .rela.debug_names section can consume a
significant portion (approximately 10%) of the file size.
1 | Relocation section '.relleb.debug_names' at offset 0x65c0 contains 200 entries: |
This size increase has sparked discussions within the LLVM community about potentially altering the file format for linking purposes.
The availability of a more compact relocation format would likely alleviate the need for such format changes.
.debug_line and .debug_addr also contribute
a lot of relocations.
1 | Relocation section '.relleb.debug_addr' at offset 0x64f1 contains 51 entries: |
Many adjacent relocations share the same section symbol. We will see later that the proposed RELLEB does not utilize much about this property.
Compressed relocations
While the standard SHF_COMPRESSED feature is commonly
used for debug sections, its application can easily extend to relocation
sections. I have developed a Clang/lld prototype that demonstrates this
by compressing SHT_RELA sections.
The compressed SHT_RELA section occupies
sizeof(Elf64_Chdr) + size(compressed) bytes. The
implementation retains uncompressed content if compression would result
in a larger size.
In scenarios with numerous smaller relocation sections (such as when
using -ffunction-sections -fdata-sections), the 24-byte
Elf64_Chdr header can introduce significant overhead. This
observation raises the question of whether encoding
Elf64_Chdr fields using ULEB128 could further optimize file
sizes. With larger monolithic sections (.text,
.data, .eh_frame), compression ratio would be
higher as well.
1 | # configure-llvm is my wrapper of cmake that specifies some useful options. |
Relocations consume a significant portion (approximately 20.9%) of
the file size. Despite the overhead of
-ffunction-sections -fdata-sections, the compression
technique yields a significant reduction of 14.5%!
However, dropping in-place relocation processing is a downside.
RELLEB relocation format
The 1990 ELF paper ELF: An Object File to Mitigate Mischievous Misoneism says "ELF allows extension and redefinition for other control structures." Inspired by WebAssembly, let's explore RELLEB, a new and more compact relocation format designed to replace RELA. Our emphasis is on simplicity over absolute minimal encoding. See the end of the article for a detailed format description.
A SHT_RELLEB section (preferred name:
.relleb<name>) holds compact relocation entries that
decode to Elf32_Rela or Elf64_Rela depending
on the object file class (32-bit or 64-bit). Its content begins with a
ULEB128-encoded relocation count, followed by entries encoding
r_offset, r_type, r_symidx, and
r_addend. The entries use ULEB128 and SLEB128 exclusively
and there is no endianness difference.
Here are key design choices:
Relocation count (ULEB128):
This allows for efficient retrieval of the relocation count without
decoding the entire section. While a uint32_t (like SHT_HASH)
could be used, ULEB128 aligns with subsequent entries, removes
endianness differences, and offers a slight size advantage in most cases
when the number of symbols can be encoded in one to three bytes.
Delta encoding for r_offset (ULEB128):
Section offsets can be large, and relocations are typically ordered.
Storing the difference between consecutive offsets offers compression
potential. In most cases, a single byte will suffice. While there are
exceptions (general dynamic TLS model of s390/s390x uses a local
"out-of-order" pair:
R_390_PLT32DBL(offset=o) R_390_TLS_GDCALL(offset=o-2)), we
are optimizing for the common case. Switching to SLEB128 would increase
the total .o size by 0.1%.
For ELFCLASS32, r_offsets members are calculated using
modular arithmetic modulo 4294967296.
Delta encoding for r_type (SLEB128):
Some psABIs utilize relocation types greater than 128. AArch64's static relocation types begin at 257 and dynamic relocation types begin at 1024, necessitating two bytes with ULEB128/SLEB128 encoding in the absence of delta encoding. Delta encoding allows all but the first relocation's type to be encoded in a single byte. An alternative design is to define a base type in the header and encode types relative to the base type, which would introduce slight complexity.
If the AArch32 psABI could be redesigned, allocating
[0,64) for Thumb relocation types and [64,*)
for ARM relocation types would optimize delta encoding even further.
While sharing a single type code for multiple relocations would be efficient, it would require reordering relocations. This conflicts with order requirements imposed by several psABIs and could complicate linker implementations.
Symbol table index and type/addend presence (SLEB128):
ULEB128 optimizes for the common case when the symbol index is encodable in one or two bytes. Using SLEB128 and delta encoding instead of ULEB128 for the symbol index field would increase the total size by 0.4%.
The sign bit determines type/addend presence:
- 0: The encoded value is the symbol index. Both type and addend are present.
- 1: The bitwise NOT of the value is the symbol index. The type and the addend inherit values from the previous entry.
This scheme optimizes for consecutive static relocations with identical types and addends, a common pattern in many architectures. Example:
1 | // R_AARCH64_CALL(g0), ... |
While RISC architectures often require multiple relocations with different types to access global data, the frequent use of call instructions outweighs this in most cases. Overall, type/addend omission is generally advantageous than just addend omission (tested using a aarch64-linux-gnu build).
On RISC-V, due to frequent relocation type changes and
R_RISCV_RELAX, addend omission has slightly smaller
.relleb* than type/addend omission (by 1.9%).
With a limited number of types and frequent zero addends (except
R_*_RELATIVE and R_*_IRELATIVE),dynamic
relocations also benefit from type/addend omission.
Delta encoding for addend (SLEB128):
When the addend changes, we use an addend delta. This offers a slight size advantage (about 0.20%) and optimizes for cases like:
1 | .quad .data + 0x78 |
The .debug_line_str references in a
.debug_line section follow this pattern.
Note: when the bitwise NOT code path is taken, the zero delta type and addend is not utilized.
I have developed a prototype at https://github.com/MaskRay/llvm-project/tree/demo-relleb.
RELLEB demonstrates superrior size reduction compared to the
SHF_COMPRESSED SHT_RELA approach.
1 | configure-llvm s2-custom2 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_{C,CXX}_FLAGS=-mrelleb |
The total relocation section size has decreased from 28767768 to 4872672, 16.9% of the original size. RELLEB yields a significant file size reduction of 17.2%!
In aarch64-linux-gnu builds, the total relocation
section size has decreased from 25762752 to 4698182, 18.2% of the
original size. RELLEB yields a file size reduction of 16.5%.
In riscv64-linux-gnu builds, the total relocation
section size has decreased from 91054800 to 17522812, 19.2% of the
original size. RELLEB yields a file size reduction of 32.4%.
In an x86-64 clang -g -gsplit-dwarf -gpubnames build,
.rela* sections consume 19.1% of the file size. The total
relocation section size has decreased from 105622824 to 24559569, 23.3%
of the original size. RELLEB yields a file size reduction of 14.6%.
It would be interesting to explore the potential gains of combining zstd compression with RELLEB.
1 | configure-llvm s2-custom3 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_{C,CXX}_FLAGS='-mrelleb -Xclang --compress-relocations=zstd' |
While the 25.8% reduction in RELLEB section size suggests room for
further optimization, the overall decrease of only 1.10% in
.o file sizes indicates that the current compact relocation
format offers a reasonable compromise. (In the absence of the addend
presence and delta addend technique, the overall decrease is about
1.5%.)
I debated whether to name the new section SHT_RELOC
(.reloc<name>) or SHT_RELLEB
(.relleb<name>). Ultimately, I chose
SHT_RELLEB because its unique name minimizes potential
confusion, whereas SHT_RELOC could be confused with
SHT_REL and SHT_RELA.
Limitation
RELLEB is not the most optimal format for sections like
.rodata, .debug_names,
.debug_line, and .debug_addr. These sections
often have many relocations with the same type and symbol, a pattern
that the generic RELR format (discussed below for dynamic relocations)
could exploit more effectively.
Specifically for .debug_names, the RELLEB format results
in a size(.relleb.debug_names) / size(.rela.debug_names)
ratio of 27.7%. Modifying RELLEB to use the sign bit of the symbol index
for omitting the relocation type (instead of the addend) could improve
this ratio, but at the cost of larger .relleb.text
sections.
RELLEB for dynamic relocations
RELLEB excels with static relocations, but what about the dynamic case? I believe its benefits for dynamic relocations are less pronounced. An optimal dynamic relocation format would differ substantially. A generalized RELR format would leverage the dyamic relocation properties well. Here's a possible encoding:
1 | // R_*_RELATIVE group |
We need to enumerate all dynamic relocation types including
R_*_IRELATIVE, R_*_TLSDESC used by some ports.
Some R_*_TLSDESC relocations have a symbol index of zero,
but the straightforward encoding does not utilize this property.
Traditionally, we have two dynamic relocation ranges.
.rela.dyn:[DT_RELA, DT_RELA + DT_RELASZ)(or.rel.dyn:[DT_REL, DT_REL + DT_RELSZ)).rela.plt:[DT_JMPREL, DT_JMPREL + DT_PLTRELSZ).DT_PLTRELspecifiesDT_RELorDT_RELA.
Some GNU ld ports treat .rela.plt as a subset of
.rela.dyn, introducing complexity for dynamic loaders.
Android's packed relocation format replaces
.rel.dyn/.rela.dyn but does not change the
section name.
If we aim RELLEB for replacement, we'd need a new dynamic tag
(DT_RELLEB) and ensure no overlap with
DT_JMPREL. DT_RELLEBSZ is not needed, because
the relocation count can be inferred from the header. We would need to
disallow output section descriptions like
.rela.dyn : { *(.rela.dyn) *(.rela.plt) }.
In glibc, there is additional complexity for ET_EXEC
executables due to __rela_iplt_start.
I've implemented -z relleb to replace
.rel.dyn/.rela.dyn but not yet
.rel.plt/.rela.plt. Dynamic relocations are
sorted by (r_type, r_offset) to better utilize RELLEB.
Let's link clang-16-debug using RELA,
--pack-dyn-relocs=relr,
--pack-dyn-relocs=android+relr, and
--pack-dyn-relocs=relr -z relleb and analyze the
results.
1 | % llvm-readelf -S clang | grep ' \.rel.*\.' |
Analysis
- Relative relocations usually outnumber non-relative relocations.
- RELR significantly optimizes relative relocations, offering the largest size reduction.
- RELLEB further improves the non-relative portion, compressing that
portion to 9.7%. However, it's overshadowed by Android packed
relocations (6.3%). Android's advantage is due to
RELOCATION_GROUPED_BY_INFO_FLAGsharingr_info. - While Android packed relocations have a smaller footprint, their
advantage is less pronounced since
.relr.dynstill accounts for a significant portion of the size.
Decoding ULEB128/SLEB128 would necessitate more work in the dynamic loader. However, since there is no implementation yet, we don't know the performance.
Linker notes
--emit-relocs and -r necessitate combining
relocation sections. The output size may differ from the sum of input
sections. The total relocation count must be determined, a new header
written, and section content regenerated, as symbol indexes and addends
may have changed. Debug sections, .eh_frame, and
.gcc_except_table require special handling to rewrite
relocations referencing a dead symbol to R_*_NONE. This
also necessitates updating the relocation type.
--emit-relocs and -r copy RELLEB relocation
sections (e.g. .relleb.text) to the output. When
.rela.text is also present, linkers are required to merge
.rela.text into .relleb.text.
GNU ld allows certain unknown section types:
[SHT_LOUSER,SHT_HIUSER]and non-SHF_ALLOC[SHT_LOOS,SHT_HIOS]and non-SHF_OS_NONCONFORMING
but reports errors and stops linking for others (unless
--no-warn-mismatch is specified). When linking a
relocatable file using SHT_RELLEB, you might encounter
errors like the following:
1 | % clang -mrelleb -fuse-ld=bfd a.c b.c |
Older lld and mold do not report errors. I have filed:
- https://github.com/llvm/llvm-project/issues/84812 (milestone: 19.1)
- https://github.com/rui314/mold/issues/1215 (milestone: 2.4.2)
In addition, when there is one .eh_frame section with
CIE pieces but no relocation, _bfd_elf_parse_eh_frame will
report an error.
mips64el
mips64el has an incorrect r_info: a 32-bit little-endian
symbol index followed by a 32-bit big-endian type. If mips64el decides
to adopt RELLEB, they can utilize this opportunity to fix
r_info.
RELLEB proposal for the generic ABI
The initial revision has been proposed at https://groups.google.com/g/generic-abi/c/yb0rjw56ORw. I have also created an LLVM proposal and a binutils feature request.
In https://www.sco.com/developers/gabi/latest/ch4.sheader.html, make the following changes.
In Figure 4-9: Section Types,sh_type, append a row
SHT_RELLEB | 20
Add text:
SHT_RELLEB - The section holds compact relocation entries with explicit addends. An object file may have multiple relocation sections. See ''Relocation'' below for details.
In Figure 4-16: Special Sections, append
.rellebname | SHT_RELLEB | see below
Change the text below:
.relname, .relaname, and .rellebname
These sections hold relocation information, as described in ''Relocation''. If the file has a loadable segment that includes relocation, the sections' attributes will include the SHF_ALLOC bit; otherwise, that bit will be off. Conventionally, name is supplied by the section to which the relocations apply. Thus a relocation section for .text normally would have the name .rel.text, .rela.text, or .relleb.text.
In Figure 4-23: Relocation Entries, add:
1 | typedef struct { |
Add text above "A relocation section references two other sections":
A SHT_RELLEB section holds compact relocation entries
that decode to Elf32_Relr or Elf64_Relr
depending on the object file class (32-bit or 64-bit). Its content
begins with a ULEB128-encoded relocation count, followed by entries
encoding r_offset, r_type,
r_symidx, and r_addend. Note that the
r_info member in traditional REL/RELA formats has been
split into separate r_type and r_symidx
members, allowing uint32_t relocation types for ELFCLASS32
as well.
- Delta offset: (ULEB128-encoded) The difference in
r_offsetrelative to the previous entry, represented as a 32-bit or 64-bit unsigned integer for ELFCLASS32/ELFCLASS64, respectively. - Symbol index and type/addend presence: (SLEB128-encoded)
- If type and addend are equal to the previous entry's, the encoded value represents the symbol index; type and addend are omitted.
- Otherwise, the bitwise NOT of the encoded value (a 64-bit signed integer) is the symbol index; type and addend are present.
- Delta type: (SLEB128-encoded, if not omitted) The difference in relocation type relative to the previous entry, represented as a 32-bit signed integer.
- Delta addend: (SLEB128-encoded, if not omitted) The difference in
r_addendrelative to the previous entry, represented as a 32-bit or 64-bit signed integer for ELFCLASS32/ELFCLASS64, respectively.
The bitwise NOT of symbol index 0xffffffff is -0x100000000 (64-bit) instead of 0 (32-bit).
Example C++ encoder:
1 | // encodeULEB128(uint64_t, raw_ostream &os); |
For the first relocation entry, the previous offset, type, and addend members are treated as zero.
In Figure 5-10: Dynamic Array Tags, d_tag, add:
DT_RELLEB | 38 | d_ptr | optional |
optional
Add text below:
DT_RELLEB- This element is similar toDT_RELA, except its table uses the RELLEB format. The relocation count can be inferred from the header.