Evolution of the ELF object file format

The ELF object file format is adopted by many UNIX-like operating systems. While I've previously delved into the control structures of ELF and its predecessors, tracing the historical evolution of ELF and its relationship with the System V ABI can be interesting in itself.

The format consists of the generic specification, processor-specific specifications, and OS-specific specifications. Three key documents often surface when searching for the generic specification: the TIS specification, the Solaris guide, and the System V ABI hosted on www.sco.com.

The TIS specification breaks ELF into the generic specification, a processor-specific specification (x86), and an OS-specific specification (System V Release 4). However, it has not been updated since 1995. The Solaris guide, though well-written, includes Solaris-specific extensions not applicable to Linux and *BSD. This leaves us primarily with the System V ABI hosted on www.sco.com, which dedicates Chapters 4 and 5 to the ELF format.

Let's trace the ELF history to understand its relationship with the System V ABI.


Clang's -O0 output: branch displacement and size increase

tl;dr Clang 19 will remove the -mrelax-all default at -O0, significantly decreasing the text section size for x86.

Span-dependent instructions

In assembly languages, some instructions with an immediate operand can be encoded in two (or more) forms with different sizes. On x86-64, a direct JMP/JCC can be encoded either in 2 bytes with an 8-bit relative offset, or in 5 bytes (JMP) or 6 bytes (JCC) with a 32-bit relative offset. A short jump is preferred because it takes less space. However, when the target of the jump is too far away (out of range for an 8-bit relative offset), a near jump must be used.

ja foo    # jump near if above, 0f 87 <rel32>: foo is 128 bytes past the end of a short form
ja foo    # jump short if above, 77 <rel8>: foo is 126 bytes past the end of this jump
.nops 126
foo: ret

A 1978 paper by Thomas G. Szymanski ("Assembling Code for Machines with Span-Dependent Instructions") used the term "span-dependent instructions" to refer to such instructions with short and long forms. Assemblers grapple with the challenge of choosing the optimal size for these instructions, often referred to as the "branch displacement problem" since branches are the most common type. A good resource for understanding Szymanski's work is Assembling Span-Dependent Instructions.
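To make the problem concrete, here is a minimal sketch of the simplest approach: start with every jump in its short form and widen the ones that do not fit, repeating until nothing changes. This is not Szymanski's algorithm and not how any particular assembler is implemented. The `Inst` model, the `relax` function, and the restriction to conditional jumps are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one instruction: either a fixed-size blob or a
// conditional jump (Jcc) to the start of the instruction at index `target`.
struct Inst {
  bool is_jcc;
  std::size_t target;  // index of the target instruction (Jcc only)
  std::uint32_t size;  // current encoded size in bytes
};

constexpr std::uint32_t kShortJcc = 2;  // 7x <rel8>
constexpr std::uint32_t kNearJcc = 6;   // 0f 8x <rel32>

// Start with every Jcc short, then widen those whose displacement does not
// fit in a signed 8-bit field. Sizes only ever grow, so the loop terminates.
// Returns the total size of the instruction stream in bytes.
std::uint32_t relax(std::vector<Inst>& insts) {
  for (Inst& i : insts)
    if (i.is_jcc) i.size = kShortJcc;
  std::vector<std::int64_t> offset(insts.size() + 1);
  for (bool changed = true; changed;) {
    changed = false;
    // Recompute the start offset of every instruction.
    offset[0] = 0;
    for (std::size_t k = 0; k < insts.size(); ++k)
      offset[k + 1] = offset[k] + insts[k].size;
    for (std::size_t k = 0; k < insts.size(); ++k) {
      Inst& i = insts[k];
      if (!i.is_jcc || i.size == kNearJcc) continue;
      // x86 displacements are relative to the end of the jump.
      std::int64_t disp = offset[i.target] - offset[k + 1];
      if (disp < -128 || disp > 127) {
        i.size = kNearJcc;
        changed = true;
      }
    }
  }
  return static_cast<std::uint32_t>(offset[insts.size()]);
}
```

For example, given two jumps to the same label followed by 126 bytes of padding, the sketch widens the first jump (the label would be 128 bytes past the end of a short encoding) and keeps the second one short.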


When QOI meets XZ

QOI, the Quite OK Image format, has been gaining in popularity. Chris Wellons offers a great analysis.

QOI's key advantage is its simplicity. Being a byte-oriented format without entropy encoding, it can be further compressed with generic data compression programs like LZ4, XZ, and zstd. PNG, on the other hand, uses DEFLATE compression internally and is typically resistant to further compression. By applying a stronger compression algorithm to QOI output, you can often achieve a smaller file size compared to PNG.


A compact section header table for ELF

ELF's design emphasizes natural size and alignment guidelines for its control structures. However, this approach has substantial size drawbacks.

In a release build of llvm-project (-O3 -ffunction-sections -fdata-sections), the section header tables occupy 13.4% of the .o file size.

I propose an alternative section header table format that is signaled by e_shentsize == 0 in the ELF header. e_shentsize == sizeof(Elf64_Shdr) (or the 32-bit counterpart) selects the traditional section header table format.
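A reader could pick the format with a check along these lines. This is only a sketch: `Shdr32` and `Shdr64` are local stand-ins that mirror the field layout of Elf32_Shdr and Elf64_Shdr, and `ShdrFormat` and `classify` are hypothetical names. Only the two e_shentsize values come from the proposal above.

```cpp
#include <cstdint>

// Local stand-ins mirroring the traditional section header layouts.
struct Shdr32 {
  std::uint32_t sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
      sh_link, sh_info, sh_addralign, sh_entsize;
};
struct Shdr64 {
  std::uint32_t sh_name, sh_type;
  std::uint64_t sh_flags, sh_addr, sh_offset, sh_size;
  std::uint32_t sh_link, sh_info;
  std::uint64_t sh_addralign, sh_entsize;
};
static_assert(sizeof(Shdr32) == 40, "ten 4-byte members");
static_assert(sizeof(Shdr64) == 64, "naturally aligned, no padding");

// Hypothetical classification of the section header table format.
enum class ShdrFormat { Traditional, Compact, Unknown };

ShdrFormat classify(std::uint16_t e_shentsize, bool is_64bit) {
  // e_shentsize == 0 signals the proposed compact format.
  if (e_shentsize == 0) return ShdrFormat::Compact;
  const auto traditional = static_cast<std::uint16_t>(
      is_64bit ? sizeof(Shdr64) : sizeof(Shdr32));
  return e_shentsize == traditional ? ShdrFormat::Traditional
                                    : ShdrFormat::Unknown;
}
```

Every 64-bit section header costs 64 bytes in the traditional format regardless of how small its values are, which is where the size overhead with -ffunction-sections comes from.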


C++ exit-time destructors

In ISO C++ standards, [basic.start.term] specifies that:

Constructed objects ([dcl.init]) with static storage duration are destroyed and functions registered with std::atexit are called as part of a call to std::exit ([support.start.term]). The call to std::exit is sequenced before the destructions and the registered functions. [Note 1: Returning from main invokes std::exit ([basic.start.main]). — end note]

For example, consider the following code:

struct A { ~A(); } a;

The destructor for object a will be registered for execution at program termination.


A compact relocation format for ELF

This article introduces CREL (previously known as RELLEB), a new relocation format offering incredible size reduction (LLVM implementation in my fork).

ELF's design emphasizes natural size and alignment guidelines for its control structures. This principle, outlined in "ELF: An Object File to Mitigate Mischievous Misoneism" (Proceedings of the Summer 1990 USENIX Conference), promotes ease of random access for structures like program headers, section headers, and symbols.

All data structures that the object file format defines follow the "natural" size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of four, etc. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary within the file. Other classes would have appropriately scaled definitions. To illustrate, the 64-bit class would define Elf64_Addr as an 8-byte object, aligned on an 8-byte boundary. Following the strictest alignment for each object allows the format to work on any machine in a class. That is, all ELF structures on all 32-bit machines have congruent templates. For portability, ELF uses neither bit-fields nor floating-point values, because their representations vary, even among processors with the same byte order. Of course the programs in an ELF file may use these types, but the format itself does not.
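As a small illustration of what these guidelines buy, the sketch below uses a local stand-in for the 64-bit relocation-with-addend record. `Rela64` and `entry_offset` are hypothetical names chosen for this example. The member layout follows Elf64_Rela.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for Elf64_Rela: three naturally aligned 8-byte members.
struct Rela64 {
  std::uint64_t r_offset;
  std::uint64_t r_info;
  std::int64_t r_addend;
};
static_assert(sizeof(Rela64) == 24, "three 8-byte members, no padding");

// With fixed-size entries, entry i sits at a computable file offset, so a
// reader can jump to it without decoding entries 0..i-1.
constexpr std::uint64_t entry_offset(std::uint64_t table_offset,
                                     std::uint64_t i) {
  return table_offset + i * sizeof(Rela64);
}
```

The price of this random access is that every relocation takes 24 bytes, even when its offset, symbol index, type, and addend are all small numbers.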


MMU-less systems and FDPIC

This article describes ABI and toolchain considerations about systems without a Memory Management Unit (MMU). We will focus on FDPIC and the in-development FDPIC ABI for RISC-V, with updates as I delve deeper into the topic.

Embedded systems often lack MMUs, relying on real-time operating systems (RTOS) like VxWorks or special Linux configurations (CONFIG_MMU=n). In these systems, the offset between the text and data segments is often not known at compile time. Therefore, a dedicated register is typically set to somewhere in the data segment and writable data is accessed relative to this register.

Why is the offset not known at compile time? There are primarily two reasons.

First, with eXecute in Place (XIP), code resides in ROM while the data segment is copied to RAM, so the two segments are placed independently of each other.

Second, all processes share the same address space without an MMU. However, it is still desired for these processes to share text segments. Therefore, a mechanism is needed for code to find its corresponding data.
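The idea of shared code finding per-process data through a base register can be modeled in ordinary C++. This is a conceptual sketch only, not the FDPIC ABI: `data_base` stands in for the dedicated register, and `kCounterOffset`, `bump_counter`, and `make_data_segment` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Offset of a hypothetical writable variable `counter` within the data
// segment. The offset is fixed; the segment's address is not.
constexpr std::size_t kCounterOffset = 16;

// Stand-in for shared text: the code never embeds an absolute data address.
// `data_base` plays the role of the dedicated register.
std::uint32_t bump_counter(unsigned char* data_base) {
  std::uint32_t value;
  std::memcpy(&value, data_base + kCounterOffset, sizeof value);
  ++value;
  std::memcpy(data_base + kCounterOffset, &value, sizeof value);
  return value;
}

// Each "process" gets its own zero-initialized copy of the data segment.
using DataSegment = std::vector<unsigned char>;
DataSegment make_data_segment() { return DataSegment(64, 0); }
```

Calling `bump_counter` with two different segments updates two independent counters even though the same function body runs in both cases, which is the property that lets processes share a text segment.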
