Clang's -O0 output: branch displacement and size increase

tl;dr Clang 19 will remove the -mrelax-all default at -O0, significantly decreasing the text section size for x86.

Span-dependent instructions

In assembly languages, some instructions with an immediate operand can be encoded in two (or more) forms with different sizes. On x86-64, a direct JMP/JCC can be encoded either in 2 bytes with a 8-bit relative offset or 6 bytes with a 32-bit relative offset. A short jump is preferred because it takes less space. However, when the target of the jump is too far away (out of range for a 8-bit relative offset), a near jump must be used.

1
2
3
4
ja foo    # jump short if above, 77 <rel8>
ja foo # jump near if above, 0f 87 <rel32>
.nops 126
foo: ret

A 1978 paper by Thomas G. Szymanski ("Assembling Code for Machines with Span-Dependent Instructions") used the term "span-dependent instructions" to refer to such instructions with short and long forms. Assemblers grapple with the challenge of choosing the optimal size for these instructions, often referred to as the "branch displacement problem" since branches are the most common type. A good resource for understanding Szymanski's work is Assembling Span-Dependent Instructions.

Read More

When QOI meets XZ

QOI, the Quite OK Image format, has been gaining in popularity. Chris Wellons offers a great analysis.

QOI's key advantages is its simplicity. Being a byte-oriented format without entropy encoding, it can be further compressed with generic data compression programs like LZ4, XZ, and zstd. PNG, on the other hand, uses DEFLATE compression internally and is typically resistant to further compression. By applying a stronger compression algorithm on QOI output, you can often achieve a smaller file size compared to PNG.

Read More

A compact section header table for ELF

ELF's design emphasizes natural size and alignment guidelines for its control structures. However, this approach has substantial size drawbacks.

In a release build of llvm-project (-O3 -ffunction-sections -fdata-sections, the section header tables occupy 13.4% of the .o file size.

I propose an alternative section header table format that is signaled by e_shentsize == 0 in the ELF header. e_shentsize == sizeof(Elf64_Shdr) (or the 32-bit counterpart) selects the traditional section header table format.

Read More

C++ exit-time destructors

In ISO C++ standards, [basic.start.term] specifies that:

Constructed objects ([dcl.init]) with static storage duration are destroyed and functions registered with std::atexit are called as part of a call to std::exit ([support.start.term]). The call to std::exit is sequenced before the destructions and the registered functions. [Note 1: Returning from main invokes std::exit ([basic.start.main]). — end note]

For example, consider the following code:

1
struct A { ~A(); } a;

The destructor for object a will be registered for execution at program termination.

Read More

A compact relocation format for ELF

This article introduces CREL (previously known as RELLEB), a new relocation format offering incredible size reduction (LLVM implementation in my fork).

ELF's design emphasizes natural size and alignment guidelines for its control structures. This principle, outlined in Proceedings of the Summer 1990 USENIX Conference, ELF: An Object File to Mitigate Mischievous Misoneism, promotes ease of random access for structures like program headers, section headers, and symbols.

All data structures that the object file format defines follow the "natural" size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of four, etc. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary within the file. Other classes would have appropriately scaled definitions. To illustrate, the 64-bit class would define Elf64 Addr as an 8-byte object, aligned on an 8-byte boundary. Following the strictest alignment for each object allows the format to work on any machine in a class. That is, all ELF structures on all 32-bit machines have congruent templates. For portability, ELF uses neither bit-fields nor floating-point values, because their representations vary, even among pro- cessors with the same byte order. Of course the programs in an ELF file may use these types, but the format itself does not.

Read More

MMU-less systems and FDPIC

This article describes ABI and toolchain considerations about systems without a Memory Management Unit (MMU). We will focus on FDPIC and the in-development FDPIC ABI for RISC-V, with updates as I delve deeper into the topic.

Embedded systems often lack MMUs, relying on real-time operating systems (RTOS) like VxWorks or special Linux configurations (CONFIG_MMU=n). In these systems, the offset between the text and data segments is often not knwon at compile time. Therefore, a dedicated register is typically set to somewhere in the data segment and writable data is accessed relative to this register.

Why is the offset not knwon at compile time? There are primarily two reasons.

First, eXecute in Place (XIP), where code resides in ROM while the data segment is copied to RAM. Therefore, the offset between the text and data segments is often not knwon at compile time.

Second, all processes share the same address space without MMU. However, it is still desired for these processes to share text segments. Therefore needs a mechanism for code to find its corresponding data.

Read More

Toolchain notes on z/Architecture

This article describes some notes about z/Architecture with a focus on the ELF ABI and ELF linkers. An lld/ELF patch sparked my motivation to study the architecture and write this post.

z/Architecture is a big-endian mainframe computer architecture supporting 24-bit, 31-bit, and 64-bit addressing modes. It is the latest generation in a lineage stretching back to the 1964 with IBM System/360 (32-bit general-purpose registers and 24-bit addressing). This lineage includes System/370 (1970), System/370 Extended Architecture (1983), Enterprise Systems Architecture/370 (1988), and Enterprise Systems Architecture/390 (1990). For a deeper dive into the design choices behind z/Architecture's extension from ESA/390, you can refer to "Development and attributes of z/Architecture." IBM System/360 at Computer History Museum

Read More