# RISC-V linker relaxation in lld

On 2022-07-07, I added a RISC-V linker relaxation framework in ld.lld and implemented R_RISCV_ALIGN/R_RISCV_CALL/R_RISCV_CALL_PLT relaxation. The changes will be included in the next llvm-project release 15.0.0. This post describes the implementation.

# Everything I know about glibc

UNDER CONSTRUCTION

glibc is an implementation of the user-space side of standard C/POSIX functions with Linux extensions.

# C standard library headers in C++

In recent ISO C++ standards, [depr.c.headers] describes how a C header name.h is transformed to the corresponding C++ cname header. There is a helpful example:

[ Example: The header <cstdlib> assuredly provides its declarations and definitions within the namespace std. It may also provide these names within the global namespace. The header <stdlib.h> assuredly provides the same declarations and definitions within the global namespace, much as in the C Standard. It may also provide these names within the namespace std. — end example ]

"may also" in the wording allows implementations to provide mix-and-match, e.g. std::exit can be used with #include <stdlib.h> and ::exit can be used with #include <cstdlib>.

libstdc++ chooses to enable global namespace declarations with the C++ cname headers. For example, #include <cstdlib> also includes the corresponding C header stdlib.h, so we get declarations in both the global namespace and the namespace std.

The preprocessed output looks like:

The compiler knows that the declarations in std are identical to the ones in the global namespace. The compiler recognizes some library functions and can optimize them. Because the names in std are introduced with using-declarations that refer to the global declarations, the compiler can also optimize some C library functions in namespace std (e.g. many std::mem* and std::str* functions).
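A minimal sketch of the mix-and-match behavior. It includes both header forms so that both spellings are guaranteed on any conforming implementation, rather than relying on the "may also" latitude. demo_twice and demo_std are hypothetical names that imitate the using-declaration pattern; they are not libstdc++ code.

```cpp
#include <cassert>
#include <cstdlib>   // guarantees std::abs
#include <stdlib.h>  // guarantees ::abs

// Hypothetical miniature of the pattern: a declaration in the global
// namespace that is re-exported into another namespace.
int demo_twice(int x) { return 2 * x; }
namespace demo_std {
using ::demo_twice;  // names the same function, it is not a copy
}

// Both qualified names refer to one entity, so the addresses compare equal.
bool same_entity() { return &demo_std::demo_twice == &::demo_twice; }

int abs_via_std(int x) { return std::abs(x); }
int abs_via_global(int x) { return ::abs(x); }

// Checks that both spellings agree.
void check_agreement() {
  assert(same_entity());
  assert(abs_via_std(-7) == abs_via_global(-7));
}
```

Since a using-declaration introduces no new entity, anything the compiler knows about the global function applies equally to the std-qualified name.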

For some C standard library headers, libstdc++ provides wrappers (libstdc++-v3/include/c_compatibility/) which take precedence over the glibc headers. libstdc++ is configured with --enable-cheaders=c_global by default. The if GLIBCXX_C_HEADERS_C_GLOBAL block in libstdc++-v3/include/Makefile.am shows that the 6 wrappers (complex.h, fenv.h, tgmath.h, math.h, stdatomic.h, stdlib.h) shadow the C library headers of the same name. For example, #include <stdlib.h> includes the wrapper stdlib.h, which includes cstdlib, therefore bringing exit into the namespace std.

# PI_STATIC_AND_HIDDEN/HIDDEN_VAR_NEEDS_DYNAMIC_RELOC in glibc rtld

Recently I have fixed two glibc rtld bugs related to early GOT relocation for retro-computing architectures: m68k and powerpc32. They are related to the obscure PI_STATIC_AND_HIDDEN macro which I am going to demystify.

In 2002, PI_STATIC_AND_HIDDEN was introduced into glibc rtld (runtime loader). This macro indicates whether accesses to the following types of variables need dynamic relocations.

• static specifier: static int a; (STB_LOCAL)
• hidden visibility attribute: __attribute__((visibility("hidden"))) int a; (STB_GLOBAL STV_HIDDEN), __attribute__((weak, visibility("hidden"))) int a; (STB_WEAK STV_HIDDEN)

PI in the macro name is an abbreviation for "position independent". This is a misnomer: a code sequence using GOT is typically position-independent as well.

In -fPIC mode, the compiler assumes that all non-local STV_DEFAULT symbols may be preemptible at run time. A GOT-generating relocation is used and the GOT is typically unavoidable at link time (on some architectures the linker can optimize out the GOT). This case is not interesting to rtld as rtld does not need to export such variables.

Excluding these cases (non-local STV_DEFAULT), all other variables are known to be non-preemptible at compile time. The compiler can generate code which is guaranteed to avoid dynamic relocations at link time.
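The variable kinds above can be written out as a small sketch, assuming GCC or Clang attribute syntax on an ELF target. All variable and function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Internal linkage: lowers to an STB_LOCAL symbol.
static int local_counter = 0;

// Hidden visibility: STB_GLOBAL STV_HIDDEN, not exported from a shared object.
__attribute__((visibility("hidden"))) int hidden_counter = 0;

// Weak and hidden: STB_WEAK STV_HIDDEN.
__attribute__((weak, visibility("hidden"))) int weak_hidden_counter = 0;

// External linkage with default visibility (STV_DEFAULT): under -fPIC the
// compiler must assume the symbol may be preempted at run time.
int default_counter = 0;

// Accessors, so that each variable is actually referenced from code.
int bump_local() { return ++local_counter; }
int bump_hidden() { return ++hidden_counter; }
int bump_weak_hidden() { return ++weak_hidden_counter; }
int bump_default() { return ++default_counter; }
```

Compiling this with -fPIC -S and comparing the four accessors is one way to see which accesses go through the GOT on a given architecture. The expectation on a target with PC-relative addressing is that only bump_default does, but that depends on the compiler and target.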

On 2022-04-26, I replaced PI_STATIC_AND_HIDDEN with the opposite macro HIDDEN_VAR_NEEDS_DYNAMIC_RELOC.

## Non-HIDDEN_VAR_NEEDS_DYNAMIC_RELOC architectures with PC-relative instructions

To avoid dynamic relocations, the most common approach is to generate PC-relative instructions, as most modern architectures (e.g. aarch64, riscv, and x86-64) provide. Using PC-relative instructions to reference variables assumes that the distance from code to data is a link-time constant. Nowadays this condition is satisfied everywhere except the rare FDPIC ABI.

Here are some assembly fragments from architectures using PC-relative instructions. The instructions may not be familiar to you, but that is fine. We can see that there is no GOT-related marker. I have added some comments indicating the relocation type and the referenced symbol. var in the C code has internal linkage, which lowers to the STB_LOCAL binding. References to such local symbols are often redirected to the section symbol (.bss): the link-time behaviors are identical.

## Non-HIDDEN_VAR_NEEDS_DYNAMIC_RELOC architectures without PC-relative instructions

Many older architectures do not have PC-relative instructions.

x86-32 does not have PC-relative instructions, but it provides a way to avoid a load from a GOT entry. It achieves this with a detour: compute the address of _GLOBAL_OFFSET_TABLE_ (GOT base symbol), then add an offset (S-_GLOBAL_OFFSET_TABLE_) to get the symbol address. _GLOBAL_OFFSET_TABLE_ is computed this way: compute the address of a location in code, then add an offset (_GLOBAL_OFFSET_TABLE_ - PC).

You probably see now how the x86-32 ABI was misdesigned: the involvement of _GLOBAL_OFFSET_TABLE_ is unnecessary. A relocation with the calculation of S-_GLOBAL_OFFSET_TABLE_ would achieve the same net effect.
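The arithmetic can be checked with a small model. This is a sketch with made-up addresses, not generated code: LinkLayout and both functions are hypothetical, and 32-bit unsigned arithmetic stands in for the address computations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical link-time layout of one module (all values are made up).
struct LinkLayout {
  std::uint32_t pc;   // P: address of the instruction that obtains its own PC
  std::uint32_t got;  // link-time address of _GLOBAL_OFFSET_TABLE_
  std::uint32_t sym;  // S: link-time address of a non-preemptible variable
};

// x86-32 style detour: PC -> GOT base -> symbol, using two link-time constants.
std::uint32_t address_via_got_anchor(const LinkLayout &l, std::uint32_t load_bias) {
  std::uint32_t runtime_pc = l.pc + load_bias;  // known only at run time
  std::uint32_t got_minus_pc = l.got - l.pc;    // _GLOBAL_OFFSET_TABLE_ - PC
  std::uint32_t sym_minus_got = l.sym - l.got;  // S - _GLOBAL_OFFSET_TABLE_
  std::uint32_t got_base = runtime_pc + got_minus_pc;
  return got_base + sym_minus_got;
}

// Direct form: a single link-time constant S - P.
std::uint32_t address_via_pc_offset(const LinkLayout &l, std::uint32_t load_bias) {
  std::uint32_t runtime_pc = l.pc + load_bias;
  std::uint32_t sym_minus_pc = l.sym - l.pc;
  return runtime_pc + sym_minus_pc;
}
```

The GOT base cancels out of the sum, so both functions return S plus the load bias for any layout. No GOT entry is read in either case.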

The relocations with GOT in their names just use the GOT as an anchor. They don't indicate a load from a GOT entry.

powerpc64 does not have PC-relative instructions before POWER10. Earlier microarchitectures use TOC-relative relocations to compute the symbol address.

A pending patch [PATCH v3] powerpc64: Enable static-pie will define PI_STATIC_AND_HIDDEN.

## HIDDEN_VAR_NEEDS_DYNAMIC_RELOC architectures

A few older architectures tend to use a load from a GOT entry. The GOT entry needs a relative relocation (instead of R_*_GLOB_DAT: the symbol is non-preemptible, so no symbol search is needed). See All about Global Offset Table. In glibc, these architectures define HIDDEN_VAR_NEEDS_DYNAMIC_RELOC.

Some architectures even assume the distance from code to data may not be a link-time constant (see All about Procedure Linkage Table). They do not provide a relocation with a calculation of S-_GLOBAL_OFFSET_TABLE_ or S-P.

The first task of rtld is to relocate itself and bind all symbols to itself. Afterward, non-preemptible functions and data can be freely accessed.

On architectures where a GOT entry is used to access a non-preemptible variable, rtld needs to be careful not to reference such variables before relative relocations are applied. In rtld.c, _dl_start has the following code:

_rtld_local_ro is a hidden global variable. The compiler may reorder the computation of its address before ELF_DYNAMIC_RELOCATE. On an architecture using a GOT entry to load the address, the reordering will make the subsequent memory store (_rtld_local_ro.dl_find_object) crash, since the address loaded from the GOT is incorrect: it's zero or the link-time address instead of the run-time address.
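The hazard can be illustrated with a toy model. This is a sketch under simplifying assumptions, not rtld code: Module, RelativeReloc, and both functions are hypothetical, and the GOT is modeled as a plain array of 64-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models one relative relocation (R_*_RELATIVE) that targets a GOT entry.
struct RelativeReloc {
  std::size_t got_slot;  // index of the GOT entry to patch
  std::uint64_t addend;  // link-time address of the target variable
};

struct Module {
  std::uint64_t base = 0;             // load bias chosen at run time
  std::vector<std::uint64_t> got;     // entries initially hold link-time values
  std::vector<RelativeReloc> relocs;  // relocations to apply during bootstrap
};

// Models the relative-relocation part of self-relocation: entry = base + addend.
void apply_relative_relocs(Module &m) {
  for (const RelativeReloc &r : m.relocs)
    m.got[r.got_slot] = m.base + r.addend;
}

// Models a GOT-indirect address computation for a hidden variable.
std::uint64_t address_from_got(const Module &m, std::size_t slot) {
  return m.got[slot];
}
```

Calling address_from_got before apply_relative_relocs yields the link-time value, which is not a valid run-time address once the module is loaded at a nonzero base. A store through it corresponds to the crash described above.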

## powerpc32

I recently cleaned up the bootstrap code a bit with elf: Move elf_dynamic_do_Rel RTLD_BOOTSTRAP branches outside. Afterwards, GCC powerpc32 appears to reliably reorder _rtld_local_ro, causing ld.so to crash right away.

I was pretty sure there was a relocation bug, but it was not immediately clear which piece of code was at fault.

# Archives and --start-lib

## .a archives

Unix-like systems represent static libraries as .a archives. A .a archive consists of a header and a collection of files with metadata. Its usage is tightly coupled with the linker. An archive almost always contains only relocatable object files and the linker has built-in support for reading it.
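To make the layout concrete, here is a minimal sketch that walks the members of an in-memory archive. It assumes the common ar format: an 8-byte magic "!<arch>\n", then for each member a 60-byte header (16-byte name, 12-byte date, 6-byte uid, 6-byte gid, 8-byte mode, 10-byte decimal size, 2-byte terminator) followed by the data padded to an even length. make_member and list_members are hypothetical helpers. Special members such as the symbol table and the long-name table are not interpreted, and a malformed size field is not handled gracefully.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ArMember {
  std::string name;  // raw name field with trailing spaces removed
  std::size_t size;  // payload size in bytes
};

// Hypothetical helper: builds one member (header, payload, padding).
std::string make_member(const std::string &name, const std::string &data) {
  auto field = [](std::string s, std::size_t width) {
    s.resize(width, ' ');  // header fields are space-padded
    return s;
  };
  std::string out = field(name, 16) + field("0", 12) + field("0", 6) +
                    field("0", 6) + field("644", 8) +
                    field(std::to_string(data.size()), 10) + "`\n";
  out += data;
  if (data.size() % 2) out += '\n';  // members are 2-byte aligned
  return out;
}

// Lists members in file order; returns an empty list if the magic is missing.
std::vector<ArMember> list_members(const std::string &ar) {
  std::vector<ArMember> members;
  const std::string magic = "!<arch>\n";
  if (ar.compare(0, magic.size(), magic) != 0) return members;
  std::size_t pos = magic.size();
  while (pos + 60 <= ar.size()) {
    if (ar.compare(pos + 58, 2, "`\n") != 0) break;  // malformed header
    std::string name = ar.substr(pos, 16);
    name.erase(name.find_last_not_of(' ') + 1);
    std::size_t size = std::stoul(ar.substr(pos + 48, 10));
    members.push_back({name, size});
    pos += 60 + size + (size % 2);  // skip payload and alignment padding
  }
  return members;
}
```

The test data here uses GNU-style names ending in "/", which the sketch leaves in place.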

One may add other types of files to .a but that is almost assuredly a bad thing.

The original linker designers noticed that for many programs not every member was needed, so they tried to allow the linker to skip unused members. Therefore, they invented the interesting but confusing archive member extraction rule. See Symbol processing#Archive processing for details.
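A simplified model of that rule, as a sketch: a member is extracted only if it defines a symbol that is currently undefined, and extraction may add new undefined symbols that pull in further members. Member and extract_needed are hypothetical names. Weak and common symbols, command-line ordering across several archives, and other details of real linkers are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Member {
  std::string name;
  std::set<std::string> defined;    // symbols this member defines
  std::set<std::string> undefined;  // symbols this member references
};

// Repeatedly scans one archive and extracts members that resolve a currently
// undefined symbol. Returns the names of extracted members in extraction order.
std::vector<std::string> extract_needed(const std::vector<Member> &archive,
                                        std::set<std::string> &undefined,
                                        std::set<std::string> &defined) {
  std::vector<std::string> extracted;
  std::vector<bool> taken(archive.size(), false);
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < archive.size(); ++i) {
      if (taken[i]) continue;
      bool needed = false;
      for (const std::string &s : archive[i].defined)
        if (undefined.count(s)) { needed = true; break; }
      if (!needed) continue;  // unused members are skipped
      taken[i] = true;
      changed = true;
      extracted.push_back(archive[i].name);
      for (const std::string &s : archive[i].defined) {
        defined.insert(s);
        undefined.erase(s);
      }
      for (const std::string &s : archive[i].undefined)
        if (!defined.count(s)) undefined.insert(s);
    }
  }
  return extracted;
}
```

In this model a member that defines nothing the link currently needs is never loaded, which is the space saving the designers were after.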