Fun with ELF dynamic loaders
MaskRay
How does the kernel load an executable?
% llvm-readelf -l /tmp/RelA/bin/clang
The kernel maps the PT_LOAD segments of the executable and, if
PT_INTERP exists, the PT_LOAD segments of the interpreter.
It doesn't care about relocations,
TLS, shared object dependencies, RELRO, etc.
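To make "maps PT_LOAD" concrete, here is a minimal sketch of the address arithmetic, not the kernel's actual code. `LoadSegment`, `Mapping` and `load_segment_mapping` are hypothetical names; the field names follow the ELF program header. The only assumption is that mappings are page-granular, so the segment range is rounded outward to page boundaries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of one PT_LOAD program header. */
typedef struct {
  uint64_t p_offset; /* file offset of the segment */
  uint64_t p_vaddr;  /* virtual address of the segment */
  uint64_t p_filesz; /* bytes backed by the file */
  uint64_t p_memsz;  /* bytes in memory; the tail beyond p_filesz is zero-filled */
} LoadSegment;

/* Hypothetical result: the page-aligned range a loader would map. */
typedef struct {
  uint64_t map_start;   /* page-aligned start address */
  uint64_t map_end;     /* page-aligned end address (exclusive) */
  uint64_t file_offset; /* page-aligned file offset for the mapping */
} Mapping;

/* Round the segment outward to page boundaries.
   page_size must be a power of two. */
Mapping load_segment_mapping(const LoadSegment *seg, uint64_t load_bias,
                             uint64_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  uint64_t mask = page_size - 1;
  uint64_t start = load_bias + seg->p_vaddr;
  Mapping m;
  m.map_start = start & ~mask;
  m.map_end = (start + seg->p_memsz + mask) & ~mask;
  m.file_offset = seg->p_offset & ~mask;
  return m;
}
```

`load_bias` is 0 for a position-dependent executable and is chosen at load time for a position-independent one.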
ELF dynamic loader
A position-dependent statically linked executable does not need relocations or dependencies.
A position-independent statically linked executable needs relocations.
All other types of executables need a dynamic loader (aka dynamic linker or rtld, the run-time link editor). People usually call it ld.so because its filename is typically a variant of the literal "ld.so".
- load shared object dependencies (DT_NEEDED)
- resolve relocations (DT_RELA, DT_PLTREL)
- call initialization/termination functions (DT_INIT_ARRAY, DT_FINI_ARRAY)
- initialize TLS (PT_TLS)
- dlopen, dlsym
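The first task can be sketched as a walk over the dynamic section. This is an illustration, not code from any ld.so: `DynEntry` mirrors the layout of `Elf64_Dyn`, and `collect_needed` is a hypothetical helper. It assumes the string table (`DT_STRTAB`) has already been located.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TAG_NULL = 0, TAG_NEEDED = 1 }; /* values of DT_NULL and DT_NEEDED */

/* Hypothetical mirror of Elf64_Dyn: a tag plus a value. */
typedef struct {
  int64_t d_tag;
  uint64_t d_val;
} DynEntry;

/* Store up to cap dependency names in out and return the total number of
   DT_NEEDED entries. The array is terminated by a DT_NULL entry. */
size_t collect_needed(const DynEntry *dyn, const char *strtab,
                      const char **out, size_t cap) {
  size_t n = 0;
  assert(dyn != NULL && strtab != NULL);
  for (; dyn->d_tag != TAG_NULL; dyn++) {
    if (dyn->d_tag != TAG_NEEDED)
      continue;
    if (n < cap)
      out[n] = strtab + dyn->d_val; /* d_val is an offset into the string table */
    n++;
  }
  return n;
}
```

A real loader then searches for each name, maps the object, and repeats the walk on the dependency's own dynamic section.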
A libc implementation typically provides ld.so along with libc/libm/libpthread.
ELF dynamic loaders
- System V release 4 (1988), Solaris 2.0 (1992)
- NetBSD (1993), FreeBSD (1998)
- glibc (1995)
- Android bionic (200x)
It's a pity that glibc did not learn from NetBSD's implementation.
It is (one of?) the most complex parts of glibc.
FreeBSD rtld-elf
https://maskray.me/blog/2021-08-22-freebsd-src-browsing-on-linux-and-my-rtld-contribution
- STB_WEAK in symbol lookup
- p_memsz of PT_GNU_RELRO
- p_vaddr % p_align != 0 for PT_TLS
Build glibc 2.35 with LLD 13
https://maskray.me/blog/2021-09-05-build-glibc-with-lld
Per aspera ad astra
The next release, glibc 2.35, should be buildable with LLD 13.0.0, with no test regression on aarch64/i386/x86-64 (GCC 10.3.0, 11.2.0).
Remaining work:
- Add a build configuration to scripts/build-many-glibcs.py
- The GCC created by "scripts/build-many-glibcs.py compilers x86_64-linux-gnu" has many failures. Needs investigation.
Build glibc with Clang
https://maskray.me/blog/2021-10-10-when-can-glibc-be-built-with-clang
- GCC nested functions
- asm label after first use
- assembly (inline asm parsing, integrated assembler)
- Clang warnings are more picky
Pushed 5 commits to glibc trunk so far.
Improve sanitizer reliability related to TLS
[PATCH] elf: Add __libc_get_static_tls_bounds [BZ #16291]
(pending)
- asan/hwasan/msan/tsan need to unpoison static TLS blocks to prevent false positives due to reusing the TLS blocks with a previous thread.
- lsan needs the TCB for pointers into pthread_setspecific regions.
Relocations
typedef struct {
Dynamic relocations ordered by decreasing count:
- R_*_RELATIVE: relative relocations
- R_*_JUMP_SLOT: PLT relocations
- symbolic (e.g. R_X86_64_64)
- R_*_GLOB_DAT: GOT relocations. Identical to a symbolic relocation.
- R_*_COPY (unfortunate)
- TLS descriptor/global-dynamic/local-dynamic/initial-exec
Relative relocations
Many psABI documents use RELA. 24 bytes for one relocation in ELFCLASS64.
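The 24 bytes can be checked with a small sketch. `Rela64` is a hypothetical mirror of `Elf64_Rela` (three 8-byte fields), and the helpers mirror the `ELF64_R_INFO`/`ELF64_R_SYM`/`ELF64_R_TYPE` macros: the symbol index lives in the high 32 bits of `r_info`, the relocation type in the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of Elf64_Rela. */
typedef struct {
  uint64_t r_offset; /* location to be relocated */
  uint64_t r_info;   /* symbol index (high 32 bits) and type (low 32 bits) */
  int64_t r_addend;  /* explicit addend */
} Rela64;

_Static_assert(sizeof(Rela64) == 24, "one RELA entry is 24 bytes in ELFCLASS64");

/* Pack a symbol index and a relocation type into r_info. */
uint64_t rela64_info(uint32_t sym, uint32_t type) {
  return ((uint64_t)sym << 32) | type;
}

/* Extract the symbol index from r_info. */
uint32_t rela64_sym(uint64_t info) { return (uint32_t)(info >> 32); }

/* Extract the relocation type from r_info. */
uint32_t rela64_type(uint64_t info) { return (uint32_t)(info & 0xffffffffu); }
```

For a relative relocation the symbol index is 0, so 8 of the 24 bytes carry nothing but the type.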
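Relocations have locality: R_*_RELATIVE relocations are almost
always consecutive. For an R_*_RELATIVE, the symbol index is 0, and the addend
can be stored in the relocated location itself (as REL does), leaving r_addend = 0.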
{r_offset = 0x2000, r_info = (0 << 32) | R_X86_64_RELATIVE, r_addend = 0},
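What a loader does with such entries is tiny: each relative relocation stores load bias + addend at the relocated location. A minimal sketch under stated assumptions: `RelaEntry` and `apply_relative` are hypothetical names, and the "image" is a plain byte buffer standing in for the mapped object, so the code can be run without touching real mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { RELOC_RELATIVE = 8 }; /* value of R_X86_64_RELATIVE */

/* Hypothetical mirror of Elf64_Rela. */
typedef struct {
  uint64_t r_offset;
  uint64_t r_info;
  int64_t r_addend;
} RelaEntry;

/* Apply n RELA-style relative relocations to image.
   Each one stores load_bias + r_addend at image + r_offset. */
void apply_relative(uint8_t *image, uint64_t load_bias, const RelaEntry *rel,
                    size_t n) {
  for (size_t i = 0; i < n; i++) {
    /* Only relative relocations (symbol index 0) are handled here. */
    assert(rel[i].r_info == RELOC_RELATIVE);
    uint64_t value = load_bias + (uint64_t)rel[i].r_addend;
    memcpy(image + rel[i].r_offset, &value, sizeof value);
  }
}
```

The loop body uses r_offset and r_addend only; r_info is the same for every entry, which is what the compact encodings exploit.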
How about a compact representation?
Brainstorm: remove relative relocations
Relative relocations come from taking the address of a non-preemptible symbol (where GOT optimization is not triggered):
- virtual table (clang -fexperimental-relative-c++-abi-vtables)
- jump tables
- string literal array (const char *a[] = {"hello", "world"};)
- function pointer array
- ...
There is a (not-too-large) upper bound on the number of removable relative relocations. Some require a new ABI, and keeping compatibility is difficult. References to preemptible symbols are fundamentally not optimizable.
Idea A: omit r_info
This needs a new relocation section.
section:
Delta encoding with sorting? Try avoiding unaligned/packed structures for ELF. Think about medium/large code models.
Idea B: run-length encoding
Firefox's ELF hack. Improving libxul startup I/O by hacking the ELF format
A base offset followed by a count of subsequent consecutive relocations.
There are many lossless compression techniques. We need one which is easy and efficient.
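A sketch of the run-length idea, assuming sorted offsets and 8-byte words. This is an illustration only, not the layout Firefox's ELF hack actually uses; `Run` and `rle_encode` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run: a base offset plus the number of relocations that
   immediately follow it at 8-byte steps. */
typedef struct {
  uint64_t base;
  uint64_t count;
} Run;

/* Encode sorted relocation offsets as runs. Stores up to cap runs in out
   and returns the total number of runs. */
size_t rle_encode(const uint64_t *offsets, size_t n, Run *out, size_t cap) {
  size_t runs = 0;
  assert(n == 0 || offsets != NULL);
  for (size_t i = 0; i < n;) {
    size_t j = i + 1;
    /* Extend the run while the next offset is exactly one word further. */
    while (j < n && offsets[j] == offsets[j - 1] + 8)
      j++;
    if (runs < cap) {
      out[runs].base = offsets[i];
      out[runs].count = j - i - 1;
    }
    runs++;
    i = j;
  }
  return runs;
}
```

Long consecutive runs compress well. A single gap starts a new (base, count) pair, which is where a bitmap does better.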
Idea C: RELR
A base offset followed by 63-bit bitmaps encoding the subsequent relocations.
/* Process RELR relative relocations. */
This is simple and efficient (the RELR section is usually 3% of the size of the
equivalent R_*_RELATIVE entries in .rela.dyn, or smaller)
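The encoding can be sketched as a decoder for ELFCLASS64. An even entry is an address and is itself a relocation; an odd entry is a bitmap whose bit 0 is the marker and whose remaining 63 bits cover the next 63 words. `relr_decode` is a hypothetical helper that expands the entries into offsets rather than patching memory, and it assumes the first entry is an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand n RELR entries into relocation offsets. Stores up to cap offsets
   in out and returns the total number of relocations encoded. */
size_t relr_decode(const uint64_t *relr, size_t n, uint64_t *out, size_t cap) {
  size_t count = 0;
  uint64_t where = 0; /* first word covered by the next bitmap entry */
  assert(n == 0 || (relr[0] & 1) == 0); /* the first entry must be an address */
  for (size_t i = 0; i < n; i++) {
    uint64_t entry = relr[i];
    if ((entry & 1) == 0) {
      /* Address entry: one relocation here; bitmaps continue after it. */
      if (count < cap)
        out[count] = entry;
      count++;
      where = entry + 8;
    } else {
      /* Bitmap entry: bit k+1 set means a relocation at where + 8*k. */
      uint64_t bits = entry >> 1;
      for (unsigned k = 0; bits != 0; k++, bits >>= 1) {
        if (bits & 1) {
          if (count < cap)
            out[count] = where + 8 * (uint64_t)k;
          count++;
        }
      }
      where += 8 * 63;
    }
  }
  return count;
}
```

For example, the two entries {0x2000, 0x7} encode relocations at 0x2000, 0x2008 and 0x2010: 16 bytes instead of 72 bytes of RELA.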
% ~/projects/bloaty/Release/bloaty clang.pie.relr -- clang.pie
FreeBSD rtld-elf and glibc ld.so
FreeBSD adopted it quickly (thanks to kib). (I think FreeBSD
libexec/rtld-elf needs more tests.) Future work: enable it
by default.
* [v2] elf: Support DT_RELR relative relocation format [BZ #27924]
Lack of GNU ld support (PR27923) is an issue. (If you are not a binutils maintainer, think again before taking a stab. Actually MaskRay has sent a message to Nick Clifton for this task.)