---
# try also 'default' to start simple
theme: default
# random image from a curated Unsplash collection by Anthony
# like them? see https://unsplash.com/collections/94734566/slidev
# background: https://source.unsplash.com/collection/94734566/1920x1080
# apply any windi css classes to the current slide
class: 'text-center'
# https://sli.dev/custom/highlighters.html
highlighter: shiki
# show line numbers in code blocks
lineNumbers: false
# persist drawings in exports and build
drawings:
  persist: false
# use UnoCSS (experimental)
css: unocss

routerMode: 'hash'
---

# Highlights of my work in 2022


<style>
h1 {
  background-color: #2B90B6;
  background-image: linear-gradient(45deg, #4EC5D4 10%, #146b8c 20%);
  background-size: 100%;
  -webkit-background-clip: text;
  -moz-background-clip: text;
  -webkit-text-fill-color: transparent;
  -moz-text-fill-color: transparent;
}
</style>

---
layout: 'intro'
---

<h1 text="!5xl">MaskRay (宋方睿)</h1>

<div class="leading-8 opacity-80">
<a href="https://maskray.me/portfolio/llvm/">LLVM contributor since 2017</a>, ld.lld and Clang Driver maintainer<br>
binutils, glibc, GCC<br>
</div>

<div class="my-10 grid grid-cols-[40px_1fr] w-min gap-y-4">
  <ri-github-line class="opacity-50"/>
  <div><a href="https://github.com/MaskRay" target="_blank">MaskRay</a></div>
  <ri-user-3-line class="opacity-50"/>
  <div><a href="https://maskray.me" target="_blank">maskray.me</a></div>
</div>

<!-- <img src="/img/me.jpg" class="rounded-full size-200px object-cover-top abs-tr mt-16 mr-12"/> -->

---

* RELR relative relocation format (glibc, musl, DynamoRIO)
* zstd compressed debug sections (binutils, gdb, clang, lld/ELF, lldb)
* lld/ELF (huge performance improvement, RISC-V linker relaxation, `SHT_RISCV_ATTRIBUTES`)
* Clang built glibc (get the ball rolling)
* Make protected symbols work in binutils/glibc
* Involved in sanitizers, ThinLTO, AArch64/x86 hardening features, AArch64 Memtag ABI, RISC-V psABI, etc

---

## RELR relative relocation format

```c
typedef struct {
  Elf64_Addr    r_offset;   // Address
  Elf64_Xword   r_info;     // High 32 bits: symbol index; low 32 bits: relocation type
  Elf64_Sxword  r_addend;   // Addend
} Elf64_Rela;
```

Dynamic relocations ordered by decreasing count:

* `R_*_RELATIVE`: relative relocations
* `R_*_JUMP_SLOT`: PLT relocations
* symbolic (e.g. `R_X86_64_64`)
* `R_*_GLOB_DAT`: GOT relocations. Identical to a symbolic relocation.
* `R_*_COPY` ([unfortunate](https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected))
* TLS descriptor/global-dynamic/local-dynamic/initial-exec

---

Many psABIs use RELA, which takes 24 bytes per relocation in ELFCLASS64.

Relocations have locality. `R_*_RELATIVE` are almost always consecutive.
For an `R_*_RELATIVE`, symbol index = addend = 0.

```c
{r_offset = 0x2000, r_info = (0 << 32) | R_X86_64_RELATIVE, r_addend = 0},
{r_offset = 0x2008, r_info = (0 << 32) | R_X86_64_RELATIVE, r_addend = 0},
{r_offset = 0x2010, r_info = (0 << 32) | R_X86_64_RELATIVE, r_addend = 0},
{r_offset = 0x2018, r_info = (0 << 32) | R_X86_64_RELATIVE, r_addend = 0},
```
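
A minimal sketch of how `r_info` is packed, assuming the ELF64 convention of `ELF64_R_INFO` (symbol index in the high 32 bits, relocation type in the low 32 bits). The names `Rela64`, `r_info_pack`, `r_info_sym`, `r_info_type` and `rela_table_size` are hypothetical stand-ins so the sketch builds without `<elf.h>`:

```c
#include <assert.h>
#include <stdint.h>

// Local stand-in for Elf64_Rela (hypothetical name).
typedef struct {
  uint64_t r_offset;  // Address
  uint64_t r_info;    // High 32 bits: symbol index; low 32 bits: type
  int64_t r_addend;   // Addend
} Rela64;

enum { RELOC_X86_64_RELATIVE = 8 };  // R_X86_64_RELATIVE in the x86-64 psABI

// Hypothetical helpers mirroring ELF64_R_INFO / ELF64_R_SYM / ELF64_R_TYPE.
static uint64_t r_info_pack(uint32_t sym, uint32_t type) {
  return ((uint64_t)sym << 32) | type;
}
static uint32_t r_info_sym(uint64_t info) { return (uint32_t)(info >> 32); }
static uint32_t r_info_type(uint64_t info) { return (uint32_t)info; }

// Size in bytes of a table of n RELA entries.
static uint64_t rela_table_size(uint64_t n) { return n * sizeof(Rela64); }
```

With this layout, the four entries above occupy 4 × 24 = 96 bytes.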

---

```asm
section:
  .quad 0x2000
  .quad 0x2008
  .quad 0x2010
  .quad 0x2018
```

RELR: starting offset + bitmaps encoding the subsequent relocations
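
A sketch of RELR decoding for ELFCLASS64, assuming the encoding in the generic-ABI RELR proposal: an even entry is an address, an odd entry is a bitmap covering the next 63 words. `relr_decode` is a hypothetical helper that only collects offsets; a real loader would add the load base to each word in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Decode ELFCLASS64 RELR entries into relocation offsets.
// Writes at most cap offsets to out and returns the total number decoded.
static size_t relr_decode(const uint64_t *relr, size_t n,
                          uint64_t *out, size_t cap) {
  assert(relr != NULL || n == 0);
  const uint64_t word = sizeof(uint64_t);
  uint64_t next = 0;  // Offset that bit 1 of the next bitmap refers to
  size_t count = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t entry = relr[i];
    if ((entry & 1) == 0) {
      // Even entry: an address. It names one relocation and starts a run.
      if (count < cap) out[count] = entry;
      count++;
      next = entry + word;
    } else {
      // Odd entry: a bitmap. Bit k (k = 1..63) marks the word at
      // next + (k - 1) * 8 as relocated.
      uint64_t bits = entry >> 1;
      for (uint64_t off = next; bits != 0; bits >>= 1, off += word) {
        if (bits & 1) {
          if (count < cap) out[count] = off;
          count++;
        }
      }
      next += 63 * word;
    }
  }
  return count;
}
```

Under these assumptions, the relocations at `0x2000`, `0x2008`, `0x2010` and `0x2018` fit in two 8-byte entries, `{0x2000, 0xF}`, instead of four 24-byte RELA entries.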

---

```
% ~/projects/bloaty/Release/bloaty clang.pie.relr -- clang.pie
    FILE SIZE        VM SIZE
 --------------  --------------
  [NEW]  +163Ki  [NEW]  +163Ki    .relr.dyn
  +4.9%     +32  +5.4%     +32    .dynamic
  +2.5%      +8  [ = ]       0    .shstrtab
 -99.5% -13.8Mi -99.5% -13.8Mi    .rela.dyn
  -8.3% -13.6Mi  -8.2% -13.6Mi    TOTAL
```

---

* (In 2021-10, upstreamed `DT_RELR` patch to FreeBSD rtld-elf)
* In April, upstreamed `DT_RELR` patch to glibc (2.36 highlighted feature)
* In August, upstreamed `DT_RELR` patch to musl (milestone: 1.2.4)
* Upstreamed `DT_RELR` patch to DynamoRIO
* Contributed an unmerged gold patch

[Relative relocations and RELR](https://maskray.me/blog/2021-10-31-relative-relocations-and-relr)

---

## zstd compressed debug sections

- Binary size is important
- Compressing code and data is usually not suitable
- Filesystem-level compression is not sufficiently portable and cannot exploit application-specific knowledge
- Debug sections are large. Compressing them is profitable
- zstd is superior to zlib

---

### Case study

Here is a `-DCMAKE_BUILD_TYPE=Debug` build directory of llvm-project where I just ran `ninja clang`.

```text
% stat -c %s **/*.o | awk '{s+=$1} END{print s}'
1464767464
% readelf -WS **/*.o | awk 'BEGIN{FPAT="\\[.*?\\]|\\S+"} $2~/\.text/{d += strtonum("0x"$6)} END{print d}'
210026370
% readelf -WS **/*.o | awk 'BEGIN{FPAT="\\[.*?\\]|\\S+"} $2~/\.debug_/{d += strtonum("0x"$6)} END{print d}'
631069751
% readelf -WS **/*.o | awk 'BEGIN{FPAT="\\[.*?\\]|\\S+"} $2~/\.rela\.debug_/{d += strtonum("0x"$6)} END{print d}'
78448968
```

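Debug information is typically much larger than the text sections: here the sections matching `.debug_` total about 631 MB, roughly three times the 210 MB of those matching `.text`.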

---

```c
typedef struct {
  Elf64_Word    ch_type;
  Elf64_Word    ch_reserved;
  Elf64_Xword   ch_size;
  Elf64_Xword   ch_addralign;
} Elf64_Chdr;
```

Added `ELFCOMPRESS_ZSTD` to the generic System V Application Binary Interface

> ELFCOMPRESS_ZSTD - The section data is compressed with the Zstandard algorithm. The compressed Zstandard data bytes begin with the byte immediately following the compression header, and extend to the end of the section. Additional documentation for Zstandard may be found at http://www.zstandard.org
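
A sketch of reading the compression header from the start of a `SHF_COMPRESSED` section, assuming a little-endian ELFCLASS64 object. `Chdr64`, `read_le` and `chdr_parse` are hypothetical names; `ELFCOMPRESS_ZLIB` is 1 and `ELFCOMPRESS_ZSTD` is 2 in the generic ABI. Decompressing the payload with the zstd library is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { COMPRESS_ZLIB = 1, COMPRESS_ZSTD = 2 };  // ELFCOMPRESS_ZLIB / _ZSTD

// Local stand-in for Elf64_Chdr (hypothetical name).
typedef struct {
  uint32_t ch_type;       // Compression algorithm
  uint32_t ch_reserved;
  uint64_t ch_size;       // Uncompressed size
  uint64_t ch_addralign;  // Uncompressed alignment
} Chdr64;

// Read a little-endian integer of the given width (1 to 8 bytes).
static uint64_t read_le(const uint8_t *p, int bytes) {
  uint64_t v = 0;
  for (int i = bytes - 1; i >= 0; i--) v = (v << 8) | p[i];
  return v;
}

// Parse the 24-byte header at the start of a compressed section.
// Returns a pointer to the compressed payload, or NULL if the section
// is too short to hold a header.
static const uint8_t *chdr_parse(const uint8_t *sec, size_t size, Chdr64 *h) {
  if (size < 24) return NULL;
  h->ch_type = (uint32_t)read_le(sec, 4);
  h->ch_reserved = (uint32_t)read_le(sec + 4, 4);
  h->ch_size = read_le(sec + 8, 8);
  h->ch_addralign = read_le(sec + 16, 8);
  return sec + 24;
}
```

A consumer would check `h.ch_type == COMPRESS_ZSTD` and then decompress `size - 24` payload bytes into a buffer of `h.ch_size` bytes.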

---

* Added zstd support to gas, ld.bfd, gold, gdb, objcopy, readelf, objdump, addr2line, etc
* Added zstd support to clang, ld.lld, lldb, llvm-objcopy, llvm-symbolizer, llvm-dwarfdump, etc

`-gz=zstd`

---

* binutils
  + addr2line: symbolization needs to decompress debug sections
  + gas: compress debug sections
  + ld, gold: decompress compressed input sections and compress output debug sections. [gold feature request](https://sourceware.org/bugzilla/show_bug.cgi?id=29641)
  + dwp: decompress compressed `.dwo`. dwp uses gold's code
  + nm: `--line-numbers` uses debug information
  + objcopy: `--decompress-debug-sections` and `--compress-debug-sections=zstd`
  + objdump: `--dwarf` decompresses compressed debug sections
  + readelf: `--debug-dump` and `--decompress` decompress compressed sections. [feature request](https://sourceware.org/bugzilla/show_bug.cgi?id=29640)
* gdb
  + decompress compressed debug sections in executables, shared objects, separate debug files, and `.dwo` files. [Feature request](https://sourceware.org/bugzilla/show_bug.cgi?id=29563)
  + MiniDebugInfo section `.gnu_debugdata` is compressed with xz. [zstd feature request](https://sourceware.org/bugzilla/show_bug.cgi?id=29584)

[zstd compressed debug sections](https://maskray.me/blog/2022-09-09-zstd-compressed-debug-sections)

---

## lld/ELF

* 陳枝懋 added initial RISC-V support for non-PIC. I added PIC and TLS support in 2019
* The port was mature; linker relaxation was the last major piece needed for feature parity with GNU ld
* Implemented `SHT_RISCV_ATTRIBUTES` merging support, which has niche value

[RISC-V linker relaxation in lld](https://maskray.me/blog/2022-07-10-riscv-linker-relaxation-in-lld)

---

### Performance improvement

* `/tmp/out/custom0/bin/lld` is lld 13 built with latest Clang
* `/tmp/out/custom2/bin/lld` is latest lld built with latest Clang

Link a `-DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=on` build of clang 16
```
lld 13:     Time (mean ± σ):     687.1 ms ±   7.1 ms    [User: 642.6 ms, System: 431.7 ms]
latest lld: Time (mean ± σ):     422.9 ms ±   5.3 ms    [User: 579.8 ms, System: 470.7 ms]

Summary
  'numactl -C 32-39 /tmp/out/custom2/bin/lld -flavor gnu @response.txt --threads=8' ran
    1.62 ± 0.03 times faster than 'numactl -C 32-39 /tmp/out/custom0/bin/lld -flavor gnu @response.txt --threads=8'
```

---

Link a `-DCMAKE_BUILD_TYPE=Debug` build of clang 16
```
lld 13:     Time (mean ± σ):      4.494 s ±  0.039 s    [User: 7.516 s, System: 2.909 s]
latest lld: Time (mean ± σ):      3.174 s ±  0.037 s    [User: 7.361 s, System: 3.202 s]

Summary
  'numactl -C 32-39 /tmp/out/custom2/bin/lld -flavor gnu @response.txt --threads=8' ran
    1.42 ± 0.02 times faster than 'numactl -C 32-39 /tmp/out/custom0/bin/lld -flavor gnu @response.txt --threads=8'
```

---

### How?

* Improve internal representation and optimize passes
* Parallelize [section](https://reviews.llvm.org/D120626) and [local symbol](https://reviews.llvm.org/D119909) initialization
* [Parallelize relocation scanning](https://reviews.llvm.org/D133003)
* [Parallelize writes of different output sections](https://reviews.llvm.org/D131247)
* [Process archives as flattened `--start-lib` relocatable files](https://reviews.llvm.org/D119074) (avoid memory accesses to archive symbol tables)
* Parallelize `--compress-debug-sections=zlib`

[lld 14 ELF changes](https://maskray.me/blog/2022-02-20-lld-14-elf-changes)

[lld 15 ELF changes](https://maskray.me/blog/2022-09-05-lld-15-elf-changes)

---

## Clang built glibc (get the ball rolling)

glibc is probably the highest-profile open-source project that cannot be built with Clang.

I sent some patches last year and made a few this year.
[When can glibc be built with Clang?](https://maskray.me/blog/2021-10-10-when-can-glibc-be-built-with-clang#asm-label-after-first-use)

This year Adhemerval Zanella from Linaro maintained a local branch to fix aarch64/i386/x86_64 builds.
I reviewed some of his patches.

---

## Make protected symbols work in binutils/glibc

```c
// b.c - b.so
__attribute__((visibility("protected"))) int var;
int foo() { return var; }

// a.c - exe
extern int var;
int main() { return var; }
```

* Disallowed direct extern access to protected symbols in GNU ld's arm/aarch64/x86 ports
* Issued warnings in glibc rtld
* GCC may switch back to direct access for `b.c` (`-fPIC`) in the future

[Copy relocations, canonical PLT entries and protected visibility](https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected)

---

## llvm-project

* C++/ObjC++: switch to gnu++17 as the default standard (fixed many tests)
* `--gcc-install-dir=`: selects a specific GCC installation directory, e.g. `clang++ --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/12`
* Defaulted to `-fsanitize-address-use-odr-indicator`
* Fixed a long-standing bug related to local linkage `GlobalValue` in non-prevailing COMDAT, exposed in (Thin)LTO+PGO

Reviewed XXX commits. The count below is an underestimate:
```
% git shortlog -sn --since=2021-12-31 --grep 'Reviewed.*MaskRay' | awk '{s+=$1}END{print s}'
124
```
Many patches don't use the `Reviewed By:` tag.
