一如既往,主要在工具链领域耕耘。给这些high-profile OSS贡献的时候,希望透过这个微小的角度改变世界。
Highlights
- RELR relative relocation format (glibc, musl, DynamoRIO)
- zstd compressed debug sections (binutils, gdb, clang, lld/ELF, lldb)
- lld/ELF (huge performance improvement, RISC-V linker relaxation,
SHT_RISCV_ATTRIBUTES
) - Clang built glibc (get the ball rolling)
- Make protected symbols work in binutils/glibc
- Involved in sanitizers, ThinLTO, AArch64/x86 hardening features, AArch64 Memtag ABI, RISC-V psABI, etc
RELR relative relocation format
- (In 2021-10, upstreamed
DT_RELR
patch to FreeBSD rtld-elf) - In April, upstreamed
DT_RELR
patch to glibc (highlighted feature for the 2.36 release) - In August, upstreamed
DT_RELR
patch to musl (milestone: 1.2.4) - Upstreamed
DT_RELR
patch to DynamoRIO - Contributed an unmerged gold patch
Carlos O'Donell said:
It's exactly that, .rela.dyn is 30x larger than .rela.plt in glibc. I applaud Fangrui Song's efforts here to move DT_RELR forward. If you're going to do one thing that has high impact and move multiple communities forward then picking .rela.dyn is the section to pick.
zstd compressed debug sections
- Added zstd support to gas, ld.bfd, gold, gdb, objcopy, readelf, objdump, addr2line, etc
- Added zstd support to clang, ld.lld, lldb, llvm-objcopy, llvm-symbolizer, llvm-dwarfdump, etc
zstd compressed debug sections
lld/ELF
RISC-V
陈枝懋 added initial RISC-V support for non-PIC in 2018. I added PIC and TLS support in 2019 (acknowledgements by lowRISC). The port was mature but linker relaxation was the last main piece to bring feature parity with GNU ld. This year I
- Implemented RISC-V linker relaxation (acknowledgement)
- Implemented
SHT_RISCV_ATTRIBUTES
merge support which has a niche value - Implemented
DT_RISCV_VARIANT_CC
RISC-V linker relaxation in lld
Performance
I spent some weekends improving the performance of lld/ELF this year.
Let's compare an lld 13 built with latest Clang
(/tmp/out/custom0/bin/lld
) with latest lld built with
latest Clang (/tmp/out/custom2/bin/lld
).
Link a
-DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=on
build of clang 16: 1
2
3
4
5
6lld 13: Time (mean ± σ): 687.1 ms ± 7.1 ms [User: 642.6 ms, System: 431.7 ms]
latest lld: Time (mean ± σ): 422.9 ms ± 5.3 ms [User: 579.8 ms, System: 470.7 ms]
Summary
'numactl -C 32-39 /tmp/out/custom2/bin/lld -flavor gnu @response.txt --threads=8' ran
1.62 ± 0.03 times faster than 'numactl -C 32-39 /tmp/out/custom0/bin/lld -flavor gnu @response.txt --threads=8'
Link a -DCMAKE_BUILD_TYPE=Debug
build of clang 16:
1
2
3
4
5
6lld 13: Time (mean ± σ): 4.494 s ± 0.039 s [User: 7.516 s, System: 2.909 s]
latest lld: Time (mean ± σ): 3.174 s ± 0.037 s [User: 7.361 s, System: 3.202 s]
Summary
'numactl -C 32-39 /tmp/out/custom2/bin/lld -flavor gnu @response.txt --threads=8' ran
1.42 ± 0.02 times faster than 'numactl -C 32-39 /tmp/out/custom0/bin/lld -flavor gnu @response.txt --threads=8'
This great speedup is achieved by
- Improve internal representation and optimize passes
- Parallelize section and local symbol initialization
- Parallelize relocation scanning
- Parallelize writes of different output sections
- Process archives as
flattened
--start-lib
relocatable files (avoid memory accesses to archive symbol tables) - Parallelize
--compress-debug-sections=zlib
See lld 14 ELF changes and lld 15 ELF changes for detail. As usual, I wrote the ELF port's release notes for the two releases.
Clang built glibc (get the ball rolling)
glibc is probably the most prominent OSS which cannot be built with Clang yet. I sent some patches last year and made a few this year. See my notes from the last year: When can glibc be built with Clang?
This year Adhemerval Zanella from Linaro maintained a local branch to fix aarch64/i386/x86_64 builds. I reviewed some of his patches.
It seems that such work will benefit some research projects. For example, Intel FineIBT used a GRTE branch of glibc.
llvm-project
- C++/ObjC++: switch to gnu++17 as the default standard (fixed many tests)
--gcc-install-dir=
: useclang++ --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/12
to use the selected GCC installation directory. Gentoo uses this (/etc/clang/gentoo-gcc-install.cfg
) to select the configured GCC installation- Defaulted to
-fsanitize-address-use-odr-indicator
- Fixed a long-term bug related to local linkage
GlobalValue
in non-prevailing COMDAT, exposed in (Thin)LTO+PGO - Fixed some
-fdebug-prefix-map=
issues for debug information for assembly sources - Supported
SOURCE_DATE_EPOCH
in Clang - Helped some opaque pointers migration
- Helped legacy pass manager deprecation
Reviewed many commits. A lot of people don't add a
Reviewed By:
tag. Anyway, counting commits with the tag can
give an underestimate. 1
2% git shortlog -sn bfc8f76e60a8efd920dbd6efc4467ffb6de15919.. --grep 'Reviewed .*MaskRay' | awk '{s+=$1}END{print s}'
386
My number of commits exceeded 4000 this year. Many are clean-up commits or fixup for others' work. I hope that I can do more useful work next year.
binutils
Reported many bugs and feature requests:
ar: Add --thin to avoid 'T' conflict with X/Open System Interface
ld: Make --compress-debug-sections=zlib parallel
ld: Customize output section type
ar: add 'L' modifier as a shortcut for ADDLIB
gold: --no-define-common is incompatible with GNU ld
ld: `not found for insert` error has a weird ordering
objcopy --weaken-symbol does not weaken STB_GNU_UNIQUE symbols
gas: .set should copy st_size only if src's st_size is unset
gas: -gsomething-not-already-a-long-option does not get a diagnostic
nm: add --no-weak
ld: ENTRY(no_such_symbol) in -shared mode does not report a warning
build failures with make --shuffle -j N
binutils: support zstd for SHF_COMPRESSED debug sections
ld ppc64: unneeded R_PPC64_NONE in .rela.dyn when linking Linux vdso
gdb: support zstd for SHF_COMPRESSED debug sections
elfutils: support zstd for SHF_COMPRESSED debug sections
objdump -p considers an empty .gnu.version_r invalid
gdb: support zstd compressed .gnu_debugdata
readelf: support zstd for SHF_COMPRESSED debug sections
gold, dwp: support zstd for SHF_COMPRESSED debug sections
ld: add -w to suppress warnings
ld x86: -r should not define _TLS_MODULE_BASE_
ld riscv: undefined elf_backend_obj_attrs_handle_unknown causes segfault when merging .riscv.attributes
objdump: add --show-all-symbols
My commits:
ar: Add --thin for creating thin archives
ld: Support customized output section type
objcopy --weaken-symbol: apply to STB_GNU_UNIQUE symbols
gas: copy st_size only if unset
gas: Port "copy st_size only if unset" to aarch64 and riscv
aarch64: Disallow copy relocations on protected data
aarch64: Define elf_backend_extern_protected_data to 0 [PR 18705]
aarch64: Allow PC-relative relocations against protected STT_FUNC for -shared
arm: Define elf_backend_extern_protected_data to 0 [PR 18705]
x86: Make protected symbols local for -shared
RISC-V: Remove R_RISCV_GNU_VTINHERIT/R_RISCV_GNU_VTENTRY
binutils, gdb: support zstd compressed debug sections
libctf: Add ZSTD_LIBS to LIBS so that ac_cv_libctf_bfd_elf can be true
sim: Link ZSTD_LIBS
ld: Add --undefined-version
readelf: support zstd compressed debug sections [PR 29640]
gold, dwp: support zstd compressed input debug sections [PR 29641]
gold: add --compress-debug-sections=zstd [PR 29641]
glibc
27 commits. Some work on the dynamic loader. Notable commits:
elf: Support DT_RELR relative relocation format [BZ #27924]
dlsym: Make RTLD_NEXT prefer default version definition [BZ #14932]
. When the first object providingfoo
defines bothfoo@v1
andfoo@@v2
,dlsym(RTLD_NEXT, "foo")
returnedfoo@v1
whiledlsym(RTLD_DEFAULT, "foo")
returnsfoo@@v2
. Nowdlsym(RTLD_NEXT, "foo")
returnsfoo@@v2
as wellelf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC
elf: Refine direct extern access diagnostics to protected symbol
elf: Remove ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA
The longstanding problem that glibc did wacky things with copy relocations/canonical PLT entries on protected data/functions symbols are correctly unsupported. I hope that future GCC can stop using indirect access for external protected symbol accesses. See Copy relocations, canonical PLT entries and protected visibility for detail.
Linux kernel
4 commits. Fixed linux-perf when unwinding ld.lld linked objects. Consulted on a number of toolchain questions.
Blog
Wrote 25 blog posts (including this one, mainly about toolchains) and revised many posts initially written in 2020 and 2021.
Misc
I was added as a collaborator of riscv-non-isa/riscv-elf-psabi-doc.
Learned some Jai programming language. It has many exciting ideas. Solved some algorithm challenges with Nim. Unfortunately my workflow transpiling Nim to standalone C somehow broke later this year.
Trips: Tucson, Greater Los Angeles, Cambridge and Boston, Denver, San Diego, Bellevue and Seattle, Kalispell, Washington, Honolulu.
Mastodon: https://hachyderm.io/@meowray