2025-01
Where is mold faster than lld?

mold has fully parallel symbol resolution, but at the cost of lower single-core performance and higher total CPU time (when multiple link jobs run concurrently, this can be a net loss).

Relocation scanning is faster: error handling is simplified, and each architecture supports only one of REL and RELA. Range extension thunks are incomplete. lld supports REL, RELA, and CREL for every architecture, with some virtual function overhead.

--gc-sections is simplified: it is conservative and handles fewer features, and it is parallelized with oneTBB.

Assigning sections is faster. Since linker script SECTIONS commands and some awkward SHF_LINK_ORDER cases are unsupported, mold can take shortcuts.

Finalizing synthetic sections and writing the output are faster. mold uses oneTBB for parallelism, with a good scheduler that balances CPU resources between processing synthetic sections and writing large numbers of input sections, while lld can only use the poor man's llvm/Support/Parallel.h.

Most linker script syntax is unsupported. The symbol representation can also be simplified somewhat.

mold uses a fork trick (unless --no-fork). This is arguably a benchmarking game, and it can make it hard for the linker's parent process to estimate the linker's resource consumption.

Nearly every function is templated on template <typename E>, which reduces virtual function overhead but greatly increases code size.
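The template-vs-virtual trade-off can be sketched as follows (hypothetical type names, not mold's real code; mold's actual ELF-type parameters carry much more, e.g. relocation enums and word sizes): parameterizing on the target at compile time turns would-be virtual calls into direct, inlinable ones, at the cost of one instantiation per target.

```cpp
#include <string>

// Hypothetical ELF-type tags in the spirit of mold's template parameter E.
struct X86_64 { static constexpr bool is_rela = true;  static constexpr const char *name = "x86-64"; };
struct ARM32  { static constexpr bool is_rela = false; static constexpr const char *name = "arm"; };

// One copy of this function is instantiated per target, so the branch is
// resolved at compile time and there is no virtual dispatch. The cost is
// that every instantiation adds to code size.
template <typename E>
std::string reloc_form() {
  if constexpr (E::is_rela)
    return std::string(E::name) + ": RELA (explicit addends)";
  else
    return std::string(E::name) + ": REL (implicit addends)";
}
```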
https://maximullaris.com/awk_tech_notes.html
"AWK’s main goal was to be extremely terse yet productive language well suited for one-liners."
2025-02
https://github.com/yosefk/funtrace describes a small function tracing runtime. funtrace.cpp is the runtime, linked into the program. It hooks the fentry/return/__cyg_profile_func_enter entry points inserted by -pg and -finstrument-functions instrumentation, recording the function address and the x86 __rdtsc() value; funtrace2viz/src/main.rs, written in Rust, then converts the trace to the Chrome trace event format.
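A minimal sketch of such a runtime (my own toy, not funtrace's real code: std::chrono stands in for __rdtsc, and an unbounded vector stands in for funtrace's per-thread buffers). With -finstrument-functions, the compiler inserts calls to these two hooks around every function body.

```cpp
#include <chrono>
#include <vector>

// Trace event: function address, timestamp, enter/exit flag.
struct Event { void *fn; long long ts; bool enter; };
static std::vector<Event> g_trace;

static long long now() {
  // Portable stand-in for x86 __rdtsc().
  return std::chrono::steady_clock::now().time_since_epoch().count();
}

// The hooks themselves must not be instrumented, or they would recurse.
extern "C" {
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *fn, void *) {
  g_trace.push_back({fn, now(), true});
}
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *fn, void *) {
  g_trace.push_back({fn, now(), false});
}
}
// Link this into a program built with -finstrument-functions; a
// post-processor can then turn g_trace into Chrome trace event JSON.
```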
https://herecomesthemoon.net/2025/01/type-inference-in-rust-and-cpp/ mentions that Rust can adopt a Hindley-Milner type system because:
- no function overloads
- lack of implicit conversions (except limited and specific conversions like lifetime shortening and references to pointers)
- no inheritance
- no specialization
C++, on the other hand, has all of these features, which makes HM inference infeasible.
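One concrete way overloading defeats inference in C++ (a toy example, not from the linked post): the name of an overloaded function has no single type, so nothing can be inferred from it alone, and the caller must commit to an argument type.

```cpp
#include <string>

int arity(int)         { return 1; }  // "arity" is now an overload set,
int arity(std::string) { return 2; }  // not a value with one type.

// auto f = arity;                       // error: cannot deduce which overload
auto f = [](int x) { return arity(x); };  // must pick an argument type first

// In Rust, a (non-overloaded) fn item has exactly one type, so
// Hindley-Milner inference can propagate it freely.
```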
To minimize output size even without ld --gc-sections (https://maskray.me/blog/2021-02-28-linker-garbage-collection), libc implementations often aggressively separate function variants into individual files. E.g. asprintf.c sprintf.c vasprintf.c vsprintf.c vsnprintf.c
The archive processing semantics (https://maskray.me/blog/2021-06-20-symbol-processing#archive-processing) ensure that while libc.a:x.o is extracted, libc.a:y.o may remain unneeded. The linker simply never sees the input sections from libc.a:y.o, so section-based garbage collection is not needed.
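The rule can be sketched as a toy resolver (hypothetical structures and names; real linkers do a left-to-right scan with lazy symbols rather than this fixed-point loop): a member is extracted only when it defines a currently-undefined symbol, and extraction can in turn pull in more members, while the rest are never opened at all.

```cpp
#include <set>
#include <string>
#include <vector>

struct Member {
  std::string name;
  std::vector<std::string> defs;    // symbols this member defines
  std::vector<std::string> undefs;  // symbols this member references
};

// Toy archive semantics: extract members until no undefined symbol can be
// satisfied. Members that never satisfy an undefined symbol (like
// libc.a:y.o above) contribute no input sections at all.
std::set<std::string> extract(std::set<std::string> undef,
                              const std::vector<Member> &archive) {
  std::set<std::string> extracted, defined;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const Member &m : archive) {
      if (extracted.count(m.name))
        continue;
      bool needed = false;
      for (const std::string &d : m.defs)
        if (undef.count(d))
          needed = true;
      if (!needed)
        continue;
      extracted.insert(m.name);
      changed = true;
      for (const std::string &d : m.defs) {
        defined.insert(d);
        undef.erase(d);
      }
      for (const std::string &u : m.undefs)
        if (!defined.count(u))
          undef.insert(u);  // extraction may create new undefined symbols
    }
  }
  return extracted;
}
```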
bzip2 looks like RLE+BWT+Huffman with quite small block sizes (a meaningful choice back when machines did not have much RAM). bzip3 tries both RLE and LZP (using a rather large minimum match length: LZP_MIN_MATCH of 40). Given the inherent slowness of the BWT, replacing Huffman with an arithmetic coder (which has a better compression ratio) seems like a logical optimization. tANS is used in zstd and LZFSE, but rANS later became more popular in newer codecs. Anyhow, the bottleneck is likely in the BWT.
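For reference, the BWT core itself is simple even if fast suffix sorting is not (a naive O(n² log n) sketch of the textbook transform; real codecs use suffix-array construction instead):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Naive Burrows-Wheeler transform: sort all rotations of the input
// (which should end in a unique sentinel like '$') and take the last
// column. Similar contexts end up adjacent, which is what makes the
// output compressible by the later RLE/entropy-coding stages.
std::string bwt(const std::string &s) {
  std::vector<std::string> rot;
  for (size_t i = 0; i < s.size(); i++)
    rot.push_back(s.substr(i) + s.substr(0, i));
  std::sort(rot.begin(), rot.end());
  std::string out;
  for (const std::string &r : rot)
    out += r.back();
  return out;
}
```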
#define LZP_DICTIONARY 18
Further tuning of the LZP predictor might be possible, but any changes at this stage would impact codec compatibility.
The BWT step is a major contributor to the slow decompression speed of bzip2 and bzip3. LZMA, and zstd at higher compression levels, offer faster decompression.
I haven't read bsc, another spiritual successor to bzip2. How does bzip3 compare with it? bzip3's source code does refer to bsc. https://encode.su/threads/3763-bsc-m03-(experimental-M03-sorting-compressor)
Achieved the milestone of 6000 commits in the llvm/llvm-project repository. I should identify gaps, learn more things, and pay less attention to commit counts.
2025-03
https://reviews.llvm.org/D23110 populated the generic assembly parser with MIPS expression modifiers (e.g. %pcrel_hi).
TLS handling in the LLVM integrated assembler. AArch64 encodes the TLS kind in MCExpr:

ImmVal = AArch64MCExpr::create(ImmVal, RefKind, getContext());

PowerPC encodes the TLS kind as MCSymbolRefExpr's VariantKind (stored in MCSymbolRefExpr::SubclassData).
The 2010 LLVM MC commit https://github.com/llvm/llvm-project/commit/55992564152f0fce6758a4495cc39422f5e1cc94 introduced MCSymbolRefExpr::VariantKind with x86 relocation operators, but it's flawed: other expressions (e.g. MCBinaryExpr) need it too, and the semantics get messy (e.g. (a@plt)-b). Many targets overload the generic interface. MCTargetExpr is a better fit, as AArch64 and RISC-V show. The lengthy list of PowerPC-specific VK_PPC_ entries is disheartening, though cleaning it up now feels like a daunting task.