llvm-project
More than 800 commits this year, many of them trivial; here are some worth mentioning.
Updated in 2022-11.
LLD is the LLVM linker. Its ELF port is typically installed as ld.lld. This article presents an in-depth analysis of ld.lld's performance. The topic has been on my mind for a while. Recently Rui Ueyama released mold 1.0, and people wonder why, with multi-threading, its ELF port is faster than ld.lld. So I finally completed the article.
First of all, I am very glad that Rui Ueyama started mold. Our world has a plethora of compilers, but not many people learn or write linkers. As its design documentation says, there are many drastically different designs which haven't been explored. In my view, mold is innovative in that it introduced parallel symbol table initialization, symbol resolution, and relocation scan which to my knowledge hadn't been implemented before, and showed us amazing results. The innovation gives existing and future linkers incentive to optimize further.
Updated in 2023-03.
In C++, dynamic initializations for non-local variables happen before the first statement of the main function. All (most?) implementations just ensure such dynamic initializations happen before main.
As an extension, GCC supports __attribute__((constructor)) which can make an arbitrary function run before main. A constructor function can have an optional priority (__attribute__((constructor(N)))).
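The following is a minimal sketch of the extension, assuming GCC or Clang targeting ELF. The names record, init_early, init_late, init_default, and constructor_order are hypothetical and exist only for illustration. Each constructor appends its name to a buffer, so the order in which they ran can be inspected later.

```c
#include <assert.h>
#include <string.h>

/* Zero-initialized static storage, so it is ready before any constructor runs. */
static char order[64];

/* Hypothetical helper: append a name to the log, checking that it fits. */
static void record(const char *name) {
  assert(strlen(order) + strlen(name) < sizeof order);
  strcat(order, name);
}

/* Lower priority numbers run first; 0..100 are reserved for the implementation. */
__attribute__((constructor(101))) static void init_early(void) { record("early "); }
__attribute__((constructor(102))) static void init_late(void) { record("late "); }

/* No priority: on typical ELF toolchains this runs after the prioritized ones. */
__attribute__((constructor)) static void init_default(void) { record("default "); }

/* Returns the recorded order; callable from main or anywhere after startup. */
const char *constructor_order(void) { return order; }
```

By the time main starts, all three constructors have already run, so calling constructor_order() from main returns the complete log.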
Updated in 2024-02.
(In celebration of my 2800th llvm-project commit) Happy Halloween!
This article describes relative relocations and how the RELR format can greatly decrease file sizes.
An ELF linker performs the following steps to process an absolute relocation type whose width equals the word size (e.g. R_AARCH64_ABS64, R_X86_64_64).
if (undefined_weak || (!preemptible && (no_pie || is_shn_abs)))
Note: in FDPIC ABIs, there is no single base address. Relative relocations can still be used, but the runtime operation is not simply *loc += base.
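For the common non-FDPIC case, the runtime operation *loc += base can be sketched as below. This is an illustration under simplifying assumptions, not a real loader: rel_relative and apply_relative are hypothetical names, the image is modeled as an array of 64-bit words, and each record carries only the offset of the word to patch (as in a REL-style relative relocation with an implicit addend).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: byte offset of the word to patch, relative to the image start. */
typedef struct {
  size_t offset;
} rel_relative;

/* Apply each relative relocation to an image loaded at `base`: *loc += base. */
void apply_relative(uint64_t *image, size_t nwords,
                    const rel_relative *rels, size_t nrels, uint64_t base) {
  for (size_t i = 0; i < nrels; i++) {
    /* This sketch only accepts word-aligned, in-bounds locations. */
    assert(rels[i].offset % sizeof(uint64_t) == 0);
    size_t idx = rels[i].offset / sizeof(uint64_t);
    assert(idx < nwords);
    /* The word holds a link-time address; adding the load base makes it absolute. */
    image[idx] += base;
  }
}
```

Words that no record points at are left untouched, which is why every location needing adjustment must be described by the relocation table.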
In September, I wrote "So, dear glibc, will you be happy with my sending Clang patches?" in Build glibc with LLD 13. We have come to a turning point.
At Linux Plumbers Conference 2021, in the glibc Birds of a Feather session, I asked the Clang buildability question to the glibc stewards. (Interlude: I did not realize that I should attend the conference (it was a great opportunity for an outsider to meet some glibc folks). On Tuesday, Wei Wu (lazyparser) kindly gave me his account: "Do you want to attend LPC? I have too many meetings today and cannot make it." I happily accepted it and typed the question during the glibc session.)
So I got positive responses. "Carlos: Yes, we could be happy with clang buildability." "Joseph: Patches should be split into logical changes." This is really great news! My unnesting patch had sat there for a while and I was unsure about the Clang buildability interest.
Updated in 2022-10.
In the afternoon, I came across the Nim programming language again on Lobsters. I first learned some basics of the language in 2015, but had not touched it since then.
"Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula.", according to its website.
Basic features: parametric polymorphism. Advanced features: macros (including term-rewriting macros), compile-time function execution, effect system, concepts.
An idea popped into my mind: why not solve some coding challenges in Nim?
As a niche language, it is not supported on many coding challenge websites. Fortunately, the Nim compiler generates C code. With a small amount of work, we can build a self-contained C source file suitable for submission.
Let's take a LeetCode challenge as an example. We write the main algorithm in Nim and use the emit pragma to write a C wrapper.
Updated in 2024-01.
Many architectures encode a branch/jump/call instruction with PC-relative addressing, i.e. the distance to the target is encoded in the instruction. In an executable or shared object (called a component in ELF), if the target is bound to the same component, the instruction has a fixed encoding at link time; otherwise the target is unknown at link time and there are two choices:
In All about Global Offset Table, I mentioned that linker/loader developers often frown upon text relocations because they make the text segment unshareable. In addition, the number of relocations would depend on the number of calls, which can be large.
LLD is the LLVM linker. It started at the end of 2011 as a work-in-progress rewrite of ld64 for the Mach-O binary format based on the atom model. COFF and ELF ports based on the atom model were contributed subsequently. They shared one symbol resolution model. (IMO, due to Mach-O's unfortunate limitation of 255 sections, .subsections_via_symbols was invented. The atom model was an incarnation of the concept but it did not fit into ELF/PE where sections are the better basic units.)
In 2015, both COFF and ELF ports were rewritten. (See "LLD improvement plan") Today, LLD is a mature and fast linker supporting multiple binary formats (ELF, Mach-O, PE/COFF, WebAssembly). FreeBSD, Android, and Chrome OS have adopted it as the main linker.
As a main contributor to LLD's ELF port who has fixed numerous corner cases in recent years, I consider its x86-64 support to have been mature since the 8.0.0 release and in great shape since 9.0.0. The AArch64 and PowerPC32/PowerPC64 support has been great since the 10.0.0 release. The 11.0.0 release has very solid linker script support. (When people complain that a linker script written for GNU ld is not immediately usable with LLD, it is almost assuredly a problem with the script itself.) So, what's next? Build glibc with LLD!
Updated in 2024-07.
In an executable or shared object (called a component in ELF), a text section may need the absolute virtual address of a symbol (e.g. a function or a variable). The reference arises from an address taken operation or a PLT entry. The address may be:
Before September 2020, FreeBSD could only be built on a FreeBSD host. Alexander Richardson did a lot of work to make building on other hosts possible: https://wiki.freebsd.org/BuildingOnNonFreeBSD.