FORTRAN 77 COMMON blocks compiled to COMMON symbols. You could
declare a COMMON block in more than one file, with each specifying the
number, type, and size of the variable. The linker allocated enough
space to satisfy the largest size.
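The "largest size wins" rule can be illustrated with a minimal sketch. The symbol names and sizes below are hypothetical, and the function only models the size selection, not alignment or the interaction with regular definitions.

```python
def allocate_common(declarations):
    """Pick the allocation size for each COMMON symbol.

    declarations: iterable of (symbol_name, size_in_bytes) pairs, one pair
    per object file that declares the COMMON block.
    """
    sizes = {}
    for name, size in declarations:
        # Keep the largest size seen so far for this symbol.
        sizes[name] = max(size, sizes.get(name, 0))
    return sizes


# Hypothetical declarations of the same block in three object files.
decls = [("blk_", 16), ("blk_", 40), ("other_", 8), ("blk_", 24)]
result = allocate_common(decls)
print(result)
```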
Binary sizes are important. Filesystem compression is ergonomic but
typically does not leverage application information well. Compressing
allocable sections (text, data) increases program startup time and
introduces memory overhead. In addition, filesystem compression is not
sufficiently portable.
Debug sections are large and contribute to a significant portion of
the binary size. Therefore, it is appealing to compress debug
sections.
Here is a -DCMAKE_BUILD_TYPE=Debug build directory of
llvm-project where I just ran ninja clang (on 2022-10-21).
Here are the total sizes of .o files, text sections, and debug sections.
Debug information is typically much larger than text sections.
Some assemblers and linkers offer a feature to compress debug
sections.
llvm-objcopy supports --compress-debug-sections=zlib to
compress debug sections. We can use the option to estimate what we
would save if the assembler compressed debug sections.
% for i in **/*.o; do /tmp/Rel/bin/llvm-objcopy --compress-debug-sections=zlib $i /tmp/c/o && readelf -WS /tmp/c/o | awk 'BEGIN{FPAT="\\[.*?\\]|\\S+"} $2~/\.debug_/{d += strtonum("0x"$6)} END{print d}'; done | awk '{s+=$1} END{print s}'
161691798
For debug sections, we have a compression ratio of 3.90! The total .o
size is 995438992 bytes, 68% of the original.
Then let's check zstd.
% for i in **/*.o; do /tmp/Rel/bin/llvm-objcopy --compress-debug-sections=zstd $i /tmp/c/o && readelf -WS /tmp/c/o | awk 'BEGIN{FPAT="\\[.*?\\]|\\S+"} $2~/\.debug_/{d += strtonum("0x"$6)} END{print d}'; done | awk '{s+=$1} END{print s}'
159341878
To check whether an object file has compressed debug sections, we can
use readelf.
% readelf -S a.o
...
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
...
  [ 5] .debug_abbrev     PROGBITS        0000000000000000 000080 000087 00   C  0   0  8
In the readelf -S output, the Flg column
describes sh_flags where C indicates the
SHF_COMPRESSED flag.
% readelf -t a.o
...
Section Headers:
  [Nr] Name
       Type            Address          Off    Size   ES   Lk Inf Al
       Flags
...
  [ 5] .debug_abbrev
       PROGBITS        0000000000000000 000080 000087 00   0   0  8
       [0000000000000800]: COMPRESSED
       ZLIB, 0000000000000093, 1
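The three values on the last line of the readelf -t output (ZLIB, 0x93, 1) correspond to the ch_type, ch_size, and ch_addralign fields of the compression header that precedes the compressed bytes in an SHF_COMPRESSED section. The following is a minimal sketch of that layout, assuming a little-endian ELF64 object; the helper names and the placeholder payload are made up for illustration and are not taken from any tool.

```python
import struct
import zlib

ELFCOMPRESS_ZLIB = 1
# Elf64_Chdr: ch_type (u32), ch_reserved (u32), ch_size (u64), ch_addralign (u64)
CHDR64 = struct.Struct("<IIQQ")


def compress_section(data, addralign=1):
    """Return section content as stored with SHF_COMPRESSED (zlib)."""
    header = CHDR64.pack(ELFCOMPRESS_ZLIB, 0, len(data), addralign)
    return header + zlib.compress(data)


def decompress_section(content):
    """Return (uncompressed bytes, ch_addralign) for zlib-compressed content."""
    ch_type, _reserved, ch_size, ch_addralign = CHDR64.unpack_from(content)
    if ch_type != ELFCOMPRESS_ZLIB:
        raise ValueError("this sketch only handles ELFCOMPRESS_ZLIB")
    data = zlib.decompress(content[CHDR64.size:])
    if len(data) != ch_size:
        raise ValueError("ch_size does not match the decompressed length")
    return data, ch_addralign


# Placeholder payload of 0x93 bytes, the uncompressed size shown above.
original = b"\x01\x11\x01\x25\x0e\x13\x05" * 21
stored = compress_section(original)
roundtrip, align = decompress_section(stored)
print(len(original), len(stored), align)
```

The sh_size reported by readelf covers the header plus the compressed stream, which is why it can be compared directly with the uncompressed size stored in ch_size.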
History
In 2007-11, Craig Silverstein added
--compress-debug-sections=zlib to gold. When the option
was specified, gold compressed the content of a .debug*
section with zlib and changed the section name to
.debug*.zlib.$uncompressed_size.
Unix-like systems represent static libraries as .a
archives. A .a archive consists of a header and a
collection of files with metadata. Its usage is tightly coupled with the
linker. An archive almost always contains only relocatable object files
and the linker has built-in support for reading it.
% as /dev/null -o a.o
% rm -f b.a && ar rc b.a a.o
% ar t b.a
a.o
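To make "a header and a collection of files with metadata" concrete, here is a minimal sketch that builds a tiny archive in memory and lists its members, roughly what ar t prints. It assumes the common layout (an 8-byte magic followed by 60-byte member headers) with GNU-style short names terminated by `/`; special members such as the symbol table and the long-name table are not handled, and the helper names are made up.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def make_member(name, data):
    """Build one member: a 60-byte header, the data, and even-size padding."""
    # Fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) end marker(2)
    header = b"%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (
        name + b"/", 0, 0, 0, b"644", len(data))
    assert len(header) == HEADER_SIZE
    padding = b"\n" if len(data) % 2 else b""
    return header + data + padding


def list_members(archive):
    """Return member names in order of appearance."""
    if not archive.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    names = []
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(archive):
        header = archive[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        name = header[:16].rstrip().rstrip(b"/").decode()
        size = int(header[48:58].decode().strip())
        names.append(name)
        # Member data is padded to an even offset.
        pos += HEADER_SIZE + size + (size & 1)
    return names


archive = AR_MAGIC + make_member(b"a.o", b"\x7fELF") + make_member(b"b.o", b"abc")
members = list_members(archive)
print(members)
```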
One may add other types of files to .a but that is
almost assuredly a bad thing.
% rm -f a.a && ar rc a.a a.o b.a  # archive in archive, bad
% ar t a.a
a.o
b.a
% echo hello > a.txt
% rm -f a.a && ar rc a.a a.o a.txt  # text file in archive, bad
The original linker designers noticed that for many programs not
every member was needed, so they tried to allow the linker to skip
unused members. Therefore, they invented the interesting but confusing
archive member extraction rule. See the "Archive processing" section of
Symbol processing for details.
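The excerpt does not spell the rule out, so the following is only a rough sketch of the traditional behavior as it is commonly described: a member is extracted only if it defines a symbol that is currently undefined, and an extracted member may introduce new undefined symbols. The data model (tuples of member name, defined symbols, referenced symbols) and all names are hypothetical, and details such as COMMON symbols and command-line ordering are ignored.

```python
def extract_members(undefined, archive):
    """Return (extracted member names, symbols still undefined).

    undefined: symbols left undefined by the inputs seen so far.
    archive: list of (member_name, defined_symbols, referenced_symbols).
    """
    undefined = set(undefined)
    defined = set()
    extracted = []
    changed = True
    while changed:  # rescan until no more members are pulled in
        changed = False
        for name, defs, refs in archive:
            if name in extracted:
                continue
            if undefined & set(defs):
                extracted.append(name)
                defined |= set(defs)
                undefined |= set(refs)
                undefined -= defined
                changed = True
    return extracted, undefined


lib = [
    ("a.o", {"foo"}, {"bar"}),
    ("b.o", {"baz"}, set()),   # defines nothing that is needed: skipped
    ("c.o", {"bar"}, set()),
]
extracted, remaining = extract_members({"foo"}, lib)
print(extracted, remaining)
```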
LLD is the LLVM linker. Its ELF port is typically installed as
ld.lld. This article makes an in-depth analysis of ld.lld's
performance. The topic has been on my mind for a while. Recently Rui
Ueyama released mold 1.0
and people wonder why with multi-threading its ELF port is faster than
ld.lld. So I finally completed the article.
First of all, I am very glad that Rui Ueyama started mold. Our world
has a plethora of compilers, but not many people learn or write linkers.
As its design documentation says, there are many drastically different
designs which haven't been explored. In my view, mold is innovative in
that it introduced parallel symbol table initialization, symbol
resolution, and relocation scan which to my knowledge hadn't been
implemented before, and showed us amazing results. The innovation gives
existing and future linkers incentive to optimize further.
In C++, dynamic initializations for non-local variables happen before
the first statement of the main function. All (most?)
implementations just ensure such dynamic initializations happen before
main.
As an extension, GCC supports
__attribute__((constructor)) which can make an arbitrary
function run before main. A constructor function can have
an optional priority (__attribute__((constructor(N)))).
(In celebration of my 2800th llvm-project commit) Happy
Halloween!
This article describes relative relocations and how the RELR format
can greatly decrease file sizes.
An ELF linker performs the following steps to process an absolute
relocation type whose width equals the word size (e.g.
R_AARCH64_ABS64, R_X86_64_64).
if (undefined_weak || (!preemptible && (no_pie || is_shn_abs)))
  link-time constant
else if (SHF_WRITE || znotext) {
  if (preemptible)
    emit a symbolic relocation (e.g. R_X86_64_64)
  else
    emit a relative relocation (e.g. R_X86_64_RELATIVE)
} else if (!shared && (copy_relocation || canonical_plt_entry)) {
  ...
} else {
  error
}
Note: in FDPIC
ABIs, there is no single base address. Relative relocations can
still be used, but the runtime operation is not simply
*loc += base.
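For the ordinary (non-FDPIC) case, the `*loc += base` operation can be sketched as follows. This assumes 64-bit little-endian words and a flat list of offsets to relocate; the image contents, offsets, and base address are made-up values, and a real loader reads the offsets from the relocation section rather than from a Python list.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def apply_relative_relocs(image, base, offsets):
    """Add the load base to each 64-bit word at the given offsets, in place."""
    for off in offsets:
        (value,) = struct.unpack_from("<Q", image, off)
        struct.pack_into("<Q", image, off, (value + base) & MASK64)


# A 24-byte "image" holding two link-time addresses at offsets 8 and 16.
image = bytearray(24)
struct.pack_into("<Q", image, 8, 0x1000)
struct.pack_into("<Q", image, 16, 0x2340)

apply_relative_relocs(image, 0x555555554000, [8, 16])
words = struct.unpack_from("<3Q", image, 0)
print([hex(w) for w in words])
```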
In September, I wrote "So, dear glibc, will you be happy with my
sending Clang patches?" in Build glibc with LLD
13. We have come to a turning point.
At Linux Plumbers Conference 2021, in the glibc Birds of a Feather
session, I asked the Clang buildability question to the glibc stewards.
(Interlude: I had not realized that I should attend the conference (it
was a great opportunity for an outsider to meet some glibc folks). On
Tuesday, Wei Wu (lazyparser) kindly gave me his account: "Want to attend
LPC? I have too many meetings and can't make it today." I happily
accepted it and typed the question during the glibc session.)
So I got positive responses. "Carlos: Yes, we could be happy with
clang buildability." "Joseph: Patches should be split into logical
changes." This is really great news! My unnesting patch had sat there
for a while and I was unsure about the Clang buildability interest.
In the afternoon, I came across the Nim programming language again on
Lobsters. I first learned some basics of the language in 2015, but had
not touched it since then.
According to its website, "Nim is a statically typed compiled systems
programming language. It combines successful concepts from mature
languages like Python, Ada and Modula."
An idea popped into my mind: why not solve some coding challenges in
Nim?
As a niche language, it is not supported on many coding challenge
websites. Fortunately, the Nim compiler generates C code. With a small
amount of work, we can build a self-contained C source file suitable for
submission.
Let's take a LeetCode challenge as an example. We write the main
algorithm in Nim and use the emit pragma to write a C wrapper.
Many architectures encode a branch/jump/call instruction with
PC-relative addressing, i.e. the distance to the target is encoded in
the instruction. In an executable or shared object (called a component
in ELF), if the target is bound to the same component, the instruction
has a fixed encoding at link time; otherwise the target is unknown at
link time and there are two choices:
text relocation
indirection
In All about
Global Offset Table, I mentioned that linker/loader developers often
frowned upon text relocations because the text segment will be
unshareable. In addition, the number of relocations would be dependent
on the number of calls, which can be large.
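As a concrete instance of the PC-relative encoding described above, here is a minimal sketch for the x86-64 `call rel32` instruction (opcode 0xE8 followed by a 32-bit little-endian displacement measured from the end of the 5-byte instruction). The addresses are made-up values and the function name is hypothetical; other architectures scale and place the displacement differently.

```python
import struct


def encode_call(insn_addr, target):
    """Encode `call target` placed at insn_addr as E8 + rel32."""
    # The displacement is relative to the address of the next instruction.
    disp = target - (insn_addr + 5)
    if not -2**31 <= disp < 2**31:
        raise ValueError("target out of range for rel32")
    return b"\xe8" + struct.pack("<i", disp)


forward = encode_call(0x1000, 0x1020)
backward = encode_call(0x1020, 0x1000)
print(forward.hex(), backward.hex())
```

When both the call site and the target are in the same component, their distance is fixed at link time, so these bytes never need to change at load time.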