Updated in 2023-08.
Some random notes about toolchain testing.
Classification
Testing levels.
In C++, inline functions, template instantiations, and a few other things can be defined in multiple object files but need deduplication at link time. In the dark ages the functionality was implemented by weak definitions: the linker does not report duplicate definition errors and resolves the references to the first definition. The downside is that unneeded copies remained in the linked image.
The main task of a linker script is to define extra symbols and define how input sections map into output sections. It has other miscellaneous features which can be implemented via command line options:
The ENTRY command can be replaced by --entry.
The OUTPUT_FORMAT command can usually be replaced by -m.
The SEARCH_DIR command can be replaced by -L.
The VERSION command can be replaced by --version-script.
The INPUT and GROUP commands can add other files as input. This provides a mechanism to split an archive/shared object into multiple files.
UNDER CONSTRUCTION (COFF, Mach-O)
Symbol processing is a major step in a linker. In most binary formats, the linker maintains a global symbol table and performs symbol resolution for each input file (object file, shared object, archive, LLVM bitcode file). Some command line options can define/undefine symbols as well. Symbol resolution can affect archive processing and many subsequent steps (LTO, relocation processing, as-needed shared objects, etc).
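To make the idea of a global symbol table concrete, here is a toy sketch. It is not how any real linker is implemented. The type and method names (SymbolTable, addDefined, addUndefined, lookup) are hypothetical, and the rules are deliberately simplified: a global definition overrides a weak one, the first weak definition is kept, and two global definitions are reported as a duplicate. Local symbols are left out because they never enter the global table.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy model only: simplified assumptions, not a real linker's data structures.
enum class Binding { Global, Weak };

struct Symbol {
  bool defined = false;
  Binding binding = Binding::Global;
  std::string file;  // defining input file; empty while undefined
};

class SymbolTable {
 public:
  // An undefined reference creates a placeholder unless the name is known.
  void addUndefined(const std::string& name) {
    table_.emplace(name, Symbol());
  }

  // Record a definition found in `file`, applying the precedence rules.
  void addDefined(const std::string& name, Binding binding,
                  const std::string& file) {
    Symbol& sym = table_[name];
    if (sym.defined) {
      if (binding == Binding::Weak) return;  // existing definition wins
      if (sym.binding == Binding::Global)
        throw std::runtime_error("duplicate definition of " + name);
      // Otherwise a global definition replaces the earlier weak one.
    }
    sym.defined = true;
    sym.binding = binding;
    sym.file = file;
  }

  // Throws std::out_of_range if the name was never seen.
  const Symbol& lookup(const std::string& name) const {
    return table_.at(name);
  }

 private:
  std::map<std::string, Symbol> table_;
};
```

Feeding it an undefined reference to foo, then a weak definition from a.o, then a global definition from b.o leaves foo resolved to b.o. A second global definition raises the duplicate definition error.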
This article describes the dependency related linker options -z defs, --no-allow-shlib-undefined, and --warn-backrefs. Deploying them in a build system can improve build health.
-z defs
This article describes ELF interposition, the linker option -Bsymbolic, and its friends. In the end, it will discuss an ambitious plan which I dubbed "the Last Alliance of ELF and Men".
Motivated by a great post by Daniel Colascione ("Python is 1.3x faster when compiled in a way that re-examines shitty technical decisions from the 1990s.") and a recent rant from Linus Torvalds on shared objects' performance issues, I have summarized the current unfortunate ELF state and filed some GCC/binutils feature requests. I believe the performance of our shared object oriented world will be no slower than one with mostly statically linked executables.
(I wrote -fno-semantic-interposition first but then realized reorganization would improve readability, so moved some parts and added some stuff to this new article.)
Updated in 2022-05.
This article is a continuation of ELF interposition and -Bsymbolic. It focuses on the GCC/Clang option -fno-semantic-interposition and why it can (sometimes incredibly) optimize -fPIC programs.
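A small sketch of the kind of code the option affects. The function names are hypothetical, and the comments describe my understanding of GCC's default -fPIC behaviour, not a guarantee for every compiler or version.

```cpp
#include <cassert>

// Hypothetical functions with external linkage and default visibility.
// In a shared object built with -fPIC, answer() is interposable: a
// definition in the executable or an earlier-loaded shared object could
// take its place at run time.
int answer() { return 42; }

// With GCC's default -fPIC behaviour (semantic interposition assumed), the
// call below is expected to stay an out-of-line call, since answer() might
// be replaced. With -fno-semantic-interposition the compiler may assume the
// definition above is the one used and is then free to inline it.
int twice() { return 2 * answer(); }

// Sanity check for the non-interposed case.
void check_twice() { assert(twice() == 84); }
```

Comparing the output of g++ -O2 -fPIC -S with and without -fno-semantic-interposition is one way to see whether the call to answer() survives.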
Updated in 2023-10.
GCC and Clang support __attribute__((weak)) which marks a symbol weak. The same effect can be achieved with a preprocessor directive #pragma weak symbol.
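A minimal sketch of both spellings, assuming GCC or Clang targeting ELF. All function names are hypothetical. extern "C" is used so that the name given to #pragma weak matches the symbol name without C++ mangling.

```cpp
#include <cassert>

// Weak declaration with no definition anywhere: the link is expected to
// succeed and the address evaluates to null (assumption: ELF target).
extern "C" void optional_hook() __attribute__((weak));

// Weak definition: a non-weak definition of default_level in another
// object file would take precedence over this one.
extern "C" __attribute__((weak)) int default_level() { return 1; }

// The pragma form, applied to a declared function.
extern "C" int fallback_level();
#pragma weak fallback_level
extern "C" int fallback_level() { return 2; }

// True only if some linked object provides optional_hook.
bool hook_available() { return optional_hook != nullptr; }

// Sanity check: with no overriding definition, the weak one is used.
void check_weak_default() { assert(default_level() == 1); }
```

Testing the address before calling, as hook_available() does, is the usual way to make a feature optional at link time.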
In ELF, there are three main symbol bindings. The ELF specification says:
STB_LOCAL: Local symbols are not visible outside the object file containing their definition. Local symbols of the same name may exist in multiple files without interfering with each other.
STB_GLOBAL: Global symbols are visible to all object files being combined. One file's definition of a global symbol will satisfy another file's undefined reference to the same global symbol.
STB_WEAK: Weak symbols resemble global symbols, but their definitions have lower precedence.
In research papers, a segment tree refers to a tree data structure that supports retrieving the list of segments containing a given point. In competitive programming, the name "segment tree" usually refers to a data structure maintaining an array. According to http://web.ntnu.edu.tw/~algo/Sequence2.html, the data structure originated from Baltic OI 2001: Mars Maps.
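The competitive-programming sense of "segment tree" can be sketched as below. This is one common iterative variant, chosen for brevity, and not necessarily the form used in the referenced material. It assumes point assignment and range-sum queries over a half-open interval. The struct and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum segment tree over a fixed-size array (illustrative sketch).
// Leaves live in tree[n .. 2n); internal node i holds tree[2i] + tree[2i+1].
struct SegmentTree {
  std::size_t n;
  std::vector<long long> tree;

  explicit SegmentTree(const std::vector<long long>& a)
      : n(a.size()), tree(2 * a.size()) {
    for (std::size_t i = 0; i < n; i++) tree[n + i] = a[i];
    // Fill internal nodes from the bottom up: i runs over n-1, ..., 1.
    for (std::size_t i = n; i-- > 1;) tree[i] = tree[2 * i] + tree[2 * i + 1];
  }

  // Set a[pos] = value and refresh every ancestor of that leaf.
  void update(std::size_t pos, long long value) {
    std::size_t i = pos + n;
    tree[i] = value;
    for (i >>= 1; i >= 1; i >>= 1) tree[i] = tree[2 * i] + tree[2 * i + 1];
  }

  // Sum of a[l .. r), i.e. l inclusive and r exclusive.
  long long query(std::size_t l, std::size_t r) const {
    long long sum = 0;
    for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
      if (l & 1) sum += tree[l++];  // l is a right child: take it, move right
      if (r & 1) sum += tree[--r];  // r-1 is a left child: take it
    }
    return sum;
  }
};
```

Both update and query touch O(log n) nodes, which is the point of maintaining the array this way instead of recomputing sums from scratch.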