COMDAT and section group

Vague linkage

In C++, inline functions, template instantiations, and a few other things can be defined in multiple object files but need deduplication at link time. In the dark ages the functionality was implemented by weak definitions: the linker does not report duplicate definition errors and resolves the references to the first definition. The downside is that unneeded copies remained in the linked image.

Read More

SECTIONS and OVERWRITE_SECTIONS

The main task of a linker script is to define extra symbols and define how input sections map into output sections. It has other miscellaneous features which can be implemented via command line options:

  • The ENTRY command can be replaced by --entry.
  • The OUTPUT_FORMAT command can usually be replaced by -m.
  • The SEARCH_DIRS command can be replaced by -L.
  • The VERSION command can be replaced by --version-script.
  • The INPUT and GROUP commands can add other files as input. This provides a mechanism to split an archive/shared object into multiple files.

Read More

Symbol processing

UNDER CONSTRUCTION (COFF, Mach-O)

Symbol processing is a major step in a linker. In most binary formats, the linker maintains a global symbol table and performs symbols resolution for each input file (object file, shared object, archive, LLVM bitcode file). Some command line options can define/undefine symbols as well. The symbol resolution can affect archive processing and many subsequent steps (LTO, relocation processing, as-needed shared objects, etc).

Read More

ELF interposition and -Bsymbolic

This article describes ELF interposition, the linker option -Bsymbolic, and its friends. In the end, it will discuss an ambitious plan which I dubbed "the Last Alliance of ELF and Men".

Motivated by a great post by Daniel Colascione ("Python is 1.3x faster when compiled in a way that re-examines shitty technical decisions from the 1990s.") and a recent rant from Linus Torvalds on shared objects' performance issues, I have summarized the current unfortunate ELF state and filed some GCC/binutils feature requests. I believe the performance of our shared object oriented world will be no slower than one with mostly statically linked executables.

(I wrote -fno-semantic-interposition first but then realized reorganization would improve readability, so moved some parts and added some stuff to this new article.)

Read More

Weak symbol

C/C++

GCC and Clang support __attribute__((weak)) which marks a symbol weak. The same effect can be achieved with a preprocessor directive #pragma weak symbol.

Binary format

In ELF, there are three main symbol bindings. The ELF specification says:

  • STB_LOCAL: Local symbols are not visible outside the object file containing their definition. Local symbols of the same name may exist in multiple files without interfering with each other.
  • STB_GLOBAL: Global symbols are visible to all object files being combined. One file's definition of a global symbol will satisfy another file's undefined reference to the same global symbol.
  • STB_WEAK: Weak symbols resemble global symbols, but their definitions have lower precedence.

Read More

The dark side of RISC-V linker relaxation

Linker optimization/relaxation

Because the linker has a global view and layout information, it can perform some peephole optimizations which are difficult/impossible to do on the compiler side. Generic link-time code sequence transformation is risky, because semantic information is lost and what the linker sees are byte streams. However, if every instruction in the candidate code sequence is associated with one ore more relocations, the ABI and the implementation can assign (additional) semantics to the relocation types and make such transformation safe. This technique is usually called linker optimization or linker relaxation. It seems that the term "linker optimization" is often used when the number of bytes does not change while "linker relaxation" is used when the number of bytes decreases.

Read More