For a GCC or Clang command, there is typically one primary output
file, specified by -o or the default (a.out or a.exe). There can also
be temporary files and auxiliary files.
Linker notes on AArch32
UNDER CONSTRUCTION
This article describes target-specific details about AArch32 in ELF linkers. I described AArch64 in a previous article.
AArch32 is the 32-bit execution state for the Arm architecture and runs the A32 and T32 instruction sets. A32 refers to the old ISA with a 32-bit fixed width, while T32 refers to the mixed 16-bit and 32-bit Thumb2 instructions.
"AArch32", "A32", and "T32" are new names. Many projects use "ARM", "Arm", or "arm" as their port name.
ELF hash function may overflow
This article describes an interesting overflow bug in the ELF hash function.
The System V Application Binary Interface (generic ABI) specifies the
ELF object file format. When producing an executable or shared object
file needing a dynamic symbol table (.dynsym), a linker generates a
.hash section with type SHT_HASH to hold a symbol hash table. A
DT_HASH tag is produced to hold the address of .hash.
The hash table is used by a dynamic loader to perform symbol lookup
(for dynamic relocations and dlsym family functions). A detailed
description of the format can be found in ELF: symbol lookup via
DT_HASH.
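To make the overflow concrete, here is a small Python model of the sample hash function from the generic ABI, which is written in C with unsigned long. The function name elf_hash and the long_bits parameter are my own illustrative choices, not part of any ABI or library; long_bits simply emulates the width of unsigned long. Treat it as a sketch of the idea rather than a reference implementation.

```python
def elf_hash(name: bytes, long_bits: int = 32) -> int:
    """Model of the gABI sample hash, with `unsigned long` emulated
    as a `long_bits`-wide unsigned integer (hypothetical parameter)."""
    mask = (1 << long_bits) - 1
    h = 0
    for c in name:
        # h = (h << 4) + *name++;
        h = ((h << 4) + c) & mask
        # if (g = h & 0xf0000000) h ^= g >> 24;
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        # h &= ~g;  (only bits 28..31 are cleared)
        h &= ~g & mask
    return h


# Adding a byte to (h << 4) can carry into bit 32. A 32-bit unsigned long
# drops the carry, while a 64-bit unsigned long keeps it, so the two
# models disagree on this input.
tricky = b"\xff\x0f\x0f\x0f\x0f\x0f\x12"
print(hex(elf_hash(tricky, 32)))  # 0x2
print(hex(elf_hash(tricky, 64)))  # 0x100000002
```

With a 64-bit unsigned long the result no longer fits in 32 bits, so an implementation that relies on the value being a 32-bit quantity may compute a different bucket than one that truncates.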
lld 16 ELF changes
llvm-project 16 was just released. I added some lld/ELF notes to https://github.com/llvm/llvm-project/blob/release/16.x/lld/docs/ReleaseNotes.rst. Here I will elaborate on some changes.
Linker notes on AArch64
This article describes target-specific details about AArch64 in ELF linkers. AArch64 is the 64-bit execution state for the Arm architecture. The AArch64 execution state runs the A64 instruction set. The AArch32 and AArch64 execution states use very different instruction sets, so many pieces of software use two ports for the two execution states of the Arm architecture.
Historically, there were the "ARM architecture" and the "ARM instruction set", which led many software projects to use "ARM" or "arm" as their port names. In 2011, ARMv8 introduced two execution states, AArch32 and AArch64. The previous instruction sets "ARM" and "Thumb" were renamed to "A32" and "T32", respectively. In 2017, the architecture was renamed to the "Arm architecture" to reflect the rebranding of the company name. So, the "ARMv8-A" architecture profile is now named "Armv8-A".
For the AArch64 execution state, while many projects use "AArch64" as their port name, for legacy reasons, macOS, Windows, the Linux kernel, and some BSD operating systems unfortunately use "arm64". (Support for AArch64 was added to the Linux kernel in version 3.7. Initially, the patch set was named "aarch64", but it was later changed at the request of kernel developers.)
Linker notes on Power ISA
This article describes target-specific details about Power ISA in ELF linkers. Initially there was IBM POWER. The 1991 Apple–IBM–Motorola alliance created PowerPC. In 2006, the architecture was rebranded as Power ISA. According to the ISA manual, "In 2006, Freescale and IBM collaborated on the creation of the Power ISA Version 2.03, which represented the reunification of the architecture by combining Book E content with the more general purpose PowerPC Version 2.02."
The terms "PowerPC" and "powerpc" remain popular in numerous places,
including the powerpc-*-*-* and powerpc64-*-*-* in official target
triple names. The abbreviation "PPC" ("ppc") is used in numerous
places as well. For simplicity, I will refer to the 32-bit
architecture as "PPC32" and the 64-bit architecture as "PPC64".
We will see how the lack of PC-relative addressing before Power10 has added great complexity to the ABI and linkers.
Linker notes on x86
Updated in 2024-01.
This article describes target-specific details about x86 in ELF linkers. I will use "x86" to refer to both x86-32 and x86-64.
All about LeakSanitizer
Clang and GCC 4.9 implemented LeakSanitizer in 2013. LeakSanitizer
(LSan) is a memory leak detector. It intercepts memory allocation
functions and by default detects memory leaks at atexit time. The
implementation is purely in the runtime (compiler-rt/lib/lsan) and no
instrumentation is needed.
LSan has very little architecture-specific code and supports many 64-bit targets. Some 32-bit targets (e.g. Linux arm/x86-32) are supported as well, but there may be many false negatives because pointers with fewer bits are more easily confused with integers, floating-point values, or other data with a similar bit pattern. Every supported operating system needs to provide some way to "stop the world".
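The false-negative problem can be illustrated with a toy model of a conservative reachability scan. This is not LSan's implementation: find_leaks, the chunk layout, and the addresses are all hypothetical, and the model only captures the idea that any word whose value falls inside a heap chunk is treated as a pointer to that chunk.

```python
def find_leaks(chunks, root_words):
    """Toy conservative scan (hypothetical model, not LSan's code).

    chunks: {start_address: (size, [words stored in the chunk])}
    root_words: words found in globals, stacks, and registers.
    Returns the start addresses of chunks that no word reaches.
    """
    def owner(word):
        # A word "points to" a chunk if it lands anywhere inside it.
        for start, (size, _) in chunks.items():
            if start <= word < start + size:
                return start
        return None

    reachable = set()
    worklist = list(root_words)
    while worklist:
        start = owner(worklist.pop())
        if start is not None and start not in reachable:
            reachable.add(start)
            # Scan the chunk's contents for further candidate pointers.
            worklist.extend(chunks[start][1])
    return sorted(set(chunks) - reachable)


heap = {
    0x1000: (16, [0x2000]),  # reachable from a root, points to 0x2000
    0x2000: (16, []),        # reachable indirectly
    0x3000: (16, []),        # nothing points here: a leak
}
print(find_leaks(heap, [0x1000]))          # the chunk at 0x3000 is reported
print(find_leaks(heap, [0x1000, 0x3008]))  # an unrelated integer hides it
```

In the second call, the root word 0x3008 is just data that happens to fall inside the leaked chunk, so the scan considers the chunk reachable. The smaller the address space, the more likely such coincidences become.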
Function multi-versioning
Updated in 2023-12.
GCC supports some function attributes for function multi-versioning: a way for a function to have multiple implementations, each using a different set of ISA extensions. A function attribute specifies different requirements of ISA extensions. The generated program decodes the CPU model and features at run-time, and picks the most restrictive implementation that is satisfied by the CPU, assuming that the most restrictive implementation has the best performance.
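The selection rule can be modeled as follows. This is a toy sketch, not GCC's resolver: FEATURE_PRIORITY, select_version, and the ranking values are hypothetical names and numbers chosen for illustration, and each version here requires at most one feature.

```python
# Hypothetical ranking: a larger number means a more restrictive requirement.
# This is an illustrative table, not GCC's real priority list.
FEATURE_PRIORITY = {"default": 0, "sse4.2": 1, "avx2": 2}


def select_version(versions, cpu_features):
    """Pick the most restrictive version the CPU satisfies.

    versions: {required_feature: implementation}; must include "default".
    cpu_features: set of feature names detected at run time.
    """
    usable = [f for f in versions if f == "default" or f in cpu_features]
    best = max(usable, key=FEATURE_PRIORITY.__getitem__)
    return versions[best]


versions = {
    "default": lambda: "generic code",
    "sse4.2": lambda: "SSE4.2 code",
    "avx2": lambda: "AVX2 code",
}
print(select_version(versions, {"sse4.2", "avx2"})())  # AVX2 code
print(select_version(versions, {"sse4.2"})())          # SSE4.2 code
print(select_version(versions, set())())               # generic code
```

The default version acts as the fallback, so the choice is always well defined as long as it is present.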
All about UndefinedBehaviorSanitizer
UndefinedBehaviorSanitizer (UBSan) is an undefined behavior detector for C/C++. It consists of code instrumentation and a runtime. Both components have multiple independent implementations.
Clang implemented the first few checks in 2009-12, initially named
-fcatch-undefined-behavior. In 2012 -fsanitize=undefined was added and
-fcatch-undefined-behavior was removed. GCC 4.9 implemented
-fsanitize=undefined in 2013-08.
The runtime used by Clang lives in llvm-project/compiler-rt/lib/ubsan.
GCC from time to time syncs its downstream fork of the sanitizers part
of compiler-rt (libsanitizer). The end of the article lists some
alternative runtime implementations.