This article describes SHF_ALLOC|SHF_COMPRESSED sections in ELF and lld's linker option --compress-sections to compress arbitrary sections.
C++ standard library ABI compatibility
Updated in 2023-11.
For a user who only uses one C++ standard library, such as libc++, there are typically three compatibility goals, each with increasing compatibility requirements:
- Can the program, built with a specific version of libc++, work with an upgraded libc++ shared object (DSO)?
- Can an executable and its DSOs be compiled with different versions of libc++ headers?
- Can two relocatable object files, compiled with different versions of libc++ headers, be linked into the same executable or DSO?
If we replace "different libc++ versions" with a mixture of libc++ and libstdc++, we encounter additional goals:
- Can the program, built with a specific version of libstdc++, work with an upgraded libstdc++ DSO?
- Can an executable, built with libc++, link against DSOs that were built with libstdc++?
- Can two relocatable object files, compiled with libc++ and libstdc++, or two libstdc++ versions, be linked into the same executable or DSO?
Considering static linking raises another interesting question:
If libc++ is statically linked into b.so, can it be used with a.out that links against a different version of libc++? Let's focus on the first three questions, which specifically pertain to libc++.
Port LLVM XRay to Apple systems
I do not use Apple products myself, but I sometimes delve into Mach-O due to my interest in object file formats. Additionally, my LLVM/Clang changes sometimes require some understanding of Mach-O. Occasionally, I need to understand the format to some extent to work around its quirks (the old format inherited many problems of "a.out").
Recently, there has been interest (from Oleksii Lozovskyi) in enabling XRay, a function call tracing system in LLVM, to work on Apple systems. Intrigued by this, I decided to delve into the details and investigate the necessary changes. XRay supports many 64-bit architectures on Linux and some BSDs. I became acquainted with XRay back in 2017 and have made some casual contributions since then.
Relocation overflow and code models
When linking an oversized executable, it is possible to encounter errors such as relocation truncated to fit: R_X86_64_PC32 against `.text' (GNU ld) or relocation R_X86_64_PC32 out of range (ld.lld). These diagnostics are a result of the relocation overflow check, a feature in the linker.
% gcc -fuse-ld=bfd @response.txt
This article aims to explain why such issues can occur and provides insights on how to mitigate them.
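As a rough illustration of what the overflow check verifies, here is a minimal sketch in C. It assumes the x86-64 psABI definition of R_X86_64_PC32, where the linker stores S + A - P as a signed 32-bit value (S is the symbol address, A the addend, P the address of the relocated location). The helper name pc32_in_range is hypothetical and is not taken from any linker's source code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a real linker API.
 * s: symbol address (S), a: addend (A), p: address of the place (P). */
static bool pc32_in_range(uint64_t s, int64_t a, uint64_t p) {
  /* Compute S + A - P with well-defined unsigned wraparound, then
   * reinterpret the result as a signed displacement. The conversion to
   * int64_t is implementation-defined in C but is two's complement on
   * common compilers. */
  int64_t value = (int64_t)(s + (uint64_t)a - p);
  /* The relocated field is only 32 bits wide and is sign-extended. */
  return value >= INT32_MIN && value <= INT32_MAX;
}
```

When the referenced symbol lies more than about 2GiB away from the relocated location, the helper returns false, which corresponds to the situation where a linker would report one of the diagnostics quoted above.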
Assemblers
This article provides a description of popular assemblers and their architecture-specific differences.
Assemblers
GCC generates assembly code and invokes GNU Assembler (also known as "gas"), which is part of GNU Binutils, to convert the assembly code into machine code. The GCC driver is also capable of accepting assembly input files. Due to GCC's widespread use, GNU Assembler is arguably the most popular assembler.
Within the LLVM project, the LLVM integrated assembler is a library that is linked by Clang, llvm-mc, and lld (for LTO purposes) to generate machine code. It supports a wide range of GNU Assembler syntax and can be used as a drop-in replacement for GNU Assembler.
On the Windows platform, the Microsoft Macro Assembler (MASM) is widely used.
On the IBM AIX platform, the AIX assembler is used. In 2019, IBM developers started to modify the LLVM integrated assembler to support the AIX syntax.
On the IBM z/OS platform, the IBM High Level Assembler (HLASM) is used. In 2021, IBM developers started to modify the LLVM integrated assembler to support the HLASM syntax.
Compiler output files
For a GCC or Clang command, there is typically one primary output file, specified by -o or the default (a.out or a.exe). There can also be temporary files and auxiliary files.
Linker notes on AArch32
UNDER CONSTRUCTION
This article describes target-specific details about AArch32 in ELF linkers. I described AArch64 in a previous article.
AArch32 is the 32-bit execution state for the Arm architecture and runs the A32 and T32 instruction sets. A32 refers to the old ISA with a 32-bit fixed width, while T32 refers to the mixed 16-bit and 32-bit Thumb2 instructions.
"AArch32", "A32", and "T32" are new names. Many projects use "ARM", "Arm", or "arm" as their port name.
ELF hash function may overflow
This article describes an interesting overflow bug in the ELF hash function.
The System V Application Binary Interface (generic ABI) specifies the ELF object file format. When producing an executable or shared object file needing a dynamic symbol table (.dynsym), a linker generates a .hash section with type SHT_HASH to hold a symbol hash table. A DT_HASH tag is produced to hold the address of .hash.
The hash table is used by a dynamic loader to perform symbol lookup (for dynamic relocations and dlsym family functions). A detailed description of the format can be found in ELF: symbol lookup via DT_HASH.
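The excerpt above does not spell out the bug, so the following is only a sketch of one way such an overflow can arise. It assumes the generic ABI's sample hash routine, which is written in terms of unsigned long, is compiled where unsigned long is 64 bits wide. The function names elf_hash_wide and elf_hash_u32 are hypothetical: the first models the wide arithmetic with uint64_t, the second confines the computation to 32 bits.

```c
#include <stdint.h>

/* Models the gABI sample code when unsigned long is 64 bits wide.
 * A carry out of bit 31 is never cleared, because the mask only
 * covers bits 28..31. */
static uint64_t elf_hash_wide(const unsigned char *name) {
  uint64_t h = 0, g;
  while (*name) {
    h = (h << 4) + *name++;
    if ((g = h & 0xf0000000) != 0)
      h ^= g >> 24;
    h &= ~g;
  }
  return h;
}

/* Same algorithm with 32-bit arithmetic: any carry out of bit 31 is
 * discarded, so the result always fits in 32 bits. */
static uint32_t elf_hash_u32(const unsigned char *name) {
  uint32_t h = 0, g;
  while (*name) {
    h = (h << 4) + *name++;
    if ((g = h & 0xf0000000) != 0)
      h ^= g >> 24;
    h &= ~g;
  }
  return h;
}
```

For ordinary identifiers such as "printf" the two functions agree. They diverge for a name like "\xff\x0f\x0f\x0f\x0f\x0f\x12": adding the last byte to 0xfffffff0 carries into bit 32, so the wide variant returns a value that no longer fits in 32 bits, while the 32-bit variant wraps around.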
lld 16 ELF changes
llvm-project 16 was just released. I added some lld/ELF notes to https://github.com/llvm/llvm-project/blob/release/16.x/lld/docs/ReleaseNotes.rst. Here I will elaborate on some changes.
Linker notes on AArch64
This article describes target-specific details about AArch64 in ELF linkers. AArch64 is the 64-bit execution state for the Arm architecture. The AArch64 execution state runs the A64 instruction set. The AArch32 and AArch64 execution states use very different instruction sets, so many pieces of software use two ports for the two execution states of the Arm architecture.
There were the "ARM architecture" and the "ARM instruction set", leading to many software projects using "ARM" or "arm" as their port names. In 2011, ARMv8 introduced two execution states, AArch32 and AArch64. The previous instruction sets "ARM" and "Thumb" were renamed to "A32" and "T32", respectively. In 2017, the architecture was renamed to the "Arm architecture" to reflect the rebranding of the company name. So, the "ARMv8-A" architecture profile is now named "Armv8-A".
For the AArch64 execution state, while many projects use "AArch64" as their port name, for legacy reasons, macOS, Windows, the Linux kernel, and some BSD operating systems unfortunately use "arm64". (Support for AArch64 was added to the Linux kernel in version 3.7. Initially, the patch set was named "aarch64", but it was later changed at the request of kernel developers.)