2023-08-25

Clang -Wunused-command-line-argument

clangDriver is the library implementing the compiler driver for Clang. It utilizes LLVMOption to process command line options. As options are processed when required, as opposed to using a large switch, Clang gets the ability to detect unused options straightforwardly.
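The idea can be illustrated with a minimal sketch. The `Arg` and `ArgList` types below are hypothetical stand-ins written for this example, not the real LLVMOption classes: querying an option marks it as claimed, so whatever is still unclaimed at the end was never consulted and can be reported as unused.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed command line argument.
struct Arg {
  std::string spelling;
  bool claimed;
};

// Hypothetical stand-in for the parsed argument list.
class ArgList {
  std::vector<Arg> args;

public:
  explicit ArgList(const std::vector<std::string> &spellings) {
    for (const std::string &s : spellings)
      args.push_back(Arg{s, false});
  }

  // Query an option on demand. Every matching argument is marked as claimed.
  bool hasArg(const std::string &spelling) {
    bool found = false;
    for (Arg &a : args) {
      if (a.spelling == spelling) {
        a.claimed = true;
        found = true;
      }
    }
    return found;
  }

  // Arguments that no code path ever queried.
  std::vector<std::string> unclaimed() const {
    std::vector<std::string> result;
    for (const Arg &a : args)
      if (!a.claimed)
        result.push_back(a.spelling);
    return result;
  }
};
```

With a large switch over all options, every option would be "handled" as soon as it is parsed, and this bookkeeping would carry no information.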

2023-08-20

My involvement with LLVM 17

LLVM 17 will soon be released. This post provides a summary of my contributions in this release cycle to record my learning progress.

2023-07-30

lld 17 ELF changes

LLVM 17 will be released. As usual, I maintain lld/ELF and have added some notes to https://github.com/llvm/llvm-project/blob/release/17.x/lld/docs/ReleaseNotes.rst. Here I will elaborate on some changes.

2023-07-16

Precompiled headers

C/C++ projects can benefit from using precompiled headers to improve compile time. GCC added support for precompiled headers in 2003 (version 3.4), and the current documentation can be found at https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html.

Even with the emergence of C++ modules, precompiled headers remain relevant for several reasons:

Precompiled headers share implementation aspects with modules (e.g., AST serialization in Clang).
Many C++ projects rely on the traditional compilation model and are not converted to C++ modules.
Modules may use some preamble-like technology to accelerate IDE-centric operations.
C doesn't have C++ modules.

This article focuses on Clang precompiled headers (PCH). Let's begin with an example.

2023-07-07

Compressed arbitrary sections

This article describes SHF_ALLOC|SHF_COMPRESSED sections in ELF and lld's linker option --compress-sections to compress arbitrary sections.
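As a small illustration, the flag combination can be tested on a section header's `sh_flags` value. The constant values below are the ones defined by the ELF specification; the names `kShfAlloc`, `kShfCompressed`, and `isCompressedAlloc` are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kShfAlloc = 0x2;        // SHF_ALLOC: occupies memory at run time
constexpr uint64_t kShfCompressed = 0x800; // SHF_COMPRESSED: content is compressed

// True if the section is both allocated and compressed.
bool isCompressedAlloc(uint64_t shFlags) {
  return (shFlags & kShfAlloc) != 0 && (shFlags & kShfCompressed) != 0;
}
```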

2023-06-25

C++ standard library ABI compatibility

Updated in 2023-11.

For a user who only uses one C++ standard library, such as libc++, there are typically three compatibility goals, each with increasing compatibility requirements:

Can the program, built with a specific version of libc++, work with an upgraded libc++ shared object (DSO)?
Can an executable and its DSOs be compiled with different versions of libc++ headers?
Can two relocatable object files, compiled with different versions of libc++ headers, be linked into the same executable or DSO?

If we replace "different libc++ versions" with a mixture of libc++ and libstdc++, we encounter additional goals:

Can the program, built with a specific version of libstdc++, work with an upgraded libstdc++ DSO?
Can an executable, built with libc++, link against DSOs that were built with libstdc++?
Can two relocatable object files, compiled with libc++ and libstdc++, or two libstdc++ versions, be linked into the same executable or DSO?

Considering static linking raises another interesting question:

If libc++ is statically linked into b.so, can it be used with a.out that links against a different version of libc++?

Let's focus on the first three questions, which specifically pertain to libc++.

2023-06-18

Port LLVM XRay to Apple systems

I do not use Apple products myself, but I sometimes delve into Mach-O due to my interest in object file formats. Additionally, my LLVM/Clang changes sometimes require some understanding of Mach-O. Occasionally, I need to understand the format to some extent to work around its quirks (the old format inherited many problems of "a.out").

Recently, there has been interest (from Oleksii Lozovskyi) in enabling XRay, a function call tracing system in LLVM, to work on Apple systems. Intrigued by this, I decided to delve into the details and investigate the necessary changes. XRay supports many 64-bit architectures on Linux and some BSDs. I became acquainted with XRay back in 2017 and have made some casual contributions since then.

2023-05-14

Relocation overflow and code models

Updated in 2025-12.

When linking an oversized executable, it is possible to encounter errors such as relocation truncated to fit: R_X86_64_PC32 against `.text' (GNU ld) or relocation R_X86_64_PC32 out of range (ld.lld). These diagnostics are a result of the relocation overflow check, a feature in the linker.

% gcc -fuse-ld=bfd @response.txt
...
a.o: in function `_start':
(.text+0x0): relocation truncated to fit: R_X86_64_PC32 against `.text'
% gcc -fuse-ld=lld @response.txt
ld.lld: error: a.o:(.text+0x0): relocation R_X86_64_PC32 out of range: -2147483649 is not in [-2147483648, 2147483647]; references section '.text'

This article aims to explain why such issues can occur and provides insights on how to mitigate them.
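The check behind these diagnostics can be sketched as follows. R_X86_64_PC32 computes S + A - P (symbol address plus addend minus the address of the relocated location) and stores the result in a signed 32-bit field. The function names here are made up for illustration and are not taken from any linker's source.

```cpp
#include <cassert>
#include <cstdint>

// Value of a PC-relative relocation: S + A - P.
// The arithmetic is done in unsigned 64-bit to get well-defined wraparound.
int64_t pc32Value(uint64_t s, int64_t a, uint64_t p) {
  return static_cast<int64_t>(s + static_cast<uint64_t>(a) - p);
}

// True if v is representable in a signed 32-bit field,
// i.e. v is in [-2147483648, 2147483647].
bool fitsInt32(int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}
```

For example, with S = 0, A = -4, and P = 0x7ffffffd, the value is -2147483649, which is one below the lower bound and matches the range quoted in the ld.lld message above.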

2023-05-08

Assemblers

Updated in 2025-05.

This article describes popular assemblers and their architecture-specific differences.

Assemblers

GCC generates assembly code and invokes GNU Assembler (also known as "gas"), which is part of GNU Binutils, to convert the assembly code into machine code. The GCC driver is also capable of accepting assembly input files. Due to GCC's widespread use, GNU Assembler is arguably the most popular assembler.

Within the LLVM project, the LLVM integrated assembler is a library that is linked by Clang, llvm-mc, and lld (for LTO purposes) to generate machine code. It supports a wide range of GNU Assembler syntax and can be used as a drop-in replacement for GNU Assembler.

On the Windows platform, the Microsoft Macro Assembler (MASM) is widely used.

On the IBM AIX platform, the AIX assembler is used. In 2019, IBM developers started to modify the LLVM integrated assembler to support the AIX syntax.

On the IBM z/OS platform, the IBM High Level Assembler (HLASM) is used. In 2021, IBM developers started to modify the LLVM integrated assembler to support the HLASM syntax.

Concepts

Sections are named, contiguous blocks of code or data within an object file. They allow you to logically group related parts of your program. The assembler places code and data into these sections as it processes the source file.

Symbols are names that represent memory addresses or values.

2023-04-25

Compiler output files

For a GCC or Clang command, there is typically one primary output file, specified by -o or the default (a.out or a.exe). There can also be temporary files and auxiliary files.