# Control-flow integrity

A control-flow graph (CFG) is a graph representation of all paths that might be traversed through a program during its execution. Control-flow integrity (CFI) refers to security policy dictating that program execution must follow a control-flow graph. This article describes some features that compilers and hardware can use to enforce CFI, with a focus on llvm-project implementations.

CFI schemes are typically divided into forward-edge (e.g. indirect calls) and backward-edge (mainly function returns). It should be noted that exception handling and symbol interposition are not included in these categories, as far as my understanding goes.

In GNU ld, -r produces a relocatable object file. This is known as relocatable linking or partial linking. This mode suppresses many passes done for an executable or shared object output (in -no-pie/-pie/-shared modes). -r, -no-pie, -pie, and -shared specify 4 different modes. The 4 options are mutually exclusive.

The relocatable output can be used for analysis and binary manipulation. Then, the output can be used to link the final executable or shared object.

Let's go through various linker passes and see how relocatable linking changes the operation.

# ODR violation detection

This article describes how to detect C++ One Definition Rule (ODR) violations. There are many good resources on the Internet about how ODR violations can introduce subtle bugs, so I will not repeat that here.

# _FORTIFY_SOURCE

glibc 2.3.4 introduced _FORTIFY_SOURCE in 2004 to catch security errors due to misuse of some C library functions. The initially supported functions were fprintf, gets, memcpy, memmove, mempcpy, memset, printf, snprintf, sprintf, stpcpy, strcat, strcpy, strncat, strncpy, vfprintf, vprintf, vsnprintf, vsprintf and focused on buffer overflow detection and dangerous printf %n uses. The implementation leverages inline functions and __builtin_object_size (see [PATCH] Object size checking to prevent (some) buffer overflows). More functions were added over time and __builtin_constant_p was used as well. As of 2022-11 glibc defines 79 default version *_chk functions.

# lld linked musl on PowerPC64

• /usr/lib/ld-musl-powerpc64le.so.1 /path/to/thing worked. The kernel ELF loader loads rtld and rtld loads the executable.
• /path/to/thing segfaulted. The kernel ELF loader loads both rtld and the executable.

Therefore the bug is likely due to a difference between the two modes.

# Distribution of debug information

Note: The article will likely get frequent updates in the next few days.

This article describes some approaches to distribute debug information. Commands below will use two simple C files for demonstration.

# C minifier with Clang

I recently revamped Competitive programming in Nim. In short, I can create a C amalgamation from a Nim program and submit the C source code to various competitive programming websites.

Then I use a Clang based tool to shorten the C source code. It does two things:

• Shorten function, variables, and type names
• Use the clangFormat library to remove some whitespace

For the first step, the tool uses a derived ASTFrontendAction to traverse the AST twice, one for collecting function/var/type names and the other for renaming. Building clang::CompilerInstance from command lines needs some boilerplate. An alternative is to use clang::tooling::CommonOptionsParser and clang::tooling::ClangTool.

CMakeLists.txt

Define LLVM as the llvm-project repository and LLVMOUT as the build directory (make sure you have at least built these targets: ninja clang clangFormat clangIndex clangTooling).

If LLVM and Clang's CMake, library, and header files are installed in well-known locations, then -DCMAKE_PREFIX_PATH can be omitted.

It's certainly not straightforward to find all these APIs. I mainly use ccls as a reference which was inspired by clangIndex. For writing this tool, I read a bit code of clang-rename, clang-format, and C-Reduce clang_delta. C-Reduce provides clang_delta/RenameFun.cpp and two other passes (RenameVar, RenameParam) which do similar stuff. Its code was a bit old now as it was written based on a Clang in circa 2012.

Let's see an example. Unfortunately I don't find clangFormat options removing whitespace after = and ,. That can perhaps be done by a post-processing string substitution tool without introducing too much risk.

# Layering check with Clang

Updated in 2023-01.

This article describes some Clang header modules features which apply to #include. They enforce a more explicit dependency graph. Strict dependency information provides documentation purposes and makes refactoring convenient. Include What You Use describes the benefits of clean header inclusions well, so I will not repeat it here.

When C++20 modules are used, the features apply to #include in a global module fragment (module;) but have no effect for import declarations.

## Layering check

### -fmodules-decluse

For a #include directive, this option emits an error if the following conditions are satisfied (see clang/lib/Lex/ModuleMap.cpp diagnoseHeaderInclusion):

• The main file is within a module (called "source module", say, A).
• The main file or an included file from the source module includes a file from another module B.
• A does not have a use-declaration of B (no use B).

For the first condition, -fmodule-map-file= is needed to load the source module map and -fmodule-name=A is needed to indicate that the source file is logically part of module A.

For the second condition, the module map defining B must be loaded by specifying -fimplicit-module-maps (implied by -fmodules and -fcxx-modules) or a -fmodule-map-file=.

# zstd compressed debug sections

Updated in 2022-10.

In January I wrote Compressed debug sections. The venerable zlib shows its age and there are replacements which are better in every metric except adoption and a larger memory footprint. The obvious choice was Zstandard, but I was not so confident about adoptinig it and solving the ecosystem issue. At any rate, I slowly removed some legacy .zdebug support from llvm-project so that a new format could be more easily introduced.