All about UndefinedBehaviorSanitizer

UndefinedBehaviorSanitizer (UBSan) is an undefined behavior detector for C/C++. It consists of code instrumentation and a runtime. Both components have multiple independent implementations.

Clang implemented the first few checks in 2009-12, initially named -fcatch-undefined-behavior. In 2012 -fsanitize=undefined was added and -fcatch-undefined-behavior was removed. GCC 4.9 implemented -fsanitize=undefined in 2013-08.

The runtime used by Clang lives in llvm-project/compiler-rt/lib/ubsan. GCC periodically syncs libsanitizer, its downstream fork of the sanitizer part of compiler-rt. The end of the article lists some alternative runtime implementations.
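The following hand-written C sketch shows how the instrumentation and the runtime fit together. It approximates what -fsanitize=signed-integer-overflow does to an int addition: the compiler computes the sum with an overflow check and calls a runtime handler if the check fails. The helpers `instrumented_add` and `report_add_overflow` and the counter `overflow_reports` are hypothetical stand-ins. The real compiler-rt entry point is `__ubsan_handle_add_overflow`, and the compiler emits the check inline instead of calling a helper.

```c
#include <stdio.h>

/* Hypothetical counter standing in for the runtime's bookkeeping. */
static int overflow_reports;

/* Stand-in for a runtime handler such as __ubsan_handle_add_overflow.
   The message is modeled on UBSan's report. */
static void report_add_overflow(int a, int b) {
  overflow_reports++;
  fprintf(stderr,
          "runtime error: signed integer overflow: %d + %d cannot be "
          "represented in type 'int'\n",
          a, b);
}

/* Roughly what an instrumented `a + b` does. __builtin_add_overflow
   (GCC and Clang) returns true on overflow and stores the result
   truncated to int, i.e. the wrapped value. */
static int instrumented_add(int a, int b) {
  int result;
  if (__builtin_add_overflow(a, b, &result))
    report_add_overflow(a, b);
  return result; /* execution continues after the report */
}
```

Clang enables recovery by default for most UBSan checks, so the program keeps running after a report. The sketch imitates this by returning the wrapped sum instead of aborting.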

All about sanitizer interceptors

Many sanitizers want to know every function in the program. User functions are instrumented and therefore known by the sanitizer runtime. For library functions, some (e.g. mmap, munmap, memory allocation/deallocation functions, longjmp, vfork) need special treatment. Sanitizers leverage symbol interposition to redirect calls to such functions to their own implementations: interceptors. Other library functions can be treated as normal user code, and either instrumenting them or providing interceptors works.

In some cases instrumenting is infeasible:

  • Assembly source files usually do not call sanitizer callbacks, and making them do so is inconvenient
  • Many libc implementations cannot be instrumented. When can glibc be built with Clang?
  • Some functions have performance issues if instrumented instead of intercepted (mostly mem* and str*)

In these cases, interceptors may be the practical choice.
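Below is a minimal sketch of interception through symbol interposition. It assumes Linux with glibc and a dynamically linked executable; glibc versions before 2.34 also need -ldl. Because the executable defines getenv, the program's calls to getenv bind to this definition and not to libc's. The interceptor then forwards to the "real" function, which it looks up with dlsym(RTLD_NEXT, ...). The Linux sanitizer runtimes locate original functions in a similar way. The choice of getenv and the names `getenv_calls` and `real_getenv` are illustrative; the sanitizer runtime does not use them.

```c
#define _GNU_SOURCE /* exposes RTLD_NEXT in <dlfcn.h>; must precede all includes */
#include <dlfcn.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping; a sanitizer would update its own state here.
   Deliberately non-static so the compiler treats it as escaping. */
int getenv_calls;

/* The next definition of getenv in symbol search order (libc's). */
static char *(*real_getenv)(const char *);

/* Defining getenv in the executable interposes libc's definition, so
   calls made from this program land here first. */
char *getenv(const char *name) {
  if (real_getenv == NULL)
    real_getenv = (char *(*)(const char *))dlsym(RTLD_NEXT, "getenv");
  getenv_calls++;
  return real_getenv(name);
}
```

A production runtime also has to handle interceptors that run before the real pointer is resolved. For example, dlsym itself may call malloc, which a sanitizer also intercepts.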

This article talks about how interceptors work and the requirements of sanitizer interceptors.

2022 summary

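As in previous years, I mostly worked on toolchains. By contributing to these high-profile OSS projects, I hope to change the world, even if only from this small angle.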

Highlights

  • RELR relative relocation format (glibc, musl, DynamoRIO)
  • zstd compressed debug sections (binutils, gdb, clang, lld/ELF, lldb)
  • lld/ELF (huge performance improvement, RISC-V linker relaxation, SHT_RISCV_ATTRIBUTES)
  • Clang built glibc (get the ball rolling)
  • Make protected symbols work in binutils/glibc
  • Involved in sanitizers, ThinLTO, AArch64/x86 hardening features, AArch64 Memtag ABI, RISC-V psABI, etc

kth element in a subarray

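This article summarizes classic data structures for the range k-th smallest problem. Given an array of length n whose elements are integers in [0,σ), answer m queries, each asking for the k-th smallest element in the subarray [l,r).

Some methods also support an extended problem with m operations, each of which either modifies the element at some position or asks for the k-th smallest element in [l,r).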
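The sketch below shows one classic approach to the static problem: a persistent segment tree over the value range [0,σ). It is only one of several possible methods. Version i stores the multiset a[0..i), so subtracting the counts of version l from those of version r describes a[l..r). A query descends from the root in O(log σ) time, and building the tree takes O(n log σ) time and memory. The names (`KthTree`, `kth_build`, `kth_query`, `kth_free`) are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* One node of a persistent segment tree over a value range [lo, hi). */
typedef struct {
  int left, right; /* child indices; node 0 is the shared empty tree */
  int count;       /* number of inserted values falling in this range */
} Node;

typedef struct {
  Node *pool; /* nodes of all versions */
  int used;   /* nodes allocated so far */
  int *roots; /* roots[i]: version containing the prefix a[0..i) */
  int sigma;  /* values lie in [0, sigma) */
} KthTree;

/* Returns the root of a new version equal to `prev` plus one copy of v.
   Only the nodes on the path to v are copied. */
static int kth_insert(KthTree *t, int prev, int lo, int hi, int v) {
  int cur = t->used++;
  t->pool[cur] = t->pool[prev];
  t->pool[cur].count++;
  if (hi - lo > 1) {
    int mid = lo + (hi - lo) / 2;
    if (v < mid) {
      int child = kth_insert(t, t->pool[prev].left, lo, mid, v);
      t->pool[cur].left = child;
    } else {
      int child = kth_insert(t, t->pool[prev].right, mid, hi, v);
      t->pool[cur].right = child;
    }
  }
  return cur;
}

KthTree kth_build(const int *a, int n, int sigma) {
  int levels = 1; /* nodes on a root-to-leaf path: ceil(log2(sigma)) + 1 */
  for (long s = 1; s < sigma; s *= 2)
    levels++;
  KthTree t;
  t.pool = calloc((size_t)n * levels + 1, sizeof(Node)); /* node 0 zeroed */
  t.roots = malloc(sizeof(int) * ((size_t)n + 1));
  assert(t.pool && t.roots);
  t.used = 1;
  t.sigma = sigma;
  t.roots[0] = 0;
  for (int i = 0; i < n; i++) {
    assert(0 <= a[i] && a[i] < sigma);
    t.roots[i + 1] = kth_insert(&t, t.roots[i], 0, sigma, a[i]);
  }
  return t;
}

/* k-th smallest (0-based) in a[l..r). Versions r and l are walked in
   lockstep; their count difference is the number of elements of a[l..r)
   in the current value range. */
int kth_query(const KthTree *t, int l, int r, int k) {
  assert(0 <= l && l < r && 0 <= k && k < r - l);
  int x = t->roots[l], y = t->roots[r];
  int lo = 0, hi = t->sigma;
  while (hi - lo > 1) {
    int mid = lo + (hi - lo) / 2;
    int in_left = t->pool[t->pool[y].left].count - t->pool[t->pool[x].left].count;
    if (k < in_left) {
      x = t->pool[x].left, y = t->pool[y].left, hi = mid;
    } else {
      k -= in_left;
      x = t->pool[x].right, y = t->pool[y].right, lo = mid;
    }
  }
  return lo;
}

void kth_free(KthTree *t) {
  free(t->pool);
  free(t->roots);
}
```

For a = {3,1,4,1,5,9,2,6} and σ = 10, `kth_query(&t, 2, 5, 1)` looks at {4,1,5} and returns 4. The sketch does not handle the extended problem with point modifications.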

Control-flow integrity

Updated in 2023-05.

A control-flow graph (CFG) is a graph representation of all paths that might be traversed through a program during its execution. Control-flow integrity (CFI) refers to a security policy dictating that program execution must follow a control-flow graph. This article describes some features that compilers and hardware can use to enforce CFI, with a focus on llvm-project implementations.

CFI schemes are typically divided into forward-edge (e.g. indirect calls) and backward-edge (mainly function returns). As far as I understand, exception handling and symbol interposition fit in neither category.
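A forward-edge check asks one question before an indirect call: does the target belong to the set of functions that this call site may legitimately reach? The sketch below hard-codes such a set for a single function type. All names are hypothetical. Clang's -fsanitize=cfi-icall derives these sets from function types at LTO time. Roughly speaking, it replaces the linear scan below with a range and alignment check against a jump table.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*unary_fn)(int);

static int twice(int x) { return 2 * x; }
static int negate(int x) { return -x; }
static int square(int x) { return x * x; } /* same type, but not in the allowed set */

/* Hypothetical allow-list of valid targets for calls through unary_fn. */
static const unary_fn valid_unary_targets[] = {twice, negate};

static bool is_valid_unary_target(unary_fn f) {
  for (size_t i = 0; i < sizeof valid_unary_targets / sizeof valid_unary_targets[0]; i++)
    if (valid_unary_targets[i] == f)
      return true;
  return false;
}

/* Checked indirect call. A real CFI scheme traps on failure; this sketch
   returns false so the behavior can be observed. */
static bool checked_call(unary_fn f, int arg, int *result) {
  if (!is_valid_unary_target(f))
    return false;
  *result = f(arg);
  return true;
}
```

The check rejects `square` even though its type matches. Membership in the set matters, not just the signature.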

Relocatable linking

Updated in 2023-09.

In GNU ld, -r produces a relocatable object file. This is known as relocatable linking or partial linking. This mode skips many passes that are performed when producing an executable or shared object (the -no-pie, -pie, and -shared modes). The options -r, -no-pie, -pie, and -shared select four mutually exclusive modes.

The relocatable output can be analyzed or manipulated at the binary level, and then used to link the final executable or shared object.

clang -pie a.o b.o

# ==>

clang -r a.o b.o -o r.o
clang -pie r.o

Let's go through various linker passes and see how relocatable linking changes the operation.

_FORTIFY_SOURCE

glibc 2.3.4 introduced _FORTIFY_SOURCE in 2004 to catch security errors due to misuse of some C library functions. The initially supported functions were fprintf, gets, memcpy, memmove, mempcpy, memset, printf, snprintf, sprintf, stpcpy, strcat, strcpy, strncat, strncpy, vfprintf, vprintf, vsnprintf, vsprintf and focused on buffer overflow detection and dangerous printf %n uses. The implementation leverages inline functions and __builtin_object_size (see [PATCH] Object size checking to prevent (some) buffer overflows). More functions were added over time and __builtin_constant_p was used as well. As of 2022-11 glibc defines 79 default version *_chk functions.
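The mechanism can be sketched in a few lines. `__builtin_object_size(p, 0)` is available in GCC and Clang. It yields the number of bytes remaining in the object that p points to, or (size_t)-1 when the compiler cannot tell. The wrapper passes that bound to a checking function. The names `SKETCH_MEMCPY`, `sketch_memcpy_chk`, and `chk_failed` are hypothetical. glibc instead uses always-inline wrappers that call `__builtin___memcpy_chk`, and its `__memcpy_chk` calls `__chk_fail`, which terminates the program. This sketch records the failure instead, so the behavior can be observed.

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: glibc would call __chk_fail() and terminate instead. */
static int chk_failed;

/* Copies n bytes unless that would overflow the dstlen-byte destination. */
void *sketch_memcpy_chk(void *dst, const void *src, size_t n, size_t dstlen) {
  if (n > dstlen) {
    chk_failed = 1;
    return NULL;
  }
  return memcpy(dst, src, n);
}

/* If the compiler cannot determine the size of dst, the bound is
   (size_t)-1 and the check never fires. */
#define SKETCH_MEMCPY(dst, src, n) \
  sketch_memcpy_chk((dst), (src), (n), __builtin_object_size((dst), 0))
```

The bound is only as good as the compiler's knowledge of the object. This is why the real implementation relies on inlining and, in practice, on building with optimization.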

lld linked musl on PowerPC64

I was asked about a segfault involving an lld-linked musl libc.so on PowerPC64.

  • /usr/lib/ld-musl-powerpc64le.so.1 /path/to/thing worked. The kernel ELF loader loads rtld and rtld loads the executable.
  • /path/to/thing segfaulted. The kernel ELF loader loads both rtld and the executable.

Therefore the bug is likely due to a difference between the two modes.

Distribution of debug information

Note: The article will likely get frequent updates in the next few days.

This article describes some approaches to distributing debug information. The commands below use two simple C files for demonstration.

cat > a.c <<eof
void foo(int);
int main() { foo(42); }
eof
cat > b.c <<eof
#include <stdio.h>
void foo(int x) { printf("%d\n", x); }
eof