Unwinding through a signal handler

This post has some notes about unwinding through a signal handler. You may want to read Stack unwinding first.

// a.c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <libunwind.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void handler(int signo) {
  unw_context_t context;
  unw_cursor_t cursor;
  // Capture the current register state and start unwinding from this frame.
  unw_getcontext(&context);
  unw_init_local(&cursor, &context);
  unw_word_t pc, sp;
  do {
    unw_get_reg(&cursor, UNW_REG_IP, &pc);
    unw_get_reg(&cursor, UNW_REG_SP, &sp);
    printf("pc=0x%016zx sp=0x%016zx", (size_t)pc, (size_t)sp);
    // Resolve pc to its containing module and the nearest exported symbol.
    Dl_info info = {};
    if (dladdr((void *)pc, &info))
      printf(" %s:%s", info.dli_fname, info.dli_sname ? info.dli_sname : "");
    puts("");
  } while (unw_step(&cursor) > 0);  // a positive value means a caller frame was found
  exit(0);
}

int main() {
  signal(SIGUSR1, handler);
  raise(SIGUSR1);
  return 1;  // not reached: the handler calls exit(0)
}

(printf and dladdr are not required to be async-signal-safe, but here the signal is raised synchronously by raise at a known point in main, so calling them in the handler cannot cause problems.)

Tip: we can additionally add the following code block to print the memory mappings.

char buf[128];
FILE *f = fopen("/proc/self/maps", "r");
if (f) {  // skip silently if /proc is unavailable
  while (fgets(buf, sizeof buf, f))
    printf("%s", buf);
  fclose(f);
}

Build the program with either llvm-project libunwind or nongnu libunwind:

# ninja -C /tmp/Debug unwind builtins
clang -g -I llvm-project/libunwind/include a.c -no-pie --unwindlib=libunwind --rtlib=compiler-rt -ldl -Wl,-E,-rpath,/tmp/Debug/lib/x86_64-unknown-linux-gnu -o llvm

# autoreconf -i; mkdir -p out/debug; cd out/debug; ../../configure CFLAGS='-O0 -g' CXXFLAGS='-O0 -g'; make -j 20
libunwind=/tmp/p/libunwind
clang -g -I $libunwind/include -I $libunwind/out/debug/include a.c -no-pie $libunwind/out/debug/src/.libs/libunwind.a $libunwind/out/debug/src/.libs/libunwind-x86_64.a -llzma -ldl -Wl,-E -o nongnu
(Some targets default to -fno-asynchronous-unwind-tables. In the absence of C++ exceptions, we need at least -funwind-tables.)

glibc x86-64

With either implementation, the output looks like the following on Linux glibc x86-64. I annotated the lines with location information.

pc=0x0000000000206d2a sp=0x00007fffd366bce0 ./nongnu: # in handler, the instruction after call unw_getcontext
pc=0x00007f5962cb0920 sp=0x00007fffd366c500 /lib/x86_64-linux-gnu/libc.so.6: # __restore_rt
pc=0x00007f5962cb08a1 sp=0x00007fffd366d200 /lib/x86_64-linux-gnu/libc.so.6:gsignal # raise
pc=0x0000000000206cfd sp=0x00007fffd366d320 ./nongnu:main
pc=0x00007f5962c9b7fd sp=0x00007fffd366d340 /lib/x86_64-linux-gnu/libc.so.6:__libc_start_main
pc=0x0000000000206bba sp=0x00007fffd366d410 ./nongnu:_start # from crt1.o

__restore_rt is a signal trampoline defined in glibc sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:

  nop
.align 16
__restore_rt:
movq $15, %rax # __NR_rt_sigreturn
syscall
(Newer ports use VDSO.)

glibc's sigaction sets the sa_restorer field of sigaction to __restore_rt and sets the SA_RESTORER flag. The kernel sets up the __restore_rt frame with the saved process context (a ucontext_t structure) before jumping to the signal handler. See kernel arch/x86/kernel/signal.c:setup_rt_frame. Upon returning from the signal handler, control passes to __restore_rt. See man 2 sigreturn.

__restore_rt is implemented in assembly. It comes with DWARF call frame information in .eh_frame.

% llvm-dwarfdump -eh-frame /lib/x86_64-linux-gnu/libc.so.6
...
00002458 00000010 00000000 CIE
Format: DWARF32
Version: 1
Augmentation: "zRS"
Code alignment factor: 1
Data alignment factor: -8
Return address column: 16
Augmentation data: 1B

DW_CFA_nop:
DW_CFA_nop:


0000246c 00000078 00000018 FDE cie=00002458 pc=0003c91f...0003c929
Format: DWARF32
DW_CFA_def_cfa_expression: DW_OP_breg7 RSP+160, DW_OP_deref
DW_CFA_expression: R8 DW_OP_breg7 RSP+40
DW_CFA_expression: R9 DW_OP_breg7 RSP+48
DW_CFA_expression: R10 DW_OP_breg7 RSP+56
DW_CFA_expression: R11 DW_OP_breg7 RSP+64
DW_CFA_expression: R12 DW_OP_breg7 RSP+72
DW_CFA_expression: R13 DW_OP_breg7 RSP+80
DW_CFA_expression: R14 DW_OP_breg7 RSP+88
DW_CFA_expression: R15 DW_OP_breg7 RSP+96
DW_CFA_expression: RDI DW_OP_breg7 RSP+104
DW_CFA_expression: RSI DW_OP_breg7 RSP+112
DW_CFA_expression: RBP DW_OP_breg7 RSP+120
DW_CFA_expression: RBX DW_OP_breg7 RSP+128
DW_CFA_expression: RDX DW_OP_breg7 RSP+136
DW_CFA_expression: RAX DW_OP_breg7 RSP+144
DW_CFA_expression: RCX DW_OP_breg7 RSP+152
DW_CFA_expression: RSP DW_OP_breg7 RSP+160
DW_CFA_expression: RIP DW_OP_breg7 RSP+168
DW_CFA_nop:
DW_CFA_nop:

0x3c91f: CFA=DW_OP_breg7 RSP+160, DW_OP_deref: RAX=[DW_OP_breg7 RSP+144], RDX=[DW_OP_breg7 RSP+136], RCX=[DW_OP_breg7 RSP+152], RBX=[DW_OP_breg7 RSP+128], RSI=[DW_OP_breg7 RSP+112], RDI=[DW_OP_breg7 RSP+104], RBP=[DW_OP_breg7 RSP+120], RSP=[DW_OP_breg7 RSP+160], R8=[DW_OP_breg7 RSP+40], R9=[DW_OP_breg7 RSP+48], R10=[DW_OP_breg7 RSP+56], R11=[DW_OP_breg7 RSP+64], R12=[DW_OP_breg7 RSP+72], R13=[DW_OP_breg7 RSP+80], R14=[DW_OP_breg7 RSP+88], R15=[DW_OP_breg7 RSP+96], RIP=[DW_OP_breg7 RSP+168]
...

The DW_OP_breg7 RSP offsets correspond to the ucontext_t offsets of these registers.

% cat sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c
...
do_cfa_expr \
do_expr (8 /* r8 */, oR8) \
do_expr (9 /* r9 */, oR9) \
do_expr (10 /* r10 */, oR10) \
% cat sysdeps/unix/sysv/linux/x86_64/ucontext_i.sym
...
#define ucontext(member) offsetof (ucontext_t, member)
#define mcontext(member) ucontext (uc_mcontext.member)
#define mreg(reg) mcontext (gregs[REG_##reg])

oRBP mreg (RBP)
oRSP mreg (RSP)
oRBX mreg (RBX)

With this information, libunwind can unwind through the trampoline without knowing the ucontext_t structure. Note that all general-purpose registers are encoded. libunwind/docs/unw_get_reg.man says:

However, for signal frames (see unw_is_signal_frame(3)), it is usually possible to access all registers.

Volatile registers are also saved in the saved process context information. This is different from other frames where volatile registers' information is typically lost.

glibc AArch64

The output looks like:

pc=0x0000000000214b10 sp=0x0000ffffe81f6050 ./nongnu: # handler
pc=0x0000ffffa55cd5b0 sp=0x0000ffffe81f6c70 linux-vdso.so.1:__kernel_rt_sigreturn
pc=0x0000ffffa5438070 sp=0x0000ffffe81f7ed0 /lib/aarch64-linux-gnu/libc.so.6:gsignal
pc=0x0000000000214bfc sp=0x0000ffffe81f8000 ./nongnu:main
pc=0x0000ffffa5425090 sp=0x0000ffffe81f8010 /lib/aarch64-linux-gnu/libc.so.6:__libc_start_main
pc=0x00000000002149cc sp=0x0000ffffe81f8160 ./nongnu: # _start
pc=0x00000000002149cc sp=0x0000ffffe81f8160 ./nongnu: # _start

As a relatively new port, Linux AArch64 defines the signal trampoline __kernel_rt_sigreturn in the VDSO (see arch/arm64/kernel/vdso/sigreturn.S). This is unlike x86-64 which defines the function in libc. We can use gdb to dump the VDSO.

(gdb) i proc m
process 430749
...
0xfffff7ffc000 0xfffff7ffd000 0x1000 0x0 [vdso]
(gdb) dump binary memory vdso.so 0xfffff7ffc000 0xfffff7ffd000

  nop

.globl __kernel_rt_sigreturn
__kernel_rt_sigreturn:
mov x8, #__NR_rt_sigreturn // 0x8b
svc #0x0

As of Linux 5.8 (https://git.kernel.org/linus/87676cfca14171fc4c99d96ae2f3e87780488ac4), vdso.so does not have PT_GNU_EH_FRAME. Therefore unwinders (llvm-project libunwind, nongnu libunwind, libgcc_s.so.1) ignore its unwind tables. In gdb, gdb/aarch64-linux-tdep.c recognizes the two instructions and encodes how the kernel sets up the ucontext_t structure.

Previously, vdso.so contained a small set of CFI instructions to encode X29 (FP) and X30 (LR).

% llvm-dwarfdump -eh-frame vdso.so
000000c0 0000001c 00000000 CIE
Format: DWARF32
Version: 1
Augmentation: "zRS"
Code alignment factor: 4
Data alignment factor: -8
Return address column: 30
Augmentation data: 1B

DW_CFA_def_cfa: WSP +0
DW_CFA_def_cfa: W29 +0
DW_CFA_offset: W29 0
DW_CFA_offset_extended_sf: W30 8
DW_CFA_nop:
DW_CFA_nop:
DW_CFA_nop:

CFA=W29: W29=[CFA], W30=[CFA+8]

000000e0 00000010 00000024 FDE cie=000000c0 pc=000005b0...000005b8
Format: DWARF32
DW_CFA_nop:
DW_CFA_nop:
DW_CFA_nop:

0x5b0: CFA=W29: W29=[CFA], W30=[CFA+8]

However, there was a serious problem: CFI cannot describe a signal trampoline frame. AArch64 does not define a register number for PC and provides no direct way to encode the PC of the previous frame. Instead, it sets return_address_register to X30 and the unwinder updates the PC to whatever value the saved X30 is. Actually, with unw_get_reg(&cursor, UNW_REG_IP, &pc); unw_get_reg(&cursor, UNW_AARCH64_X30, &x30);, we know pc == x30. This approach works fine when LR forms a chain since we know between two adjacent frames, the sets {PC, X30} differ by one element. However, when unwinding through the signal trampoline, the CFI can describe the previous PC but not the previous X30.

musl x86-64

src/signal/x86_64/restore.s implements a signal trampoline __restore_rt. There is no .eh_frame information.

nongnu libunwind does not know that __restore_rt is a signal trampoline (unw_is_signal_frame always returns 0). On ELF targets, -O1 and above typically imply -fomit-frame-pointer and many functions do not save RBP. Note: some functions may save RBP even with -fomit-frame-pointer.

In the absence of a valid frame chain, combined with the fact that nongnu libunwind does not recognize Linux x86-64's signal trampoline, libunwind cannot unwind through the __restore_rt frame. gdb recognizes the signal trampoline frame and with its FP-based unwinding it can retrieve several frames, but not the ones above raise.

% ld.lld @response.release.txt && ./nongnu
pc=0x0000000000206add sp=0x00007ffc018618a0 0 ./nongnu:
pc=0x00007f9fedcd602f sp=0x00007ffc018620c0 0 /home/ray/musl/out/release/lib/libc.so:
pc=0x0000000000000000 sp=0x00007ffc01862db0 0
% gdb ./nongnu -x =(printf 'b handler\nhandle SIGUSR1 nostop\nr\nbt')
...
#0 handler (signo=10) at a.c:9
#1 <signal handler called>
#2 0x00007ffff7fae78a in __restore_sigs () from /home/ray/musl/out/release/lib/libc.so
#3 0x00007ffff7fae8f1 in raise () from /home/ray/musl/out/release/lib/libc.so
#4 0x0000000000000000 in ?? ()

If musl is built with -fno-omit-frame-pointer, nongnu libunwind will use its FP-based fallback (see src/x86_64/Gstep.c). The output looks like:

pc=0x0000000000206ada sp=0x00007fffd51b1830 0 ./nongnu:
pc=0x00007f0f09352858 sp=0x00007fffd51b2040 0 /home/ray/musl/out/release-fp/lib/libc.so:__setjmp
pc=0x0000000000206aaa sp=0x00007fffd51b2db0 0 ./nongnu:main
pc=0x00007f0f092f88ec sp=0x00007fffd51b2dd0 0 /home/ray/musl/out/release-fp/lib/libc.so:
pc=0x00000000002069d6 sp=0x00007fffd51b2e00 0 ./nongnu:_start

unw_step uses the saved RBP to infer RSP/RBP/RIP in the previous frame. If the signal handler saves RBP and calls unw_step, the saved RBP is essentially the RBP value in the signal trampoline frame.

rbp_loc = DWARF_LOC(rbp, 0);
rsp_loc = DWARF_VAL_LOC(c, rbp + 16);
rip_loc = DWARF_LOC (rbp + 8, 0);

Actually, not every source file needs to be built with -fno-omit-frame-pointer. We just need to build the source files that transfer control to the user program, and their callers. For this example, building src/signal/raise.c with -fno-omit-frame-pointer allows us to unwind to main. Additionally rebuilding src/env/__libc_start_main.c allows us to unwind to _start.

musl's Makefile specifies -fno-asynchronous-unwind-tables (see "option to enable eh_frame" for a 2011 discussion). If -g is specified in CFLAGS, libc.so will have .debug_frame. gdb can then retrieve the caller of raise:

#0  handler (signo=10) at a.c:9
#1 <signal handler called>
#2 __restore_sigs (set=set@entry=0x7fffffffe240) at ../../arch/x86_64/syscall_arch.h:40
#3 0x00007ffff7fa36e0 in raise (sig=sig@entry=10) at ../../src/signal/raise.c:11
#4 0x00000000002071ff in main () at a.c:33

nongnu libunwind can be built with --enable-debug-frame to support .debug_frame. Unfortunately, since it does not recognize the signal trampoline, it cannot retrieve the main frame for this example.

Unwinders' compatibility with libc implementations

The values represent how the unwinder unwinds through the signal trampoline frame.

Unwinder and target | Linux glibc | Linux musl
nongnu libunwind AArch64 | recognizes signal trampoline in VDSO | not tested
nongnu libunwind x86-64 | .eh_frame in libc.so.6 | unwindable if FP is enabled
gdb AArch64 | recognizes signal trampoline in VDSO | not tested
gdb x86-64 | recognizes signal trampoline | recognizes signal trampoline

Links to signal trampoline frame related code

  • gcc libgcc/config/aarch64/linux-unwind.h:aarch64_fallback_frame_state
  • gdb gdb/aarch64-linux-tdep.c:aarch64_linux_rt_sigframe, gdb/amd64-linux-tdep.c:amd64_linux_sigtramp_start
  • llvm-project libunwind https://reviews.llvm.org/D90898
  • Linux kernel arch/x86/kernel/signal.c:setup_rt_frame

Core dump

The kernel core dumper coredump.c is simple. The glibc __restore_rt page or the VDSO is not prioritized in the presence of a core file limit. If the page is missing in the core file, gdb prog core -ex bt -batch will not be able to unwind past the signal trampoline. A userspace core dumper may be handy.

Archives and --start-lib

.a archives

Unix-like systems represent static libraries as .a archives. A .a archive consists of a header and a collection of files with metadata. Its usage is tightly coupled with the linker. An archive almost always contains only relocatable object files and the linker has built-in support for reading it.

% as /dev/null -o a.o
% rm -f b.a && ar rc b.a a.o
% ar t b.a
a.o

One may add other types of files to .a but that is almost assuredly a bad thing.

% rm -f a.a && ar rc a.a a.o b.a  # archive in archive, bad
% ar t a.a
a.o
b.a
% echo hello > a.txt
% rm -f a.a && ar rc a.a a.o a.txt # text file in archive, bad

The original linker designers noticed that for many programs not every member was needed, so they tried to allow the linker to skip unused members. Therefore, they invented the interesting but confusing archive member extraction rule. See Symbol processing#Archive processing for details.


Why isn't ld.lld faster?

LLD is the LLVM linker. Its ELF port is typically installed as ld.lld. This article makes an in-depth analysis of ld.lld's performance. The topic has been on my mind for a while. Recently Rui Ueyama released mold 1.0 and people wondered why, with multi-threading, its ELF port is faster than ld.lld. So I finally completed the article.

First of all, I am very glad that Rui Ueyama started mold. Our world has a plethora of compilers, but not many people learn or write linkers. As its design documentation says, there are many drastically different designs which haven't been explored. In my view, mold is innovative in that it introduced parallel symbol table initialization, symbol resolution, and relocation scan which to my knowledge hadn't been implemented before, and showed us amazing results. The innovation gives existing and future linkers incentive to optimize further.


.init, .ctors, and .init_array

In C++, dynamic initializations for non-local variables happen before the first statement of the main function. All (most?) implementations just ensure such dynamic initializations happen before main.

As an extension, GCC supports __attribute__((constructor)) which can make an arbitrary function run before main. A constructor function can have an optional priority (__attribute__((constructor(N)))).


Relative relocations and RELR

Updated in 2022-04.

(In celebration of my 2800th llvm-project commit) Happy Halloween!

This article describes how improving the representation of ELF relative relocations can greatly decrease file sizes. With the optimization, my /usr/bin executables can be 8% smaller and my /usr/lib/lib*.so* shared objects can be 4.8% smaller.

An ELF linker performs the following steps to process an absolute relocation type whose width equals the word size (e.g. R_AARCH64_ABS64, R_X86_64_64).

if (undefined_weak || (!preemptible && (no_pie || is_shn_abs)))
  link-time constant
else if (SHF_WRITE || znotext) {
  if (preemptible)
    emit a symbolic relocation (e.g. R_X86_64_64)
  else
    emit a relative relocation (e.g. R_X86_64_RELATIVE)
} else if (!shared && (copy_relocation || canonical_plt_entry)) {
  ...
} else {
  error
}
