zstd compressed debug sections

In January I wrote Compressed debug sections. The venerable zlib shows its age and there are replacements which are better in every metric except adoption and a larger memory footprint. The obvious choice was Zstandard, but I was not so confident about adoptinig it and solvin the ecosystem issue. At any rate, I slowly removed some legacy .zdebug support from llvm-project so that a new format could be more easily introduced.

In June, Cole Kissane posted [RFC] Zstandard as a second compression method to LLVM on LLVM discourse forums. I learned that other folks were investigating a better compression format for ELF compressed debug sections and told myself: it's high time to propose ELFCOMPRESS_ZSTD to the generic System V Application Binary Interface (generic ABI).

ELF is an elegant format which has passed the test of time. Many things created by the forefathers from 30 years ago carry over and are still used today. Every new feature, even a small addition like introducing a new constant has to pass a significant high bar for acceptance. There were many discussions on Add new ch_type value: ELFCOMPRESS_ZSTD.

Personally I think a selected format need to have these properties:

  • It has an open compression algorithm and implementation.
  • It provides significant benefits (compression speed, decompression speed, compression ratio) with a decent memory footprint and complexity.
  • It has full backward compatibility. In 20 years I want to be able to decompress a debug section created today.
  • It has a wide range of and active use cases. When the format value is standardized, consumers are willing to add support.
  • It has good documentation.
  • It's easy to use.

A compression format satisfying all these properties are rare. ELF does not like introducing a lot of options for one feature. It's not an experiment site for every new fancy compression format. We are wary of platform fragmentation and consumers don't like support a number of formats each claiming to be a good choice at a slightly different angle. See the appendix for my recent test of many compression utilities.

I made many arguments in the proposal thread. It took about one month and ELFCOMPRESS_ZSTD was accepted.

Toolchain support

The next step is to add toolchain support. The most important pieces are assemblers, linkers, and debuggers. Many other pieces are needed as well.

Toolchain components:

  • binutils: umbrella feature request. I have sent a patch
    • addr2line: symbolization needs to decompress debug sections
    • dwp: decompress compressed .dwo
    • gas: compress debug sections
    • ld: decompress compressed input sections and compress output debug sections
    • gold: similar to ld
    • nm: --line-numbers uses debug information
    • objcopy: --decompress-debug-sections and --compress-debug-sections=zstd
    • objdump: --dwarf decompresses compressed debug sections
    • readelf: --decompress decompresses a compressed section
  • gdb
    • decompress compressed debug sections in executables, shared objects, separate debug files, and .dwo files. Feature request
    • MiniDebugInfo section .gnu_debugdata is compressed with xz. zstd feature request
  • GCC: -gz=zstd feature request
  • llvm-project: all implemented as of 2022-09 (milestone: 16.0.0). The default LLVM_ENABLE_ZSTD=on needs a CMake config file to take effects.
    • Clang: compress .o and (if split DWARF is enabled) .dwo with level 5
    • llvm-objcopy: --decompress-debug-sections and --compress-debug-sections=zstd (level 5). Implemented in D130458 (ELFCLASS64) and D134385 (ELFCLASS32)
    • ld.lld: decompress ELFCOMPRESS_ZSTD input sections (D129406) and compress output debug sections with level 3 (D133548, D133679)
    • llvm-dwarfdump: use LLVMObject API to decompress ELFCOMPRESS_ZSTD input sections (D134116)
    • llvm-dwp: use LLVMObject API
    • llvm-symbolizer: use LLVMObject API
    • lldb: use LLVMObject API
  • elfutils: feature request
  • mold: implemented in 2022-09

Other languages:

Other utilities:

On the llvm-project side, there was a lot of debate on how the API should look like. This week we (Cole Kissane, David Blaikie, I) reached an agreement that the free function style compression API was acceptable. I have pushed some changes and llvm-objcopy --compress-debug-sections=zstd, clang -gz=std, ld.lld --compress-debug-sections=zstd are available now.

1
2
3
4
5
6
7
8
9
10
11
% cat a.cc
#include <iostream>
int main() { std::cout << "zstd"; }
% clang -c -g -gz a.cc
% readelf -x .debug_info a.o

Hex dump of section '.debug_info':
NOTE: This section has relocations against it, but these have NOT been applied to this dump.
0x00000000 02000000 00000000 1a180000 00000000 ................
0x00000010 01000000 00000000 28b52ffd 601a17ed ........(./.`...
...

ELFCOMPRESS_ZSTD (2) can be identified by the first 4 bytes. In a little-endian object file, it displays as 02000000.

If llvm-objcopy is built with zstd support, use --decompress-debug-sections to decompress an object file:

1
2
3
4
5
6
7
8
% llvm-objcopy --decompress-debug-sections a.o a.o.decompressed
% readelf -x .debug_info a.o.decompressed

Hex dump of section '.debug_info':
NOTE: This section has relocations against it, but these have NOT been applied to this dump.
0x00000000 16180000 05000108 00000000 01002100 ..............!.
0x00000010 01000000 00000000 00020000 00000000 ................
...

On the llvm-project side we should reach full feature readiness soon, after Cole's lldb patch is merged.

It would be nice that someone picks up the work items on the GNU side so that many Linux distributions can start investigating the adoption of zstd compressed debug sections.

Appendix

Recently I have redone the experiment. I have a -DCMAKE_BUILD_TYPE=Debug -DLLVM_TARGETS_TO_BUILD=all build of trunk clang. The 3 largest DWARF v5 debug sections are .debug_info, .debug_str, and .debug_line.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
% bloaty clang-16
FILE SIZE VM SIZE
-------------- --------------
32.3% 451Mi 0.0% 0 .debug_info
16.4% 229Mi 0.0% 0 .debug_str
11.4% 159Mi 0.0% 0 .debug_line
11.3% 157Mi 55.0% 157Mi .text
8.0% 112Mi 0.0% 0 .strtab
5.7% 80.2Mi 0.0% 0 .debug_str_offsets
3.6% 50.4Mi 17.6% 50.4Mi .rodata
2.4% 33.6Mi 0.0% 0 .debug_addr
2.2% 30.8Mi 10.7% 30.8Mi .eh_frame
1.7% 24.0Mi 0.0% 0 .symtab
1.0% 13.6Mi 4.7% 13.6Mi .rela.dyn
1.0% 13.4Mi 0.0% 0 .debug_rnglists
0.8% 11.0Mi 3.8% 11.0Mi .dynstr
0.7% 10.5Mi 3.7% 10.5Mi .data.rel.ro
0.6% 8.05Mi 0.0% 0 .debug_abbrev
0.5% 7.69Mi 2.7% 7.69Mi .eh_frame_hdr
0.2% 2.79Mi 1.0% 2.79Mi .dynsym
0.1% 848Ki 0.3% 848Ki .gnu.hash
0.1% 827Ki 0.1% 263Ki [21 Others]
0.0% 0 0.2% 544Ki .bss
0.0% 497Ki 0.2% 497Ki .data
100.0% 1.37Gi 100.0% 286Mi TOTAL

ninja -t commands bin/clang dumps the compiler driver command which links the executable. Invoke the command with -fuse-ld=lld -Wl,--prproduce=/tmp/clang-debug.tar to get a tarball. Use llvm-objcopy --dump-section to extract a section.

1
2
3
4
5
6
7
8
cd /tmp
tar xf clang-debug.tar
cd clang-debug
ld.lld @response.txt

llvm-objcopy --dump-section .debug_info=debug_info clang-16 /dev/null
llvm-objcopy --dump-section .debug_str=debug_str clang-16 /dev/null
llvm-objcopy --dump-section .debug_line=debug_line clang-16 /dev/null

I have tried brotli, bzip2, gzip, lz4, lzo, pigz, xz, zstd, and manually verified that zstd is the best considering compression speed, decompression speed, and compression ratio. Figuring out API for all these libraries will be inconvenient. So I take a shortcut: install these compression utilities with the package manager and hope that they use similar compiler driver options and the comparison is relative fair.

Here are some results:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#!/bin/zsh
i=$1
o0=$i.o0
o1=$i.o1

measure() {
rm -f $o0 $o1
local a=$(/usr/bin/time -f '%e\t%M\t' ${=1} 2>&1)
${=2}
local b=$(/usr/bin/time -f '%e\t%M\t' ${=3} 2>&1)
stat -c "$a$b%s"$'\t'"$1" $o0
}
measure_no_o() {
rm -f $o0 $o1
local a=$(/usr/bin/time -f '%e\t%M\t' ${=1} > $o0 2> t0)
${=2}
local b=$(/usr/bin/time -f '%e\t%M\t' ${=3} > $o1 2> t1)
stat -c "$(<t0)$(<t1)%s"$'\t'"$1" $o0
}

do_pigz() { measure "pigz -zfk -p 1 -S .o0 $i" "cp -f $o0 $o1.gz" "pigz -dfk -p 1 $o1.gz"; }
do_brotli() { measure "brotli -f $* $i -o $o0" "" "brotli -df $o0 -o $o1"; }
do_bzip2() { measure_no_o "bzip2 -cf $* $i" "" "bzip2 -dcf $o0"; }
do_gzip() { measure "gzip -fk -S .o0 $* $i" "" "cp -f $o0 $o1.gz" "gzip -dfk $o1.gz"; }
do_lz4() { measure "lz4 -fq $* $i $o0" "" "lz4 -dfq $o0 $o1"; }
do_lzop() { measure "lzop -f $* $i -o $o0" "" "lzop -df $o0 -o $o1"; }
do_xz() { measure_no_o "xz -c $* $i" "" "xz -dc $o0"; }
do_zstd() { measure "zstd -fq $@ $i -o $o0" "" "zstd -dfq $o0 -o $o1"; }

print 'comp\tRSS\tdecomp\tRSS\tsize\tcommand'
do_pigz
do_brotli -q 1; do_brotli -q 3; do_brotli -q 5; do_brotli -q 9
do_bzip2 -1; do_bzip2 -3
do_gzip -1; do_gzip -3
do_lz4 --fast; do_lz4 -1; do_lz4 -3; do_lz4 -5; do_lz4 -9
do_lzop -1; do_lzop -3; do_lzop -9
do_xz -0; do_xz -1; do_xz -3; do_xz -6
do_zstd --fast; do_zstd -1; do_zstd -3; do_zstd -5; do_zstd -9; do_zstd -18
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
% numactl -C 20 ./bench.sh debug_info
comp RSS decomp RSS size command
19.99 2568 2.37 2212 212342615 pigz -zfk -p 1 -S .o0 debug_info
2.44 4044 2.48 19188 244890939 brotli -f -q 1 debug_info -o debug_info.o0
5.92 36300 2.22 19264 211420890 brotli -f -q 3 debug_info -o debug_info.o0
22.77 110516 2.11 21128 193129564 brotli -f -q 5 debug_info -o debug_info.o0
239.99 114140 2.12 20988 191699881 brotli -f -q 9 debug_info -o debug_info.o0
36.24 2396 15.71 1604 212700609 bzip2 -cf -1 debug_info
34.29 3640 16.52 2396 221881700 bzip2 -cf -3 debug_info
9.16 2008 0.14 2604 227302010 gzip -fk -S .o0 -1 debug_info
11.57 2072 0.14 2536 221444175 gzip -fk -S .o0 -3 debug_info
1.16 8872 0.49 9116 310918343 lz4 -fq --fast debug_info debug_info.o0
1.25 8828 0.48 9168 295430369 lz4 -fq -1 debug_info debug_info.o0
6.85 8904 0.49 8896 265793678 lz4 -fq -3 debug_info debug_info.o0
8.60 9016 0.49 9000 263660439 lz4 -fq -5 debug_info debug_info.o0
14.26 8936 0.48 8848 262365618 lz4 -fq -9 debug_info debug_info.o0
1.42 2244 1.24 1720 283798959 lzop -f -1 debug_info -o debug_info.o0
1.43 2088 1.25 1632 282736380 lzop -f -3 debug_info -o debug_info.o0
71.94 2504 1.32 1756 240402339 lzop -f -9 debug_info -o debug_info.o0
27.76 5024 11.23 2560 164560076 xz -c -0 debug_info
34.19 10992 10.97 3292 163894892 xz -c -1 debug_info
56.63 34032 10.83 6296 164369948 xz -c -3 debug_info
149.35 97648 10.75 10400 149764760 xz -c -6 debug_info
2.55 12696 0.70 2916 252027101 zstd -fq --fast debug_info -o debug_info.o0
2.97 12528 0.83 2964 234661626 zstd -fq -1 debug_info -o debug_info.o0
3.95 41764 0.90 4568 215717860 zstd -fq -3 debug_info -o debug_info.o0
5.93 43960 0.91 4572 213222742 zstd -fq -5 debug_info -o debug_info.o0
9.96 90436 0.89 6668 208886078 zstd -fq -9 debug_info -o debug_info.o0
110.59 198708 1.46 10588 178996922 zstd -fq -18 debug_info -o debug_info.o0
% numactl -C 20 ./bench.sh debug_str
comp RSS decomp RSS size command
5.19 2552 0.68 2204 41086567 pigz -zfk -p 1 -S .o0 debug_str
0.75 3680 0.50 19156 45784253 brotli -f -q 1 debug_str -o debug_str.o0
1.58 36320 0.42 19116 37832846 brotli -f -q 3 debug_str -o debug_str.o0
4.05 52508 0.38 20012 30553722 brotli -f -q 5 debug_str -o debug_str.o0
18.87 80704 0.34 19924 24386096 brotli -f -q 9 debug_str -o debug_str.o0
16.70 2416 4.67 1600 37520606 bzip2 -cf -1 debug_str
16.96 3736 4.82 2392 31299002 bzip2 -cf -3 debug_str
2.46 1968 0.03 2528 52866269 gzip -fk -S .o0 -1 debug_str
2.89 1996 0.03 2560 48526083 gzip -fk -S .o0 -3 debug_str
0.52 6776 0.23 6944 68238485 lz4 -fq --fast debug_str debug_str.o0
0.53 6644 0.23 7004 65392601 lz4 -fq -1 debug_str debug_str.o0
1.86 6540 0.23 6596 47813656 lz4 -fq -3 debug_str debug_str.o0
2.70 6636 0.23 6528 46341752 lz4 -fq -5 debug_str debug_str.o0
4.92 6568 0.23 6684 45862077 lz4 -fq -9 debug_str debug_str.o0
0.56 2168 0.57 1672 71677076 lzop -f -1 debug_str -o debug_str.o0
0.56 2128 0.57 1604 71133429 lzop -f -3 debug_str -o debug_str.o0
15.73 2488 0.51 1812 47430843 lzop -f -9 debug_str -o debug_str.o0
6.87 4940 2.46 2496 38516552 xz -c -0 debug_str
7.46 10892 2.05 3256 33223964 xz -c -1 debug_str
17.02 34020 1.77 6280 29506364 xz -c -3 debug_str
68.33 97664 1.32 10396 21296880 xz -c -6 debug_str
1.14 11304 0.29 2920 50492121 zstd -fq --fast debug_str -o debug_str.o0
1.16 11096 0.29 3000 42965990 zstd -fq -1 debug_str -o debug_str.o0
1.26 37292 0.31 4512 37978434 zstd -fq -3 debug_str -o debug_str.o0
2.18 39716 0.30 4580 33960116 zstd -fq -5 debug_str -o debug_str.o0
3.87 82580 0.28 6312 29911833 zstd -fq -9 debug_str -o debug_str.o0
66.85 186704 0.28 10668 23900220 zstd -fq -18 debug_str -o debug_str.o0
% numactl -C 20 ./bench.sh debug_line
comp RSS decomp RSS size command
5.08 2524 0.63 2200 45210969 pigz -zfk -p 1 -S .o0 debug_line
0.56 3800 0.48 19104 42533806 brotli -f -q 1 debug_line -o debug_line.o0
1.15 36304 0.36 19272 30144639 brotli -f -q 3 debug_line -o debug_line.o0
3.38 56988 0.31 20344 23170193 brotli -f -q 5 debug_line -o debug_line.o0
18.64 86040 0.30 20032 20805984 brotli -f -q 9 debug_line -o debug_line.o0
12.74 2348 3.93 1548 40423655 bzip2 -cf -1 debug_line
13.48 3940 3.93 2364 34569267 bzip2 -cf -3 debug_line
2.19 1992 0.03 2584 51136260 gzip -fk -S .o0 -1 debug_line
2.73 1988 0.03 2440 49092753 gzip -fk -S .o0 -3 debug_line
0.33 7436 0.16 7848 66771211 lz4 -fq --fast debug_line debug_line.o0
0.36 7556 0.16 7656 64913560 lz4 -fq -1 debug_line debug_line.o0
1.55 7204 0.16 7288 49279291 lz4 -fq -3 debug_line debug_line.o0
2.13 7172 0.16 7212 48268337 lz4 -fq -5 debug_line debug_line.o0
3.46 7168 0.16 7080 47963094 lz4 -fq -9 debug_line debug_line.o0
0.42 2164 0.39 1636 67304513 lzop -f -1 debug_line -o debug_line.o0
0.42 2040 0.39 1672 66569062 lzop -f -3 debug_line -o debug_line.o0
24.08 2512 0.40 1704 47788969 lzop -f -9 debug_line -o debug_line.o0
5.44 4884 2.07 2544 31059860 xz -c -0 debug_line
5.71 10980 1.61 3272 24860272 xz -c -1 debug_line
12.29 33944 1.37 6344 21432224 xz -c -3 debug_line
45.04 97644 1.27 10480 18536212 xz -c -6 debug_line
0.67 11688 0.20 2940 50842109 zstd -fq --fast debug_line -o debug_line.o0
0.73 11400 0.21 3012 38849471 zstd -fq -1 debug_line -o debug_line.o0
0.87 37992 0.22 4544 28827098 zstd -fq -3 debug_line -o debug_line.o0
1.30 40180 0.22 4552 26554847 zstd -fq -5 debug_line -o debug_line.o0
2.34 83552 0.21 6624 23464249 zstd -fq -9 debug_line -o debug_line.o0
43.67 188676 0.22 10584 20195067 zstd -fq -18 debug_line -o debug_line.o0

When compressing debug sections, zstd and brotli are significantly better than the other choices. zstd slightly outperforms brotli in compression speed and compression ratio while being much fast at decompression.

xz -3 has a great compression ratio (higher levels are too slow). zstd and brotli with higher levels are extremely slow and can hardly achieve the xz compression ratio, but their decompression speed may compensate for that.

zlib (used by pigz) and bzip2 look pretty bad.

For zstd, the built-in parallel compression support is a plus.