When QOI meets XZ

QOI, the Quite OK Image format, has been gaining in popularity. Chris Wellons offers a great analysis.

QOI's key advantage is its simplicity. Being a byte-oriented format without entropy encoding, it can be further compressed with generic data compression programs like lz4, xz, and zstd. PNG, on the other hand, uses DEFLATE compression internally and is typically resistant to further compression. By applying a stronger compression algorithm to QOI output, you can often achieve a smaller file size than PNG.

Lasse Collin has shared some effective options for compressing uncompressed BMP/TIFF files. I tested them on the QOI benchmark images.

When the color table (palette) is used, a delta filter would increase the compressed size and should be disabled: palette indices are arbitrary identifiers rather than sampled values, so byte-wise deltas destroy redundancy instead of exposing it.

% cat ~/tmp/b.sh
#!/bin/zsh -ue
f() {
  # Re-store the PNG without DEFLATE compression (-l 0); fall back to a copy if pngcrush produces nothing.
  pngcrush -fix -m 1 -l 0 $1 ${1/.png/.uncompressed.png}
  [[ -f ${1/.png/.uncompressed.png} ]] || cp $1 ${1/.png/.uncompressed.png}
  # Convert to QOI, BMP, and uncompressed TIFF.
  /tmp/p/qoi/qoiconv $1 ${1/.png/.qoi}
  convert $1 ${1/.png/.bmp}
  convert $1 -compress none ${1/.png/.tiff}
  xz --lzma2=pb=0 -fk ${1/.png/.qoi}
  if [[ $(file $1) =~ RGBA ]]; then
    # 4 bytes per pixel: delta distance 4; PAM carries the alpha channel.
    pnm=${1/.png/.pam}
    convert $1 $pnm
    xz --delta=dist=4 --lzma2=lc=4 -fk $pnm
    xz --delta=dist=4 --lzma2=lc=4 -fk ${1/.png/.bmp}
    xz --delta=dist=4 --lzma2=lc=4 -fk ${1/.png/.tiff}
  elif [[ $(file $1) =~ 'colormap' ]]; then
    # Palette image: no delta filter (see above).
    pnm=${1/.png/.ppm}
    convert $1 $pnm
    xz --lzma2=pb=0 -fk $pnm
    xz --lzma2=pb=0 -fk ${1/.png/.bmp}
    xz --lzma2=pb=0 -fk ${1/.png/.tiff}
  else
    # 3 bytes per pixel (RGB): delta distance 3.
    pnm=${1/.png/.ppm}
    convert $1 $pnm
    xz --delta=dist=3 --lzma2=pb=0 -fk $pnm
    xz --delta=dist=3 --lzma2=pb=0 -fk ${1/.png/.bmp}
    xz --delta=dist=3 --lzma2=pb=0 -fk ${1/.png/.tiff}
  fi
  stat -c '%n %s' $1 ${1/.png/.qoi.xz} $pnm.xz ${1/.png/.bmp.xz} ${1/.png/.tiff.xz}
}

f $1
cd /tmp/dc-img/images/
ls -1 **/*.png | rush ~/tmp/b.sh '"{}"'
ls -1 **/*.uncompressed.png | rush 'xz -fk --lzma2=pb=0 "{}"'

ruby -e 'puts "directory\t.png\t.png.xz\t.qoi.xz\t.bmp.xz\t.tiff.xz\t.p[ap]m.xz"; Dir.glob("*").each{|dir| next unless File.directory? dir;
png=pngxz=qoi=bmp=pnm=tiff=0; Dir.glob("#{dir}/*.qoi.xz").each{|f|
png+=File.size(f.sub(/\.qoi.xz/,".png"));
pngxz+=File.size(f.sub(/\.qoi.xz/,".uncompressed.png.xz"));
qoi+=File.size(f); bmp+=File.size(f.sub(/\.qoi/,".bmp")); ppm=f.sub(/\.qoi/,".ppm"); pnm+=File.exist?(ppm) ? File.size(ppm) : File.size(f.sub(/\.qoi/,".pam")); tiff+=File.size(f.sub(/\.qoi/,".tiff"));
};
puts "#{dir}\t#{png}\t#{pngxz}\t#{qoi}\t#{bmp}\t#{tiff}\t#{pnm}"
}'

While DEFLATE-compressed PNG files can hardly be further compressed, we can convert these PNG files to uncompressed ones and then apply xz (reported as .png.xz below).

directory .png .png.xz .qoi.xz .bmp.xz .tiff.xz .p[ap]m.xz
icon_512 11154424 7861652 7476640 8042032 8064476 8039192
icon_64 828119 750836 708480 730472 757760 735296
photo_kodak 15394305 14464504 12902852 13612440 13616140 13610844
photo_tecnick 237834256 254803292 213268188 210591724 210508596 210468412
photo_wikipedia 88339751 100449996 86679696 86380124 86274480 86241296
pngimg 229608249 134233476 193382668 186389368 186654256 186420564
screenshot_game 266238855 237976536 218915316 216626004 216847500 216765748
screenshot_web 40272678 24690360 21321460 21458496 21532360 21533432
textures_photo 37854634 36393340 28967008 30054968 30064236 30059784
textures_pk 43523493 40868036 54117600 41990596 40632916 46695172
textures_pk01 18946769 15734348 14950836 14835648 14853420 14839312
textures_pk02 102962935 86037000 82279000 79374112 79348768 79336276
textures_plants 51765329 53044044 43681548 44913260 45021996 45048652

While compressing QOI with XZ (.qoi.xz) can achieve good results, using a delta filter directly on the uncompressed BMP format (.bmp.xz) can sometimes lead to even smaller files. (TIFF and PPM/PAM, when compressed, can achieve similar file sizes to .bmp.xz.) This suggests that QOI is probably not better than a plain delta filter.

It's important to note that uncompressed BMP/TIFF files are huge. This can be problematic if the decompressed data can't be streamed directly into the program's internal structures. In such cases, a large temporary buffer would be needed, wasting memory.

Drop LZ match finders

QOI_OP_INDEX essentially does length-1 LZ77 using a conceptual window of 64 recently seen pixels. When the output is further compressed, another LZ match finder seems to help very little.
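
For reference, the 6-bit index is a small hash of the pixel value, as given in the QOI specification. A minimal C++ sketch (the struct and function names are mine):

#include <cstdint>

// A pixel as QOI sees it.
struct Rgba { uint8_t r, g, b, a; };

// Index into QOI's 64-entry array of previously seen pixels, from the
// specification: (r*3 + g*5 + b*7 + a*11) % 64. QOI_OP_INDEX emits this
// 6-bit value when the slot holds a pixel equal to the current one.
constexpr unsigned qoiIndex(Rgba p) {
  return (p.r * 3u + p.g * 5u + p.b * 7u + p.a * 11u) % 64;
}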

Lasse Collin mentioned that the LZ layer cannot be disabled but it can be made really weak using xz --lzma2=dict=4KiB,mode=fast,nice=2,mf=hc3,depth=1. Let's try it.

% =time -f '%e' xz -fk Prune_video_game_screenshot_2.qoi && stat -c %s Prune_video_game_screenshot_2.qoi.xz
0.76
2462360
% =time -f '%e' xz --lzma2=dict=4KiB,mode=fast,nice=2,mf=hc3,depth=1 -fk Prune_video_game_screenshot_2.qoi && stat -c %s Prune_video_game_screenshot_2.qoi.xz
0.27
2526664

Indeed, weakening the LZ layer improves compression speed significantly. Now, let's test all benchmark images.

% cat ~/tmp/qoi-weak-xz.sh
#!/bin/zsh
/tmp/p/qoi/qoiconv $1 ${1/.png/.qoi}
xz --lzma2=pb=0 -fk ${1/.png/.qoi}
xz --lzma2=dict=4KiB,mode=fast,nice=2,mf=hc3,depth=1 -c ${1/.png/.qoi} > ${1/.png/.qoi.weak-lz.xz}
% cd /tmp/dc-img/images
% ls -1 **/*.png | rush ~/tmp/qoi-weak-xz.sh '"{}"'

ruby -e 'puts "directory\tstrong\tweak\tincrease"; Dir.glob("*").each{|dir| next unless File.directory? dir;
strong=weak=0; Dir.glob("#{dir}/*.qoi.weak-lz.xz").each{|f| weak+=File.size(f); strong+=File.size(f.sub(/\.weak-lz/,""));};
puts "#{dir}\t#{strong}\t#{weak}\t#{(100.0*weak/strong-100).round(2)}%"
}'
directory strong weak increase
icon_512 7476640 8629900 15.42%
icon_64 708480 735036 3.75%
photo_kodak 12902852 13464072 4.35%
photo_tecnick 213268188 217460392 1.97%
photo_wikipedia 86679696 88609716 2.23%
pngimg 193382668 206679224 6.88%
screenshot_game 218915316 234889060 7.3%
screenshot_web 21321460 24820020 16.41%
textures_photo 28967008 31249492 7.88%
textures_pk 54117600 57956168 7.09%
textures_pk01 14950836 15749556 5.34%
textures_pk02 82279000 87747576 6.65%
textures_plants 43681548 45494084 4.15%

The size increase is small for certain directories but quite large for others. For the directories with small increases, relying purely on delta coding and a fast entropy encoder would make a strong competitor.

PNG

The PNG International Standard defines compression method 0 as DEFLATE with a sliding window of at most 32768 bytes. Technically, new compression methods can be defined, but that would break compatibility with existing decoders, and stakeholders would just resort to new image formats. Still, it would be a nice experiment to check how PNG would compare with newer image formats if the compression part were improved.
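
As an aside, the compression method is a single byte in the IHDR chunk. A minimal C++ sketch that reads it, assuming a well-formed file whose first chunk is IHDR (the function name is mine):

#include <cstdio>

// Offset 26 = 8-byte signature + 4-byte length + 4-byte "IHDR" + 4-byte
// width + 4-byte height + bit depth + color type. Returns 0 (DEFLATE)
// for standard files, or -1 on error.
int pngCompressionMethod(const char *path) {
  std::FILE *f = std::fopen(path, "rb");
  if (!f) return -1;
  unsigned char buf[27];
  size_t n = std::fread(buf, 1, sizeof buf, f);
  std::fclose(f);
  return n == sizeof buf ? buf[26] : -1;
}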

A compact section header table for ELF

ELF's design emphasizes natural size and alignment guidelines for its control structures. However, this approach has substantial size drawbacks.

In a release build of llvm-project (-O3 -ffunction-sections -fdata-sections), the section header tables occupy 13.4% of the .o file size.

I propose an alternative section header table format that is signaled by e_shentsize == 0 in the ELF header. e_shentsize == sizeof(Elf64_Shdr) (or the 32-bit counterpart) selects the traditional section header table format.
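
A minimal C++ sketch of how a reader might dispatch on this signal (the enum and function names are mine; the compact encoding itself is not shown):

#include <elf.h>

enum class ShdrTableKind { Traditional, Compact, Unknown };

// e_shentsize == sizeof(Elf64_Shdr) selects the traditional table;
// under the proposal, e_shentsize == 0 signals the compact format.
ShdrTableKind classifyShdrTable(const Elf64_Ehdr &ehdr) {
  if (ehdr.e_shentsize == sizeof(Elf64_Shdr))
    return ShdrTableKind::Traditional;
  if (ehdr.e_shentsize == 0)
    return ShdrTableKind::Compact;
  return ShdrTableKind::Unknown;
}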

Read More

C++ exit-time destructors

In the ISO C++ standard, [basic.start.term] specifies:

Constructed objects ([dcl.init]) with static storage duration are destroyed and functions registered with std::atexit are called as part of a call to std::exit ([support.start.term]). The call to std::exit is sequenced before the destructions and the registered functions. [Note 1: Returning from main invokes std::exit ([basic.start.main]). — end note]

For example, consider the following code:

struct A { ~A(); } a;

The destructor for object a will be registered for execution at program termination.
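
On platforms using the Itanium C++ ABI, the registration is typically lowered to a call to __cxa_atexit from a synthesized dynamic initializer. A hand-written C++ sketch of the idea (not the exact compiler output):

#include <cstdio>
#include <new>

// Itanium C++ ABI runtime interface, normally provided by the C++ runtime.
extern "C" int __cxa_atexit(void (*fn)(void *), void *arg, void *dso);
extern "C" void *__dso_handle;

struct A { ~A() { std::puts("~A"); } };
alignas(A) static unsigned char storage[sizeof(A)];

// The destructor is type-erased into a void(void*) thunk.
static void dtorThunk(void *p) { static_cast<A *>(p)->~A(); }

int main() {
  // Roughly what the synthesized initializer for `a` does: construct the
  // object, then register its destructor to run as part of std::exit.
  A *a = new (storage) A;
  __cxa_atexit(dtorThunk, a, &__dso_handle);
}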

Read More

A compact relocation format for ELF

This article introduces CREL (previously known as RELLEB), a new relocation format offering incredible size reduction (LLVM implementation in my fork).

ELF's design emphasizes natural size and alignment guidelines for its control structures. This principle, outlined in Proceedings of the Summer 1990 USENIX Conference, ELF: An Object File to Mitigate Mischievous Misoneism, promotes ease of random access for structures like program headers, section headers, and symbols.

All data structures that the object file format defines follow the "natural" size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of four, etc. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary within the file. Other classes would have appropriately scaled definitions. To illustrate, the 64-bit class would define Elf64_Addr as an 8-byte object, aligned on an 8-byte boundary. Following the strictest alignment for each object allows the format to work on any machine in a class. That is, all ELF structures on all 32-bit machines have congruent templates. For portability, ELF uses neither bit-fields nor floating-point values, because their representations vary, even among processors with the same byte order. Of course the programs in an ELF file may use these types, but the format itself does not.

Read More

MMU-less systems and FDPIC

This article describes ABI and toolchain considerations for systems without a Memory Management Unit (MMU). We will focus on FDPIC and the in-development FDPIC ABI for RISC-V, with updates as I delve deeper into the topic.

Embedded systems often lack MMUs, relying on real-time operating systems (RTOS) like VxWorks or special Linux configurations (CONFIG_MMU=n). In these systems, the offset between the text and data segments is often not known at compile time. Therefore, a dedicated register is typically set to somewhere in the data segment, and writable data is accessed relative to this register.

Why is the offset not known at compile time? There are primarily two reasons.

First, eXecute in Place (XIP): code resides in ROM while the data segment is copied to RAM, so the offset between the text and data segments is not fixed until load time.

Second, without an MMU, all processes share the same address space. It is still desirable for these processes to share text segments, so a mechanism is needed for code to find its corresponding data.

Read More

Toolchain notes on z/Architecture

This article presents some notes on z/Architecture with a focus on the ELF ABI and ELF linkers. An lld/ELF patch sparked my motivation to study the architecture and write this post.

z/Architecture is a big-endian mainframe computer architecture supporting 24-bit, 31-bit, and 64-bit addressing modes. It is the latest generation in a lineage stretching back to 1964 with the IBM System/360 (32-bit general-purpose registers and 24-bit addressing). This lineage includes System/370 (1970), System/370 Extended Architecture (1983), Enterprise Systems Architecture/370 (1988), and Enterprise Systems Architecture/390 (1990). For a deeper dive into the design choices behind z/Architecture's extension from ESA/390, you can refer to "Development and attributes of z/Architecture."

[Image: IBM System/360 at Computer History Museum]

Read More

Raw symbol names in inline assembly

For operands in asm statements, GCC has supported the constraints "i" and "s" for a long time (since at least 1992).

// gcc/common.md
(define_constraint "i"
"Matches a general integer constant."
(and (match_test "CONSTANT_P (op)")
(match_test "!flag_pic || LEGITIMATE_PIC_OPERAND_P (op)")))

(define_constraint "s"
"Matches a symbolic integer constant."
(and (match_test "CONSTANT_P (op)")
(match_test "!CONST_SCALAR_INT_P (op)")
(match_test "!flag_pic || LEGITIMATE_PIC_OPERAND_P (op)")))

Read More