2024-02-20

MMU-less systems and FDPIC

This article describes ABI and toolchain considerations about systems without a Memory Management Unit (MMU). We will focus on FDPIC and the in-development FDPIC ABI for RISC-V, with updates as I delve deeper into the topic.

Embedded systems often lack MMUs, relying on real-time operating systems (RTOS) like VxWorks or special Linux configurations (CONFIG_MMU=n). In these systems, the offset between the text and data segments is often not knwon at compile time. Therefore, a dedicated register is typically set to somewhere in the data segment and writable data is accessed relative to this register.

Why is the offset not knwon at compile time? There are primarily two reasons.

First, eXecute in Place (XIP), where code resides in ROM while the data segment is copied to RAM. Therefore, the offset between the text and data segments is often not knwon at compile time.

Second, all processes share the same address space without MMU. However, it is still desired for these processes to share text segments. Therefore needs a mechanism for code to find its corresponding data.

2024-02-18

lld 18 ELF changes

LLVM 18 will be released. As usual, I maintain lld/ELF and have added some notes to https://github.com/llvm/llvm-project/blob/release/18.x/lld/docs/ReleaseNotes.rst. I've meticulously reviewed nearly all the patches that are not authored by me. I'll delve into some of the key changes.

2024-02-11

Toolchain notes on z/Architecture

This article describes some notes about z/Architecture with a focus on the ELF ABI and ELF linkers. An lld/ELF patch sparked my motivation to study the architecture and write this post.

z/Architecture is a big-endian mainframe computer architecture supporting 24-bit, 31-bit, and 64-bit addressing modes. It is the latest generation in a lineage stretching back to the 1964 with IBM System/360 (32-bit general-purpose registers and 24-bit addressing). This lineage includes System/370 (1970), System/370 Extended Architecture (1983), Enterprise Systems Architecture/370 (1988), and Enterprise Systems Architecture/390 (1990). For a deeper dive into the design choices behind z/Architecture's extension from ESA/390, you can refer to "Development and attributes of z/Architecture." IBM System/360 at Computer History Museum

2024-01-30

Raw symbol names in inline assembly

For operands in asm statements, GCC has supported the constraints "i" and "s" for a long time (since at least 1992).

// gcc/common.md
(define_constraint "i"
  "Matches a general integer constant."
  (and (match_test "CONSTANT_P (op)")
       (match_test "!flag_pic || LEGITIMATE_PIC_OPERAND_P (op)")))

(define_constraint "s"
  "Matches a symbolic integer constant."
  (and (match_test "CONSTANT_P (op)")
       (match_test "!CONST_SCALAR_INT_P (op)")
       (match_test "!flag_pic || LEGITIMATE_PIC_OPERAND_P (op)")))

2024-01-28

Modified condition/decision coverage (MC/DC) and compiler implementations

Key metrics for code coverage include:

function coverage: determines whether each function been executed.
line coverage (aka statement coverage): determines whether every line has been executed.
branch coverage: ensures that both the true and false branches of each conditional statement or the condition of each loop statement been evaluated.

Condition coverage offers a more fine-grained evaluation of branch coverage. It requires that each individual boolean subexpression (condition) within a compound expression be evaluated to both true and false. For example, in the boolean expression if (a>0 && f(b) && c==0), each of a>0, f(b), and c==0, condition coverage would require tests that:

2024-01-23

RISC-V TLSDESC works!

Updated in 2024-05.

Back in 2019, I studied a bit about RISC-V and filed Support Thread-Local Storage Descriptors (TLSDESC). Last year, Tatsuyuki Ishi added a specification for TLSDESC.

LLVM

On the LLVM side, the RISC-V TLSDESC work has been completed.

The the most important patch is [RISCV] Support Global Dynamic TLSDESC in the RISC-V backend by Paul Kirth. The linker patch by me is also significant. Furthermore, Clang requires a -mtls-dialect= patch.

These patches are expected to be included in the upcoming LLVM 18.1 release. To obtain TLSDESC code sequences, compile your program with clang --target=riscv64-linux -fpic -mtls-dialect=desc.

GCC

RISC-V: Implement TLS Descriptors. landed in April 2024.

binutils

RISC-V: Initial ld.bfd support for TLSDESC. landed in February 2024.

glibc

Latest patch: https://inbox.sourceware.org/libc-alpha/20230914084033.222120-1-ishitatsuyuki@gmail.com/

musl

musl added support in February 2024.

Bionic

No patch yet.

Testing

The LLVM patches need testing. Unfortunately, I didn't have a RISC-V image at hand, so I used qemu-user.

Patch musl per Re: Draft riscv64 TLSDESC implementation

diff --git c/arch/riscv64/reloc.h w/arch/riscv64/reloc.h
index 1ca13811..7c7c0611 100644
--- c/arch/riscv64/reloc.h
+++ w/arch/riscv64/reloc.h
@@ -17,6 +17,7 @@
 #define REL_DTPMOD      R_RISCV_TLS_DTPMOD64
 #define REL_DTPOFF      R_RISCV_TLS_DTPREL64
 #define REL_TPOFF       R_RISCV_TLS_TPREL64
+#define REL_TLSDESC     R_RISCV_TLSDESC

 #define CRTJMP(pc,sp) __asm__ __volatile__( \
        "mv sp, %1 ; jr %0" : : "r"(pc), "r"(sp) : "memory" )
diff --git c/include/elf.h w/include/elf.h
index 72d17c3a..7f342a23 100644
--- c/include/elf.h
+++ w/include/elf.h
@@ -3254,6 +3254,7 @@ enum
 #define R_RISCV_TLS_DTPREL64    9
 #define R_RISCV_TLS_TPREL32     10
 #define R_RISCV_TLS_TPREL64     11
+#define R_RISCV_TLSDESC         12

 #define R_RISCV_BRANCH          16
 #define R_RISCV_JAL             17
diff --git c/src/ldso/riscv64/tlsdesc.s w/src/ldso/riscv64/tlsdesc.s
new file mode 100644
index 00000000..56d1ce89
--- /dev/null
+++ w/src/ldso/riscv64/tlsdesc.s
@@ -0,0 +1,33 @@
+.text
+.global __tlsdesc_static
+.hidden __tlsdesc_static
+.type __tlsdesc_static,%function
+__tlsdesc_static:
+       ld a0,8(a0)
+       jr t0
+
+.global __tlsdesc_dynamic
+.hidden __tlsdesc_dynamic
+.type __tlsdesc_dynamic,%function
+__tlsdesc_dynamic:
+       add sp,sp,-16
+       sd t1,(sp)
+       sd t2,8(sp)
+
+       ld t2,-8(tp) # t2=dtv
+
+       ld a0,8(a0)  # a0=&{modidx,off}
+       ld t1,8(a0)  # t1=off
+       ld a0,(a0)   # a0=modidx
+       sll a0,a0,3  # a0=8*modidx
+
+       add a0,a0,t2 # a0=dtv+8*modidx
+       ld a0,(a0)   # a0=dtv[modidx]
+       add a0,a0,t1 # a0=dtv[modidx]+off
+       sub a0,a0,tp # a0=dtv[modidx]+off-tp
+
+       ld t1,(sp)
+       ld t2,8(sp)
+       add sp,sp,16
+       jr t0
+

1	(mkdir -p out/rv64 && cd out/rv64 && ../../configure --target=riscv64-linux-gnu && make -j 50)

Adjust ~/musl/out/rv64/lib/musl-gcc.specs and update ~/musl/out/rv64/obj/musl-gcc

cat > ~/musl/out/rv64/obj/musl-gcc <<eof
#!/bin/sh
exec "${REALGCC:-riscv64-linux-gnu-gcc}" "$@" -specs ~/musl/out/rv64/lib/musl-gcc.specs
eof

I have also modified musl-clang (clang wrapper). Adjust ~/musl/out/rv64/obj/musl-clang to use --target=riscv64-linux-musl. Adjust ~/musl/out/rv64/obj/ld.musl-clang to define cc="/tmp/Rel/bin/clang --target=riscv64-linux-gnu" and invoke exec /tmp/Rel/bin/ld.lld "$@" -lc.

Prepare a runtime test mentioned at the end of https://maskray.me/blog/2021-02-14-all-about-thread-local-storage

cat > ./a.c <<eof
#include <assert.h>
int foo();
int bar();
int main() {
  assert(foo() == 2);
  assert(foo() == 4);
  assert(bar() == 2);
  assert(bar() == 4);
}
eof

cat > ./b.c <<eof
#include <stdio.h>
__thread int tls0;
extern __thread int tls1;
int foo() { return ++tls0 + ++tls1; }
static __thread int tls2, tls3;
int bar() { return ++tls2 + ++tls3; }
eof

echo '__thread int tls1;' > ./c.c

sed 's/        /\t/' > ./Makefile <<'eof'
.MAKE.MODE = meta curDirOk=true

CC := ~/musl/out/rv64/obj/musl-clang -O1 -g -fpic -mtls-dialect=desc -w
LDFLAGS := -Wl,-rpath=.

all: a0 a1 a2

run: all
        ./a0 && ./a1 && ./a2

c.so: c.o; ${LINK.c} -shared $> -o $@
bc.so: b.o c.o; ${LINK.c} -shared $> -o $@
b.so: b.o c.so; ${LINK.c} -shared $> -o $@

a0: a.o b.o c.o; ${LINK.c} $> -o $@
a1: a.o b.so; ${LINK.c} $> -o $@
a2: a.o bc.so; ${LINK.c} $> -o $@
eof

bmake run => succeeded!

% bmake run
~/musl/out/rv64/obj/musl-clang -O1 -g -fpic -mtls-dialect=desc -w -g  -c a.c
~/musl/out/rv64/obj/musl-clang -O1 -g -fpic -mtls-dialect=desc -w -g  -c b.c
~/musl/out/rv64/obj/musl-clang -O1 -g -fpic -mtls-dialect=desc -w -g  -c c.c
~/musl/out/rv64/obj/musl-clang -O1 -g -fpic -mtls-dialect=desc -w -g  -Wl,-rpath=. a.o b.o c.o -o a0
~/musl/out/rv64/obj/musl-clang -O1 -g -fpic -mtls-dialect=desc -w -g  -Wl,-rpath=. -shared c.o -o c.so
~/musl/out/rv64/obj/musl-clang -O1 -g -fpic -mtls-dialect=desc -w -g  -Wl,-rpath=. -shared b.o c.so -o b.so
~/musl/out/rv64/obj/musl-clang -O1 -g -fpic -mtls-dialect=desc -w -g  -Wl,-rpath=. a.o b.so -o a1
~/musl/out/rv64/obj/musl-clang -O1 -g -fpic -mtls-dialect=desc -w -g  -Wl,-rpath=. -shared b.o c.o -o bc.so
~/musl/out/rv64/obj/musl-clang -O1 -g -fpic -mtls-dialect=desc -w -g  -Wl,-rpath=. a.o bc.so -o a2
./a0 && ./a1 && ./a2

Test GCC

During my development of the linker patch, the Clang Driver patch was actually not ready yet. I used a more hacky approach by compiling using GCC, replacing some assembly fragments with TLSDESC code sequences, and assemblying using Clang.

Compile b.c to bb.s. Replace general-dynamic code sequences (e.g. la.tls.gd a0,tls0; call __tls_get_addr@plt) with TLSDESC, e.g.

.Ltlsdesc_hi0:
  auipc a0, %tlsdesc_hi(tls0)
  ld  a1, %tlsdesc_load_lo(.Ltlsdesc_hi0)(a0)
  addi  a0, a0, %tlsdesc_add_lo(.Ltlsdesc_hi0)
  jalr  t0, 0(a1), %tlsdesc_call(.Ltlsdesc_hi0)
  add   a0, a0, tp

Create an alias bin/ld.lld to be used with -Bbin -fuse-ld=lld. I made some adjustment to the Makefile so that an invocation looks like:

% bmake run
~/musl/out/rv64/obj/musl-gcc -O1 -g -fpic -Bbin -fuse-ld=lld -g  -c a.c
/tmp/Rel/bin/clang --target=riscv64-linux -c bb.s -o b.o
~/musl/out/rv64/obj/musl-gcc -O1 -g -fpic -Bbin -fuse-ld=lld -g  -c c.c
~/musl/out/rv64/obj/musl-gcc -O1 -g -fpic -Bbin -fuse-ld=lld -g  -Wl,-rpath=. a.o b.o c.o -o a0
~/musl/out/rv64/obj/musl-gcc -O1 -g -fpic -Bbin -fuse-ld=lld -g  -Wl,-rpath=. -shared c.o -o c.so
~/musl/out/rv64/obj/musl-gcc -O1 -g -fpic -Bbin -fuse-ld=lld -g  -Wl,-rpath=. -shared b.o c.so -o b.so
~/musl/out/rv64/obj/musl-gcc -O1 -g -fpic -Bbin -fuse-ld=lld -g  -Wl,-rpath=. a.o b.so -o a1
~/musl/out/rv64/obj/musl-gcc -O1 -g -fpic -Bbin -fuse-ld=lld -g  -Wl,-rpath=. -shared b.o c.o -o bc.so
~/musl/out/rv64/obj/musl-gcc -O1 -g -fpic -Bbin -fuse-ld=lld -g  -Wl,-rpath=. a.o bc.so -o a2
./a0 && ./a1 && ./a2

2024-01-14

Exploring object file formats

My journey with the LLVM project began with a deep dive into the world of lld and binary utilities. Countless hours were spent unraveling the intricacies of object file formats and shaping LLVM's relevant components. Though my interests have since broadened, object file formats remain a personal fascination, often drawing me into discussions around potential changes within LLVM.

This article compares several prominent object file formats, drawing upon my experience and insights.

At the heart of each format lies the representation of essential components like symbols, sections, and relocations. For each control structure, We'll begin with ELF, a widely used format, before venturing into the landscapes of other notable formats.

2023-12-31

2023年总结

一如既往，主要在工具链领域耕耘。

llvm-project

I made 700+ commits this year. Many are clean-up commits or fixup for others' work. I hope that I can do more useful work next year.

Enabled --features=layering_check for Bazel's llvm and clang projects
Implemented LLVM_ENABLE_REVERSE_ITERATION for StringMap
Added llvm::xxh3_64bits and adopted it in llvm/clang/lld
Made AArch64 changes to msan and dfsan
Made various improvements to the clang driver, including -### exit code, auxiliary files, errors for unsupported target-specific options, -fsanitize=kcfi, and XRay
Made various sanitizer changes
Made -fsanitize=function work for C and non-x86 architectures
Fixed several major problems of assembler implementation of RISC-V linker relaxation
clangSema: %lb recognization for printf/scanf, checking the failure memory order for atomic_compare_exchange-family built-in functions, -Wc++11-narrowing-const-reference
llvm-objdump: @plt symbols for x86 .plt.got, mapping symbol improvements, --disassemble-symbols changes, etc
Supported R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 for .uleb128 directives
gcov: fix instrumentation crashes when using inline variables, #line, and #include, made llvm-cov gcov work with a smaller stack size
LTO: fixed local ifunc for ThinLTO, reported errors when parsing module-level inline assembly
CodeLayout: fixed a correctness bug and improved performance to be suitable for lld/ELF --call-graph-profile-sort=hfsort default

Reviewed many commits. A lot of people don't add a Reviewed By: tag. Anyway, counting commits with the tag can give an underestimate.

1 2	% git shortlog -sn 2679e8bba3e166e3174971d040b9457ec7b7d768...main --grep 'Reviewed .*MaskRay' \| awk '{s+=$1}END{print s}' 395

2023-12-30

reviews.llvm.org became a read-only archive

For approximately 10 years, reviews.llvm.org functioned as the code view site for the LLVM project, utilizing a Phabricator instance. This website hosted numerous invaluable code review discussions. However, following LLVM's transition to GitHub pull requests, there arises a necessity for a read-only archive of the existing Phabricator instance. (https://archive.org/ archives a subset of the reviews.llvm.org/Dxxxxx pages.)

The intent is to eliminate a SQL engine. Phabicator operates on a complex database scheme. To minimize time investment, the most feasible approach seems to involve downloading the static HTML pages and employing a lightweight scraping process.

Raphaël Gomès developed phab-archive to serve a read-only archive for Mercurial's Phabricator instance. I have modified the code to suit reviews.llvm.org.

The DNS records of reviews.llvm.org have been pointed to the archive website.

2023-12-17

Exploring the section layout in linker output

This article describes section layout and its interaction with dynamic loaders and huge pages.

Let's begin with a Linux x86-64 example involving global variables exhibiting various properties such as read-only versus writable, zero-initialized versus non-zero, and more.

#include <stdio.h>
const int ro = 1;
int w0, w1 = 1;
int *const pw0 = &w0;
int main() {
  printf("%d %d %d %p\n", ro, w0, w1, pw0);
}