Build glibc with LLD 13

LLD is the LLVM linker. It started at the end of 2011 as a work-in-progress rewrite of ld64 for the Mach-O binary format based on the atom model. COFF and ELF ports based on the atom model were contributed subsequently. They shared one symbol resolution model. (IMO, .subsections_via_symbols was invented because of Mach-O's unfortunate limit of 255 sections. The atom model was an incarnation of that concept, but it did not fit ELF/PE, where sections are the better basic units.)

In 2015, both the COFF and ELF ports were rewritten. (See "LLD improvement plan".) Today, LLD is a mature and fast linker supporting multiple binary formats (ELF, Mach-O, PE/COFF, WebAssembly). FreeBSD, Android, and Chrome OS have adopted it as their main linker.

As a main contributor to LLD's ELF port who has fixed numerous corner cases in recent years, I consider its x86-64 support to have been mature since the 8.0.0 release and in great shape since 9.0.0. The AArch64 and PowerPC32/PowerPC64 support has been great since the 10.0.0 release. The 11.0.0 release has very solid linker script support. (When people complain that a GNU ld linker script is not immediately usable with LLD, it is almost assuredly a problem with the script itself.) So, what's next? Build glibc with LLD!

glibc is known for tricks used here and there and for tons of GNU extensions, which challenge a "foreign" toolchain like llvm-project (Clang, LLD, etc). I just expected quirky linker usage which should be fixed on glibc's side, not anything to improve on LLD's side :)

My adventure concluded with the final toggle configure: Allow LD to be LLD 13.0.0 or above [BZ #26558]. The next release glibc 2.35 should be buildable with LLD 13.0.0, with all tests passing on aarch64/i386/x86-64. (I lied. It seems that you cannot assume all tests pass with GNU ld. On Debian and its derivatives, you may observe more failures than Fedora. Anyway, I just wanted to say LLD does not have more failures than GNU ld.)

Read on.

Build

librtld.map

There is a bootstrapping problem between ld.so and libc because they are separate. In a nutshell, elf/Makefile performs the following steps to build elf/ld.so:

  • Create elf/libc_pic.a from libc .os files
  • Create elf/dl-allobjs.os from a relocatable link of rtld .os files
  • Create link map elf/librtld.map from a relocatable link of elf/dl-allobjs.os, elf/libc_pic.a, and -lgcc
  • Get a list of extracted archive members (elf/librtld.mk) from elf/librtld.map and create elf/rtld-libc.a
  • Create elf/librtld.os from a relocatable link of elf/dl-allobjs.os and elf/rtld-libc.a
  • Create elf/ld.so from a -shared link of elf/librtld.os with the version script ld.map

In a link map printed by GNU ld, the heading "Archive member included to satisfy reference by file (symbol)" is followed by the extracted archive members. elf/Makefile made use of sed -n 's@^$(common-objpfx)\([^(]*\)(\([^)]*\.os\)) *.*$$@\1 \2@p' to extract the archive members. LLD doesn't implement the "Archive member included to satisfy reference by file (symbol)" listing. Fortunately, LLD's output has lines like

1f350            1f350       1e    16         /home/maskray/Dev/glibc/out/lld/elf/rtld-libc.a(rtld-access.os):(.text)
1f350 1f350 1e 1 __access
1f350 1f350 1e 1 access

We can use sed -n 's@^[0-9a-f ]*$(common-objpfx)\([^(]*\)(\([^)]*\.os\)) *.*$$@\1 \2@p' to extract the archive members.
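As a quick check, the adjusted expression can be run against a line in the style of the LLD map output shown above. This is a minimal sketch, not the actual elf/Makefile rule: $(common-objpfx) is replaced by a shell parameter, the Makefile $$ becomes a single $, and the function name extract_members is made up for illustration.

```shell
#!/bin/sh
# extract_members PREFIX: read a link map on stdin and print "archive member"
# for every line of the form <hex columns> PREFIX<archive>(<member>.os)...
# PREFIX stands in for $(common-objpfx); the function name is hypothetical.
extract_members() {
  sed -n "s@^[0-9a-f ]*$1\([^(]*\)(\([^)]*\.os\)) *.*\$@\1 \2@p"
}

# One input-section line and one symbol line, modeled on the sample above.
printf '%s\n' \
  '   1f350            1f350       1e    16         /home/maskray/Dev/glibc/out/lld/elf/rtld-libc.a(rtld-access.os):(.text)' \
  '   1f350            1f350       1e     1                 __access' |
  extract_members /home/maskray/Dev/glibc/out/lld/
```

Only the first input line matches, so the script prints elf/rtld-libc.a rtld-access.os. The leading [0-9a-f ]* is what lets the pattern skip LLD's address and size columns before the path.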

scripts/output-format.sed

BZ #26559

libc.so is a linker script. On Debian GNU/Linux, it looks like:

% cat /lib/x86_64-linux-gnu/libc.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library, so try that secondarily. */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/x86_64-linux-gnu/libc_nonshared.a AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )

The idea is that -lc can expand to something like -( libc.so.6 libc_nonshared.a --push-state --as-needed ld-linux-x86-64.so.2 --pop-state -). libc_nonshared.a contains functions which should be statically linked. ld-linux-x86-64.so.2 is mostly for __tls_get_addr, used by the general-dynamic/local-dynamic TLS models. Commit d3f5f87569398d11756b3dcb7a66926bfd8ee047 (in 2015) added AS_NEEDED with no description of the purpose. In retrospect, this can make a difference to ld.so performance when an executable has O(1000) shared object dependencies, because the overall shared object uniqueness check has quadratic time complexity.

The first non-comment line is an OUTPUT_FORMAT command, which is derived from the output of ld --verbose. In GNU ld, --verbose prints the internal linker script, which is used when an external one (-T) is not used.

...
using internal linker script:
==================================================
/* Script for -z combreloc -z separate-code */
/* Copyright (C) 2014-2020 Free Software Foundation, Inc.
Copying and distribution of this script, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved. */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
"elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)

Makerules extracted the OUTPUT_FORMAT line with a frightening sed script:

/ld.*[  ]-E[BL]/b f
/collect.*[ ]-E[BL]/b f
/OUTPUT_FORMAT[^)]*$/{N
s/\n[ ]*/ /
}
t o
: o
s/^.*OUTPUT_FORMAT(\([^,]*\), \1, \1).*$/OUTPUT_FORMAT(\1)/
t q
s/^.*OUTPUT_FORMAT(\([^,]*\), \([^,]*\), \([^,]*\)).*$/\1,\2,\3/
t s
s/^.*OUTPUT_FORMAT(\([^,)]*\).*$)/OUTPUT_FORMAT(\1)/
t q
d
: s
s/"//g
G
s/\n//
s/^\([^,]*\),\([^,]*\),\([^,]*\),B/OUTPUT_FORMAT(\2)/p
s/^\([^,]*\),\([^,]*\),\([^,]*\),L/OUTPUT_FORMAT(\3)/p
s/^\([^,]*\),\([^,]*\),\([^,]*\)/OUTPUT_FORMAT(\1)/p
/,/s|^|*** BUG in libc/scripts/output-format.sed *** |p
q
: q
s/"//g
p
q
: f
s/^.*[ ]-E\([BL]\)[ ].*$/,\1/
t h
s/^.*[ ]-E\([BL]\)$/,\1/
t h
d
: h
h

LLD does not have an internal linker script, so libc.so did not get the OUTPUT_FORMAT line. (Personally, I think an internal linker script is not useful. It would have some exposition value, but the language is not powerful enough to encode all the built-in logic. If LLD were to support the feature, we would need to emit a lot of conditional code, which would add a huge maintenance burden.)

Inspired by a Linux kernel usage, I realized that there is a better way to get the output format (bfdname): we can just parse the output of objdump -f.

% objdump -f elf/ld.so

elf/ld.so: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000001050

% llvm-objdump -f elf/ld.so

elf/ld.so: file format elf64-x86-64

architecture: x86_64
start address: 0x0000000000001050

llvm-objdump -f used to print output format names in upper case. I switched them to lower case in D76046.

Fixed by install: Replace scripts/output-format.sed with objdump -f [BZ #26559].
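The objdump -f idea can be sketched in a few lines of shell. This is an illustration, not the exact rule from the glibc commit: the function name bfdname_to_output_format is invented, and the input is the GNU objdump text shown above rather than a live objdump run.

```shell
#!/bin/sh
# Turn the "file format" line of `objdump -f` output (read from stdin)
# into an OUTPUT_FORMAT command. The function name is hypothetical.
bfdname_to_output_format() {
  sed -n 's/^.*file format \([^ ]*\).*$/OUTPUT_FORMAT(\1)/p'
}

# Sample text copied from the GNU objdump output shown earlier.
bfdname_to_output_format <<'EOF'

elf/ld.so:     file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000001050
EOF
```

This prints OUTPUT_FORMAT(elf64-x86-64). With a real toolchain the same function could be fed by objdump -f elf/ld.so, and the single substitution replaces the whole state machine in scripts/output-format.sed.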

--defsym suppressing archive member extraction

elf/Makefile specified -Wl,--defsym=malloc=0 and other malloc.os definitions before libc_pic.a so that libc_pic.a(malloc.os) is not extracted. This trick was used to avoid multiple definition errors.

# These symbols need to be stubbed out during symbol discovery because
# their implementation is provided differently in rtld, and the symbol
# discovery mechanism is not compatible with the libc implementation
# when compiled for libc.
rtld-stubbed-symbols = \
  __GI___pthread_disable_asynccancel \
  __GI___pthread_enable_asynccancel \
  __pthread_disable_asynccancel \
  __pthread_enable_asynccancel \
  calloc \
  free \
  malloc \
  realloc

...

# The GCC arguments that implement $(rtld-stubbed-symbols).
rtld-stubbed-symbols-args = \
  $(patsubst %,-Wl$(comma)--defsym=%=0, $(rtld-stubbed-symbols))

$(objpfx)librtld.map: $(objpfx)dl-allobjs.os $(common-objpfx)libc_pic.a
	@-rm -f $@T
	$(reloc-link) -o $@.o $(rtld-stubbed-symbols-args) \
		'-Wl,-(' $^ -lgcc '-Wl,-)' -Wl,-Map,$@T
	rm -f $@.o

For the interaction between a linker option and an input file, LLD generally chooses the behavior so that their relative order does not matter. Some options are inherently order dependent, e.g. --as-needed and --no-as-needed, --whole-archive and --no-whole-archive. However, reducing order dependence can improve robustness of a build system.

I had a debate with others and finally I noticed one point: --defsym defines a SHN_ABS symbol while a normal definition is relative to the image base. So a normal definition is better regardless.

I sent the patch in April 2020, pinged once in August. Since nobody responded, I sent it again in December. Finally, this issue is fixed by elf: Replace a --defsym trick with an object file to be compatible with LLD.

-static-pie and __rela_iplt_start/__rela_iplt_end

BZ #27164

glibc's static non-pie and static pie modes take different code paths for R_*_IRELATIVE relocation resolving. Its static pie mode expects that __rela_iplt_start==__rela_iplt_end, which is satisfied if ld leaves the symbols zero. Before 13.0.0, LLD defined __rela_iplt_start/__rela_iplt_end for -pie and therefore broke glibc loader's assumption, causing a program to crash. See GNU indirect function for details about the encapsulation symbols.

I made good arguments, but they were dismissed with "Ulrich and I designed/implemented IFUNC on x86 and for x86. I consider x86 implementation of IFUC as the gold standard."

In the end, I conceded and, with some grumbling, changed LLD to define the two encapsulation symbols only for -no-pie. Otherwise, a static pie produced by {gcc,clang} -fuse-ld=lld -static-pie would crash, even if the glibc used was built with GNU ld.

_GLOBAL_OFFSET_TABLE_[0]

BZ #28203 for aarch64.

In nearly every ELF port of GNU ld, _GLOBAL_OFFSET_TABLE_[0] is the link-time address of _DYNAMIC (the start of .dynamic/PT_DYNAMIC). In glibc, sysdeps/*/dl-machine.h files used this approach to compute the load base (the virtual address of the ELF header).

runtime_DYNAMIC = PC relative address of _DYNAMIC
load_base = runtime_DYNAMIC - linktime_DYNAMIC = runtime_DYNAMIC - _GLOBAL_OFFSET_TABLE_[0]
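To make the subtraction concrete, here is a small worked example in shell arithmetic. Both addresses are hypothetical values picked only to show the calculation, and the function name load_base is illustrative.

```shell
#!/bin/sh
# Hypothetical addresses, chosen only to make the arithmetic visible.
linktime_DYNAMIC=0x3e00           # what _GLOBAL_OFFSET_TABLE_[0] would hold
runtime_DYNAMIC=0x7f0000003e00    # what a PC-relative access to _DYNAMIC would yield

# load_base RUNTIME LINKTIME: print RUNTIME - LINKTIME in hexadecimal.
load_base() {
  printf '0x%x\n' "$(( $1 - $2 ))"
}

load_base "$runtime_DYNAMIC" "$linktime_DYNAMIC"
```

With these inputs the script prints 0x7f0000000000. The link-time address of _DYNAMIC cancels out, leaving the address at which the object was mapped.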

So you may ask: why can't glibc extract the p_vaddr field of the PT_DYNAMIC program header? Well, its code is poorly organized, which makes this elegant solution difficult...

Due to the glibc requirement, _GLOBAL_OFFSET_TABLE_[0] has unfortunately become part of the i386/x86-64 and PowerPC64 ELFv2 ABIs.

LLD's AArch64 port does not set _GLOBAL_OFFSET_TABLE_[0], so the trick does not work. I figured out an elegant fix without updating LLD:

In 2012, GNU ld and gold (included in binutils 2.23) started to define __ehdr_start, which has the link-time address zero. Using a PC-relative code sequence to take the runtime address of __ehdr_start gives us a better way to get the load base. I submitted patches to use the approach for aarch64/arm/riscv/i386/x86-64. The aarch64 code looks like the following. I originally intended to use inline assembly to avoid relying on the compiler generating PC-relative addressing for hidden symbol access, but Szabolcs Nagy recommended the pure C approach.

/* Return the run-time load address of the shared object.  */

static inline ElfW(Addr) __attribute__ ((unused))
elf_machine_load_address (void)
{
  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
  return (ElfW(Addr)) &__ehdr_start;
}

/* Return the link-time address of _DYNAMIC. */

static inline ElfW(Addr) __attribute__ ((unused))
elf_machine_dynamic (void)
{
  extern ElfW(Dyn) _DYNAMIC[] attribute_hidden;
  return (ElfW(Addr)) _DYNAMIC - elf_machine_load_address ();
}

I used this as an argument to mark LLD PR49672 (reported by the glibc creator:)) as wontfix. This also allowed the AArch64 ABI not to be polluted by glibc _GLOBAL_OFFSET_TABLE_[0].

Fixed by aarch64: Make elf_machine_{load_address,dynamic} robust [BZ #28203].

Non-default version symbols

If linked with LLD<13.0.0 or gold (PR28196), the built ld.so will define both __free_hook@GLIBC_2.2.5 and __free_hook@@GLIBC_2.2.5. This is due to an unfortunate GNU as/GNU ld implementation flaw of symbol versioning which happens to be relied upon by glibc.

GLIBC_2.2.5 {
global:
__free_hook;
...
local:
*;
};

Anyway, I conceded and implemented [ELF] Combine foo@v1 and foo with the same versionId if both are defined not to cause confusion for the unfortunate corner case.

See the discussion on defined non-default version symbols on All about symbol versioning.

In addition, I implemented [ELF] Apply version script patterns to non-default version symbols so that local: works correctly.
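One way to spot the __free_hook@GLIBC_2.2.5 plus __free_hook@@GLIBC_2.2.5 situation is to scan a symbol dump for names that appear with both a non-default (@) and a default (@@) definition of the same version. The sketch below assumes that the versioned name is the last field of each input line. The function name find_double_versions and the sample input are made up for illustration.

```shell
#!/bin/sh
# Read symbol lines on stdin (the versioned name is taken from the last field)
# and print "name version" for every symbol seen as both name@VER and name@@VER.
# The function name is hypothetical.
find_double_versions() {
  awk '
    $NF ~ /@/ {
      sym = $NF
      at = index(sym, "@")
      name = substr(sym, 1, at - 1)
      rest = substr(sym, at + 1)
      if (substr(rest, 1, 1) == "@") { kind = "default"; ver = substr(rest, 2) }
      else                           { kind = "nondefault"; ver = rest }
      key = name " " ver
      seen[key, kind] = 1
      # Report each offending pair once, as soon as both kinds have been seen.
      if (((key, "default") in seen) && ((key, "nondefault") in seen) && !(key in done)) {
        done[key] = 1
        print key
      }
    }'
}

# Made-up sample input: one doubly defined symbol and one ordinary one.
find_double_versions <<'EOF'
__free_hook@GLIBC_2.2.5
__free_hook@@GLIBC_2.2.5
other_symbol@@GLIBC_2.2.5
EOF
```

For the sample input this prints __free_hook GLIBC_2.2.5. With a real binary the input could come from a dynamic symbol dump such as readelf -W --dyn-syms elf/ld.so, provided the versioned name ends up as the last field of each line.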

Tests

With the aforementioned issues addressed, an LLD-linked glibc was fully functional. However, the glibc maintainers did not accept allowing LLD in configure.ac until all tests passed. So I had to fix the following issues.

.tls_common

BZ #28152

The LLVM integrated assembler does not support the .tls_common assembler directive. LLD does not support SHN_COMMON STT_TLS symbols. The ELF standard common symbol uses STT_COMMON, so it would be incompatible with STT_TLS.

elf: Drop elf/tls-macros.h in favor of __thread and tls_model attributes [BZ #28152] [BZ #28205] fixed the issue.

AArch64's general-dynamic/local-dynamic TLS models

BZ #28205

AArch64 toolchains use TLS descriptors by default. For AArch64, LLD supports the TLS descriptor relocation types but not the relocation types for general-dynamic/local-dynamic (R_AARCH64_TLSGD_*/R_AARCH64_TLSLD_*). In addition, the LLVM integrated assembler doesn't support the general-dynamic/local-dynamic modifiers. So the following tests did not build with Clang or LLD.

  • elf/tst-tls1.c
  • elf/tst-tls2.c
  • elf/tst-tls3.c
  • elf/tst-tlsmod1.c
  • elf/tst-tlsmod2.c
  • elf/tst-tlsmod3.c
  • elf/tst-tlsmod4.c

elf: Drop elf/tls-macros.h in favor of __thread and tls_model attributes [BZ #28152] [BZ #28205] fixed the issue.

As a follow-up, I pushed Remove sysdeps/*/tls-macros.h. These files may have educational value for future architectures about how to implement TLS models :)

--no-tls-get-addr-optimize

GNU ld's PowerPC64 port has implemented a general-dynamic/local-dynamic TLS model optimization. LLD doesn't support --{,no-}tls-get-addr-optimize.

powerpc: Use --no-tls-get-addr-optimize in test only if the linker supports it skipped the test if necessary.

--audit and --depaudit

BZ #28151

gold and LLD do not support --audit or --depaudit.

elf: Skip tst-auditlogmod-* if the linker doesn't support --depaudit [BZ #28151] skipped the tests if necessary.
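The "skip if the linker lacks the option" pattern can be illustrated with a small probe. This is a simplified stand-in, not glibc's configure check: it only searches help text for the option name, the helper name help_lists_option is hypothetical, and the help excerpts are made up rather than captured from a real linker.

```shell
#!/bin/sh
# help_lists_option OPTION: succeed if the linker help text on stdin mentions
# OPTION as a separate word. The helper name is hypothetical.
help_lists_option() {
  grep -E -q -e "(^|[ ,])$1([ =,]|\$)"
}

# Made-up help excerpts standing in for the output of `<linker> --help`.
with_audit='  --audit=AUDITLIB    Specify a library to use for auditing
  -P AUDITLIB, --depaudit=AUDITLIB'
without_audit='  --as-needed    Only set DT_NEEDED for shared libraries if used'

for text in "$with_audit" "$without_audit"; do
  if printf '%s\n' "$text" | help_lists_option --depaudit; then
    echo "run tst-auditlogmod-*"
  else
    echo "skip tst-auditlogmod-*"
  fi
done
```

The first excerpt lists --depaudit, so the loop prints "run tst-auditlogmod-*" and then "skip tst-auditlogmod-*". A probe that actually attempts a link with the option is more reliable than searching help text, since a linker may accept options it does not document.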

ifunc resolver calls a lazy binding PLT

For the two tests sysdeps/x86/tst-ifunc-isa-*, an ifunc resolver calls a lazy binding PLT. GNU ld's x86-64 port places the R_*_IRELATIVE relocation after the PLT's R_*_JUMP_SLOT relocation, so that the program can work without an ifunc scheduler. This is GNU ld doing something to make a special case work. The right approach is for glibc to fix the issue systematically.

I just XFAILed the two tests.

Epilogue

It is one giant step for me, but just one small step for glibc. When my ambition of building glibc with LLD started in April 2020, I quickly became frustrated by the lack of review. For my configure: Allow LD to be LLD 9.0.0 or above patch series, I could not tell whether it was due to an objection to a non-GNU toolchain or just to strict requirements. I have since been told that, unlike in the past, nobody is actively blocking support for a toolchain other than GCC/binutils.

I tried to recover after a few months: "let me try again". I had put this work aside until someone on IRC told me that they were interested in making glibc static pie work with LLD. I am glad that I picked up the patches, persisted, and finally finished the work at the end of August 2021. Porting does not necessarily mean increased complexity. I actually happened to remove quite a few quirks from the code base.

Reviewer resources are never abundant. That said, perhaps a little bit more amicability could have made me feel better.

I hope that Clang will be able to build glibc some day and that the glibc community can advertise that Clang is fully supported. GCC and binutils-gdb have been buildable with Clang for a long time. Many GCC/binutils-gdb contributors are even happy to make Clang portability fixes. I hope that the GNU and LLVM communities can collaborate more.

So, dear glibc, will you be happy with my sending Clang patches? :)

Current state