Static initialization order fiasco and ELF


Table of Contents

  • "Static initialization order fiasco" bugs
  • ELF initialization and termination functions
  • glibc behavior
  • ld.lld -Wl,--shuffle-sections=.init_array=-1,--shuffle-sections=.fini_array=-1

Initialization

  • Constant initialization and zero initialization
  • Dynamic initialization
  • main
  • Deferred dynamic initialization (e.g. optimized out, on-demand shared library)

Dynamic initialization

  • Unordered dynamic initialization (static data members and variable templates not explicitly specialized)
  • Partially-ordered dynamic initialization (inline variables that are not an implicitly or explicitly instantiated specialization)
  • Ordered dynamic initialization (other non-local variables)
1
2
3
4
5
6
7
struct A { A(); };

namespace testing {
A& a = *new A;
}

static A b;

[basic.start.dynamic]: If V and W have ordered initialization and the definition of V is appearance-ordered before the definition of W, or if V has partially-ordered initialization, W does not have unordered initialization, and for every definition E of W there exists a definition D of V such that D is appearance-ordered before E, then ... V is sequenced before the initialization of W ... otherwise the initializations of V and W are indeterminately sequenced.

If no appearance-ordered relationship, two initializations are indeterminately sequenced. Two initializations in different TUs have an unspecified order.


Static initialization order fiasco

1
2
3
4
// a.cc
A a; // A's constructor uses b
// b.cc
B b;
1
2
3
4
5
6
7
8
// registry.cc
Registry registry;
// a.cc
Register a;
// b.cc
Register b;
// c.cc
Register c;

  • SIGSEGV
  • unpredicted order of a collection
  • Use an unconstructed object (initial value is usually zero) to construct another object

Solutions

  • constant initialization
  • lazy initialization (dynamic initialization of function-locale static, llvm::ManagedStatic, etc)
  • manual initialization

  • constexpr
  • constinit (constexpr - const)
  • clang -Wglobal-constructors: warning: declaration requires a global constructor [-Wglobal-constructors]

GCC extension

GCC supports __attribute__((constructor)) which can make an arbitrary function to be called before main.

In addition, a constructor can have an optional priority. Priorities from 0 to 100 are reserved for the implementation (-Wprio-ctor-dtor).

A constructor runs before another with a larger priority.

For example, gcov uses __attribute__((destructor(100))).

Applications can use 101 to 65535. 65535 (.init_array or .ctors, without a suffix) has the same priority as a dynamic initialization in C++.


GCC supports A a __attribute__((init_priority(2000))); which can control the priority of a C++ dynamic initialization.


.ctors and .init_array

  • C++ dynamic initialization
  • GNU function attribute __attribute__((constructor))
  • Assembly (rare), e.g. .section .init_array,"aw",@init_array

.dtors and .fini_array are for finalization/termination/destruction/cleanup.


  • System V Release 4: The implementation processes the DT_INIT (_init) function, which executes .ctors backwards
  • HP-UX: The implementation processes the DT_INIT_ARRAY array forwards

For GCC, the scheme is fixed at configure time. Since 4.7 .init_array is the default for Linux.

Clang is a natural cross compiler. -fno-use-init-array (default on PS4 (FreeBSD 9) and MinGW) selects the .ctors/.dtors scheme.

-fuse-init-array (default elsewhere) selects the .init_array/.fini_array scheme.


Example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// Google C++ Style Guide: Dynamic initialization of nonlocal
// variables // is discouraged, and in general it is forbidden.
// However, we do permit it if no aspect of the program
// depends on the sequencing of this initialization with
// respect to all other initializations.
struct S { S(); };
S& s = *new S;

pid_t pid = getpid();

// Function attributes
__attribute__((constructor(101))) void init101() {
puts("init101");
}
__attribute__((constructor(102))) void init101() {
puts("init102");
}
__attribute__((constructor)) void init() {
puts("init65536");
}

int main(void) { return 0; }
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
.section	.text.startup,"ax",@progbits
_GLOBAL__sub_I_a.cc:
... callq _Znwm
... callq _ZN1SC1Ev
... callq getpid

.section .init_array.101,"aw",@init_array
## legacy: .section .ctors.65434,"aw",@progbits
.p2align 3
.quad _Z7init101v

.section .init_array.102,"aw",@init_array
## legacy: .section .ctors.65433,"aw",@progbits
.p2align 3
.quad _Z7init102v

.section .init_array,"aw",@init_array
## legacy: .section .ctors,"aw",@progbits
.p2align 3
.quad _Z4initv
.quad _GLOBAL__sub_I_a.cc

GNU ld

GNU ld can merge input .init_array* and .ctors* into the output .init_array. The sorting may not be sound if both are used, though...

1
2
3
4
5
6
7
8
/* Fragment of GNU ld's internal linker script */
.init_array :
{
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
PROVIDE_HIDDEN (__init_array_end = .);
}

The next slide uses ld.lld, which does not do the smart merging.


Linker behavior

1
2
3
4
5
6
7
8
9
10
a.o:(.init_array.101) b.o:(.init_array.101)
a.o:(.init_array.102) b.o:(.init_array.102)
...
a.o:(.init_array.65533) b.o:(.init_array.65533)
a.o:(.init_array.65534) b.o:(.init_array.65534)
a.o:(.init_array) b.o:(.init_array)

ld collects .init_array and .init_array.* into
the output section .init_array and creates
DT_INIT_ARRAY/DT_INIT_ARRAYSZ.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
## ctors_priority = 65535-init_array_priority
crtbegin.o(.ctors)
a.o:(.ctors) b.o:(.init_array)
a.o:(.ctors.00001) b.o:(.ctors.00001)
a.o:(.ctors.00002) b.o:(.ctors.00002)
...
a.o:(.ctors.65433) b.o:(.init_array.65533)
a.o:(.ctors.65434) b.o:(.init_array.65534)
...
crtend.o(.ctors)

ld collects .ctors and .ctors.* into
the output section .ctors .

crtbegin.o defines __CTOR_LIST__ and
crtend.o defines __CTOR_LIST_END__.

Note the leading zero digits. crtbegin/crtend are sentinels. ld uses strcmp to compare two non-sentinel sections.


The .ctors approach uses magic sentinel values. _init uses fragmented functions which is a bad practice.

Its elements are run in the reversed order. This is a clever trick to make static linking similar to dynamic linking.


Dedicated section types (SHT_INIT_ARRAY SHT_FINI_ARRAY) are a good practice.

The elements are run in the forward order.


C++ dynamic initialization does not promise an order between two TUs. However, an implementation has to define an order for determinism. (read "reproducible builds")

Determinism became promise. Promise became bugs.


glibc ld.so and libc behavior

  • ld.so runs DT_INIT and DT_INIT_ARRAY in shared objects. If a.so depends on b.so, a.so's ctors run after b.so's
  • libc_nonshared.a runs DT_INIT and DT_INIT_ARRAY in the executable
  • crtbegin.o defines a _init fragment which calls .ctors constructors

Order:

  • ld.so runs c.so:DT_INIT. The crtbegin.o fragment of _init calls .ctors
  • ld.so runs c.so:DT_INIT_ARRAY
  • ld.so runs b.so:DT_INIT. The crtbegin.o fragment of _init calls .ctors
  • ld.so runs b.so:DT_INIT_ARRAY
  • libc_nonshared.a runs a:DT_INIT. The crtbegin.o fragment of _init calls .ctors
  • libc_nonshared.a runs a:DT_INIT_ARRAY

On modern Linux/*BSD distributions, crtbegin.o no longer calls .ctors. glibc's RISC-V port doesn't even define ELF_INIT_FINI, i.e. DT_INIT is gone.


Example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// a.cc -> a
#include <stdio.h>
__attribute__((constructor)) void init() { puts("init"); }
extern "C" void ctors() { puts("ctors"); }
asm(".pushsection .ctors,\"aw\"; .quad ctors; .popsection");
int main() {}

// b.cc -> b.so
#include <stdio.h>
__attribute__((constructor)) void init_b() { puts("init b"); }
extern "C" void ctors_b() { puts("ctors b"); }
asm(".pushsection .ctors,\"aw\"; .quad ctors_b; .popsection");

// c.cc -> c.so
#include <stdio.h>
__attribute__((constructor)) void init_c() { puts("init c"); }
extern "C" void ctors_c() { puts("ctors c"); }
asm(".pushsection .ctors,\"aw\"; .quad ctors_c; .popsection");
1
2
3
4
5
6
7
8
9
10
11
12
13
% clang -fpic -shared b.cc -o b.so
% clang -fpic -shared c.cc -o c.so
% clang a.cc ./b.so ./c.so -o a
% readelf -d a
... (NEEDED) Shared library: [./b.so]
... (NEEDED) Shared library: [./c.so]
% ./a
ctors c
init c
ctors b
init b
ctors
init

A Bazel example

  • --dynamic_mode=off uses *.a (or --start-lib which has similar semantics)
  • --dynamic_mode=fully uses *.so

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
cat > ./a0.cc <<eof
#include <stdio.h>
void use_b0(); void use_b1(); void use_c0(); void use_c1();
__attribute__((constructor)) void ctor_a0() { puts("ctor a0"); }
int main() { use_b0(); use_b1(); use_c0(); use_c1(); }
eof
cat > ./a1.cc <<eof
#include <stdio.h>
__attribute__((constructor)) void ctor_a1() { puts("ctor a1"); }
eof
cat > ./x.cc <<eof
#include <stdio.h>
__attribute__((constructor)) void ctor_X() { puts("ctor X"); }
void use_X() {}
eof
1
2
3
4
5
6
7
8
9
10
11
sed 's/X/b0/g' x.cc > b0.cc
sed 's/X/b1/g' x.cc > b1.cc
sed 's/X/c0/g' x.cc > c0.cc
sed 's/X/c1/g' x.cc > c1.cc

touch ./WORKSPACE
cat > ./BUILD <<eof
cc_binary(name = "a", srcs = ["a0.cc", "a1.cc"], deps = [":b", ":c"])
cc_library(name = "b", srcs = ["b0.cc", "b1.cc"], deps = [":c"])
cc_library(name = "c", srcs = ["c0.cc", "c1.cc"])
eof

Re-think .ctors vs .init_array

  • b.a + c.a with .ctors: c1, c0, b1, b0, a1, a0
  • b.so + c.so with .ctors: c1, c0, b1, b0, a1, a0
  • b.a + c.a with .init_array: a0, a1, b0, b1, c0, c1
  • b.so + c.so with .init_array: c1, c0, b1, b0, a1, a0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
## .init_array
bazel run :a --dynamic_mode=off
# a0,a1,b0,b1,c0,c1

bazel run :a --dynamic_mode=fully
# c0,c1,b0,b1,a0,a1

bazel run :a --dynamic_mode=off --linkopt=-fuse-ld=lld --linkopt=-Wl,--shuffle-sections=.init_array=-1
# c1,c0,b1,b0,a1,a0

bazel run :a --dynamic_mode=fully --linkopt=-fuse-ld=lld --linkopt=-Wl,--shuffle-sections=.init_array=-1
# c1,c0,b1,b0,a1,a0

## .ctors
bazel run :a --repo_env=CC=clang :a --copt=-fno-use-init-array --dynamic_mode={fully,off} --linkopt=-fuse-ld=lld
# If libgcc supports .ctors: c1,c0,b1,b0,a1,a0
# Otherwise: no output

Re-think .ctors vs .init_array (cont)

With .ctors, --dynamic_mode=off and --dynamic_mode=fully have similar orders. The orders may be different when the glibc topological order does not agree with the build system dependency order.

Corollary: We don't get sufficient coverage. Re-organizing libraries may expose lurking bugs.

With .init_array, --dynamic_mode=off and --dynamic_mode=fully have very different orders. cc_test+cc_binary give a good coverage. However, the order within a cc_library is identical.

With .init_array and --dynamic_mode=off, testing both the regular order and the reversed order gives a good coverage.

If both a/b/c and c/b/a work, we have confidence that a, b, c unlikely have order dependency.

The ELF specification guarantee "if a.so depends on b.so, then b.so's constructors run first" can hardly be leveraged.


--shuffle-sections=.init_array=-1

--shuffle-sections=<section-glob>=<seed>: Shuffle matched input sections using the given seed before mapping them to the output sections.

If -1, reverse the section order.

--shuffle-sections=.init_array=-1: reverse the order of .init_array input sections. .init_array.priority sections are unaffected.


AddressSanitizer check_initialization_order=true

Enabled by default due to strict_init_order=true

This checks a dynamic initialization does not touch memory regions of other global variables.

My feeling: it can catch less than 15% bugs.