This article describes how to detect C++ One Definition Rule (ODR) violations. There are many good resources on the Internet about how ODR violations can introduce subtle bugs, so I will not repeat that here.
Debug information based
gold --detect-odr-violations
In 2007, the gold linker implemented an option --detect-odr-violations to detect ODR violations based on debug information. This option collects symbols starting with _Z and finds pairs of STB_WEAK definitions with different st_size or different st_type values. These symbols are candidates for ODR violations.
gold parses DWARF line tables. For each candidate, gold checks whether both definitions have associated line table information (if either definition lacks debug info, no warning is issued) and whether their file:line sets are disjoint. If so, gold issues a warning.
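The candidate selection and the line table check described above can be sketched as follows. This is a simplified model, not gold's implementation: WeakDef, disjoint, and findOdrCandidates are hypothetical names, every input is assumed to be an STB_WEAK definition, and the line table is assumed to be already decoded into file:line pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using LineSet = std::set<std::pair<std::string, unsigned>>;

// Hypothetical, simplified view of one STB_WEAK definition seen by the linker.
struct WeakDef {
  std::string name;      // mangled symbol name
  std::string object;    // relocatable object file providing the definition
  std::uint64_t st_size; // ELF symbol size
  unsigned st_type;      // ELF symbol type (e.g. STT_FUNC is 2)
  LineSet lines;         // decoded file:line pairs; empty means no debug info
};

// True if the two file:line sets share no element.
static bool disjoint(const LineSet &a, const LineSet &b) {
  for (const auto &loc : a)
    if (b.count(loc))
      return false;
  return true;
}

// Returns the names that would get a warning under the rules in the text.
std::vector<std::string> findOdrCandidates(const std::vector<WeakDef> &defs) {
  std::map<std::string, std::vector<const WeakDef *>> byName;
  for (const WeakDef &d : defs)
    if (d.name.rfind("_Z", 0) == 0) // only C++ mangled names are considered
      byName[d.name].push_back(&d);

  std::vector<std::string> warnings;
  for (const auto &entry : byName) {
    const std::vector<const WeakDef *> &group = entry.second;
    assert(!group.empty());
    bool warn = false;
    for (std::size_t i = 0; i < group.size() && !warn; ++i)
      for (std::size_t j = i + 1; j < group.size() && !warn; ++j) {
        const WeakDef &a = *group[i];
        const WeakDef &b = *group[j];
        // Step 1: differing size or type makes the pair a candidate.
        bool candidate = a.st_size != b.st_size || a.st_type != b.st_type;
        // Step 2: both sides need debug info and disjoint source locations.
        bool bothHaveDebugInfo = !a.lines.empty() && !b.lines.empty();
        warn = candidate && bothHaveDebugInfo && disjoint(a.lines, b.lines);
      }
    if (warn)
      warnings.push_back(entry.first);
  }
  return warnings;
}
```

Calling findOdrCandidates on a list where two objects define the same _Z symbol with different sizes and non-overlapping file:line sets returns that symbol name. A pair with overlapping locations, or with one side lacking debug info, is skipped.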
The check uses source locations as a proxy for an ODR violation. The proxy is usually good but does not precisely capture ODR violations. The first line of a function may differ among relocatable object files due to different optimization behaviors, so we may see spurious ODR violations. The check does not find differing class/enum definitions or templates.
The option uncovered some Chromium bugs; see https://crbug.com/449754.
The feature is not implemented in other linkers. The idea of using debug information is interesting. See "Future direction" for a non-debug-information alternative, which may be better.
A debug information based approach is potentially slow. As an analogy, constructing .gdb_index is easily the slowest path in the linker.
adobe/orc
https://github.com/adobe/orc is a tool for finding violations of C++'s One Definition Rule on the OSX toolchain. It wraps libtool/ld on macOS to collect input object files, parses debugging information entries, and verifies that certain attributes match.
As a standalone program, it can focus on user interface.
Finding ODR Violations with ORC was a talk at ACCU 2022.
LTO based
GCC -Wlto-type-mismatch and -Wodr
-Wlto-type-mismatch is an enabled-by-default warning for LTO about type mismatches in global declarations (structs, function signatures, etc.). See gcc/lto/lto-symtab.cc:warn_type_compatibility_p.
echo 'struct A { int x; } a; int main(void) {}' > a.c
% gcc -flto -fuse-ld=bfd a.c b.c
C++ has an additional diagnostic -Wodr about mismatched types of C++ global declarations. In the output below, a.cc defines struct A with a field x while b.cc defines an empty struct A.
% g++ -fuse-ld=bfd -flto a.cc b.cc
a.cc:1:8: warning: type ‘struct A’ violates the C++ One Definition Rule [-Wodr]
    1 | struct A {int x;} a; int main() {}
      |        ^
b.cc:1:8: note: a different type is defined in another translation unit
    1 | struct A {} b;
      |        ^
a.cc:1:15: note: the first difference of corresponding definitions is field ‘x’
    1 | struct A {int x;} a; int main() {}
      |               ^
b.cc:1:8: note: a type with different number of fields is defined in another translation unit
    1 | struct A {} b;
      |        ^
GCC also reports -Wodr diagnostics for mismatched external linkage enumeration types. (lld/ELF example)
echo 'enum A {X} a; int main() {}' > a.cc
echo 'enum A {Y} b;' > b.cc
% g++ -fuse-ld=bfd -flto a.cc b.cc
It cannot detect the case when a member function is changed from inline to non-inline.
cat > a.cc <<'eof'
-Wodr does not detect inline int foo() { return 1; } vs inline int foo() { return 2; }.
See block:618550 ODR for Gentoo's bug list.
Implement -Wlto-type-mismatch is an llvm-project feature request.
ThinLTO
ThinLTO computes a module summary for functions, global variables, aliases, and ifuncs. Technically the module summary can be overloaded to record ODR hashes, but coupling this with an optimization-targeted feature seems weird and adds size overhead even when the feature is not used.
Clang ODR hash
In 2017, Clang implemented an AST-based ODR hash feature. Each definition is given a hash value. When definitions are merged, the hash values are compared and an error is reported if they differ.
This feature works with both Clang header modules and C++ modules.
echo 'module B { header "B.h" } module C { header "C.h" }' > module.modulemap
-fmodules implies -fimplicit-module-maps, which loads module.modulemap. The two #include directives are translated to module loads. When the definitions of foo in B and C are merged, an error is issued.
In module 'C' imported from A.cc:2:
Let's see an example of C++ modules.
echo 'import B; import C; int main() { return foo(); }' > A.cc
In file included from A.cc:1:
Compiler instrumentation based
#pragma detect_mismatch
This pragma is supported by MSVC and Clang. See https://devblogs.microsoft.com/oldnewthing/20160803-00/?p=94015.
This is implemented using the linker option /failifmismatch:.
In ELF, this feature can be emulated with SHN_ABS symbols. GNU ld and lld do not report a duplicate definition error when two SHN_ABS symbols have the same value.
AddressSanitizer detect_odr_violation
For an instrumented translation unit, there is a global constructor which calls __asan_register_globals to register some types of global variables (non-thread-local, defined, external/private/internal LLVM linkage, and a few other conditions). This can be used to check whether two global variables of the same name are defined in different modules. In 2014, detect_odr_violation was implemented based on this idea. Note: functions and vague linkage symbols are not instrumented, so the interesting case is skipped.
Poisoning based detection
The runtime poisons the red zone of a to-be-registered global variable (compiler-rt/lib/asan/asan_globals.cpp). If the variable is already poisoned when a registration is attempted, it means that the variable has been registered by another linkage unit. The runtime will report an ODR violation error.
echo 'int var; int main() { return var; }' > a.cc
% ./a
The default detect_odr_violation=2 mode additionally disallows symbol interposition on variables. Change long in b.cc to int and we will still see an odr-violation error. detect_odr_violation=1 suppresses errors if the registered variable is of the same size.
% ASAN_OPTIONS=detect_odr_violation=1 ./a
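To make the difference between the two modes concrete, here is a toy model of poisoning based registration. It is only a sketch under stated assumptions: ToyRuntime, Global, and registerGlobal are hypothetical names, shadow memory is modeled as a set of poisoned byte addresses, and the real compiler-rt logic has more conditions (suppressions, for example) that are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical description of an instrumented global variable.
struct Global {
  std::uintptr_t beg;            // start address of the variable
  std::size_t size;              // size of the variable itself
  std::size_t size_with_redzone; // size including the trailing red zone
};

class ToyRuntime {
public:
  // level models the detect_odr_violation setting (0, 1 or 2).
  explicit ToyRuntime(int level) : level_(level) {}

  // Registers g and returns true if an odr-violation would be reported.
  bool registerGlobal(const Global &g) {
    assert(g.size <= g.size_with_redzone);
    bool violation = false;
    // A poisoned region means some module registered this address before.
    if (level_ > 0 && regionIsPoisoned(g.beg, g.size_with_redzone)) {
      for (const Global &prev : registered_)
        if (prev.beg == g.beg && (level_ >= 2 || prev.size != g.size))
          violation = true; // level 1 tolerates equal sizes, level 2 does not
    }
    registered_.push_back(g);
    // Poison the red zone that follows the variable.
    for (std::uintptr_t a = g.beg + g.size; a < g.beg + g.size_with_redzone; ++a)
      poisoned_.insert(a);
    return violation;
  }

private:
  bool regionIsPoisoned(std::uintptr_t beg, std::size_t n) const {
    for (std::uintptr_t a = beg; a < beg + n; ++a)
      if (poisoned_.count(a))
        return true;
    return false;
  }

  int level_;
  std::set<std::uintptr_t> poisoned_; // stand-in for shadow memory
  std::vector<Global> registered_;
};
```

With level 2, registering the same address twice is reported even when the sizes match. With level 1, the second registration is reported only if the sizes differ.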
This approach has a drawback when a global variable is defined in a non-instrumented TU and an instrumented TU, and the linker selects the non-instrumented TU.
The variable metadata references the interposable variable symbol. If an instrumented global variable is interposed by an uninstrumented one, the runtime may poison bytes not belonging to the global variable. Since poisoning writes to shadow memory, this is usually benign. However, global variable instrumentation increases the alignment of a global variable (to at least 32) and checks that the metadata-referenced variable symbol has an alignment of at least shadow granularity (8). If the referenced variable symbol resolves to a non-instrumented module, the alignment check may fail (if the symbol is less aligned) and in this case the runtime reports a bogus odr-violation error as well.
Let's see an example. I add a dummy variable to make var not aligned by 8 in a.o (no guarantee but working in practice).
echo 'char pad, var; int main() { return pad + var; }' > a.cc
% ./a
ODR indicator
http://reviews.llvm.org/D15642 introduced a new mode: for a variable var, a one-byte variable __odr_asan_gen_var is created with the original linkage (essentially only external). If var is defined in two instrumented modules, their __odr_asan_gen_var symbols refer to the same copy due to symbol interposition. When registering var, the runtime checks whether the associated __odr_asan_gen_var is already 1. If so, the variable has an ODR violation. Otherwise, the runtime sets __odr_asan_gen_var to 1.
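The indicator scheme can be modeled in a few lines. This is a sketch rather than the compiler-rt code: ToyProcess and registerGlobal are hypothetical names, and symbol interposition is modeled by a single map from indicator name to byte, so every module that defines the same indicator sees the same storage.

```cpp
#include <cassert>
#include <map>
#include <string>

// Models one process. Because of symbol interposition, all modules that
// define an indicator with the same name resolve to the same byte, which is
// represented here by one map entry per indicator name.
class ToyProcess {
public:
  // Called once per (module, variable) pair during global registration.
  // Returns true if an ODR violation would be reported for var.
  bool registerGlobal(const std::string &var) {
    assert(!var.empty());
    // operator[] value-initializes a new indicator to 0 (unregistered).
    unsigned char &indicator = indicators_["__odr_asan_gen_" + var];
    if (indicator == 1)
      return true; // another module already registered a definition
    indicator = 1;
    return false;
  }

private:
  std::map<std::string, unsigned char> indicators_;
};
```

Registering var from one module and then again from a second module makes the second call return true, while unrelated variables do not interfere with each other.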
To prevent the metadata-referenced symbol from being interposed by another linkage unit, a private alias for var is created and referenced in the metadata. This ensures that the metadata refers to the module's own copy.
echo 'int var; int main() { return var; }' > a.cc
For Clang 16, I landed https://reviews.llvm.org/D137227 to use -fsanitize-address-use-odr-indicator by default for non-Windows targets.
https://reviews.llvm.org/D127911 changed the ODR indicator symbol name to __odr_asan_gen_$demangled.
KCFI
Clang has recently implemented an indirect call control flow integrity instrumentation which does not require link-time optimization: KCFI. One side product of this feature is related to ODR violation detection.
For an address-taken function, a weak absolute symbol __kcfi_typeid_<function> is defined. The symbol is weak, but if it were a STB_GLOBAL symbol, a linker could find differing values. GNU ld has a hack where duplicate absolute definitions with the same value do not trigger an error, and ld.lld has ported the behavior. While such a scheme would work, using magic symbols is not proper usage of a linker and I would object to such an attempt.
MSVC LNK2022
In MSVC, the /clr switch seems to insert some metadata. The linker is able to report LNK2022: metadata operation failed.
lld --no-allow-shlib-undefined
DSO undef and non-exported definition describes a case when a relocatable object file provides a non-exported definition and a DSO provides another definition.
lld COMDAT resolution
While implementing parallel section initialization for ld.lld, I changed ld.lld symbol resolution to disregard COMDAT resolution (COMDAT resolution was moved to a later pass). This combined with AddressSanitizer -fsanitize-address-globals-dead-stripping -fsanitize-address-use-odr-indicator turns out to catch some violations of classes with vtables. This is an interesting side product I did not anticipate.
With -fsanitize-address-globals-dead-stripping -fsanitize-address-use-odr-indicator, a global variable is placed in a COMDAT group so that the global variable (var) along with the metadata (__asan_gen_var) can be discarded by the linker if unreferenced. The associated ODR indicator is not in a COMDAT.
cat > a.cc <<'eof'
; b.ll
% clang++ -fsanitize=address -fuse-ld=lld a.o b.o
In a.o, the vtable for A has vague linkage and compiles to a weak symbol _ZTV1A in a COMDAT. In b.o, the vtable for A has regular external linkage and compiles to a STB_GLOBAL symbol _ZTV1A. Normally b.o:_ZTV1A is not in a COMDAT, but in -fsanitize-address-globals-dead-stripping -fsanitize-address-use-odr-indicator mode the symbol is in a COMDAT.
During linking, as a.o precedes b.o, the COMDAT group in a.o is prevailing while the COMDAT group in b.o is non-prevailing. With lld's new symbol resolution rule, b.o:_ZTV1A overrides a.o:_ZTV1A, but later the definition is discarded (the symbol becomes undefined) because it is in a non-prevailing COMDAT. A relocation referencing the now undefined _ZTV1A will cause an error.
There are similar errors for typeinfo for A and typeinfo name for A unless -fno-rtti is specified.
If we make the COMDAT with the STB_GLOBAL symbol prevailing, the errors will be suppressed.
clang++ -fsanitize=address -fuse-ld=lld b.o a.o
Since typeinfo and typeinfo name can trigger this error, we can change a.cc to not have a vtable. Just use typeid.
cat > a.cc <<'eof'
% clang++ -fuse-ld=lld -fsanitize=address a.o b.o
Summary
There are several ways we can categorize these tools.
- run-time analysis: AddressSanitizer detect_odr_violation
- static analysis: the others
By scope:
- single translation unit: Clang ODR hash
- linkage unit: most tools
- cross linkage unit: AddressSanitizer detect_odr_violation
By entity:
- GCC -Wlto-type-mismatch and -Wodr: type
- gold --detect-odr-violations: C++ vague linkage functions
- lld with -fsanitize-address-globals-dead-stripping -fsanitize-address-use-odr-indicator: C++ vague linkage functions and variables (partial)
- adobe/orc: many
- Clang ODR hash: almost all
Future direction
Having an approach for LTO is nice, but we probably want an approach usable without LTO or debug information.
As Clang has implemented the heavy lifting of ODR hashes, we can implement a feature to collect the hashes into a custom section. We can change lld to scan this section and find differing values.
See RFC: ODR checker for Clang and LLD for a 2017 RFC. The effort added SHT_LLVM_ODRTAB and was not upstreamed.
I think the section can be a table holding ODR hash values. Each value is associated with a R_*_NONE relocation referencing the associated symbol table entry. For classes which do not produce a symbol (i.e. no vtable), the compiler can generate a symbol solely for ODR violation detection.
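The linker side of this idea could look roughly like the following. It is a sketch of the proposal only, not existing lld code: OdrEntry, ObjectFile, Mismatch, and scanOdrTables are hypothetical names, the hash width is an assumption, and each table entry is assumed to be already paired with the symbol that its R_*_NONE relocation references.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical decoded entry of the proposed section: an ODR hash together
// with the name of the symbol referenced by its R_*_NONE relocation.
struct OdrEntry {
  std::string symbol;
  std::uint32_t hash;
};

// Hypothetical input file with its decoded ODR table.
struct ObjectFile {
  std::string name;
  std::vector<OdrEntry> odrTable;
};

struct Mismatch {
  std::string symbol;
  std::string firstObject;  // file that provided the first hash seen
  std::string secondObject; // file whose hash differs from the first one
};

// Scans all tables in input order and reports symbols whose hashes differ.
std::vector<Mismatch> scanOdrTables(const std::vector<ObjectFile> &objects) {
  struct Seen {
    std::uint32_t hash;
    std::string object;
  };
  std::map<std::string, Seen> seen;
  std::vector<Mismatch> mismatches;
  for (const ObjectFile &obj : objects)
    for (const OdrEntry &e : obj.odrTable) {
      assert(!e.symbol.empty());
      auto res = seen.emplace(e.symbol, Seen{e.hash, obj.name});
      // emplace keeps the first entry; compare later ones against it.
      if (!res.second && res.first->second.hash != e.hash)
        mismatches.push_back({e.symbol, res.first->second.object, obj.name});
    }
  return mismatches;
}
```

Each reported Mismatch names the symbol and the two input files that disagree, which is the information a linker diagnostic would need.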
I will probably implement this feature if people find this useful.