DSO undef and non-exported def
If a DSO has an undefined STB_GLOBAL symbol that is
defined in a relocatable object file but not exported, should the
--no-allow-shlib-undefined feature report an error? You may
want to check out Dependency
related linker options for a discussion of this option and the symbol
exporting rule.
For quite some time, the --no-allow-shlib-undefined
feature has been implemented in lld/ELF as follows:
for (SharedFile *file : ctx.sharedFiles) {
Recently I noticed that GNU ld implemented a related error in April 2003 (discussion).
echo '.globl _start; _start: call shared' > main.s && clang -c main.s
% ld.bfd main.o a.so def.o
A non-local default or protected visibility symbol can satisfy a DSO
reference. The linker will export the symbol to the dynamic symbol
table. Therefore, ld.bfd main.o a.so def.o succeeds as
intended.
We encounter an error for
ld.bfd main.o a.so def-hidden.o because a symbol with
hidden visibility cannot be exported, and it's unable to satisfy the
reference in a.so at run-time.
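The rule above can be written as a small predicate. This is only a sketch in Python, not code from GNU ld or lld: the `Definition` class and `can_satisfy_dso_reference` are names I made up for illustration, and the two sample objects merely stand in for def.o and def-hidden.o.

```python
from dataclasses import dataclass


@dataclass
class Definition:
    """Hypothetical stand-in for a symbol defined in a relocatable object."""
    name: str
    binding: str      # "STB_LOCAL", "STB_GLOBAL", or "STB_WEAK"
    visibility: str   # "STV_DEFAULT", "STV_PROTECTED", "STV_HIDDEN", "STV_INTERNAL"


def can_satisfy_dso_reference(d: Definition) -> bool:
    """Return True if the definition can be exported to the dynamic symbol
    table, and can therefore satisfy an undefined reference from a DSO."""
    non_local = d.binding in ("STB_GLOBAL", "STB_WEAK")
    exportable = d.visibility in ("STV_DEFAULT", "STV_PROTECTED")
    return non_local and exportable


# Assumed symbol properties for the two objects used in the text.
def_o = Definition("foo", "STB_GLOBAL", "STV_DEFAULT")
def_hidden_o = Definition("foo", "STB_GLOBAL", "STV_HIDDEN")

print(can_satisfy_dso_reference(def_o))         # link is expected to succeed
print(can_satisfy_dso_reference(def_hidden_o))  # link is expected to fail
```

A real linker tracks much more state (symbol versions, --export-dynamic, dynamic lists); the sketch only captures the binding and visibility part of the decision.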
Here is another interesting case: we use a version script to change
the binding of a defined symbol to STB_LOCAL, causing it to
be unable to satisfy the reference in a.so at run-time. GNU
ld also reports an error in this case.
% ld.bfd --version-script=local.ver main.o a.so def.o
ld.bfd: a.out: local symbol `foo' in def.o is referenced by DSO
ld.bfd: final link failed: bad value
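To illustrate why the version script matters, here is a rough Python model of the localization step. It is a sketch under assumptions: I assume local.ver contains a `local:` pattern that matches `foo`, the function name `effective_binding` is hypothetical, and Python's `fnmatchcase` only approximates the glob matching that linkers apply to version script patterns.

```python
from fnmatch import fnmatchcase


def effective_binding(name, binding, local_patterns):
    """Return the binding a symbol ends up with after the `local:` patterns
    of a version script have been applied (simplified model)."""
    if binding != "STB_LOCAL" and any(fnmatchcase(name, p) for p in local_patterns):
        return "STB_LOCAL"
    return binding


# Assumed content of local.ver: a single `local: foo;` entry.
local_patterns = ["foo"]

binding = effective_binding("foo", "STB_GLOBAL", local_patterns)
# A localized symbol is not exported, so it cannot satisfy the DSO reference.
print(binding, "-> can satisfy DSO reference:", binding != "STB_LOCAL")
```

With an empty pattern list, `foo` keeps its STB_GLOBAL binding and the link behaves like the plain ld.bfd main.o a.so def.o case.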
My recent commit https://github.com/llvm/llvm-project/commit/1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40
strengthened LLD's --no-allow-shlib-undefined to detect
cases in which the non-exported definitions are garbage-collected. I
have landed https://github.com/llvm/llvm-project/pull/70769 to cover
non-garbage-collected cases for LLD 18.
DSO undef, non-exported def, and DSO def
A variation of the scenario mentioned above occurs when a DSO
definition is also present. Even if the executable does not export
foo, another DSO (def.so) may provide it. GNU
ld's check allows for this case.
ld.bfd main.o a.so def-hidden.o def.so # succeeded
It turns out that https://github.com/llvm/llvm-project/commit/1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40
unexpectedly strengthened --no-allow-shlib-undefined to
also catch this ODR violation. More precisely, when all three conditions
are met, the new --no-allow-shlib-undefined code reports an
error.
- There is a DSO undef that can be satisfied by a definition from another DSO (referred to as SharedSymbol in lld/ELF).
- The SharedSymbol is overridden by a non-exported (usually of hidden visibility) definition in a relocatable object file (Defined).
- The section containing the Defined is garbage-collected (it is not part of .dynsym and is not marked as live).
An exported symbol is a GC root, making its section live. A non-exported symbol, however, can be discarded when its section is discarded.
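The three conditions can be combined into one predicate. The following Python sketch is not lld's implementation; `SymbolState` and its field names are hypothetical, chosen only to mirror the wording of the list.

```python
from dataclasses import dataclass


@dataclass
class SymbolState:
    """Hypothetical per-symbol facts gathered during the link."""
    dso_undef: bool              # some DSO has an undefined reference to it
    shared_def_seen: bool        # another DSO defines it (SharedSymbol)
    overridden_by_defined: bool  # a relocatable object's Defined replaced the SharedSymbol
    in_dynsym: bool              # the Defined is exported to .dynsym
    section_live: bool           # the section containing the Defined survived GC


def reports_error(s: SymbolState) -> bool:
    """True when all three conditions from the list hold."""
    cond1 = s.dso_undef and s.shared_def_seen
    cond2 = s.overridden_by_defined and not s.in_dynsym
    cond3 = not s.section_live
    return cond1 and cond2 and cond3


# Hidden definition whose section was discarded: the strengthened check fires.
print(reports_error(SymbolState(True, True, True, False, False)))
# Same symbol, but its section is kept alive by some other reference.
print(reports_error(SymbolState(True, True, True, False, True)))
```

In this model an exported symbol never triggers the error, which matches the observation that exported symbols are GC roots.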
So, is this error legitimate? At run-time, the undefined symbol
foo in a.so will be bound to
def.so, even if the executable does not export
foo, so we are fine. This suggests that the
--no-allow-shlib-undefined code probably should not report
an error.
However, both def-hidden.o and def.so
define foo, and we know the definitions differ, so the conflict is
unlikely to be benign. At the very least, they are not exactly the same due
to different visibilities or one being localized by a version
script.
A real-world report boils down to:
% ld.lld @response.txt -y _Znam
...
libfdio.so: reference to _Znam
libclang_rt.asan.so: shared definition of _Znam
libc++.a(stdlib_new_delete.cpp.obj): definition of _Znam
ld.lld: error: undefined reference due to --no-allow-shlib-undefined: _Znam
>>> referenced by libfdio.so
How does libfdio.so obtain a reference to
_Znam? Well, libfdio.so is linked against both
libclang_rt.asan.so and libc++.a. Due to
symbol processing rules, the definition from
libclang_rt.asan.so takes precedence. (See Symbol processing#Shared
object overriding archive.)
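To make the precedence concrete, here is a heavily simplified Python model of how one symbol name is resolved as input files are processed in command-line order. It is loosely based on my understanding of lld's behavior and ignores weak symbols, common symbols, and duplicate definition errors. The file name fdio.o and the input orders are assumptions for illustration, not taken from the actual report.

```python
def resolve(inputs):
    """Resolve a single symbol name.

    inputs: list of (file, kind) pairs in command-line order, where kind is
    "undefined", "lazy" (archive member not yet extracted), "shared", or
    "defined". Returns the final (kind, file) pair.
    """
    current = None
    for file, kind in inputs:
        if current is None:
            current = (kind, file)
            continue
        cur_kind, cur_file = current
        if kind == "undefined" and cur_kind == "lazy":
            current = ("defined", cur_file)   # the reference extracts the member
        elif kind == "lazy" and cur_kind == "undefined":
            current = ("defined", file)       # the archive satisfies the reference
        elif kind == "shared" and cur_kind in ("undefined", "lazy"):
            current = ("shared", file)        # a DSO definition wins over a lazy symbol
        elif kind == "defined" and cur_kind != "defined":
            current = ("defined", file)       # a regular definition overrides the rest
    return current


# Assumed order when linking libfdio.so: the DSO is seen before the archive.
so_first = [("fdio.o", "undefined"),
            ("libclang_rt.asan.so", "shared"),
            ("libc++.a", "lazy")]
# Swapped order: the archive member is extracted before the DSO is seen.
archive_first = [("fdio.o", "undefined"),
                 ("libc++.a", "lazy"),
                 ("libclang_rt.asan.so", "shared")]

print(resolve(so_first))
print(resolve(archive_first))
```

In the first order the archive member is never extracted, so libfdio.so ends up with an undefined reference that is expected to be satisfied by a shared object at run-time.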
An appropriate solution is to replace libc++.a with an
AddressSanitizer-instrumented version that does not define
_Znam.
I have also encountered issues stemming from the combination of
multiple definitions from libgcc.a (with hidden visibility)
and libclang_rt.builtins.a (with default visibility),
relying on archive member extraction rules.
% ld.lld @response.txt -y __divti3
...
a.so: reference to __divti3
libgcc.a(_divdi3.o): definition of __divti3
libc++.so: shared definition of __divti3
# A lazy symbol in libclang_rt.builtins.a is not reported by -y
ld.lld: error: undefined reference due to --no-allow-shlib-undefined: __divti3
>>> referenced by a.so
a.so is linked against libc++.so and
libclang_rt.builtins.a and obtains a reference to
__divti3 due to libc++.so. For the executable
link, the undesired situation arises as the definition in
libgcc.a takes precedence. What we actually want is for
libgcc.a to provide the missing components from
libclang_rt.builtins.a.
Some users compile relocatable object files with
-fvisibility=hidden to disallow dynamic linking. However,
when their system includes shared objects that define the same symbols,
the risk of conflicting multiple definitions increases.
While this additional check introduced in https://github.com/llvm/llvm-project/commit/1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40
may not perfectly fit into --no-allow-shlib-undefined, I
believe it has value. As a result, I have proposed --[no-]allow-non-exported-symbols-shared-with-dso.
However, I am on the fence about introducing a new option, as it
may not get used.
Technically, the check can be extended to default visibility to catch all link-time symbol interposition. However, I suspect that there are a lot of benign violations and in the absence of an ignore list mechanism, this extension will not be useful.