DSO undef and non-exported def
If a DSO has an undefined STB_GLOBAL symbol that is defined in a relocatable object file but not exported, should the --no-allow-shlib-undefined feature report an error? You may want to check out Dependency related linker options for a discussion of this option and the symbol exporting rule.
For quite some time, the --no-allow-shlib-undefined feature has been implemented in lld/ELF as follows:
for (SharedFile *file : ctx.sharedFiles) {
Recently I noticed that GNU ld implemented a related error in April 2003 (discussion).
echo '.globl _start; _start: call shared' > main.s && clang -c main.s
% ld.bfd main.o a.so def.o
A non-local default or protected visibility symbol can satisfy a DSO reference. The linker will export the symbol to the dynamic symbol table. Therefore, ld.bfd main.o a.so def.o succeeds as intended.
We encounter an error for ld.bfd main.o a.so def-hidden.o because a symbol with hidden visibility cannot be exported, so it cannot satisfy the reference in a.so at run-time.
Here is another interesting case: we use a version script to change the binding of a defined symbol to STB_LOCAL, causing it to be unable to satisfy the reference in a.so at run-time. GNU ld also reports an error in this case.
% ld.bfd --version-script=local.ver main.o a.so def.o
ld.bfd: a.out: local symbol `foo' in def.o is referenced by DSO
ld.bfd: final link failed: bad value
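The exporting rule behind these three outcomes can be written down as a small predicate. This is only a toy model in Python, not code from GNU ld or lld; the names `Definition` and `can_satisfy_dso_reference` and the `localized` parameter are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Definition:
    """A symbol defined in a relocatable object file (hypothetical model)."""
    name: str
    binding: str = "STB_GLOBAL"      # STB_LOCAL, STB_GLOBAL, or STB_WEAK
    visibility: str = "STV_DEFAULT"  # STV_DEFAULT, STV_PROTECTED, STV_HIDDEN, STV_INTERNAL


def can_satisfy_dso_reference(sym, localized=frozenset()):
    """Return True if the definition can be exported to the dynamic symbol
    table and therefore satisfy an undefined reference in a DSO.

    `localized` is the set of names whose binding a version script
    changes to STB_LOCAL (an assumption of this sketch).
    """
    if sym.name in localized or sym.binding == "STB_LOCAL":
        return False
    return sym.visibility in ("STV_DEFAULT", "STV_PROTECTED")


if __name__ == "__main__":
    # The three links discussed above: def.o, def-hidden.o, def.o + local.ver
    print(can_satisfy_dso_reference(Definition("foo")))
    print(can_satisfy_dso_reference(Definition("foo", visibility="STV_HIDDEN")))
    print(can_satisfy_dso_reference(Definition("foo"), localized={"foo"}))
```

Only the first call models a link that should succeed; the other two correspond to the hidden-visibility and version-script cases that GNU ld rejects.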
My recent commit https://github.com/llvm/llvm-project/commit/1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40 strengthened LLD's --no-allow-shlib-undefined to detect cases in which the non-exported definitions are garbage-collected. I have landed https://github.com/llvm/llvm-project/pull/70769 to cover non-garbage-collected cases for LLD 18.
DSO undef, non-exported def, and DSO def
A variation of the scenario mentioned above occurs when a DSO definition is also present. Even if the executable does not export foo, another DSO (def.so) may provide it. GNU ld's check allows for this case.
ld.bfd main.o a.so def-hidden.o def.so # succeeded
It turns out that https://github.com/llvm/llvm-project/commit/1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40 unexpectedly strengthened --no-allow-shlib-undefined to also catch this ODR violation. More precisely, when all three conditions are met, the new --no-allow-shlib-undefined code reports an error.
- There is a DSO undef that can be satisfied by a definition from another DSO (referred to as SharedSymbol in lld/ELF).
- The SharedSymbol is overridden by a non-exported (usually of hidden visibility) definition in a relocatable object file (Defined).
- The section containing the Defined is garbage-collected (it is not part of .dynsym and is not marked as live).
An exported symbol is a GC root, making its section live. A non-exported symbol, however, can be discarded when its section is discarded.
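The three conditions can be combined into a single predicate. The sketch below is a toy model in Python rather than the lld/ELF implementation; `SymbolState` and its field names are hypothetical and only mirror the wording of the list above.

```python
from dataclasses import dataclass


@dataclass
class SymbolState:
    """Hypothetical summary of one symbol after symbol resolution."""
    dso_undef: bool           # some DSO has an undefined reference to the symbol
    had_shared_def: bool      # another DSO defines it (a SharedSymbol existed)
    exported: bool            # the relocatable definition is part of .dynsym
    section_live: bool        # the defining section survived garbage collection


def reports_error(s):
    """True when all three conditions from the list hold at once."""
    overridden_by_non_exported = s.had_shared_def and not s.exported
    garbage_collected = not s.exported and not s.section_live
    return s.dso_undef and overridden_by_non_exported and garbage_collected


if __name__ == "__main__":
    # Hidden definition in a discarded section, with def.so also defining foo.
    print(reports_error(SymbolState(True, True, exported=False, section_live=False)))
    # Same, but the section is retained, so the third condition fails.
    print(reports_error(SymbolState(True, True, exported=False, section_live=True)))
```

Because an exported symbol is a GC root, a state with `exported=True` and `section_live=False` does not occur in practice; the model simply treats it as not an error.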
So, is this error legitimate? At run-time, the undefined symbol foo in a.so will be bound to def.so, even if the executable does not export foo, so we are fine. This suggests that the --no-allow-shlib-undefined code probably should not report an error.
However, both def-hidden.o and def.so define foo, and we know the definitions are different and less likely to be benign. At the very least, they are not exactly the same, either because of different visibilities or because one is localized by a version script.
A real-world report boils down to:
% ld.lld @response.txt -y _Znam
...
libfdio.so: reference to _Znam
libclang_rt.asan.so: shared definition of _Znam
libc++.a(stdlib_new_delete.cpp.obj): definition of _Znam
ld.lld: error: undefined reference due to --no-allow-shlib-undefined: _Znam
>>> referenced by libfdio.so
How does libfdio.so obtain a reference to _Znam? Well, libfdio.so is linked against both libclang_rt.asan.so and libc++.a. Due to symbol processing rules, the definition from libclang_rt.asan.so takes precedence. (See Symbol processing#Shared object overriding archive.)
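To see why the shared definition can win over the archive, here is a deliberately simplified Python model of left-to-right symbol resolution. It is a sketch under stated assumptions, not lld's algorithm: an earlier relocatable object already references the symbol, an archive member is extracted only while the symbol is still undefined, and the member is not pulled in for any other symbol. The function name `resolve` and the tuple layout are made up for illustration.

```python
def resolve(inputs, name):
    """Resolve `name` against `inputs`, a list of (kind, filename, defined_names)
    in command-line order, where kind is "so" or "archive".

    Returns (state, provider), with state one of "undefined", "shared",
    or "defined".
    """
    state, provider = "undefined", None
    for kind, filename, defined_names in inputs:
        if name not in defined_names or state != "undefined":
            # Already resolved: the archive member is not extracted for this
            # symbol, and a shared definition does not replace a definition
            # from an extracted member.
            continue
        if kind == "so":
            state, provider = "shared", filename
        elif kind == "archive":
            state, provider = "defined", filename  # the member is extracted
    return state, provider


if __name__ == "__main__":
    asan = ("so", "libclang_rt.asan.so", {"_Znam"})
    libcxx = ("archive", "libc++.a", {"_Znam"})
    print(resolve([asan, libcxx], "_Znam"))
    print(resolve([libcxx, asan], "_Znam"))
```

In this model, when the shared object is seen while the symbol is still undefined, the output keeps an undefined reference that is bound to the DSO at run-time, which matches how libfdio.so ends up referencing _Znam. The order of the two libraries on the real link line is not stated in the report, so the two calls only illustrate both possibilities.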
An appropriate solution is to replace libc++.a with an AddressSanitizer-instrumented version that does not define _Znam.
I have also encountered issues stemming from the combination of multiple definitions from libgcc.a (with hidden visibility) and libclang_rt.builtins.a (with default visibility), relying on archive member extraction rules.
% ld.lld @response.txt -y __divti3
...
a.so: reference to __divti3
libgcc.a(_divdi3.o): definition of __divti3
libc++.so: shared definition of __divti3
# A lazy symbol in libclang_rt.builtins.a is not reported by -y
ld.lld: error: undefined reference due to --no-allow-shlib-undefined: __divti3
>>> referenced by a.so
a.so is linked against libc++.so and libclang_rt.builtins.a and obtains a reference to __divti3 due to libc++.so. For the executable link, the undesired situation arises as the definition in libgcc.a takes precedence. What we actually want is for libgcc.a to provide the missing components from libclang_rt.builtins.a.
Some users compile relocatable object files with -fvisibility=hidden to disallow dynamic linking. However, when their system includes specific shared objects, this increases the risk of conflicting definitions of the same symbol.
While this additional check introduced in https://github.com/llvm/llvm-project/commit/1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40 may not perfectly fit into --no-allow-shlib-undefined, I believe it has value. As a result, I have proposed --[no-]allow-non-exported-symbols-shared-with-dso. However, I am also on the fence about introducing a new option, as it may not get used.
Technically, the check can be extended to default visibility to catch all link-time symbol interposition. However, I suspect that there are a lot of benign violations and in the absence of an ignore list mechanism, this extension will not be useful.