Dependency-related linker options

This article describes the dependency-related linker options -z defs, --no-allow-shlib-undefined, and --warn-backrefs. Deploying them in a build system can improve build health.

-z defs

-z defs (alias --no-undefined) tells the linker to report an error for an unresolved undefined symbol from a relocatable object file. Executable links (-no-pie and -pie) default to -z defs while shared object links (-shared) default to -z undefs.

"Unresolved" means that (a) no other relocatable object file linked into the component provides a definition, and (b) no shared object linked into the component provides a definition.

When an undefined symbol in A is provided by a link-time shared object B, we can say that A depends on B. The linker will record this fact by adding a DT_NEEDED dynamic tag. If B has a DT_SONAME, the DT_NEEDED value is the SONAME; otherwise the DT_NEEDED value is the path of B (either absolute or relative).
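The two rules above (what counts as "unresolved" and which value ends up in DT_NEEDED) can be sketched as a toy model. This is only an illustration, not how any linker is implemented: the Input and LinkResult types and the model_link function are hypothetical names, and symbol tables are reduced to sets of strings. It records DT_NEEDED only for shared objects that actually resolve a reference, which is the case described above; without --as-needed, linkers typically record DT_NEEDED for every shared object on the command line.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of a link input; real linkers read this
// information from ELF symbol tables and dynamic sections.
struct Input {
  std::string path;    // path as written on the command line
  std::string soname;  // DT_SONAME, empty if absent (shared objects only)
  std::set<std::string> defined;
  std::set<std::string> undefined;
};

struct LinkResult {
  std::vector<std::string> unresolved;  // what -z defs would report
  std::vector<std::string> needed;      // DT_NEEDED values, command-line order
};

LinkResult model_link(const std::vector<Input>& objects,
                      const std::vector<Input>& shared) {
  std::set<std::string> defined_by_objects;
  for (const Input& o : objects)
    defined_by_objects.insert(o.defined.begin(), o.defined.end());

  std::set<std::string> unresolved;
  std::set<std::size_t> used;  // indices of shared objects that resolve something
  for (const Input& o : objects) {
    for (const std::string& sym : o.undefined) {
      if (defined_by_objects.count(sym)) continue;  // (a) another object defines it
      bool found = false;
      for (std::size_t i = 0; i < shared.size() && !found; ++i) {
        if (shared[i].defined.count(sym)) {  // (b) a shared object defines it
          used.insert(i);
          found = true;
        }
      }
      if (!found) unresolved.insert(sym);
    }
  }

  LinkResult result;
  result.unresolved.assign(unresolved.begin(), unresolved.end());
  for (std::size_t i : used)  // SONAME if present, otherwise the path
    result.needed.push_back(shared[i].soname.empty() ? shared[i].path
                                                     : shared[i].soname);
  return result;
}
```

With one object that references f and h, and a shared object ./libb.so (SONAME libb.so.1) that defines only f, the model reports h as unresolved and records libb.so.1 as the DT_NEEDED value.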

I can think of several reasons that the loose default for -shared links was chosen.

  1. ELF supports interposition. Suppose a shared object has an undefined symbol. At runtime, the symbol may be provided by an arbitrary shared object or by the executable. By not requiring the dependencies to be fully specified, the runtime retains that flexibility.

  2. There may be mutual references between two shared objects A and B. We cannot break the tie with a regular approach if dependencies need to be fully specified. A variant is that the executable has A as its dependency while A also references symbols from the executable.

For reason 1, such unbounded flexibility does not quite fit into a build system. With a modular design, the libraries should have well-defined roles and dependencies. We do not substitute an arbitrary shared object for a link-time shared object. When we need such flexibility, we can define an interface and make several shared objects implement the interface.

Having fully specified dependencies makes the shipped shared object A convenient to use. It is likely that an executable link does not use A's dependency B. It would feel awkward if the executable link needs to additionally link against B when linking against A.

If you have read my article about ELF interposition, you should know that having the dependency information makes direct bindings possible, which can improve symbol lookup time for the dynamic loader. No ELF system other than Solaris has implemented direct bindings, though.

Reason 2 indicates a bad layering of libraries. A and B are no longer isolated components. A change in A may affect B and vice versa. The unit testing for A needs to involve B. With archives, you will need --start-group A.a B.a --end-group with GNU ld and gold. Merging A and B is often a better strategy.

If we don't merge A and B, http://blog.darlinghq.org/2018/07/mach-o-linking-and-loading-tricks.html mentions that the Mach-O approach for such circular dependencies is usually to link the libraries twice.

ld -o libfoo.dylib foo.o -flat_namespace -undefined suppress
ld -o libbar.dylib bar.o -flat_namespace -undefined suppress
ld -o libfoo.dylib foo.o libbar.dylib
ld -o libbar.dylib bar.o libfoo.dylib

The ELF counterpart is:

ld -shared foo.o -o foo.so
ld -shared bar.o -o bar.so
ld -shared -z defs foo.o bar.so -o foo.so
ld -shared -z defs bar.o foo.so -o bar.so

A build system may support archives as well as shared objects. An archive is a collection of regular object files, with special archive member selection semantics in the linker. (We will discuss the archive member selection later.) If A.a needs definitions from B.a and you do not supply B.a when linking a dependent executable (ld ... A.a instead of ld ... A.a B.a), you may be greeted by "undefined reference to" (GNU ld) or "undefined symbol:" (LLD). In practice, a build system needs to track dependencies for archives.

--no-allow-shlib-undefined

--no-allow-shlib-undefined tells the linker to report an error for an unresolved undefined symbol from a shared object. Executable links (-no-pie and -pie) default to --no-allow-shlib-undefined while shared object links (-shared) default to --allow-shlib-undefined.

E.g. for ld -shared a.o b.so -o a.so, if b.so has an undefined symbol not defined by a.so or b.so's dependencies, the linker will report an error.

gold and ld.lld do not recursively load DT_NEEDED tags. Instead, they report an error only when all of b.so's DT_NEEDED entries are on the linker command line. Say, b.so depends on c.so and d.so. ld.lld -shared a.o b.so c.so -o a.so will not report an error for an unresolved undefined symbol in b.so because d.so is not on the linker command line. ld.lld -shared a.o b.so c.so d.so -o a.so may report an error.
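The gold/ld.lld rule can be sketched as a small predicate. This is a simplified model under stated assumptions: SharedObject and can_check_undefined are hypothetical names, and a DT_NEEDED entry is matched against command-line inputs by plain string comparison, whereas a real linker compares against the SONAME of each loaded shared object.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of a shared object on the link line.
struct SharedObject {
  std::string name;                 // assumed equal to its SONAME here
  std::vector<std::string> needed;  // DT_NEEDED entries
};

// Returns true if every DT_NEEDED entry of `so` is also on the command
// line, i.e. the linker has seen all of its dependencies and is in a
// position to diagnose unresolved undefined symbols in `so`.
bool can_check_undefined(const SharedObject& so,
                         const std::vector<std::string>& command_line) {
  return std::all_of(
      so.needed.begin(), so.needed.end(), [&](const std::string& dep) {
        return std::find(command_line.begin(), command_line.end(), dep) !=
               command_line.end();
      });
}
```

For b.so with DT_NEEDED entries c.so and d.so, the predicate is false for the command line {b.so, c.so} and true for {b.so, c.so, d.so}, matching the two ld.lld invocations above.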

If a build system is -z defs clean, it will also be --no-allow-shlib-undefined clean. If a build system cannot use -z defs, --no-allow-shlib-undefined can catch some propagated problems.

// a.cc => a.o => a.so (not linked with -z defs)
void f(); // f is undefined
void g() { f(); }

// b.cc => b.o
void g();
int main() { g(); }

ld b.o a.so will report an error for the undefined symbol f in a.so.

If we use Bazel layering check features as an analogy, -z defs (link what you use) is like layering_check (include what you use) while --no-allow-shlib-undefined is a bit like hdrs_check.

Archive processing

This concept is required to understand --warn-backrefs. Please check out Symbol processing#Archive processing.

--warn-backrefs

VMS (now OpenVMS), Mach-O ld64, Windows link.exe, and ld.lld process archives differently from traditional Unix linkers such as GNU ld.

For ld.lld ... definition.a reference.o, the archive index of definition.a lists the defined symbols. When processing definition.a, ld.lld uses "lazy symbols" to represent the lazy definitions. Each lazy symbol has an associated archive(member) name. When processing reference.o, an undefined symbol can cause the lazy symbol to be fetched, i.e. a previous definition.a member will be extracted.

This archive processing strategy is nice because in the absence of duplicate definitions, ld.lld ... definition.a reference.o and ld.lld ... reference.o definition.a cause no ordering difference.
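To make the difference concrete, here is a toy simulation of the two strategies. It is a sketch under simplifying assumptions, not ld.lld's or GNU ld's actual code: Member, Input, link_traditional, and link_lazy are hypothetical names, symbols are plain strings, member names are assumed unique, and duplicate definitions, weak symbols, and shared objects are ignored. Both functions return the set of object files that end up in the link.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// One object file: either a standalone .o or an archive member.
struct Member {
  std::string name;  // assumed unique, e.g. "definition.a(def.o)"
  std::set<std::string> defined, undefined;
};

// One command-line input. A plain object file is modeled as a
// non-archive Input with exactly one member.
struct Input {
  bool is_archive;
  std::vector<Member> members;
};

// Traditional model: an archive member is extracted only if it defines a
// symbol that is undefined at the time the archive is processed.
std::set<std::string> link_traditional(const std::vector<Input>& inputs) {
  std::set<std::string> defined, undefined, loaded;
  auto load = [&](const Member& m) {
    loaded.insert(m.name);
    for (const auto& s : m.defined) { defined.insert(s); undefined.erase(s); }
    for (const auto& s : m.undefined)
      if (!defined.count(s)) undefined.insert(s);
  };
  for (const Input& in : inputs) {
    if (!in.is_archive) { load(in.members.front()); continue; }
    for (bool progress = true; progress;) {  // rescan until nothing new is extracted
      progress = false;
      for (const Member& m : in.members) {
        if (loaded.count(m.name)) continue;
        for (const auto& s : m.defined)
          if (undefined.count(s)) { load(m); progress = true; break; }
      }
    }
  }
  return loaded;
}

// Lazy-symbol model: definitions in archive members that are not needed yet
// are remembered, and a later reference fetches the remembered member.
std::set<std::string> link_lazy(const std::vector<Input>& inputs) {
  std::set<std::string> defined, undefined, loaded;
  std::map<std::string, const Member*> lazy;  // symbol -> member defining it
  std::function<void(const Member&)> load = [&](const Member& m) {
    if (!loaded.insert(m.name).second) return;  // already loaded
    for (const auto& s : m.defined) { defined.insert(s); undefined.erase(s); }
    for (const auto& s : m.undefined) {
      if (defined.count(s)) continue;
      auto it = lazy.find(s);
      if (it != lazy.end()) load(*it->second);  // fetch the lazy definition
      else undefined.insert(s);
    }
  };
  for (const Input& in : inputs) {
    if (!in.is_archive) { load(in.members.front()); continue; }
    for (const Member& m : in.members)
      for (const auto& s : m.defined) {
        if (defined.count(s)) continue;
        if (undefined.count(s)) load(m);  // needed by something already seen
        else lazy.emplace(s, &m);         // remember for later references
      }
  }
  return loaded;
}
```

With definition.a containing one member that defines foo and reference.o referencing foo, link_lazy extracts the member for either input order, while link_traditional extracts it only when reference.o comes first.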

Moreover, --start-group is a no-op in ld.lld. The traditional approach may iterate over the archive members more than once, and --start-group can exacerbate the problem. The ld.lld approach turns out to be easier to implement and can improve archive processing performance.

However, the lazy symbol representation (without --warn-backrefs) loses one major advantage of the traditional ELF approach: loose layering check.

Say, we have 4 libraries: a, b, c, and d. The dependency edges are: a->b, a->c, b->d, c->d. There is an unspecified edge: b->c. Let's consider two link orders.

  • ld ... a.a c.a b.a d.a may lead to an error, because some members of c.a may be dropped if they do not resolve a previously undefined symbol.
  • ld ... a.a b.a c.a d.a is fine, because after b.a is processed, we should have seen all the symbol requirements on c.a's members. a, b, c, d is a topological order of the full dependency list.

The layering check is loose because it only checks one particular topological order. Nevertheless, a build system using this option can catch many missing dependency edges.
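The example above can be checked mechanically with a coarse, library-level sketch. The assumptions are loud ones: backward_references and Edge are hypothetical names, each library is treated as a single unit, and an edge counts as a problem whenever the defining library appears before the referencing library in the link order. Real linkers work per symbol and per archive member, so a backward reference is harmless in the traditional model when the needed member has already been extracted for another reason (hence "may lead to an error").

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// {referencing library, defining library}
using Edge = std::pair<std::string, std::string>;

// Returns the reference edges that point backward in the given link order.
std::vector<Edge> backward_references(const std::vector<std::string>& link_order,
                                      const std::vector<Edge>& edges) {
  std::map<std::string, std::size_t> position;
  for (std::size_t i = 0; i < link_order.size(); ++i)
    position.emplace(link_order[i], i);

  std::vector<Edge> result;
  for (const Edge& e : edges) {
    auto from = position.find(e.first);
    auto to = position.find(e.second);
    if (from == position.end() || to == position.end())
      continue;  // library not on the link line; not modeled here
    if (to->second < from->second) result.push_back(e);
  }
  return result;
}
```

Feeding it the five edges (including the unspecified b->c) flags b->c for the order a, c, b, d and flags nothing for a, b, c, d. A different valid topological order of the specified edges would therefore miss the problem, which is why the check is loose.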

Working at the binary level, the option can catch some problems not detected by the module-based layering check in Clang (clang -fmodule-name=X -fmodules-strict-decluse, which reports error: module X does not depend on a module exporting 'string.h'), e.g. using a declaration without including a header.

For ld.lld, --warn-backrefs was added to check archive processing compatibility problems with GNU ld. In LLD 11.0.0, I significantly improved --warn-backrefs to make the compatibility checking reliable.

Because the archive name is remembered, the diagnostic (e.g. warning: backward reference detected: foo in a1.o refers to a2.o) is better than GNU ld's (no filename).