Weak symbol

C/C++

GCC and Clang support __attribute__((weak)) which marks a symbol weak. The same effect can be achieved with a preprocessor directive #pragma weak symbol.
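As a minimal sketch of the two spellings (the function names `answer` and `fallback` are made up for illustration, and an ELF target with GCC or Clang is assumed):

```c
/* Weak definition via the attribute. */
__attribute__((weak)) int answer(void) { return 42; }

/* Weak definition via the pragma, applied to an already declared function. */
int fallback(void);
#pragma weak fallback
int fallback(void) { return 7; }
```

Both symbols should show up with the WEAK binding in `readelf -s` output for the resulting object file.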

Binary format

In ELF, there are three main symbol bindings. The ELF specification says:

  • STB_LOCAL: Local symbols are not visible outside the object file containing their definition. Local symbols of the same name may exist in multiple files without interfering with each other.
  • STB_GLOBAL: Global symbols are visible to all object files being combined. One file's definition of a global symbol will satisfy another file's undefined reference to the same global symbol.
  • STB_WEAK: Weak symbols resemble global symbols, but their definitions have lower precedence.

In the GNU ABI, there is another binding STB_GNU_UNIQUE, which is like STB_GLOBAL with extra semantics (unique even with RTLD_LOCAL, nodelete).

In GNU as flavored assembly, you can set the binding of a symbol via .weak sym.

Since a symbol has only one binding, a symbol cannot be global and weak at the same time. However, in GNU as, .weak has overridden .globl since 1996. In the LLVM integrated assembler, the last directive wins. Since LLVM 12 (https://reviews.llvm.org/D90108), the integrated assembler reports an error or a warning when a directive changes a symbol's binding. For .globl sym; .weak sym, it reports a warning instead of an error because the behavior actually matches GNU as, but relying on directive overriding is error-prone.

# error: local changed binding to STB_GLOBAL
local:
.local local
.globl local

## `.globl x; .weak x` matches the GNU as behavior. llvm-mc issues a warning.
# warning: global changed binding to STB_WEAK
global:
.global global
.weak global

# error: weak changed binding to STB_LOCAL
weak:
.weak weak
.local weak

A weak symbol can be either a definition or a reference (i.e. undefined). This is distinguished by the section index:

  • st_shndx==SHN_UNDEF: weak reference
  • st_shndx!=SHN_UNDEF: weak definition
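The distinction above can be sketched with the macros from `<elf.h>`. This is only an illustration: the `weak_kind` enum and the `classify_weak` helper are hypothetical names, and the 64-bit `ELF64_ST_*` macros are assumed.

```c
#include <elf.h>
#include <stdint.h>

enum weak_kind { NOT_WEAK, WEAK_REFERENCE, WEAK_DEFINITION };

/* Classify a symbol from the st_info and st_shndx fields of its
   symbol table entry. */
enum weak_kind classify_weak(unsigned char st_info, uint16_t st_shndx) {
  if (ELF64_ST_BIND(st_info) != STB_WEAK)
    return NOT_WEAK;
  /* SHN_UNDEF means the symbol is undefined in this file. */
  return st_shndx == SHN_UNDEF ? WEAK_REFERENCE : WEAK_DEFINITION;
}
```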

Semantics required by the ELF specification

The specification says very little about a weak symbol.

When the link editor combines several relocatable object files, it does not allow multiple definitions of STB_GLOBAL symbols with the same name. On the other hand, if a defined global symbol exists, the appearance of a weak symbol with the same name will not cause an error. The link editor honors the global definition and ignores the weak ones. Similarly, if a common symbol exists (that is, a symbol whose st_shndx field holds SHN_COMMON), the appearance of a weak symbol with the same name will not cause an error. The link editor honors the common definition and ignores the weak ones.

When the link editor searches archive libraries [see ``Archive File'' in Chapter 7], it extracts archive members that contain definitions of undefined global symbols. The member's definition may be either a global or a weak symbol. The link editor does not extract archive members to resolve undefined weak symbols. Unresolved weak symbols have a zero value.

Weak definitions allow multiple definitions. While a global definition can override a weak definition, it is unspecified what the linker should do when there are two weak definitions of the same name but no global definition. In GNU ld/gold/ld.lld, the linker selects the first weak definition and resolves all references to it.

An undefined weak symbol does not extract archive members. (LLD uses a weak LazyArchive/LazyObject to represent such a symbol.)
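The rules above can be summarized as a toy model. This is a sketch, not how any real linker is structured: `resolve_definitions` and `reference_extracts_member` are hypothetical helpers, and common symbols, COMDAT groups and shared objects are ignored.

```c
#include <elf.h>
#include <stdbool.h>

enum resolution { KEEP_EXISTING, TAKE_NEW, DUPLICATE_ERROR };

/* Decide what happens when a definition with binding `incoming` is seen
   after a definition with binding `existing` of the same name.
   Both arguments are STB_GLOBAL or STB_WEAK. */
enum resolution resolve_definitions(unsigned char existing,
                                    unsigned char incoming) {
  if (existing == STB_GLOBAL && incoming == STB_GLOBAL)
    return DUPLICATE_ERROR; /* multiple global definitions */
  if (existing == STB_WEAK && incoming == STB_GLOBAL)
    return TAKE_NEW;        /* a global definition overrides a weak one */
  return KEEP_EXISTING;     /* weak after global, or the first weak wins */
}

/* Only an undefined STB_GLOBAL reference extracts an archive member. */
bool reference_extracts_member(unsigned char ref_binding) {
  return ref_binding == STB_GLOBAL;
}
```

The "first weak wins" branch models the GNU ld/gold/ld.lld choice mentioned above, which the specification itself leaves open.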

The specification also includes this remark:

The behavior of weak symbols in areas not specified by this document is implementation defined. Weak symbols are intended primarily for use in system software. Applications using weak symbols are unreliable since changes in the runtime environment might cause the execution to fail.

Weak definitions

In C++, inline functions, template instantiations and a few other things can be defined in multiple object files but need deduplication at link time.

Before .gnu.linkonce.*/GRP_COMDAT were invented, the implementations used weak definitions to avoid linker multiple definition errors. The weak definition convention remains post GRP_COMDAT. The weak definition can be used for compatibility with linkers which do not understand pre-COMDAT .gnu.linkonce.* or COMDAT. Using STB_GLOBAL for COMDAT definitions can detect ODR violations caused by a COMDAT definition and a non-COMDAT definition. This would, however, be a significant behavior change. gold has an option --detect-odr-violations: the option checks whether there are two weak definitions with different file:line debugging information.

A replaceable definition can be declared weak. A STB_GLOBAL definition from another translation unit can override it. This is a great way to provide a default/fallback definition in a library which can be redefined by applications.

// lib.cc
__attribute__((weak)) void fun() {
  // ... default implementation
}

void feature() {
  fun();
}

// app.cc - override the default implementation
void fun() {
  // ...
}

A weak alias is a special form of weak definitions. It defines a weak symbol by reusing an existing definition. A useful technique creates a weak symbol aliasing a local definition.

static void impl() {}
__attribute__((weak,alias("impl"))) void fun();

(An example: an interprocedural optimization bug introduced in GCC 4.8.2 and fixed in 4.9.2/5.0.

// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61144
// foo is an inexact definition but ipa misoptimizes bar() to always return 0.
static int dummy = 0;
extern int foo __attribute__((__weak__, __alias__("dummy")));
int bar() { if (foo) return 1; return 0; }
)

PE-COFF does not have a direct counterpart but can emulate a weak definition with an IMAGE_SYM_CLASS_WEAK_EXTERNAL IMAGE_SYM_UNDEFINED symbol with a defined IMAGE_SYM_CLASS_EXTERNAL auxiliary symbol (named .weak.<weaksymbol>.<relatedstrongsymbol> in GNU). If the symbol does not have a regular definition in another object file, the linker will select the auxiliary definition. However, if you are concerned about MSVC compatibility, there is a not-so-great linker solution. MSVC link.exe supports /alternatename: which can achieve similar effects. The option specifies a fallback definition for a symbol. If the symbol is otherwise undefined, the linker will pick up the fallback definition.

Weak references

The ELF specification says "Unresolved weak symbols have a zero value." This property can be used to check whether a definition is provided. A common pattern is to implement an optional hook.

__attribute__((weak)) void undef_weak_fun();

void caller() {
  // The address is zero if no definition is linked in.
  if (&undef_weak_fun)
    undef_weak_fun();
}

Historically, toolchains, especially for the lesser-used architectures, tend to have more bugs with weak references, so musl refrains from using weak references. The above pattern can be replaced with a weak alias.

static void noop() {}
__attribute__((weak,alias("noop"))) void undef_weak_fun();

void caller() {
  // Always safe to call: falls back to noop unless overridden.
  undef_weak_fun();
}
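The two patterns can be compared side by side in a compilable form. This is a sketch under assumptions: all names (`optional_hook_never_defined`, `default_hook`, `hook_with_default` and the two callers) are hypothetical, nothing else in the link defines the hook, and the hook returns int only to make the result observable.

```c
#include <stddef.h>

/* Weak reference: no definition of this symbol is linked in, so its
   address evaluates to zero. */
__attribute__((weak)) void optional_hook_never_defined(void);

/* Returns 1 if the hook was present and called, 0 otherwise. */
int call_optional_hook(void) {
  if (optional_hook_never_defined != NULL) {
    optional_hook_never_defined();
    return 1;
  }
  return 0;
}

/* Weak alias: the symbol is always defined, so no null check is needed.
   A STB_GLOBAL definition in another object file would override it. */
static int default_hook(void) { return -1; }
__attribute__((weak, alias("default_hook"))) int hook_with_default(void);

int call_hook_with_default(void) { return hook_with_default(); }
```

The weak alias variant avoids undefined weak symbols entirely, which is the property musl relies on.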

In most cases weak references resolve to a GOT entry for the symbol.

For ELF -fno-pic, there is an optimization: the emitted code may use an absolute relocation to check whether the address is zero. However, if the symbol turns out to be defined in a shared object and the linked image needs a dynamic section, there will be a canonical PLT entry.

PE-COFF can emulate this feature with an IMAGE_SYM_CLASS_WEAK_EXTERNAL IMAGE_SYM_UNDEFINED symbol with an IMAGE_SYM_CLASS_EXTERNAL IMAGE_SYM_ABSOLUTE auxiliary symbol (named .weak.<weaksymbol>.<relatedstrongsymbol> in GNU).

Weak references and shared objects

A weak reference can be satisfied by a shared object definition. The weak reference is no different from a regular reference.

Weak references and archives

This is a lesser-known rule. When an ELF linker sees a weak reference, it does not extract an archive member to satisfy the weak reference. If you need the definition, make sure the archive member is extracted because of a reference to some other symbol.

There is a related longstanding problem in libstdc++: because references to pthread_* are weak, the relevant members in -lpthread may not be extracted in a static link:

% cat a.cc
#include <condition_variable>
int main() { std::condition_variable a; }
% g++ -static -pthread a.cc
% ./a.out
Segmentation fault

You can use -Wl,-y to understand why this happens. GNU ld discards an archive if it does not need to be extracted, so I have to use LLD to show the lazy definition.

% g++ -fuse-ld=lld -static a.cc -lpthread -Wl,-y,pthread_cond_destroy
/usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/libpthread.a: lazy definition of pthread_cond_destroy
/usr/lib/gcc/x86_64-linux-gnu/10/libstdc++.a(condition_variable.o): reference to pthread_cond_destroy

Binding of an undefined symbol in the linked image

If an object file has an undefined symbol not defined by other object files (for archives, we consider extracted archive members the same as object files), the symbol is undefined in the linked image. The symbol may be defined by a shared object, but the linked image still has an undefined symbol. If all relocations to the symbol are discarded by --gc-sections, the undefined symbol will be removed from the linked image. If the undefined symbol is retained, we say it is unresolved.

The linker will usually report an undefined symbol error if the symbol is STB_GLOBAL. (-z defs and --no-allow-shlib-undefined are the default when linking an executable; see Explain GNU style linker options for details.) If the symbol is unversioned and weak, the linker will suppress the diagnostic. (If the symbol is versioned weak, the linker will still report an error. See All about symbol versioning.)

A STB_LOCAL undefined symbol is not allowed, so the binding can be either STB_GLOBAL or STB_WEAK. The binding is STB_WEAK if all undefined symbols in object files are STB_WEAK, otherwise the binding is STB_GLOBAL. Note: symbols in shared objects do not affect the binding.
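The binding rule can be written as a pairwise merge that a linker could fold over every undefined reference from object files. This is a sketch; `merge_undefined_binding` is a hypothetical helper name.

```c
#include <elf.h>

/* Combine the bindings of two undefined references to the same symbol.
   The result stays STB_WEAK only if both references are weak. */
unsigned char merge_undefined_binding(unsigned char a, unsigned char b) {
  return (a == STB_WEAK && b == STB_WEAK) ? STB_WEAK : STB_GLOBAL;
}
```

References found in shared objects would simply not be fed into the merge.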

Relocation types

-fno-pic

GCC and Clang -fno-pic for most targets emit absolute relocations. ppc64 uses TOC-generating relocations. aarch64 uses .rodata.cst8 which is similar to GOT.

clang -fno-pic -fno-direct-access-external-data emits GOT-generating relocations even for hidden undefined weak symbols.

-fpie and -fpic

Compilers do not emit PC-relative relocations.

GCC and Clang -fpie and -fpic emit GOT-generating relocations, even for hidden undefined weak symbols. ppc64 -fno-pic emits TOC-generating relocations.

Taking the address of the symbol in the initializer of an object with static storage duration may emit absolute relocations.
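A small sketch of the static-initializer case, assuming the hypothetical symbol `optional_tunable_never_defined` is not defined anywhere in the link. On x86-64 the initializer is expected to produce an absolute data relocation such as R_X86_64_64 in the object file.

```c
#include <stddef.h>

/* Weak reference to a variable that may or may not exist. */
__attribute__((weak)) extern int optional_tunable_never_defined;

/* The initializer needs the symbol's address as data, so the compiler
   emits an absolute relocation in the section holding the pointer. */
int *const optional_tunable_ptr = &optional_tunable_never_defined;

/* Read the tunable if it exists, otherwise use the caller's fallback. */
int tunable_or_default(int fallback) {
  return optional_tunable_ptr != NULL ? *optional_tunable_ptr : fallback;
}
```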

Unresolved weak references and R_*_GLOB_DAT

Absolute relocations

Statically resolves to 0.

In GNU ld, there is disagreement about whether a dynamic relocation should be emitted if the output has .dynsym.

GNU ld aarch64 before 2017-11 might emit a dynamic relocation R_AARCH64_ABS64 for static PIE. GNU ld arm before 2020-11 might emit a dynamic relocation R_ARM_RELATIVE for static PIE. GNU ld ppc since 2021-05 emits a dynamic relocation by default if no text relocation is needed.

PC-relative relocations

Modern compilers do not emit non-branch PC-relative relocations.

GCC<5 (at least x86_64 and arm) may emit PC-relative relocations for hidden undefined weak symbols. GCC<5 i386 may optimize if (&foo) foo(); to unconditional foo();.

Branch relocations

Resolves to the current instruction, the next instruction, or relative zero address, depending on the architecture and sometimes the relocation type.

GOT-generating relocations

The linker may or may not emit a dynamic relocation R_*_GLOB_DAT for the GOT entry. If there is no R_*_GLOB_DAT, the GOT entry is always zero at runtime. If there is an R_*_GLOB_DAT, the GOT entry may be non-zero if an immediately loaded shared object defines the symbol at runtime.

Whether GNU ld emits an R_*_GLOB_DAT depends on the linking mode. For x86, the choice is "If all references to undefined weak symbols are PIC, dynamic relocations against undefined weak symbols will be generated in executable unless -z nodynamic-undefined-weak is passed to linker." For -fpie and -fpic code there are generally R_*_GLOB_DAT in the linker output; for -fno-pic code there are none. ppc supports -z {,no}dynamic-undefined-weak since 2021-05.

For static PIE, the GNU ld maintainers' view is that there should be no dynamic relocations.

Remarks

Targets have different opinions on whether dynamic relocations should be emitted. GNU ld x86 can disable the relocation with -z nodynamic-undefined-weak (https://sourceware.org/bugzilla/show_bug.cgi?id=19636).

I don't understand some decisions made in this area. I think it would be easier sticking with a simple rule. LLD takes such a simple approach: LLD emits an R_*_GLOB_DAT for -shared links and suppresses the dynamic relocation for -no-pie and -pie. LLD's dynamic -pie behavior is usually different from GNU ld. Portable code should not depend on whether there is an R_*_GLOB_DAT.

About -z {,no}dynamic-undefined-weak, I am of the opinion that it is not useful. Its implications are much more complex than what the help message can explain, and users are unlikely to use it correctly.

ld.so

STB_GLOBAL and STB_WEAK definitions in the dynamic symbol table are equivalent in glibc ld.so and musl ld.so. If the symbol lookup finds a STB_WEAK definition, it stops and returns that symbol instead of searching the remaining shared objects. glibc before 2.2 provided a different behavior: a STB_WEAK definition could be overridden by a subsequent STB_GLOBAL definition. FreeBSD ld.so still uses the legacy glibc behavior.
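The difference between the two lookup policies can be modeled as follows. This is a simplified sketch and not the code of any real ld.so: `NO_DEF`, `lookup_first_match` and `lookup_prefer_global` are hypothetical names, and `defs[i]` stands for the binding of the definition found in the i-th object in search order.

```c
#include <elf.h>
#include <stddef.h>

#define NO_DEF (-1) /* this object does not define the symbol */

/* Current glibc/musl policy: the first definition wins, weak or not.
   Returns the index of the chosen object, or -1 if none defines it. */
int lookup_first_match(const int *defs, size_t n) {
  for (size_t i = 0; i < n; i++)
    if (defs[i] != NO_DEF)
      return (int)i;
  return -1;
}

/* Legacy policy: remember the first weak definition, but keep searching
   for a global definition that overrides it. */
int lookup_prefer_global(const int *defs, size_t n) {
  int first_weak = -1;
  for (size_t i = 0; i < n; i++) {
    if (defs[i] == STB_GLOBAL)
      return (int)i;
    if (defs[i] == STB_WEAK && first_weak < 0)
      first_weak = (int)i;
  }
  return first_weak;
}
```

With the search order {no definition, weak, global}, the first policy picks the second object and the legacy policy picks the third.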

A weak reference in the dynamic symbol table does not cause a symbol lookup error if no definition is found.