As mainly an LLVM person, I occasionally contribute to GNU toolchain projects. This is sometimes for fun, sometimes to investigate why a (usually ancient) feature works in a particular way, sometimes to push forward a toolchain feature with both communities in mind, and sometimes just to get a sense of how things work with mailing lists and GNU make.
GNU indirect function (ifunc) is a mechanism that makes a direct function call resolve to an implementation picked by a resolver. It is mainly used in glibc but has also been adopted by FreeBSD.
-fno-pic can only be used by executables. On most platforms and architectures, direct access relocations are used to reference external data symbols.
-fpic can be used by both executables and shared objects. Windows has __declspec(dllimport) but most other binary formats allow a default visibility external data symbol to be resolved to a shared object, so generally direct access relocations are disallowed.
-fpie was introduced as a mode similar to -fpic for ELF: the compiler can assume that the produced object file can only be used by executables, thus all definitions are non-preemptible and interprocedural optimizations can apply to them.
extern int a;
-fno-pic typically produces an absolute relocation (a PC-relative relocation can be used as well). On ELF x86-64 it is usually R_X86_64_32 in the position dependent small code model. If a is defined in the executable (by another translation unit), everything works fine. If a turns out to be defined in a shared object, its real address will be non-constant at link time. One of two actions needs to be taken:
- Emit a dynamic relocation at every use site. Text sections are usually non-writable. A dynamic relocation applied to a non-writable section is called a text relocation.
- Emit a single copy relocation. Copy relocations only work for executables. The linker obtains the size of the symbol, allocates the bytes in .bss (this may make the object writable; LLD may pick a read-only area instead), and emits an R_*_COPY relocation. All references resolve to the new location.
Multiple text relocations are even less acceptable, so on ELF a copy relocation is generally used. Here is a nice description from Rich Felker: "Copy relocations are not a case of overriding the definition in the abstract machine, but an implementation detail used to support data objects in shared libraries when the main program is non-PIC."
Copy relocations have drawbacks:
- Break page sharing.
- Make the symbol properties (e.g. size) part of ABI.
- If the shared object is linked with --dynamic-list and defines a data symbol copy relocated by the executable, the address of the symbol may be different in the shared object and in the executable.
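The mechanics and the size drawback above can be modeled in plain C. This is only a toy simulation, not loader code: toy_dso, toy_exe and toy_apply_copy_reloc are hypothetical names invented for illustration. It mimics what a dynamic loader conceptually does for an R_*_COPY relocation: copy as many bytes as the linker recorded at link time from the shared object's definition into the executable's .bss slot, then make the shared object's GOT entry point at that slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: every name below is hypothetical, not a real loader API. */
struct toy_dso {
  int def[2];     /* the definition inside the shared object */
  int *got_entry; /* GOT slot that the shared object's -fpic code loads through */
};

struct toy_exe {
  int bss_copy[2];  /* space the linker reserved in the executable's .bss */
  size_t copy_size; /* symbol size the linker saw at link time */
};

/* What a loader conceptually does for an R_*_COPY relocation. */
static void toy_apply_copy_reloc(struct toy_exe *exe, struct toy_dso *dso) {
  assert(exe->copy_size <= sizeof exe->bss_copy);
  memcpy(exe->bss_copy, dso->def, exe->copy_size);
  dso->got_entry = exe->bss_copy; /* all references now resolve to the copy */
}

/* The executable writes through its direct reference; return what the
   shared object then reads through its GOT entry. */
int toy_shared_view_after_write(int new_value) {
  struct toy_dso dso = {{1, 2}, NULL};
  struct toy_exe exe = {{0, 0}, sizeof dso.def};
  dso.got_entry = dso.def; /* state before the copy relocation is applied */
  toy_apply_copy_reloc(&exe, &dso);
  exe.bss_copy[0] = new_value;
  return dso.got_entry[0];
}

/* The executable was linked when the symbol was a single int; the shared
   object later grew it to two ints. Only the stale size is copied. */
int toy_stale_size_second_element(void) {
  struct toy_dso dso = {{1, 2}, NULL};
  struct toy_exe exe = {{0, 0}, sizeof(int)};
  dso.got_entry = dso.def;
  toy_apply_copy_reloc(&exe, &dso);
  return dso.got_entry[1]; /* the initializer 2 was never copied */
}
```

In this model toy_shared_view_after_write(7) returns 7 because both sides share the copy, while toy_stale_size_second_element() returns 0 instead of 2, which is why the symbol size effectively becomes part of the ABI.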
What went poorly was that -fno-pic code had no way to avoid copy relocations on ELF. Traditionally copy relocations could only occur in -fno-pic code. A GCC 5 change made this possible for x86-64. Please read on.
x86-64: copy relocations and -fpie
-fpic using GOT indirection for external data symbols has a cost. Making -fpie similar to -fpic in this regard incurs costs if the data symbol turns out to be defined in the executable. Having the data symbol defined in another translation unit linked into the executable is very common, especially if the vendor uses a fully or mostly static linking mode.
In GCC 5, "x86-64: Optimize access to globals in PIE with copy reloc" started to use direct access relocations for external data symbols on x86-64 in -fpie code.
extern int a;
GOT indirection: movq a@GOTPCREL(%rip), %rax; movl (%rax), %eax (9 bytes)
Direct access: movl a(%rip), %eax (6 bytes)
This change is actually useful for architectures other than x86-64 but was never implemented for them. What went wrong: the change was implemented as an inflexible configure-time choice (HAVE_LD_PIE_COPYRELOC), defaulting to such behavior if ld supports PIE copy relocations (most binutils installations). Keep in mind that such a -fpie default breaks --dynamic-list in shared objects.
Clang addressed the inflexible configure-time choice via an opt-in option
I noticed that:
- The option can be used for -fno-pic code as well to prevent copy relocations on ELF. This is occasionally what users want (if their shared objects use -Bsymbolic and export data symbols (usually undesired from an API perspective but can avoid costs at times)), and they switch from -fno-pic to -fpic just for this purpose.
- The option name should describe the code generation behavior, instead of the inferred behavior at the linking stage on a particular binary format.
- The option does not need to tie to ELF.
- On COFF, the behavior is like always -fdirect-access-external-data: __declspec(dllimport) is needed to enable indirect access.
- On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic (only available on arm) and the opposite for -fpic.
- H.J. Lu introduced R_X86_64_REX_GOTPCRELX as a GOT optimization to the x86-64 psABI. This is great! With the optimization, GOT indirection can be optimized, so the incurred cost is very low now.
So I proposed an alternative option -f[no-]direct-access-external-data: https://reviews.llvm.org/D92633 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112. My wish on the GCC side is to drop HAVE_LD_PIE_COPYRELOC and (x86-64) default to GOT indirection for external data symbols in -fpie code.
Please keep in mind that -f[no-]semantic-interposition is for definitions while -f[no-]direct-access-external-data is for undefined data symbols. GCC 5 introduced -fno-semantic-interposition to use local aliases for references to definitions in the same translation unit.
Now let's consider how STV_PROTECTED comes into play. Here is the generic ABI definition:
A symbol defined in the current component is protected if it is visible in other components but not preemptable, meaning that any reference to such a symbol from within the defining component must be resolved to the definition in that component, even if there is a definition in another component that would preempt by the default rules. A symbol with STB_LOCAL binding may not have STV_PROTECTED visibility. If a symbol definition with STV_PROTECTED visibility from a shared object is taken as resolving a reference from an executable or another shared object, the SHN_UNDEF symbol table entry created has STV_DEFAULT visibility.
A defined STV_DEFAULT symbol is preemptible by default in a shared object on ELF. STV_PROTECTED can make the symbol non-preemptible. You may have noticed that I use "preemptible" while the generic ABI uses "preemptable" and LLVM IR uses "dso_preemptable". Both forms work. "preemptible" is my choice because it is more common.
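As a small illustration of the two visibilities, here is a sketch with made-up function names (answer_default, answer_protected and the two callers are hypothetical). It is meant to be compiled as part of a -fpic shared object with GCC or Clang on an ELF target; the symbol table can then be inspected with readelf -s to see the DEFAULT and PROTECTED visibilities.

```c
/* Default visibility: when this file is built into a -fpic shared object,
   the symbol is preemptible, so a definition in another component could be
   the one that actually gets called at run time. */
int answer_default(void) { return 1; }

/* Protected visibility: still visible to other components, but references
   from within the defining component must bind to this definition. */
__attribute__((visibility("protected"))) int answer_protected(void) {
  return 3;
}

/* A call to a preemptible definition in the same translation unit. */
int call_default(void) { return answer_default() + 1; }

/* A call to a non-preemptible definition in the same translation unit. */
int call_protected(void) { return answer_protected() * 2; }
```

When nothing interposes answer_default, both callers simply use the definitions in this file. The difference is only in what the toolchain is allowed to assume about the binding.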
Protected data symbols and copy relocations
Many folks consider that copy relocations are best-effort support provided by the toolchain. STV_PROTECTED is intended as an optimization and the optimization can error out if it can't be done for whatever reason. Since copy relocations are already oftentimes unacceptable, it is natural to think that we should just disallow copy relocations on protected data symbols.
However, GNU ld 2.26 made a change which enabled copy relocations on protected data symbols for i386 and x86-64.
A glibc change "Add ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA to x86" was needed to make copy relocations on protected data symbols work. "[AArch64][BZ #17711] Fix extern protected data handling" and "[ARM][BZ #17711] Fix extern protected data handling" ported the change to aarch64 and arm.
Despite the glibc support, GNU ld's aarch64 port errors:
relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `foo' which may bind externally can not be used when making a shared object; recompile with -fPIC.
powerpc64 ELFv2 is interesting: TOC indirection (TOC is a variant of GOT) is used everywhere, data symbols normally have no direct access relocations, so this is not a problem.
gcc -fuse-ld=bfd -fpic -shared b.c -o b.so
gold does not allow copy relocations on protected data symbols, but it misses some cases: https://sourceware.org/bugzilla/show_bug.cgi?id=19823.
Protected data symbols and direct accesses
If a protected data symbol in a shared object is copy relocated, allowing direct accesses will cause the shared object to operate on a different copy from the executable. Therefore, direct accesses to protected data symbols have to be disallowed in -fpic code, just in case the symbols may be copy relocated. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 changed GCC 5 to use GOT indirection for protected external data.
__attribute__((visibility("protected"))) int foo;
This caused unneeded pessimization for protected external data. Clang always treats protected similar to hidden/internal.
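The "different copy" problem can be shown with a toy model in plain C. All names here (toy_lib, toy_lib_increment, toy_exe_sees_update) are hypothetical and this is not linker or loader code; it only mimics the situation where the executable holds a copy-relocated foo and the shared object updates foo either through its GOT entry or through a direct access to its own definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model with hypothetical names; it is not real linker or loader code. */
struct toy_lib {
  int foo_def;  /* the protected definition inside the shared object */
  int *foo_got; /* GOT slot, redirected by the loader after a copy relocation */
};

/* The shared object increments foo, either through the GOT (indirect
   access) or by touching its own definition (direct access). */
static void toy_lib_increment(struct toy_lib *lib, bool direct_access) {
  if (direct_access)
    lib->foo_def += 1;
  else
    *lib->foo_got += 1;
}

/* Returns true if the executable observes the shared object's update. */
bool toy_exe_sees_update(bool direct_access) {
  struct toy_lib lib = {10, 0};
  int exe_copy = lib.foo_def; /* R_*_COPY: the executable gets its own copy */
  lib.foo_got = &exe_copy;    /* the loader points the GOT at that copy */
  assert(exe_copy == 10);
  toy_lib_increment(&lib, direct_access);
  return exe_copy == 11;
}
```

With GOT indirection the executable sees the update; with a direct access the shared object silently modifies its own definition and the two components diverge.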
For older GCC (and all versions of Clang), direct accesses are produced in -fpic code. Mixing such object files can silently break copy relocations on protected data symbols. Therefore, GNU ld made the change https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=ca3fe95e469b9daec153caa2c90665f5daaec2b5 to error in -shared mode.
% cat a.s
This led to a heated discussion https://sourceware.org/legacy-ml/binutils/2016-03/msg00312.html. Swift folks noticed this https://bugs.swift.org/browse/SR-1023 and their reaction was to switch from GNU ld to gold.
GNU ld's aarch64 port does not have the diagnostic.
binutils commit "x86: Clear extern_protected_data for GNU_PROPERTY_NO_COPY_ON_PROTECTED" introduced GNU_PROPERTY_NO_COPY_ON_PROTECTED. With this property, ld -shared will not error for relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object.
The two issues above are the costs of enabling copy relocations on protected data symbols. Personally I don't think copy relocations on protected data symbols are actually leveraged. GNU ld's x86 port could simply (1) reject such copy relocations and (2) allow direct accesses referencing protected data symbols in -shared mode. But I am not really clear about the glibc case. I wish GNU_PROPERTY_NO_COPY_ON_PROTECTED could become the default or be phased out in the future.
Protected function symbols and canonical PLT entries
GNU ld's aarch64 and x86 ports reject the above code. On many other architectures including powerpc the code is supported.
% gcc -fpic -shared b.c -fuse-ld=bfd b.c -o b.so
/usr/bin/ld.bfd: /tmp/cc3Ay0Gh.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
/usr/bin/ld.bfd: final link failed: bad value
collect2: error: ld returned 1 exit status
% gcc -shared -fuse-ld=bfd -fpic b.c -o b.so
/usr/bin/ld.bfd: /tmp/ccXdBqMf.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `foo' which may bind externally can not be used when making a shared object; recompile with -fPIC
/tmp/ccXdBqMf.o: in function `foo':
a.c:(.text+0x0): dangerous relocation: unsupported relocation
collect2: error: ld returned 1 exit status
The rejection is mainly a historical issue to make pointer equality work with -fno-pic code. The GNU ld idea is that:
- The compiler emits GOT-generating relocations for -fpic code (in reality it does it for declarations but not for definitions).
- A -fno-pic main executable uses direct access relocation types and gets a canonical PLT entry.
- glibc ld.so resolves the GOT in the shared object to the canonical PLT entry.
Actually we can take the interpretation that a canonical PLT entry is incompatible with a shared
STV_PROTECTED definition, and reject the attempt to create a canonical PLT entry (gold/LLD). And we can keep producing direct access relocations referencing protected symbols for
STV_PROTECTED is no different from
On many architectures, a branch instruction uses a branch specific relocation type (e.g. R_RISCV_CALL_PLT). This is great because the address is insignificant and the linker can arrange for a regular PLT if the symbol turns out to be external.
On i386, a branch in -fno-pic code emits an R_386_PC32 relocation, which is indistinguishable from an address taken operation. If the symbol turns out to be external, the linker has to employ a trick called "canonical PLT entry" (st_shndx=0, st_value!=0). The term is parlance among a few LLD developers, but not broadly adopted.
% gcc -m32 -shared -fuse-ld=bfd -fpic b.c -o b.so
This used to be a problem for x86-64 as well, until "x86-64: Generate branch with PLT32 relocation" changed call/jmp foo to emit R_X86_64_PLT32 instead of R_X86_64_PC32. (Note: call/jmp foo@PLT always emits R_X86_64_PLT32.)
The relocation type name is a bit misleading: _PLT32 does not mean that a PLT will always be created. Rather, it is optional: the linker can resolve _PLT32 to any place where the function will be called. If the symbol is preemptible, the place is usually the PLT entry. If the symbol is non-preemptible, the linker can convert _PLT32 to _PC32. A function symbol can be either branched to or have its address taken. For an address taken operation, the function symbol is used in a manner similar to a data symbol. R_386_PLT32 cannot be used. LLD and gold will just reject the link if text relocations are disabled.
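The distinction between branches and address-significant references can be summarized as a tiny decision function. This is a deliberately simplified sketch of the policy described in this article for linking a -fno-pic executable, not code from any linker; toy_reloc, toy_action and toy_resolve are hypothetical names, and the TOY_REJECT branch reflects the gold/LLD interpretation that a canonical PLT entry is incompatible with a protected definition in a shared object.

```c
#include <stdbool.h>

/* Hypothetical classification of a reference to a function symbol. */
enum toy_reloc {
  TOY_BRANCH, /* branch-specific type, e.g. R_X86_64_PLT32 or R_RISCV_CALL_PLT */
  TOY_ADDRESS /* address-significant type; an i386 R_386_PC32 branch looks like this too */
};

enum toy_action {
  TOY_BIND_LOCAL,    /* resolve straight to the definition in the executable */
  TOY_REGULAR_PLT,   /* the address is insignificant, a regular PLT entry is fine */
  TOY_CANONICAL_PLT, /* the PLT entry becomes the address of the function */
  TOY_REJECT         /* no consistent address exists, so report an error */
};

/* Decide how a reference from a -fno-pic executable is resolved. */
enum toy_action toy_resolve(enum toy_reloc kind, bool defined_in_executable,
                            bool protected_in_dso) {
  if (defined_in_executable)
    return TOY_BIND_LOCAL;
  if (kind == TOY_BRANCH)
    return TOY_REGULAR_PLT;
  /* Address-significant reference to a function defined in a shared object. */
  return protected_in_dso ? TOY_REJECT : TOY_CANONICAL_PLT;
}
```

Under this model, switching a branch from an address-significant relocation type to a branch-specific one is exactly what moves it from the TOY_CANONICAL_PLT or TOY_REJECT cases to TOY_REGULAR_PLT.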
On i386, my proposal is that branches to a default visibility function declaration should use R_386_PLT32 instead of R_386_PC32, in a manner similar to x86-64. Originally I thought an assembler change sufficed: https://sourceware.org/bugzilla/show_bug.cgi?id=27169. Please read the next section for why this should be changed on the compiler side.
Non-default visibility ifunc and R_386_PC32
For a call to a hidden function declaration, the compiler produces an R_386_PC32 relocation. The relocation is an indicator that EBX may not be set up.
If the declaration refers to an ifunc definition, the linker will resolve the R_386_PC32 to an IPLT entry. For -shared links, the IPLT entry references EBX. If the call site does not set up EBX to be _GLOBAL_OFFSET_TABLE_, the IPLT call will be incorrect.
GNU ld has implemented a diagnostic ("i686 ifunc and non-default symbol visibility") to catch the problem. If we change call/jmp foo to always use R_386_PLT32, such a diagnostic will be lost.
Can we change the compiler to emit call/jmp foo@PLT for default visibility function declarations? If the compiler emits such a modifier but does not set up EBX, the ifunc can still be non-preemptible (e.g. hidden in another translation unit or -Bsymbolic) and we will still have a dilemma.
Personally, I think avoiding a canonical PLT entry is more useful than an ld ifunc diagnostic. The i386 ABI is legacy and the x86 maintainer will not make the change, though.
I hope the above gives an overview to interested readers. Symbol interposition is subtle. One has to think about all the factors related to symbol interposition, and the relevant toolchain fixes are like a game of whack-a-mole. I appreciate all the prior discussions and I believe many unsatisfactory things can be fixed in a quite backward-compatible way.
Some features are inherently incompatible. We make the trade-off in favor of more important features. Here are two things that should not work. However, if -fno-direct-access-external-data is specified, both limitations will be circumvented.
- Copy relocations on protected data symbols.
- Canonical PLT entries on protected function symbols. With the R_386_PLT32 change, this issue will only affect function pointers.
People sometimes simply just say: "protected visibility does not work." I'd argue that Clang+gold/LLD works quite well.
Things on the GCC+GNU ld side are inconsistent, though. Here is a list of changes I wish would happen:
- GCC: add -f[no-]direct-access-external-data.
- GCC: drop HAVE_LD_PIE_COPYRELOC in favor of -f[no-]direct-access-external-data.
- GCC x86-64: default to GOT indirection for external data symbols in -fpie code.
- GCC or GNU as i386: emit R_386_PLT32 for branches to undefined function symbols.
- GNU ld x86: disallow copy relocations on protected data symbols. (I think canonical PLT entries on protected symbols have been disallowed.)
- GCC aarch64/arm/x86/...: allow direct access relocations on protected symbols in -fpic code.
- GNU ld aarch64/x86: allow direct access relocations on protected data symbols in -shared mode.
The breaking changes for GCC+GNU ld:
- The "copy relocations on protected data symbols" scheme has been supported in the past few years with GNU ld on x86, but it did not work before circa 2015, and should not work in the future. Fortunately the breaking surface may be narrow: this scheme does not work with gold or LLD, and it does not work on many architectures.
- ld is not the only consumer of R_386_PLT32. The Linux kernel has code resolving relocations and it needs to be fixed (patch uploaded: https://github.com/ClangBuiltLinux/linux/issues/1210).
I'll conclude this article with random notes on other binary formats:
__declspec(dllimport) gives us a different perspective on how external references can be designed. The annotation is verbose but differentiates the two cases: (1) the symbol has to be defined in the same linkage unit; (2) the symbol can be defined in another linkage unit. If we lift the "the symbol visibility is decided by the most constrained visibility" requirement for protected->default, a COFF undefined/defined symbol is quite like a protected undefined/defined symbol in ELF.
__declspec(dllimport) gives the undefined symbol default visibility (i.e. the LLVM IR dllimport is redundant).
__declspec(dllexport) is something which cannot be modeled with the existing ELF visibilities.
For an undefined variable, Mach-O uses __attribute__((visibility("hidden"))) to say "a definition must be available in another translation unit in the same linkage unit" but does not actually mark the undefined symbol in any way. COFF uses __declspec(dllimport) to convey this. In ELF, __attribute__((visibility("hidden"))) additionally makes the undefined symbol unexportable. The Mach-O notation actually resembles COFF: it can be exported by the definition in another translation unit. From its behavior, I think it would be more appropriately mapped to LLVM IR protected instead of hidden.
Subtitle: Is LLD a drop-in replacement for GNU ld?
The motivation for this article was someone challenging the "drop-in replacement" claim on LLD's website (the discussion was about Linux-like ELF toolchain):
LLD is a linker from the LLVM project that is a drop-in replacement for system linkers and runs much faster than them. It also provides features that are useful for toolchain developers.
99.9% of software works with LLD without a change. Some linker script applications may need an adaptation (such adaptation is oftentimes due to brittle assumptions: asking too much from GNU ld's behavior, which should be fixed anyway). So I defended this claim.
I wrote an article a few weeks ago to introduce stack unwinding in detail. Today I will introduce C++ exception handling, an application of stack unwinding. Exception handling has a variety of ABIs (for interoperability of C++ implementations), the most widely used of which is Itanium C++ ABI: Exception Handling.
In 1995, Solaris' link editor and ld.so introduced the symbol versioning mechanism. Ulrich Drepper and Eric Youngdale borrowed Solaris symbol versioning in 1997 and designed the GNU style symbol versioning for glibc.
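(First, let's celebrate reaching 2000 LLVM commits!)
With clang, the options you specify are all driver options. Some driver options affect the options passed to the linker. Some driver options share a name with a linker option; besides passing the same-named option to the linker, they often have additional effects, for example: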