Updated in 2025-02
(First, a small celebration: I have reached 2000 commits in LLVM!)
Compiler driver options
Before describing the linker options, let's introduce the concept of
driver options. The user-facing options of gcc and
clang are called driver options. Some driver options affect
the options passed to the linker. Many of them share a name with a
linker option, and besides passing that option to the linker, they
often have additional effects, such as:
- -shared: Don't set -dynamic-linker; don't link crt1.o
- -static: Don't set -dynamic-linker; use crtbeginT.o instead of crtbegin.o; use --start-group to link -lgcc -lgcc_eh -lc (they have a (bad) circular dependency)
-Wl,--foo,value,--bar=value will pass the three options
--foo, value, and --bar=value to
the linker. If there are a lot of link options, you can put them one
per line in a text file response.txt and then specify
-Wl,@response.txt.
Note that -O2 will not pass -O2 to the
linker, but -Wl,-O2 will.
- -fno-pic and -fno-PIC are synonymous and generate position-dependent code.
- -fpie and -fPIE are called small PIE and large PIE, respectively. They introduce an optimization on top of PIC: the compiled .o can only be used for executables, so definitions are assumed to be non-preemptible. See -Bsymbolic below.
- -fpic and -fPIC are called small PIC and large PIC, respectively. Both generate position-independent code. The two modes generate different code on 32-bit powerpc and sparc (architectures that are about to retire); there is no difference on most architectures.
Input files
The linker accepts several types of input. For symbols, the symbol table of each input file will affect symbol resolution. For sections, only sections (called input sections) in regular object files will contribute to the sections of the output file (called output sections).
- .o (regular object files)
- .so (shared objects): only affects symbol resolution
- .a (archive files)
See Symbol processing for symbol resolution detail.
Modes
The linker is in one of the following four modes. The mode controls the output type (executable file/shared object/relocatable object).
- -no-pie (default): Generate a position-dependent executable (ET_EXEC). This mode has the loosest requirements: the source files can be compiled with -fno-pic, -fpie/-fPIE, or -fpic/-fPIC.
- -pie: Generate a position-independent executable (ET_DYN). Source files need to be compiled with -fpie/-fPIE or -fpic/-fPIC.
- -shared: Generate a position-independent shared object (ET_DYN). This is the most restrictive mode: source files need to be compiled with -fpic/-fPIC.
- -r: Generate a relocatable file. This is called a relocatable link and is special: it suppresses various linker-synthesized sections and keeps relocations. See Relocatable linking.
Confusingly, the compiler driver provides several options with the
same name: -no-pie, -pie, -shared, -r. GCC 6 introduced the
configure-time option --enable-default-pie: such builds
enable -fPIE and -pie by default. Now, many
Linux distributions have enabled this option as baseline security
hardening.
Executable link
(-no-pie and -pie)
A defined symbol is non-preemptible. For a branch instruction using a PLT-generating relocation, the branch can be bound directly to the definition, avoiding a PLT entry. For a code sequence involving a GOT-generating relocation, the code sequence may be optimized to use direct access. See All about Global Offset Table for details.
A non-local STV_DEFAULT/STV_PROTECTED defined symbol is
by default not exported to the dynamic symbol table.
-no-pie
-no-pie indicates that the link-time address equals the
run-time address. This property is leveraged by a linker: all
relocations referencing a non-preemptible symbol can be resolved,
including absolute GOT-generating (e.g.
R_AARCH64_LD64_GOT_LO12_NC), PC-relative GOT-generating
(e.g. R_X86_64_REX_GOTPCRELX), etc. In the absence of a GOT
optimization, the GOT entry for a non-preemptible symbol is a constant,
avoiding a dynamic relocation. The image base is an arch-specific
non-zero value by default.
- Some architectures have different PLT code sequences (i386, ppc32 .glink).
- R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX can be further optimized.
- ppc64 .branch_lt (long branch addresses) can be optimized.
-pie
-pie is very similar to -shared -Bsymbolic,
but it produces an executable file. The following behavior is close to
-no-pie but different from -shared:
- Allow copy relocations and canonical PLT entries.
- Allow relaxing the general-dynamic/local-dynamic TLS models and TLS descriptors to initial-exec/local-exec.
- Resolve undefined weak symbols to zero. ld.lld does not generate a dynamic relocation. Whether GNU ld generates a dynamic relocation follows very complicated, architecture-dependent rules.
Shared object link
(-shared)
A non-local STV_DEFAULT definition is preemptible
(interposable) by default, that is, the definition may be replaced by
the definition in the executable file or another shared object at
runtime. The compiler and the linker cooperate by using GOT and PLT
entries to reference such symbols.
A non-local STV_DEFAULT/STV_PROTECTED symbol is exported
to the dynamic symbol table. E.g. in the following program,
foo is exported to the dynamic symbol table if linked with
-shared but (by default) not if linked with
-no-pie or -pie.
void foo() {}
PIC link (-pie and
-shared)
A symbolic relocation (absolute relocation & width matches the word size) referencing a non-preemptible non-TLS symbol converts to a relative relocation.
See Relative relocations and RELR for detail.
Symbol related
-Bsymbolic
The -Bsymbolic family options make non-local
STV_DEFAULT definitions in a shared object non-preemptible.
They are a no-op for executable output. See "Shared object link" above for an
introduction to "preemptible".
- -Bsymbolic makes all definitions except those matched by --dynamic-list/--export-dynamic-symbol-list/--export-dynamic-symbol non-preemptible.
- -Bsymbolic-functions is similar to -Bsymbolic, but only applies to STT_FUNC definitions.
- -Bsymbolic-non-weak-functions is similar to -Bsymbolic, but only applies to non-STB_WEAK STT_FUNC definitions.
See ELF interposition and -Bsymbolic for detail.
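To make the distinctions above concrete, here is a minimal C sketch of the preemptibility decision for a definition in a -shared link. The types and the function name are hypothetical, and the model ignores details such as --dynamic-list implying -Bsymbolic and symbol versioning.

```c
#include <stdbool.h>

typedef enum { BIND_LOCAL, BIND_GLOBAL, BIND_WEAK } Binding;
typedef enum { VIS_DEFAULT, VIS_PROTECTED, VIS_HIDDEN } Visibility;

/* Hypothetical model of a defined symbol in a -shared link. */
typedef struct {
  Binding binding;
  Visibility visibility;
  bool is_func;         /* STT_FUNC */
  bool in_dynamic_list; /* matched by --dynamic-list/--export-dynamic-symbol(-list) */
} DefinedSym;

typedef enum {
  BSYM_NONE,
  BSYM_ALL,                /* -Bsymbolic */
  BSYM_FUNCTIONS,          /* -Bsymbolic-functions */
  BSYM_NON_WEAK_FUNCTIONS, /* -Bsymbolic-non-weak-functions */
} BsymbolicMode;

/* Returns true if the definition may be interposed at runtime. */
bool is_preemptible(const DefinedSym *s, BsymbolicMode mode) {
  /* Only non-local STV_DEFAULT definitions are preemptible in the first place. */
  if (s->binding == BIND_LOCAL || s->visibility != VIS_DEFAULT)
    return false;
  /* Symbols matched by a dynamic list stay preemptible despite -Bsymbolic*. */
  if (s->in_dynamic_list)
    return true;
  switch (mode) {
  case BSYM_ALL:
    return false;
  case BSYM_FUNCTIONS:
    return !s->is_func;
  case BSYM_NON_WEAK_FUNCTIONS:
    return !(s->is_func && s->binding != BIND_WEAK);
  default:
    return true;
  }
}
```

For example, a weak function stays preemptible under -Bsymbolic-non-weak-functions but not under -Bsymbolic-functions.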
--defsym
Define a symbol whose value is an expression: --defsym=sym=expr.
Similar to ld64 -alias.
--exclude-libs
If a matched archive (either regular or surrounded by
--whole-archive/--no-whole-archive) defines a
non-local symbol, don't export the symbol.
As an example,
clang++ -static-libstdc++ -Wl,--export-dynamic,--exclude-libs=libstdc++.a a.cc
does not export libstdc++ defined symbols.
--export-dynamic
The option puts non-local STV_DEFAULT/STV_PROTECTED
defined symbols into the dynamic symbol table in an executable output. The
option is a no-op when:
- -shared is specified, because a shared object does this by default.
- -no-pie is specified without any input shared object, because the dynamic symbol table is not present.
Here are the rules (logical AND) that decide whether a symbol is exported to the dynamic symbol table:
- non-local STV_DEFAULT/STV_PROTECTED (this means it can be hidden by --exclude-libs)
- logical OR of the following:
  - undefined symbol
  - (--export-dynamic || -shared) && !(unnamed_addr linkonce_odr GlobalValue || local_unnamed_addr linkonce_odr (constant GlobalVariable || Function)) (in LTO, certain linkonce_odr symbols can be hidden)
  - matched by --dynamic-list/--export-dynamic-symbol-list/--export-dynamic-symbol
  - defined or referenced by a shared object as STV_DEFAULT
  - STV_PROTECTED definition in a shared object preempted by copy relocation/canonical PLT when --ignore-{data,function}-address-equality is specified
  - -z ifunc-noplt && has at least one relocation
If the executable file defines a symbol that is referenced by a link-time shared object, the linker exports the symbol so that the undefined symbol in the shared object can be bound to the definition in the executable file at runtime. If the executable file defines a symbol that is also defined by a link-time shared object, the linker exports the symbol to enable symbol interposition at runtime.
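The rules above can be written as a single predicate. This is a simplified C sketch, not ld.lld's code: every field name is hypothetical, and the LTO condition is collapsed into one precomputed flag.

```c
#include <stdbool.h>

/* Hypothetical, simplified symbol attributes. */
typedef struct {
  bool is_local;
  bool is_hidden;             /* not STV_DEFAULT/STV_PROTECTED, e.g. after --exclude-libs */
  bool is_undefined;
  bool lto_can_hide;          /* LTO: certain unnamed_addr linkonce_odr symbols */
  bool matched_dynamic_list;  /* --dynamic-list/--export-dynamic-symbol(-list) */
  bool seen_in_shared_object; /* defined or referenced by a shared object as STV_DEFAULT */
  bool protected_preempted;   /* STV_PROTECTED in a .so preempted by copy reloc/canonical PLT
                                 under --ignore-{data,function}-address-equality */
  bool ifunc_with_reloc;      /* -z ifunc-noplt is in effect and the symbol has a relocation */
} Sym;

typedef struct {
  bool export_dynamic; /* --export-dynamic */
  bool shared;         /* -shared */
} Options;

bool is_exported(const Sym *s, const Options *o) {
  /* First condition: non-local STV_DEFAULT/STV_PROTECTED. */
  if (s->is_local || s->is_hidden)
    return false;
  /* Second condition: logical OR of the listed cases. */
  return s->is_undefined ||
         ((o->export_dynamic || o->shared) && !s->lto_can_hide) ||
         s->matched_dynamic_list || s->seen_in_shared_object ||
         s->protected_preempted || s->ifunc_with_reloc;
}
```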
In LLVM, certain unnamed_addr GlobalValues are not
affected by --export-dynamic for executable linking. See
lld/test/ELF/lto/internalize-exportdyn.ll and
lld/test/ELF/lto/unnamed-addr-comdat.ll.
--export-dynamic-symbol=glob,
--export-dynamic-symbol-list, and
--dynamic-list
These options have different semantics for an executable and a shared object.
- executable: put matched non-local defined symbols into the dynamic symbol table (--export-dynamic applies to all non-local defined symbols).
- shared object: references to matched non-local STV_DEFAULT symbols shouldn't be bound to definitions within the shared object, even if they would otherwise be due to -Bsymbolic, -Bsymbolic-functions, or --dynamic-list
--dynamic-list additionally implies
-Bsymbolic.
For the shared object case, I usually call the operation "make a symbol preemptible".
One may use --export-dynamic-symbol=foo* to match all
non-local STV_DEFAULT symbols foo*. ld.lld
before version 11 used an exact match instead of a glob.
--export-dynamic-symbol-list has been available since GNU
ld 2.35 and ld.lld
14.
In the following example, we shall see a GLOB_DAT
dynamic relocation iff var is preemptible.
// a.c
int var;
int inc() { return ++var; }
# Preemptible by default in a shared object.
If a symbol matched by local: in a version script is
specified by a dynamic list, the version script takes precedence and the
symbol will be made local.
--discard-none,
--discard-locals, and --discard-all
If .symtab is produced, a local symbol defined in a live
section is preserved if:
if ((--emit-relocs or -r) && referenced) || --discard-none
These .L symbols (temporary labels in MC) can be emitted
in clang -Wa,-L builds. Then if
ld --discard-locals is not specified, the linker will emit
these symbols to the executable file.
In addition, RISC-V linker relaxation may emit .L0 (with
a trailing space) symbols (llvm-project#89693).
For RISC-V, newer GCC and Clang pass -X
(--discard-locals) to the linker.
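For reference, here is a C sketch of what the complete decision for a local symbol in a live section could look like. It is modeled loosely on ld.lld's shouldKeepInSymtab, simplified (section symbols and other special cases are omitted), and the types are hypothetical. One detail worth noting: even without --discard-locals, a .L symbol in an SHF_MERGE section is dropped, since that is typically why the assembler kept it.

```c
#include <stdbool.h>
#include <string.h>

/* Which --discard-* option was given, if any. */
typedef enum { DISCARD_DEFAULT, DISCARD_NONE, DISCARD_LOCALS, DISCARD_ALL } DiscardPolicy;

/* Hypothetical view of a local symbol defined in a live section. */
typedef struct {
  const char *name;
  bool referenced;       /* referenced by a relocation */
  bool in_merge_section; /* defined in an SHF_MERGE section */
} LocalSym;

/* copy_relocs is true when --emit-relocs or -r is in effect. */
bool keep_local_in_symtab(const LocalSym *s, bool copy_relocs, DiscardPolicy d) {
  if ((copy_relocs && s->referenced) || d == DISCARD_NONE)
    return true;
  if (d == DISCARD_ALL)
    return false;
  /* .L symbols (assembler temporary labels) are dropped with --discard-locals,
     and also by default when they live in an SHF_MERGE section. */
  bool is_temp = strncmp(s->name, ".L", 2) == 0;
  if (is_temp && (d == DISCARD_LOCALS || s->in_merge_section))
    return false;
  return true;
}
```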
--no-undefined-version
If a version script specifies an exact pattern which does not match a defined symbol, report an error.
Say we have the following version script. If foo is not
a defined symbol, the linker will report an error. For a glob pattern
(e.g. bar*) matching no symbol, there is no error. This is
a compromise.
v1 {
  foo;
  bar*;
};
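The check can be sketched in C with POSIX fnmatch. This is a hypothetical illustration of the rule, not how any linker implements it; in particular, treating a pattern as a glob when it contains *, ?, or [ is an assumption.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumption: a pattern is a glob if it contains a glob metacharacter. */
static bool is_glob(const char *pat) {
  return strpbrk(pat, "*?[") != NULL;
}

/* Hypothetical --no-undefined-version style check. Returns the number of errors. */
int check_version_patterns(const char *const *patterns, int npat,
                           const char *const *defined, int ndef) {
  int errors = 0;
  for (int i = 0; i < npat; i++) {
    bool matched = false;
    for (int j = 0; j < ndef && !matched; j++)
      matched = fnmatch(patterns[i], defined[j], 0) == 0;
    /* An exact pattern must match a defined symbol; an unmatched glob is tolerated. */
    if (!matched && !is_glob(patterns[i])) {
      fprintf(stderr, "error: version script pattern '%s' matches no defined symbol\n",
              patterns[i]);
      errors++;
    }
  }
  return errors;
}
```

With the script above, defining only baz yields one error (for foo), while bar* matching nothing is silently accepted.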
GNU ld has supported --no-undefined-version since 2002-08,
but --undefined-version was a late addition in 2022-10
(milestone: binutils 2.40).
--strip-all
Do not create .strtab or .symtab.
-u symbol
If an archive file defines the symbol specified by -u,
the linker extracts the member that defines it (the extracted member is
then treated like a regular .o).
For example: ld -u foo ... a.a. If a.a does
not define any symbol referenced by previous object files,
no member of a.a will be extracted. If -u foo is
specified, then the archive member with foo defined in
a.a will be extracted.
Another usage of -u is to specify a GC root.
--version-script=script
The version script has three purposes:
- Define versions
- Specify some patterns so that the matched, defined, unversioned symbols have the specified version
- local version: local: can make matched defined symbols STB_LOCAL
The binding of such an unversioned symbol becomes STB_LOCAL, and the
symbol will not be exported to the dynamic symbol table.
If a symbol matched by local: is specified by a dynamic
list, the version script takes precedence and the symbol will be made
local.
All about symbol versioning describes symbol versioning in detail.
-y symbol
Often used for debugging. Output where the specified symbol is referenced and defined.
-z muldefs
Alias: --allow-multiple-definition
A symbol is allowed to be defined in multiple files. By default, the linker does not allow two non-local regular definitions (non-weak, non-common) with the same name.
-z unique-symbol
Rename local symbols so that there are no duplicates.
Some Intel folks were working on function-granular kernel address space layout randomization (FGKASLR) and wanted such a feature (https://sourceware.org/bugzilla/show_bug.cgi?id=26391). GNU ld has supported this option since 2.36. I closed the ld.lld feature request.
I don't think this is a good design.
First, the stability problem. Say the old kernel has
foo.1 foo.2. If there is a new local foo
symbol, the new kernel will have foo.1 foo.2 foo.3.
However, the new symbols don't necessarily correspond to local symbols
of the same names in the old kernel. Such disturbance is probably more
likely with LTO or PGO. For Clang LTO, the kernel Makefile currently
specifies -mllvm -import-instr-limit=5. If a function close
to the limit happens to cross it and gets inlined into other
translation units, the stability issue may affect many translation
units.
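The stability problem can be demonstrated with a hypothetical numbering scheme in which the k-th local symbol named foo (in input order) becomes foo.k. This C sketch does not reproduce GNU ld's exact suffix rules; it only shows how inserting one new symbol shifts the names of all later duplicates.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical renaming: the k-th occurrence of a local name gets the suffix ".k". */
void rename_locals(const char *const *names, int n, char out[][32]) {
  for (int i = 0; i < n; i++) {
    int k = 1;
    for (int j = 0; j < i; j++)
      if (strcmp(names[i], names[j]) == 0)
        k++;
    snprintf(out[i], 32, "%s.%d", names[i], k);
  }
}
```

If the old kernel has foo in a.o and c.o, they become foo.1 and foo.2. Adding a new local foo in b.o renames c.o's symbol to foo.3, and foo.2 now refers to an unrelated function.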
The implementation has to perform an iteration on all local symbols, which can affect link speed.
In addition, the .[0-9]+ scheme has been used by C++
mangling. The Itanium C++ ABI says "A
% c++filt <<< $'_ZL3foov\n_ZL3foov.1'
As an alternative, I suggest that the FGKASLR developers use the
STT_FILE symbol:
STT_FILE a.c
STT_NOTYPE foo
STT_FILE b.c
STT_NOTYPE foo
The ELF specification says:
Conventionally, the symbol's name gives the name of the source file associated with the object file. A file symbol has STB_LOCAL binding, its section index is SHN_ABS, and it precedes the other STB_LOCAL symbols for the file, if it is present.
I mentioned my concern in a reply to [PATCH v9 02/15] livepatch: use `-z unique-symbol` if available to nuke pos-based search.
Library related
--as-needed and
--no-as-needed
Normally each link-time shared object gets a DT_NEEDED
tag in the output. Such a shared object will be loaded by the dynamic loader.
--as-needed can avoid unneeded DT_NEEDED
tags. --as-needed and --no-as-needed are
position-dependent options (an informal name, but I know of no more
appropriate adjective). In ld.lld, a shared object is needed if one of the
following conditions is true:
- it is linked at least once in --no-as-needed mode (i.e. --as-needed a.so --no-as-needed a.so => needed)
- or it has a definition resolving a non-weak reference from a live section (not discarded by --gc-sections)
In gold, the rule is probably:
- it is linked at least once in --no-as-needed mode (i.e. --as-needed a.so --no-as-needed a.so => needed)
- or it has a definition resolving a non-weak reference
In GNU ld, the rules are quite complex. The basic rules look like the following:
- it is linked at least once in --no-as-needed mode (i.e. --as-needed a.so --no-as-needed a.so => needed)
- or it has a definition resolving a non-weak reference by a previous input file (this works similarly to archive member selection)
In ld.bfd ... a.so --as-needed b.so --no-as-needed, if
a.so references a symbol defined by b.so but
a.so does not need b.so, the final output will
need b.so. This is probably used as a workaround for
underlinking problems. When the missing dependency (b.so)
of a shared object is seen, the output gets the
DT_NEEDED entry to satisfy a.so's requirement,
even if the output itself doesn't need the dependency.
-Bdynamic and
-Bstatic
These two options are position-dependent options, which affect
-lname that appears on the command line later.
- -Bdynamic (default): for -lfoo, search for libfoo.so and libfoo.a in the directory list specified by -L (in each directory, libfoo.so is tried first)
- -Bstatic: search only for libfoo.a in the directory list specified by -L
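A C sketch of the per-directory search order, assuming GNU ld-style semantics: in each -L directory libfoo.so is tried before libfoo.a, and only then does the search move on to the next directory. The file_exists stub and its file list are hypothetical stand-ins for the filesystem.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the filesystem so the sketch is self-contained. */
static const char *const existing_files[] = {
    "/opt/lib/libfoo.a", "/usr/lib/libfoo.so", "/usr/lib/libfoo.a"};

static bool file_exists(const char *path) {
  for (size_t i = 0; i < sizeof existing_files / sizeof *existing_files; i++)
    if (strcmp(existing_files[i], path) == 0)
      return true;
  return false;
}

/* Resolve -l<name> into out. Returns false if nothing is found. */
bool search_library(const char *name, const char *const *dirs, int ndirs,
                    bool bstatic, char *out, size_t outlen) {
  for (int i = 0; i < ndirs; i++) {
    if (!bstatic) { /* -Bdynamic: the shared object is preferred within a directory */
      snprintf(out, outlen, "%s/lib%s.so", dirs[i], name);
      if (file_exists(out))
        return true;
    }
    snprintf(out, outlen, "%s/lib%s.a", dirs[i], name);
    if (file_exists(out))
      return true;
  }
  return false;
}
```

Note the consequence: with -L/opt/lib -L/usr/lib, a libfoo.a in /opt/lib wins over a libfoo.so in /usr/lib even under -Bdynamic.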
Historically -Bstatic and -static are
synonymous in GNU ld. The compiler driver option -static is
a different option. In addition to passing -static to ld,
it also removes the default --dynamic-linker, which affects
the linking of libgcc, libc, etc.
--no-dependent-libraries
ld.lld specific. Ignore sections of type
SHT_LLVM_DEPENDENT_LIBRARIES (conventionally named
.deplibs) in object files.
This section contains a list of filenames. ld.lld adds these files as additional inputs.
-soname=name
Set the DT_SONAME dynamic tag in the dynamic table of
the generated shared object.
The linker records each shared object seen at link time with a
DT_NEEDED entry in the dynamic table of the generated
executable/shared object. The value of the entry is chosen as follows:
- If the shared object contains DT_SONAME, that value is used for DT_NEEDED
- Otherwise, if the shared object was found through -l, the value is the base file name
- Otherwise, the value is the path name (an absolute path and a relative path give different values)
For example:
ld -shared -soname=a.so.1 a.o -o a.so; ld b.o ./a.so,
a.out has a DT_NEEDED tag of
a.so.1. If the first command does not contain
-soname, a.out will have a
DT_NEEDED tag of ./a.so.
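The DT_NEEDED selection rules above translate into a short C function. The LinkedDso type and the function name are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of one link-time shared object. */
typedef struct {
  const char *soname; /* DT_SONAME, or NULL if absent */
  const char *path;   /* path the linker opened */
  bool via_dash_l;    /* found through -l */
} LinkedDso;

const char *needed_name(const LinkedDso *d) {
  if (d->soname)
    return d->soname;
  if (d->via_dash_l) {
    const char *slash = strrchr(d->path, '/');
    return slash ? slash + 1 : d->path; /* base file name */
  }
  return d->path; /* path as given, absolute or relative */
}
```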
--start-group and
--end-group
If there is a mutual reference between a.a and
b.a, and you are not sure which one will be pulled into the
link first, you have to use this pair of options. An example is given
below:
For the archive linking order main.o a.a b.a, assume
that main.o references b.a, b.a references a.a, and
a.a does not satisfy any earlier undefined symbol. Then this
order causes an undefined symbol error. Can the link order be replaced by
main.o b.a a.a? If main.o instead references
a.a, and b.a does not satisfy
any earlier undefined symbol, then this order will also
cause an error.
One solution is main.o a.a b.a a.a. In many cases, it is
enough to repeat a.a once. However, suppose only
a.a(a.o) is extracted when processing the first a.a, only
b.a(b.o) is extracted when processing b.a, only a.a(c.o) is extracted
when processing the second a.a, and a.a(c.o)
needs another member in b.a; then this link order will still
cause an undefined symbol error.
We can repeat b.a again, that is
main.o a.a b.a a.a b.a, but a better solution is
main.o --start-group a.a b.a --end-group, or
main.o -( a.a b.a -).
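The failure mode above can be reproduced with a toy model in C. Symbols are bits in a mask, and each archive is rescanned until it stops yielding members, but earlier archives are never revisited: this is the traditional left-to-right behavior that --start-group works around. The types and helpers are hypothetical. Note that ld.lld does not follow this model: it remembers the symbols of all archives seen so far, so backward references resolve without a group (--warn-backrefs can report them).

```c
#include <stdbool.h>

/* Symbols modeled as bits (hypothetical encoding). */
enum { SYM_A = 1u << 0, SYM_B = 1u << 1 };

typedef struct {
  unsigned defs, refs; /* symbols defined / referenced by an archive member */
  bool extracted;
} Member;

typedef struct { Member *members; int n; } Archive;
typedef struct { unsigned defined, referenced; } Link;

static unsigned undefined_syms(const Link *l) { return l->referenced & ~l->defined; }

/* Extract members that define a currently undefined symbol; rescan this archive
   until nothing changes. Returns whether anything was extracted. */
static bool scan_archive(Link *l, Archive *a) {
  bool any = false, changed;
  do {
    changed = false;
    for (int i = 0; i < a->n; i++) {
      Member *m = &a->members[i];
      if (!m->extracted && (m->defs & undefined_syms(l))) {
        m->extracted = true;
        l->defined |= m->defs;
        l->referenced |= m->refs;
        changed = any = true;
      }
    }
  } while (changed);
  return any;
}

/* --start-group ... --end-group: rescan all archives in the group to a fixpoint. */
static void scan_group(Link *l, Archive *const *group, int n) {
  bool changed;
  do {
    changed = false;
    for (int i = 0; i < n; i++)
      changed |= scan_archive(l, group[i]);
  } while (changed);
}
```

With main.o referencing b and b.a(b.o) referencing a, scanning a.a then b.a leaves a undefined, while the group variant resolves everything.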
--start-lib and
--end-lib
See Archives and
--start-lib. If a.a contains b.o c.o,
ld ... --start-lib b.o c.o --end-lib works like
ld ... a.a.
--sysroot
This is different from the --sysroot driver option. In
GCC/Clang, the driver option --sysroot does two things:
- Decide include/library search paths (e.g. $sysroot/usr/include, $sysroot/lib64)
- Pass --sysroot to ld.
In ld:
- -l =foo and -l=foo find libfoo.so or libfoo.a under the sysroot directory.
- foo in INPUT or GROUP finds foo under the sysroot directory.
- If a linker script is in the sysroot directory, when it opens an absolute path file (INPUT or GROUP), the sysroot is prepended to the absolute path.
-t --trace
Print relocatable object files, shared objects, and extracted archive members.
--whole-archive
and --no-whole-archive
The .a after the --whole-archive option will be treated
as .o without lazy semantics. If a.a contains
b.o c.o, then
ld --whole-archive a.a --no-whole-archive has the same
effect as ld b.o c.o.
--push-state and
--pop-state
GNU ld implemented the options in binutils 2.25.
-Bstatic, --whole-archive, --as-needed, etc. are all
position-dependent options that represent the boolean state.
--push-state can save the boolean state of these options,
and --pop-state will restore it.
When inserting a new option in the link command line to change the
state, you usually want to restore the state afterward. In that case, you can use
--push-state and --pop-state. For example, to
make sure to link libc++.a and libc++abi.a,
you can use
-Wl,--push-state,-Bstatic -lc++ -lc++abi -Wl,--pop-state.
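A C sketch of the state stack behind --push-state/--pop-state, covering only the three boolean states mentioned above. The types and the fixed stack depth are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 16

/* The position-dependent boolean state (a simplified subset). */
typedef struct {
  bool bstatic;       /* -Bstatic vs -Bdynamic */
  bool whole_archive; /* --whole-archive vs --no-whole-archive */
  bool as_needed;     /* --as-needed vs --no-as-needed */
} LinkState;

typedef struct {
  LinkState cur;
  LinkState stack[MAX_DEPTH];
  int depth;
} StateMachine;

/* Apply one option; returns false for an unbalanced --pop-state or an overflow. */
bool apply_option(StateMachine *m, const char *opt) {
  if (strcmp(opt, "--push-state") == 0) {
    if (m->depth == MAX_DEPTH)
      return false;
    m->stack[m->depth++] = m->cur;
  } else if (strcmp(opt, "--pop-state") == 0) {
    if (m->depth == 0)
      return false;
    m->cur = m->stack[--m->depth];
  } else if (strcmp(opt, "-Bstatic") == 0) {
    m->cur.bstatic = true;
  } else if (strcmp(opt, "-Bdynamic") == 0) {
    m->cur.bstatic = false;
  } else if (strcmp(opt, "--whole-archive") == 0) {
    m->cur.whole_archive = true;
  } else if (strcmp(opt, "--no-whole-archive") == 0) {
    m->cur.whole_archive = false;
  } else if (strcmp(opt, "--as-needed") == 0) {
    m->cur.as_needed = true;
  } else if (strcmp(opt, "--no-as-needed") == 0) {
    m->cur.as_needed = false;
  }
  /* Other arguments (e.g. -lc++) do not change the state. */
  return true;
}
```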
Dependency related
See Dependency related linker options for details.
-z defs and
-z undefs
Whether to report an error for an unresolved undefined symbol from a
regular object. "unresolved" means that the symbol is not defined by a
regular object file or a link-time shared object. Executable links
default to -z defs/--no-undefined (not allowed) and
-shared links default to -z undefs
(allowed).
Many build systems enable -z defs, requiring shared
objects to specify all dependencies when linking (link what you
use).
--allow-shlib-undefined
and --no-allow-shlib-undefined
Whether to report an error for an unresolved STB_GLOBAL
undefined symbol from a shared object. Executable links default to
--no-allow-shlib-undefined (report errors) and
-shared links default to
--allow-shlib-undefined (do not report errors).
For the following code, an error will be reported when linking the
executable file:
// a.so
void f();
void g() {f();}
// exe
void g();
int main() {g();}
If you specify --allow-shlib-undefined when linking the
executable, the link will succeed, but ld.so will report an error at
runtime. In glibc, the error is
symbol lookup error: ... undefined symbol:.
GNU ld has a complex algorithm to find transitive closures: an error is
reported only when no shared object in the transitive closure can resolve
the undefined symbol. gold and lld use a simplified rule:
if all DT_NEEDED dependencies of a shared object are
directly linked, an error is enabled; if some of the dependencies are
not linked, then gold/lld cannot accurately determine whether an
indirect shared object can provide a definition, so they are
conservative and do not report errors.
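The gold/lld heuristic reduces to a small predicate. In this hypothetical C sketch, the caller has already found that one of a shared object's undefined symbols is unresolved, and counts how many of that shared object's DT_NEEDED entries were directly linked.

```c
#include <stdbool.h>

/* Hypothetical facts about the shared object containing the unresolved reference. */
typedef struct {
  int num_needed;        /* its DT_NEEDED entries */
  int num_needed_linked; /* how many of them were also inputs to this link */
} DsoDeps;

bool should_report_shlib_undefined(const DsoDeps *d, bool allow_shlib_undefined) {
  if (allow_shlib_undefined)
    return false;
  /* An unlinked dependency might provide the definition at runtime, so only
     report when every dependency was visible to the linker. */
  return d->num_needed_linked == d->num_needed;
}
```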
It is worth mentioning that
-z defs/-z undefs/--no-undefined and
--[no-]allow-shlib-undefined can be controlled by an option
--unresolved-symbols.
--warn-backrefs
See Dependency related linker options#--warn-backrefs.
Layout related
--no-rosegment
By default ld.lld places read-only data sections (e.g.
.rodata) and text sections (e.g. .text) into
two separate PT_LOAD segments. The default layout is:
- R PT_LOAD
- RX PT_LOAD
- RW PT_LOAD (overlaps with PT_GNU_RELRO)
- RW PT_LOAD
Specify this option to combine the R PT_LOAD and the RX
PT_LOAD. The RX PT_LOAD segment is
traditionally called the text segment and is the first segment.
ld.lld places rodata and data on both sides of text. This layout has the advantage that the distance between text and data is shorter, decreasing the relocation overflow pressure.
gold was the first linker to implement
--rosegment.
--xosegment
This option enables support for execute-only memory.
- AArch32 uses the SHF_ARM_PURECODE section flag to designate sections with pure program instructions and no data.
- AArch64 uses the SHF_AARCH64_PURECODE section flag.
By default, LLD treats sections with the
SHF_ALLOC|SHF_EXECINSTR|SHF_AARCH64_PURECODE flags as
compatible with those having SHF_ALLOC|SHF_EXECINSTR flags,
merging them into a single PT_LOAD segment. When
--xosegment is specified, LLD separates these sections
into distinct PT_LOAD segments: one for sections with
SHF_ALLOC|SHF_EXECINSTR|SHF_AARCH64_PURECODE and another
for sections with SHF_ALLOC|SHF_EXECINSTR.
-z noseparate-code
This is GNU ld's classic layout, which allows some file content to be
mapped by more than one PT_LOAD segment, with one being
executable and another one being non-executable. In this layout, two
adjacent PT_LOAD program headers may overlap in file
offsets. This trick avoids padding before the start of the next program
header.
In the absence of linker script fragments, there are typically just
two PT_LOAD segments:
- RX PT_LOAD: encompassing both read-only sections (SHF_ALLOC) and executable sections (SHF_ALLOC|SHF_EXECINSTR).
- RW PT_LOAD
  - The prefix part is PT_GNU_RELRO. rtld makes this part read-only with mprotect after processing dynamic relocations.
  - The part that is not PT_GNU_RELRO. This part is always writable at runtime.
The first PT_LOAD is often called the text segment. The
term is somewhat inaccurate because the segment has read-only data as
well.
This layout has been the default since ld.lld 10 for its size benefits.
Note: when a SHT_NOBITS section is followed by another
section, the SHT_NOBITS section behaves as if it occupies
the file offset range. This is because ld.lld does not implement a file
size optimization. Almost no linked image would benefit from this optimization,
because it's rare to add SHF_ALLOC sections after a
SHT_NOBITS SHF_ALLOC section.
-z separate-code
The option was introduced in binutils 2.31 and is enabled by default on Linux/x86. GNU ld has the following layout:
- R PT_LOAD
- RX PT_LOAD
- R PT_LOAD
- RW PT_LOAD
  - PT_GNU_RELRO part
  - Non-PT_GNU_RELRO part
In this layout, two adjacent PT_LOAD program headers
cannot overlap in file offsets. That is, a byte (RX
PT_LOAD) in the file that is mapped to the executable
section will not be mapped to an R PT_LOAD at the same
time. The idea is that read-only memory cannot be executed, so ROP
gadgets there cannot be used. However, this is pretty much security
theater, as executable memory has plenty of ROP gadgets anyway.
Due to implementation complexity, the adopted layout is not so great
in that there is another read-only PT_LOAD after the
RX PT_LOAD. A better layout is to merge this R
with the first R (PR23704).
Another issue is that when there is no RW PT_LOAD, the
first few non-SHF_ALLOC sections' content may be mapped to
the RX memory.
I introduced this option in ld.lld 10. The semantics are similar to
GNU ld but the layout is different: the two RW PT_LOAD are
allowed to overlap, which means that the address of the second
PT_LOAD does not need to be aligned, and max-page-size*2
bytes can be wasted at most.
GNU ld's -z separate-code is essentially split into two
options in lld: -z separate-code and
--rosegment.
-z separate-loadable-segments
This is ld.lld's traditional layout: all PT_LOAD
segments do not overlap (a byte will not be loaded into two memory
mappings at the same time). I added the option in 2019.
The implementation is that the address of each new
PT_LOAD is aligned to max-page-size. lld presets 4
PT_LOAD(r,rx,rw(relro),rw(non-relro)). Three alignments in
the output file may waste some bytes. On aarch64 and powerpc, because
the max-page-size specified by abi is larger (65536), up to 65536*3
bytes can be wasted.
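The padding cost can be estimated with a short C sketch. It assumes a simplified layout where every PT_LOAD after the first starts at a max-page-size boundary in the file, and it ignores headers and the image base; the function name is hypothetical. Each boundary wastes less than one max-page-size.

```c
#include <stdint.h>

/* Round x up to a multiple of align (a power of two). */
static uint64_t align_to(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

/* sizes[] are the file sizes of consecutive PT_LOAD segments. Returns the total
   padding inserted to start each later segment on a max-page-size boundary. */
uint64_t separate_segments_padding(const uint64_t *sizes, int n, uint64_t max_page_size) {
  uint64_t off = 0, padding = 0;
  for (int i = 0; i < n; i++) {
    if (i > 0) {
      uint64_t aligned = align_to(off, max_page_size);
      padding += aligned - off;
      off = aligned;
    }
    off += sizes[i];
  }
  return padding;
}
```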
-z relro
Place RELRO sections in the PT_GNU_RELRO program
header.
GNU ld uses one RW PT_LOAD program header with padding
at the start. The first half of the PT_LOAD overlaps with
PT_GNU_RELRO. The padding is added so that the end of
PT_GNU_RELRO is aligned by
max-page-size. (See ld.bfd --verbose output.) Prior to
GNU ld 2.39, the end was aligned by common-page-size. GNU ld's one RW
PT_LOAD layout makes the alignment increase the file size.
max-page-size can be large, such as 65536 for many systems, causing wasted
space.
lld utilizes two RW PT_LOAD program headers: one for
RELRO sections and the other for non-RELRO sections. Although this might
appear unusual initially, it eliminates the need for alignment padding
as seen in GNU ld's layout. Key changes:
- https://reviews.llvm.org/D58892 switched from PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) .data .bss) to PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro)) PT_LOAD(.data .bss).
- The end of the PT_GNU_RELRO segment and the associated RW PT_LOAD segment is padded to a common-page-size boundary. The padding section .relro_padding is like mold's. Before LLD 18, there was an issue that runtime_page_size < common-page-size did not work.
The layout used by mold is similar to that of lld. In mold's case,
the end of PT_GNU_RELRO is padded to max-page-size by
appending a SHT_NOBITS .relro_padding section.
This approach ensures that the last page of PT_GNU_RELRO is
protected, regardless of the system page size. However, when the system
page size is less than max-page-size, the map from the first
RW PT_LOAD is larger than needed.
In my opinion, losing protection for the last page when the runtime
page size is larger than common-page-size is not really an issue. Double
mapping a page of up to max-page-size for the protection could cause
undesired VM waste. Protecting .got.plt is the main purpose
of -z now. Protecting a small portion of
.data.rel.ro doesn't really make the program more secure,
given that .data and .bss are so huge and full
of attack targets. If users are really anxious, they can set
common-page-size to match their system page size.
GNU ld's internal linker scripts place RELRO sections between
DATA_SEGMENT_ALIGN and DATA_SEGMENT_RELRO_END
(built-in functions). DATA_SEGMENT_ALIGN is where padding
is added so that DATA_SEGMENT_RELRO_END aligns to a
max-page-size boundary.
. = DATA_SEGMENT_ALIGN(CONSTANT(MAXPAGESIZE), CONSTANT(COMMONPAGESIZE));
. = DATA_SEGMENT_RELRO_END(0, .);
. = DATA_SEGMENT_END(.);
ld.lld emulates these built-in functions:
- DATA_SEGMENT_ALIGN: set the current location to alignTo(script->getDot(), + align)
- DATA_SEGMENT_RELRO_END: set the current location to alignTo(script->getDot(), MAXPAGESIZE). .relro_padding is placed immediately before DATA_SEGMENT_RELRO_END.
-z lrodata-after-bss
See Relocation overflow and code models#x86-64 linker requirement.
--execute-only
This is an ld.lld-specific option for AArch64. The option requires
--rosegment and makes the RX
PT_LOAD segment executable-only (PF_X).
Relocation related
--apply-dynamic-relocs
Some psABIs use the RELA format (AArch64, PowerPC, RISC-V, x86-64,
etc.): relocations contain an explicit addend field. On such targets,
--apply-dynamic-relocs requires the linker to set the
initial value of the relocated location to the addend instead of 0. If
the executable/shared object is compressed,
--no-apply-dynamic-relocs can improve compression.
--apply-dynamic-relocs is supported for all ports in
ld.lld. As of August 2023, only
the aarch64 port of GNU ld supports
--apply-dynamic-relocs.
--emit-relocs
This option makes -no-pie/-pie/-shared links keep
input relocations, in a way similar to -r. It can be used
for binary analysis after linking. The only two uses I know of are
the x86 Linux kernel's CONFIG_RELOCATABLE and BOLT.
The output section order may be different with
--emit-relocs. .rela.eh_frame sections are
kept. See https://reviews.llvm.org/D44679: the first
.rela.eh_frame input section may cause
.eh_frame to be placed before other read-only output
sections.
GNU ld's powerpc port emits the transformed relocation types.
--pack-dyn-relocs=value
relr can enable DT_RELR, a more compact
relative relocation (R_*_RELATIVE) encoding format.
Relative relocations are common in position independent executables.
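RELR stores one address followed by bitmaps for nearby words. Here is a C sketch of an encoder and decoder for 64-bit targets, assuming the offsets are sorted, unique, and 8-byte aligned. An even entry is an address; an odd entry is a bitmap whose bit i (1 to 63) marks the word at base + (i - 1) * 8, after which base advances by 63 words. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

size_t relr_encode(const uint64_t *offsets, size_t n, uint64_t *out) {
  size_t i = 0, k = 0;
  while (i < n) {
    out[k++] = offsets[i];            /* address entry (even) */
    uint64_t base = offsets[i++] + 8; /* first word the following bitmaps cover */
    for (;;) {
      uint64_t bitmap = 0;
      while (i < n && offsets[i] >= base) {
        uint64_t d = offsets[i] - base;
        if (d >= 63 * 8 || d % 8 != 0)
          break;
        bitmap |= UINT64_C(1) << (d / 8 + 1);
        i++;
      }
      if (bitmap == 0)
        break;
      out[k++] = bitmap | 1;          /* bitmap entry, tagged by the low bit */
      base += 63 * 8;
    }
  }
  return k;
}

size_t relr_decode(const uint64_t *relr, size_t n, uint64_t *out) {
  size_t k = 0;
  uint64_t base = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t e = relr[i];
    if ((e & 1) == 0) {
      out[k++] = e;
      base = e + 8;
    } else {
      for (int bit = 1; bit < 64; bit++)
        if ((e >> bit) & 1)
          out[k++] = base + (uint64_t)(bit - 1) * 8;
      base += 63 * 8;
    }
  }
  return k;
}
```

For the offsets 0x1000, 0x1008, 0x1010, 0x1040, 0x2000, the encoder produces three 8-byte words (0x1000, 0x107, 0x2000), compared with five 24-byte Elf64_Rela entries.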
-z rel and
-z rela
Each architecture has a prevailing relocation format. ld.lld
implements -z rel to use REL for dynamic relocations even
on architectures using RELA as the prevailing format. This option can
save some space.
- COPY, GLOB_DAT and J[U]MP_SLOT always have a 0 addend. An rtld implementation does not need to read the implicit addend. REL is strictly better.
- A RELATIVE relocation has a non-zero addend. It can use an implicit addend as well. Alternatively, such relocations can be packed compactly with the RELR relocation entry format.
- For other dynamic relocation types (e.g. symbolic relocation R_X86_64_64), an ld.so implementation needs to read the implicit addend. REL may have a minor performance impact, because implicit addends force random access reads instead of being able to blast out a bunch of writes while chasing the relocation array.
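The difference between implicit and explicit addends for R_*_RELATIVE can be shown with a C sketch over a hypothetical in-memory image. With REL the addend lives in the relocated word, so the linker must write it there and the loader must read it back; with RELA the word can hold anything, which is why --apply-dynamic-relocs is optional on RELA targets. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* RELA: the addend comes from the relocation entry; the word is only written. */
void apply_relative_rela(unsigned char *image, uint64_t load_base, uint64_t offset,
                         int64_t addend) {
  uint64_t v = load_base + (uint64_t)addend;
  memcpy(image + offset, &v, sizeof v);
}

/* REL: the addend is the word already stored at the location (implicit addend),
   so the loader reads the target word before writing it. */
void apply_relative_rel(unsigned char *image, uint64_t load_base, uint64_t offset) {
  uint64_t addend;
  memcpy(&addend, image + offset, sizeof addend);
  uint64_t v = load_base + addend;
  memcpy(image + offset, &v, sizeof v);
}
```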
-z report-relative-reloc
Dump information about R_*_RELATIVE and
R_*_IRELATIVE relocations.
-z text and
-z notext
-z text does not allow text relocations.
-z notext allows text relocations.
Starting from binutils 2.35, GNU ld on linux/x86 enables the
configure-time option --enable-textrel-warning=warning by
default, and a warning will be given if there are text relocations.
The term "text relocations" is inaccurate. The
actual meaning is the general term for dynamic relocations acting on
sections without the SHF_WRITE flag. If the value of a
relocation in a .o cannot be determined at link time, it needs to be
converted to a dynamic relocation and computed by ld.so at runtime
(the dynamic relocation type matches the one in the .o). If the relocated section does not have the
SHF_WRITE flag, ld.so will have to temporarily call
mprotect to change the permissions of the memory map,
modify it, and restore the previous read-only permissions, which hinders
page sharing.
Shared objects are more prone to text relocations than executables. Executables can use canonical PLT entries and copy relocations to avoid certain text relocations.
Different linkers allow different relocation types of text
relocations on different architectures. GNU ld may allow quite a few
relocation types supported by glibc ld.so. On x86-64, the linker will
allow R_X86_64_64 and R_X86_64_PC64. However,
most loaders don't support R_X86_64_PC64.
In the following assembler, defined_in_so is a symbol
defined in a shared object. The scenario of each text relocation is
given in the comments.
.globl global
In -no-pie or -pie mode, the linker will
make different choices according to the symbol type of
defined_in_so:
- STT_FUNC: generate a canonical PLT entry
- STT_OBJECT: generate a copy relocation
- STT_NOTYPE: GNU ld will generate a copy relocation; ld.lld will generate a text relocation
Section related
--gc-sections
Specify -ffunction-sections or -fdata-sections at compile time for this
option to be effective: they place each function or data object in its
own section. The linker performs liveness analysis to remove unused
sections from the output.
See Linker garbage collection for detail.
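Liveness analysis is a graph reachability problem. Below is a minimal mark-phase sketch. It assumes a hypothetical model in which relocations are edges between section indices and the roots are given up front. Examples of roots are the section defining the entry symbol, sections retained by KEEP, and sections defining exported symbols. The sketch ignores __start_/__stop_ references and SHF_LINK_ORDER dependencies.

```c
#include <stdbool.h>
#include <stddef.h>

#define GC_MAX_SECTIONS 256 /* assumption: small inputs only */

/* Hypothetical model: sections are numbered 0..n-1, and each relocation is an
   edge from the section containing it to the section defining its target. */
typedef struct { int from, to; } gc_edge;

/* Mark phase of a simplified --gc-sections. Sections reachable from the roots
   are live; the rest would be discarded. Returns the number of live sections,
   or 0 if n exceeds GC_MAX_SECTIONS. */
size_t mark_live(int n, const gc_edge *edges, size_t nedges,
                 const int *roots, size_t nroots, bool *live) {
  int worklist[GC_MAX_SECTIONS];
  size_t top = 0, count = 0;
  if (n > GC_MAX_SECTIONS) return 0;
  for (int i = 0; i < n; i++) live[i] = false;
  for (size_t i = 0; i < nroots; i++)
    if (!live[roots[i]]) {
      live[roots[i]] = true;
      worklist[top++] = roots[i];
      count++;
    }
  while (top > 0) {
    int s = worklist[--top];
    /* Follow every relocation out of s (a linear scan, for simplicity). */
    for (size_t e = 0; e < nedges; e++)
      if (edges[e].from == s && !live[edges[e].to]) {
        live[edges[e].to] = true;
        worklist[top++] = edges[e].to; /* each section is pushed at most once */
        count++;
      }
  }
  return count;
}
```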
-z start-stop-gc
and -z nostart-stop-gc
-z start-stop-gc means a __start_foo or
__stop_foo reference from a live section does not retain
all foo input sections.
-z nostart-stop-gc means a __start_foo or
__stop_foo reference from a live section retains all
foo input sections.
See Metadata sections, COMDAT and SHF_LINK_ORDER for detail.
--icf=all and
--icf=safe
Enable identical code folding. The name originated from MSVC linker
/OPT:ICF where "ICF" stands for "identical COMDAT folding".
gold named it after "identical code folding".
This name is slightly misleading:
- The feature operates on sections instead of functions.
- The feature applies to read-only data as well.
We define sections as identical if they have identical content and
their outgoing relocation sets cannot be distinguished: they need to
have the same number of relocations, at the same relative locations,
with indistinguishable referenced symbols. This is a recursive
definition: if .text.a and .text.b reference different
symbols at the same location, they can still be indistinguishable if the
referenced symbols satisfy the identical code/rodata requirement.
Among a group of identical sections, the linker may conservatively
suppress folding for some. --keep-unique=<symbol>
makes the section defining <symbol> unique. In
ld.lld, a read-only section is foldable by default (gold does not fold
read-only data). However, a read-only section defining a
.dynsym symbol is not.
For the remaining sections in a set of identical sections, the linker picks one representative, drops the rest, and redirects references to the representative.
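The recursive definition above leads naturally to partition refinement. Below is a simplified O(n²) sketch with hypothetical types icf_section and icf_reloc, where each relocation points directly at a target section. It is not ld.lld's implementation, which hashes section contents and refines classes in parallel. Starting with all sections in one class and only ever splitting is what lets mutually recursive sections be folded.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: each relocation points at another section (its target). */
typedef struct { size_t offset; int type; int target; } icf_reloc;
typedef struct {
  const unsigned char *data; size_t size;
  const icf_reloc *relocs; size_t nrelocs;
} icf_section;

/* Same content, and relocations at the same offsets with the same types whose
   targets are currently in the same equivalence class. */
static bool equal_under(const icf_section *s, const int *cls, int a, int b) {
  const icf_section *x = &s[a], *y = &s[b];
  if (x->size != y->size || x->nrelocs != y->nrelocs) return false;
  if (memcmp(x->data, y->data, x->size) != 0) return false;
  for (size_t i = 0; i < x->nrelocs; i++) {
    const icf_reloc *p = &x->relocs[i], *q = &y->relocs[i];
    if (p->offset != q->offset || p->type != q->type) return false;
    if (cls[p->target] != cls[q->target]) return false;
  }
  return true;
}

static int count_classes(const int *cls, int n) {
  int c = 0;
  for (int i = 0; i < n; i++) c += cls[i] == i; /* class id = representative */
  return c;
}

/* Start optimistic (all sections in one class) and split until stable.
   On return, cls[i] is the index of the representative of section i. */
void icf_classes(const icf_section *s, int n, int *cls) {
  int *next = malloc(sizeof *next * (size_t)n);
  if (!next) return;
  for (int i = 0; i < n; i++) cls[i] = 0;
  for (;;) {
    for (int i = 0; i < n; i++) {
      next[i] = i;
      for (int j = 0; j < i; j++)
        if (cls[i] == cls[j] && equal_under(s, cls, i, j)) { next[i] = j; break; }
    }
    /* next refines cls, so an equal class count means nothing was split. */
    bool stable = count_classes(next, n) == count_classes(cls, n);
    memcpy(cls, next, sizeof *cls * (size_t)n);
    if (stable) break;
  }
  free(next);
}
```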
gold implements --icf=safe based on relocation.
ld.lld --icf=safe uses a special section
.llvm_addrsig (LLVM address significance table, type
SHT_LLVM_ADDRSIG) produced by Clang -faddrsig.
As of 2023-01, -faddrsig is the default on most Linux
targets but disabled for Android, Gentoo, and
-fno-integrated-as. If the section is absent, ld.lld is
conservative and assumes every section defining a symbol in the symbol
table is address significant.
SHT_LLVM_ADDRSIG encodes symbol indexes as ULEB128.
objcopy, ld -r, and other binary manipulation
tools may alter the symbol table. An interesting property is that
objcopy and ld -r set sh_link=0
for an unknown section type such as SHT_LLVM_ADDRSIG. ld.lld uses
sh_link!=0 to check validity and reports a warning in
case of sh_link==0.
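The section body is a sequence of ULEB128-encoded symbol table indexes with no per-entry relocation. A tool that rewrites .symtab without understanding the section therefore leaves stale indexes behind. Below is a minimal decoder for such a body. decode_uleb128_list is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a ULEB128 sequence (e.g. an SHT_LLVM_ADDRSIG body, one symbol index
   per value). Returns the number of values written to out, or (size_t)-1 if
   the input is truncated/overlong or does not fit in max_out values. */
size_t decode_uleb128_list(const uint8_t *p, size_t size, uint64_t *out,
                           size_t max_out) {
  size_t n = 0, i = 0;
  while (i < size) {
    uint64_t value = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
      if (i >= size || shift >= 64) return (size_t)-1;
      byte = p[i++];
      value |= (uint64_t)(byte & 0x7f) << shift; /* low 7 bits first */
      shift += 7;
    } while (byte & 0x80); /* high bit set: more bytes follow */
    if (n == max_out) return (size_t)-1;
    out[n++] = value;
  }
  return n;
}
```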
I am somewhat sad that the design tradeoff favors code size over
generality, and about the current state where some Linux distributions
default to -faddrsig in the Clang driver while others default to
-fno-addrsig. It might be better if we used
R_*_NONE relocations (with REL instead of RELA to limit the
size bloat) to encode symbol indexes. That perhaps also means we should
default to -fno-addrsig and let users opt in to the
feature.
lld's Mach-O port chose
a relocation-based representation for
__DATA,__llvm_addrsig.
ld.lld --icf=all ignores .llvm_addrsig.
For a -shared link of
int foo() { return 1; }
int bar() { return 1; }
foo and bar are in .dynsym.
ld.lld --icf=safe assumes that symbols in
.dynsym are address significant and two symbols cannot
share the same address, so ld.lld conservatively suppresses merging
.text.foo and .text.bar.
gold --icf=safe does merge .text.foo and
.text.bar. Such a choice will be unsafe if a program uses a
map and expects that map[dlsym(h, "foo")] and
map[dlsym(h, "bar")] resolve to different objects.
In LLVMCodeGen, a global value with a
{,local_}unnamed_addr attribute does not go into
.llvm_addrsig.
--icf=all gives up the C++ language guarantee about pointer
equality. Some think this is fair as part of the guarantee is
sabotaged anyway (-fvisibility-inlines-hidden). See ELF
interposition and -Bsymbolic.
In https://reviews.llvm.org/D141310, an opt-in Clang
diagnostic -Wcompare-function-pointers is proposed to catch
some problems which will cause --icf=all to fail.
ICF can make debugging more difficult as the debugger might not be able to distinguish between the folded instances.
- Debug information associated with folded functions is essentially redirected.
- Setting a breakpoint on a function also affects folded functions.
ICF can alter function names in stack traces and make profiling inaccurate.
The debug information regression could be alleviated if you enable
DW_AT_LLVM_stmt_sequence and use the caller to disambiguate
the address in a folded function.
https://github.com/llvm/llvm-project/pull/139493#issuecomment-2896493771
When we have
- Section A1, with a relocation on a symbol S1, where S1 is at offset K in section B1.
- Section A2, with a relocation on a symbol S2, where S2 is at offset K in section B2.
When the relocation type is R_AARCH64_ADR_GOT_PAGE, and
sections B1 and B2 are merged while keeping symbols S1 and S2 separate,
sections A1 and A2 can currently be merged, which leads to correctness
issues.
--symbol-ordering-file=<file>
Specify a text file with one defined symbol per line. Within an input
section description (e.g. *(.text .text.*)), sort the
matched input sections: if symbol A is before symbol B in the ordering
file, place the section defining A before the section defining B.
If a symbol is not defined or the section in which it is located is
discarded, the linker will output a warning unless
--no-warn-symbol-ordering is specified. However, the
default --warn-symbol-ordering seems to often get in the
way.
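Below is a sketch of the ordering rule. It assumes a hypothetical model where each input section defines at most one symbol. Sections whose symbol is not listed keep their relative input order and are placed after the listed ones. That placement is an assumption of this sketch.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: each input section defines at most one symbol (or NULL). */
typedef struct { const char *name; const char *defined_symbol; } order_section;

/* Priority = line index of the defining symbol in the ordering file;
   unlisted sections get INT_MAX. */
static int priority_of(const order_section *s, const char *const *order,
                       size_t norder) {
  if (s->defined_symbol)
    for (size_t i = 0; i < norder; i++)
      if (strcmp(order[i], s->defined_symbol) == 0) return (int)i;
  return INT_MAX;
}

/* Stable insertion sort of the sections matched by one input section
   description, so sections with equal priority keep their input order. */
void sort_by_symbol_order(order_section *secs, size_t n,
                          const char *const *order, size_t norder) {
  for (size_t i = 1; i < n; i++) {
    order_section cur = secs[i];
    int p = priority_of(&cur, order, norder);
    size_t j = i;
    while (j > 0 && priority_of(&secs[j - 1], order, norder) > p) {
      secs[j] = secs[j - 1];
      j--;
    }
    secs[j] = cur;
  }
}
```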
--symbol-ordering-file= is primarily used for two goals:
performance or compression.
If one function frequently calls another and the input sections containing the two functions are placed close together in the linked image, they are more likely to fall on the same page. By considering references among functions and placing correlated functions together, the page working set and TLB thrashing can be reduced. See Profile Guided Code Positioning by Karl Pettis and Robert C. Hansen.
Mobile applications often prioritize compressed code size. For cold functions, their compressed size matters much more than their performance. To improve compressed size, similar functions can be grouped together to enhance compression algorithms (like the Lempel-Ziv family).
This option is unique to ld.lld. gold has a
--section-ordering-file, sorted by section name. In
practice, text and data sections mostly have different names. However,
clang -fno-unique-section-names (GCC feature
request) can create sections of the same name that defeat
--section-ordering-file.
cat > a.s <<e
% readelf -x .rodata a
GNU ld 2.43 introduced --section-ordering-file with
different semantics. The section ordering script must specify output
sections already defined in the linker script. The specified extra
mapping will be prepended to the output section.
cat > b.txt <<e
.rodata : { *(.rodata.3) *(.rodata.[2]) *(.rodata.1) }
e
ld.bfd --section-ordering-file=b.txt a.o -o a
--call-graph-profile-sort
When --call-graph-profile-sort (default) is in effect,
ld.lld inspects SHT_LLVM_CALL_GRAPH_PROFILE sections (call
graph profile) in input relocatable object files. A
SHT_LLVM_CALL_GRAPH_PROFILE section consists of
(from_symbol, to_symbol, weight) tuples. LLD utilizes the information
to compute a call graph with input sections as nodes and (from_section,
to_section, weight) as edges, then sorts sections within an input
section description. The sorting algorithm is based on Optimizing
Function Placement for Large-Scale Data-Center Applications.
LLD sorts input sections by decreasing density, where density is computed as the weight divided by the size. Initially, each input section is placed in a cluster by itself. When processing each input section, its cluster is appended to the cluster containing its most likely predecessor in the call graph. A merge can be blocked if any of the following conditions are satisfied:
- The edge is unlikely (the edge weight is too small considering the sum of input edge weights).
- The total size of the two clusters is larger than a threshold.
- The merged density would make the predecessor cluster's density much smaller.
Finally, the clusters are sorted by decreasing density.
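The clustering steps above can be sketched as follows. This is a simplified, single-threaded model. Edges are already mapped from symbols to section indices, and output-section boundaries are ignored. The thresholds are assumptions modeled on ld.lld's CallGraphSort.cpp, not guaranteed values:

- An edge carrying at most 10% of a section's incoming weight is unlikely.
- A cluster may not exceed 1 MiB.
- A merge may not degrade the predecessor's density by more than 8x.

```c
#include <stddef.h>
#include <stdint.h>

#define CG_MAX 64                      /* assumption: small inputs only */
#define MAX_CLUSTER_SIZE (1024 * 1024) /* assumed size threshold, in bytes */
#define MAX_DENSITY_DEGRADATION 8.0    /* assumed degradation factor */

/* Hypothetical call graph profile entry, already mapped to section indices. */
typedef struct { int from, to; uint64_t weight; } cg_edge;

typedef struct {
  uint64_t size, weight, init_weight;
  int best_pred;          /* most likely predecessor section, or -1 */
  uint64_t best_pred_weight;
  int members[CG_MAX];    /* sections in layout order */
  int nmembers;
} cg_cluster;

static double density(const cg_cluster *c) {
  return c->size ? (double)c->weight / (double)c->size : 0.0;
}

static int find_leader(int *leaders, int i) {
  while (leaders[i] != i) {
    leaders[i] = leaders[leaders[i]]; /* path halving */
    i = leaders[i];
  }
  return i;
}

/* Stable insertion sort of cluster indices by decreasing density. */
static void sort_by_density(int *idx, int n, const cg_cluster *c) {
  for (int i = 1; i < n; i++) {
    int cur = idx[i], j = i;
    while (j > 0 && density(&c[idx[j - 1]]) < density(&c[cur])) {
      idx[j] = idx[j - 1];
      j--;
    }
    idx[j] = cur;
  }
}

/* Writes a section order to out[0..n-1]; returns n, or -1 if n > CG_MAX. */
int call_graph_sort(int n, const uint64_t *sizes, const cg_edge *edges,
                    size_t nedges, int *out) {
  cg_cluster c[CG_MAX];
  int leaders[CG_MAX], idx[CG_MAX];
  if (n > CG_MAX) return -1;
  for (int i = 0; i < n; i++) {
    c[i] = (cg_cluster){.size = sizes[i], .best_pred = -1, .members = {i}, .nmembers = 1};
    leaders[i] = idx[i] = i;
  }
  for (size_t e = 0; e < nedges; e++) {
    const cg_edge *ed = &edges[e];
    if (ed->from == ed->to) continue; /* ignore self edges */
    cg_cluster *to = &c[ed->to];
    to->weight += ed->weight;         /* sum of incoming edge weights */
    if (to->best_pred == -1 || to->best_pred_weight < ed->weight) {
      to->best_pred = ed->from;
      to->best_pred_weight = ed->weight;
    }
  }
  for (int i = 0; i < n; i++) c[i].init_weight = c[i].weight;

  sort_by_density(idx, n, c);
  for (int k = 0; k < n; k++) {
    int si = idx[k];
    cg_cluster *cc = &c[si];
    /* Unlikely edge: the best predecessor carries <= 10% of the weight. */
    if (cc->best_pred == -1 || cc->best_pred_weight * 10 <= cc->init_weight) continue;
    int pl = find_leader(leaders, cc->best_pred);
    if (pl == si) continue;
    cg_cluster *pc = &c[pl];
    if (cc->size + pc->size > MAX_CLUSTER_SIZE) continue; /* too large */
    double merged = (double)(pc->weight + cc->weight) / (double)(pc->size + cc->size);
    if (merged < density(pc) / MAX_DENSITY_DEGRADATION) continue; /* too sparse */
    for (int m = 0; m < cc->nmembers; m++) /* append after the predecessor */
      pc->members[pc->nmembers++] = cc->members[m];
    pc->size += cc->size;
    pc->weight += cc->weight;
    cc->nmembers = 0;
    cc->size = cc->weight = 0;
    leaders[si] = pl;
  }

  /* Finally, sort the remaining non-empty clusters by decreasing density. */
  int nclusters = 0, pos = 0;
  for (int i = 0; i < n; i++)
    if (c[i].nmembers > 0) idx[nclusters++] = i;
  sort_by_density(idx, nclusters, c);
  for (int k = 0; k < nclusters; k++)
    for (int m = 0; m < c[idx[k]].nmembers; m++) out[pos++] = c[idx[k]].members[m];
  return pos;
}
```

In the test below, section 2 is hot and its only caller is section 1. The two are clustered and moved ahead of section 0.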
If --symbol-ordering-file= is specified, the sections it describes
are placed first. The call graph profile is still used for the other
sections (lld >= 20). Before lld 20, --symbol-ordering-file= took
precedence over --call-graph-profile-sort.
When both --call-graph-profile-sort and
--print-symbol-order= are specified, ld.lld will dump the
symbol order in the specified file. The file can be used with
--symbol-ordering-file=.
--bp-compression-sort=
and --bp-startup-sort=
Both options instruct the linker to optimize section layout with the following goals:
- --bp-compression-sort=[data|function|both]: Improve Lempel-Ziv compression by grouping similar sections together, resulting in a smaller compressed app size.
- --bp-startup-sort=function --irpgo-profile=<file>: Utilize a temporal profile file to reduce page faults during program startup.
The linker determines the section order by considering three groups:
- Function sections ordered according to the temporal profile (--irpgo-profile=), prioritizing early-accessed and frequently accessed functions.
- Function sections. Sections containing similar functions are placed together, maximizing compression opportunities.
- Data sections. Similar data sections are placed together.
Within each group, the sections are ordered using the Balanced Partitioning algorithm.
The linker constructs a bipartite graph with two sets of vertices: sections and utility vertices.
- For profile-guided function sections:
  - The number of utility vertices is determined by the symbol order within the profile file.
  - If --bp-compression-sort-startup-functions is specified, extra utility vertices are allocated to prioritize nearby function similarity.
- For sections ordered for compression: utility vertices are determined by analyzing k-mers of the section content and relocations.
The call graph profile is disabled during this optimization.
When --symbol-ordering-file= is specified, sections
described in that file are placed earlier.
-z nosectionheader
GNU ld 2.41 introduced the option to omit the section header table.
--unique
By default, the linker combines all input sections with the same name
into a single output section. For example, all .text
sections from different object files are merged into one
.text output section.
GNU ld's --unique option creates separate output
sections for each orphan section. Note that with -r
(relocatable output), the internal linker script still causes certain
sections like .text and .debug_info to be
combined.
For finer control, use --unique=glob to match specific
patterns - for example, --unique=* matches all
sections.
ld.lld only implements the --unique form, which applies
to all sections. It treats all input sections as orphans in the absence
of a linker script.
Analysis related
--cref
Output the cross reference table. For each non-local symbol, output the defined file and the list of files with references.
-M and
-Map=<file>
Output a link map, which shows the addresses and file offsets of the output sections and the input sections they include. -M prints the map to stdout, while -Map=<file> writes it to <file>.
Warning related
--fatal-warnings
Turn warnings into errors. Besides the "warning" vs "error" wording
in the diagnostic, the more important difference is that an error
prevents the output file from being written.
--noinhibit-exec
Turn some errors into warnings. Be careful not to specify
--fatal-warnings to upgrade the degraded warnings to errors
again:)
--no-warnings
Suppress warnings. Warnings turned to errors due to
--fatal-warnings are not suppressed.
Randomness related
--shuffle-sections=<seed>
Shuffle input sections to uncover bugs that rely on a certain order of sections.
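A seeded shuffle only needs a deterministic PRNG and a Fisher-Yates pass, so the same seed reproduces the same layout. In this sketch the splitmix64 generator is a stand-in. It is not necessarily the generator ld.lld uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in PRNG (splitmix64): deterministic for a given seed. */
static uint64_t splitmix64(uint64_t *state) {
  uint64_t z = (*state += 0x9e3779b97f4a7c15ULL);
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

/* Fisher-Yates shuffle of section indices, seeded like --shuffle-sections=<seed>. */
void shuffle_sections(int *order, size_t n, uint64_t seed) {
  uint64_t state = seed;
  for (size_t i = n; i > 1; i--) {
    size_t j = (size_t)(splitmix64(&state) % i); /* 0 <= j < i (slight modulo bias) */
    int tmp = order[i - 1];
    order[i - 1] = order[j];
    order[j] = tmp;
  }
}
```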
--randomize-section-padding=<seed>
Randomly insert padding between input sections and at the start of each segment using given seed.
Imagine a change that unintentionally reduces the memory alignment of
a frequently executed function. While the original program might not
have guaranteed alignment for this function, the change could exacerbate
the issue. Using --randomize-section-padding can help
uncover such subtle performance degradations by introducing variability
in memory layout.
Others
--build-id=value
Generate .note.gnu.build-id to give the output an
identifier. The identifier is generated by hashing the whole output.
SHA-1 is the most common choice. The linker fills the content
of .note.gnu.build-id with zeros, hashes the whole output, and
writes the result back to .note.gnu.build-id. Some linkers use
tree-style hashes for parallelism.
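The zero-hash-write-back sequence, including the tree-style variant, can be sketched as below. The 64-bit FNV-1a hash is a stand-in chosen for brevity. It is not what the linkers use for --build-id. compute_build_id is a hypothetical helper that assumes the caller knows the file offset of the build ID bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME 0x100000001b3ULL

/* Stand-in hash (64-bit FNV-1a). Real linkers use SHA-1, MD5, xxHash, etc. */
static uint64_t fnv1a(const uint8_t *p, size_t n, uint64_t h) {
  for (size_t i = 0; i < n; i++) {
    h ^= p[i];
    h *= FNV_PRIME;
  }
  return h;
}

/* Tree-style build ID over the whole output buffer. id_off/id_len locate the
   build ID bytes (the descriptor of .note.gnu.build-id); chunk must be > 0.
   Leaf hashes are independent, so a linker could compute them in parallel. */
void compute_build_id(uint8_t *out, size_t size, size_t id_off, size_t id_len,
                      size_t chunk) {
  memset(out + id_off, 0, id_len);                           /* 1. zero the ID */
  uint64_t root = FNV_OFFSET;
  for (size_t off = 0; off < size; off += chunk) {
    size_t len = size - off < chunk ? size - off : chunk;
    uint64_t leaf = fnv1a(out + off, len, FNV_OFFSET);      /* 2. hash chunks */
    root = fnv1a((const uint8_t *)&leaf, sizeof leaf, root); /* 3. hash digests */
  }
  for (size_t i = 0; i < id_len; i++)                        /* 4. write back */
    out[id_off + i] = ((const uint8_t *)&root)[i % sizeof root];
}
```

Zeroing the field first makes the result independent of whatever bytes were there before, which is what the test checks.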
--compress-debug-sections=[zlib|zstd]
Use zlib or zstd to compress .debug_* sections of the
output file and mark SHF_COMPRESSED. See Compressed debug
sections.
--hash-style=style
The ELF specification requires a hash table DT_HASH for
dynamic symbol lookup. --hash-style=sysv generates the
table.
DT_GNU_HASH is better than DT_HASH in terms
of space consumption and performance. mips uses the alternative
DT_MIPS_XHASH (a good example of mips abi suffering from
its own wisdom). I personally think DT_MIPS_XHASH is
solving the wrong problem. In fact, there is a way to use
DT_GNU_HASH, but people in the mips community may not want
to revisit the issue.
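For reference, here are the two hash functions. sysv_hash follows the function given in the System V ABI for DT_HASH. gnu_hash is the h * 33 + c function used by DT_GNU_HASH. The Bloom filter and bucket layout of .gnu.hash are not shown.

```c
#include <stdint.h>

/* DT_GNU_HASH: h = h * 33 + c, starting from 5381. */
uint32_t gnu_hash(const char *name) {
  uint32_t h = 5381;
  for (const unsigned char *p = (const unsigned char *)name; *p; p++)
    h = h * 33 + *p;
  return h;
}

/* DT_HASH: the System V ABI ELF hash, computed in 32 bits. */
uint32_t sysv_hash(const char *name) {
  uint32_t h = 0, g;
  for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
    h = (h << 4) + *p;
    if ((g = h & 0xf0000000) != 0)
      h ^= g >> 24;
    h &= ~g;
  }
  return h;
}
```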
See glibc and DT_GNU_HASH for a story about "Easy Anti-Cheat".
--no-ld-generated-unwind-info
See PR12570 .plt has no associated .eh_frame/.debug_frame.
When the pc is in the plt entry, if the linker does not synthesize
.eh_frame information, unwinding from the current PC will
not get frames. On i386 and x86-64, in the lazy binding state, the first
call of a plt entry will execute the push instruction. After the esp/rsp
is changed, if the plt entry does not have the unwind information
provided by .eh_frame, the unwinder may not be able to
unwind correctly, which affects the accuracy of profilers.
jmp *got(%rip)
However, this feature is largely obsolete nowadays due to the
prevailing use of -Wl,-z,relro,-z,now (BIND_NOW). PLT
entries behave as functions without a prologue. A profiler can trivially
retrieve the return address by using the default rule: if a code region
is not covered by metadata, assume the return address is available at
*rsp (x86-64).
To recognize the PLT name, a profiler needs to:
- Parse the .plt section to identify the region of PLT entries.
- Parse .rel[a].plt to get R_*_JUMP_SLOT dynamic relocations and their referenced symbol names.
- If the current PC is within the PLT region, parse nearby instructions and find the GOT load. The associated R_*_JUMP_SLOT identifies the symbol name.
- Concatenate the symbol name and @plt to form foo@plt.
Note: foo@plt is a convention used by tools like
objdump, but the object file doesn't contain such a symbol.
gdb has heuristics to identify this situation.
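Below is a profiler-side sketch of the last two steps. It replaces "find the GOT load" with index arithmetic, which is only valid under stated assumptions. The assumed layout is a lazy x86-64 PLT with a 16-byte header (PLT0) and 16-byte entries, where entry i corresponds to the i-th R_X86_64_JUMP_SLOT relocation. plt_name is a hypothetical helper. Layouts such as IBT-enabled PLTs with .plt.sec break these assumptions.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a PC inside .plt to "sym@plt", assuming a 16-byte
   PLT header followed by 16-byte entries in JUMP_SLOT relocation order.
   Returns 0 on success, -1 if pc is not in a PLT entry or buf is too small. */
int plt_name(uint64_t pc, uint64_t plt_start, uint64_t plt_size,
             const char *const *jump_slot_names, size_t nslots,
             char *buf, size_t bufsize) {
  const uint64_t header = 16, entry = 16;
  if (pc < plt_start + header || pc >= plt_start + plt_size) return -1;
  uint64_t i = (pc - plt_start - header) / entry;
  if (i >= nslots) return -1;
  int len = snprintf(buf, bufsize, "%s@plt", jump_slot_names[i]);
  return (len >= 0 && (size_t)len < bufsize) ? 0 : -1;
}
```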
This problem will not affect the C++ exception. The PLT entry is a
tail call, and the _Unwind_RaiseException called by
__cxa_throw will penetrate the tail calls of the ld.so
resolver and PLT entry. The PC will be restored to the next instruction
of the caller of the PLT entry.
// b.cc - b.so
-O
Enable size optimizations. The optimization level is different from
the compiler driver option -O. -O does not
imply --lto-O: there is no effect on LTO code
generation.
In ld.lld, -O1 is the default.
-O0 disables constant merge of
SHF_MERGE.
-O2 enables some computation-heavy size
optimizations:
- enable string suffix merge of SHF_MERGE|SHF_STRINGS sections. This is very slow and not parallel.
- --compress-debug-sections=zlib uses zlib compression with a higher compression ratio.
- Since 14.0.0, deduplicate local symbol names in .strtab. I may remove this completely once ld.lld supports parallel .symtab write.
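The string suffix merge can be sketched with the classic tail-merging trick. First, sort the strings by their reversed bytes in descending order. A string that is a suffix of another then lands immediately after a string that ends with it, and it can reuse that string's tail. build_tail_merged is a hypothetical sequential helper, not ld.lld's code.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { const char *s; size_t len; size_t idx; } tail_str;

/* Compare reversed strings in descending order: "...bar" sorts before "...ar". */
static int cmp_reversed_desc(const void *pa, const void *pb) {
  const tail_str *a = pa, *b = pb;
  for (size_t i = 0; i < a->len && i < b->len; i++) {
    unsigned char ca = (unsigned char)a->s[a->len - 1 - i];
    unsigned char cb = (unsigned char)b->s[b->len - 1 - i];
    if (ca != cb) return ca < cb ? 1 : -1;
  }
  return a->len < b->len ? 1 : a->len > b->len ? -1 : 0;
}

/* Builds a table of NUL-terminated strings with suffix merging. offsets[i]
   receives the offset of strs[i]. Returns the table size, or 0 on failure. */
size_t build_tail_merged(const char *const *strs, size_t n, char *table,
                         size_t cap, size_t *offsets) {
  tail_str *v = malloc(sizeof *v * n);
  if (!v) return 0;
  for (size_t i = 0; i < n; i++) v[i] = (tail_str){strs[i], strlen(strs[i]), i};
  qsort(v, n, sizeof *v, cmp_reversed_desc);

  size_t size = 0, prev_off = 0, prev_len = 0;
  const char *prev = NULL;
  for (size_t k = 0; k < n; k++) {
    const tail_str *t = &v[k];
    if (prev && t->len <= prev_len &&
        memcmp(prev + prev_len - t->len, t->s, t->len) == 0) {
      offsets[t->idx] = prev_off + prev_len - t->len; /* reuse prev's tail */
    } else {
      if (size + t->len + 1 > cap) { free(v); return 0; }
      memcpy(table + size, t->s, t->len + 1);         /* copy with the NUL */
      offsets[t->idx] = size;
      size += t->len + 1;
    }
    prev = t->s;
    prev_len = t->len;
    prev_off = offsets[t->idx];
  }
  free(v);
  return size;
}
```

For {"bar", "foobar", "ar", "baz"} the table becomes "baz\0foobar\0" (11 bytes). "bar" and "ar" point into the tail of "foobar".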
In GNU ld, non-zero -O can make .hash and
.gnu.hash smaller.
For a symbol assignment referencing a SHF_MERGE section,
it is considered to refer to the constant data element. After duplicate
elimination, the symbol value is adjusted to refer to the data element
in the output section.
-plugin file
GNU ld and gold support this option to load GCC LTO plugin
(liblto_plugin.so) or LLVM LTO plugin
(LLVMgold.so). clang -flto={full,thin} passes
-plugin path/to/LLVMgold.so unless
-fuse-ld=lld.
binutils-gdb/include/plugin-api.h defines the plugin
API.
Despite the name of LLVMgold.so containing gold, the
file can be used by GNU binutils (ld, gold, nm, ar) and mold.
--verbose
GNU ld dumps a linker script (either internal or external) with this option. gold, ld.lld, and mold are not linker script driven. There is no linker script output.
Address related
-Ttext-segment
The text segment is traditionally the first segment. Users who
specify -Ttext-segment may actually want to specify the
image base. The option has strange semantics (likely a bug) when
-z separate-code is used together: https://sourceware.org/bugzilla/show_bug.cgi?id=25207.
ld.lld provides --image-base to set the image base.
GNU ld's PE/COFF port has supported --image-base for a
long time and implemented the option for ELF in the binutils 2.44
release.
This option appears to be used mainly for mmap
MAP_FIXED usage to avoid conflict with ASLR. The better
alternative is to avoid setting a fixed address. qemu
linux-user/elfload.c:probe_guest_base may give some
insight.
Target-specific
--cmse-implib,
--out-implib=out.lib
Chinese version
Explaining GNU-flavored linker options
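Compiler driver options
Before describing the linker options, let's introduce driver
options. The options you usually pass to gcc or clang are driver
options. Some driver options affect the options passed to the linker.
Some driver options have the same names as linker options, and they often
have additional effects beyond passing the same-named option to the linker, such as: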
- -shared: don't set -dynamic-linker; don't link crt1.o
- -static: don't set -dynamic-linker; use crtbeginT.o instead of crtbegin.o; link -lgcc -lgcc_eh -lc with --start-group (they have a (bad) circular dependency)
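-Wl,--foo,value,--bar=value passes the three options --foo, value, and --bar=value to the linker.
If there are many link options, you can put them one per line in a text file response.txt and then specify -Wl,@response.txt.
Note that -O2 does not pass -O2 to the linker, but -Wl,-O2 does.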
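The comma-splitting rule can be sketched in a few lines. split_wl is a hypothetical helper, not driver code from GCC or Clang. Because every comma is a separator, an argument that itself contains a comma cannot be passed this way. -Xlinker forwards a single argument verbatim.

```c
#include <stdlib.h>
#include <string.h>

/* Split the payload of a -Wl, option at commas, forwarding each piece to the
   linker as a separate argument. Returns the number of pieces (at most max_out)
   and stores malloc'ed copies in out; the caller frees them. */
size_t split_wl(const char *arg, char **out, size_t max_out) {
  const char *prefix = "-Wl,";
  size_t n = 0;
  if (strncmp(arg, prefix, strlen(prefix)) != 0) return 0;
  const char *p = arg + strlen(prefix);
  while (n < max_out) {
    const char *comma = strchr(p, ',');
    size_t len = comma ? (size_t)(comma - p) : strlen(p);
    char *piece = malloc(len + 1);
    if (!piece) break;
    memcpy(piece, p, len);
    piece[len] = '\0';
    out[n++] = piece;
    if (!comma) break;
    p = comma + 1;
  }
  return n;
}
```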
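- -fno-pic and -fno-PIC are synonymous and generate position-dependent code.
- -fpie and -fPIE are called small PIE and large PIE, respectively. They introduce an optimization on top of PIC: the compiled .o can only be used for executables. See -Bsymbolic below.
- -fpic and -fPIC are called small PIC and large PIC, respectively, and generate position-independent code. The two modes differ in code generation on 32-bit PowerPC and SPARC (architectures that are on their way out). Most architectures have no difference.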
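Input files
The linker accepts several types of input. For symbols, the symbol table of each input file affects symbol resolution. For sections, only sections in regular object files (called input sections) contribute to the sections of the output file (called output sections).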
- .o (regular object files)
- .so (shared objects): only affects symbol resolution
- .a (archive files)
See Symbol processing for symbol resolution details.
Modes
The linker is in one of the following four modes, which control the output type (executable/shared object/relocatable object):
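- -no-pie (default): generate a position-dependent executable (ET_EXEC). This mode has the loosest requirements: source files can be compiled with -fno-pic, -fpie, or -fpic.
- -pie: generate a position-independent executable (ET_DYN). Source files need to be compiled with -fpie or -fpic.
- -shared: generate a position-independent shared object (ET_DYN). The strictest mode: source files need to be compiled with -fpic.
- -r: relocatable link. Linker-synthesized sections are not generated, and relocations are preserved.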
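-pie and -shared are both position-independent link modes; -pie and -no-pie are both executable link modes.
-pie is very similar to -shared -Bsymbolic, but it still produces an executable. The following behaviors are close to -no-pie and different from -shared:
- Allow copy relocations and canonical PLT entries.
- Allow relaxing the General Dynamic/Local Dynamic TLS models and TLS descriptors to Initial Exec/Local Exec.
- Resolve undefined weak symbols at link time; (ld.lld behavior) no dynamic relocation is generated. Whether GNU ld generates a dynamic relocation follows very complicated, architecture-dependent rules.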
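Confusingly, the compiler driver provides several options with the same names: -no-pie, -pie, -shared, -r.
GCC 6 introduced the configure-time option --enable-default-pie: GCC built with this option defaults to -pie and -fPIE. Now, many Linux distributions enable this option as basic security
hardening.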
Executable link (-no-pie and -pie)
A defined symbol is non-preemptible. For a branch instruction using a PLT-generating relocation, the branch can be bound directly to the definition, avoiding a PLT entry. For a code sequence involving a GOT-generating relocation, the code sequence may be optimized to use direct access. See All about Global Offset Table for detail.
A non-local STV_DEFAULT/STV_PROTECTED defined symbol is
by default not exported to the dynamic symbol table.
-no-pie
-no-pie indicates that the link-time address equals the
run-time address. This property is leveraged by a linker: all
relocations referencing a non-preemptible symbol can be resolved,
including absolute GOT-generating (e.g.
R_AARCH64_LD64_GOT_LO12_NC), PC-relative GOT-generating
(e.g. R_X86_64_REX_GOTPCRELX), etc. In the absence of a GOT
optimization, the GOT entry for a non-preemptible symbol is a constant,
avoiding a dynamic relocation. The image base is an arch-specific
non-zero value by default.
- Some architectures have different PLT code sequences (i386, ppc32 .glink).
- R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX can be further optimized.
- ppc64 .branch_lt (long branch addresses) can be optimized.
-pie
-pie is very similar to -shared -Bsymbolic,
but it produces an executable file. The following behavior is close to
-no-pie but different from -shared:
- Allow copy relocations and canonical PLT entries.
- Allow relaxing the General Dynamic/Local Dynamic TLS models and TLS descriptors to Initial Exec/Local Exec.
- Resolve undefined weak symbols to zeroes. ld.lld does not generate a dynamic relocation. Whether GNU ld generates a dynamic relocation follows very complicated rules and is architecture dependent.
Shared object link
(-shared)
A non-local STV_DEFAULT definition is preemptible
(interposable) by default, that is, the definition may be replaced by
the definition in the executable file or another shared object at
runtime. The compiler and the linker cooperate by using GOT and PLT
entries to reference such symbols.
A non-local STV_DEFAULT/STV_PROTECTED symbol is exported
to the dynamic symbol table. E.g. in the following program,
foo is exported to the dynamic symbol table if linked with
-shared but (by default) not if linked with
-no-pie or -pie.
void foo() {}
PIC link (-pie and
-shared)
A symbolic relocation (absolute relocation & width matches the word size) referencing a non-preemptible non-TLS symbol converts to a relative relocation.
See Relative relocations and RELR for detail.
Archive member selection
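A .a file has the special semantics of archive member selection. Each member is lazy. If the linker finds that an archive member defines a symbol that has been referenced but not yet defined, it pulls that member from the archive. The member then conceptually becomes a regular object file: its symbol table is used for symbol resolution and it contributes input sections. From then on it is processed exactly like a .o.
If the archive cannot satisfy any previous undefined symbol, GNU
ld and gold skip the archive; see --warn-backrefs for details.
Thin archives have the same linking semantics as regular archives.
--start-group changes the archive member selection semantics.
--whole-archive disables archive member selection, restoring object
file semantics.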
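Below is a minimal sketch of archive member selection for a single archive. It assumes a hypothetical model in which symbols are bits in a 64-bit mask. A pulled member's definitions resolve symbols, and its references may introduce new undefined symbols. The archive is therefore rescanned until it pulls nothing more. The archive symbol table index and definitions coming from other inputs are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: symbols are bit positions in a 64-bit set. */
typedef struct { uint64_t defines, references; bool pulled; } archive_member;

/* Pull members that define a currently undefined symbol. *defined and
   *undefined are updated in place. Returns the number of members pulled. */
int pull_members(archive_member *m, size_t n, uint64_t *defined,
                 uint64_t *undefined) {
  int pulled = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    for (size_t i = 0; i < n; i++) {
      if (m[i].pulled || !(m[i].defines & *undefined)) continue; /* stays lazy */
      m[i].pulled = true; /* from now on it behaves like a .o */
      pulled++;
      changed = true;
      *defined |= m[i].defines;
      *undefined = (*undefined | m[i].references) & ~*defined;
    }
  }
  return pulled;
}
```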
Symbol related
-Bsymbolic
In an ELF shared object, a defined non-local STV_DEFAULT
symbol is preemptible (interposable) by default, that is, the definition
may be replaced by the definition in the executable file or another
shared object at runtime. A definition in an executable file is
guaranteed to be non-preemptible (non-interposable).
The linker provides several mechanisms to make non-local
STV_DEFAULT definitions in a shared object non-preemptible,
similar to -no-pie, -pie.
- -Bsymbolic makes all definitions non-preemptible, except those matched by --dynamic-list/--export-dynamic-symbol-list/--export-dynamic-symbol
- -Bsymbolic-functions is similar to -Bsymbolic, but only applies to STT_FUNC definitions
See ELF interposition and -Bsymbolic for detail.
--exclude-libs
If a matched archive defines a non-local symbol, don't export this symbol.
--export-dynamic
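Shared objects by default export all non-local
STV_DEFAULT/STV_PROTECTED defined symbols to the dynamic symbol
table. Executables can use --export-dynamic to emulate the shared
object behavior.
The following describes the rules for exporting a symbol from an executable/shared object (logical AND):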
- non-local STV_DEFAULT/STV_PROTECTED (this means it can be hidden by --exclude-libs)
- logical OR of the following:
  - undefined
  - (--export-dynamic || -shared) && ! (unnamed_addr linkonce_odr GlobalVariable || local_unnamed_addr linkonce_odr constant GlobalVariable)
  - matched by --dynamic-list/--export-dynamic-symbol-list/--export-dynamic-symbol
  - defined or referenced by a shared object as STV_DEFAULT
  - STV_PROTECTED definition in a shared object preempted by copy relocation/canonical PLT when --ignore-{data,function}-address-equality is specified
  - -z ifunc-noplt && has at least one relocation
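If the executable defines a symbol that is referenced by a link-time shared object, the linker needs to export the symbol so that the shared object's undefined symbol can bind to the executable's definition at runtime.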
--export-dynamic-symbol=glob,
--export-dynamic-symbol-list, and
--dynamic-list
These options have different semantics for an executable and a shared object.
- executable: put matched non-local defined symbols into the dynamic symbol table (--export-dynamic applies to all non-local defined symbols.)
- shared object: references to matched non-local STV_DEFAULT symbols shouldn't be bound to definitions within the shared object even if they would otherwise be due to -Bsymbolic, -Bsymbolic-functions, or --dynamic-list
--dynamic-list additionally implies
-Bsymbolic.
For the shared object case, I usually call this "make a symbol
preemptible": even if a symbolic intention option
(-Bsymbolic, -Bsymbolic-functions, or
--dynamic-list) is in action, a matched symbol is NOT bound
locally.
One may use --export-dynamic-symbol=foo* to match all
non-local STV_DEFAULT symbols foo*. ld.lld
before 11 uses an exact match instead of a glob.
--export-dynamic-symbol-list is implemented since GNU
ld 2.35 and ld.lld
14.
--discard-none,
--discard-locals, and --discard-all
If .symtab is output, a local symbol defined in a live
section is retained if:
if ((--emit-reloc or -r) && referenced) || --discard-none
--no-undefined-version
If a version script specifies an exact pattern which does not match a defined symbol, report an error.
Say we have the following version script. If foo is not
a defined symbol, the linker will report an error. For a glob pattern
(e.g. bar*) matching no symbol, there is no error. This is
a compromise.
v1 {
  foo;
  bar*;
};
GNU ld has supported --no-undefined-version since 2002-08,
but --undefined-version was a late addition in 2022-10
(milestone: binutils 2.40).
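The check can be sketched as follows. count_undefined_versions is a hypothetical helper. It treats any pattern containing *, ?, or [ as a glob. It ignores extern "C++" blocks and other version script features.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A pattern containing glob metacharacters is exempt from the check. */
static bool is_glob(const char *pattern) { return strpbrk(pattern, "*?[") != NULL; }

/* Returns how many exact (non-glob) patterns do not name a defined symbol;
   --no-undefined-version reports each of them as an error. */
size_t count_undefined_versions(const char *const *patterns, size_t npat,
                                const char *const *defined, size_t ndef) {
  size_t errors = 0;
  for (size_t i = 0; i < npat; i++) {
    if (is_glob(patterns[i])) continue; /* e.g. bar* matching nothing is fine */
    bool found = false;
    for (size_t j = 0; j < ndef && !found; j++)
      found = strcmp(patterns[i], defined[j]) == 0;
    errors += !found;
  }
  return errors;
}
```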
--strip-all
Don't create .strtab and .symtab.
-u symbol
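If an archive file defines the symbol specified by -u, pull the defining member
(it changes from an archive member into an object file and is thereafter the same as an ordinary .o).
For example: ld -u foo ... a.a. Without -u, if a.a does not define a symbol referenced by earlier object
files, a.a will not be pulled.
If -u foo is specified, the archive member of a.a that defines foo will
be pulled.
Another use of -u is to specify a GC root.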
--version-script=script
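A version script has three purposes:
- Define versions.
- Specify patterns so that matched, defined, unversioned symbols get the specified version.
- Local version: local: can change the binding of matched, defined, unversioned symbols to STB_LOCAL, so they are not exported to the dynamic symbol table.
Symbol versioning describes the detailed symbol versioning mechanism.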
-y symbol
Commonly used for debugging. Print where the specified symbol is referenced and where it is defined.
-z muldefs
Allow multiply defined symbols. By default, the linker does not allow two non-local regular definitions (non-weak, non-common) with the same name.
-z unique-symbol
Rename local symbols so that there are no duplicates.
Some Intel folks were working on function granular kernel address space layout randomization and wanted such a feature (https://sourceware.org/bugzilla/show_bug.cgi?id=26391). GNU ld since 2.36 supports this option. I closed the ld.lld feature request.
I don't think this is a good design.
First, the stability problem. Say, the old kernel has
foo.1 foo.2. If there is a new local foo
symbol, the new kernel will have foo.1 foo.2 foo.3.
However, the new symbols don't necessarily correspond to the local
symbols of the same names in the old kernel. Such disturbance is
probably more likely with LTO or PGO. For Clang LTO, the kernel Makefile
currently specifies -mllvm -import-instr-limit=5. A function
close to that limit may happen to cross it. Because such a function may be
inlined into other translation units, the stability issue may then affect
many translation units.
The implementation has to perform an iteration on all local symbols, which can affect link speed.
In addition, the .[0-9]+ scheme has been used by C++
mangling. The Itanium C++ ABI says "A
% c++filt <<< $'_ZL3foov\n_ZL3foov.1'
As an alternative, I suggest that the FGASLR developer use the
STT_FILE symbol:
STT_FILE a.c
STT_NOTYPE foo
STT_FILE b.c
STT_NOTYPE foo
The ELF specification says:
Conventionally, the symbol's name gives the name of the source file associated with the object file. A file symbol has STB_LOCAL binding, its section index is SHN_ABS, and it precedes the other STB_LOCAL symbols for the file, if it is present.
I mentioned my concern on a reply to [PATCH v9 02/15] livepatch: use `-z unique-symbol` if available to nuke pos-based search.
Library related
--as-needed and
--no-as-needed
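Prevent unused link-time shared objects from leaving DT_NEEDED entries.
--as-needed and --no-as-needed are position-dependent options (an informal name, but I haven't found a more fitting adjective) that affect shared objects appearing later on the command line. A shared object is needed if one of the following conditions holds:
- it appears on the command line at least once in --no-as-needed mode
- it defines a symbol that is non-weakly referenced by a live section of a .o. In other words, a shared object that only satisfies weak references may still be considered unneeded. References from sections discarded by --gc-sections do not count.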
-Bdynamic and
-Bstatic
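These two options are position-dependent and affect -lname options appearing later on the command line.
- -Bdynamic (default): search for libfoo.so and libfoo.a in the directory list specified by -L
- -Bstatic: search for libfoo.a in the directory list specified by -L
Note that historically, -Bstatic is synonymous with -static in GNU
ld. The compiler driver's -static is a different option: besides passing -static to ld, it also drops the default --dynamic-linker, affecting how libgcc,
libc, etc. are linked.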
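The search order can be sketched with a hypothetical model of the file system as a list of existing paths. find_library and path_exists are illustrative names. The key detail is that directories form the outer loop. With -Bdynamic, a libname.a in an earlier -L directory therefore wins over a libname.so in a later one.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the file system: the set of paths that exist. */
static bool path_exists(const char *path, const char *const *files, size_t nfiles) {
  for (size_t i = 0; i < nfiles; i++)
    if (strcmp(path, files[i]) == 0) return true;
  return false;
}

/* Resolve -lname. Each -L directory is tried in order; within a directory,
   -Bdynamic prefers libname.so over libname.a, while -Bstatic only accepts
   libname.a. Returns true and leaves the path in buf on success. */
bool find_library(const char *name, const char *const *dirs, size_t ndirs,
                  bool bstatic, const char *const *files, size_t nfiles,
                  char *buf, size_t bufsize) {
  for (size_t d = 0; d < ndirs; d++) {
    if (!bstatic) {
      snprintf(buf, bufsize, "%s/lib%s.so", dirs[d], name);
      if (path_exists(buf, files, nfiles)) return true;
    }
    snprintf(buf, bufsize, "%s/lib%s.a", dirs[d], name);
    if (path_exists(buf, files, nfiles)) return true;
  }
  return false;
}
```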
--no-dependent-libraries
Ignore the .deplibs section in object files.
-soname=name
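Set DT_SONAME in the dynamic table of the output shared object.
The linker records link-time shared objects and describes each of them
with a DT_NEEDED entry in the dynamic table of the output executable/shared object:
- If the shared object has DT_SONAME, that field provides the DT_NEEDED value.
- Otherwise, if it was linked via -l, the value is the filename with the directory removed.
- Otherwise, the value is the path name (absolute and relative paths differ).
For example: with ld -shared -soname=a.so.1 a.o -o a.so; ld b.o ./a.so, the DT_NEEDED of a.out is a.so.1. If the first command does not contain -soname, the DT_NEEDED of a.out is ./a.so.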
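The three rules translate directly into code. needed_name is a hypothetical helper. found_via_l indicates whether the shared object was found through -l.

```c
#include <string.h>

/* Hypothetical helper: the DT_NEEDED string recorded for a link-time
   shared object, following the three rules above. */
const char *needed_name(const char *soname, const char *path, int found_via_l) {
  if (soname) return soname;           /* 1. DT_SONAME wins */
  if (found_via_l) {                   /* 2. -l: strip the directory */
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
  }
  return path;                         /* 3. the path as given */
}
```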
--start-group and
--end-group
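If A.a and B.a reference each other and it is unclear which one will be pulled
into the link first, this pair of options is needed. Here is an example:
Consider the link order main.o A.a B.a. Suppose main.o references B.a and A.a does not satisfy any previous undefined symbol. This link order then leads to an error: the member pulled from B.a references A.a, which has already been skipped.
Does changing the link order to main.o B.a A.a work? If main.o is later changed to reference A.a and B.a does not satisfy any previous undefined symbol, this link order also leads to an error.
One solution is main.o A.a B.a A.a. In many cases repeating once is enough. Suppose, however, that the first A.a only loads A.a(a.o), B.a only loads B.a(b.o), and the second A.a only loads A.a(c.o), and A.a(c.o) needs another member of B.a. This link order still causes an undefined
symbol error.
We could repeat B.a once more, i.e. main.o A.a B.a A.a B.a, but a better solution is main.o --start-group A.a B.a --end-group.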
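The difference between a plain archive list and --start-group can be sketched with a bitmask model in which symbols are bits, a hypothetical simplification. Without a group, each archive is scanned once in command-line order. With a group, the whole list is rescanned until a full pass pulls nothing new. The test reproduces the main.o A.a B.a failure described above: main.o references a symbol in B.a, whose member references a symbol in A.a.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: symbols are bits; an archive is an array of members. */
typedef struct { uint64_t defines, references; bool pulled; } grp_member;
typedef struct { grp_member *members; size_t n; } grp_archive;

/* One scan over an archive, rescanning it until it pulls nothing more. */
static bool scan_archive(grp_archive *a, uint64_t *defined, uint64_t *undefined) {
  bool any = false, changed = true;
  while (changed) {
    changed = false;
    for (size_t i = 0; i < a->n; i++) {
      grp_member *m = &a->members[i];
      if (m->pulled || !(m->defines & *undefined)) continue;
      m->pulled = changed = any = true;
      *defined |= m->defines;
      *undefined = (*undefined | m->references) & ~*defined;
    }
  }
  return any;
}

/* Returns the symbols still undefined after processing the archives in
   command-line order. grouped=true models --start-group ... --end-group. */
uint64_t link_archives(grp_archive *as, size_t n, uint64_t undefined, bool grouped) {
  uint64_t defined = 0;
  bool progress = true;
  while (progress) {
    progress = false;
    for (size_t i = 0; i < n; i++)
      progress |= scan_archive(&as[i], &defined, &undefined);
    if (!grouped) break; /* plain list: a single pass */
  }
  return undefined;
}
```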
--start-lib and
--end-lib
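A very useful feature invented by gold that can replace thin archives. It gives regular object files archive-like semantics (loaded on demand).
--whole-archive (described below) applies to .a files, while --start-lib applies to .o files:
ld ... --start-lib b.o c.o --end-lib works like ld ... a.a, if a.a contains b.o and c.o.
I filed a GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=24600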
--sysroot
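This differs from the GCC/Clang driver's --sysroot. If a linker
script is in the sysroot directory, when it opens a file with an absolute path (INPUT or
GROUP), the sysroot is prepended to the absolute path.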
--whole-archive
and --no-whole-archive
Archives after --whole-archive are treated like .o files, without the lazy semantics.
If a.a contains b.o and c.o, then ld --whole-archive a.a --no-whole-archive has the same effect as ld b.o c.o.
--push-state and
--pop-state
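-Bstatic, --whole-archive, --as-needed, etc. are all position-dependent options representing boolean states. --push-state saves the boolean states of these options, and --pop-state restores them.
When inserting new options into a link command line to change these states, you usually want to restore the previous states afterward. This is what --push-state and --pop-state are for.
For example, to ensure that libc++.a and libc++abi.a are linked, use -Wl,--push-state,-Bstatic -lc++ -lc++abi -Wl,--pop-state.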
Dependency related
-z defs and
-z undefs
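Whether to report an error for an unresolved undefined symbol from a regular
object, i.e. one that cannot be bound at link time to a definition in the executable or in a link-time shared
object. Executables default to -z defs/--no-undefined (not allowed), while shared
objects default to -z undefs (allowed).
Many build systems enable -z defs, requiring shared
objects to specify all dependencies at link time (link what you use).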
--allow-shlib-undefined and --no-allow-shlib-undefined
Whether to report an error for unresolvable undefined symbols from shared objects. Executables default to --no-allow-shlib-undefined (not allowed), while shared objects default to --allow-shlib-undefined (allowed).
For the following code, linking the executable reports an error:

```c
// a.so
void f();
void g() { f(); }

// exe
void g();
int main() { g(); }
```
With --allow-shlib-undefined, the link succeeds, but ld.so reports an error at run time. With glibc the message is: symbol lookup error: ... undefined symbol:.
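GNU ld has a complex algorithm that computes the transitive closure, and it reports an error only when no shared object in the transitive closure can resolve an undefined symbol.
gold and ld.lld use a simplified rule: if all DT_NEEDED dependencies of a shared object are linked directly, errors are enabled; if some dependencies are not linked, gold/ld.lld cannot reliably determine whether a shared object that was not linked directly would provide the definition, so they conservatively do not report an error.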
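The simplified gold/ld.lld rule boils down to a single check over the DT_NEEDED list. The sketch below assumes `linked` holds the names (sonames) of the shared objects given on the link line, and that it is only consulted for a symbol already known to be unresolved. `SharedObject`, `linked_directly`, and `report_shlib_undefined` are hypothetical names for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical view of a shared object: just its DT_NEEDED entries. */
typedef struct {
  const char **needed;  /* DT_NEEDED strings */
  int num_needed;
} SharedObject;

/* Is a dependency among the shared objects given on the link line? */
static bool linked_directly(const char *name, const char **linked, int n) {
  for (int i = 0; i < n; i++)
    if (strcmp(name, linked[i]) == 0)
      return true;
  return false;
}

/* For an undefined symbol in so that nothing on the link line defines:
 * report it only if every DT_NEEDED dependency of so was linked directly;
 * otherwise an unseen dependency might define it, so stay silent. */
bool report_shlib_undefined(const SharedObject *so, const char **linked,
                            int num_linked) {
  for (int i = 0; i < so->num_needed; i++)
    if (!linked_directly(so->needed[i], linked, num_linked))
      return false;
  return true;
}
```

The trade-off is visible here: the check is cheap and never gives a false positive, but it silently accepts cases that GNU ld's transitive-closure search would diagnose.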
It is worth noting that -z defs/-z undefs/--no-undefined and --[no-]allow-shlib-undefined can all be controlled by a single option, --unresolved-symbols.
--warn-backrefs
Specific to ld.lld; see http://lld.llvm.org/ELF/warn_backrefs.html.
Layout-related
--no-rosegment
ld.lld uses a design with two RW PT_LOAD segments:
- R PT_LOAD
- RX PT_LOAD
- RW PT_LOAD (overlapping PT_GNU_RELRO)
- RW PT_LOAD
Specifying this option merges the R PT_LOAD and the RX PT_LOAD.
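-z separate-loadable-segments
The traditional ld.lld layout: no two PT_LOAD segments overlap (no byte is loaded into two memory mappings at the same time).
This is implemented by aligning the address of each new PT_LOAD to max-page-size. By default ld.lld has 4 PT_LOAD segments (R, RX, RW(RELRO), RW(non-RELRO)), and each of the three alignments may waste some bytes in the output file.
On AArch64 and PowerPC, the ABI-specified max-page-size is large (65536), so up to 65536*3 bytes can be wasted.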
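The worst case follows from simple alignment arithmetic. This sketch assumes the segments are laid out back to back in the file starting at offset 0, with every PT_LOAD after the first starting at a max-page-size-aligned offset. It ignores ELF headers and .bss, and `padding_bytes` is a hypothetical helper.

```c
#include <stdint.h>

/* Round x up to a multiple of align (align must be a power of two). */
uint64_t align_up(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

/* Hypothetical model of -z separate-loadable-segments: each PT_LOAD after
 * the first begins at a max-page-size-aligned file offset.
 * sizes[i] is the file size of segment i. Returns total padding bytes. */
uint64_t padding_bytes(const uint64_t *sizes, int n, uint64_t max_page_size) {
  uint64_t off = 0, pad = 0;
  for (int i = 0; i < n; i++) {
    if (i > 0) {
      uint64_t aligned = align_up(off, max_page_size);
      pad += aligned - off;  /* bytes skipped before segment i */
      off = aligned;
    }
    off += sizes[i];
  }
  return pad;
}
```

With four segments of 1 byte each and max-page-size 65536, every one of the three alignments skips 65535 bytes, so the total is 3*65535, just under the 65536*3 bound mentioned above.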
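-z separate-code
Introduced in binutils 2.31 and the default on Linux/x86. GNU ld uses:
- R PT_LOAD
- RX PT_LOAD
- R PT_LOAD
- RW PT_LOAD
  - the prefix is PT_GNU_RELRO
  - the rest is not PT_GNU_RELRO
separate-code means that a byte in the file that is mapped into the executable segment (RX PT_LOAD) is never also mapped into an R PT_LOAD.
Note that the R after RX is suboptimal. Ideally it would be merged with the first R, but that seems difficult to implement in GNU ld.
I introduced this option in ld.lld 10 with semantics similar to GNU ld's but a different layout (there is no need to mimic the suboptimal two-R layout): the two RW PT_LOAD segments may overlap, i.e. the address of the second one does not need to be aligned, so at most max-page-size*2 bytes are wasted.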
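-z noseparate-code
The classic layout, which allows the executable segment to overlap other PT_LOAD segments. GNU ld typically uses:
- RX PT_LOAD
- RW PT_LOAD
  - the prefix is PT_GNU_RELRO. After ld.so has resolved dynamic relocations, this part is made read-only with mprotect
  - the non-PT_GNU_RELRO part, which stays writable at run time
The first PT_LOAD is often loosely called the text segment, which is inaccurate: the non-executable rodata lives there too.
ld.lld 10 uses this layout by default and does not need to align any PT_LOAD.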
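Relocation-related
--apply-dynamic-relocs
For architectures whose psABI uses RELA (AArch64, PowerPC, RISC-V, x86-64, etc.), dynamic relocations carry their own addend field, so by default the linker writes 0 at the relocated location instead of the addend. This helps compression slightly if the executable/shared object is compressed. --apply-dynamic-relocs makes the linker write the link-time value at the relocated location as well.
--apply-dynamic-relocs is supported for all ports in ld.lld. As of August 2023, only the aarch64 port of GNU ld supports --apply-dynamic-relocs.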
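--emit-relocs
Can be used with -no-pie/-pie/-shared to get an -r-like effect: the input relocations are preserved. This is useful for post-link binary analysis; the only two uses I know of are Linux kernel x86's CONFIG_RELOCATABLE and BOLT.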
--pack-dyn-relocs=value
relr enables DT_RELR, a more compact encoding for relative relocations (R_*_RELATIVE). Relative relocations are common in executables linked with -pie.
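The compactness comes from replacing one relocation record per offset with a mix of address words and bitmap words. Here is a sketch of the encoding for a 64-bit target (8-byte words). An even word is an address to relocate. An odd word is a bitmap whose bit j+1 marks the word at `where + j*8`, covering the next 63 words. The offsets are assumed to be sorted, 8-byte aligned, and free of duplicates. `relr_encode` and `relr_decode` are hypothetical names; the layout follows my understanding of the DT_RELR format rather than a quote from the specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode sorted, 8-byte-aligned offsets into RELR words (64-bit target).
 * out must have room for n words; k <= n always holds because every
 * emitted word consumes at least one offset. Returns the word count. */
size_t relr_encode(const uint64_t *offsets, size_t n, uint64_t *out) {
  const uint64_t wordsize = 8, nbits = 63;
  size_t i = 0, k = 0;
  while (i < n) {
    uint64_t base = offsets[i++];
    out[k++] = base;                /* address word: even */
    base += wordsize;
    for (;;) {                      /* append bitmap words while useful */
      uint64_t bitmap = 0;
      for (; i < n; i++) {
        uint64_t d = offsets[i] - base;
        if (d >= nbits * wordsize || d % wordsize) break;
        bitmap |= (uint64_t)1 << (d / wordsize);
      }
      if (!bitmap) break;
      out[k++] = bitmap << 1 | 1;   /* bitmap word: odd */
      base += nbits * wordsize;
    }
  }
  return k;
}

/* Expand RELR words back into offsets. Returns the offset count. */
size_t relr_decode(const uint64_t *relr, size_t n, uint64_t *out) {
  size_t k = 0;
  uint64_t where = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t e = relr[i];
    if ((e & 1) == 0) {             /* address word */
      out[k++] = e;
      where = e + 8;
    } else {                        /* bitmap word */
      for (uint64_t j = 0; (e >>= 1) != 0; j++)
        if (e & 1) out[k++] = where + j * 8;
      where += 63 * 8;
    }
  }
  return k;
}
```

Four relative relocations at 0x1000, 0x1008, 0x1010, and 0x1020 encode to just two words, 0x1000 and 0x17. Each RELA record is 24 bytes on a 64-bit target, so the savings grow quickly for the dense pointer tables typical of -pie executables.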
-z text and -z notext
-z text disallows text relocations.
-z notext allows text relocations.
Since binutils 2.35, GNU ld on Linux/x86 defaults to the configure-time option --enable-textrel-check={warning,error}, which reports a warning/error when text relocations are present.
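The term text relocations is imprecise: it really means dynamic relocations that apply to read-only sections.
A relocation in a .o whose value cannot be determined at link time has to be converted into a dynamic relocation that ld.so computes at run time (with the same type as in the .o).
If the section it applies to lacks the SHF_WRITE flag, ld.so has to temporarily call mprotect to change the permissions of the memory mapping, apply the modification, and then restore the read-only permission. This hinders page sharing.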
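The mprotect dance can be reproduced in user space on Linux or another POSIX system. This sketch is not ld.so's code: `apply_text_reloc` is a hypothetical helper, and it restores plain PROT_READ, whereas a real loader would restore whatever the segment had (e.g. PROT_READ|PROT_EXEC). The point it illustrates is that the write turns a shared, file-backed page into a private dirty copy.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Patch a 64-bit word in a read-only mapping the way a dynamic loader
 * must for a text relocation: make the page(s) writable, write, then
 * restore read-only. Returns 0 on success, -1 if mprotect fails. */
int apply_text_reloc(void *where, uint64_t value) {
  long page = sysconf(_SC_PAGESIZE);
  uintptr_t start = (uintptr_t)where & ~(uintptr_t)(page - 1);
  /* The word may straddle a page boundary, so cover its last byte too. */
  size_t len = (uintptr_t)where + sizeof value - start;
  if (mprotect((void *)start, len, PROT_READ | PROT_WRITE) != 0)
    return -1;
  memcpy(where, &value, sizeof value);  /* this write dirties the page */
  return mprotect((void *)start, len, PROT_READ);
}
```

Each patched page costs two extra system calls, and after the write the process holds its own copy of the page instead of sharing it with other processes mapping the same file.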
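Shared objects incur text relocations in more situations than executables. Executables can use canonical PLT entries and copy relocations to avoid some text relocations.
Which relocation types a linker accepts as text relocations differs between linkers and between architectures. GNU ld allows some types that glibc ld.so supports.
On x86-64, all linkers allow R_X86_64_64 and R_X86_64_PC64.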
In the assembly below, defined_in_so is a symbol defined in some shared object. The comments describe the scenario for each text relocation.
```asm
.globl global
```
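In -no-pie or -pie mode, the linker makes a different choice depending on the symbol type of defined_in_so:
- STT_FUNC: create a canonical PLT entry
- STT_OBJECT: create a copy relocation
- STT_NOTYPE: GNU ld creates a copy relocation; ld.lld creates a text relocation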
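Section-related
--gc-sections
A very common option. It is mainly effective when code is compiled with -ffunction-sections or -fdata-sections, which give each function or data object its own section. The linker performs liveness analysis and removes unused sections from the output.
GC roots:
- Sections containing the defined symbols specified by --entry/--init/--fini/-u
- Sections containing the defined symbols referenced by linker script expressions
- Sections containing any defined symbol in .dynsym
- Sections of type SHT_PREINIT_ARRAY/SHT_INIT_ARRAY/SHT_FINI_ARRAY
- Sections named .ctors/.dtors/.init/.fini/.jcr
- SHT_NOTE sections not in a section group (the section group rule exists for the Fedora watermark)
- Personality routines and language-specific data areas referenced by .eh_frame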
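The liveness analysis is a mark phase over the section reference graph, starting from the GC roots listed above. In this sketch the graph is an adjacency matrix: `edges[i][j]` means section i has a relocation referencing a symbol defined in section j. Root flags are assumed to have been computed already from the rules above. `SectionGraph`, `gc_mark`, and `MAX_SECTIONS` are hypothetical names, and special cases such as .eh_frame handling are omitted.

```c
#include <stdbool.h>

enum { MAX_SECTIONS = 64 };  /* assumed limit for this sketch */

typedef struct {
  int n;
  bool edges[MAX_SECTIONS][MAX_SECTIONS]; /* i references a symbol in j */
  bool is_root[MAX_SECTIONS];             /* entry, -u, .dynsym, ... */
  bool live[MAX_SECTIONS];                /* output of the mark phase */
} SectionGraph;

/* Mark every section reachable from a root with a worklist.
 * Sections left unmarked would be discarded. Returns the live count. */
int gc_mark(SectionGraph *g) {
  int stack[MAX_SECTIONS], sp = 0, nlive = 0;
  for (int i = 0; i < g->n; i++) {
    g->live[i] = g->is_root[i];
    if (g->live[i]) {
      stack[sp++] = i;
      nlive++;
    }
  }
  while (sp > 0) {
    int s = stack[--sp];
    for (int t = 0; t < g->n; t++)
      if (g->edges[s][t] && !g->live[t]) {
        g->live[t] = true;  /* each section is pushed at most once */
        stack[sp++] = t;
        nlive++;
      }
  }
  return nlive;
}
```

This also shows why -ffunction-sections matters: when all functions share one .text section, a single live function keeps that whole node, and therefore every function in it, alive.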
--icf=all
--icf=safe
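Enable Identical Code Folding. The name is actually inaccurate: (1) it also applies to read-only data; (2) the unit of folding is the section, not the function.
For a group of identical sections, the linker picks one as the representative, discards the rest, and redirects relocations to the representative.
gold implements a relocation-based --icf=safe; ld.lld implements a --icf=safe based on the LLVM address significance table.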
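The folding step can be sketched as follows, with --icf=all semantics (no address-significance check). Two sections are considered identical if their bytes match and their relocations point to sections that have already been folded into the same representative. The loop repeats because folding callees can make callers identical. This is deliberately naive: it is quadratic, and it cannot fold mutually recursive groups. ld.lld instead refines equivalence classes iteratively. `Section`, `icf_fold`, and `MAX_RELOCS` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_RELOCS = 4 };  /* assumed limit for this sketch */

typedef struct {
  const unsigned char *data;
  size_t size;
  int relocs[MAX_RELOCS];  /* indices of the sections referenced */
  int nrelocs;
  int repl;                /* representative after folding */
} Section;

/* Follow representative links (they always point to a lower index). */
static int rep(const Section *s, int x) {
  while (s[x].repl != x) x = s[x].repl;
  return x;
}

static bool same(const Section *s, int a, int b) {
  if (s[a].size != s[b].size || s[a].nrelocs != s[b].nrelocs) return false;
  if (memcmp(s[a].data, s[b].data, s[a].size) != 0) return false;
  for (int i = 0; i < s[a].nrelocs; i++)
    if (rep(s, s[a].relocs[i]) != rep(s, s[b].relocs[i])) return false;
  return true;
}

/* Fold identical sections into the earliest equivalent one, then redirect
 * relocations to representatives. Returns the number of folded sections. */
int icf_fold(Section *s, int n) {
  int folded = 0;
  bool changed;
  for (int i = 0; i < n; i++) s[i].repl = i;
  do {
    changed = false;
    for (int i = 0; i < n; i++) {
      if (s[i].repl != i) continue;
      for (int j = 0; j < i; j++)
        if (s[j].repl == j && same(s, i, j)) {
          s[i].repl = j;
          folded++;
          changed = true;
          break;
        }
    }
  } while (changed);
  for (int i = 0; i < n; i++) {
    s[i].repl = rep(s, i);
    for (int r = 0; r < s[i].nrelocs; r++)
      s[i].relocs[r] = rep(s, s[i].relocs[r]);
  }
  return folded;
}
```

For two identical leaf functions and two callers with the same bytes, each calling a different leaf, the leaves fold first, after which the callers become identical and fold too.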
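--symbol-ordering-file=<file>
Specifies a text file with one defined symbol per line. If symbol A precedes symbol B, then within each input section description the section containing A is placed before the section containing B.
If a symbol is undefined or its section has been discarded, the linker emits a warning unless --no-warn-symbol-ordering is specified.
If one function frequently calls another, placing the two functions' input sections close together in the linked image increases the probability that they fall on the same page, which shrinks the page working set and reduces TLB thrashing. See Karl Pettis and Robert C. Hansen's Profile Guided Code Positioning.
This option is specific to ld.lld. gold has a --section-ordering-file, which sorts by section name. In practice this requires text/data sections to have distinct names (clang -fno-unique-section-names cannot be used).
Sorting by symbol name, by contrast, works with -fno-unique-section-names.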
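Within one input section description, the effect is a stable sort by priority. This sketch assumes that each section is identified by a single symbol it defines, that listed sections take their 0-based line number as priority, and that unlisted sections go after all listed ones in their original order. Warnings for missing symbols are not modeled. `InputSection` and `order_sections` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical input section, identified by one symbol it defines. */
typedef struct { const char *sym; int orig_index; int priority; } InputSection;

static int cmp(const void *a, const void *b) {
  const InputSection *x = a, *y = b;
  if (x->priority != y->priority) return x->priority < y->priority ? -1 : 1;
  return x->orig_index - y->orig_index;  /* tie-break keeps the sort stable */
}

/* order[k] is line k of the ordering file. */
void order_sections(InputSection *secs, int n, const char **order, int norder) {
  for (int i = 0; i < n; i++) {
    secs[i].orig_index = i;
    secs[i].priority = norder;  /* unlisted: after every listed section */
    for (int k = 0; k < norder; k++)
      if (strcmp(secs[i].sym, order[k]) == 0) {
        secs[i].priority = k;
        break;
      }
  }
  qsort(secs, (size_t)n, sizeof *secs, cmp);
}
```

Because qsort is not guaranteed to be stable, the original index serves as an explicit tie-breaker. That way unlisted sections keep their command-line order.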
Analysis-related
--cref
Output a cross reference table. For each non-local symbol, print the file that defines it and the list of files that reference it.
-M and -Map=<file>
Output a link map, which shows the addresses and file offsets of output sections and the input sections they contain.
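Warning-related
--fatal-warnings
Turn warnings into errors. Besides whether the message contains the string warning or error, the more important difference is that an error prevents the link output from being written.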
--noinhibit-exec
Downgrade some errors to warnings. Be careful not to also specify --fatal-warnings, which would upgrade the downgraded warnings back to errors :)
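Miscellaneous
--build-id=value
Generate .note.gnu.build-id, which identifies a link result. SHA-1 is commonly used. The linker fills the .note.gnu.build-id descriptor with zeros, hashes every byte of the output, and writes the result back into .note.gnu.build-id.
Each linker computes it differently.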
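The zero-hash-fill procedure can be sketched on an in-memory image. To keep the example short, it swaps in 64-bit FNV-1a instead of SHA-1. The descriptor location `off`/`len` is assumed to be known. `fill_build_id` and `fnv1a64` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a: a stand-in for SHA-1, used only to keep this short. */
static uint64_t fnv1a64(const unsigned char *p, size_t n) {
  uint64_t h = 0xcbf29ce484222325ULL;
  for (size_t i = 0; i < n; i++) {
    h ^= p[i];
    h *= 0x100000001b3ULL;
  }
  return h;
}

/* Zero the build-id descriptor at [off, off+len), hash the whole image,
 * then write the digest back (little-endian, truncated to len bytes). */
void fill_build_id(unsigned char *image, size_t size, size_t off, size_t len) {
  memset(image + off, 0, len);  /* the old value must not affect the hash */
  uint64_t h = fnv1a64(image, size);
  unsigned char digest[8];
  for (int i = 0; i < 8; i++) digest[i] = (unsigned char)(h >> (8 * i));
  memcpy(image + off, digest, len < 8 ? len : 8);
}
```

Zeroing first makes the result a pure function of the rest of the output. Relinking identical inputs therefore reproduces the same build ID, even if the buffer held stale bytes in the descriptor.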
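--compress-debug-sections=[zlib|zstd]
Compress the .debug_* sections of the output file with zlib or zstd and mark them SHF_COMPRESSED. SHF_COMPRESSED was the last feature merged into the ELF specification; since then the ELF specification has been effectively unmaintained...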
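--hash-style
--hash-style=sysv selects DT_HASH, defined by the ELF specification: a hash table used to speed up symbol lookup. --hash-style=gnu selects DT_GNU_HASH, and --hash-style=both emits both.
DT_GNU_HASH beats DT_HASH in both space usage and lookup efficiency.
It is worth mentioning that Mips has a DT_MIPS_XHASH (a fine example of the Mips ABI being too clever for its own good). Personally I think it solves the wrong problem. There is actually a way to use DT_GNU_HASH, but perhaps the Mips community, once something was in, did not want to bother further.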
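For reference, here are the two hash functions behind DT_HASH and DT_GNU_HASH. The first is the System V ABI ELF hash. The second is the h*33+c function seeded with 5381 that DT_GNU_HASH uses. Only the hash functions are shown; the table layouts (buckets and chains for DT_HASH; Bloom filter, buckets, and hash-value array for DT_GNU_HASH) are not modeled.

```c
#include <stdint.h>

/* Hash function for DT_HASH (System V ABI). */
uint32_t elf_sysv_hash(const char *name) {
  uint32_t h = 0, g;
  for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
    h = (h << 4) + *p;
    g = h & 0xf0000000;
    if (g) h ^= g >> 24;
    h &= ~g;  /* keep the top nibble clear */
  }
  return h;
}

/* Hash function for DT_GNU_HASH: h = h * 33 + c, starting from 5381. */
uint32_t elf_gnu_hash(const char *name) {
  uint32_t h = 5381;
  for (const unsigned char *p = (const unsigned char *)name; *p; p++)
    h = h * 33 + *p;
  return h;
}
```

For example, `elf_sysv_hash("printf")` is 0x077905a6 and `elf_gnu_hash("printf")` is 0x156b2bb8.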
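--no-ld-generated-unwind-info
See PR12570 .plt has no associated .eh_frame/.debug_frame. By default GNU ld synthesizes .eh_frame for linker-generated code such as the PLT (--ld-generated-unwind-info); this option turns that off.
When the PC is inside a PLT entry and the linker has not synthesized .eh_frame information, the unwinder may fail to unwind correctly.
On i386 and x86-64, with lazy binding, the first call through a PLT entry executes a push instruction. After ESP/RSP changes, if the PLT entry has no unwind information from .eh_frame, the unwinder may fail to unwind correctly, which hurts profiler accuracy.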
```asm
jmp *got(%rip)
```
However, I think this feature is obsolete and irrelevant nowadays. To recognize the PLT name, a profiler needs to:
- Parse the .plt section to know the region of PLT entries
- Parse .rel[a].plt to get R_*_JUMP_SLOT dynamic relocations and their referenced symbol names
- If the current PC is within the PLT region, parse nearby instructions and find the GOT load. The associated R_*_JUMP_SLOT identifies the symbol name
- Concatenate the symbol name and @plt to form foo@plt
Note: foo@plt is a convention used by some tools, but it is not a name in the symbol table.
GDB has heuristics to recognize this situation.
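This problem does not affect C++ exceptions. A PLT entry performs a tail call, and _Unwind_RaiseException (called by __cxa_throw) unwinds through the tail calls of the ld.so resolver and the PLT entry. The PC is restored to the instruction following the call in the PLT entry's caller.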
```cpp
// b.cc - b.so
```
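-O
Optimization level; different from the compiler driver option -O.
In ld.lld, -O0 disables constant merging for SHF_MERGE sections; -O2 enables string suffix merging for SHF_MERGE|SHF_STRINGS sections and makes --compress-debug-sections=zlib use a higher zlib compression level.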
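String suffix merging lets a string share the tail of a longer one: "bar" can point into "foobar". The sketch below assumes that a string containing a suffix always appears before that suffix in the input. A real implementation sorts the strings first (for example by their reversed contents) so that no ordering assumption is needed, and it avoids this quadratic scan. `tail_merge` and `is_suffix` are hypothetical names, and `buf` must be large enough for all unmerged strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Is b a suffix of a? Then b's NUL-terminated bytes already exist in a. */
bool is_suffix(const char *a, const char *b) {
  size_t la = strlen(a), lb = strlen(b);
  return lb <= la && memcmp(a + la - lb, b, lb) == 0;
}

/* Lay out strs into buf, reusing the tail of an earlier string when
 * possible. offsets[i] receives the offset of strs[i]. Returns the size. */
size_t tail_merge(const char **strs, int n, size_t *offsets, char *buf) {
  size_t size = 0;
  for (int i = 0; i < n; i++) {
    bool merged = false;
    for (int j = 0; j < i && !merged; j++)
      if (is_suffix(strs[j], strs[i])) {
        offsets[i] = offsets[j] + strlen(strs[j]) - strlen(strs[i]);
        merged = true;
      }
    if (!merged) {
      size_t len = strlen(strs[i]) + 1;  /* include the NUL */
      memcpy(buf + size, strs[i], len);
      offsets[i] = size;
      size += len;
    }
  }
  return size;
}
```

With the input "foobar", "bar", "baz", "r", only "foobar" and "baz" are stored (11 bytes). "bar" and "r" point into the tail of "foobar", at offsets 3 and 5.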
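-plugin file
GNU ld and gold support this option for loading the GCC LTO plugin (liblto_plugin.so) or the LLVM LTO plugin (LLVMgold.so).
The plugin API is defined by binutils-gdb/include/plugin-api.h.
Note that although LLVMgold.so has gold in its name, it can also be used with GNU binutils (ld, gold, nm, ar) and mold.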