GNU ld's output section layout is determined by a linker script, which can be either internal (default) or external (specified with -T or -dT). Within the linker script, SECTIONS commands define how input sections are mapped into output sections.
Input sections not explicitly placed by SECTIONS commands are termed "orphan sections".
Orphan sections are sections present in the input files which are not explicitly placed into the output file by the linker script. The linker will still copy these sections into the output file by either finding, or creating a suitable output section in which to place the orphaned input section.
GNU ld's default behavior is to create output sections to hold these orphan sections and insert these output sections into appropriate places.
Orphan section placement is crucial because GNU ld's built-in linker scripts, while understanding common sections like .text/.rodata/.data, are unaware of custom sections. These custom sections should still be included in the final output file.
- Grouping: Orphan input sections are grouped into orphan output sections that share the same name.
- Placement: These grouped orphan output sections are then inserted into the output sections defined in the linker script. They are placed near similar sections to minimize the number of PT_LOAD segments needed.
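The grouping step can be illustrated with a small sketch. This is not code from either linker: InputSection and groupOrphans are hypothetical names, and a section is reduced to a name plus the object file it came from.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of an input section.
struct InputSection {
  std::string name;  // e.g. "mytext"
  std::string file;  // object file that provides the section
};

// Group orphan input sections by name. Each key stands for one orphan
// output section; the value keeps its input sections in input order.
// Note: std::map orders the groups by name, which says nothing about
// where a linker would finally place them.
std::map<std::string, std::vector<InputSection>>
groupOrphans(const std::vector<InputSection> &orphans) {
  std::map<std::string, std::vector<InputSection>> groups;
  for (const InputSection &isec : orphans)
    groups[isec.name].push_back(isec);
  return groups;
}
```

Two mytext sections coming from a.o and b.o end up in a single mytext group, which is then placed as one unit.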
GNU ld's algorithm
GNU ld's orphan section placement algorithm is primarily specified within ld/ldlang.c:lang_place_orphans and ld/ldelf.c:ldelf_place_orphan.
lang_place_orphans is a linker pass that runs between INSERT processing and SHF_MERGE section merging.
The algorithm utilizes a structure (orphan_save) to associate desired BFD flags (e.g., SEC_ALLOC, SEC_LOAD) with special section names (e.g., .text, .rodata) and a reference to the associated output section. The associated output section is initialized to the special section names (e.g., .text, .rodata), if present.
For each orphan section:
- If an output section of the same name is present and --unique is not specified, the orphan section is placed in it.
- Otherwise, GNU ld identifies the matching orphan_save element based on the section's flags.
- If an associated output section exists related to the orphan_save element, the orphan section is placed after it.
- Otherwise, heuristics are applied to place the orphan section after a similar existing section. For example:
  - .text-like sections follow SHF_ALLOC|SHF_WRITE|SHF_EXECINSTR sections.
  - .rodata-like sections follow .text-like sections.
  - .tdata-like sections follow .data-like sections.
  - .sdata-like sections follow .data-like sections.
  - .data-like sections can follow .rodata-like sections.
- The associated output section is replaced with the new output section. The next orphan output section of similar flags will be placed after the current output section.
As a special case, if an orphan section is placed after the last output section (else if (as != snew && as->prev != snew)), it will be adjusted to be placed after all trailing commands (sym = expr, . = expr, etc).
For example, custom code section mytext (with SHF_ALLOC | SHF_EXECINSTR) would typically be placed after .text, and custom data section mydata (with SHF_ALLOC | SHF_WRITE) after .data.
static struct orphan_save hold[] = {
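The "place after the anchor, then make the orphan the new anchor" step can be sketched as follows. This is a simplified model, not GNU ld's code: Anchor stands in for an orphan_save entry, the layout is a plain list of output section names, and placeAfterAnchor is a hypothetical helper. The fallback heuristics and the trailing-command adjustment are not modeled.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an orphan_save entry: a special section name
// and the output section that currently serves as the anchor.
struct Anchor {
  std::string special;  // e.g. ".text"
  std::string current;  // current anchor output section; empty if none
};

// Insert `orphan` right after the anchor's current output section and make
// the orphan the new anchor, so the next orphan with similar flags lands
// after it. Returns false when there is no usable anchor; a real linker
// would then fall back to heuristics, which this sketch leaves out.
bool placeAfterAnchor(std::vector<std::string> &layout, Anchor &anchor,
                      const std::string &orphan) {
  auto it = std::find(layout.begin(), layout.end(), anchor.current);
  if (anchor.current.empty() || it == layout.end())
    return false;
  layout.insert(it + 1, orphan);
  anchor.current = orphan;
  return true;
}
```

Starting from .text, .rodata, .data, placing mytext and then a second code orphan yields .text, mytext, the second orphan, .rodata, .data: each orphan follows the previous one rather than being inserted directly after .text.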
Noteworthy details:
.interp and .rodata have the same BFD flags, but they are anchors for different sections. SHT_NOTE sections go after .interp, while other read-only sections go after .rodata.
Consider a scenario where a linker script defines .data and .rw1 sections with identical BFD flags. If we have orphan sections that share the same flags, GNU ld would insert these orphans after .data, even if it might seem more logical to place them after .rw1.
.data : { *(.data .data.*) }
Renaming the output section .data will achieve the desired placement:
.mydata : { *(.data .data.*) }
lld's algorithm
The LLVM linker lld implements a large subset of the GNU ld linker script language. However, due to the complexity of GNU ld and the lack of an official specification, there can be subtle differences in behavior.
While lld strives to provide a similar linker script behavior, it occasionally makes informed decisions to deviate where deemed beneficial. We balance compatibility with practicality and interpretability.
Users should be aware of these potential discrepancies when transitioning from GNU ld to lld, especially when dealing with intricate linker script features.
lld does not have built-in linker scripts. When no SECTIONS is specified, all input sections are orphan sections.
Rank-based sorting
lld assigns a rank to each output section, calculated using various flags like RF_NOT_ALLOC, RF_EXEC, RF_RODATA, etc. Orphan output sections are then sorted by these ranks.
enum RankFlags {
The important ranks are:
- .interp
- SHT_NOTE
- read-only (non-SHF_WRITE non-SHF_EXECINSTR)
- SHF_EXECINSTR
- SHF_WRITE (RELRO, SHF_TLS)
- SHF_WRITE (RELRO, non-SHF_TLS)
- SHF_WRITE (non-RELRO)
- Sections without the SHF_ALLOC flag
Special case: non-alloc sections
Non-alloc sections are placed at the end.
Finding the most similar section
For each orphan section, lld identifies the output section with the most similar rank. The similarity is determined by counting the number of leading zeros in the XOR of the two ranks.
// We want to find how similar two ranks are.
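The similarity measure itself is easy to reproduce in isolation. The following is a minimal sketch, assuming 32-bit ranks; countLeadingZeros and rankProximity are illustrative names rather than lld's functions, and the leading-zero count is written as a plain loop to avoid depending on a particular compiler builtin.

```cpp
#include <cstdint>

// Number of leading zero bits in a 32-bit value; 32 when x is zero.
int countLeadingZeros(std::uint32_t x) {
  int n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0;
       mask >>= 1)
    ++n;
  return n;
}

// Ranks that agree on more high-order bits are considered more similar.
// Identical ranks give 32; ranks that differ in the top bit give 0.
int rankProximity(std::uint32_t a, std::uint32_t b) {
  return countLeadingZeros(a ^ b);
}
```

Because only the highest differing bit matters, a difference in a high-order flag outweighs any number of differences in lower-order flags.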
When multiple output sections share the maximum similarity with an orphan section, resolving the ambiguity is crucial. I refined the behavior for lld 19: if the orphan section's rank is not lower than the similar sections, the last similar section is chosen for placement.
// Find the most similar output section as the anchor. Rank Proximity is a
For example, when inserting .bss orphan sections (SHF_ALLOC|SHF_WRITE, SHT_NOBITS), lld should find the last output section that carries the flags/type SHF_ALLOC|SHF_WRITE SHT_PROGBITS.
WA PROGBITS (not here)
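The tie-breaking rule can be illustrated with a small sketch. It is not lld's code: the rank values used with it are made up, every candidate is assumed to be an output section with input sections, and proximity and findAnchor are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Leading zero bits of a ^ b for 32-bit ranks (32 means identical).
int proximity(std::uint32_t a, std::uint32_t b) {
  std::uint32_t x = a ^ b;
  int n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0;
       mask >>= 1)
    ++n;
  return n;
}

// Return the index of the anchor among `ranks` (output sections in script
// order), or -1 if there is no candidate.
int findAnchor(const std::vector<std::uint32_t> &ranks, std::uint32_t orphan) {
  int best = -1;
  int maxP = 0;
  for (int j = 0; j < static_cast<int>(ranks.size()); ++j) {
    int p = proximity(ranks[j], orphan);
    // A strictly closer candidate always wins. On a tie, a later candidate
    // replaces the earlier one only if its rank does not exceed the
    // orphan's, so an orphan whose rank is not lower than the tied
    // sections ends up anchored to the last of them.
    if (p > maxP || (p == maxP && ranks[j] <= orphan)) {
      maxP = p;
      best = j;
    }
  }
  return best;
}
```

With hypothetical ranks {4, 8, 8, 16} and an orphan of rank 9, the two sections of rank 8 tie and the later one (index 2) is chosen. With ranks {4, 9, 9, 16} and an orphan of rank 8, the earlier one (index 1) is kept.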
Placement decision
The orphan section is placed either before or after the most similar section, based on a complex rule involving:
- The relative ranks of the orphan and similar section.
- The presence of PHDRS or MEMORY commands in the linker script.
- Scanning backward or forward through the script for a suitable insertion point.
In essence:
- If the orphan section's rank is lower than the similar section's rank, and no PHDRS or MEMORY commands exist, it's placed before the similar section.
- Otherwise, it's placed after the similar section, potentially skipping symbol assignments or output sections without input sections in the process.
auto isOutputSecWithInputSections = [](SectionCommand *cmd) {
Special case: last section
If the orphan section happens to be the last one, it's placed at the very end of the output, mimicking GNU ld's behavior for cases where the linker script fully specifies the beginning but not the end of the file.
Special case: skipping symbol assignments
It is common to surround an output section description with
encapsulation symbols. lld has a special case to not place orphans
between foo
and a following symbol assignment.
Backward scan example:
previous_start = .;
Forward scan example:
similar0 : { *(similar0) }
However, an assignment to the location counter serves as a barrier to stop the forward scan.
previous_start = .;
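The before/after rule and the forward scan described above can be condensed into a short sketch. This is a simplified model under stated assumptions, not lld's implementation: script commands are reduced to three kinds, Cmd, placeBefore and insertionIndexAfter are hypothetical names, and the backward scan and the skipping of output sections without input sections are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified command kinds in a SECTIONS block (hypothetical model).
enum class Cmd { OutputSection, SymbolAssign, DotAssign };

// The orphan goes before the anchor only if its rank is lower and the
// script uses neither PHDRS nor MEMORY.
bool placeBefore(std::uint32_t orphanRank, std::uint32_t anchorRank,
                 bool hasPhdrsOrMemory) {
  return orphanRank < anchorRank && !hasPhdrsOrMemory;
}

// Forward scan for the "after" case: start right after the anchor at index
// `anchor` and skip plain symbol assignments. An assignment to the location
// counter, or the next output section, stops the scan. The result is the
// index at which the orphan would be inserted.
std::size_t insertionIndexAfter(const std::vector<Cmd> &cmds,
                                std::size_t anchor) {
  std::size_t i = anchor + 1;
  while (i < cmds.size() && cmds[i] == Cmd::SymbolAssign)
    ++i;
  return i;
}
```

For a script shaped like "anchor, symbol assignment, next section" the orphan is inserted at index 2, after the symbol assignment. If a location counter assignment follows the symbol assignment, the orphan still lands at index 2, in front of that barrier.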
Special case: initial location counter
In addition, if there is a location counter assignment before the first output section, orphan sections cannot be inserted before the initial location counter assignment. This recognizes the common pattern in which the initial location counter assignment specifies the load address.
sym0 = .;
Presence of PHDRS or MEMORY
The presence of PHDRS or MEMORY commands prevents lld from placing the orphan section before the anchor. This condition was introduced in two patches:
- [ELF] Avoid adding an orphan section to a less suitable segment
- [ELF] Better resemble GNU ld when placing orphan sections into memory regions
When a linker script defines PHDRS, it typically specifies the initial section within each PT_LOAD segment. These sections often have address requirements, indicated by a preceding . = expr statement. If an orphan section is associated with such a section as its anchor, lld avoids inserting the orphan before the anchor to maintain the intended segment structure and address alignment.
For instance, consider this linker script excerpt:
PHDRS {
  ...
  rodata PT_LOAD;
}
SECTIONS {
  ...
  . = ALIGN(CONSTANT(MAXPAGESIZE));
  .rodata : { *(.rodata .rodata.*) } : rodata
Here, .rodata is the first section in a PT_LOAD segment, and it's aligned to MAXPAGESIZE. If an orphan section were inserted before .rodata, it would inherit the previous segment's flags and break the intended address requirement.
Program headers propagation
After orphan section placement, if the PHDRS command is specified, lld will propagate program headers to output sections that do not specify :phdr.
Case study
By employing this rank-based approach, lld provides an elegant implementation that does not hard code specific section names (e.g., .text/.rodata/.data). In GNU ld, if you rename the special sections .text/.rodata/.data in the linker script, the output could become subtly different.
Orphan sections matching an output section name
The following output section description does not match .foo input sections, but .foo orphan sections will still be placed inside .foo.
.foo : { *(.bar) }
Read-only sections
Among read-only sections (e.g., .dynsym, .dynstr, .gnu.hash, .rela.dyn, .rodata, .eh_frame_hdr, .eh_frame), lld prioritizes the placement of SHT_PROGBITS sections (.rodata, .eh_frame_hdr, and .eh_frame) closer to code sections. This is achieved by assigning them a higher rank.
The rationale behind this design is to mitigate the risk of relocation overflow in the absence of an explicit linker script. The non-SHT_PROGBITS sections do not contain relocations to code sections and can be placed away from code sections.
.dynsym
If a linker script explicitly includes a SECTIONS command specifying .rodata without mentioning other read-only sections, orphan sections like .dynsym might be placed before .rodata.
.rodata : { *(.rodata .rodata.*) }
This behavior can be further influenced by the presence of PHDRS commands. If a program header is specified for the .rodata output section (:phdr), orphan sections like .dynsym would not be placed before .rodata, ensuring that the orphans would not affect the flags of the preceding program header.
PHDRS {
Symbol assignment between two output sections
A symbol assignment placed between output sections can be interpreted in two ways: as marking the end of the preceding section or the start of the following section. lld doesn't attempt to guess the intended meaning, leading to potential ambiguity in scenarios like this:
.data : { *(.data .data.*) }
versus
.data : { *(.data .data.*) }
In both cases, lld might place SHF_ALLOC|SHF_WRITE SHT_PROGBITS orphan sections before .bss, potentially disrupting the intended behavior if bss_start = . was meant to mark the start of the .bss section.
To avoid this ambiguity and ensure consistent behavior, the recommended practice is to place symbol assignments within the output section descriptions (FreeBSD example):
.data : { *(.data) }
Portability
To maximize portability of linker scripts across different linkers, it's essential to establish clear boundaries for PT_LOAD segments. This can be achieved by:
- Explicit alignment: Utilizing MAXPAGESIZE alignment to distinctly separate sections within the linker script.
- Anchoring sections: Ensuring that the first section in each PT_LOAD segment includes at least one input section, preventing ambiguous placement decisions by the linker. When the PHDRS command is present, ensure that the first sections have :phdr.
By adhering to these guidelines, you can reduce reliance on linker-specific orphan section placement algorithms, promoting consistency across GNU ld and lld.
When linking a regular position-dependent executable, you may also supply a minimal linker script like the following for a -no-pie link:
SECTIONS
A better style may define .text and .rodata as well. This linker script works with both GNU ld and lld.
clang -fuse-ld=bfd -Wl,-T,a.lds -no-pie a.c -o a.lld
Disabling orphan sections
For projects that require absolute control over section placement, GNU ld version 2.26 and later provides --orphan-handling=[place|warn|error|discard]. This allows you to choose how orphan sections are handled:
- place (default): The linker places orphan sections according to its internal algorithm.
- warn: The linker places orphan sections but also issues warnings for each instance.
- error: The linker treats orphan sections as errors, preventing the linking process from completing.
- discard: The linker discards orphan sections entirely.
--unique
TODO