Metadata sections, COMDAT and SHF_LINK_ORDER

Metadata sections

Many compiler options instrument or annotate text sections, and need to create a metadata section for (almost) every text section. Such metadata sections have the following property:

  • All relocations from the metadata section reference the associated text section or (if present) the associated auxiliary metadata sections.

In many applications there is no auxiliary metadata section.

Without inlining (discussed in detail later), many metadata sections additionally have the following property:

  • The metadata section is only referenced by the associated text section or not referenced at all.

Below is an example:

.section .text.foo,"ax",@progbits

.section .meta.foo,"a",@progbits
.quad .text.foo-. # PC-relative relocation

Real world examples include:

  • non-SHF_ALLOC: .debug_* (DWARF debugging information), .stack_sizes (stack sizes)
  • SHF_ALLOC, not referenced via relocation by code: .eh_frame (unwind table), .gcc_except_table (language-specific data area for exception handling), __patchable_function_entries (-fpatchable-function-entry=)
  • SHF_ALLOC, referenced via relocation by code: __llvm_prf_cnts/__llvm_prf_data (clang -fprofile-generate/-fprofile-instr-generate), __sancov_bools (clang -fsanitize-coverage=inline-bool-flags), __sancov_cntrs (clang -fsanitize-coverage=inline-8bit-counters), __sancov_guards (clang -fsanitize-coverage=trace-pc-guard)

Non-SHF_ALLOC metadata sections need to use absolute relocation types. There is no program counter concept for a section not loaded into memory, so PC-relative relocations cannot be used.

# Without 'w', text relocation.
.section .meta.foo,"",@progbits
.quad .text.foo # link-time constant
# Absolute relocation types have different treatment in SHF_ALLOC and non-SHF_ALLOC sections.

For SHF_ALLOC sections, PC-relative relocations are recommended. If absolute relocations (with the width equaling the word size) are used, R_*_RELATIVE dynamic relocations will be produced and the section needs to be writable.

.section .meta.foo,"a",@progbits
.quad .text.foo-. # link-time constant

# Without 'w', text relocation.
.section .meta.foo,"aw",@progbits
.quad .text.foo # R_*_RELATIVE dynamic relocation if -pie or -shared
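
The arithmetic behind the two forms can be sketched in C. This is only a model with hypothetical function names and made-up addresses: it shows why the PC-relative value is the same for every load base while the absolute value is not, which is why the latter needs an R_*_RELATIVE fixup in a -pie or -shared link.

```c
#include <stdint.h>

/* Value the static linker stores for `.quad sym - .`: the distance between
   two link-time addresses. It does not depend on where the image is loaded. */
int64_t pcrel_field(uint64_t sym_vaddr, uint64_t place_vaddr) {
  return (int64_t)(sym_vaddr - place_vaddr);
}

/* At run time, the address is recovered by adding the stored field to the
   run-time address of the field itself. */
uint64_t pcrel_resolve(uint64_t place_runtime_addr, int64_t field) {
  return place_runtime_addr + (uint64_t)field;
}

/* Value that `.quad sym` must hold at run time. It depends on the load base,
   so the dynamic loader has to patch the (writable) location. */
uint64_t abs_field(uint64_t sym_vaddr, uint64_t load_base) {
  return load_base + sym_vaddr;
}
```

With sym at 0x1000 and the field at 0x3000, pcrel_field yields -0x2000 and resolves correctly whether the image is loaded at base 0 or at any other base.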

C identifier name sections

The runtime usually needs to access all the metadata sections. Metadata section names typically consist purely of C identifier characters (isalnum characters in the C locale plus _) to leverage some linker magic. Let's use the section name foo as an example.

  • If __start_foo is not defined, the linker defines it to the start of the output section foo.
  • If __stop_foo is not defined, the linker defines it to the end of the output section foo.
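
As a rough illustration of how a runtime might use these symbols, the sketch below places two records into a section named meta and walks the output section from __start_meta to __stop_meta. The struct meta_entry layout and the function names are hypothetical, and the code assumes an ELF target with GCC or Clang.

```c
#include <stddef.h>

/* Hypothetical record layout; a real runtime defines its own. */
struct meta_entry {
  unsigned id;
  unsigned weight;
};

/* Two records in an input section named "meta". The "used" attribute stops
   the compiler from dropping these otherwise unreferenced objects. */
__attribute__((used, section("meta")))
static const struct meta_entry entry_foo = {1, 10};
__attribute__((used, section("meta")))
static const struct meta_entry entry_bar = {2, 20};

/* Not defined in any object file: the linker provides them because "meta"
   consists only of C identifier characters. */
extern const struct meta_entry __start_meta[], __stop_meta[];

/* Number of records in the output section. */
size_t meta_count(void) {
  return (size_t)(__stop_meta - __start_meta);
}

/* Walk every record, regardless of which object file contributed it. */
unsigned meta_total_weight(void) {
  unsigned total = 0;
  for (const struct meta_entry *e = __start_meta; e != __stop_meta; ++e)
    total += e->weight;
  return total;
}
```

The records are kept small on purpose: compilers may raise the alignment of larger objects, which would leave padding between records and break the simple pointer walk.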

Garbage collection on metadata sections

Users want GC for metadata sections: if .text.foo is retained, meta (for .text.foo) is retained; if .text.foo is discarded, meta is discarded. There are three cases:

  • If meta does not have the SHF_ALLOC flag, it is usually retained under --gc-sections.
  • If meta has the SHF_ALLOC flag and .text.foo does not reference meta, meta will be discarded, because meta is not referenced by other sections (prerequisite).
  • If meta has the SHF_ALLOC flag and .text.foo references meta, traditional GC semantics work as intended.

The first case is undesired, because the metadata section is unnecessarily retained. The second case has a more serious correctness issue: meta is discarded even when .text.foo is retained.

To make the first two cases work, we can place .text.foo and meta in a section group. If .text.foo is already in a COMDAT group, we can place meta into the same group; otherwise we can create a non-COMDAT section group (LLVM>=13.0.0, comdat noduplicates support for ELF).

# Zero flag section group
.section .text.foo,"axG",@progbits,foo
.globl foo
foo:

.section .meta.foo,"a?",@progbits
.quad .text.foo-.


# GRP_COMDAT section group, common with C++ inline functions and template instantiations
.section .text.foo,"axG",@progbits,foo,comdat
.globl foo
foo:

.section .meta.foo,"a?",@progbits
.quad .text.foo-.
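
Why a group fixes the second case can be seen in a toy mark phase. The model below is a sketch, not any linker's implementation: the struct section layout, the function names and the single-root assumption are all hypothetical. The only group rule it encodes is that members are retained or discarded as a unit.

```c
#include <stdbool.h>

#define MAX_REFS 4
#define NO_GROUP (-1)

/* Toy model of an input section; all names here are hypothetical. */
struct section {
  const char *name;
  int group;           /* section group id, or NO_GROUP */
  int refs[MAX_REFS];  /* indexes of sections this one has relocations to */
  int nrefs;
  bool live;
};

/* Mark section i and everything reachable from it. */
static void mark(struct section *secs, int n, int i) {
  if (secs[i].live)
    return;
  secs[i].live = true;
  for (int r = 0; r < secs[i].nrefs; ++r)
    mark(secs, n, secs[i].refs[r]);
  /* Group members are retained or discarded as a unit. */
  if (secs[i].group != NO_GROUP)
    for (int j = 0; j < n; ++j)
      if (secs[j].group == secs[i].group)
        mark(secs, n, j);
}

/* Run a --gc-sections style mark phase from a single root section. */
void gc_sections(struct section *secs, int n, int root) {
  for (int i = 0; i < n; ++i)
    secs[i].live = false;
  mark(secs, n, root);
}
```

If _start references .text.foo and .meta.foo only points back at .text.foo, the metadata is dead without a group and live once both sections share a group id. A group whose text section is unreachable stays dead as a whole.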

A section group requires an extra section header (usually named .group), which requires 40 bytes on ELFCLASS32 platforms and 64 bytes on ELFCLASS64 platforms. The size overhead is concerning in many applications, so people were looking for better representations. (AArch64 and x86-64 define ILP32 ABIs and use ELFCLASS32, but technically they can use ELFCLASS32 for small code model with regular ABIs, if the kernel allows.)
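
The header sizes can be checked with a small C sketch. The two structs mirror the Elf32_Shdr and Elf64_Shdr layouts field for field; the struct and function names are hypothetical. Beyond the header discussed above, the sketch also counts the contents of the group section (one 32-bit flag word plus one 32-bit section index per member) and ignores the section name string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field-for-field mirrors of the ELF section header layouts. */
struct shdr32 {
  uint32_t sh_name, sh_type, sh_flags, sh_addr, sh_offset;
  uint32_t sh_size, sh_link, sh_info, sh_addralign, sh_entsize;
};

struct shdr64 {
  uint32_t sh_name, sh_type;
  uint64_t sh_flags, sh_addr, sh_offset, sh_size;
  uint32_t sh_link, sh_info;
  uint64_t sh_addralign, sh_entsize;
};

/* Approximate bytes one section group adds to a relocatable object file:
   the section header plus the group contents. */
size_t group_overhead(bool is_elf64, size_t members) {
  size_t header = is_elf64 ? sizeof(struct shdr64) : sizeof(struct shdr32);
  return header + 4 * (1 + members);
}
```

For a group with two members this gives 52 bytes on ELFCLASS32 and 76 bytes on ELFCLASS64, paid once per instrumented function.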

Another approach is SHF_LINK_ORDER. There are separate sections introducing section groups (COMDAT) and SHF_LINK_ORDER later in this article.

Metadata sections referenced by text sections

Let's discuss the third case in detail. We have these conditions:

  • The metadata sections have the SHF_ALLOC flag.
  • The metadata sections have a C identifier name, so that the runtime can collect them via __start_/__stop_ symbols.
  • Each text section references a metadata section.

Since the runtime uses __start_/__stop_, __start_/__stop_ references are present in a live section.

Now let's introduce the unfortunate special rule about __start_/__stop_:

  • If a live section has a __start_foo or __stop_foo reference, all foo input sections will be retained by ld.bfd --gc-sections. Yes, all, even if the input section is in a different object file.

# a.s
.global _start
.text
_start:
leaq __start_meta(%rip), %rdi
leaq __stop_meta(%rip), %rsi

.section meta,"a"
.byte 0

# b.s
.section meta,"a"
.byte 1

a.o:(meta) and b.o:(meta) are not referenced via regular relocations. Nevertheless, they are retained by the __start_meta reference. (The __stop_meta reference can retain the sections as well.)

Now, it is natural to ask: how can we make GC work for meta?

In LLD<=12, the user can set the SHF_LINK_ORDER flag, because the rule is refined:

__start_/__stop_ references from a live input section retain all non-SHF_LINK_ORDER C identifier name sections.

(Example SHF_LINK_ORDER C identifier name sections: __patchable_function_entries (-fpatchable-function-entry), __sancov_guards (clang -fsanitize-coverage=trace-pc-guard, before clang 13))

In LLD>=13, the user can also use a section group, because the rule is further refined:

__start_/__stop_ references from a live input section retain all non-SHF_LINK_ORDER non-SHF_GROUP C identifier name sections.

GNU ld does not implement the refinement (PR27259).
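
The successive refinements can be summarized as one predicate. The sketch below is a model of the rules as described in this article, not code from any linker: the enum and function names are hypothetical, the identifier check follows the description above (real linkers may be stricter, for example about a leading digit), and the last enumerator models the -z start-stop-gc option discussed later. The flag values are the ones from the ELF specification.

```c
#include <ctype.h>
#include <stdbool.h>

#define SHF_LINK_ORDER 0x80u
#define SHF_GROUP      0x200u

/* Hypothetical names for the rule variants described in the text. */
enum start_stop_rule {
  RULE_RETAIN_ALL,                /* original rule (GNU ld) */
  RULE_SKIP_LINK_ORDER,           /* LLD <= 12 */
  RULE_SKIP_LINK_ORDER_AND_GROUP, /* LLD >= 13 */
  RULE_START_STOP_GC              /* -z start-stop-gc: rule dropped */
};

/* True if the name is non-empty and made only of alphanumerics and '_'. */
bool is_c_identifier_name(const char *name) {
  if (*name == '\0')
    return false;
  for (; *name; ++name)
    if (!isalnum((unsigned char)*name) && *name != '_')
      return false;
  return true;
}

/* Does a __start_/__stop_ reference from a live section retain an input
   section with this name and these sh_flags? */
bool retained_by_start_stop(const char *name, unsigned flags,
                            enum start_stop_rule rule) {
  if (rule == RULE_START_STOP_GC || !is_c_identifier_name(name))
    return false;
  if (rule != RULE_RETAIN_ALL && (flags & SHF_LINK_ORDER))
    return false;
  if (rule == RULE_SKIP_LINK_ORDER_AND_GROUP && (flags & SHF_GROUP))
    return false;
  return true;
}
```

A section that is not retained by this predicate can still be kept by ordinary relocations or by its group; the predicate only covers the special __start_/__stop_ rule.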

A section group has size overhead, so SHF_LINK_ORDER may be tempting. However, it ceases to be a solution when inlining happens. Let's walk through an example demonstrating the problem.

Our first design uses a plain meta for each text section. We use ,unique to keep separate sections, otherwise the assembler would combine meta into a monolithic section.

# Monolithic meta.
.globl _start
_start:
leaq __start_meta(%rip), %rdi
leaq __stop_meta(%rip), %rsi
call bar

.section .text.foo,"ax",@progbits
.globl foo
foo:
leaq .Lmeta.foo(%rip), %rax
ret

.section .text.bar,"ax",@progbits
.globl bar
bar:
call foo
leaq .Lmeta.bar(%rip), %rax
ret

.section meta,"a",@progbits,unique,0
.Lmeta.foo:
.byte 0

.section meta,"a",@progbits,unique,1
.Lmeta.bar:
.byte 1

The __start_meta/__stop_meta references retain meta sections, so we add the SHF_LINK_ORDER flag to defeat the rule. Note: we can omit ,unique because sections with different linked-to sections are not combined by the assembler.

.section meta,"ao",@progbits,foo
.Lmeta.foo:
.byte 0

.section meta,"ao",@progbits,bar
.Lmeta.bar:
.byte 1

This works as long as inlining is not involved.

However, in many instrumentations, the metadata references are created before inlining. With LTO, if the instrumentation is performed before LTO, inlining can naturally happen after instrumentation. If foo is inlined into bar, the meta for .text.foo may get a reference from another text section .text.bar, breaking an implicit assumption of SHF_LINK_ORDER: an SHF_LINK_ORDER section can only be referenced by its linked-to section.

# Both .text.foo and .text.bar reference meta.
.section .text.foo,"ax",@progbits
.globl foo
foo:
leaq .Lmeta.foo(%rip), %rax
ret

.section .text.bar,"ax",@progbits
.globl bar
bar:
leaq .Lmeta.foo(%rip), %rax
leaq .Lmeta.bar(%rip), %rax
ret

Remember that _start calls bar but not foo, so .text.bar (caller) will be retained while .text.foo (callee) will be discarded. The meta for foo is still referenced by .text.bar, so it stays live while linking to the discarded .text.foo. Linkers reject this. LLD will report: {{.*}}:(meta): sh_link points to discarded section {{.*}}:(.text.foo).
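
The failure can be reproduced in a toy model of the mark phase. This is a sketch under simplifying assumptions, with hypothetical names throughout: a single GC root, relocations stored as index arrays, and an SHF_LINK_ORDER section that is marked whenever its linked-to section is marked.

```c
#include <stdbool.h>

#define MAX_REFS 4
#define NO_LINK (-1)

/* Toy model of an input section; all names here are hypothetical. */
struct section {
  const char *name;
  int link;            /* sh_link target if SHF_LINK_ORDER, else NO_LINK */
  int refs[MAX_REFS];  /* indexes of sections this one has relocations to */
  int nrefs;
  bool live;
};

static void mark(struct section *secs, int n, int i) {
  if (secs[i].live)
    return;
  secs[i].live = true;
  for (int r = 0; r < secs[i].nrefs; ++r)
    mark(secs, n, secs[i].refs[r]);
  /* Metadata linked to a live section is retained together with it. */
  for (int j = 0; j < n; ++j)
    if (secs[j].link == i)
      mark(secs, n, j);
}

/* Returns the index of the first live SHF_LINK_ORDER section whose
   linked-to section was discarded, or -1 if there is none. */
int find_dangling_link(struct section *secs, int n, int root) {
  for (int i = 0; i < n; ++i)
    secs[i].live = false;
  mark(secs, n, root);
  for (int i = 0; i < n; ++i)
    if (secs[i].live && secs[i].link != NO_LINK && !secs[secs[i].link].live)
      return i;
  return -1;
}
```

Before inlining, bar references foo and every section is live. After inlining, bar references the meta for foo directly, foo itself becomes unreachable, and the function returns the index of that metadata section, which corresponds to the linker error above.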

Reflection

Here is the history behind the GNU ld rule.

LLD had dropped the behavior for a while until r294592 restored it. LLD refined the rule by excluding SHF_LINK_ORDER.

I am with Alan Modra in a 2010 comment:

I think this is a glibc bug. There isn't any good reason why a reference to a __start_section/__stop_section symbol in an output section should affect garbage collection of input sections, except of course that it works around this glibc --gc-sections problem. I can imagine other situations where a user has a reference to __start_section but wants the current linker behaviour.

Anyhow, GNU ld installed a workaround and made it apply to all C identifier name sections, not just the glibc sections.

Making each meta part of a zero flag section group can address this problem, but why do we need a section group to work around a problem which should not exist? I added -z start-stop-gc to LLD so that we can drop the rule entirely (D96914). In PR27451, Alan Modra and I implemented ld.bfd -z start-stop-gc.

Due to PR27491, in a -shared link, __start_xx undefined weak references may trigger a spurious error "relocation R_X86_64_PC32 against undefined protected symbol `__start_xx' can not be used when making a shared object" if all xx sections are discarded.

  • In 2021-04, the glibc bug https://sourceware.org/PR27492 was fixed by me.
  • In 2021-04, ld.lld defaulted to -z start-stop-gc but recognized __libc_ sections as a workaround for glibc libc.a.

What if all metadata sections are discarded?

You may see this: error: undefined symbol: __start_meta (LLD) or undefined reference to `__start_xx' (GNU ld).

One approach is to use undefined weak symbols:

__attribute__((weak)) extern const char __start_meta[], __stop_meta[];
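
A runtime using the weak declarations then has to tolerate null symbols. A minimal sketch, assuming GCC or Clang on an ELF target; the function name is hypothetical:

```c
#include <stddef.h>

/* Weak declarations: if no "meta" section survives, the symbols resolve to
   address zero instead of causing an undefined symbol error. */
__attribute__((weak)) extern const char __start_meta[], __stop_meta[];

/* Size of the output section "meta" in bytes, or 0 if it does not exist. */
size_t meta_size_in_bytes(void) {
  if (!__start_meta)  /* all metadata sections were discarded */
    return 0;
  return (size_t)(__stop_meta - __start_meta);
}
```

In a program that contains no meta section at all, meta_size_in_bytes() returns 0 rather than failing to link.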

Another is to ensure there is at least one live metadata section, by creating an empty section in the runtime. In binutils 2.36, GNU as introduced the flag R to represent SHF_GNU_RETAIN on FreeBSD and Linux emulations. I have added the support to the LLVM integrated assembler and allowed the syntax on all ELF platforms.

.section meta,"aR",@progbits

With GCC>=11 or Clang>=13 (https://reviews.llvm.org/D97447), you can write:

__attribute__((retain,used,section("meta")))
static const char dummy[0];

The used attribute, when attached to a function or variable definition, indicates that there may be references to the entity which are not apparent in the source code. On COFF and Mach-O targets (Windows and Apple platforms), the used attribute prevents symbols from being removed by linker section GC. On ELF targets, GNU ld/gold/LLD may remove the definition if it is not otherwise referenced.

The retain attribute was introduced in GCC 11 to set the SHF_GNU_RETAIN flag on ELF targets.

The typical solution before SHF_GNU_RETAIN is:

/* An empty .init_array fragment carrying a no-op relocation against meta. */
asm(".pushsection .init_array,\"aw\",@init_array\n" \
".reloc ., R_AARCH64_NONE, meta\n" \
".popsection\n");

The idea is that SHT_INIT_ARRAY sections are GC roots. An empty SHT_INIT_ARRAY does not change the output. The artificial reference keeps meta live.

I added .reloc support for R_ARM_NONE/R_AARCH64_NONE/R_386_NONE/R_X86_64_NONE/R_PPC_NONE/R_PPC64_NONE in LLVM 9.0.0.

COMDAT

In C++, inline functions, template instantiations and a few other things can be defined in multiple object files but need deduplication at link time. In the dark ages the functionality was implemented by weak definitions: the linker does not report duplicate definition errors and resolves the references to the first definition. The downside is that unneeded copies remained in the linked image.

In Microsoft PE file format, the section flag (IMAGE_SCN_LNK_COMDAT) marks a section COMDAT and enables deduplication on a per-section basis (IMAGE_COMDAT_SELECT_NODUPLICATES can drop the deduplication requirement). The PE format interestingly does not need additional space to represent COMDAT sections. Every section has an associated symbol. This symbol has a section definition auxiliary record which has reserved Number/Selection fields.

If a text section needs a data section and deduplication is needed for both sections, you have two choices:

  • Use two COMDAT symbols. There is the drawback that deduplication happens independently for the interconnected sections.
  • Make the data section link to the text section via IMAGE_COMDAT_SELECT_ASSOCIATIVE. Whether an IMAGE_COMDAT_SELECT_ASSOCIATIVE section is retained is dependent on its referenced section.

There is a limitation: MSVC link.exe may report a duplicate symbol error for symbols marked IMAGE_COMDAT_SELECT_ASSOCIATIVE, even if they would be discarded after handling the leader symbol.

In the GNU world, .gnu.linkonce. was invented to deduplicate groups with just one member. .gnu.linkonce. has long been obsoleted in favor of section groups, but its usage lingered until 2020, when Adhemerval Zanella removed the last live glibc use case for .gnu.linkonce. (BZ #20543).

ELF section groups

The ELF specification generalized PE COMDAT to allow an arbitrary number of sections to be interrelated.

Some sections occur in interrelated groups. For example, an out-of-line definition of an inline function might require, in addition to the section containing its executable instructions, a read-only data section containing literals referenced, one or more debugging information sections and other informational sections. Furthermore, there may be internal references among these sections that would not make sense if one of the sections were removed or replaced by a duplicate from another object. Therefore, such groups must be included or omitted from the linked object as a unit. A section cannot be a member of more than one group.

According to "such groups must be included or omitted from the linked object as a unit", a linker's garbage collection feature must retain or discard the sections as a unit.

The most common section group flag is GRP_COMDAT, which makes the member sections similar to COMDAT in Microsoft PE file format, but can apply to multiple sections. (The committee borrowed the name "COMDAT" from PE.)

This is a COMDAT group. It may duplicate another COMDAT group in another object file, where duplication is defined as having the same group signature. In such cases, only one of the duplicate groups may be retained by the linker, and the members of the remaining groups must be discarded.
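
The deduplication rule can be sketched as follows. The specification only requires that one of the duplicates be retained. Keeping the first one in input order is an assumption of this sketch, as are the struct and function names. Groups with flag 0 never take part in deduplication.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GRP_COMDAT 0x1u  /* value from the ELF specification */

/* Hypothetical model of a section group seen by the linker. */
struct group {
  const char *signature;
  unsigned flags;
  bool kept;
};

/* Decide which groups to keep, scanning them in input order.
   Returns the number of groups kept. */
size_t resolve_groups(struct group *groups, size_t n) {
  size_t kept = 0;
  for (size_t i = 0; i < n; ++i) {
    groups[i].kept = true;
    if (groups[i].flags & GRP_COMDAT) {
      /* Discard this group if an earlier COMDAT group has the same
         signature. */
      for (size_t j = 0; j < i; ++j) {
        if ((groups[j].flags & GRP_COMDAT) &&
            strcmp(groups[j].signature, groups[i].signature) == 0) {
          groups[i].kept = false;
          break;
        }
      }
    }
    kept += groups[i].kept;
  }
  return kept;
}
```

Given two COMDAT groups with signature foo and two zero flag groups with signature bar, three groups survive: the first foo and both bar groups.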

I want to highlight one thing GCC does (and Clang inherits) for backward compatibility: the definitions relative to a COMDAT group member are kept STB_WEAK instead of STB_GLOBAL. The idea is that an old toolchain that does not recognize COMDAT groups can still operate correctly, just in a degraded manner.

The section group flag can be 0: no signature based deduplication should happen.

SHF_LINK_ORDER

In a generic-abi thread, Cary Coutant initially suggested using a new section flag SHF_ASSOCIATED. HP-UX and Solaris folks objected to a new generic flag. Cary Coutant then discussed with Jim Dehnert and noticed that the existing (rare) flag SHF_LINK_ORDER has semantics closer to the metadata GC semantics, so he intended to replace the existing flag SHF_LINK_ORDER. Solaris had used its own SHF_ORDERED extension before it migrated to the ELF simplification SHF_LINK_ORDER. Solaris is still using SHF_LINK_ORDER so the flag cannot be repurposed. People discussed whether SHF_OS_NONCONFORMING could be repurposed but did not take that route: the platform already knows whether a flag is unknown and knowing a flag is non-conforming does not help produce better output. In the end the agreement was that SHF_LINK_ORDER gained additional metadata GC semantics.

The new semantics:

This flag adds special ordering requirements for link editors. The requirements apply to the referenced section identified by the sh_link field of this section's header. If this section is combined with other sections in the output file, the section must appear in the same relative order with respect to those sections, as the referenced section appears with respect to sections the referenced section is combined with.

A typical use of this flag is to build a table that references text or data sections in address order.

In addition to adding ordering requirements, SHF_LINK_ORDER indicates that the section contains metadata describing the referenced section. When performing unused section elimination, the link editor should ensure that both the section and the referenced section are retained or discarded together. Furthermore, relocations from this section into the referenced section should not be taken as evidence that the referenced section should be retained.

Actually, ARM EHABI has been using SHF_LINK_ORDER for index table sections .ARM.exidx*. A .ARM.exidx section contains a sequence of 2-word pairs. The first word is a 31-bit PC-relative offset to the start of the region. The idea is that if the entries are ordered by the start address, the end address of an entry is implicitly the start address of the next entry and does not need to be explicitly encoded. For this reason the section uses SHF_LINK_ORDER for the ordering requirement. The GC semantics are very similar to the metadata sections'.
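
To make the encoding concrete, here is a sketch of decoding the first word and looking up the entry covering a given PC. The function names and the flat array representation are hypothetical, and the second word of each pair is treated as opaque. The lookup relies on exactly the ordering that SHF_LINK_ORDER guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* Link-time encoding: the low 31 bits of (target - place). */
uint32_t prel31_encode(uint32_t target, uint32_t place) {
  return (target - place) & 0x7fffffffu;
}

/* Decoding: sign-extend the 31-bit offset and add the address of the word. */
uint32_t prel31_decode(uint32_t word, uint32_t place) {
  uint32_t offset = word & 0x7fffffffu;
  if (offset & 0x40000000u)  /* sign-extend from bit 30 */
    offset |= 0x80000000u;
  return place + offset;     /* unsigned arithmetic wraps modulo 2^32 */
}

/* table holds n entries of two 32-bit words each; table_addr is the address
   of table[0] in the image. Returns the index of the entry whose region
   contains pc, or -1 if pc is below the first region. Because entries are
   sorted by start address, a region ends where the next one starts. */
int exidx_find(const uint32_t *table, size_t n, uint32_t table_addr,
               uint32_t pc) {
  int found = -1;
  for (size_t i = 0; i < n; ++i) {
    uint32_t place = table_addr + (uint32_t)(8 * i);
    uint32_t start = prel31_decode(table[2 * i], place);
    if (start > pc)
      break;
    found = (int)i;
  }
  return found;
}
```

If the linker placed the entries in a different order than the functions they describe, the early exit in exidx_find would return the wrong entry, which is why the ordering requirement matters.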

So the updated SHF_LINK_ORDER wording can be seen as recognition for the current practice (even though the original discussion did not actually notice ARM EHABI).

In GNU as, before version 2.35, SHF_LINK_ORDER could be produced by ARM assembly directives, but not specified by user-customized sections.

Implementation pitfalls

Mixed unordered and ordered sections

If an output section consists of only non-SHF_LINK_ORDER sections, the rule is clear: input sections are ordered in their input order. If an output section consists of only SHF_LINK_ORDER sections, the rule is also clear: input sections are ordered with respect to their linked-to sections.

What is unclear is how to handle an output section with mixed unordered and ordered sections.

GNU ld had a diagnostic for this case. LLD rejected the case as well: error: incompatible section flags for .rodata.

When I implemented -fpatchable-function-entry= for Clang, I observed some GC related issues with the GCC implementation. I reported them and carefully chose SHF_LINK_ORDER in the Clang implementation if the integrated assembler is used.

This was a problem if the user wanted to place such input sections along with unordered sections, e.g. .init.data : { ... KEEP(*(__patchable_function_entries)) ... } (https://github.com/ClangBuiltLinux/linux/issues/953).

As a response, I submitted D77007 to allow ordered input section descriptions within an output section.

This worked well for the Linux kernel. Mixed unordered and ordered sections within an input section description were still a problem. This made it infeasible to add SHF_LINK_ORDER to an existing metadata section and expect new object files to be linkable with old object files that lack the flag. I asked how to resolve this upgrade issue and Ali Bahrami responded:

The Solaris linker puts sections without SHF_LINK_ORDER at the end of the output section, in first-in-first-out order, and I don't believe that's considered to be an error.

So I went ahead and implemented a similar rule for LLD: D84001 allows arbitrary mix and places SHF_LINK_ORDER sections before non-SHF_LINK_ORDER sections.
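
The resulting placement rule can be sketched as a sort. This is only a model of the rule stated above, not the LLD code: the struct and function names are hypothetical, and the linked-to section is reduced to a single address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of an input section placed in one output section. */
struct input_section {
  const char *name;
  bool link_order;          /* has SHF_LINK_ORDER */
  unsigned long link_addr;  /* address of the linked-to section, if any */
  size_t input_index;       /* position in input order */
};

static int compare_sections(const void *pa, const void *pb) {
  const struct input_section *a = pa, *b = pb;
  if (a->link_order != b->link_order)
    return a->link_order ? -1 : 1;  /* SHF_LINK_ORDER sections go first */
  if (a->link_order && a->link_addr != b->link_addr)
    return a->link_addr < b->link_addr ? -1 : 1;
  /* qsort is not stable, so fall back to input order explicitly. */
  return (a->input_index > b->input_index) -
         (a->input_index < b->input_index);
}

/* Ordered sections follow their linked-to sections; the remaining sections
   keep their input order and come last. */
void order_output_section(struct input_section *secs, size_t n) {
  qsort(secs, n, sizeof *secs, compare_sections);
}
```

With inputs a (unordered), b (linked to 0x300), c (linked to 0x100) and d (unordered), the output order is c, b, a, d.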

If the linked-to section is discarded due to compiler optimizations

We decided that the integrated assembler allows SHF_LINK_ORDER with sh_link=0 and LLD can handle such sections as regular unordered sections (https://reviews.llvm.org/D72904).

If the linked-to section is discarded due to --gc-sections

You will see error: ... sh_link points to discarded section ....

A SHF_LINK_ORDER section has an assumption: it can only be referenced by its linked-to section. Inlining and the discussed __start_ rule can break this assumption.

Others

  • During --icf={safe,all}, SHF_LINK_ORDER sections are not eligible (conservative but working).
  • In relocatable output, SHF_LINK_ORDER sections cannot be combined by name.
  • When comparing two input sections with different linked-to output sections, use vaddr of output sections instead of section indexes. Peter Smith fixed this in https://reviews.llvm.org/D79286.

Case study

-fpatchable-function-entry=

A function section has a metadata section. No inlining.

SHF_LINK_ORDER is the perfect solution. A section group can be used, but that just adds size overhead.

clang -fprofile-generate and -fprofile-instr-generate

A function needs __llvm_prf_cnts, __llvm_prf_data and in some cases __llvm_prf_vals. Inlining may happen.

A function references its __llvm_prf_cnts and may reference its __llvm_prf_data if value profiling applies. The __llvm_prf_data references the text section, the associated __llvm_prf_cnts and the associated __llvm_prf_vals.

Because the __llvm_prf_cnts and the __llvm_prf_data may be referenced by more than one text section, SHF_LINK_ORDER is not a solution. We need to place the __llvm_prf_cnts, the __llvm_prf_data and (if present) the __llvm_prf_vals in one section group so that they will be retained or discarded as a unit. If the text section is already in a COMDAT group, we can reuse the group; otherwise we need to create a zero flag section group and optionally place the text section into the group. LLVM from 13.0.0 onwards will use a zero flag section group.

Note: due to the __start_ reference rule and the fact that the __llvm_prf_data references the text section, with GNU ld and gold no instrumented text section can be discarded. There can be a huge size bloat. If you use GNU ld>=2.37, you can try -z start-stop-gc.

clang -fsanitize-coverage

Miscellaneous

Arm Compiler 5 splits up DWARF Version 3 debug information and puts these sections into comdat groups. On "monolithic input section handling", Peter Smith commented that:

We found that splitting up the debug into fragments works well as it permits the linker to ensure that all the references to local symbols are to sections within the same group, this makes it easy for the linker to remove all the debug when the group isn't selected.

This approach did produce significantly more debug information than gcc did. For small microcontroller projects this wasn't a problem. For larger feature phone problems we had to put a lot of work into keeping the linker's memory usage down as many of our customers at the time were using 32-bit Windows machines with a default maximum virtual memory of 2Gb.

COMDAT sections have size overhead on extra section headers. Developers may be tempted to decrease the overhead with SHF_LINK_ORDER. However, the approach does not work due to the ordering requirement. Consider the following fragments:

header [a.o common]
- DW_TAG_compile_unit [a.o common]
-- DW_TAG_variable [a.o .data.foo]
-- DW_TAG_namespace [common]
--- DW_TAG_subprogram [a.o .text.bar]
--- DW_TAG_variable [a.o .data.baz]
footer [a.o common]
header [b.o common]
- DW_TAG_compile_unit [b.o common]
-- DW_TAG_variable [b.o .data.foo]
-- DW_TAG_namespace [common]
--- DW_TAG_subprogram [b.o .text.bar]
--- DW_TAG_variable [b.o .data.baz]
footer [b.o common]

DW_TAG_* tags associated with concrete sections can be represented with SHF_LINK_ORDER sections. After linking the sections will be ordered before the common parts.