The .sframe format is a lightweight alternative to .eh_frame and .eh_frame_hdr designed for efficient stack unwinding. By trading some functionality and flexibility for compactness, SFrame achieves significantly smaller size while maintaining the essential unwinding capabilities needed by profilers.
SFrame focuses on three fundamental elements for each function:
- Canonical Frame Address (CFA): The base address for stack frame calculations
- Return address
- Frame pointer
An .sframe section follows a straightforward layout:
- Header: Contains metadata and offset information
- Auxiliary header (optional): Reserved for future extensions
- Function Descriptor Entries (FDEs): Array describing each function
- Frame Row Entries (FREs): Arrays of unwinding information per function
1 | struct [[gnu::packed]] sframe_header { |
While magic numbers are a popular choice for file formats, they deviate from established ELF conventions, which simply use the section type for distinction.
The version field resembles the version fields in DWARF section headers. SFrame will likely evolve over time, unlike ELF's more stable control structures. This means we'll probably need to keep producers and consumers evolving in lockstep, which creates a stronger case for internal versioning. An internal version field would allow linkers to upgrade or ignore unsupported low-version input pieces, providing more flexibility in handling version mismatches.
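To make the layout and the magic/version checks concrete, here is a minimal sketch of a header parser. The struct layout follows my reading of the SFrame V2 specification; the magic value 0xdee2, the version number 2, the fields not named elsewhere in this post, and the rule that sfh_fdeoff and sfh_freoff are relative to the end of the header (including the auxiliary header) are assumptions of this sketch. The names SFramePreamble, SFrameHeader, SFrameLayout, and parseHeader are hypothetical, and the code only handles host-endian input.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Assumed layout, modeled on the SFrame V2 specification.
struct [[gnu::packed]] SFramePreamble {
  uint16_t sfp_magic;
  uint8_t sfp_version;
  uint8_t sfp_flags;
};

struct [[gnu::packed]] SFrameHeader {
  SFramePreamble sfh_preamble;
  uint8_t sfh_abi_arch;
  int8_t sfh_cfa_fixed_fp_offset;
  int8_t sfh_cfa_fixed_ra_offset;
  uint8_t sfh_auxhdr_len;  // length of the optional auxiliary header
  uint32_t sfh_num_fdes;
  uint32_t sfh_num_fres;
  uint32_t sfh_fre_len;
  uint32_t sfh_fdeoff;     // FDE array, relative to the end of the header
  uint32_t sfh_freoff;     // FRE data, relative to the end of the header
};
static_assert(sizeof(SFrameHeader) == 28, "unexpected header size");

constexpr uint16_t kSFrameMagic = 0xdee2;  // assumed magic value
constexpr uint8_t kSFrameVersion2 = 2;     // assumed version number

struct SFrameLayout {
  size_t fde_begin;  // byte offset of the FDE array within the section
  size_t fre_begin;  // byte offset of the FRE data within the section
};

// Returns the sub-section offsets, or nullopt if the section is truncated,
// has the wrong magic, or uses an unsupported version.
std::optional<SFrameLayout> parseHeader(const uint8_t *data, size_t size) {
  SFrameHeader h;
  if (size < sizeof(h))
    return std::nullopt;
  std::memcpy(&h, data, sizeof(h));
  if (h.sfh_preamble.sfp_magic != kSFrameMagic ||
      h.sfh_preamble.sfp_version != kSFrameVersion2)
    return std::nullopt;
  size_t body = sizeof(h) + h.sfh_auxhdr_len;
  SFrameLayout layout{body + h.sfh_fdeoff, body + h.sfh_freoff};
  if (layout.fde_begin > size || layout.fre_begin > size)
    return std::nullopt;
  return layout;
}
```

A consumer that rejects unknown versions this way is exactly what forces producers and consumers to evolve in lockstep.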
Data structures
Function Descriptor Entries (FDEs)
Function Descriptor Entries serve as the bridge between functions and their unwinding information. Each FDE describes a function's location and provides a direct link to its corresponding Frame Row Entries (FREs), which contain the actual unwinding data.
1 | struct [[gnu::packed]] sframe_func_desc_entry { |
The current design has room for optimization. The sfde_func_num_fres field uses a full 32 bits, which is wasteful for most functions. We could use uint16_t instead, requiring exceptionally large functions to be split across multiple FDEs.
It's important to note that SFrame's function concept represents code ranges rather than logical program functions. This distinction becomes particularly relevant with compiler optimizations like hot-cold splitting, where a single logical function may span multiple non-contiguous code ranges, each requiring its own FDE.
The padding field sfde_func_padding2 represents unnecessary overhead on modern architectures where unaligned memory access performs efficiently, making the alignment benefits negligible.
To enable binary search on sfde_func_start_address, FDEs must maintain a fixed size, which precludes the use of variable-length integer encodings like PrefixVarInt.
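The fixed-size requirement is easiest to see in a lookup routine. The following is a rough sketch, not the libsframe implementation: SFrameFde and findFde are hypothetical names, the fields other than sfde_func_start_address, sfde_func_num_fres, and sfde_func_padding2 come from my reading of the V2 specification, and the start addresses are assumed to have already been converted to a common base so they can be compared directly with the PC.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumed 20-byte FDE layout, modeled on the SFrame V2 specification.
struct [[gnu::packed]] SFrameFde {
  int32_t sfde_func_start_address;
  uint32_t sfde_func_size;
  uint32_t sfde_func_start_fre_off;
  uint32_t sfde_func_num_fres;
  uint8_t sfde_func_info;
  uint8_t sfde_func_rep_size;
  uint16_t sfde_func_padding2;
};
static_assert(sizeof(SFrameFde) == 20, "FDEs must have a fixed size");

// Finds the FDE whose [start, start + size) range covers pc, or nullptr.
// fdes must be sorted by sfde_func_start_address.
const SFrameFde *findFde(const std::vector<SFrameFde> &fdes, int64_t pc) {
  // First FDE that starts after pc; the candidate is the one before it.
  auto it = std::upper_bound(
      fdes.begin(), fdes.end(), pc, [](int64_t value, const SFrameFde &f) {
        return value < f.sfde_func_start_address;
      });
  if (it == fdes.begin())
    return nullptr;
  --it;
  int64_t delta = pc - it->sfde_func_start_address;
  if (delta >= static_cast<int64_t>(it->sfde_func_size))
    return nullptr;  // pc falls in a gap between code ranges
  return &*it;
}
```

Because every element is 20 bytes, the i-th FDE sits at a computable offset and std::upper_bound can jump around freely. With a variable-length encoding the search would have to decode entries sequentially or rely on a separate index.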
Frame Row Entries (FREs)
Frame Row Entries contain the actual unwinding information for specific program counter ranges within a function. The template design allows for different address sizes based on the function's characteristics.
1 | template <class AddrType> |
Each FRE contains variable-length stack offsets stored as trailing data. The fre_offset_size field determines whether offsets use 1, 2, or 4 bytes (uint8_t, uint16_t, or uint32_t), allowing optimal space usage based on stack frame sizes.
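As an illustration of how the trailing offsets might be decoded, here is a small sketch. The bit positions inside the info byte (bit 0 for the CFA base register, bits 1-4 for the offset count, bits 5-6 for the offset size) reflect my reading of the V2 specification and should be treated as assumptions, as should the hypothetical names FreInfo, decodeFreInfo, and readOffsets. The sketch reads little-endian data and sign-extends each offset, on the assumption that negative values such as FP = CFA - 16 must be representable.

```cpp
#include <cstdint>
#include <vector>

struct FreInfo {
  bool cfa_base_is_sp;    // assumed: bit 0, 1 = SP-based, 0 = FP-based
  unsigned offset_count;  // assumed: bits 1-4
  unsigned offset_bytes;  // 1, 2, or 4; 0 marks an invalid size code
};

FreInfo decodeFreInfo(uint8_t info) {
  unsigned sizeCode = (info >> 5) & 3;  // fre_offset_size
  unsigned bytes = sizeCode < 3 ? 1u << sizeCode : 0;
  return {(info & 1) != 0, (info >> 1) & 0xfu, bytes};
}

// Reads fi.offset_count little-endian offsets starting at p and
// sign-extends each of them to 32 bits.
std::vector<int32_t> readOffsets(const uint8_t *p, const FreInfo &fi) {
  std::vector<int32_t> out;
  if (fi.offset_bytes == 0)
    return out;
  for (unsigned i = 0; i < fi.offset_count; ++i, p += fi.offset_bytes) {
    uint32_t v = 0;
    for (unsigned b = 0; b < fi.offset_bytes; ++b)
      v |= static_cast<uint32_t>(p[b]) << (8 * b);
    unsigned shift = 32 - 8 * fi.offset_bytes;
    // Move the sign bit to bit 31, then shift back arithmetically.
    out.push_back(static_cast<int32_t>(v << shift) >> shift);
  }
  return out;
}
```

For a typical small frame, two 1-byte offsets take 2 bytes in total, compared with 8 bytes if every offset were a fixed 32-bit field.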
Architecture-specific stack offsets
SFrame adapts to different processor architectures by varying its offset encoding to match their respective calling conventions and architectural constraints.
x86-64
The x86-64 implementation takes advantage of the architecture's predictable stack layout:
- First offset: Encodes CFA as BASE_REG + offset
- Second offset (if present): Encodes FP as CFA + offset
- Return address: Computed implicitly as CFA + sfh_cfa_fixed_ra_offset (using the header field)
AArch64
AArch64's more flexible calling conventions require explicit return address tracking:
- First offset: Encodes CFA as BASE_REG + offset
- Second offset: Encodes return address as CFA + offset
- Third offset (if present): Encodes FP as CFA + offset
The explicit return address encoding accommodates AArch64's variable stack layouts and link register usage patterns.
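The two offset conventions can be summarized in code. This is a sketch that follows the x86-64 and AArch64 lists above literally: Arch, FrameRule, interpretOffsets, and returnAddressSlot are hypothetical names, the offsets are assumed to be already decoded into signed integers, and any offset count not described in those lists is simply rejected.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

enum class Arch { X86_64, AArch64 };

// Recovery rule for one FRE. All offsets are in bytes.
struct FrameRule {
  bool cfa_base_is_sp;               // CFA = (SP or FP) + cfa_offset
  int32_t cfa_offset;
  int32_t ra_offset;                 // return address lives at CFA + ra_offset
  std::optional<int32_t> fp_offset;  // saved FP lives at CFA + fp_offset
};

std::optional<FrameRule> interpretOffsets(Arch arch, bool baseIsSp,
                                          const std::vector<int32_t> &offs,
                                          int8_t fixedRaOffset) {
  FrameRule r{baseIsSp, 0, 0, std::nullopt};
  if (arch == Arch::X86_64) {
    // CFA, then optional FP; RA comes from sfh_cfa_fixed_ra_offset.
    if (offs.size() < 1 || offs.size() > 2)
      return std::nullopt;
    r.cfa_offset = offs[0];
    r.ra_offset = fixedRaOffset;
    if (offs.size() == 2)
      r.fp_offset = offs[1];
  } else {
    // CFA, explicit RA, then optional FP.
    if (offs.size() < 2 || offs.size() > 3)
      return std::nullopt;
    r.cfa_offset = offs[0];
    r.ra_offset = offs[1];
    if (offs.size() == 3)
      r.fp_offset = offs[2];
  }
  return r;
}

// Address of the stack slot holding the return address, given the current
// value of the base register (SP or FP).
uint64_t returnAddressSlot(const FrameRule &r, uint64_t baseRegValue) {
  uint64_t cfa = baseRegValue + static_cast<int64_t>(r.cfa_offset);
  return cfa + static_cast<int64_t>(r.ra_offset);
}
```

For an x86-64 function after push %rbp; mov %rsp, %rbp, the FRE would carry the offsets {16, -16} with an FP base, and the header would supply -8 as the fixed return address offset.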
s390x
TODO
.eh_frame and .sframe
SFrame reduces size compared to .eh_frame plus .eh_frame_hdr by:
- Eliminating .eh_frame_hdr through sorted sfde_func_start_address fields
- Replacing CIE pointers with direct FDE-to-FRE references
- Using variable-width sfre_start_address fields (1 or 2 bytes) for small functions
- Storing start addresses instead of the address ranges that .eh_frame uses
- Start addresses in a small function use 1 or 2 byte fields, more efficient than .eh_frame initial_location, which needs at least 4 bytes (DW_EH_PE_sdata4).
- Hard-coding stack offsets rather than using flexible register specifications
However, the bytecode design of .eh_frame can sometimes be more efficient than .sframe, as demonstrated on x86-64.
SFrame serves as a specialized complement to .eh_frame rather than a complete replacement. The current version does not include personality routines, Language Specific Data Area (LSDA) information, or the ability to encode extra callee-saved registers. While these constraints make SFrame ideal for profilers and debuggers, they prevent it from supporting C++ exception handling, where libstdc++/libc++abi requires the full .eh_frame feature set.
In practice, executables and shared objects will likely contain all three sections:
- .eh_frame: Complete unwinding information for exception handling
- .eh_frame_hdr: Fast lookup table for .eh_frame
- .sframe: Compact unwinding information for profilers
The auxiliary header, currently unused, provides a pathway for future enhancements. It could potentially accommodate .eh_frame augmentation data such as personality routines, language-specific data areas (LSDAs), and signal frame handling, bridging some of the current functionality gaps.
Large text section support
The sfde_func_start_address field uses a signed 32-bit offset to reference functions, providing a ±2GB addressing range from the field's location. This signed encoding offers flexibility in section ordering: .sframe can be placed either before or after text sections.
However, this approach faces limitations with large binaries, particularly when LLVM generates .ltext sections for x86-64. The typical section layout creates significant gaps between .sframe and .ltext:
1 | .ltext // Large text section |
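The range limit can be expressed as a check a linker would perform when filling in the field. This is a sketch with a hypothetical helper name, encodeFuncStart, and it assumes the stored value is simply the function address minus the address of the field itself.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns the value to store in sfde_func_start_address, or nullopt when the
// function is farther than a signed 32-bit displacement from the field.
std::optional<int32_t> encodeFuncStart(uint64_t fieldAddr, uint64_t funcAddr) {
  // The unsigned subtraction wraps, so the cast yields the signed distance.
  int64_t delta = static_cast<int64_t>(funcAddr - fieldAddr);
  if (delta < std::numeric_limits<int32_t>::min() ||
      delta > std::numeric_limits<int32_t>::max())
    return std::nullopt;  // a linker would have to report an error here
  return static_cast<int32_t>(delta);
}
```

When more than 2GiB of other sections sit between .ltext and .sframe, the check fails in the same way regardless of which of the two sections comes first.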
Linking and execution views
SFrame employs a unified indexed format across both relocatable files (linking view) and executable files (execution view). While this design consistency might look elegant, it introduces significant complications in the toolchain implementation.
The current Binutils implementation enforces a single element structure within each .sframe section, regardless of whether it's in a relocatable object or final executable. This differs from DWARF sections, which support multiple concatenated elements (each with their own header and body) within a single section.
This design choice stems largely from Linux kernel requirements, where kernel modules are relocatable files created with ld -r. The kernel's SFrame support expects each module to contain a single, indexed format for efficient runtime processing.
GNU ld reflects this constraint by merging all input .sframe sections into a single indexed element in the output, even when producing relocatable files. This deviates from the relocatable linking convention of suppressing synthetic section finalization.
I believe the next version should distinguish between linking and execution views:
- Linking view: Assemblers could produce a simpler format, omitting fields only needed for indexing
- Linkers should concatenate .sframe input sections by default.
- A new option like --sframe-index could ask linkers to build the full indexed format when creating executables and shared object files, similar to --gdb-index and --debug-names. To work with the Linux kernel, ld -r --sframe-index should work as well.
Fields that could be omitted from the linking view include index-specific metadata like sfh_num_fdes, sfh_num_fres, sfh_fdeoff, and sfh_freoff.
Section group compliance issues
The current monolithic .sframe design creates ELF specification violations when dealing with COMDAT section groups. GNU Assembler generates a single .sframe section containing relocations to STB_LOCAL symbols from multiple text sections, including those in different section groups.
This violates the ELF section group rule, which states:
A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.
The problem manifests when inline functions are deduplicated:
1 | cat > a.cc <<'eof' |
Linkers correctly reject this violation:
1 | % ld.lld a.o b.o |
(In 2020, I reported a similar issue for GCC -fpatchable-function-entry=.)
The solution requires restructuring the assembler's output strategy. Instead of creating a monolithic .sframe section, the assembler should generate individual SFrame sections corresponding to each text section. When a text section belongs to a COMDAT group, its associated SFrame section must join the same group. For standalone text sections, the SHF_LINK_ORDER flag should establish the proper association.
This approach would create multiple SFrame sections within relocatable files, making the size optimization benefits of a simplified linking view format even more compelling. While this comes with the overhead of additional section headers (where each Elf64_Shdr consumes 64 bytes), it's a cost we should pay to be a good ELF citizen. This reinforces the value of my section header reduction proposal.
Linker relaxation considerations
Since .sframe carries the SHF_ALLOC flag, it affects text section addresses and consequently influences linker relaxation on architectures like RISC-V and LoongArch.
If variable-length encoding is introduced to the format, .sframe would behave as an address-dependent section similar to .relr.dyn. However, this dependency should not pose significant implementation challenges.
Endianness considerations
The SFrame format currently supports endianness variants, which complicates toolchain implementation. While runtime consumers typically target a single endianness, development tools must handle both variants to support cross-compilation workflows.
The endianness discussion in The future of 32-bit support in the kernel reinforces my belief in preferring universal little-endian for new formats. A universal little-endian approach would reduce implementation complexity by eliminating the need for:
- Endianness-aware function calls like read32le(config, p) where config->endian specifies the object file's byte order
- Template-based abstractions such as template <class Endian> that must wrap every data access function
Instead, toolchain code could use straightforward calls like read32le(p), streamlining both implementation and maintenance.
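For reference, a byte-order-independent read32le can be written in a few lines. This is a minimal sketch rather than the helper from any particular toolchain, and read16le is an extra hypothetical helper added for illustration.

```cpp
#include <cstdint>

// Reads a little-endian 32-bit value from a possibly unaligned pointer.
// The result is the same on little-endian and big-endian hosts.
inline uint32_t read32le(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) | static_cast<uint32_t>(p[1]) << 8 |
         static_cast<uint32_t>(p[2]) << 16 | static_cast<uint32_t>(p[3]) << 24;
}

// Same idea for 16-bit fields.
inline uint16_t read16le(const uint8_t *p) {
  return static_cast<uint16_t>(p[0] | p[1] << 8);
}
```

Optimizing compilers generally recognize this pattern and emit a single load on little-endian targets, or a byte-reversing load where the target has one.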
This approach remains efficient even on big-endian architectures like IBM z/Architecture and POWER. z/Architecture's LOAD REVERSED instructions, for instance, handle byte swapping with minimal overhead, often requiring no additional instructions beyond normal loads. While slight performance differences may exist compared to native endian operations, the toolchain simplification benefits generally outweigh these concerns.
However, I understand that my opinion is probably not popular within the object file format community and faces resistance from stakeholders with significant big-endian investments.
Summary
SFrame represents a pragmatic approach to stack unwinding that trades flexibility for compactness. I am eager to learn the pros and cons versus frame pointer based unwinding.
While SFrame successfully reduces size compared to .eh_frame and .eh_frame_hdr for profiling use cases, several design decisions create implementation complexity that should be addressed in future versions:
- Uncertainty remains about SFrame's potential as a full .eh_frame replacement
- The unified linking/execution view complicates toolchain implementation unnecessarily
- Section group compliance issues present significant concerns for linker developers
- Lack of large text section support