Remarks on SFrame

The .sframe format is a lightweight alternative to .eh_frame and .eh_frame_hdr designed for efficient stack unwinding. By trading some functionality and flexibility for compactness, SFrame achieves significantly smaller size while maintaining the essential unwinding capabilities needed by profilers.

SFrame focuses on three fundamental elements for each function:

  • Canonical Frame Address (CFA): The base address for stack frame calculations
  • Return address
  • Frame pointer

An .sframe section follows a straightforward layout:

  • Header: Contains metadata and offset information
  • Auxiliary header (optional): Reserved for future extensions
  • Function Descriptor Entries (FDEs): Array describing each function
  • Frame Row Entries (FREs): Arrays of unwinding information per function

struct [[gnu::packed]] sframe_header {
  struct {
    uint16_t sfp_magic;
    uint8_t sfp_version;
    uint8_t sfp_flags;
  } sfh_preamble;
  uint8_t sfh_abi_arch;
  int8_t sfh_cfa_fixed_fp_offset;
  // Used by x86-64 to define the return address slot relative to CFA
  int8_t sfh_cfa_fixed_ra_offset;
  // Size in bytes of the auxiliary header, allowing extensibility
  uint8_t sfh_auxhdr_len;
  // Numbers of FDEs and FREs
  uint32_t sfh_num_fdes;
  uint32_t sfh_num_fres;
  // Size in bytes of FREs
  uint32_t sfh_fre_len;
  // Offsets in bytes of FDEs and FREs
  uint32_t sfh_fdeoff;
  uint32_t sfh_freoff;
};

While magic numbers are a popular choice for file formats, they deviate from established ELF conventions, which simply use the section type for distinction.

The version field resembles the version fields in DWARF section headers. SFrame will likely evolve over time, unlike ELF's more stable control structures. This means producers and consumers will probably have to evolve in lockstep, which strengthens the case for internal versioning. An internal version field would allow linkers to upgrade or ignore input pieces with unsupported older versions, providing more flexibility in handling version mismatches.
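As a rough illustration of how a consumer might use the header, the sketch below checks the preamble and computes where the FDE and FRE arrays begin. The magic value 0xdee2, the 20-byte FDE size, and the rule that sfh_fdeoff and sfh_freoff count from the end of the header (including any auxiliary header) reflect my reading of the SFrame version 2 definitions and should be treated as assumptions. The names locate_tables and SFrameLayout are hypothetical, and the section is assumed to be in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct [[gnu::packed]] sframe_header {
  struct {
    uint16_t sfp_magic;
    uint8_t sfp_version;
    uint8_t sfp_flags;
  } sfh_preamble;
  uint8_t sfh_abi_arch;
  int8_t sfh_cfa_fixed_fp_offset;
  int8_t sfh_cfa_fixed_ra_offset;
  uint8_t sfh_auxhdr_len;
  uint32_t sfh_num_fdes;
  uint32_t sfh_num_fres;
  uint32_t sfh_fre_len;
  uint32_t sfh_fdeoff;
  uint32_t sfh_freoff;
};
static_assert(sizeof(sframe_header) == 28, "packed header is 28 bytes");

constexpr uint16_t kSFrameMagic = 0xdee2;  // assumed magic value
constexpr size_t kFdeSize = 20;            // assumed size of one FDE

struct SFrameLayout {
  sframe_header hdr;
  size_t fde_begin;  // byte offset of the FDE array within the section
  size_t fre_begin;  // byte offset of the FRE data within the section
};

// Hypothetical helper: returns false if the section is truncated, has an
// unexpected magic or version, or if its tables do not fit in `size` bytes.
bool locate_tables(const uint8_t *data, size_t size, uint8_t expected_version,
                   SFrameLayout *out) {
  assert(data && out);
  if (size < sizeof(sframe_header))
    return false;
  std::memcpy(&out->hdr, data, sizeof(sframe_header));
  const sframe_header &h = out->hdr;
  if (h.sfh_preamble.sfp_magic != kSFrameMagic ||
      h.sfh_preamble.sfp_version != expected_version)
    return false;
  // Assumption: both offsets are relative to the end of header + aux header.
  const size_t body = sizeof(sframe_header) + h.sfh_auxhdr_len;
  out->fde_begin = body + h.sfh_fdeoff;
  out->fre_begin = body + h.sfh_freoff;
  if (out->fde_begin > size ||
      (size - out->fde_begin) / kFdeSize < h.sfh_num_fdes)
    return false;
  if (out->fre_begin > size || size - out->fre_begin < h.sfh_fre_len)
    return false;
  return true;
}
```

A consumer that only understands one version can pass it as expected_version and skip any section for which the helper returns false, which is the kind of "ignore unsupported pieces" behavior a version field makes possible.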

Data structures

Function Descriptor Entries (FDEs)

Function Descriptor Entries serve as the bridge between functions and their unwinding information. Each FDE describes a function's location and provides a direct link to its corresponding Frame Row Entries (FREs), which contain the actual unwinding data.

struct [[gnu::packed]] sframe_func_desc_entry {
  int32_t sfde_func_start_address;
  uint32_t sfde_func_size;
  uint32_t sfde_func_start_fre_off;
  uint32_t sfde_func_num_fres;
  // bits 0-3 fretype: sfre_start_address type
  // bit 4 fdetype: SFRAME_FDE_TYPE_PCINC or SFRAME_FDE_TYPE_PCMASK
  // bit 5 pauth_key: (AArch64 only) the signing key for the return address
  uint8_t sfde_func_info;
  // The size of the repetitive code block for SFRAME_FDE_TYPE_PCMASK; used by .plt
  uint8_t sfde_func_rep_size;
  uint16_t sfde_func_padding2;
};

The current design has room for optimization. The sfde_func_num_fres field uses a full 32 bits, which is wasteful for most functions. We could use uint16_t instead, requiring exceptionally large functions to be split across multiple FDEs.

It's important to note that SFrame's function concept represents code ranges rather than logical program functions. This distinction becomes particularly relevant with compiler optimizations like hot-cold splitting, where a single logical function may span multiple non-contiguous code ranges, each requiring its own FDE.

The padding field sfde_func_padding2 represents unnecessary overhead in modern architectures where unaligned memory access performs efficiently, making the alignment benefits negligible.
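To put numbers on the two observations above, here is a sketch of what a slimmer entry could look like. compact_func_desc_entry and to_compact are hypothetical names and are not part of any SFrame version; the only point is that narrowing the FRE count to 16 bits and dropping the padding shrinks each entry from 20 to 16 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Layout described in this post: 20 bytes per FDE.
struct [[gnu::packed]] sframe_func_desc_entry {
  int32_t sfde_func_start_address;
  uint32_t sfde_func_size;
  uint32_t sfde_func_start_fre_off;
  uint32_t sfde_func_num_fres;
  uint8_t sfde_func_info;
  uint8_t sfde_func_rep_size;
  uint16_t sfde_func_padding2;
};

// Hypothetical alternative: 16-bit FRE count, no padding.
struct [[gnu::packed]] compact_func_desc_entry {
  int32_t func_start_address;
  uint32_t func_size;
  uint32_t func_start_fre_off;
  uint16_t func_num_fres;  // larger ranges must be split across several FDEs
  uint8_t func_info;
  uint8_t func_rep_size;
};

static_assert(sizeof(sframe_func_desc_entry) == 20, "current layout");
static_assert(sizeof(compact_func_desc_entry) == 16, "hypothetical layout");

// Converts an entry, assuming the producer already split oversized ranges.
compact_func_desc_entry to_compact(const sframe_func_desc_entry &fde) {
  assert(fde.sfde_func_num_fres <= UINT16_MAX && "split the range first");
  return {fde.sfde_func_start_address,
          fde.sfde_func_size,
          fde.sfde_func_start_fre_off,
          static_cast<uint16_t>(fde.sfde_func_num_fres),
          fde.sfde_func_info,
          fde.sfde_func_rep_size};
}
```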

To enable binary search on sfde_func_start_address, FDEs must maintain a fixed size, which precludes the use of variable-length integer encodings like PrefixVarInt.
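A minimal sketch of that lookup, assuming the FDEs are sorted by sfde_func_start_address and that the caller has already converted the program counter into the same coordinate system as that field. find_fde is a hypothetical helper, and SFRAME_FDE_TYPE_PCMASK entries would need extra handling that is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct [[gnu::packed]] sframe_func_desc_entry {
  int32_t sfde_func_start_address;
  uint32_t sfde_func_size;
  uint32_t sfde_func_start_fre_off;
  uint32_t sfde_func_num_fres;
  uint8_t sfde_func_info;
  uint8_t sfde_func_rep_size;
  uint16_t sfde_func_padding2;
};

// Returns the FDE whose [start, start + size) range contains `pc`, or nullptr.
const sframe_func_desc_entry *
find_fde(const std::vector<sframe_func_desc_entry> &fdes, int64_t pc) {
  assert(std::is_sorted(fdes.begin(), fdes.end(),
                        [](const sframe_func_desc_entry &a,
                           const sframe_func_desc_entry &b) {
                          return a.sfde_func_start_address <
                                 b.sfde_func_start_address;
                        }));
  // Find the first FDE that starts after pc; the candidate is the one before.
  auto it = std::upper_bound(
      fdes.begin(), fdes.end(), pc,
      [](int64_t value, const sframe_func_desc_entry &fde) {
        return value < fde.sfde_func_start_address;
      });
  if (it == fdes.begin())
    return nullptr;
  --it;
  const int64_t start = it->sfde_func_start_address;
  return pc - start < static_cast<int64_t>(it->sfde_func_size) ? &*it : nullptr;
}
```

Because every element has the same size, the array can be searched in place in the mapped section; a variable-length encoding would force a linear scan or a separate index.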

Frame Row Entries (FREs)

Frame Row Entries contain the actual unwinding information for specific program counter ranges within a function. The template design allows for different address sizes based on the function's characteristics.

template <class AddrType>
struct [[gnu::packed]] sframe_frame_row_entry {
  // If the fdetype is SFRAME_FDE_TYPE_PCINC, this is an offset relative to sfde_func_start_address
  AddrType sfre_start_address;
  // bit 0 fre_cfa_base_reg_id: define BASE_REG as either FP or SP
  // bits 1-4 fre_offset_count: typically 1 to 3, describing CFA, FP, and RA
  // bits 5-6 fre_offset_size: byte size of offset entries (1, 2, or 4 bytes)
  sframe_fre_info sfre_info;
};

Each FRE contains variable-length stack offsets stored as trailing data. The fre_offset_size field determines whether offsets use 1, 2, or 4 bytes (int8_t, int16_t, or int32_t; the values are signed because FP and RA slots typically sit below the CFA), allowing optimal space usage based on stack frame sizes.
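The following sketch decodes one FRE together with its trailing offsets. It rests on a few assumptions that are my reading of the format rather than something stated above: a set bit 0 selects SP as the base register, the fre_offset_size codes 0, 1, and 2 mean 1, 2, and 4 bytes, and offsets are read as signed values. decode_fre and DecodedFre are hypothetical names, and the data is assumed to be in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct DecodedFre {
  uint32_t start_address;        // sfre_start_address, widened
  bool base_is_sp;               // bit 0 of sfre_info (assumed: 1 means SP)
  std::vector<int32_t> offsets;  // trailing stack offsets, sign-extended
  size_t size;                   // bytes consumed, to step to the next FRE
};

// Loads a signed integer of type T in host byte order and widens it.
template <class T>
int32_t load_signed(const uint8_t *p) {
  T v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// AddrType is uint8_t, uint16_t, or uint32_t, chosen by the FDE's fretype.
template <class AddrType>
DecodedFre decode_fre(const uint8_t *p) {
  DecodedFre fre;
  AddrType start;
  std::memcpy(&start, p, sizeof start);
  fre.start_address = start;
  const uint8_t info = p[sizeof start];
  fre.base_is_sp = info & 1;
  const unsigned count = (info >> 1) & 0xf;    // bits 1-4
  const unsigned size_code = (info >> 5) & 3;  // bits 5-6
  assert(size_code <= 2 && "unknown fre_offset_size encoding");
  const size_t width = size_t(1) << size_code;
  const uint8_t *q = p + sizeof start + 1;
  for (unsigned i = 0; i < count; ++i, q += width) {
    fre.offsets.push_back(width == 1   ? load_signed<int8_t>(q)
                          : width == 2 ? load_signed<int16_t>(q)
                                       : load_signed<int32_t>(q));
  }
  fre.size = static_cast<size_t>(q - p);
  return fre;
}
```

For example, the four bytes 04 05 10 f0 decode with decode_fre<uint8_t> to a row starting at offset 4 with an SP-based CFA and two one-byte offsets, 16 and -16.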

Architecture-specific stack offsets

SFrame adapts to different processor architectures by varying its offset encoding to match their respective calling conventions and architectural constraints.

x86-64

The x86-64 implementation takes advantage of the architecture's predictable stack layout:

  • First offset: Encodes CFA as BASE_REG + offset
  • Second offset (if present): Encodes FP as CFA + offset
  • Return address: Computed implicitly as CFA + sfh_cfa_fixed_ra_offset (using the header field)

AArch64

AArch64's more flexible calling conventions require explicit return address tracking:

  • First offset: Encodes CFA as BASE_REG + offset
  • Second offset: Encodes return address as CFA + offset
  • Third offset (if present): Encodes FP as CFA + offset

The explicit return address encoding accommodates AArch64's variable stack layouts and link register usage patterns.
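The x86-64 and AArch64 rules above can be summarized in one sketch that turns a row's offsets into memory locations. Arch, Regs, FrameLocations, and locate_frame are hypothetical names. The treatment of an AArch64 row with a single offset (return address still in the link register) is an assumption on my part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Arch { X86_64, AArch64 };

struct Regs {
  uint64_t sp, fp;
};

struct FrameLocations {
  uint64_t cfa;
  bool ra_in_memory;  // false: assumed to still be in the link register
  uint64_t ra_slot;   // address holding the return address
  bool fp_saved;
  uint64_t fp_slot;   // address holding the caller's frame pointer
};

// `offsets` are the FRE's trailing stack offsets; `fixed_ra_offset` is
// sfh_cfa_fixed_ra_offset from the header (only used for x86-64).
FrameLocations locate_frame(Arch arch, bool base_is_sp,
                            const std::vector<int32_t> &offsets,
                            const Regs &regs, int8_t fixed_ra_offset) {
  assert(!offsets.empty() && "every row needs at least the CFA offset");
  FrameLocations out{};
  // First offset: CFA = BASE_REG + offset.
  out.cfa = (base_is_sp ? regs.sp : regs.fp) + offsets[0];
  size_t next = 1;
  if (arch == Arch::X86_64) {
    // Return address slot is implicit: CFA + sfh_cfa_fixed_ra_offset.
    out.ra_in_memory = true;
    out.ra_slot = out.cfa + fixed_ra_offset;
  } else if (next < offsets.size()) {
    // AArch64: second offset gives the return address slot.
    out.ra_in_memory = true;
    out.ra_slot = out.cfa + offsets[next++];
  }
  // Remaining offset, if present: FP = CFA + offset.
  if (next < offsets.size()) {
    out.fp_saved = true;
    out.fp_slot = out.cfa + offsets[next];
  }
  return out;
}
```

With sp = 0x7000, an SP-based row with offsets {16, -16}, and a fixed return address offset of -8, the x86-64 path yields CFA 0x7010, return address slot 0x7008, and frame pointer slot 0x7000.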

s390x

TODO

.eh_frame and .sframe

SFrame reduces size compared to .eh_frame plus .eh_frame_hdr by:

  • Eliminating .eh_frame_hdr through sorted sfde_func_start_address fields
  • Replacing CIE pointers with direct FDE-to-FRE references
  • Using variable-width sfre_start_address fields (1 or 2 bytes) for small functions
  • Storing start addresses instead of address ranges
  • Start addresses in a small function use 1 or 2 byte fields, more efficient than .eh_frame initial_location, which needs at least 4 bytes (DW_EH_PE_sdata4).
  • Hard-coding stack offsets rather than using flexible register specifications

However, the bytecode design of .eh_frame can sometimes be more efficient than .sframe, as demonstrated on x86-64.


SFrame serves as a specialized complement to .eh_frame rather than a complete replacement. The current version does not include personality routines, Language Specific Data Area (LSDA) information, or the ability to encode extra callee-saved registers. While these constraints make SFrame ideal for profilers and debuggers, they prevent it from supporting C++ exception handling, where libstdc++/libc++abi requires the full .eh_frame feature set.

In practice, executables and shared objects will likely contain all three sections:

  • .eh_frame: Complete unwinding information for exception handling
  • .eh_frame_hdr: Fast lookup table for .eh_frame
  • .sframe: Compact unwinding information for profilers

The auxiliary header, currently unused, provides a pathway for future enhancements. It could potentially accommodate .eh_frame augmentation data such as personality routines, language-specific data areas (LSDAs), and signal frame handling, bridging some of the current functionality gaps.

Large text section support

The sfde_func_start_address field uses a signed 32-bit offset to reference functions, providing a ±2GB addressing range from the field's location. This signed encoding offers flexibility in section ordering: .sframe can be placed either before or after text sections.

However, this approach faces limitations with large binaries, particularly when LLVM generates .ltext sections for x86-64. The typical section layout creates significant gaps between .sframe and .ltext:

.ltext      // Large text section
.lrodata    // Large read-only data
.rodata     // Regular read-only data
// .eh_frame and .sframe position
.text       // Regular text section
.data
.bss
.ldata      // Large data
.lbss       // Large BSS
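To make the limit concrete, the sketch below checks whether the distance from an sfde_func_start_address field to a function fits in a signed 32-bit value. The helper names and the addresses in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// True if (func_addr - field_addr) is representable as int32_t.
inline bool fits_pcrel32(uint64_t field_addr, uint64_t func_addr) {
  const int64_t delta = static_cast<int64_t>(func_addr - field_addr);
  return delta >= INT32_MIN && delta <= INT32_MAX;
}

// Encodes the field value; the caller must have checked the range.
inline int32_t encode_pcrel32(uint64_t field_addr, uint64_t func_addr) {
  assert(fits_pcrel32(field_addr, func_addr) && "out of +/-2GiB range");
  return static_cast<int32_t>(static_cast<int64_t>(func_addr - field_addr));
}
```

For instance, a field at 0x200000 referring to a function at 0x1000 encodes as -0x1ff000, while a field at 0x100000000 cannot reach a function at 0x1000. With the layout above, large .lrodata and .rodata sections sitting between .ltext and .sframe can push functions in .ltext out of range in exactly this way.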

Linking and execution views

SFrame employs a unified indexed format across both relocatable files (linking view) and executable files (execution view). While this design consistency might look elegant, it introduces significant complications in the toolchain implementation.

The current Binutils implementation enforces a single element structure within each .sframe section, regardless of whether it's in a relocatable object or final executable. This differs from DWARF sections, which support multiple concatenated elements (each with their own header and body) within a single section.

This design choice stems largely from Linux kernel requirements, where kernel modules are relocatable files created with ld -r. The kernel's SFrame support expects each module to contain a single, indexed format for efficient runtime processing.

GNU ld reflects this constraint by merging all input .sframe sections into a single indexed element in the output, even when producing relocatable files. This deviates from the relocatable-linking convention of suppressing synthetic section finalization.

I believe the next version should distinguish between linking and execution views:

  • Linking view: Assemblers could produce a simpler format, omitting fields only needed for indexing
  • Linkers should concatenate .sframe input sections by default.
  • A new option like --sframe-index could ask linkers to build the full indexed format when creating executables and shared object files, similar to --gdb-index and --debug-names. To work with the Linux kernel, ld -r --sframe-index should work as well.

Fields that could be omitted from the linking view include index-specific metadata like sfh_num_fdes, sfh_num_fres, sfh_fdeoff, sfh_freoff.
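A purely illustrative sketch of what the indexing step might do under this proposal: take several concatenated pieces, append their FRE bytes, rebase each FDE's FRE offset, and sort the FDEs by start address. Fde, Piece, Index, and build_index are hypothetical simplified types, not the on-disk structures, and the sketch assumes start addresses have already been resolved to comparable values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Fde {
  int64_t start;      // resolved function start address
  uint32_t size;
  uint32_t fre_off;   // offset of the first FRE within the owning FRE blob
  uint32_t num_fres;
};

struct Piece {  // one input .sframe piece in the linking view
  std::vector<Fde> fdes;
  std::vector<uint8_t> fres;
};

struct Index {  // merged, searchable execution view
  std::vector<Fde> fdes;
  std::vector<uint8_t> fres;
};

Index build_index(const std::vector<Piece> &pieces) {
  Index out;
  for (const Piece &piece : pieces) {
    assert(out.fres.size() <= UINT32_MAX && "FRE data exceeds 32-bit offsets");
    const uint32_t base = static_cast<uint32_t>(out.fres.size());
    out.fres.insert(out.fres.end(), piece.fres.begin(), piece.fres.end());
    for (Fde fde : piece.fdes) {
      fde.fre_off += base;  // rebase into the merged FRE blob
      out.fdes.push_back(fde);
    }
  }
  // Sorting by start address is what makes binary search possible later.
  std::stable_sort(out.fdes.begin(), out.fdes.end(),
                   [](const Fde &a, const Fde &b) { return a.start < b.start; });
  return out;
}
```

The counts and offsets that the header carries today fall out of this step (the sizes of the two vectors), which is why they would not need to be present in the linking view.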

Section group compliance issues

The current monolithic .sframe design creates ELF specification violations when dealing with COMDAT section groups. GNU Assembler generates a single .sframe section containing relocations to STB_LOCAL symbols from multiple text sections, including those in different section groups.

This violates the ELF section group rule, which states:

A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.

The problem manifests when inline functions are deduplicated:

cat > a.cc <<'eof'
[[gnu::noinline]] inline int inl() { return 0; }
auto *fa = inl;
eof
cat > b.cc <<'eof'
[[gnu::noinline]] inline int inl() { return 0; }
auto *fb = inl;
eof
~/opt/gcc-15/bin/g++ -Wa,--gsframe -c a.cc b.cc

Linkers correctly reject this violation:

% ld.lld a.o b.o
ld.lld: error: relocation refers to a discarded section: .text._Z3inlv
>>> defined in b.o
>>> referenced by b.cc
>>> b.o:(.sframe+0x1c)

% gold a.o b.o
b.o(.sframe+0x1c): error: relocation refers to local symbol ".text._Z3inlv" [2], which is defined in a discarded section
section group signature: "inl()"
prevailing definition is from a.o

(In 2020, I reported a similar issue for GCC -fpatchable-function-entry=.)

The solution requires restructuring the assembler's output strategy. Instead of creating a monolithic .sframe section, the assembler should generate individual SFrame sections corresponding to each text section. When a text section belongs to a COMDAT group, its associated SFrame section must join the same group. For standalone text sections, the SHF_LINK_ORDER flag should establish the proper association.

This approach would create multiple SFrame sections within relocatable files, making the size optimization benefits of a simplified linking view format even more compelling. While this comes with the overhead of additional section headers (where each Elf64_Shdr consumes 64 bytes), it's a cost we should pay to be a good ELF citizen. This reinforces the value of my section header reduction proposal.

Linker relaxation considerations

Since .sframe carries the SHF_ALLOC flag, it affects text section addresses and consequently influences linker relaxation on architectures like RISC-V and LoongArch.

If variable-length encoding is introduced to the format, .sframe would behave as an address-dependent section similar to .relr.dyn. However, this dependency should not pose significant implementation challenges.

Endianness considerations

The SFrame format currently supports endianness variants, which complicates toolchain implementation. While runtime consumers typically target a single endianness, development tools must handle both variants to support cross-compilation workflows.

The endianness discussion in The future of 32-bit support in the kernel reinforces my belief in preferring universal little-endian for new formats. A universal little-endian approach would reduce implementation complexity by eliminating the need for:

  • Endianness-aware function calls like read32le(config, p) where config->endian specifies the object file's byte order
  • Template-based abstractions such as template <class Endian> that must wrap every data access function

Instead, toolchain code could use straightforward calls like read32le(p), streamlining both implementation and maintenance.
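A small sketch of the contrast, using hypothetical helper names: read32le is all a little-endian-only format needs, while a format with both variants forces every caller to thread an endianness value through, as read32 does here.

```cpp
#include <cassert>
#include <cstdint>

enum class Endian { Little, Big };

// Host-independent little-endian load; works on unaligned data as well.
inline uint32_t read32le(const uint8_t *p) {
  assert(p);
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}

// Host-independent big-endian load, needed only for big-endian variants.
inline uint32_t read32be(const uint8_t *p) {
  assert(p);
  return uint32_t(p[3]) | uint32_t(p[2]) << 8 | uint32_t(p[1]) << 16 |
         uint32_t(p[0]) << 24;
}

// What callers need when the format has both variants: the object file's
// byte order must be passed to every access.
inline uint32_t read32(Endian e, const uint8_t *p) {
  return e == Endian::Little ? read32le(p) : read32be(p);
}
```

Optimizing compilers generally recognize the shift-and-or pattern and emit a single load, plus a byte reversal where the host order differs.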

This approach remains efficient even on big-endian architectures like IBM z/Architecture and POWER. z/Architecture's LOAD REVERSED instructions, for instance, handle byte swapping with minimal overhead, often requiring no additional instructions beyond normal loads. While slight performance differences may exist compared to native endian operations, the toolchain simplification benefits generally outweigh these concerns.

#define WIDTH(x) \
  typedef __UINT##x##_TYPE__ [[gnu::aligned(1)]] uint##x; \
  uint##x load_inc##x(uint##x *p) { return *p+1; } \
  uint##x load_bswap_inc##x(uint##x *p) { return __builtin_bswap##x(*p)+1; }; \
  uint##x load_eq##x(uint##x *p) { return *p==3; } \
  uint##x load_bswap_eq##x(uint##x *p) { return __builtin_bswap##x(*p)==3; };

WIDTH(16);
WIDTH(32);
WIDTH(64);

However, I understand that my opinion is probably not popular within the object file format community and faces resistance from stakeholders with significant big-endian investments.

Summary

SFrame represents a pragmatic approach to stack unwinding that trades flexibility for compactness. I am eager to learn the pros and cons versus frame-pointer-based unwinding.

While SFrame successfully reduces size compared to .eh_frame and .eh_frame_hdr for profiling use cases, several design decisions create implementation complexity that should be addressed in future versions:

  • Uncertainty remains about SFrame's potential as a full .eh_frame replacement
  • The unified linking/execution view complicates toolchain implementation unnecessarily
  • Section group compliance issues present significant concerns for linker developers
  • Lack of large text section support