Updated in 2024-11.
I wrote an article a few weeks ago to introduce stack unwinding in detail. Today I will introduce C++ exception handling, an application of stack unwinding. Exception handling has several ABIs (an ABI governs the interoperability of C++ implementations), the most widely used of which is Itanium C++ ABI: Exception Handling.
Itanium C++ ABI: Exception Handling
Simplified exception handling process (from throw to catch):
- Call __cxa_allocate_exception to allocate space to store the exception object and the exception header __cxa_exception
- Jump to __cxa_throw, set the __cxa_exception fields and then jump to _Unwind_RaiseException
- In _Unwind_RaiseException, execute the search phase: call personality routines to find a matching try catch (type matching)
- In _Unwind_RaiseException, execute the cleanup phase: call personality routines to find stack frames containing out-of-scope variables, and for each stack frame, jump to its landing pad to execute the destructors. The landing pad uses _Unwind_Resume to resume the cleanup phase
- The cleanup phase executed by _Unwind_RaiseException jumps to the landing pad corresponding to the matching try catch
- The landing pad calls __cxa_begin_catch, executes the catch code, and then calls __cxa_end_catch
- __cxa_end_catch decreases the handler count of the exception object, and if it reaches zero, it also destroys the exception object
Note: each stack frame may use a different personality routine. It is common that all frames share the same routine, though.
Among these steps, _Unwind_RaiseException is responsible
for stack unwinding and is language independent. The language-related
concepts (catch block, out-of-scope variable) in stack unwinding are
interpreted/encapsulated by the personality. This is a key idea that
makes the ABI applicable to other languages and allows other languages
to be mixed with C++.
Therefore, Itanium C++ ABI: Exception Handling is divided into Level
1 Base ABI and Level 2 C++ ABI. Base ABI describes the
language-independent stack unwinding part and defines the
_Unwind_* API. Common implementations are:
- libgcc: libgcc_s.so.1 and libgcc_eh.a
- Multiple libraries named libunwind (libunwind.so or libunwind.a). If you use Clang, you can use --rtlib=compiler-rt --unwindlib=libunwind to link against libunwind. Both llvm-project/libunwind and nongnu.org/libunwind can be used
The C++ ABI is related to the C++ language and defines the
__cxa_* API (__cxa_allocate_exception,
__cxa_throw, __cxa_begin_catch, etc.). Common
implementations are:
- libsupc++, part of libstdc++
- libc++abi in llvm-project
The C++ standard library implementation in llvm-project, libc++, can leverage libc++abi, libcxxrt or libsupc++, but libc++abi is recommended.
Level 1 Base ABI
Data structures
The main data structure is:
1 | // Level 1 |
1 | int main() { |
exception_class and exception_cleanup are
set by the API that throws exceptions in Level 2. The Level 1 API does
not process exception_class, but passes it to the
personality routine. Personality routines use this value to distinguish
native and foreign exceptions.
libc++abi's __cxa_throw sets exception_class to a uint64_t representing "CLNGC++\0". libsupc++ uses a uint64_t representing "GNUCC++\0". The ABI requires that the lower bits contain "C++\0". Exceptions thrown by libstdc++ will be treated as foreign exceptions by libc++abi. Only catch (...) can catch foreign exceptions.
The exception propagation implementation uses another exception_class identifier to represent dependent exceptions.
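To make the identifier concrete, here is a minimal sketch that packs the eight characters into a uint64_t with the first character in the most significant byte, which reproduces the two vendor values named above. The helper names make_exception_class and is_cxx_exception are made up for this illustration and are not part of either runtime.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack 8 characters (7 visible ones plus the trailing
// NUL of the literal), first character in the most significant byte.
constexpr uint64_t make_exception_class(const char (&s)[8]) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v = (v << 8) | static_cast<unsigned char>(s[i]);
  return v;
}

constexpr uint64_t kClangClass = make_exception_class("CLNGC++");
constexpr uint64_t kGnuClass = make_exception_class("GNUCC++");

// Hypothetical check: the low four bytes hold the language part "C++\0".
constexpr bool is_cxx_exception(uint64_t cls) {
  return (cls & 0xffffffffu) == 0x432b2b00u;
}
```

Both values share the language part but differ in the vendor part, which is why one runtime sees the other's exceptions as foreign.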
exception_cleanup stores the destroying delete function
of this exception object, which is used by __cxa_end_catch
to destroy a foreign exception.
The private unwinder state (private_1 and
private_2) in an exception object should be neither read by
nor written to by personality routines or other parts of the
language-specific runtime.
The information required for the Unwind operation (for a given IP/SP,
how to obtain the register information such as the IP/SP of the upper
stack frame) is implementation-dependent, and Level 1 ABI does not
define it. In the ELF system, .eh_frame and
.eh_frame_hdr (PT_EH_FRAME program header)
store unwind information. See Stack
unwinding.
Level 1 API
_Unwind_Reason_Code _Unwind_RaiseException(_Unwind_Exception *obj);
Perform stack unwinding for exceptions. It is noreturn under normal
circumstances, and will give control to matched catch handlers (catch
block) or non-catch handlers (code blocks that need to execute
destructors) like longjmp. It is a two-phase process, divided into phase
1 (search phase) and phase 2 (cleanup phase).
- In the search phase, find the matched catch handler and record the stack pointer in private_2
  - Trace the call chain based on IP/SP and other saved registers
  - For each stack frame, skip if there is no personality routine; call if there is (actions set to _UA_SEARCH_PHASE)
  - If personality returns _URC_CONTINUE_UNWIND, continue searching
  - If personality returns _URC_HANDLER_FOUND, it means that a matched catch handler or unmatched exception specification is found, and the search stops
- In the cleanup phase, jump to non-catch handlers (usually local variable destructors), and then transfer the control to the matched catch handler located in the search phase
  - Trace the call chain based on IP/SP and other saved registers
  - For each stack frame, skip if there is no personality routine; call if there is one (actions are set to _UA_CLEANUP_PHASE, and the stack frame marked by search phase will also set _UA_HANDLER_FRAME)
  - If personality returns _URC_CONTINUE_UNWIND, it means there is no landing pad, continue to unwind
  - If personality returns _URC_INSTALL_CONTEXT, it means there is a landing pad, jump to the landing pad
  - For intermediate stack frames that are not marked in the search phase, the landing pad performs cleanup work (usually destructors of out-of-scope variables), and calls _Unwind_Resume to jump back to the cleanup phase
  - For the stack frame marked by the search phase, the landing pad calls __cxa_begin_catch, then executes the code in the catch block, and finally calls __cxa_end_catch to destroy the exception object
The point of the two-phase process is to avoid any actual stack
unwinding if there is no handler. If there are just cleanup frames, an
abort function can be called. Cleanup frames are also less expensive
than matching a handler. However, parsing .gcc_except_table
is probably not much less expensive than additionally matching a
handler:)
1 | static _Unwind_Reason_Code unwind_phase1(unw_context_t *uc, _Unwind_Context *ctx, |
C++ does not support resumptive exception handling (correcting the exceptional condition and resuming execution at the point where it was raised), so the two-phase process is not necessary, but two-phase allows C++ and other languages to coexist on the call stack.
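The two-phase idea can be sketched with a toy model. This is not how libgcc or libunwind is written: ToyFrame and toy_raise are invented names, and the real unwinder asks a personality routine per frame instead of reading two booleans. The point it shows is that nothing is torn down unless the search phase finds a handler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: each frame states whether it has a cleanup or a handler.
struct ToyFrame {
  std::string name;
  bool has_cleanup;
  bool has_handler;
};

// frames[0] is the innermost frame. Returns a log of what would happen.
std::vector<std::string> toy_raise(const std::vector<ToyFrame> &frames) {
  std::vector<std::string> log;
  // Phase 1 (search): walk outward, do not modify anything.
  std::size_t handler = frames.size();
  for (std::size_t i = 0; i < frames.size(); ++i) {
    if (frames[i].has_handler) {
      handler = i;
      break;
    }
  }
  if (handler == frames.size()) {
    log.push_back("end of stack: no unwinding performed");
    return log;
  }
  // Phase 2 (cleanup): run cleanups of intermediate frames, then the handler.
  for (std::size_t i = 0; i < handler; ++i)
    if (frames[i].has_cleanup)
      log.push_back("cleanup " + frames[i].name);
  log.push_back("catch in " + frames[handler].name);
  return log;
}
```

With frames foo, bar (cleanup) and qux (handler), the log is "cleanup bar" followed by "catch in qux". With no handler anywhere, no cleanup entry is produced at all.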
_Unwind_Reason_Code _Unwind_ForcedUnwind(_Unwind_Exception *obj, _Unwind_Stop_Fn stop, void *stop_parameter);
Execute forced unwinding: Skip the search phase and perform a slightly
different cleanup phase. private_2 is used as the parameter
of the stop function. It is similar to a foreign exception but is rarely
used.
void _Unwind_Resume(_Unwind_Exception *obj); Continue
the unwind process of phase 2. It is similar to longjmp, is noreturn,
and is the only Level 1 API that is directly called by the compiler. The
compiler usually calls this function at the end of non-catch
handlers.
void _Unwind_DeleteException(_Unwind_Exception *obj);
Destroy the specified exception object. It is the only Level 1 API that
handles exception_cleanup and is called by
__cxa_end_catch.
Many implementations provide extensions. Notably
_Unwind_Reason_Code _Unwind_Backtrace(_Unwind_Trace_Fn callback, void *ref);
is another special unwind process: it ignores personality and notifies
an external callback of stack frame information.
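As a small illustration of this extension, the sketch below counts the frames that _Unwind_Backtrace visits. It assumes a GCC or Clang toolchain whose <unwind.h> declares _Unwind_Backtrace and _Unwind_GetIP and a target with unwind tables available. The names count_frame and count_stack_frames are mine.

```cpp
#include <cassert>
#include <unwind.h>

// Callback invoked once per visited frame; arg points to a frame counter.
static _Unwind_Reason_Code count_frame(struct _Unwind_Context *ctx, void *arg) {
  if (_Unwind_GetIP(ctx) != 0)
    ++*static_cast<int *>(arg);
  return _URC_NO_REASON;  // keep walking towards the outermost frame
}

// Returns the number of frames seen, starting from this function.
__attribute__((noinline)) int count_stack_frames() {
  int n = 0;
  _Unwind_Backtrace(count_frame, &n);
  return n;
}
```

No personality routine is consulted here, so the walk works even through frames that have no LSDA.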
Level 2 C++ ABI
This part deals with language-related concepts such as throw, catch blocks, and out-of-scope variable destructors in C++.
Data structures
Each thread has a global stack of currently caught exceptions, linked
through the nextException field of the exception header.
caughtExceptions stores the most recent exception on the
stack, and __cxa_exception::nextException points to the
next exception in the stack.
1 | struct __cxa_eh_globals { |
1 | int main() { |
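The stack behavior of caught exceptions can be observed from portable code. In this sketch (function names are mine), std::current_exception reports the most recently caught exception, and once the inner handler ends the outer exception is on top again.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Must be called from inside a handler: reports the type of the exception
// currently on top of the caught-exception stack.
static char classify_current() {
  try {
    std::rethrow_exception(std::current_exception());
  } catch (int) {
    return 'i';
  } catch (char) {
    return 'c';
  } catch (...) {
    return '?';
  }
}

std::string nested_handlers() {
  std::string seen;
  try {
    throw 1;
  } catch (int) {
    seen += classify_current();    // the int is on top
    try {
      throw 'x';
    } catch (char) {
      seen += classify_current();  // the char was pushed on top of the int
    }
    seen += classify_current();    // the char was popped, the int is back
  }
  return seen;
}
```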
The definition of __cxa_exception is as follows, and the
end of it stores the _Unwind_Exception defined by Base ABI.
__cxa_exception adds C++ semantic information on the basis
of _Unwind_Exception.
1 | // Level 2 |
The information needed to process the exception (for a given IP, whether it is in a try catch, whether there are out-of-scope variable destructors that need to be executed, whether there is a dynamic exception specification) is called the language-specific data area (LSDA). It is an implementation detail that is not defined by the Level 2 ABI.
Landing pad
A landing pad is a section of code related to exceptions in the text section which performs one of the three tasks:
- In the cleanup clause, call destructors of out-of-scope variables or callbacks registered by __attribute__((cleanup(...))), and then use _Unwind_Resume to resume the cleanup phase
- A catch clause which captures the exception: call the destructors of out-of-scope variables, then call __cxa_begin_catch, execute the catch code, and finally call __cxa_end_catch
- rethrow: call destructors of out-of-scope variables in the catch clause, then call __cxa_end_catch, and then use _Unwind_Resume to resume the cleanup phase
If a try block has multiple catch clauses, there will be multiple
action table entries in series in the language-specific data area, but
the landing pad includes all (conceptually merged) catch clauses. Before
the personality transfers control to the landing pad, it will call
_Unwind_SetGR to set
__builtin_eh_return_data_regno(1) to store switchValue and
inform the landing pad which type matches.
A rethrow is triggered by __cxa_rethrow in the middle of
the execution of the catch code. It needs to destruct the local
variables defined by the catch clause and call
__cxa_end_catch to offset the
__cxa_begin_catch called at the beginning of the catch
clause.
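A quick way to see that a rethrow reuses the exception object instead of creating a new one: catch by reference at both levels and compare addresses. This is a sketch with made-up names, relying on the Itanium-style behavior described here.

```cpp
#include <cassert>
#include <stdexcept>

struct RethrowResult {
  const void *inner;  // address seen by the inner handler
  const void *outer;  // address seen by the outer handler
};

RethrowResult rethrow_addresses() {
  RethrowResult r{nullptr, nullptr};
  try {
    try {
      throw std::runtime_error("boom");
    } catch (const std::runtime_error &e) {
      r.inner = &e;
      throw;  // compiled to a call to __cxa_rethrow
    }
  } catch (const std::runtime_error &e) {
    r.outer = &e;
  }
  return r;
}
```

Both handlers observe the same address, consistent with __cxa_end_catch not destroying the object when it has been marked by __cxa_rethrow.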
.gcc_except_table
The language-specific data area on the ELF platforms is usually
stored in the .gcc_except_table section. This section is
parsed by __gxx_personality_v0 and
__gcc_personality_v0. Its structure is very simple:
- header (@LPStart, @TType and call site encoding, the starting offset of action records)
- call site table: describes the landing pad offset (0 if there is none) and the action record offset (biased by 1, 0 for no action) that should be used for each call site (an address range)
- action table
- type table (referenced by positive switch values)
- dynamic exception specification (deprecated in C++, so rarely used) (referenced by negative switch values)
Here is an example:
1 | .section .gcc_except_table,"a",@progbits |
Each call site record has two values besides call site offset and length: landing pad offset and action record offset.
- The landing pad offset is 0. The action record offset should also be 0. No landing pad
- The landing pad offset is not 0. With landing pad
  - The action record offset is 0, also called cleanup (the description of "cleanup" is somewhat ambiguous, because Level 1 has the term cleanup phase), usually describing local variable destructors and __attribute__((cleanup(...)))
  - The action record offset is not 0. The action record offset points to an action record in the action table. catch or noexcept specifier or exception specification
Each action record has two values:
- switch value (SLEB128): a positive index indicates the TypeInfo of the catch type in the type table; a negative number indicates the offset of the exception specification; 0 indicates a cleanup action which is similar to an action record offset of 0 in the call site record
- offset to the next action record: 0 indicates there is no next action record. This singly linked list form can describe multiple catches or an exception specification list
The offset to the next action record can be used not only as a singly linked list but also as a trie, though such compression is rarely seen in the wild.
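Since switch values are stored as SLEB128 (and several other fields use ULEB128), a decoder is the first thing a .gcc_except_table parser needs. Here is a minimal sketch of the standard LEB128 decoding algorithm. The function names are mine and no bounds checking is done.

```cpp
#include <cassert>
#include <cstdint>

// Decode an unsigned LEB128 value and advance p past it.
uint64_t decode_uleb128(const uint8_t *&p) {
  uint64_t result = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *p++;
    result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return result;
}

// Decode a signed LEB128 value and advance p past it.
int64_t decode_sleb128(const uint8_t *&p) {
  uint64_t result = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *p++;
    result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  // Bit 6 of the last byte is the sign bit: extend it if set.
  if (shift < 64 && (byte & 0x40))
    result |= ~static_cast<uint64_t>(0) << shift;
  return static_cast<int64_t>(result);
}
```

A single byte 0x7f therefore decodes to -1 as a switch value, while 0x02 decodes to the type table index 2.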
The values of landing pad offset/action record offset corresponding to different areas in the program:
- A non-try block without local variable destructors: landing_pad_offset==0 && action_record_offset==0
- A non-try block with local variable destructors: landing_pad_offset!=0 && action_record_offset==0. Phase 2 should stop and call cleanup
- A non-try block with __attribute__((cleanup(...))): landing_pad_offset!=0 && action_record_offset==0. Same as above
- A try block: landing_pad_offset!=0 && action_record_offset!=0. The landing pad points to the code block obtained by concatenating the catch clauses. The action record describes a catch with a switch value greater than 0
- A try block with catch (...): Same as above. The action record has a switch value greater than 0 pointing to an entry with a value of 0 in the type table (indicating catch any)
- In a function with a noexcept specifier, a region that may propagate an exception to the caller: landing_pad_offset!=0 && action_record_offset!=0. The landing pad points to the code block that calls std::terminate. The action record has a switch value greater than 0 pointing to an entry with a value of 0 in the type table (indicating catch any)
- In a function with an exception specifier, a region that may propagate an exception to the caller: landing_pad_offset!=0 && action_record_offset!=0. The landing pad points to the code block that calls __cxa_call_unexpected. The action record has a switch value less than 0 describing an exception specifier list
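The combinations above boil down to a small lookup. The sketch below uses a simplified in-memory table with invented names (ToyCallSite, SiteKind, classify_call_site). The real table stores encoded offsets and is searched by the personality routine.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy call site record, already decoded into plain integers.
struct ToyCallSite {
  uint32_t start;        // start of the range, relative to the function
  uint32_t length;       // length of the range
  uint32_t landing_pad;  // 0: no landing pad
  uint32_t action;       // 0: cleanup only; otherwise biased action offset
};

enum class SiteKind { NotCovered, NoLandingPad, Cleanup, HasAction };

SiteKind classify_call_site(const std::vector<ToyCallSite> &table,
                            uint32_t ip_offset) {
  for (const ToyCallSite &cs : table) {
    if (ip_offset < cs.start || ip_offset >= cs.start + cs.length)
      continue;
    if (cs.landing_pad == 0)
      return SiteKind::NoLandingPad;
    return cs.action == 0 ? SiteKind::Cleanup : SiteKind::HasAction;
  }
  // No range matches: not a valid state for a throwing call site.
  return SiteKind::NotCovered;
}
```

HasAction covers catch clauses, noexcept regions and exception specifications alike. Telling them apart requires following the action record and looking at the switch value.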
Level 2 API
void *__cxa_allocate_exception(size_t thrown_size); The
compiler generates a call to this function for throw A();
and allocates a block of memory to store __cxa_exception
and the A object. __cxa_exception is immediately to the left of
the A object. The following function illustrates the relationship between
the address of the exception object operated on by the program and
__cxa_exception:
static void *thrown_object_from_cxa_exception(__cxa_exception *exception_header) {
  return static_cast<void *>(exception_header + 1);
}
void __cxa_throw(void *thrown, std::type_info *tinfo, void (*destructor)(void *));
Call the above function to find the __cxa_exception header,
and fill in each field
(referenceCount, exception_class, unexpectedHandler, terminateHandler, exceptionType , exceptionDestructor, unwindHeader.exception_cleanup)
and then call _Unwind_RaiseException. This function is
noreturn.
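What the compiler emits for a throw expression can be written by hand. This sketch assumes a toolchain whose <cxxabi.h> declares __cxa_allocate_exception and __cxa_throw in namespace abi (libstdc++ and libc++abi both do). The function name throw_by_hand is mine, and the destructor argument is null because int has no destructor.

```cpp
#include <cassert>
#include <cxxabi.h>
#include <new>
#include <typeinfo>

// Roughly what `throw 0xB612;` expands to, followed by a normal catch.
int throw_by_hand() {
  try {
    // Allocate storage for the thrown int; the __cxa_exception header sits
    // immediately before the returned pointer.
    void *thrown = abi::__cxa_allocate_exception(sizeof(int));
    new (thrown) int(0xB612);
    // Fills in the header and starts unwinding; does not return.
    abi::__cxa_throw(thrown, const_cast<std::type_info *>(&typeid(int)),
                     nullptr);
  } catch (int x) {
    return x;
  }
  return -1;
}
```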
void *__cxa_begin_catch(void *obj); The compiler
generates a call to this function at the beginning of the catch block.
For a native exception,
- Increase handlerCount
- Push the exception onto the global exception stack of the thread and decrease uncaughtExceptions
- Return the adjusted pointer of the exception object
For a foreign exception (there is not necessarily a
__cxa_exception header),
- Push if the global exception stack of the thread is empty, otherwise execute std::terminate (I don’t know if there is a field similar to __cxa_exception::nextException)
- Return static_cast<_Unwind_Exception *>(obj) + 1 (assuming _Unwind_Exception is next to the thrown object)
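The uncaught exception counter that __cxa_throw increases and __cxa_begin_catch decreases is visible through std::uncaught_exceptions (C++17). A small sketch, with names of my own choosing:

```cpp
#include <cassert>
#include <exception>

// Records the number of uncaught exceptions at destruction time.
struct UncaughtProbe {
  int *out;
  ~UncaughtProbe() { *out = std::uncaught_exceptions(); }
};

struct UncaughtCounts {
  int during_unwind;  // sampled by a destructor run from a landing pad
  int in_handler;     // sampled inside the catch block
};

UncaughtCounts observe_uncaught() {
  UncaughtCounts c{-1, -1};
  try {
    UncaughtProbe probe{&c.during_unwind};
    throw 1;
  } catch (int) {
    // __cxa_begin_catch has already run at this point.
    c.in_handler = std::uncaught_exceptions();
  }
  return c;
}
```

When called with no other exception in flight, the destructor sees 1 and the handler sees 0.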
Simplified implementation of __cxa_throw:
void __cxa_throw(void *thrown, std::type_info *tinfo, void (*destructor)(void *)) {
  // The header is immediately to the left of the thrown object.
  __cxa_exception *hdr = (__cxa_exception *)thrown - 1;
  hdr->exceptionType = tinfo;
  hdr->exceptionDestructor = destructor;
  hdr->unexpectedHandler = std::get_unexpected();
  hdr->terminateHandler = std::get_terminate();
  hdr->unwindHeader.exception_class = ...;
  __cxa_get_globals()->uncaughtExceptions++;
  _Unwind_RaiseException(&hdr->unwindHeader);
  // Failed to unwind, e.g. the .eh_frame FDE is absent.
  __cxa_begin_catch(&hdr->unwindHeader);
  std::terminate();
}
void __cxa_end_catch(); is called at the end of the
catch block or when rethrowing. For a native exception:
- Get the current exception from the global exception stack of the thread, decrease handlerCount
- When handlerCount reaches 0, pop the global exception stack of the thread
- Call __cxa_free_exception when handlerCount is decreased to 0 (if this is a dependent exception, decrease referenceCount and call __cxa_free_exception when it reaches 0)
For a foreign exception,
- Call _Unwind_DeleteException
- Execute __cxa_eh_globals::caughtExceptions = nullptr; (due to the nature of __cxa_begin_catch, there is exactly one exception in the stack)
void __cxa_rethrow(); will mark the exception object, so
that when handlerCount is reduced to 0 by
__cxa_end_catch, it will not be destroyed, because this
object will be reused by the cleanup phase restored by
_Unwind_Resume.
Note that, except for __cxa_begin_catch and
__cxa_end_catch, most __cxa_* functions cannot
handle foreign exceptions (they do not have the
__cxa_exception header).
Examples
For the following code:
#include <stdio.h>
struct A { ~A(); };
struct B { ~B(); };
void foo() { throw 0xB612; }
void bar() { B b; foo(); }
void qux() { try { A a; bar(); } catch (int x) { puts(""); } }
The compiled assembly conceptually looks like this:
1 | void foo() { |
Control flow:
- qux calls bar. bar calls foo. foo throws an exception
- foo dynamically allocates a memory block, stores the thrown int and __cxa_exception header, and then executes __cxa_throw
- __cxa_throw fills in other fields of __cxa_exception and calls _Unwind_RaiseException
Next, _Unwind_RaiseException drives the two-phase
process of Level 1.
- _Unwind_RaiseException executes phase 1: search phase
  - For bar, call personality with _UA_SEARCH_PHASE as the actions parameter and return _URC_CONTINUE_UNWIND (no catch handler)
  - For qux, call personality with _UA_SEARCH_PHASE as the actions parameter and return _URC_HANDLER_FOUND (with catch handler)
  - The stack pointer of qux's stack frame is recorded (stored in private_2) and the search stops
- _Unwind_RaiseException executes phase 2: cleanup phase
  - bar's stack frame is not marked by the search phase, call personality with _UA_CLEANUP_PHASE as the actions parameter, return _URC_INSTALL_CONTEXT
  - Jump to the landing pad of bar's stack frame
  - After the cleanup, the landing pad uses _Unwind_Resume to return to the cleanup phase
  - The stack frame of qux is marked by the search phase, call personality with _UA_CLEANUP_PHASE|_UA_HANDLER_FRAME as the actions parameter, and return _URC_INSTALL_CONTEXT
  - Jump to the landing pad of the qux stack frame
  - The landing pad calls __cxa_begin_catch, executes the catch code, and then calls __cxa_end_catch
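The order of events can be checked with a runnable variant of the example. The destructors are given bodies that append to a trace, and qux returns the trace instead of calling puts. These changes are mine and are made only to make the flow observable.

```cpp
#include <cassert>
#include <string>

static std::string g_trace;

struct A { ~A() { g_trace += "~A "; } };
struct B { ~B() { g_trace += "~B "; } };

void foo() { throw 0xB612; }
void bar() { B b; foo(); }  // b is destroyed by bar's cleanup landing pad

std::string qux() {
  g_trace.clear();
  try {
    A a;  // destroyed by qux's landing pad before the catch code runs
    bar();
  } catch (int x) {
    g_trace += "catch " + std::to_string(x);
  }
  return g_trace;
}
```

The trace shows bar's cleanup first, then the destructor in qux's try block, then the catch code: "~B ~A catch 46610".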
__gxx_personality_v0
A personality routine is called by Level 1 ABI (both phase 1 and phase 2) to provide language-related processing. Different languages, implementations or architectures may use different personality routines. Common personalities are as follows:
- __gxx_personality_v0: C++
- __gxx_personality_sj0: sjlj
- __gcc_personality_v0: C -fexceptions for __attribute__((cleanup(...)))
- __CxxFrameHandler3: Windows MSVC
- __gxx_personality_seh0: MinGW-w64 -fseh-exceptions
- __objc_personality_v0: ObjC in the macOS environment
The most common C++ implementation on ELF systems is
__gxx_personality_v0. It is implemented by:
- GCC: libstdc++-v3/libsupc++/eh_personality.cc
- libc++abi: src/cxa_personality.cpp
_Unwind_Reason_Code (*__personality_routine)(int version, _Unwind_Action action, uint64 exceptionClass, _Unwind_Exception *exceptionObject, _Unwind_Context *context);
In the absence of errors:
- For _UA_SEARCH_PHASE, returns
  - _URC_CONTINUE_UNWIND: no lsda, or there is no landing pad, or there is a non-catch handler or a matched exception specification
  - _URC_HANDLER_FOUND: there is a matched catch handler or an unmatched exception specification
- For _UA_CLEANUP_PHASE, returns
  - _URC_CONTINUE_UNWIND: no lsda, or there is no landing pad, or (not produced by a compiler) there is no cleanup action
  - _URC_INSTALL_CONTEXT: the other cases
Before transferring control to the landing pad, the personality will
call _Unwind_SetGR to set two registers (architecture
related, __builtin_eh_return_data_regno(0) and
__builtin_eh_return_data_regno(1)) to store
_Unwind_Exception * and switchValue.
Code:
1 | _unwind_Reason_Code __gxx_personality_v0(int version, _Unwind_Action actions, uint64_t exceptionClass, _Unwind_Exception *exc, _Unwind_Context *ctx) { |
For a native exception, when the personality returns
_URC_HANDLER_FOUND in the search phase, the LSDA related
information of the stack frame will be cached. When the personality is
called again in the cleanup phase with the argument
actions == (_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME), the
personality loads the cache and there is no need to parse
.gcc_except_table.
In the remaining three cases, the personality has to parse
.gcc_except_table:
- actions & _UA_SEARCH_PHASE
- actions & _UA_CLEANUP_PHASE && actions & _UA_HANDLER_FRAME && !is_native: catch (...) can catch a foreign exception. An exception specification terminates upon a foreign exception.
- actions & _UA_CLEANUP_PHASE && !(actions & _UA_HANDLER_FRAME): non-catch handlers and unmatched catch handlers, matched exception specification. Another case is _Unwind_ForcedUnwind.
1 | static void scan_eh_tab(...) { |
__gcc_personality_v0
libgcc and compiler-rt/lib/builtins implement this function to handle
__attribute__((cleanup(...))). The implementation does not
return _URC_HANDLER_FOUND in the search phase, so the
cleanup handler cannot serve as a catch handler. However, we can supply
our own implementation to return _URC_HANDLER_FOUND in the
search phase... On x86-64, __builtin_eh_return_data_regno(0)
is RAX. We can let the cleanup handler pass RAX to the landing pad.
1 | // a.cc |
1 | % clang -c -fexceptions a.cc b.c |
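A smaller way to watch a cleanup handler run during unwinding is to use the attribute from C++ directly, which GCC and Clang accept. Note that a C++ frame uses __gxx_personality_v0 rather than __gcc_personality_v0, so this sketch only demonstrates the cleanup behavior, not the custom personality trick. All names are mine.

```cpp
#include <cassert>

static int g_cleanups = 0;

// Cleanup callback: receives a pointer to the annotated variable.
static void note_cleanup(int *) { ++g_cleanups; }

static void thrower() { throw 1; }

int run_with_cleanup() {
  g_cleanups = 0;
  try {
    // The callback runs when guard goes out of scope, including when the
    // scope is left because an exception propagates through it.
    int guard __attribute__((cleanup(note_cleanup))) = 0;
    (void)guard;
    thrower();
  } catch (int) {
  }
  return g_cleanups;
}
```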
Rethrow
The landing pad section briefly described the code executed by
rethrow. Usually a caught exception will be destroyed in
__cxa_end_catch, so __cxa_rethrow will mark
the exception object and increase handlerCount.
C++11 introduced Exception Propagation (N2179;
std::rethrow_exception etc), and libstdc++ uses
__cxa_dependent_exception to implement it. For the design, see https://gcc.gnu.org/legacy-ml/libstdc++/2008-05/msg00079.html
1 | struct __cxa_dependent_exception { |
std::current_exception and
std::rethrow_exception will increase the reference
count.
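From the user's side, the reference counting shows up as an exception that outlives its handler and can be thrown again any number of times. A minimal sketch with function names of my own:

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Capture an exception so that it survives the end of its handler.
std::exception_ptr capture() {
  std::exception_ptr p;
  try {
    throw std::runtime_error("saved");
  } catch (...) {
    p = std::current_exception();  // takes a reference to the object
  }
  return p;  // the handler has ended but the object is still alive
}

// Rethrow the captured exception twice and collect the messages.
std::string rethrow_twice(const std::exception_ptr &p) {
  std::string out;
  for (int i = 0; i < 2; ++i) {
    try {
      std::rethrow_exception(p);
    } catch (const std::runtime_error &e) {
      out += e.what();
      out += ';';
    }
  }
  return out;
}
```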
In libstdc++, __cxa_rethrow calls GCC extension
_Unwind_Resume_or_Rethrow which can resume forced
unwinding.
LLVM IR
Under construction.
- nounwind: cannot unwind
- uwtable: force generation of the unwind table regardless of nounwind
1 | if uwtables |
Compiler behavior
- -fno-exceptions -fno-asynchronous-unwind-tables: neither .eh_frame nor .gcc_except_table exists
- -fno-exceptions -fasynchronous-unwind-tables: .eh_frame exists, .gcc_except_table doesn't
- -fexceptions: both .eh_frame and .gcc_except_table exist
  - In GCC, for a noexcept function, a possibly-throwing call site unhandled by a try block does not get an entry in the .gcc_except_table call site table. If the function has no try block, it gets a header-only .gcc_except_table (4 bytes)
  - In Clang, there is a call site entry calling __clang_call_terminate. The size overhead is larger than GCC's scheme. Improving this requires LLVM IR work
When an exception propagates from a function to its caller (libgcc_s/libunwind & libsupc++/libc++abi):
- no .eh_frame: _Unwind_RaiseException returns _URC_END_OF_STACK. __cxa_throw calls std::terminate
- .eh_frame without .gcc_except_table: pass-through (local variable destructors are not called). This is the case of -fno-exceptions -fasynchronous-unwind-tables.
- .eh_frame with .gcc_except_table not covering the throwing call site: __gxx_personality_v0 calls std::terminate since no call site code range matches
- .eh_frame with .gcc_except_table covering the throwing call site: do possible cleanup and unwind to the parent frame
Combined with the above description, when an exception will propagate to a caller of a noexcept function:
- -fno-exceptions -fno-asynchronous-unwind-tables: propagating through a function calls std::terminate
- -fno-exceptions -fasynchronous-unwind-tables: pass-through. Local variable destructors are not called. This behavior is unexpected.
- -fexceptions: propagating through a noexcept function calls std::terminate
When std::terminate is called, there is a diagnostic
looking like
terminate called after throwing an instance of 'int'
(libstdc++; libc++ has a similar one). There is no stack trace. If the
process installs a SIGABRT signal handler, the handler may
get a stack trace and symbolize the addresses.
Catching exceptions while unwinding through -fno-exceptions code is a proposal to improve the diagnostics.
Personality and typeinfo encoding
.eh_frame contains information about the unwind
operation. See Stack
unwinding for its format.
In -fpie/-fpic mode, the personality and type info
encodings have the DW_EH_PE_indirect|DW_EH_PE_pcrel bits on
most targets.
void raise() { throw 42; }
bool foo() {
  try { raise(); } catch (int) { return true; }
  return false;
}
int main() { foo(); }

_Z3foov:
  .cfi_startproc
  .cfi_personality 155, DW.ref.__gxx_personality_v0
  .cfi_lsda 27, .Lexception0
  ...
  .section .gcc_except_table,"a",@progbits
  ...
  # >> Catch TypeInfos <<
.Ltmp3:                                 # TypeInfo 1
  .long .L_ZTIi.DW.stub-.Ltmp3
.Lttbase0:
  .data
  .p2align 3, 0x0
.L_ZTIi.DW.stub:
  .quad _ZTIi
  .hidden DW.ref.__gxx_personality_v0
  .weak DW.ref.__gxx_personality_v0
  .section .data.DW.ref.__gxx_personality_v0,"aGw",@progbits,DW.ref.__gxx_personality_v0,comdat
  .p2align 3, 0x0
  .type DW.ref.__gxx_personality_v0,@object
  .size DW.ref.__gxx_personality_v0, 8
DW.ref.__gxx_personality_v0:
  .quad __gxx_personality_v0
In the example, .eh_frame contains a PC-relative
relocation referencing DW.ref.__gxx_personality_v0.
.gcc_except_table contains a PC-relative relocation
referencing .L_ZTIi.DW.stub. The relocations are link-time
constants, so .eh_frame can remain readonly.
DW.ref.__gxx_personality_v0 and
.L_ZTIi.DW.stub reside in writable sections which will
contain dynamic relocations if __gxx_personality_v0 and
_ZTIi are defined in a shared object - which is often the
case.
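The numeric encodings in the directives (155 and 27 in the listing) are DW_EH_PE_* bytes: the low nibble is the value format, bits 4 to 6 say what the value is relative to, and bit 7 means indirect. A small decoder sketch follows. The function name describe_eh_pe is mine, and the constants are the usual DW_EH_PE_* values.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Render a DW_EH_PE_* pointer encoding byte as text.
std::string describe_eh_pe(uint8_t enc) {
  if (enc == 0xff)
    return "omit";
  std::string s;
  if (enc & 0x80)
    s += "indirect|";
  switch (enc & 0x70) {  // how the value is applied
    case 0x00: break;    // absolute
    case 0x10: s += "pcrel|"; break;
    case 0x20: s += "textrel|"; break;
    case 0x30: s += "datarel|"; break;
    case 0x40: s += "funcrel|"; break;
    case 0x50: s += "aligned|"; break;
    default: s += "unknown-application|"; break;
  }
  switch (enc & 0x0f) {  // how the value is stored
    case 0x00: s += "absptr"; break;
    case 0x01: s += "uleb128"; break;
    case 0x02: s += "udata2"; break;
    case 0x03: s += "udata4"; break;
    case 0x04: s += "udata8"; break;
    case 0x09: s += "sleb128"; break;
    case 0x0a: s += "sdata2"; break;
    case 0x0b: s += "sdata4"; break;
    case 0x0c: s += "sdata8"; break;
    default: s += "unknown-format"; break;
  }
  return s;
}
```

So .cfi_personality 155 reads as indirect|pcrel|sdata4 and .cfi_lsda 27 as pcrel|sdata4.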
For -fno-pic code, different targets have different
ideas. AArch64 and RISC-V use
DW_EH_PE_indirect|DW_EH_PE_pcrel as well. On x86,
.cfi_personality refers to
__gxx_personality_v0. This will lead to a canonical PLT if
__gxx_personality_v0 is defined in a shared object (e.g.
libstdc++.so.6). I sent a patch https://gcc.gnu.org/PR108622 to use
DW_EH_PE_indirect|DW_EH_PE_pcrel.
R_MIPS_32 and R_MIPS_64 personality encoding
https://github.com/llvm/llvm-project/issues/58377
void foo() { try { throw 1; } catch (...) {} }
mips64el-linux-gnuabi64-g++ -fpic and
clang++ --target=mips64el-unknown-linux-gnuabi64 -fpic use
DW_EH_PE_absptr | DW_EH_PE_indirect to encode personality
routine pointers. Using DW_EH_PE_absptr instead of
DW_EH_PE_pcrel is wrong. GNU ld works around the compiler
design problem by converting DW_EH_PE_absptr to
DW_EH_PE_pcrel. ld.lld does not support this and will
report an error:
% clang++ --target=mips64el-linux-gnuabi -fpic -fuse-ld=lld -shared ex.cc
ld.lld: error: relocation R_MIPS_64 cannot be used against symbol 'DW.ref.__gxx_personality_v0'; recompile with -fPIC
>>> defined in /tmp/ex-40a996.o
>>> referenced by ex.cc
>>>               /tmp/ex-40a996.o:(.eh_frame+0x13)
...
R_MIPS_32 for 32-bit builds is similar.
Potentially-throwing __cxa_end_catch
__cxa_end_catch is potentially-throwing because it may
destroy an exception object with a potentially-throwing destructor (e.g.
~C() noexcept(false) { ... }).
struct A { ~A(); };
void opaque();
void foo() {
  A a;
  // The exception object has an unknown type and may throw. The landing pad
  // then needs to call A::~A for `a` before jumping to _Unwind_Resume.
  try { opaque(); } catch (...) { }
}
To support an exception object with a potentially-throwing destructor, Clang generates conservative code for a catch-all clause or a catch clause matching a record type:
- assume that the exception object may have a throwing destructor
- emit invoke void @__cxa_end_catch (as the call is not marked with the nounwind attribute).
- emit a landing pad to destroy local variables and call _Unwind_Resume
Per C++ [dcl.fct.def.coroutine], a coroutine's function body implies
a catch (...). Clang's code generation pessimizes even
simple code, like:
UserFacing foo() {
  A a;
  opaque();
  co_return;
  // For `invoke void @__cxa_end_catch()`, the landing pad destroys the
  // promise_type and deletes the coro frame.
}
Throwing destructors are typically discouraged. In many environments, the destructors of exception objects are guaranteed to never throw, making our conservative code generation approach seem wasteful.
Furthermore, throwing destructors tend not to work well in practice:
- GCC does not emit call site records for the region containing __cxa_end_catch. This has been the case for a long time, since 2000.
- If a catch-all clause catches an exception object whose destructor throws, both GCC and Clang using libstdc++ leak the allocated exception object.
To avoid code generation pessimization, I added -fassume-nothrow-exception-dtor
for Clang 18 to assume that __cxa_end_catch calls have the
nounwind attribute. This requires that thrown exception
objects' destructors will never throw.
To detect misuses, diagnose throw expressions with a
potentially-throwing destructor. Technically, it is possible that a
potentially-throwing destructor never throws when called transitively by
__cxa_end_catch, but these cases seem rare enough to
justify a relaxed mode.
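A codebase that wants to rely on this assumption could also check it at the throw site. The sketch below is one hypothetical way to do that with a static_assert. The wrapper checked_throw and the variable template has_nothrow_dtor are made-up names, not something Clang provides.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// An exception type that would break the nothrow-destructor assumption.
struct ThrowingDtor {
  ~ThrowingDtor() noexcept(false) {}
};

template <class T>
constexpr bool has_nothrow_dtor = std::is_nothrow_destructible<T>::value;

// Hypothetical wrapper: refuses to compile for exception types whose
// destructor is potentially-throwing.
template <class T>
[[noreturn]] void checked_throw(T value) {
  static_assert(std::is_nothrow_destructible<T>::value,
                "exception type has a potentially-throwing destructor");
  throw value;
}
```

checked_throw(7) compiles and throws an int, while checked_throw(ThrowingDtor{}) is rejected at compile time.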
Misc
Use libc++ and libc++abi
On Linux, compared with clang, clang++
additionally links against libstdc++/libc++ and libm.
Dynamically link against libc++.so (which depends on libc++abi.so)
(additionally specify -pthread if threads are used):
clang++ -stdlib=libc++ -nostdlib++ a.cc -lc++ -lc++abi
If compile actions and link actions are separate
(-stdlib=libc++ passes -lc++ but its position
is undesired, so just don't use it):
clang++ -nostdlib++ a.cc -lc++ -lc++abi
Statically link in libc++.a (which includes the members of
libc++abi.a). This requires a
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=on build:
clang++ -stdlib=libc++ -static-libstdc++ -nostdlib++ a.cc -pthread
Statically link in libc++.a and libc++abi.a. This is a bit inferior because there is a duplicate -lc++ passed by the driver.
clang++ -stdlib=libc++ -static-libstdc++ -nostdlib++ a.cc -Wl,--push-state,-Bstatic -lc++ -lc++abi -Wl,--pop-state -pthread
libc++abi and libsupc++
It is worth noting that the
<exception> and <stdexcept> type layouts provided by
libc++abi (such as logic_error, runtime_error,
etc.) are deliberately compatible with libsupc++. After libstdc++ in GCC 5
abandoned the ref-counted std::string, libsupc++ still uses
__cow_string for logic_error and other
exception classes. libc++abi uses a similar ref-counted string.
libsupc++ and libc++abi do not use inline namespace and have
conflicting symbol names. Therefore, usually a libc++/libc++abi
application cannot use a shared object (ODR violation) of a dynamically
linked libstdc++.so.
With some effort, you can still solve this problem: compile
the non-libsupc++ part of libstdc++ to get a custom
libstdc++.so.6. The executable links against libc++abi, which provides
the C++ ABI symbols required by libstdc++.so.6.
Monolithic .gcc_except_table
Prior to Clang 12, a monolithic .gcc_except_table was
used. Like many other metadata sections, the main problem with the
monolithic sections is that they cannot be garbage collected by the
linker. For RISC-V -mrelax and basic block sections, there
is a bigger problem: .gcc_except_table has relocations
pointing to local symbols in text sections. If the referenced text sections
are discarded along with their COMDAT group, these relocations will be rejected by
the linker
(error: relocation refers to a symbol in a discarded section).
The solution is to use fragmented .gcc_except_table (https://reviews.llvm.org/D83655).
But the actual deployment is not that simple:) ld.lld processes
--gc-sections first (it is not clear which
.eh_frame pieces are live), and then processes (and garbage
collects) .eh_frame.
During --gc-sections, all .eh_frame pieces
are live. They will mark all .gcc_except_table.* live.
According to the GC rules of the section group, a
.gcc_except_table.* will mark other sections (including
.text.*) live in the same section group. The result is that
.text.* in all section groups cannot be garbage collected, resulting in
increased output size.
https://reviews.llvm.org/D91579 fixed this problem: For
.eh_frame, do not mark .gcc_except_table in
section group.
clang -fbasic-block-sections=
This option produces one section for each basic block (more
aggressive than -ffunction-sections) for aggressive machine
basic block optimizations. There are some challenges integrating LSDA
into this framework.
You can either allocate a .gcc_except_table for each
basic block section needing LSDA, or let all basic block sections use
the same .gcc_except_table. The LLVM implementation chose
the latter, which has several advantages:
- No duplicate headers
- Sharable type table
- Sharable action table (this only matters for the deprecated exception specification)
There is only one LPStart when using the same
.gcc_except_table, and it is necessary to ensure that all
offsets from landing pads to LPStart can be represented by relocations.
Because most architectures do not have a difference relocation type
(R_RISCV_SUB*), placing landing pads in the same section is
the choice.
Exception handling ABI for the ARM architecture
The overall structure is the same as Itanium C++ ABI: Exception
Handling, with some differences in data structure,
_Unwind_*, etc.
https://maskray.me/blog/2020-11-08-stack-unwinding contains a few notes.
Compact Exception Tables for MIPS ABIs
Under construction.
Use .eh_frame_entry and .gnu_extab to
describe.
Design thoughts:
- Exception code ranges are sorted and must be linearly searched. Therefore it would be more compact to specify each relative to the previous one, rather than relative to a fixed base.
- The landing pad is often close to the exception region that uses it. Therefore it is better to use the end of the exception region as the reference point, than use the function base address.
- The action table can be integrated directly with the exception region definition itself. This removes one indirection. The threading of actions can still occur, by providing an offset to the next exception encoding of interest.
- Often the action threading is to the next exception region, so optimizing that case is important.
- Catch types and exception specification type lists cannot easily be encoded inline with the exception regions themselves. It is necessary to preserve the unique indices that are automatically created by the DWARF scheme.
It uses compact unwind descriptors similar to ARM EH. Builtin PR1 means there is no language-dependent data. Builtin PR2 is used for C/C++.
Misc
Khalil Estell's CppCon 2024 talk C++ Exceptions for Smaller Firmware mentions that a custom exception implementation that drops some rarely used functionality can make the library code size much smaller, which suits firmware development.