Updated in 2024-11.
A few weeks ago I wrote an article introducing stack unwinding in detail. Today I will introduce C++ exception handling, an application of stack unwinding. Exception handling has several ABIs (an ABI is what makes C++ implementations interoperable), the most widely used of which is Itanium C++ ABI: Exception Handling.
Itanium C++ ABI: Exception Handling
Simplified exception handling process (from throw to catch):
- Call __cxa_allocate_exception to allocate space to store the exception object and the exception header __cxa_exception
- Jump to __cxa_throw, set the __cxa_exception fields and then jump to _Unwind_RaiseException
- In _Unwind_RaiseException, execute the search phase: call personality routines to find a matching try catch (type matching)
- In _Unwind_RaiseException, execute the cleanup phase: call personality routines to find stack frames containing out-of-scope variables, and for each such stack frame, jump to its landing pad to execute the destructors. The landing pad uses _Unwind_Resume to resume the cleanup phase
- The cleanup phase executed by _Unwind_RaiseException jumps to the landing pad corresponding to the matching try catch
- The landing pad calls __cxa_begin_catch, executes the catch code, and then calls __cxa_end_catch
- __cxa_end_catch decreases the handler count of the exception object, and if it reaches zero, it also destroys the exception object
Note: each stack frame may use a different personality routine. It is common that all frames share the same routine, though.
Among these steps, _Unwind_RaiseException
is responsible
for stack unwinding and is language independent. The language-related
concepts (catch block, out-of-scope variable) in stack unwinding are
interpreted/encapsulated by the personality. This is a key idea that
makes the ABI applicable to other languages and allows other languages
to be mixed with C++.
Therefore, Itanium C++ ABI: Exception Handling is divided into Level
1 Base ABI and Level 2 C++ ABI. Base ABI describes the
language-independent stack unwinding part and defines the
_Unwind_*
API. Common implementations are:
- libgcc: libgcc_s.so.1 and libgcc_eh.a
- Multiple libraries named libunwind (libunwind.so or libunwind.a). With Clang, you can pass --rtlib=compiler-rt --unwindlib=libunwind to link against libunwind; either llvm-project/libunwind or nongnu.org/libunwind can be used
The C++ ABI is related to the C++ language and defines the
__cxa_*
API (__cxa_allocate_exception
,
__cxa_throw
, __cxa_begin_catch
, etc.). Common
implementations are:
- libsupc++, part of libstdc++
- libc++abi in llvm-project
The C++ standard library implementation in llvm-project, libc++, can leverage libc++abi, libcxxrt or libsupc++, but libc++abi is recommended.
Level 1 Base ABI
Data structures
The main data structure is:
1 | // Level 1 |
1 | int main() { |
exception_class
and exception_cleanup
are
set by the API that throws exceptions in Level 2. The Level 1 API does
not process exception_class
, but passes it to the
personality routine. Personality routines use this value to distinguish
native and foreign exceptions.
libc++abi's __cxa_throw sets exception_class to the uint64_t representing "CLNGC++\0". libsupc++ uses the uint64_t representing "GNUCC++\0". The ABI requires that the lower bits contain "C++\0". Exceptions thrown by libstdc++ are therefore treated as foreign exceptions by libc++abi. Only catch (...) can catch foreign exceptions.
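To make the encoding concrete, here is a minimal sketch that packs the two vendor tags into 64-bit values and checks the language part. The helper names (make_exception_class, is_cxx_language, is_native) are mine, not part of any runtime, and the whole-value comparison in is_native is a simplification of what a real personality does.

```cpp
#include <cstdint>

// Pack an 8-byte tag (7 visible characters plus the '\0' that terminates the
// string literal) into a 64-bit value, first character in the highest byte.
constexpr std::uint64_t make_exception_class(const char (&tag)[8]) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v = (v << 8) | static_cast<unsigned char>(tag[i]);
  return v;
}

constexpr std::uint64_t kGnuCxx = make_exception_class("GNUCC++");    // "GNUCC++\0"
constexpr std::uint64_t kClangCxx = make_exception_class("CLNGC++");  // "CLNGC++\0"

// The low four bytes name the language: "C++\0".
constexpr bool is_cxx_language(std::uint64_t cls) {
  return (cls & 0xffffffffu) == 0x432b2b00u;
}

// Hypothetical, simplified check: native only if the class equals our own.
constexpr bool is_native(std::uint64_t cls, std::uint64_t own) {
  return cls == own;
}
```

Both values pass is_cxx_language, yet is_native(kGnuCxx, kClangCxx) is false, which is why one runtime sees the other's exceptions as foreign.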
Exception propagation implementation mechanism will use another
exception_class
identifier to represent dependent exceptions.
exception_cleanup
stores the destroying delete function
of this exception object, which is used by __cxa_end_catch
to destroy a foreign exception.
The private unwinder state (private_1
and
private_2
) in an exception object should be neither read by
nor written to by personality routines or other parts of the
language-specific runtime.
The information required for the Unwind operation (for a given IP/SP,
how to obtain the register information such as the IP/SP of the upper
stack frame) is implementation-dependent, and Level 1 ABI does not
define it. In the ELF system, .eh_frame
and
.eh_frame_hdr
(PT_EH_FRAME
program header)
store unwind information. See Stack
unwinding.
Level 1 API
_Unwind_Reason_Code _Unwind_RaiseException(_Unwind_Exception *obj);
Perform stack unwinding for exceptions. It is noreturn under normal
circumstances, and will give control to matched catch handlers (catch
block) or non-catch handlers (code blocks that need to execute
destructors) like longjmp. It is a two-phase process, divided into phase
1 (search phase) and phase 2 (cleanup phase).
- In the search phase, find the matched catch handler and record the stack pointer in private_2
  - Trace the call chain based on IP/SP and other saved registers
  - For each stack frame, skip if there is no personality routine; call it if there is one (actions set to _UA_SEARCH_PHASE)
  - If personality returns _URC_CONTINUE_UNWIND, continue searching
  - If personality returns _URC_HANDLER_FOUND, it means that a matched catch handler or unmatched exception specification is found, and the search stops
- In the cleanup phase, jump to non-catch handlers (usually local variable destructors), and then transfer control to the matched catch handler located in the search phase
  - Trace the call chain based on IP/SP and other saved registers
  - For each stack frame, skip if there is no personality routine; call it if there is one (actions are set to _UA_CLEANUP_PHASE, and the stack frame marked by the search phase also gets _UA_HANDLER_FRAME)
  - If personality returns _URC_CONTINUE_UNWIND, there is no landing pad; continue to unwind
  - If personality returns _URC_INSTALL_CONTEXT, there is a landing pad; jump to the landing pad
  - For intermediate stack frames that are not marked in the search phase, the landing pad performs cleanup work (usually destructors of out-of-scope variables), and calls _Unwind_Resume to jump back to the cleanup phase
  - For the stack frame marked by the search phase, the landing pad calls __cxa_begin_catch, then executes the code in the catch block, and finally calls __cxa_end_catch to destroy the exception object
The point of the two-phase process is to avoid any actual stack
unwinding if there is no handler. If there are just cleanup frames, an
abort function can be called. Cleanup frames are also less expensive
than matching a handler. However, parsing .gcc_except_table
is probably not much less expensive than additionally matching a
handler:)
1 | static _Unwind_Reason_Code unwind_phase1(unw_context_t *uc, _Unwind_Context *ctx, |
C++ does not support resumptive exception handling (correcting the exceptional condition and resuming execution at the point where it was raised), so the two-phase process is not necessary, but two-phase allows C++ and other languages to coexist on the call stack.
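The two-phase idea can be modeled without any real unwinder. The following toy sketch is only a model under my own assumptions: FrameKind, RaiseResult and simulate_raise are hypothetical names, and each frame is reduced to the answer its personality would give. It shows that when phase 1 finds no handler, phase 2 never runs and no cleanup is executed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of what a personality would report for one frame.
enum class FrameKind { NoLsda, Cleanup, Handler };

struct RaiseResult {
  bool handler_found;            // false models _URC_END_OF_STACK
  std::vector<std::string> log;  // landing pads entered, in order
};

// frames[0] is the innermost frame (the caller of the throwing function).
RaiseResult simulate_raise(const std::vector<FrameKind> &frames) {
  RaiseResult r{false, {}};
  // Phase 1 (search): locate the handler frame without running any code.
  std::size_t handler = frames.size();
  for (std::size_t i = 0; i < frames.size(); ++i) {
    if (frames[i] == FrameKind::Handler) {
      handler = i;
      break;
    }
  }
  if (handler == frames.size())
    return r;  // no handler: the stack is left untouched
  r.handler_found = true;
  // Phase 2 (cleanup): enter cleanup landing pads, then the handler's.
  for (std::size_t i = 0; i < handler; ++i)
    if (frames[i] == FrameKind::Cleanup)
      r.log.push_back("cleanup@" + std::to_string(i));
  r.log.push_back("catch@" + std::to_string(handler));
  return r;
}
```

With frames {Cleanup, NoLsda, Handler} the log is cleanup@0 followed by catch@2; with {Cleanup, Cleanup} the log stays empty because the search phase fails first.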
_Unwind_Reason_Code _Unwind_ForcedUnwind(_Unwind_Exception *obj, _Unwind_Stop_Fn stop, void *stop_parameter);
Execute forced unwinding: Skip the search phase and perform a slightly
different cleanup phase. private_2
is used as the parameter
of the stop function. It is similar to a foreign exception but is rarely
used.
void _Unwind_Resume(_Unwind_Exception *obj);
Continue
the unwind process of phase 2. It is similar to longjmp, is noreturn,
and is the only Level 1 API that is directly called by the compiler. The
compiler usually calls this function at the end of non-catch
handlers.
void _Unwind_DeleteException(_Unwind_Exception *obj);
Destroy the specified exception object. It is the only Level 1 API that
handles exception_cleanup
and is called by
__cxa_end_catch
.
Many implementations provide extensions. Notably
_Unwind_Reason_Code _Unwind_Backtrace(_Unwind_Trace_Fn callback, void *ref);
is another special unwind process: it ignores personality and notifies
an external callback of stack frame information.
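As a small illustration of _Unwind_Backtrace, the sketch below counts the frames the unwinder can trace from the current function. It assumes a GCC or Clang toolchain whose unwind.h declares _Unwind_Backtrace and _Unwind_GetIP; count_frame and count_frames are names I made up for the example.

```cpp
#include <unwind.h>

// Callback invoked once per stack frame; `arg` points to the running count.
static _Unwind_Reason_Code count_frame(struct _Unwind_Context *ctx, void *arg) {
  if (_Unwind_GetIP(ctx) != 0)
    ++*static_cast<int *>(arg);
  return _URC_NO_REASON;  // keep walking
}

// Returns the number of frames the unwinder could trace from here.
int count_frames() {
  int n = 0;
  _Unwind_Backtrace(count_frame, &n);
  return n;
}
```

The exact count depends on inlining and on which frames have unwind information, so only treat it as a lower bound on the call depth.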
Level 2 C++ ABI
This part deals with language-related concepts such as throw, catch blocks, and out-of-scope variable destructors in C++.
Data structures
Each thread has a global stack of currently caught exceptions, linked
through the nextException
field of the exception header.
caughtExceptions
stores the most recent exception on the
stack, and __cxa_exception::nextException
points to the
next exception in the stack.
1 | struct __cxa_eh_globals { |
1 | int main() { |
The definition of __cxa_exception
is as follows, and the
end of it stores the _Unwind_Exception
defined by Base ABI.
__cxa_exception
adds C++ semantic information on the basis
of _Unwind_Exception
.
1 | // Level 2 |
The information needed to process the exception (for a given IP, whether it is in a try catch, whether there are out-of-scope variable destructors that need to be executed, whether there is a dynamic exception specification) is called the language-specific data area (LSDA). It is an implementation detail that is not defined by the Level 2 ABI.
Landing pad
A landing pad is a section of code related to exceptions in the text section which performs one of the three tasks:
- A cleanup clause: call destructors of out-of-scope variables or callbacks registered by __attribute__((cleanup(...))), and then use _Unwind_Resume to resume the cleanup phase
- A catch clause which captures the exception: call the destructors of out-of-scope variables, then call __cxa_begin_catch, execute the catch code, and finally call __cxa_end_catch
- rethrow: call destructors of out-of-scope variables in the catch clause, then call __cxa_end_catch, and then use _Unwind_Resume to resume the cleanup phase
If a try block has multiple catch clauses, there will be multiple action table entries in series in the language-specific data area, but the landing pad includes all (conceptually merged) catch clauses. Before the personality transfers control to the landing pad, it will call _Unwind_SetGR to set __builtin_eh_return_data_regno(1) to store switchValue and inform the landing pad which type matches.
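At the source level, the merged landing pad corresponds to a single try block with several catch clauses. The sketch below is ordinary C++ (thrower and which_clause are hypothetical names); on an Itanium ABI target the three clauses would typically share one landing pad that dispatches on the switch value picked by the personality.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper that throws a different type depending on `kind`.
void thrower(int kind) {
  if (kind == 0) throw 42;
  if (kind == 1) throw std::runtime_error("boom");
  throw std::string("other");
}

// One try block, three catch clauses: conceptually a single landing pad that
// compares the switch value against each clause in order.
int which_clause(int kind) {
  try {
    thrower(kind);
  } catch (int) {
    return 1;
  } catch (const std::exception &) {
    return 2;
  } catch (...) {
    return 3;
  }
  return 0;  // not reached: thrower always throws
}
```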
A rethrow is triggered by __cxa_rethrow
in the middle of
the execution of the catch code. It needs to destruct the local
variables defined by the catch clause and call
__cxa_end_catch
to offset the
__cxa_begin_catch
called at the beginning of the catch
clause.
.gcc_except_table
The language-specific data area on the ELF platforms is usually
stored in the .gcc_except_table
section. This section is
parsed by __gxx_personality_v0
and
__gcc_personality_v0
. Its structure is very simple:
- header (@LPStart, @TType and call sites coding, the starting offset of action records)
- call site table: Describe the landing pad offset (0 if not exists) and action record offset (biased by 1, 0 for no action) that should be executed for each call site (an address range)
- action table
- type table (referenced by positive switch values)
- dynamic exception specification (deprecated in C++, so rarely used) (referenced by negative switch values)
Here is an example:
1 | .section .gcc_except_table,"a",@progbits |
Each call site record has two values besides call site offset and length: landing pad offset and action record offset.
- The landing pad offset is 0. The action record offset should also be 0. No landing pad
- The landing pad offset is not 0. With landing pad
  - The action record offset is 0, also called cleanup (the term "cleanup" is somewhat ambiguous, because Level 1 has the term cleanup phase), usually describing local variable destructors and __attribute__((cleanup(...)))
  - The action record offset is not 0. The action record offset points to an action record in the action table: a catch, a noexcept specifier, or an exception specification
Each action record has two values:
- switch value (SLEB128): a positive index indicates the TypeInfo of the catch type in the type table; a negative number indicates the offset of the exception specification; 0 indicates a cleanup action which is similar to an action record offset of 0 in the call site record
- offset to the next action record: 0 indicates there is no next action record. This singly linked list form can describe multiple catches or an exception specification list
The offset to the next action record can be used not only as a singly linked list, but also as a trie, though such compression is rarely seen in the wild.
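Here is a rough sketch of walking one action chain. read_sleb128 and walk_actions are names I chose; `start` is a plain byte offset into the action table (the bias of 1 used in call site records is already removed), and I assume the "next" offset is relative to the start of the offset field itself, which is how I read the common personality implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode one SLEB128 value starting at table[pos]; advances pos past it.
std::int64_t read_sleb128(const std::vector<std::uint8_t> &table, std::size_t &pos) {
  std::uint64_t value = 0;
  unsigned shift = 0;
  std::uint8_t byte;
  do {
    assert(pos < table.size() && shift < 64);
    byte = table[pos++];
    value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 64 && (byte & 0x40))
    value |= ~std::uint64_t{0} << shift;  // sign-extend
  return static_cast<std::int64_t>(value);
}

// Follow the chain that starts at byte offset `start`; collect switch values.
// Assumption: the next-record offset is relative to the offset field itself.
std::vector<std::int64_t> walk_actions(const std::vector<std::uint8_t> &table,
                                       std::size_t start) {
  std::vector<std::int64_t> switch_values;
  std::size_t pos = start;
  for (;;) {
    switch_values.push_back(read_sleb128(table, pos));
    std::size_t offset_field = pos;
    std::int64_t next = read_sleb128(table, pos);
    if (next == 0)
      break;  // end of the singly linked list
    pos = static_cast<std::size_t>(static_cast<std::int64_t>(offset_field) + next);
  }
  return switch_values;
}
```

For the four-byte table {1, 0, 2, 0x7d}, starting at offset 2 yields the switch values 2 and then 1: the byte 0x7d decodes to -3, which points from the offset field at index 3 back to the record at index 0.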
The values of landing pad offset/action record offset corresponding to different areas in the program:
- A non-try block without local variable destructor: landing_pad_offset==0 && action_record_offset==0
- A non-try block with local variable destructors: landing_pad_offset!=0 && action_record_offset==0. Phase 2 should stop and call cleanup
- A non-try block with __attribute__((cleanup(...))): landing_pad_offset!=0 && action_record_offset==0. Same as above
- A try block: landing_pad_offset!=0 && action_record_offset!=0. The landing pad points to the code block obtained by splicing the catch clauses together. The action record describes a catch with a switch value greater than 0
- A try block with catch (...): Same as above. The action record is a switch value greater than 0 pointing to an entry with a value of 0 in the type table (indicating catch any)
- In a function with a noexcept specifier, an area that may propagate the exception to the caller: landing_pad_offset!=0 && action_record_offset!=0. The landing pad points to the code block that calls std::terminate. The action record is a switch value greater than 0 pointing to an entry with a value of 0 in the type table (indicating catch any)
- In a function with an exception specifier, an area that may propagate the exception to the caller: landing_pad_offset!=0 && action_record_offset!=0. The landing pad points to the code block that calls __cxa_call_unexpected. The action record is a switch value less than 0 describing an exception specifier list
Level 2 API
void *__cxa_allocate_exception(size_t thrown_size);
The compiler generates a call to this function for throw A();. The function allocates a block of memory to store __cxa_exception and the A object. __cxa_exception is immediately to the left of the A object. The following function illustrates the relationship between the address of the exception object operated on by the program and __cxa_exception:
static void *thrown_object_from_cxa_exception(__cxa_exception *exception_header) {
  return static_cast<void *>(exception_header + 1);
}
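The header/object adjacency can be tried out with a stand-in header. In this sketch MockHeader and the mock_* functions are hypothetical; the real __cxa_exception has more fields and the real allocator has an emergency pool, neither of which is modeled here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for __cxa_exception; the real header has more fields.
struct MockHeader {
  void (*destructor)(void *);
  int handler_count;
  unsigned long long unwind_header[4];  // placeholder for _Unwind_Exception
};

// The header sits immediately to the left of the thrown object.
MockHeader *header_from_thrown(void *thrown) {
  return static_cast<MockHeader *>(thrown) - 1;
}
void *thrown_from_header(MockHeader *h) { return static_cast<void *>(h + 1); }

// Models __cxa_allocate_exception: one block, header first, object after it.
void *mock_allocate(std::size_t thrown_size) {
  void *raw = std::malloc(sizeof(MockHeader) + thrown_size);
  if (!raw)
    throw std::bad_alloc();
  return thrown_from_header(new (raw) MockHeader{});
}

void mock_free(void *thrown) { std::free(header_from_thrown(thrown)); }
```

The pointer returned by mock_allocate is what the program sees as the exception object; the runtime recovers the header by stepping back sizeof(MockHeader) bytes.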
void __cxa_throw(void *thrown, std::type_info *tinfo, void (*destructor)(void *));
Find the __cxa_exception header from the thrown object (the inverse of the function above), fill in each field (referenceCount, exception_class, unexpectedHandler, terminateHandler, exceptionType, exceptionDestructor, unwindHeader.exception_cleanup) and then call _Unwind_RaiseException. This function is noreturn.
void *__cxa_begin_catch(void *obj);
The compiler generates a call to this function at the beginning of the catch block. For a native exception,
- Increment handlerCount
- Push the exception onto the global exception stack of the thread and decrement uncaughtExceptions
- Return the adjusted pointer of the exception object
For a foreign exception (there is not necessarily a __cxa_exception header),
- Push if the global exception stack of the thread is empty, otherwise execute std::terminate (I don't know if there is a field similar to __cxa_exception::nextException)
- Return static_cast<_Unwind_Exception *>(obj) + 1 (assuming _Unwind_Exception is next to the thrown object)
Simplified implementation of __cxa_throw:
void __cxa_throw(void *thrown, std::type_info *tinfo, void (*destructor)(void *)) {
  // The header is immediately to the left of the thrown object.
  __cxa_exception *hdr = (__cxa_exception *)thrown - 1;
  hdr->exceptionType = tinfo;
  hdr->exceptionDestructor = destructor;
  hdr->unexpectedHandler = std::get_unexpected();
  hdr->terminateHandler = std::get_terminate();
  hdr->unwindHeader.exception_class = ...;
  __cxa_get_globals()->uncaughtExceptions++;
  _Unwind_RaiseException(&hdr->unwindHeader);
  // Failed to unwind, e.g. the .eh_frame FDE is absent.
  __cxa_begin_catch(&hdr->unwindHeader);
  std::terminate();
}
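The uncaughtExceptions counter that __cxa_throw increments and __cxa_begin_catch decrements is observable from standard C++ through std::uncaught_exceptions() (C++17). A minimal sketch, with UnwindProbe, Counts and observe_uncaught being names of my own:

```cpp
#include <exception>

// Records the number of uncaught exceptions at the time it is destroyed.
struct UnwindProbe {
  int &seen;
  ~UnwindProbe() { seen = std::uncaught_exceptions(); }
};

struct Counts {
  int before;
  int during_unwind;
  int in_handler;
};

Counts observe_uncaught() {
  Counts c{std::uncaught_exceptions(), -1, -1};
  try {
    UnwindProbe probe{c.during_unwind};  // destroyed by a cleanup landing pad
    throw 1;
  } catch (int) {
    // The handler has begun, so the exception is no longer "uncaught".
    c.in_handler = std::uncaught_exceptions();
  }
  return c;
}
```

Called outside of any active exception, this reports 0 before the throw, 1 while the destructor runs during unwinding, and 0 again inside the handler.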
void __cxa_end_catch();
This function is called at the end of the catch block or when rethrowing. For a native exception:
- Get the current exception from the global exception stack of the thread and decrement handlerCount
- When handlerCount reaches 0, pop the global exception stack of the thread
- Call __cxa_free_exception when handlerCount has decreased to 0 (if this is a dependent exception, decrement referenceCount and call __cxa_free_exception when it reaches 0)
For a foreign exception,
- Call _Unwind_DeleteException
- Execute __cxa_eh_globals::caughtExceptions = nullptr; (due to the nature of __cxa_begin_catch, there is exactly one exception in the stack)
void __cxa_rethrow();
will mark the exception object, so
that when handlerCount
is reduced to 0 by
__cxa_end_catch
, it will not be destroyed, because this
object will be reused by the cleanup phase restored by
_Unwind_Resume
.
Note that, except for __cxa_begin_catch
and
__cxa_end_catch
, most __cxa_*
functions cannot
handle foreign exceptions (they do not have the
__cxa_exception
header).
Examples
For the following code:
#include <cstdio>
struct A { ~A(); };
struct B { ~B(); };
void foo() { throw 0xB612; }
void bar() { B b; foo(); }
void qux() { try { A a; bar(); } catch (int x) { puts(""); } }
The compiled assembly conceptually looks like this:
1 | void foo() { |
Control flow:
- qux calls bar. bar calls foo. foo throws an exception
- foo dynamically allocates a memory block, stores the thrown int and the __cxa_exception header, and then executes __cxa_throw
- __cxa_throw fills in other fields of __cxa_exception and calls _Unwind_RaiseException
Next, _Unwind_RaiseException
drives the two-phase
process of Level 1.
- _Unwind_RaiseException executes phase 1: search phase
  - For bar, call personality with _UA_SEARCH_PHASE as the actions parameter, which returns _URC_CONTINUE_UNWIND (no catch handler)
  - For qux, call personality with _UA_SEARCH_PHASE as the actions parameter, which returns _URC_HANDLER_FOUND (with catch handler)
  - The stack pointer of qux's stack frame is recorded (stored in private_2) and the search stops
- _Unwind_RaiseException executes phase 2: cleanup phase
  - bar's stack frame is not marked by the search phase; call personality with _UA_CLEANUP_PHASE as the actions parameter, which returns _URC_INSTALL_CONTEXT
  - Jump to the landing pad of bar's stack frame
  - After the landing pad finishes its cleanup, it uses _Unwind_Resume to return to the cleanup phase
  - The stack frame of qux is marked by the search phase; call personality with _UA_CLEANUP_PHASE|_UA_HANDLER_FRAME as the actions parameter, which returns _URC_INSTALL_CONTEXT
  - Jump to the landing pad of the qux stack frame
  - The landing pad calls __cxa_begin_catch, executes the catch code, and then calls __cxa_end_catch
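The order of events in this walk-through can be observed by giving the destructors bodies that record themselves. This is a sketch of the same foo/bar/qux shape; the events log and the return value of qux are my additions for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical event log shared by the frames below.
static std::vector<std::string> events;

struct A { ~A() { events.push_back("~A"); } };
struct B { ~B() { events.push_back("~B"); } };

void foo() { throw 0xB612; }
void bar() { B b; foo(); }  // b is destroyed by bar's cleanup landing pad

std::vector<std::string> qux() {
  events.clear();
  try {
    A a;  // destroyed in qux's landing pad before the handler body runs
    bar();
  } catch (int x) {
    events.push_back("catch " + std::to_string(x));
  }
  return events;
}
```

The log comes out as ~B, ~A, catch 46610: bar's cleanup runs first, then the out-of-scope variable of the try block, and only then the catch code.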
__gxx_personality_v0
A personality routine is called by Level 1 ABI (both phase 1 and phase 2) to provide language-related processing. Different languages, implementations or architectures may use different personality routines. Common personalities are as follows:
- __gxx_personality_v0: C++
- __gxx_personality_sj0: sjlj
- __gcc_personality_v0: C -fexceptions for __attribute__((cleanup(...)))
- __CxxFrameHandler3: Windows MSVC
- __gxx_personality_seh0: MinGW-w64 -fseh-exceptions
- __objc_personality_v0: ObjC in the macOS environment
The most common C++ implementation on ELF systems is
__gxx_personality_v0
. It is implemented by:
- GCC:
libstdc++-v3/libsupc++/eh_personality.cc
- libc++abi:
src/cxa_personality.cpp
_Unwind_Reason_Code (*__personality_routine)(int version, _Unwind_Action action, uint64 exceptionClass, _Unwind_Exception *exceptionObject, _Unwind_Context *context);
In the absence of errors:
- For _UA_SEARCH_PHASE, returns
  - _URC_CONTINUE_UNWIND: no lsda, or there is no landing pad, or there is a non-catch handler or a matched exception specification
  - _URC_HANDLER_FOUND: there is a matched catch handler or an unmatched exception specification
- For _UA_CLEANUP_PHASE, returns
  - _URC_CONTINUE_UNWIND: no lsda, or there is no landing pad, or (not produced by a compiler) there is no cleanup action
  - _URC_INSTALL_CONTEXT: the other cases
Before transferring control to the landing pad, the personality will call _Unwind_SetGR to set two registers (architecture related, __builtin_eh_return_data_regno(0) and __builtin_eh_return_data_regno(1)) to store _Unwind_Exception * and switchValue.
Code:
1 | _unwind_Reason_Code __gxx_personality_v0(int version, _Unwind_Action actions, uint64_t exceptionClass, _Unwind_Exception *exc, _Unwind_Context *ctx) { |
For a native exception, when the personality returns
_URC_HANDLER_FOUND
in the search phase, the LSDA related
information of the stack frame will be cached. When the personality is
called again in the cleanup phase with the argument
actions == (_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME)
, the
personality loads the cache and there is no need to parse
.gcc_except_table
.
In the remaining three cases, the personality has to parse
.gcc_except_table
:
- actions & _UA_SEARCH_PHASE
- actions & _UA_CLEANUP_PHASE && actions & _UA_HANDLER_FRAME && !is_native: catch (...) can catch a foreign exception. An exception specification terminates upon a foreign exception.
- actions & _UA_CLEANUP_PHASE && !(actions & _UA_HANDLER_FRAME): non-catch handlers and unmatched catch handlers, matched exception specification. Another case is _Unwind_ForcedUnwind.
1 | static void scan_eh_tab(...) { |
__gcc_personality_v0
libgcc and compiler-rt/lib/builtins implement this function to handle __attribute__((cleanup(...))). The implementation does not return _URC_HANDLER_FOUND in the search phase, so the cleanup handler cannot serve as a catch handler. However, we can supply our own implementation to return _URC_HANDLER_FOUND in the search phase... On x86-64, __builtin_eh_return_data_regno(0) is RAX. We can let the cleanup handler pass RAX to the landing pad.
1 | // a.cc |
1 | % clang -c -fexceptions a.cc b.c |
Rethrow
The landing pad section briefly described the code executed by rethrow. Usually a caught exception is destroyed in __cxa_end_catch, so __cxa_rethrow marks the exception object and increases handlerCount.
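From the language side, the visible consequence is that throw; reactivates the same exception object rather than creating a copy. A small sketch (Payload and rethrow_preserves_object are hypothetical names) that compares the addresses seen by the inner and outer handlers:

```cpp
struct Payload {
  int value;
};

// Returns true if the outer handler sees the very object the inner one saw.
bool rethrow_preserves_object() {
  const Payload *inner = nullptr;
  try {
    try {
      throw Payload{7};
    } catch (const Payload &p) {
      inner = &p;
      throw;  // a call to __cxa_rethrow on Itanium ABI targets
    }
  } catch (const Payload &p) {
    return inner == &p && p.value == 7;
  }
  return false;
}
```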
C++11 introduced Exception Propagation (N2179; std::rethrow_exception etc), and libstdc++ uses __cxa_dependent_exception to implement it. For the design, see https://gcc.gnu.org/legacy-ml/libstdc++/2008-05/msg00079.html
1 | struct __cxa_dependent_exception { |
std::current_exception
and
std::rethrow_exception
will increase the reference
count.
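A short usage sketch of the standard API on top of this machinery: the std::exception_ptr keeps the exception alive after the original handler has ended, and it can be rethrown more than once. capture and rethrow_and_read are helper names I made up.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Throw, catch, and hand back a reference-counted handle to the exception.
std::exception_ptr capture() {
  std::exception_ptr p;
  try {
    throw std::runtime_error("saved");
  } catch (...) {
    p = std::current_exception();
  }
  return p;  // the exception object outlives the handler above
}

// Rethrow the stored exception and return its message.
std::string rethrow_and_read(const std::exception_ptr &p) {
  std::string msg;
  try {
    std::rethrow_exception(p);
  } catch (const std::exception &e) {
    msg = e.what();
  }
  return msg;
}
```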
In libstdc++, __cxa_rethrow
calls GCC extension
_Unwind_Resume_or_Rethrow
which can resume forced
unwinding.
LLVM IR
In construction.
- nounwind: cannot unwind
- uwtable: force generation of the unwind table regardless of nounwind
1 | if uwtables |
Compiler behavior
- -fno-exceptions -fno-asynchronous-unwind-tables: neither .eh_frame nor .gcc_except_table exists
- -fno-exceptions -fasynchronous-unwind-tables: .eh_frame exists, .gcc_except_table doesn't
- -fexceptions: both .eh_frame and .gcc_except_table exist
  - In GCC, for a noexcept function, a possibly-throwing call site unhandled by a try block does not get an entry in the .gcc_except_table call site table. If the function has no try block, it gets a header-only .gcc_except_table (4 bytes)
  - In Clang, there is a call site entry calling __clang_call_terminate. The size overhead is larger than GCC's scheme. Improving this requires LLVM IR work
When an exception propagates from a function to its caller (libgcc_s/libunwind & libsupc++/libc++abi):
- no .eh_frame: _Unwind_RaiseException returns _URC_END_OF_STACK. __cxa_throw calls std::terminate
- .eh_frame without .gcc_except_table: pass-through (local variable destructors are not called). This is the case of -fno-exceptions -fasynchronous-unwind-tables.
- .eh_frame with .gcc_except_table not covering the throwing call site: __gxx_personality_v0 calls std::terminate since no call site code range matches
- .eh_frame with .gcc_except_table covering the throwing call site: do possible cleanup and unwind to the parent frame
Combined with the above description, when an exception will propagate to a caller of a noexcept function:
- -fno-exceptions -fno-asynchronous-unwind-tables: propagating through a function calls std::terminate
- -fno-exceptions -fasynchronous-unwind-tables: pass-through. Local variable destructors are not called. This behavior is unexpected.
- -fexceptions: propagating through a noexcept function calls std::terminate
When std::terminate is called, there is a diagnostic that looks like
terminate called after throwing an instance of 'int'
(libstdc++; libc++ has a similar one). There is no stack trace. If the process installs a SIGABRT signal handler, the handler may get a stack trace and symbolize the addresses.
Catching exceptions while unwinding through -fno-exceptions code is a proposal to improve the diagnostics.
Personality and typeinfo encoding
.eh_frame
contains information about the unwind
operation. See Stack
unwinding for its format.
In -fpie/-fpic mode, the personality and type info encodings have the DW_EH_PE_indirect|DW_EH_PE_pcrel bits on most targets.
void raise() { throw 42; }
bool foo() {
  try { raise(); } catch (int) { return true; }
  return false;
}
int main() { foo(); }
_Z3foov:
.cfi_startproc
.cfi_personality 155, DW.ref.__gxx_personality_v0
.cfi_lsda 27, .Lexception0
...
.section .gcc_except_table,"a",@progbits
...
# >> Catch TypeInfos <<
.Ltmp3: # TypeInfo 1
.long .L_ZTIi.DW.stub-.Ltmp3
.Lttbase0:
.data
.p2align 3, 0x0
.L_ZTIi.DW.stub:
.quad _ZTIi
.hidden DW.ref.__gxx_personality_v0
.weak DW.ref.__gxx_personality_v0
.section .data.DW.ref.__gxx_personality_v0,"aGw",@progbits,DW.ref.__gxx_personality_v0,comdat
.p2align 3, 0x0
.type DW.ref.__gxx_personality_v0,@object
.size DW.ref.__gxx_personality_v0, 8
DW.ref.__gxx_personality_v0:
.quad __gxx_personality_v0
In the example, .eh_frame contains a PC-relative relocation referencing DW.ref.__gxx_personality_v0, and .gcc_except_table contains a PC-relative relocation referencing .L_ZTIi.DW.stub. The relocations are link-time constants, so .eh_frame can remain readonly.
DW.ref.__gxx_personality_v0
and
.L_ZTIi.DW.stub
reside in writable sections which will
contain dynamic relocations if __gxx_personality_v0
and
_ZTIi
are defined in a shared object - which is often the
case.
For -fno-pic
code, different targets have different
ideas. AArch64 and RISC-V use
DW_EH_PE_indirect|DW_EH_PE_pcrel
as well. On x86,
.cfi_personality
refers to
__gxx_personality_v0
. This will lead to a canonical PLT if
__gxx_personality_v0
is defined in a shared object (e.g.
libstdc++.so.6
). I sent a patch https://gcc.gnu.org/PR108622 to use
DW_EH_PE_indirect|DW_EH_PE_pcrel
.
R_MIPS_32
and R_MIPS_64
personality encoding
https://github.com/llvm/llvm-project/issues/58377
1 | void foo() { try { throw 1; } catch (...) {} } |
mips64el-linux-gnuabi64-g++ -fpic
and
clang++ --target=mips64el-unknown-linux-gnuabi64 -fpic
use
DW_EH_PE_absptr | DW_EH_PE_indirect
to encode personality
routine pointers. Using DW_EH_PE_absptr
instead of
DW_EH_PE_pcrel
is wrong. GNU ld works around the compiler
design problem by converting DW_EH_PE_absptr
to
DW_EH_PE_pcrel
. ld.lld does not support this and will
report an error:
% clang++ --target=mips64el-linux-gnuabi -fpic -fuse-ld=lld -shared ex.cc
ld.lld: error: relocation R_MIPS_64 cannot be used against symbol 'DW.ref.__gxx_personality_v0'; recompile with -fPIC
>>> defined in /tmp/ex-40a996.o
>>> referenced by ex.cc
>>> /tmp/ex-40a996.o:(.eh_frame+0x13)
...
R_MIPS_32
for 32-bit builds is similar.
Potentially-throwing
__cxa_end_catch
__cxa_end_catch
is potentially-throwing because it may
destroy an exception object with a potentially-throwing destructor (e.g.
~C() noexcept(false) { ... }).
struct A { ~A(); };
void opaque();
void foo() {
A a;
// The exception object has an unknown type and may throw. The landing pad
// then needs to call A::~A for `a` before jumping to _Unwind_Resume.
try { opaque(); } catch (...) { }
}
To support an exception object with a potentially-throwing destructor, Clang generates conservative code for a catch-all clause or a catch clause matching a record type:
- assume that the exception object may have a throwing destructor
- emit invoke void @__cxa_end_catch (as the call is not marked with the nounwind attribute)
- emit a landing pad to destroy local variables and call _Unwind_Resume
Per C++ [dcl.fct.def.coroutine], a coroutine's function body implies
a catch (...)
. Clang's code generation pessimizes even
simple code, like:
UserFacing foo() {
A a;
opaque();
co_return;
// For `invoke void @__cxa_end_catch()`, the landing pad destroys the
// promise_type and deletes the coro frame.
}
Throwing destructors are typically discouraged. In many environments, the destructors of exception objects are guaranteed to never throw, making our conservative code generation approach seem wasteful.
Furthermore, throwing destructors tend not to work well in practice:
- GCC does not emit call site records for the region containing __cxa_end_catch. This has been the case for a long time, since 2000.
- If a catch-all clause catches an exception object whose destructor throws, both GCC and Clang using libstdc++ leak the allocated exception object.
To avoid code generation pessimization, I added -fassume-nothrow-exception-dtor
for Clang 18 to assume that __cxa_end_catch
calls have the
nounwind
attribute. This requires that thrown exception
objects' destructors will never throw.
To detect misuses, diagnose throw expressions with a
potentially-throwing destructor. Technically, it is possible that a
potentially-throwing destructor never throws when called transitively by
__cxa_end_catch
, but these cases seem rare enough to
justify a relaxed mode.
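A library can approximate such a diagnostic today with a type trait. The following is only a sketch of the idea, not what the compiler option does: checked_throw, QuietError and NoisyError are hypothetical names, and the check is a static_assert on std::is_nothrow_destructible.

```cpp
#include <type_traits>
#include <utility>

struct QuietError {
  int code;  // implicit destructor, noexcept by default
};
struct NoisyError {
  int code;
  ~NoisyError() noexcept(false) {}  // potentially-throwing destructor
};

// Hypothetical wrapper: refuse to throw types whose destructor may throw.
template <class E>
[[noreturn]] void checked_throw(E e) {
  static_assert(std::is_nothrow_destructible<E>::value,
                "exception type has a potentially-throwing destructor");
  throw std::move(e);
}

int throw_and_catch_quiet() {
  try {
    checked_throw(QuietError{3});
  } catch (const QuietError &e) {
    return e.code;
  }
  return -1;  // not reached
}
```

Instantiating checked_throw with NoisyError would fail to compile, which is the kind of early rejection the relaxed mode relies on.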
Misc
Use libc++ and libc++abi
On Linux, compared with clang
, clang++
additionally links against libstdc++/libc++ and libm.
Dynamically link against libc++.so (which depends on libc++abi.so)
(additionally specify -pthread
if threads are used):
1 | clang++ -stdlib=libc++ -nostdlib++ a.cc -lc++ -lc++abi |
If compile actions and link actions are separate
(-stdlib=libc++
passes -lc++
but its position
is undesired, so just don't use it):
1 | clang++ -nostdlib++ a.cc -lc++ -lc++abi |
Statically link in libc++.a (which includes the members of
libc++abi.a). This requires a
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=on
build:
1 | clang++ -stdlib=libc++ -static-libstdc++ -nostdlib++ a.cc -pthread |
Statically link in libc++.a and libc++abi.a. This is a bit inferior because there is a duplicate -lc++ passed by the driver.
1 | clang++ -stdlib=libc++ -static-libstdc++ -nostdlib++ a.cc -Wl,--push-state,-Bstatic -lc++ -lc++abi -Wl,--pop-state -pthread |
libc++abi and libsupc++
It is worth noting that the
<exception> <stdexcept>
type layout provided by
libc++abi (such as logic_error
, runtime_error
,
etc.) are specifically compatible with libsupc++. After GCC 5 libstdc++
abandoned ref-counted std::string
, libsupc++ still uses
__cow_string
for logic_error
and other
exception classes. libc++abi uses a similar ref-counted string.
libsupc++ and libc++abi do not use inline namespace and have
conflicting symbol names. Therefore, usually a libc++/libc++abi
application cannot use a shared object (ODR violation) of a dynamically
linked libstdc++.so
.
With some effort, you can still solve this problem: compile the non-libsupc++ part of libstdc++ to get a self-made libstdc++.so.6. The executable links against libc++abi, which provides the C++ ABI symbols required by libstdc++.so.6.
Monolithic
.gcc_except_table
Prior to Clang 12, a monolithic .gcc_except_table
was
used. Like many other metadata sections, the main problem with the
monolithic sections is that they cannot be garbage collected by the
linker. For RISC-V -mrelax
and basic block sections, there
is a bigger problem: .gcc_except_table has relocations
pointing to local symbols in text sections. If the referenced text sections
are discarded in a COMDAT group, these relocations will be rejected by
the linker
(error: relocation refers to a symbol in a discarded section).
The solution is to use fragmented .gcc_except_table
(https://reviews.llvm.org/D83655).
But the actual deployment is not that simple:) ld.lld processes
--gc-sections
first (it is not clear which
.eh_frame
pieces are live), and then processes (and garbage
collects) .eh_frame
.
During --gc-sections
, all .eh_frame
pieces
are live. They will mark all .gcc_except_table.*
live.
According to the GC rules for section groups, a .gcc_except_table.* marks the other sections in the same section group (including .text.*) live. The result is that .text.* in all section groups cannot be garbage collected, resulting in increased output size.
https://reviews.llvm.org/D91579 fixed this problem: For
.eh_frame
, do not mark .gcc_except_table
in
section group.
clang -fbasic-block-sections=
This option produces one section for each basic block (more
aggressive than -ffunction-sections
) for aggressive machine
basic block optimizations. There are some challenges integrating LSDA
into this framework.
You can either allocate a .gcc_except_table
for each
basic block section needing LSDA, or let all basic block sections use
the same .gcc_except_table
. The LLVM implementation chose
the latter, which has several advantages:
- No duplicate headers
- Sharable type table
- Sharable action table (this only matters for the deprecated exception specification)
There is only one LPStart when using the same .gcc_except_table, so all offsets from landing pads to LPStart must be representable by relocations. Because most architectures lack a difference relocation type (RISC-V's R_RISCV_SUB* is an exception), placing the landing pads in the same section is the practical choice.
Exception handling ABI for the ARM architecture
The overall structure is the same as Itanium C++ ABI: Exception
Handling, with some differences in data structure,
_Unwind_*
, etc.
https://maskray.me/blog/2020-11-08-stack-unwinding contains a few notes.
Compact Exception Tables for MIPS ABIs
Under construction.
The tables are described by .eh_frame_entry and .gnu_extab.
Design thoughts:
- Exception code ranges are sorted and must be linearly searched. Therefore it would be more compact to specify each relative to the previous one, rather than relative to a fixed base.
- The landing pad is often close to the exception region that uses it. Therefore it is better to use the end of the exception region as the reference point, than use the function base address.
- The action table can be integrated directly with the exception region definition itself. This removes one indirection. The threading of actions can still occur, by providing an offset to the next exception encoding of interest.
- Often the action threading is to the next exception region, so optimizing that case is important.
- Catch types and exception specification type lists cannot easily be encoded inline with the exception regions themselves. It is necessary to preserve the unique indices that are automatically created by the DWARF scheme.
It uses compact unwind descriptors similar to ARM EH. Builtin PR1 means there is no language-dependent data, and Builtin PR2 is used for C/C++.
Misc
Khalil Estell's CppCon 2024 talk C++ Exceptions for Smaller Firmware mentions that a custom exception implementation that drops some rarely used functionality can make the library code size much smaller, which suits firmware development.
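Chinese version
I wrote an article a few weeks ago introducing stack unwinding in detail. Today I will introduce C++ exception handling, an application of stack unwinding. Exception handling has a variety of ABIs (interoperability of C++ implementations), the most widely used of which is Itanium C++ ABI: Exception Handling.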
Itanium C++ ABI: Exception Handling
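Simplified exception handling process (from throw to catch):
- Call __cxa_allocate_exception to allocate space to store the exception object and the exception header __cxa_exception
- Jump to __cxa_throw, set the __cxa_exception fields and then jump to _Unwind_RaiseException
- In _Unwind_RaiseException, execute the search phase: call personality routines to find a matching try catch (type matching)
- In _Unwind_RaiseException, execute the cleanup phase: call personality routines to find stack frames containing out-of-scope variables, and for each stack frame, jump to its landing pad to execute the destructors. The landing pad uses _Unwind_Resume to jump back to the cleanup phase
- The cleanup phase executed by _Unwind_RaiseException jumps to the landing pad corresponding to the matching try catch
- The landing pad calls __cxa_begin_catch, executes the catch code, and then calls __cxa_end_catch
- __cxa_end_catch destroys the exception object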
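Note: each stack frame may use a different personality routine. In practice it is common for many frames to share the same routine.
Among these steps, _Unwind_RaiseException is responsible for stack unwinding and is language independent. The language-related concepts (catch block, out-of-scope variable) in stack unwinding are interpreted/encapsulated by the personality. This is a key idea that makes the ABI applicable to other languages and allows other languages to be mixed with C++.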
Therefore, Itanium C++ ABI: Exception Handling is divided into two parts: Level 1 Base ABI and Level 2 C++ ABI. The Base ABI describes the language-independent stack unwinding part and defines the _Unwind_* API. Common implementations are:
- libgcc: libgcc_s.so.1 and libgcc_eh.a
- Several libraries named libunwind (libunwind.so or libunwind.a). With Clang, you can use --rtlib=compiler-rt --unwindlib=libunwind to link against libunwind, which can be either llvm-project/libunwind or nongnu.org/libunwind
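The C++ ABI is specific to the C++ language and defines the __cxa_* API (__cxa_allocate_exception, __cxa_throw, __cxa_begin_catch, etc.). Common implementations are:
- libsupc++, a part of libstdc++
- libc++abi in llvm-project
libc++, the C++ standard library implementation in llvm-project, can be combined with libc++abi, libcxxrt or libsupc++. libc++abi is recommended.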
Level 1 Base ABI
Data structures
The main data structures are:
1 | // Level 1 |
1 | int main() { |
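exception_class and exception_cleanup are set by the Level 2 API that throws the exception. The Level 1 API does not interpret exception_class; it just passes the value to the personality routine. The personality routine uses this value to distinguish native and foreign exceptions.
libc++abi's __cxa_throw sets exception_class to the uint64_t representing "CLNGC++\0". libsupc++ uses the uint64_t representing "GNUCC++\0". The ABI requires the low bits to contain "C++\0". Exceptions thrown by libstdc++ are treated as foreign exceptions by libc++abi. Only catch (...) can catch foreign exceptions.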
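The exception_class encoding can be sketched as below. This is a minimal illustration, not the libc++abi or libsupc++ source: the helper names (make_exception_class, is_cxx_exception, same_vendor_and_language) are hypothetical, and ignoring the lowest byte when comparing is an assumption modeled on the fact that dependent exceptions use a different identifier.

```cpp
#include <cassert>
#include <cstdint>

// Pack 8 characters into a uint64_t, first character in the most
// significant byte. The literal has 8 characters plus the implicit NUL.
constexpr std::uint64_t make_exception_class(const char (&s)[9]) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v = (v << 8) | static_cast<unsigned char>(s[i]);
  return v;
}

constexpr std::uint64_t kClangClass = make_exception_class("CLNGC++\0");
constexpr std::uint64_t kGnuClass = make_exception_class("GNUCC++\0");

// The low four bytes name the language: "C++\0".
constexpr bool is_cxx_exception(std::uint64_t cls) {
  return (cls & 0xFFFFFFFFu) == 0x432B2B00u;
}

// Assumption: a runtime treats an exception as native when vendor and
// language match, ignoring the lowest byte.
constexpr bool same_vendor_and_language(std::uint64_t a, std::uint64_t b) {
  return (a & 0xFFFFFFFFFFFFFF00ull) == (b & 0xFFFFFFFFFFFFFF00ull);
}
```

Under this sketch, a "GNUCC++\0" exception is a C++ exception but not a native one for a "CLNGC++\0" runtime, which matches the foreign exception behavior described above.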
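The exception propagation mechanism uses another exception_class identifier to represent dependent exceptions.
exception_cleanup stores the destroying delete function of this exception object. It is used by __cxa_end_catch to destroy a foreign exception.
private_1 and private_2 are private to Level 1 and should not be used by the personality.
The information required by the unwind operation (for a given IP/SP, how to obtain register information such as the IP/SP of the caller frame) is implementation-dependent and is not defined by the Level 1 ABI.
On ELF systems, .eh_frame and .eh_frame_hdr (PT_GNU_EH_FRAME program header) store the unwind information. See Stack unwinding.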
Level 1 API
_Unwind_Reason_Code _Unwind_RaiseException(_Unwind_Exception *obj);
Performs stack unwinding for an exception. Normally it does not return: like longjmp, it transfers control to the matched catch handler (catch block) or to non-catch handlers (code blocks that need to run destructors). It is a two-phase process, consisting of phase 1 (search phase) and phase 2 (cleanup phase).
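- The search phase finds the matched catch handler and records the stack pointer in private_2
  - Trace the call chain based on IP/SP and other saved registers
  - For each stack frame, skip it if it has no personality routine; otherwise call the routine (actions is set to _UA_SEARCH_PHASE)
  - If the personality returns _URC_CONTINUE_UNWIND, continue searching
  - If the personality returns _URC_HANDLER_FOUND, a matched catch handler or an unmatched exception specification has been found, so stop searching
- The cleanup phase jumps to non-catch handlers (usually local variable destructors), and then transfers control to the matched catch handler located in phase 1
  - Trace the call chain based on IP/SP and other saved registers
  - For each stack frame, skip it if it has no personality routine; otherwise call the routine (actions is set to _UA_CLEANUP_PHASE, and the frame marked by the search phase also gets _UA_HANDLER_FRAME)
  - If the personality returns _URC_CONTINUE_UNWIND, there is no landing pad, so continue unwinding
  - If the personality returns _URC_INSTALL_CONTEXT, there is a landing pad, so jump to it
  - For intermediate frames not marked by the search phase, the landing pad performs cleanup work (usually destructors of out-of-scope variables) and calls _Unwind_Resume to jump back to the cleanup phase
  - For the frame marked by the search phase, the landing pad calls __cxa_begin_catch, executes the code in the catch block, and finally calls __cxa_end_catch to destroy the exception object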
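The two-phase process can be modeled with a toy simulation. This is only a sketch under simplifying assumptions: the Frame struct and raise_exception are hypothetical, and a real unwinder obtains this information from .eh_frame and from personality routines rather than from boolean flags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a stack frame.
struct Frame {
  std::string name;
  bool has_cleanup;  // has a landing pad that runs destructors
  bool has_handler;  // has a matching catch clause
};

// Frames are ordered from the throwing function outwards. Returns the log
// of actions, or an empty log if no handler exists (where the real
// _Unwind_RaiseException would return to its caller with an error code).
std::vector<std::string> raise_exception(const std::vector<Frame> &stack) {
  // Phase 1 (search): nothing runs; only the handler frame is located.
  std::size_t handler = stack.size();
  for (std::size_t i = 0; i < stack.size(); ++i) {
    if (stack[i].has_handler) {
      handler = i;
      break;
    }
  }
  std::vector<std::string> log;
  if (handler == stack.size())
    return log;
  // Phase 2 (cleanup): visit the frames again, running cleanups on the way,
  // then transfer control to the frame marked in phase 1.
  for (std::size_t i = 0; i < handler; ++i)
    if (stack[i].has_cleanup)
      log.push_back("cleanup " + stack[i].name);
  log.push_back("catch " + stack[handler].name);
  return log;
}
```

Note that when no handler is found, no cleanup runs at all, which is one observable consequence of searching before cleaning up.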
1 | static _Unwind_Reason_Code unwind_phase1(unw_context_t *uc, _Unwind_Context *ctx, |
C++ does not support resumptive exception handling (correcting the exceptional condition and resuming execution at the point where it was raised), so the two-phase process is not strictly necessary, but two phases allow C++ and other languages to coexist on the call stack.
_Unwind_Reason_Code _Unwind_ForcedUnwind(_Unwind_Exception *obj, _Unwind_Stop_Fn stop, void *stop_parameter);
Performs forced unwinding: skips the search phase and performs a slightly different cleanup phase. private_2 is used as the argument of the stop function. This function is rarely used.
void _Unwind_Resume(_Unwind_Exception *obj);
Resumes the phase 2 unwinding process. Like longjmp, it is noreturn. It is the only Level 1 API called directly by the compiler. The compiler usually calls it at the end of non-catch handlers.
void _Unwind_DeleteException(_Unwind_Exception *obj);
Destroys the specified exception object. It is the only Level 1 API that handles exception_cleanup, and it is called by __cxa_end_catch.
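Many implementations provide extensions. _Unwind_Reason_Code _Unwind_Backtrace(_Unwind_Trace_Fn callback, void *ref); is another special unwind process: it ignores personality routines and reports stack frame information to an external callback.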
Level 2 C++ ABI
This part deals with language-related concepts such as C++ throw, catch blocks, and out-of-scope variable destructors.
Data structures
Each thread has a global exception stack. caughtExceptions stores the exception at the top of the stack (the most recent one), and __cxa_exception::nextException points to the next exception in the stack.
struct __cxa_eh_globals {
  __cxa_exception *caughtExceptions;
  unsigned uncaughtExceptions;
};
1 | int main() { |
The definition of __cxa_exception is as follows. It ends with the _Unwind_Exception defined by the Base ABI. __cxa_exception adds C++ semantic information on top of _Unwind_Exception.
1 | // Level 2 |
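The information needed to handle an exception (for a given IP, whether it is inside a try catch, whether there are out-of-scope variable destructors to run, and whether there is a dynamic exception specification) is called the language-specific data area (LSDA). It is implementation-dependent and is not defined by the Level 2 ABI.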
Landing pad
A landing pad is a piece of exception-related code in the text section. There are three kinds:
- cleanup clause: usually calls destructors of out-of-scope variables or callbacks registered with __attribute__((cleanup(...))), and then uses _Unwind_Resume to jump back to the cleanup phase
- catch clause that captures the exception: calls destructors of out-of-scope variables, then calls __cxa_begin_catch, executes the catch code, and finally calls __cxa_end_catch
- rethrow: calls destructors of out-of-scope variables in the catch clause, then calls __cxa_end_catch, and then uses _Unwind_Resume to jump back to the cleanup phase
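If a try has multiple catch clauses, the language-specific data area contains multiple chained action table entries, but the landing pad describes the merged catch clauses. Before transferring control to the landing pad, the personality calls _Unwind_SetGR to set __builtin_eh_return_data_regno(1) to the switchValue, which tells the landing pad which type matched.
A rethrow is triggered by __cxa_rethrow in the middle of the catch code. It needs to destruct the local variables defined in the catch clause and call __cxa_end_catch to offset the __cxa_begin_catch called at the beginning of the catch clause.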
.gcc_except_table
On ELF systems, the language-specific data area is usually stored in the .gcc_except_table section. This section is parsed by __gxx_personality_v0 and __gcc_personality_v0. Its structure is simple:
- header (the encodings of @LPStart, @TType and call sites, and the start offset of the action records)
- call site table: describes, for each call site (an address range), the landing pad offset (0 if there is none) and the action record offset (biased by 1, 0 for no action)
- action table
- type table (referenced by positive switch values)
- dynamic exception specification (deprecated in C++, so rarely used) (referenced by negative switch values)
Here is an example:
1 | .section .gcc_except_table,"a",@progbits |
Besides the call site offset and length, each call site record has two values: the landing pad offset and the action record offset.
- The landing pad offset is 0. The action record offset should also be 0. There is no landing pad
- The landing pad offset is nonzero. There is a landing pad
  - The action record offset is 0. This is also called a cleanup (the term "cleanup" is somewhat ambiguous because Level 1 has the cleanup phase). It usually describes local variable destructors and __attribute__((cleanup(...)))
  - The action record offset is nonzero. It points to an action record in the action table: a catch, a noexcept specifier, or an exception specification
Each action record has two values:
- switch value (SLEB128): a positive number is the index in the type table of the TypeInfo of the caught type; a negative number is the offset of an exception specification in the type table; 0 means a cleanup action, whose effect is similar to an action record offset of 0 in the call site record
- offset to next action record: the next action record to process, where 0 means the end. This singly linked list form can describe multiple chained catch clauses or an exception specification list
The offset to next action record can serve not only as a singly linked list but also as a trie, though scenarios that benefit from the trie property are almost never encountered.
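Walking a chain of action records can be sketched as follows. The function names are hypothetical, and the sketch assumes that the offset to the next action record is relative to the position of the offset field itself, which is my understanding of the LSDA format used by GCC and Clang rather than something stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Decode one SLEB128 value starting at *p and advance *p past it.
std::int64_t read_sleb128(const std::uint8_t **p) {
  std::uint64_t result = 0;
  unsigned shift = 0;
  std::uint8_t byte;
  do {
    byte = *(*p)++;
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  // Sign-extend when the sign bit of the last byte is set.
  if (shift < 64 && (byte & 0x40))
    result |= ~static_cast<std::uint64_t>(0) << shift;
  return static_cast<std::int64_t>(result);
}

// Collect the switch values of the chain starting at `action`.
std::vector<std::int64_t> walk_actions(const std::uint8_t *action) {
  std::vector<std::int64_t> switch_values;
  for (;;) {
    switch_values.push_back(read_sleb128(&action));
    const std::uint8_t *next_field = action;  // assumed base of the offset
    std::int64_t next = read_sleb128(&action);
    if (next == 0)
      break;  // end of the chain
    action = next_field + next;
  }
  return switch_values;
}
```

For a try with two catch clauses, a compiler may emit the bytes 1, 0, 2, 0x7d: the second record (switch value 2) chains back to the first one (switch value 1) through the offset -3.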
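The landing pad offset/action record offset values for different regions of a program:
- A non-try region without local variable destructors: landing_pad_offset==0 && action_record_offset==0
- A non-try region with local variable destructors: landing_pad_offset!=0 && action_record_offset==0. Phase 2 should stop and call the cleanup
- A non-try region with variables that have __attribute__((cleanup(...))): landing_pad_offset!=0 && action_record_offset==0. Same as above
- A try region: landing_pad_offset!=0 && action_record_offset!=0. The landing pad points to the code block formed by concatenating the catch clauses. The action record has a type filter greater than 0 that describes a catch
- A try region with catch (...): same as above. The action record has a type filter greater than 0 that points to an entry of value 0 in the type table (meaning catch any)
- A region in a function with a noexcept specifier that may propagate an exception to the caller: landing_pad_offset!=0 && action_record_offset!=0. The landing pad points to a code block that calls std::terminate. The action record has a type filter greater than 0 that points to an entry of value 0 in the type table (meaning catch any)
- A region in a function with an exception specifier that may propagate an exception to the caller: landing_pad_offset!=0 && action_record_offset!=0. The landing pad points to a code block that calls __cxa_call_unexpected. The action record has a type filter less than 0 that describes an exception specifier list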
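The case analysis above can be condensed into two small classification helpers. This is an illustrative sketch: the enum and function names are hypothetical and are not taken from any personality routine implementation.

```cpp
#include <cassert>
#include <cstdint>

enum class CallSiteKind { NoLandingPad, Cleanup, Handler };
enum class ActionKind { Cleanup, Catch, ExceptionSpec };

// landing_pad: landing pad offset of the call site record (0 if absent).
// action: action record offset, biased by 1 (0 means no action).
CallSiteKind classify_call_site(std::uint64_t landing_pad,
                                std::uint64_t action) {
  if (landing_pad == 0)
    return CallSiteKind::NoLandingPad;
  return action == 0 ? CallSiteKind::Cleanup : CallSiteKind::Handler;
}

// switch_value: the first field of an action record.
ActionKind classify_action(std::int64_t switch_value) {
  if (switch_value > 0)
    return ActionKind::Catch;  // index into the type table
  if (switch_value < 0)
    return ActionKind::ExceptionSpec;  // offset of an exception specification
  return ActionKind::Cleanup;
}
```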
Level 2 API
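void *__cxa_allocate_exception(size_t thrown_size);
The compiler generates a call to this function for throw A();. It allocates a block of memory to hold the __cxa_exception and the A object. The __cxa_exception sits immediately to the left of the A object.
The following function shows the relationship between the address of the exception object that the program operates on and the __cxa_exception:
static void *thrown_object_from_cxa_exception(__cxa_exception *exception_header) {
  return static_cast<void *>(exception_header + 1);
}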
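The header-before-object layout can be tried out with a stand-in header. This is a sketch only: MockHeader and the mock_* functions are hypothetical, and the real __cxa_exception has many more fields and its own alignment requirements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in for __cxa_exception. Aligning it to max_align_t keeps the
// object that follows it suitably aligned.
struct alignas(std::max_align_t) MockHeader {
  void (*destructor)(void *);
  int handler_count;
};

// Allocate header + object in one block and return the object address.
void *mock_allocate_exception(std::size_t thrown_size) {
  void *raw = std::malloc(sizeof(MockHeader) + thrown_size);
  if (!raw)
    std::abort();
  MockHeader *hdr = new (raw) MockHeader{nullptr, 0};
  return hdr + 1;  // the thrown object lives right after the header
}

// Recover the header from the thrown object address.
MockHeader *header_from_thrown(void *thrown) {
  return static_cast<MockHeader *>(thrown) - 1;
}

void mock_free_exception(void *thrown) {
  std::free(header_from_thrown(thrown));
}
```

User code only ever sees the object address, while the runtime steps back by one header to reach its bookkeeping.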
void __cxa_throw(void *thrown, std::type_info *tinfo, void (*destructor)(void *));
Locates the __cxa_exception header from the thrown object (the inverse of the function above), fills in its fields (referenceCount, exception_class, unexpectedHandler, terminateHandler, exceptionType, exceptionDestructor, unwindHeader.exception_cleanup) and then calls _Unwind_RaiseException. This function is noreturn.
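void *__cxa_begin_catch(void *obj);
The compiler generates a call to this function at the beginning of a catch block. For a native exception:
- Increment handlerCount
- Push the exception onto the thread's global exception stack and decrement uncaughtExceptions
- Return the adjusted pointer of the exception object
For a foreign exception (which does not necessarily have a __cxa_exception header):
- If the thread's global exception stack is empty, push the exception; otherwise call std::terminate (it is unknown whether the foreign exception has a field like __cxa_exception::nextException)
- Return static_cast<_Unwind_Exception *>(obj) + 1 (assuming the thrown object immediately follows the _Unwind_Exception)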
A simplified implementation of __cxa_throw:
void __cxa_throw(void *thrown, std::type_info *tinfo, void (*destructor)(void *)) {
  // The header sits immediately before the thrown object.
  __cxa_exception *hdr = (__cxa_exception *)thrown - 1;
  hdr->exceptionType = tinfo;
  hdr->exceptionDestructor = destructor;
  hdr->unexpectedHandler = std::get_unexpected();
  hdr->terminateHandler = std::get_terminate();
  hdr->unwindHeader.exception_class = ...;
  __cxa_get_globals()->uncaughtExceptions++;
  _Unwind_RaiseException(&hdr->unwindHeader);
  // Failed to unwind, e.g. the .eh_frame FDE is absent.
  __cxa_begin_catch(&hdr->unwindHeader);
  std::terminate();
}
void __cxa_end_catch();
Called at the end of a catch block or on rethrow. For a native exception:
- Get the current exception from the thread's global exception stack and decrement handlerCount
- When handlerCount reaches 0, pop the thread's global exception stack
- When handlerCount reaches 0, call __cxa_free_exception (for a dependent exception, decrement referenceCount and call __cxa_free_exception when it reaches 0)
For a foreign exception:
- Call _Unwind_DeleteException
- Execute __cxa_eh_globals::caughtExceptions = nullptr; (due to the behavior of __cxa_begin_catch, there is exactly one exception in the stack)
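The interplay between handlerCount and the per-thread stack of caught exceptions can be modeled as below. This is a simplified sketch for native exceptions only: all Mock* names are hypothetical, and rethrow, dependent exceptions and foreign exceptions are deliberately left out.

```cpp
#include <cassert>

struct MockException {
  int handler_count = 0;
  bool destroyed = false;
  MockException *next = nullptr;  // plays the role of nextException
};

struct MockGlobals {
  MockException *caught = nullptr;  // top of the caught-exception stack
  unsigned uncaught = 0;
};

// Models the bookkeeping done when an exception is thrown.
void mock_throw(MockGlobals &g) { ++g.uncaught; }

// Models __cxa_begin_catch for a native exception.
void mock_begin_catch(MockGlobals &g, MockException &e) {
  ++e.handler_count;
  if (g.caught != &e) {  // push unless it is already on top
    e.next = g.caught;
    g.caught = &e;
  }
  --g.uncaught;
}

// Models __cxa_end_catch for a native exception.
void mock_end_catch(MockGlobals &g) {
  MockException *e = g.caught;
  if (!e)
    return;
  if (--e->handler_count == 0) {
    g.caught = e->next;   // pop
    e->destroyed = true;  // stands in for __cxa_free_exception
  }
}
```

Throwing and catching a second exception inside a handler pushes it on top of the first one, and the exceptions are destroyed in reverse order as the handlers exit.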
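void __cxa_rethrow();
Marks the exception object so that it is not destroyed when __cxa_end_catch decrements handlerCount to 0, because the object will be reused by the cleanup phase resumed by _Unwind_Resume.
Note that, apart from __cxa_begin_catch and __cxa_end_catch, most __cxa_* functions cannot handle foreign exceptions (which have no __cxa_exception header).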
Example
For the following code:
#include <cstdio>
struct A { ~A(); };
struct B { ~B(); };
void foo() { throw 0xB612; }
void bar() { B b; foo(); }
void qux() { try { A a; bar(); } catch (int x) { puts(""); } }
The compiled assembly conceptually looks like this:
1 | void foo() { |
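Execution flow:
- qux calls bar, bar calls foo, and foo throws an exception
- foo dynamically allocates a memory block to hold the thrown int and the __cxa_exception header, and then calls __cxa_throw
- __cxa_throw fills in the other fields of __cxa_exception and calls _Unwind_RaiseException
Next, _Unwind_RaiseException drives the Level 1 two-phase process.
- _Unwind_RaiseException executes phase 1: search phase
  - For bar, the personality is called with _UA_SEARCH_PHASE as the actions argument and returns _URC_CONTINUE_UNWIND (no catch handler)
  - For qux, the personality is called with _UA_SEARCH_PHASE as the actions argument and returns _URC_HANDLER_FOUND (there is a catch handler)
  - The stack pointer of qux's frame is marked (saved in private_2) and the search stops
- _Unwind_RaiseException executes phase 2: cleanup phase
  - bar's frame is not the one marked by the search phase. The personality is called with _UA_CLEANUP_PHASE as the actions argument and returns _URC_INSTALL_CONTEXT
  - Jump to the landing pad of bar's frame
  - The landing pad cleans up b and then uses _Unwind_Resume to return to the cleanup phase
  - qux's frame is the one marked by the search phase. The personality is called with _UA_CLEANUP_PHASE|_UA_HANDLER_FRAME as the actions argument and returns _URC_INSTALL_CONTEXT
  - Jump to the landing pad of qux's frame
  - The landing pad calls __cxa_begin_catch, executes the catch code, and then calls __cxa_end_catch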
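The observable order of events in this example can be checked with a variant that records what happens. This sketch gives the destructors bodies and replaces puts with a log; event_log is a hypothetical helper introduced only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Global log of events, in the order they occur.
std::vector<std::string> &event_log() {
  static std::vector<std::string> log;
  return log;
}

struct A { ~A() { event_log().push_back("~A"); } };
struct B { ~B() { event_log().push_back("~B"); } };

void foo() { throw 0xB612; }
void bar() { B b; foo(); }  // b is destroyed by bar's cleanup landing pad
void qux() {
  try {
    A a;  // a is destroyed before the catch code runs
    bar();
  } catch (int x) {
    event_log().push_back("catch " + std::to_string(x));
  }
}
```

Calling qux() once should log ~B, then ~A, then the catch: the cleanup phase visits bar's frame before qux's frame, and the destructor of a runs in qux's landing pad before the catch code.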
__gxx_personality_v0
The personality routine is called by Level 1 phase 1 and phase 2 to provide language-specific processing. Different languages, implementations or architectures may use different personality routines. Common personality routines are:
- __gxx_personality_v0: C++
- __gxx_personality_sj0: sjlj
- __gcc_personality_v0: C -fexceptions, used for __attribute__((cleanup(...)))
- __CxxFrameHandler3: Windows MSVC
- __gxx_personality_seh0: MinGW-w64 -fseh-exceptions
- __objc_personality_v0: ObjC on macOS
The most commonly used personality for C++ on ELF systems is __gxx_personality_v0, which is implemented in:
- GCC: libstdc++-v3/libsupc++/eh_personality.cc
- libc++abi: src/cxa_personality.cpp
_Unwind_Reason_Code (*__personality_routine)(int version, _Unwind_Action action, uint64 exceptionClass, _Unwind_Exception *exceptionObject, _Unwind_Context *context);
In the absence of errors:
- For _UA_SEARCH_PHASE, returns
  - _URC_CONTINUE_UNWIND: no lsda, or there is no landing pad, or there is a non-catch handler or a matched exception specification
  - _URC_HANDLER_FOUND: there is a matched catch handler or an unmatched exception specification
- For _UA_CLEANUP_PHASE, returns
  - _URC_CONTINUE_UNWIND: no lsda, or there is no landing pad, or (not produced by a compiler) there is no cleanup action
  - _URC_INSTALL_CONTEXT: the other cases
Before transferring control to the landing pad, the personality calls _Unwind_SetGR to set two registers (architecture-specific, __builtin_eh_return_data_regno(0) and __builtin_eh_return_data_regno(1)) to the _Unwind_Exception * and the switchValue.
Code:
1 | _unwind_Reason_Code __gxx_personality_v0(int version, _Unwind_Action actions, uint64_t exceptionClass, _Unwind_Exception *exc, _Unwind_Context *ctx) { |
For a native exception, when the personality returns _URC_HANDLER_FOUND in the search phase, it caches the LSDA-related information of that stack frame. When the personality is called again in the cleanup phase with actions == (_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME), it knows that it can read the cache and does not need to parse .gcc_except_table.
In the remaining three cases, scan_eh_tab is called to parse .gcc_except_table:
- actions & _UA_SEARCH_PHASE
- actions & _UA_CLEANUP_PHASE && actions & _UA_HANDLER_FRAME && !is_native: a foreign exception can be caught by catch (...), but the program should terminate if it hits an exception specification
- actions & _UA_CLEANUP_PHASE && !(actions & _UA_HANDLER_FRAME): non-catch handlers and unmatched catch handlers, or a matched exception specification. Another possibility is phase 2 of _Unwind_ForcedUnwind
1 | static void scan_eh_tab(...) { |
__gcc_personality_v0
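libgcc and compiler-rt/lib/builtins implement this function to handle __attribute__((cleanup(...))).
The default implementation does not return _URC_HANDLER_FOUND in the search phase, so a cleanup handler cannot act as a catch handler. However, we can provide our own implementation that returns _URC_HANDLER_FOUND in the search phase...
On x86-64, __builtin_eh_return_data_regno(0) is RAX. We can let the cleanup handler pass RAX to the landing pad.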
1 | // a.cc |
1 | % clang -c -fexceptions a.cc b.c |
Rethrow
The Landing pad section above briefly described the code executed by a rethrow. Usually a caught exception is destroyed in __cxa_end_catch, so __cxa_rethrow marks the exception object and increments handlerCount.
C++11 introduced Exception Propagation (N2179; std::rethrow_exception etc.), which libstdc++ implements using __cxa_dependent_exception. For the design, see https://gcc.gnu.org/legacy-ml/libstdc++/2008-05/msg00079.html
1 | struct __cxa_dependent_exception { |
std::current_exception and std::rethrow_exception increment the reference count.
In libstdc++, __cxa_rethrow calls the GCC extension _Unwind_Resume_or_Rethrow (which can resume forced unwinding).
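From the user's side, exception propagation looks like the following sketch, which uses only the standard std::exception_ptr API. The function names capture and rethrow_and_describe are hypothetical; how the runtime implements the reference counting behind them is not shown here.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Capture the in-flight exception inside a handler. The returned
// exception_ptr keeps the exception object alive after the handler exits.
std::exception_ptr capture() {
  try {
    throw std::runtime_error("boom");
  } catch (...) {
    return std::current_exception();
  }
  return std::exception_ptr();  // not reached
}

// Rethrow a captured exception later and report its message.
std::string rethrow_and_describe(std::exception_ptr p) {
  try {
    std::rethrow_exception(p);
  } catch (const std::runtime_error &e) {
    return e.what();
  }
  return "";  // not reached
}
```

The same exception_ptr can be rethrown more than once, which is one reason the exception object needs a reference count that is separate from handlerCount.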
LLVM IR
To be expanded.
- nounwind: cannot unwind
- uwtable: force generation of the unwind table regardless of nounwind
1 | if uwtables |
Compiler behavior
- -fno-exceptions -fno-asynchronous-unwind-tables: neither .eh_frame nor .gcc_except_table exists
- -fno-exceptions -fasynchronous-unwind-tables: .eh_frame exists, .gcc_except_table doesn't
- -fexceptions: both .eh_frame and .gcc_except_table exist
  - In GCC, for a noexcept function, a possibly-throwing call site unhandled by a try block does not get an entry in the .gcc_except_table call site table. If the function has no try block, it gets a header-only .gcc_except_table (4 bytes)
  - In Clang, there is a call site entry calling __clang_call_terminate. The size overhead is larger than GCC's scheme. Improving this requires LLVM IR work
When an exception is about to propagate to the caller of a function:
- no .eh_frame: _Unwind_RaiseException returns _URC_END_OF_STACK. __cxa_throw calls std::terminate
- .eh_frame without .gcc_except_table: pass-through (local variable destructors are not called). This is the case of -fno-exceptions -fasynchronous-unwind-tables.
- .eh_frame with empty .gcc_except_table: __gxx_personality_v0 calls std::terminate since no call site code range matches
- .eh_frame with proper .gcc_except_table: unwind
Combining the descriptions above, when an exception is about to propagate to the caller of a noexcept function:
- -fno-exceptions -fno-asynchronous-unwind-tables: propagating through a function calls std::terminate
- -fno-exceptions -fasynchronous-unwind-tables: pass-through. Local variable destructors are not called. This behavior is unexpected.
- -fexceptions: propagating through a noexcept function calls std::terminate
When std::terminate is called, there is a diagnostic that looks like terminate called after throwing an instance of 'int' (libstdc++; libc++ has a similar one). There is no stack trace. If the process installs a SIGABRT signal handler, the handler may get a stack trace and symbolize the addresses.
Catching exceptions while unwinding through -fno-exceptions code is a proposal to improve the diagnostics.
Personality and typeinfo encoding
.eh_frame
contains information about the unwind
operation. See Stack
unwinding for its format.
In -fpie/-fpic mode, the personality and type info encodings have the DW_EH_PE_indirect|DW_EH_PE_pcrel bits on most targets.
void raise() { throw 42; }
bool foo() {
  try { raise(); } catch (int) { return true; }
  return false;
}
int main() { foo(); }

_Z3foov:
        .cfi_startproc
        .cfi_personality 155, DW.ref.__gxx_personality_v0
        .cfi_lsda 27, .Lexception0
...
        .section .gcc_except_table,"a",@progbits
...
# >> Catch TypeInfos <<
.Ltmp3: # TypeInfo 1
        .long .L_ZTIi.DW.stub-.Ltmp3
.Lttbase0:
        .data
        .p2align 3, 0x0
.L_ZTIi.DW.stub:
        .quad _ZTIi
        .hidden DW.ref.__gxx_personality_v0
        .weak DW.ref.__gxx_personality_v0
        .section .data.DW.ref.__gxx_personality_v0,"aGw",@progbits,DW.ref.__gxx_personality_v0,comdat
        .p2align 3, 0x0
        .type DW.ref.__gxx_personality_v0,@object
        .size DW.ref.__gxx_personality_v0, 8
DW.ref.__gxx_personality_v0:
        .quad __gxx_personality_v0
In the example, .eh_frame contains a PC-relative relocation referencing DW.ref.__gxx_personality_v0, and .gcc_except_table contains a PC-relative relocation referencing .L_ZTIi.DW.stub. The relocations are link-time constants, so .eh_frame can remain read-only. DW.ref.__gxx_personality_v0 and .L_ZTIi.DW.stub reside in writable sections, which will contain dynamic relocations if __gxx_personality_v0 and _ZTIi are defined in a shared object, which is often the case.
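The numeric operands of .cfi_personality 155 and .cfi_lsda 27 in the example are DW_EH_PE_* encoding bytes. The sketch below decodes such a byte into its components. The function name describe_encoding is hypothetical, and the constant values are the usual DW_EH_PE_* definitions as I understand them (format in the low nibble, application in bits 4 to 6, indirect in bit 7).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Render a DW_EH_PE_* pointer encoding byte as "flag|application|format".
std::string describe_encoding(std::uint8_t enc) {
  if (enc == 0xff)
    return "omit";
  static const char *const kFormat[16] = {
      "absptr", "uleb128", "udata2", "udata4", "udata8", "?", "?", "?",
      "?",      "sleb128", "sdata2", "sdata4", "sdata8", "?", "?", "?"};
  static const char *const kApplication[8] = {
      "", "pcrel|", "textrel|", "datarel|", "funcrel|", "aligned|", "?|", "?|"};
  std::string s;
  if (enc & 0x80)
    s += "indirect|";  // the encoded value is the address of the pointer
  s += kApplication[(enc >> 4) & 7];
  s += kFormat[enc & 0x0f];
  return s;
}
```

With these definitions, 155 (0x9b) decodes to indirect|pcrel|sdata4 and 27 (0x1b) decodes to pcrel|sdata4, which matches the PC-relative relocations described above.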
For -fno-pic
code, different targets have different
ideas. AArch64 and RISC-V use
DW_EH_PE_indirect|DW_EH_PE_pcrel
as well. On x86,
.cfi_personality
refers to
__gxx_personality_v0
. This will lead to a canonical PLT if
__gxx_personality_v0
is defined in a shared object (e.g.
libstdc++.so.6
). I sent a patch https://gcc.gnu.org/PR108622 to use
DW_EH_PE_indirect|DW_EH_PE_pcrel
.
R_MIPS_32
and R_MIPS_64
personality encoding
https://github.com/llvm/llvm-project/issues/58377
void foo() { try { throw 1; } catch (...) {} }
mips64el-linux-gnuabi64-g++ -fpic and clang++ --target=mips64el-unknown-linux-gnuabi64 -fpic use DW_EH_PE_absptr | DW_EH_PE_indirect to encode personality routine pointers. Using DW_EH_PE_absptr instead of DW_EH_PE_pcrel is wrong. GNU ld works around the compiler design problem by converting DW_EH_PE_absptr to DW_EH_PE_pcrel. ld.lld does not support this and will report an error:
% clang++ --target=mips64el-linux-gnuabi -fpic -fuse-ld=lld -shared ex.cc
ld.lld: error: relocation R_MIPS_64 cannot be used against symbol 'DW.ref.__gxx_personality_v0'; recompile with -fPIC
>>> defined in /tmp/ex-40a996.o
>>> referenced by ex.cc
>>> /tmp/ex-40a996.o:(.eh_frame+0x13)
...
R_MIPS_32
for 32-bit builds is similar.
Potentially-throwing
__cxa_end_catch
__cxa_end_catch is potentially-throwing because it may destroy an exception object with a potentially-throwing destructor (e.g. ~C() noexcept(false) { ... }).
struct A { ~A(); };
void opaque();
void foo() {
  A a;
  // The exception object has an unknown type and may throw. The landing pad
  // then needs to call A::~A for `a` before jumping to _Unwind_Resume.
  try { opaque(); } catch (...) { }
}
To support an exception object with a potentially-throwing destructor, Clang generates conservative code for a catch-all clause or a catch clause matching a record type:
- assume that the exception object may have a throwing destructor
- emit invoke void @__cxa_end_catch (as the call is not marked with the nounwind attribute)
- emit a landing pad to destroy local variables and call _Unwind_Resume
Per C++ [dcl.fct.def.coroutine], a coroutine's function body implies a catch (...). Clang's code generation pessimizes even simple code, like:
UserFacing foo() {
  A a;
  opaque();
  co_return;
  // For `invoke void @__cxa_end_catch()`, the landing pad destroys the
  // promise_type and deletes the coro frame.
}
Throwing destructors are typically discouraged. In many environments, the destructors of exception objects are guaranteed to never throw, making our conservative code generation approach seem wasteful.
Furthermore, throwing destructors tend not to work well in practice:
- GCC does not emit call site records for the region containing __cxa_end_catch. This has been the case for a long time, since 2000.
- If a catch-all clause catches an exception object whose destructor throws, both GCC and Clang using libstdc++ leak the allocated exception object.
To avoid code generation pessimization, I added -fassume-nothrow-exception-dtor
for Clang 18 to assume that __cxa_end_catch
calls have the
nounwind
attribute. This requires that thrown exception
objects' destructors will never throw.
To detect misuses, diagnose throw expressions with a
potentially-throwing destructor. Technically, it is possible that a
potentially-throwing destructor never throws when called transitively by
__cxa_end_catch
, but these cases seem rare enough to
justify a relaxed mode.
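In user code, a similar property can be checked at compile time with standard type traits. This is only an illustration of the idea and not how Clang implements its diagnostic: ok_to_throw is a hypothetical helper, and Safe and Risky are made-up example types.

```cpp
#include <cassert>
#include <type_traits>

struct Safe { ~Safe() {} };                    // implicitly noexcept
struct Risky { ~Risky() noexcept(false) {} };  // potentially-throwing

// Hypothetical helper: true when throwing a T is compatible with the
// assumption that exception object destructors never throw.
template <class T>
constexpr bool ok_to_throw() {
  return std::is_nothrow_destructible<typename std::decay<T>::type>::value;
}

static_assert(ok_to_throw<int>(), "int has a trivial destructor");
static_assert(ok_to_throw<Safe>(), "destructors are noexcept by default");
static_assert(!ok_to_throw<Risky>(), "noexcept(false) destructor");
```

A project that builds with -fassume-nothrow-exception-dtor could place a check like this next to its own exception types.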
Miscellaneous
Using libc++ and libc++abi
On Linux, compared with clang
, clang++
additionally links against libstdc++/libc++ and libm.
Dynamically link against libc++.so (which depends on libc++abi.so)
(additionally specify -pthread
if threads are used):
clang++ -stdlib=libc++ -nostdlib++ a.cc -lc++ -lc++abi
If compile actions and link actions are separate
(-stdlib=libc++
passes -lc++
but its position
is undesired, so just don't use it):
clang++ -nostdlib++ a.cc -lc++ -lc++abi
Statically link in libc++.a (which includes the members of
libc++abi.a). This requires a
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=on
build:
clang++ -stdlib=libc++ -static-libstdc++ -nostdlib++ a.cc -pthread
Statically link in libc++.a and libc++abi.a. This is a bit inferior because there is a duplicate -lc++ passed by the driver.
clang++ -stdlib=libc++ -static-libstdc++ -nostdlib++ a.cc -Wl,--push-state,-Bstatic -lc++ -lc++abi -Wl,--pop-state -pthread
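libc++abi and libsupc++
It is worth noting that the <exception> and <stdexcept> type layouts provided by libc++abi (such as logic_error, runtime_error, etc.) are deliberately compatible with libsupc++. After libstdc++ abandoned the ref-counted std::string in GCC 5, libsupc++ still uses __cow_string for logic_error and other exception classes. libc++abi uses a similar ref-counted string.
libsupc++ and libc++abi do not use an inline namespace and have conflicting symbol names. Therefore, a libc++/libc++abi application usually cannot use a shared object that dynamically links libstdc++.so (ODR violation).
With some effort, you can still solve this problem: compile the non-libsupc++ part of libstdc++ to get a self-made libstdc++.so.6. The executable links libc++abi, which provides the C++ ABI symbols required by libstdc++.so.6.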
Monolithic
.gcc_except_table
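Prior to Clang 12, a monolithic .gcc_except_table was used. Like many other metadata sections, the main problem with the monolithic design is that it cannot be garbage collected by the linker. For RISC-V -mrelax and basic block sections, there is a bigger problem: .gcc_except_table has relocations referencing local symbols in text sections. If the referenced text sections are discarded in a COMDAT group, these relocations will be rejected by the linker (error: relocation refers to a symbol in a discarded section).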
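The solution is to use fragmented .gcc_except_table (https://reviews.llvm.org/D83655).
But the actual deployment is not that simple :) LLD processes --gc-sections first (at that point it is not clear which .eh_frame pieces are live), and then processes (and garbage collects) .eh_frame.
During --gc-sections, all .eh_frame pieces are live. They mark all .gcc_except_table.* live. According to the GC rules for section groups, a .gcc_except_table.* marks the other sections in the same section group (including .text.*) live. The result is that .text.* in all section groups cannot be garbage collected, resulting in increased output size.
https://reviews.llvm.org/D91579 fixed this problem: for .eh_frame, do not mark .gcc_except_table in a section group.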
-fbasic-block-sections=
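When using basic block sections, you can either give each basic block section its own .gcc_except_table, or let all basic block sections of a function use the same .gcc_except_table. The LLVM implementation chose the latter, which has several advantages: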
- No duplicate headers
- Sharable type table
- Sharable action table (this only matters for the deprecated exception specification)
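There is only one LPStart when using the same .gcc_except_table, so all offsets from landing pads to LPStart must be representable by relocations. Because most architectures do not have a difference relocation type, placing the landing pads in the same section is the most suitable representation.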
Exception handling ABI for the ARM architecture
The overall structure is the same as Itanium C++ ABI: Exception Handling, with some differences in data structures, _Unwind_*, etc.
https://maskray.me/blog/2020-11-08-stack-unwinding contains a few notes.
Compact Exception Tables for MIPS ABIs
The tables are described by .eh_frame_entry and .gnu_extab.
Design thoughts:
- Exception code ranges are sorted and must be linearly searched. Therefore it would be more compact to specify each relative to the previous one, rather than relative to a fixed base.
- The landing pad is often close to the exception region that uses it. Therefore it is better to use the end of the exception region as the reference point, than use the function base address.
- The action table can be integrated directly with the exception region definition itself. This removes one indirection. The threading of actions can still occur, by providing an offset to the next exception encoding of interest.
- Often the action threading is to the next exception region, so optimizing that case is important.
- Catch types and exception specification type lists cannot easily be encoded inline with the exception regions themselves. It is necessary to preserve the unique indices that are automatically created by the DWARF scheme.
It uses compact unwind descriptors similar to ARM EH. Builtin PR1 means there is no language-dependent data, and Builtin PR2 is used for C/C++.