The Linux kernel uses many GCC options to support ftrace.
-pg-mfentry-mnop-mcount-mprofile-kernelpowerpc-mrecord-mcount-mhotpatch=pre-halfwords,post-halfwords-fpatchable-function-entry=N[,M]
In the "prehistoric" period (Initial revision) of the current GCC git
repo, -pg support was already visible. -pg
inserts mcount() after the function prologue (Linux x86).
On other OSes or architectures, it may have different names like
_mcount, __mcount, .mcount. Trace
information can be used for gprof and gcov.
1 | # gcc -S -pg -O3 -fno-asynchronous-unwind-tables |
-pg takes effect after inlining.
- At link time, GCC will choose a different crt1 file
gcrt1.o - libc implements
gcrt1.o(glibcsysdeps/x86_64/_mcount.Sgmon/mcount.c, FreeBSDsys/amd64/amd64/prof_machdep.c). - musl does not provide
gcrt1.ohttps://www.openwall.com/lists/musl/2014/11/05/2
Usage in glibc:
gcrt1.odefines__gmon_start__. Othercrt1.ofiles don't define itcrti.ouses undefined weak__gmon_start__to detectgcrt1.o, and calls it if present__gmon_start__ingcrt1.ocalls__monstartupfor initialization. Initializing before program execution avoids call-once synchronization.
GCC
r21495 (1998) introduced -finstrument-functions,
inserting __cyg_profile_func_enter(this_fn, call_site)
after the function prologue and
__cyg_profile_func_exit(callee, call_site) before the
epilogue. Programs can implement these two functions to record function
calls. These two functions each have two parameters, which significantly
impacts code size. Additionally, many applications don't actually need
the call_site parameter.
1 | void __cyg_profile_func_enter(void *this_fn, void *call_site); |
1 | # gcc -S -O3 -finstrument-functions -fno-asynchronous-unwind-tables |
-finstrument-functions by default acts before inlining,
providing good control flow representation but with significant
overhead. The reason is that after inlining, a function may contain
multiple __cyg_profile_func_enter() calls. Clang provides
-finstrument-functions-after-inlining to trace after
inlining. GCC x86's -pg -mfentry -minstrument-return=call
can insert call __return__ on function returns, serving as
an alternative to
-finstrument-functions -finstrument-functions-after-inlining.
The earliest ftrace implementation in Linux kernel 2008 16444a8a40d
used -pg and mcount. Linux defined
mcount, comparing a function pointer to check if ftrace is
enabled. If not enabled, mcount acts like an empty
function.
1 |
|
All functions execute call mcount after their prologue,
creating significant overhead. Therefore, later Linux kernel recorded
mcount caller PCs in a hash table, using a daemon that runs once per
second to check the hash table and modify call mcount to
NOP for functions that don't need tracing.
Later, 8da3821ba56
changed from "JIT" to "AOT". At build time, a Perl script
scripts/recordmcount.pl calls objdump to record all
call mcount addresses, storing them in the
__mcount_loc section. During kernel startup, all
call mcount instructions are preemptively modified to NOPs,
eliminating the need for a daemon. Since Perl+objdump was too slow, in
2010, 16444a8a40d
added a C implementation scripts/recordmcount.c.
mcount has the drawback that stack frame size is hard to determine,
preventing ftrace from accessing tracee arguments. GCC
r162651 (2010) (GCC 4.6) introduced -mfentry, changing
the post-prologue call mcount to pre-prologue
call __fentry__. In 2011, d57c5d51a30
added x86-64 -mfentry support.
GCC
r206111 (2013) introduced SystemZ-specific -mhotpatch.
Note the description: only one NOP after function entry, with
restrictions on NOP types before entry. This lacks generality and can't
be used by other architectures. Later generalized to
-mhotpatch=pre-halfwords,post-halfwords.
GCC
b54214fe22107618e7dd7c6abd3bff9526fcb3e5 (2013-03) ported
-mprofile-kernel to PowerPC64 ELFv2. In 2016, powerpc/ftrace:
Add Kconfig & Make glue for mprofile-kernel and several previous
commits adopted this option.
GCC
r215629 (2014) introduced -mrecord-mcount and
-mnop-mcount. -mrecord-mcount replaces
linux/scripts/record_mcount.{pl,c}.
-mnop-mcount cannot be used with PIC, replacing
__fentry__ with NOP. The design didn't consider generality;
most RISC architectures can't use parameterless
-mnop-mcount. As of today, -mnop-mcount is
only supported by x86 and SystemZ.
(In 2019, Linux x86 removed mcount support 562e14f7229.)
GCC
r250521 (2017) introduced
-fpatchable-function-entry=N[,M]. Similar to
SystemZ-specific -mhotpatch=, it inserts M NOPs before
function entry and N-M NOPs after entry. Now adopted by Linux arm64 and
parisc. This feature has good design philosophy, but the implementation
has many issues and can only be used in Linux kernel.
In 2018, GCC x86 introduced -minstrument-return=call for
use with -pg -mfentry to insert
call __return__ on function returns.
-minstrument-return=nop5 inserts a 5-byte nop.
1 | # gcc -fpatchable-function-entry=3,1 -S -O3 a.c -fno-asynchronous-unwind-tables |
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93197
__patchable_function_entriesgets collected byld --gc-sections(linker section garbage collection). This makes GCC's implementation unusable for most programs. This issue was finally completely fixed when adding PowerPC ELFv2 support https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99899 (milestone: 13.0) - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195
Collection of COMDAT section groups containing
__patchable_function_entriesentries causes linking errors. This makes many C++ programs usinginlineunusable. - Incorrect option name in error message:
gcc -fpatchable-function-entry=a -c a.c=>cc1: error: invalid arguments for '-fpatchable_function_entry' - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93194
__patchable_function_entriesdoesn't specify section alignment. This was my second GCC patch~ __patchable_function_entriesentries should use PC-relative relocations, not absolute relocations, to avoid generatingR_*_RELATIVEdynamic relocations after linking. I initially couldn't accept this, because other defects could be fixed on the clang side while maintaining backward compatibility, but relocation type cannot be changed. Later I realized MIPS doesn't provideR_MIPS_PC64... So I chose to forgive GCC. That's just how MIPS is: ISA defects -> psABI "invents" clever ELF tricks to work around + introduces new problems. "mips is really the worst abi i've ever seen." "you mean worst dozen abis ;"- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 When AArch64 Branch Target Identification is enabled, NOP sled should be after BTI
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93492 When x86 Indirect Branch Tracking is enabled, NOP sled should be after ENDBR32/ENDBR64. Before starting to implement -fpatchable-function-entry=, I happened to add -z force-ibt to lld. So seeing the AArch64 issue naturally made me think of similar issues on x86.
- No consideration for cooperation with
-fasynchronous-unwind-tables. Again, Linux kernel uses-fno-asynchronous-unwind-tables. So during GCC implementation, this issue wasn't naturally considered - Initial
.locdirective should be before NOP sled. This causes symbolizing function addresses to fail to get filename/line number information
Fixing --gc-sections and COMDAT is quite tricky and
requires functionality from GNU as and GNU ld on the binutils side:
- GNU Assembler 2.35 implemented unique section ID (https://sourceware.org/bugzilla/show_bug.cgi?id=25380)
- GNU Assembler 2.35 implemented
SHF_LINK_ORDER(https://sourceware.org/bugzilla/show_bug.cgi?id=25381) - GNU ld --gc-sections semantics https://sourceware.org/ml/binutils/2019-11/msg00266.html
Except for AArch64 BTI, all other issues were reported by me~
Steps to add -fpatchable-function-entry= to clang:
- D72215 Introduce LLVM function attribute "patchable-function-entry", AArch64 AsmPrinter support
- D72220 x86 AsmPrinter support
- D72221 Implement
function attribute
__attribute__((patchable_function_entry(0,0)))in clang - D72222 Add driver
option
-fpatchable-function-entry=N[,0]to clang - D73070 Introduce LLVM function attribute "patchable-function-prefix"
- Move codegen passes, change NOP sled order with BTI/ENDBR, fixing cooperation between XRay, -mfentry and -fcf-protection=branch
- D73680 AArch64 BTI,
handle patch label position when M=0:
bti c; .Lpatch0: nopinstead of.Lpatch0: bti c; nop - x86 ENDBR32/ENDBR64, handle patch label position when M=0:
endbr64; .Lpatch0: nopinstead of.Lpatch0: endbr64; nop
All the above patches, except for the x86 ENDBR patch label position adjustment, are included in clang 10.0.0.
Before -fpatchable-function-entry=, clang already had
multiple methods to insert code at function entry:
-fxray-instrument. XRay uses a method similar to-finstrument-functionsfor tracing, and like Linux kernel, modifies code at runtime- Azul Systems introduced PatchableFunction for JIT. I reused this pass when introducing "patchable-function-entry"
- IR feature: prologue data, adds arbitrary bytes after function entry. Used for function sanitizer
- IR feature: prefix data, adds arbitrary bytes before function entry.
Used for GHC
TABLES_NEXT_TO_CODE. Info table is placed before entry code. GHC's LLVM backend is currently still in a state of disrepair
PowerPC64 ELFv2
PowerPC ELFv2 implementation see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888
-fpatchable-function-entry=5,2 outputs:
1 | .globl foo |
NOPs are after global entry. There are M and N-M NOPs before and after local entry respectively. Due to distance restrictions between global entry and local entry, M can only take a few values like 0 (2-2), 2 (4-2), 6 (8-2), 14 (16-2).
In PR99888, I proposed in 2022 that there's no need to make NOPs
consecutive. The discussion at the end of 2023 also showed that current
consecutive NOPs are inconvenient for kernel and userspace live
patching. In Nov 2024, -msplit-patch-nops is implemented to
place -msplit-patch-nops before the function label. (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980)