Clang provides a few options to generate timing reports. Among them, -ftime-report and -ftime-trace can be used to analyze the performance of Clang's internal passes.
- -fproc-stat-report records time and memory on spawned processes (ld, and gas if -fno-integrated-as is specified).
- -ftime-trace, introduced in 2019, generates Clang timing information in the Chrome Trace Event format (JSON). The format supports nested events, providing a rich view of the front end.
- -ftime-report: The option name is borrowed from GCC.
This post focuses on the traditional -ftime-report, which uses a line-based textual format.
Understanding -ftime-report output
The output consists of information about multiple timer groups. The last group spans the largest interval and encompasses timing data from other groups.
Up to Clang 19, the last group is called "Clang front-end time report". You would see something like the following.
% clang -c -w -ftime-report ~/Dev/testsuite/sqlite3.i
The "Clang front-end timer" timer measured the time spent in clang::FrontendAction::Execute, which includes lexing, parsing, semantic analysis, LLVM IR generation, optimization, and machine code generation. However, "Code Generation Time" and "LLVM IR Generation Time" belonged to the default timer group "Miscellaneous Ungrouped Timers". This caused confusion for many users. For example, https://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/ elaborates on the issues.
To address the ambiguity, I revamped the output in Clang 20.
...
The last group has been renamed and changed to cover a longer interval within the invocation. It provides timing information for four stages:
- Front end: Includes lexing, parsing, semantic analysis, and miscellaneous tasks not captured by the subsequent timers.
- LLVM IR generation: The time spent in generating LLVM IR.
- LLVM IR optimization: The time consumed by LLVM's IR optimization pipeline.
- Machine code generation: The time taken to generate machine code or assembly from the optimized IR.
The -ftime-report output further elaborates on these stages through additional groups:
- "Pass execution timing report" (first instance): A subset of the "Optimizer" group, providing detailed timing for individual optimization passes.
- "Analysis execution timing report": A subset of the first "Pass execution timing report". In LLVM's new pass manager, analyses are executed as part of pass invocations.
- "Pass execution timing report" (second instance): A subset of the "Machine code generation" group. (This group's name should be updated once the legacy pass manager is no longer used for IR optimization.)
- "Instruction Selection and Scheduling": This group appears when SelectionDAG is utilized and is part of the "Instruction Selection" timer within the second "Pass execution timing report".
Examples:
"Pass execution timing report" (first instance)
===-------------------------------------------------------------------------===
When -ftime-report=per-pass-run is specified, a timer is created for each pass object. This can result in significant output, especially for modules with numerous functions, as each pass will be reported multiple times.
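The difference between the two reporting modes can be sketched with a small stand-alone model. This is not LLVM's implementation: the function names, the pass names, and the durations below are made up for illustration, and the "#N" numbering of per-run entries is simply this sketch's labeling choice.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One pass invocation: pass name and the seconds it took (hypothetical values).
using PassRun = std::pair<std::string, double>;

// Default mode: one timer per pass name, so repeated runs accumulate.
std::map<std::string, double>
aggregatePerPass(const std::vector<PassRun> &Runs) {
  std::map<std::string, double> Totals;
  for (const auto &[Name, Seconds] : Runs)
    Totals[Name] += Seconds;
  return Totals;
}

// Per-run mode: every invocation gets its own entry, numbered by occurrence.
std::vector<PassRun> reportPerRun(const std::vector<PassRun> &Runs) {
  std::map<std::string, unsigned> Count;
  std::vector<PassRun> Entries;
  for (const auto &[Name, Seconds] : Runs) {
    unsigned N = ++Count[Name];
    Entries.emplace_back(Name + " #" + std::to_string(N), Seconds);
  }
  return Entries;
}
```

With three invocations of two distinct passes, the first function yields two rows while the second yields three, which is why the per-run report grows with the number of functions in the module.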
Clang internals
As clang -### -c -ftime-report shows, clangDriver forwards -ftime-report to Clang cc1. Within cc1, this option sets the codegen flag clang::CodeGenOptions::TimePasses. This flag enables the use of llvm::Timer objects to measure the execution time of specific code blocks.
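A flag-gated timer around a code block is commonly written as an RAII guard. The following is a minimal stand-alone sketch of that idea using std::chrono. WallTimer, this TimeRegion, and generateCode are hypothetical stand-ins written for this example, not Clang's or LLVM's actual code.

```cpp
#include <chrono>

// Minimal wall-clock accumulator standing in for llvm::Timer.
struct WallTimer {
  using Clock = std::chrono::steady_clock;
  double Seconds = 0.0;  // accumulated wall time
  unsigned Starts = 0;   // how many times the timer was started
  Clock::time_point Begin;

  void startTimer() {
    Begin = Clock::now();
    ++Starts;
  }
  void stopTimer() {
    Seconds += std::chrono::duration<double>(Clock::now() - Begin).count();
  }
};

// RAII guard: times the enclosing scope, or does nothing when given nullptr.
class TimeRegion {
  WallTimer *T;

public:
  explicit TimeRegion(WallTimer *T) : T(T) {
    if (T)
      T->startTimer();
  }
  ~TimeRegion() {
    if (T)
      T->stopTimer();
  }
  TimeRegion(const TimeRegion &) = delete;
  TimeRegion &operator=(const TimeRegion &) = delete;
};

// Hypothetical stage: only timed when the TimePasses-like flag is set.
int generateCode(bool TimePasses, WallTimer &CodeGenTimer) {
  TimeRegion Region(TimePasses ? &CodeGenTimer : nullptr);
  int Sum = 0;
  for (int I = 0; I < 1000; ++I)
    Sum += I;
  return Sum;
}
```

Passing a null pointer when the flag is off keeps the timed code path identical in both modes, so the only cost of the disabled case is a pointer check.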
From Clang 20 onwards, the placement of the timers can be understood through the following call tree.
cc1_main
The measured interval does not cover the whole invocation, for example with the integrated cc1: clang -c -ftime-report a.c
LLVM internals
llvm/lib/Support/Timer.cpp implements the timer feature. A Timer belongs to a TimerGroup. Timer::startTimer and Timer::stopTimer generate a TimeRecord. In clang/tools/driver/cc1_main.cpp, llvm::TimerGroup::printAll(llvm::errs()); dumps the TimerGroup and TimeRecord information to stderr.
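The relationship between the three classes can be illustrated with a simplified stand-alone model. The class names mirror LLVM's, but everything below is an assumption-laden sketch built on std::chrono: it tracks wall time only, the add, size, and print members are invented for this example, and the output layout is not LLVM's.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for llvm::TimeRecord: wall time only.
struct TimeRecord {
  double WallTime = 0.0; // accumulated seconds
};

class TimerGroup;

// Simplified stand-in for llvm::Timer; registers itself with its group.
class Timer {
  using Clock = std::chrono::steady_clock;
  std::string Name;
  TimeRecord Total;
  Clock::time_point Start;
  bool Running = false;

public:
  Timer(std::string Name, TimerGroup &Group);
  void startTimer() {
    Start = Clock::now();
    Running = true;
  }
  void stopTimer() {
    if (!Running)
      return;
    Total.WallTime +=
        std::chrono::duration<double>(Clock::now() - Start).count();
    Running = false;
  }
  const std::string &getName() const { return Name; }
  const TimeRecord &getTotalTime() const { return Total; }
};

// Simplified stand-in for llvm::TimerGroup. It stores raw pointers, so the
// timers must outlive any call to print().
class TimerGroup {
  std::string Description;
  std::vector<const Timer *> Timers;

public:
  explicit TimerGroup(std::string Description)
      : Description(std::move(Description)) {}
  void add(const Timer *T) { Timers.push_back(T); }
  std::size_t size() const { return Timers.size(); }

  // Prints one line per timer, by default in descending wall time.
  void print(std::ostream &OS, bool SortTimers = true) const {
    std::vector<const Timer *> Sorted(Timers);
    if (SortTimers)
      std::stable_sort(Sorted.begin(), Sorted.end(),
                       [](const Timer *A, const Timer *B) {
                         return A->getTotalTime().WallTime >
                                B->getTotalTime().WallTime;
                       });
    OS << Description << '\n';
    for (const Timer *T : Sorted)
      OS << std::fixed << std::setprecision(4) << T->getTotalTime().WallTime
         << "  " << T->getName() << '\n';
  }
};

inline Timer::Timer(std::string Name, TimerGroup &Group)
    : Name(std::move(Name)) {
  Group.add(this);
}
```

Each start/stop pair adds to the timer's accumulated record, and the group only reads those records when it prints, which is why a single dump at the end of the invocation is enough.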
There are a few cl::opt options:
- sort-timers (default: true): sort the timers in a group in descending wall time.
- track-memory: record increments or decrements in malloc statistics. In glibc 2.33 and above, this utilizes mallinfo2::uordblks.
- info-output-file: dump output to the specified file.
Examples:
clang -c -ftime-report -mllvm -sort-timers=0 a.c
The cl::opt option -time-passes can be used with the LLVM internal tools opt and llc, e.g.
opt -S -passes='default<O2>' -time-passes < a.ll
On Apple platforms, LLVM_SUPPORT_XCODE_SIGNPOSTS=on builds enable os_signpost for startTimer/stopTimer.