I do not use Apple products myself, but I sometimes delve into Mach-O due to my interest in object file formats. Additionally, my LLVM/Clang changes sometimes require some understanding of Mach-O. Occasionally, I need to understand the format to some extent to work around its quirks (the old format inherited many problems of "a.out").
Recently, there has been interest (from Oleksii Lozovskyi) in enabling XRay, a function call tracing system in LLVM, to work on Apple systems. Intrigued by this, I decided to delve into the details and investigate the necessary changes. XRay supports many 64-bit architectures on Linux and some BSDs. I became acquainted with XRay back in 2017 and made some casual contributions since then.
If the target triple is x86_64-apple-darwin
, you may
notice that Clang will allow you to perform compilation, but linking
will fail.
1 | % clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c |
For arm64-apple-darwin
, Clang rejected
-fxray-instrument
, but we can bypass this driver detection
with -Xclang
. (This is now accepted.) 1
2% clang --target=arm64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
clang: error: the clang compiler does not support '-fxray-instrument on aarch64-apple-darwin'1
2% clang --target=arm64-apple-darwin -Xclang -fxray-instrument -Xclang -fxray-instruction-threshold=1 -c a.c
fatal error: error in backend: unsupported relocation of local symbol ''. Must have non-local symbol earlier in section.
I noticed that this fatal error is caused by an old workaround (2015) for ld64 and proposed to remove it: https://reviews.llvm.org/D152831.
SUBTRACTOR
relocation types require r_extern==1
Let's examine
clang -S --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1
generated assembly. 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17 .section __DATA,xray_instr_map
Lxray_sleds_start0:
Ltmp0:
.quad Lxray_sled_0-Ltmp0 // problematic
.quad Lfunc_begin0-(Ltmp0+8) // problematic
.byte 0x00
.byte 0x00
.byte 0x02
.space 13
Ltmp1:
.quad Lxray_sled_1-Ltmp1 // problematic
.quad Lfunc_begin0-(Ltmp1+8) // problematic
.byte 0x01
.byte 0x00
.byte 0x02
.space 13
Lxray_sleds_end0:
I have marked 4 label differences as problematic. Let's examine the
first one, .quad Lxray_sled_0-Ltmp0
, which is represented
as a pair of relocations (llvm-readobj -r a.o
)
1
20x0 0 3 0 X86_64_RELOC_SUBTRACTOR 0 xray_instr_map
0x0 0 3 1 X86_64_RELOC_UNSIGNED 0 _foo
X86_64_RELOC_SUBTRACTOR
is an external relocation
(r_extern==1
) where r_symbolnum
references a
symbol table entry. If r_extern==0
, Linkers will produce an
error. Symbols prefixed with "L" are considered temporary symbols in
LLVMMC and are not included in the symbol table. LLVM integrated
assembler attempts to rewrite A-B
into
A-B'+offset
where B'
can be included in the
symbol table. B' is called an atom and should be a non-temporary symbol
in the same section. However, since xray_instr_map
does not
define a non-temporary symbol, the X86_64_RELOC_SUBTRACTOR
relocation will have no associated symbol, and its r_extern
value will be 0.
To fix this issue, we need to define a non-temporary symbol in the
section. We can accomplish this by renaming
Lxray_sleds_start0
to lxray_sleds_start0
("L"
to "l"). In LLVMMC, LinkerPrivateGlobalPrefix
is set to "l"
for Apple targets. We can define an overload of
MCContext::createLinkerPrivateTempSymbol(const Twine &Name)
to allow LLVMMC to select an unused symbol starting with
lxray_sleds_start
. (There is a pitfall: we should not use
"ltmp[0-9]+" which may collide with other compiler internal symbols.)
For ELF targets, the
MCContext::createLinkerPrivateTempSymbol
function creates a
temporary symbol starting with ".L".
Note: GNU assembler calls .L
and L
system-specific local label prefixes.
Function index section
Oleksii Lozovskyi reported that the
-fxray-function-index
option was not functioning as
expected.
- (default): no function index section
-fxray-function-index
: no function index section-fno-xray-function-index
: function index section (xray_fn_idx
) is present
Initially, the function index section was always present. The
introduction of the -f[no-]xray-function-index
option
caused confusion due to the use of a negatively named variable, leading
to potential misunderstandings. A clangDriver
refactoring accidentally introduced this regression. I have resolved
the issue for Clang 17.
Now that -fxray-function-index
is back, we get the
xray_fn_idx
section by default. The section contains
entries like the following: 1
2
3
4.section __DATA,xray_fn_idx
.p2align 4, 0x90
.quad Lxray_sleds_start0
.quad Lxray_sleds_end0
These absolute addresses require rebase opcodes in the special
section __LINKEDIT,__rebase
. This is not great and I wanted
to fix it back in 2020 but never got around to do it. This motivated
me to actually fix the issue and create https://reviews.llvm.org/D152661 to change the
[start,end)
representation to the
(pc_relative_start, size)
representation.
My initial attempt somehow wrote something like this. I took a
difference of two labels, and right shifted it by 5 to get the number of
sleds. 1
2
3
4
5 .section __DATA,xray_fn_idx,regular,live_support
.p2align 4, 0x90
lxray_fn_idx0:
.quad lxray_sleds_start0-lxray_fn_idx0
.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5
This approach works on ELF targets but not on Mach-O targets due to a pile of assembler issues.
1 | % clang -c --target=x86_64-apple-darwin a.s |
Assembler issues
When assembling an assembly file into an object file, an expression can be evaluated in multiple steps. Two steps are particularly important:
- Parsing time. At this stage, We have a
MCAssembler
object but noMCAsmLayout
object. Instruction operands and certain directives like.if
require the ability to evaluate an expression early. - Object file writing time. At this stage, we have both a
MCAssembler
object and aMCAsmLayout
object. TheMCAsmLayout
object provides information about the offset of each fragment.
The first issue is not specific to this case and is also encountered
in ELF. The following assembly code should assemble to the hex pairs
01000001, but Clang fails to compute .if .-1b == 3
.
1
2
3
4
5
6
7
8
9
10
11
12
13% cat x.s
1:
.byte 1
.space 2
.if .-1b == 3
.byte 1
.else
.byte 0
.endif
% clang -c x.s
x.s:4:5: error: expected absolute expression
.if .-1b == 3
^
In 2019, Jian Cai implemented limited
expression folding support to LLVM integrated assembler to support the
Linux kernel arm use case. 1
2arch/arm/mm/proc-v7.S:169:143: error: expected absolute expression
.pushsection ".alt.smp.init", "a" ; .long 9998b ;9997: orr r1, r1, #((1 << 0) | (1 << 6))|(3 << 3) ; .if . - 9997b == 2 ; nop ; .endif ; .if . - 9997b != 4 ; .error "ALT_UP() content must assemble to exactly 4 bytes"; .endif ; .popsection
I have just added support for MCFillFragment
(.space
and .fill
) and for A-B, where A is a
pending label (which will be reassigned to a real fragment in
flushPendingLabels()
). Latest LLVM integrated assembler
(milestone: LLVM 17) can successfully assemble x.s
when a
MCAssembler
object is present. However, evaluation still
does not work without a MCAssembler
object, which is
expected. 1
2
3
4% llvm-mc x.s -filetype=null
x.s:4:5: error: expected absolute expression
.if .-1b == 3
^
Then I noticed a potential pitfall for Mach-O
in
MCSection::flushPendingLabels
. When flushing pending
labels, it did not ensure that the new fragment inherits the previous
atom symbol. I fixed this issue, although I haven't been able to create
a test case to verify this behavior.
After this fix,
.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5
can
be successfully assembled. However, during "direct
object emission", an error
expected relocatable expression
will be reported. The issue
is quite subtle.
In the case of direct object emission, where LLVM IR is directly
lowered to an object file bypassing assembly (e.g., using
clang -c a.c
instead of clang -c a.s
or
clang -c --save-temps a.c
), the assembler information is
not used at parse time
(MCStreamer::UseAssemblerInfoForParsing
). As a result, the
assembly code
.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5
will
be transformed into a fixup. Lxray_sleds_end0
is a pending
label which will then be assigned to the next fragment when AsmPrinter
emits new piece of data to the xray_instr_map
section.
During object writing time, we have a MCAsmLayout
object
and atom information for fragments. LLVMMC evaluates the fixup with the
MCAsmLayout
object. However, since
lxray_sleds_start0
and Lxray_sleds_end0
have
different atoms, causing the condition in
MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl
to
fail. In my opinion, it may be necessary to relax the condition in this
case.
Linker dead stripping
Both xray_instr_map
and xray_fn_idx
sections contain independent pieces. To support linker dead stripping,
also known as linker garbage
collection, we need to add the S_ATTR_LIVE_SUPPORT
attribute to the two sections.
Runtime library issues
The runtime library is located at compiler-rt/lib/xray
and there are a number of issues.
First, compiler-rt/lib/xray/xray_trampoline_x86_64.S
used .Ltmp*
symbols which are temporary for ELF but
non-temporary for Mach-O. The non-temporary labels become atoms and can
cause bad dead stripping behaviors. I fixed the problem by using the
LOCAL_LABEL
macro, which generates an "L" symbol
specifically for Mach-O.
compiler-rt/lib/xray/xray_trampoline_AArch64.S
had a
similar problem. I fixed it to look like the following:
1
2
3
4
5
6
7#include "../sanitizer_common/sanitizer_asm.h"
...
.p2align 2
.global ASM_SYMBOL(__xray_FunctionEntry)
ASM_HIDDEN(__xray_FunctionEntry)
ASM_TYPE_FUNCTION(__xray_FunctionEntry)
ASM_SYMBOL(__xray_FunctionEntry):
For Apple targets, # define ASM_SYMBOL(symbol) _##symbol
prepends an underscore to the symbol name. For ELF targets,
ASM_SYMBOL
is the identity function.
Compiling compiler-rt/lib/xray/xray_trampoline_AArch64.S
will give us an external definition (N_SECT | N_EXT
)
___xray_FunctionEntry
. The undefined reference from C++
code gives an undefined symbol ___xray_FunctionEntry
(N_UNDF | N_EXT
), which can be resolved by the definition.
1
Success = patchFunctionEntry(Enable, FuncId, Sled, __xray_FunctionEntry); // extern "C" void __xray_FunctionEntry();
Second, we need https://reviews.llvm.org/D153221 to build Mach-O runtimes for multiple architectures.
After the build system, we may discover new issues.
Driver change
Normally, we want to enable a Clang driver option when a feature is
fully ready. In this case, we know the code generation has been working
for a long time, and the full support (metadata section refactoring and
runtime) will be implemented soon. Therefore, I believe it's acceptable
to make the driver accept
--target=arm64-apple-darwin -fxray-instrument
as it makes
testing the runtime more convenient.
Epilogue
As the work to port LLVM XRay to Apple systems continues, it is encouraging to witness the progress achieved thus far. The endeavor has given me opportunities to explore the Mach-O object file format.
There is a pending patch for ELF RISC-V support.