In my previous post, LLVM integrated assembler: Improving MCExpr and MCValue, I explored improvements to the internal representations MCExpr and MCValue. This post dives into recent improvements I've made to refine that system.
Symbol equating
The GNU Assembler documentation (https://sourceware.org/binutils/docs/as.html) calls the following directives symbol equating. I re-read it to double-check: it does say "equating" rather than "assignment" or "definition".
- symbol = expression (multiple = on the same symbol is allowed)
- .set symbol, expression (equivalent to =)
- .equ symbol, expression (equivalent to =)
- .equiv symbol, expression (redefinition leads to errors)
- .eqv symbol, expression (lazy evaluation, not implemented in the LLVM integrated assembler)
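To make the redefinition rules concrete, here is a minimal C++ sketch. It assumes a hypothetical SymbolTable that stores plain integers and models only =, .set, .equ and .equiv; labels, non-absolute values, and the lazy evaluation of .eqv are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model of the equating directives; values are plain integers.
enum class Directive { Assign, Set, Equ, Equiv };

class SymbolTable {
public:
  // Returns false, keeping the old value, when the definition is an error.
  bool define(const std::string &Name, Directive D, int64_t Value) {
    bool Defined = Values.count(Name) != 0;
    // .equiv refuses to define a symbol that already has a value.
    if (D == Directive::Equiv && Defined)
      return false;
    // =, .set and .equ simply replace any previous value.
    Values[Name] = Value;
    return true;
  }

  int64_t get(const std::string &Name) const { return Values.at(Name); }

private:
  std::map<std::string, int64_t> Values;
};
```

In this model x = 1 followed by .set x, 2 leaves x equal to 2, while a subsequent .equiv x, 3 is rejected and x stays 2.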
Cycle detection
Equated symbols may form a cycle, which is not allowed.
# CHECK: [[#@LINE+2]]:7: error: cyclic dependency detected for symbol 'a'
Previously, the LLVM integrated assembler detected cycles by performing an occurs check when a symbol was equated.
bool parseAssignmentExpression(StringRef Name, bool allow_redef,
The occurs check function isSymbolUsedInExpression was defined as a tree traversal (a DAG traversal, more precisely, as subexpressions can be reused, though this very rarely happens in LLVM).
bool MCExpr::isSymbolUsedInExpression(const MCSymbol *Sym) const {
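The occurs check can be sketched as follows. The Symbol and Expr types and the isSymbolUsedIn and equate helpers are hypothetical, heavily simplified stand-ins for MCSymbol and MCExpr, not LLVM's actual code. The walk follows the values of already-equated symbols, so it terminates only because every earlier assignment passed the same check.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-ins for MCSymbol and MCExpr.
struct Expr;

struct Symbol {
  std::string Name;
  const Expr *Value = nullptr; // non-null once the symbol is equated
};

struct Expr {
  enum Kind { Constant, SymbolRef, Binary } K;
  int64_t Imm = 0;                           // used by Constant
  const Symbol *Sym = nullptr;               // used by SymbolRef
  const Expr *LHS = nullptr, *RHS = nullptr; // used by Binary
};

// Occurs check: does Sym appear in E, looking through the values of
// equated symbols? A shared subexpression is visited once per path.
bool isSymbolUsedIn(const Expr *E, const Symbol *Sym) {
  switch (E->K) {
  case Expr::Constant:
    return false;
  case Expr::SymbolRef:
    if (E->Sym == Sym)
      return true;
    return E->Sym->Value && isSymbolUsedIn(E->Sym->Value, Sym);
  case Expr::Binary:
    return isSymbolUsedIn(E->LHS, Sym) || isSymbolUsedIn(E->RHS, Sym);
  }
  return false;
}

// Equating Sym = E is rejected if E (transitively) refers to Sym.
bool equate(Symbol &Sym, const Expr *E) {
  if (isSymbolUsedIn(E, &Sym))
    return false;
  Sym.Value = E;
  return true;
}
```

Any code path that sets Symbol::Value directly, bypassing equate, breaks that invariant: a cycle can then form, and a later call to isSymbolUsedIn would recurse forever.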
The problem was that not all symbol equating went through this assignment routine. For instance, .weakref and many target-specific AsmParsers could define variables without performing the occurs check.
Cycle detection can be implemented with the classic 3-color depth-first search algorithm for graphs, or a 2-color variant for trees. If we apply the 2-color algorithm to a DAG, some vertices (symbols) might be visited multiple times. This is acceptable, as shared subexpressions are very uncommon.
I settled on using a 2-color depth-first search.
bool MCExpr::evaluateAsRelocatableImpl(MCValue &Res, const MCAssembler *Asm,
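Below is a minimal sketch of a 2-color depth-first search performed while evaluating an expression, so it catches cycles no matter how the symbols were defined. The types, the InProgress flag, and the evaluate function are hypothetical; LLVM's evaluateAsRelocatableImpl works on MCValue and handles far more cases. A symbol is marked while its own value is being evaluated and unmarked afterwards, so reaching a marked symbol means the expression depends on itself.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical simplified model, not LLVM's MCExpr/MCSymbol classes.
struct Expr;

struct Symbol {
  std::string Name;
  const Expr *Value = nullptr; // null: not equated
  bool InProgress = false;     // "gray"; false means "white"
};

struct Expr {
  enum Kind { Constant, SymbolRef, Add } K;
  int64_t Imm = 0;
  Symbol *Sym = nullptr;
  const Expr *LHS = nullptr, *RHS = nullptr;
};

// Evaluates E to a constant. Returns std::nullopt for an undefined symbol
// or a cycle; on a cycle, the offending symbol's name goes to *Cyclic.
std::optional<int64_t> evaluate(const Expr *E, std::string *Cyclic) {
  switch (E->K) {
  case Expr::Constant:
    return E->Imm;
  case Expr::SymbolRef: {
    Symbol *S = E->Sym;
    if (!S->Value)
      return std::nullopt; // undefined symbol
    if (S->InProgress) {   // back edge: S depends on itself
      if (Cyclic)
        *Cyclic = S->Name;
      return std::nullopt;
    }
    S->InProgress = true;
    std::optional<int64_t> V = evaluate(S->Value, Cyclic);
    S->InProgress = false; // back to "white", so other paths may revisit S
    return V;
  }
  case Expr::Add: {
    std::optional<int64_t> L = evaluate(E->LHS, Cyclic);
    if (!L)
      return std::nullopt;
    std::optional<int64_t> R = evaluate(E->RHS, Cyclic);
    if (!R)
      return std::nullopt;
    return *L + *R;
  }
  }
  return std::nullopt;
}
```

Because the mark is cleared on the way out, a symbol reached through two paths (as in b = a + a) is simply evaluated twice rather than reported as a cycle; that repeated work is the price of using 2 colors on a DAG.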
Expression resolving
The = directive and the equivalent .set and .equ directives allow a symbol to be equated multiple times.
.data
When such a symbol is referenced, its current value is snapshotted and used. Later reassignments do not change earlier references.
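One way to get these snapshot semantics is to copy the symbol's current value into each reference instead of pointing at the symbol. The sketch below assumes a hypothetical Assembler that only emits .long operands of the form label + addend and assigns label addresses at the end; it is an illustration, not necessarily how the LLVM integrated assembler implements it. Because each reference holds its own copy, reassigning the symbol, even to a non-absolute value, cannot disturb values captured earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical value: "Label + Addend"; an empty Label means absolute.
struct Value {
  std::string Label;
  int64_t Addend = 0;
};

struct Assembler {
  std::map<std::string, Value> Equated;  // current value of each symbol
  std::map<std::string, int64_t> Labels; // label addresses, known at the end
  std::vector<Value> Data;               // operands of emitted .long

  void set(const std::string &Name, Value V) { Equated[Name] = V; }

  // A reference snapshots the symbol's current value by copying it.
  void emitLong(const std::string &Name) { Data.push_back(Equated.at(Name)); }

  // Resolve each snapshot once all label addresses are known.
  std::vector<int64_t> finish() const {
    std::vector<int64_t> Out;
    for (const Value &V : Data)
      Out.push_back((V.Label.empty() ? 0 : Labels.at(V.Label)) + V.Addend);
    return Out;
  }
};
```

For example, x = 1; .long x; x = foo + 4; .long x; x = 7 yields 1 and foo + 4: the final reassignment affects neither emitted value.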
In general, the LLVM integrated assembler did not allow reassigning a symbol whose current value was not an MCConstantExpr (a parse-time integer constant).
% clang -c g.s
g.s:6:8: error: invalid reassignment of non-absolute variable 'x'
.set x,.-.data
       ^
This was probably to reject potentially unsafe reassignments. When a symbol is being reassigned, its old value might still be referenced by an instruction operand or another symbol that has not been resolved yet.
Over the past few years, while getting Linux kernel ports to build with Clang, we worked around this limitation by changing the assembly code:
- ARM: 8971/1: replace the sole use of a symbol with its definition in 2020-04
- crypto: aesni - add compatibility with IAS in 2020-07
- powerpc/64/asm: Do not reassign labels in 2021-12
Relocation generation
You might want to read Relocation generation in assemblers first for the concept.
The linker relaxation framework created redundant relocations (relocations that could have been resolved at assembly time) in a few scenarios, including
.option norelax
and
call foo
These issues were resolved with quite a few patches, which required largely revamping the target-neutral relocation generation framework.
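A central question here is when a label difference A - B can be folded to a constant instead of producing relocations. The sketch below assumes a hypothetical model: each section is a list of fragments, labels sit at fragment boundaries, and a fragment is flagged linker-relaxable when it contains an instruction the linker may shrink (for example, a RISC-V call assembled without .option norelax). It illustrates the rule only; the names are invented and this is not LLVM's MCAssembler or MCFragment logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fragment: a run of bytes that may contain a relaxable insn.
struct Fragment {
  uint64_t Size;
  bool LinkerRelaxable;
};

struct Section {
  std::vector<Fragment> Frags;
};

// A label placed at the start of Frags[Index] (Index == size() means end).
struct Label {
  size_t Section;
  size_t Index;
};

// Returns the assembly-time value of A - B, or std::nullopt when the
// assembler must emit relocations because the linker may change it.
std::optional<int64_t> foldDifference(const std::vector<Section> &Secs,
                                      Label A, Label B) {
  if (A.Section != B.Section)
    return std::nullopt; // cross-section difference
  const std::vector<Fragment> &Frags = Secs[A.Section].Frags;
  size_t Lo = std::min(A.Index, B.Index), Hi = std::max(A.Index, B.Index);
  int64_t Dist = 0;
  for (size_t I = Lo; I < Hi; ++I) {
    if (Frags[I].LinkerRelaxable)
      return std::nullopt; // relaxation may shrink the bytes in between
    Dist += static_cast<int64_t>(Frags[I].Size);
  }
  return A.Index >= B.Index ? Dist : -Dist;
}
```

Clearing the relaxable flag, as .option norelax would in this model, lets a difference spanning the call fold to a constant, which is exactly the case where emitting relocations would be redundant.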