Both compiler developers and security researchers have built disassemblers, often with different priorities. Compiler toolchains, benefiting from direct contributions from CPU vendors, tend to offer more accurate and robust decoding. Security-focused tools, on the other hand, often excel in user interface design.
For quick disassembly tasks, rizin provides a convenient command-line interface.
```sh
% rz-asm -a x86 -b 64 -d 4829c390
```
The -a x86 option can be omitted.
llvm-mc
Within the LLVM ecosystem, llvm-objdump serves as a drop-in replacement for the traditional GNU objdump, leveraging instruction information from LLVM's TableGen files (llvm/lib/Target/*/*.td). Another LLVM tool, llvm-mc, was originally designed for internal testing of the Machine Code (MC) layer, particularly the assembler and disassembler components. There are numerous RUN: llvm-mc ... tests within llvm/test/MC. Despite its internal origins, llvm-mc is often distributed as part of the LLVM toolset, making it accessible to users.
However, using llvm-mc for simple disassembly tasks can be cumbersome: each byte must be written as a separate hexadecimal value with an explicit 0x prefix:
```sh
% echo 0x48 0x29 0xc3 0x90 | llvm-mc --triple=x86_64 --cdis --output-asm-variant=1
```
Let's break down the options used in this command:
- --triple=x86_64: This specifies the target architecture. If your LLVM build's default target triple is already x86_64-*-*, this option can be omitted.
- --output-asm-variant=1: LLVM, like GCC, defaults to AT&T syntax for x86 assembly. This option switches to the Intel syntax. See lhmouse/mcfgthread/wiki/Intel-syntax if you prefer the Intel syntax in compiler toolchains.
- --cdis: Introduced in LLVM 18, this option enables colored disassembly. In older LLVM versions, you have to use --disassemble.
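Rather than typing the prefixes by hand, they can be generated from a raw hex string. The function below is a minimal sketch; hex_to_mc is a hypothetical name (not an LLVM tool), and it assumes the input consists of whole bytes, that is, an even number of hex digits, with optional whitespace.

```sh
#!/bin/sh
# hex_to_mc: hypothetical helper, not part of LLVM.
# Turns raw hex such as "4829c390" into "0x48 0x29 0xc3 0x90".
# Assumes an even number of hex digits; whitespace in the input is ignored.
hex_to_mc() {
  { printf '%s' "$*" | tr -d '[:space:]'; echo; } |  # join args, drop whitespace
    sed -e 's/../0x& /g' -e 's/ $//'                 # prefix each 2-digit byte
}
```

Usage: hex_to_mc 4829c390 | llvm-mc --triple=x86_64 --disassemble --output-asm-variant=1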
I have contributed patches to remove the .text directive from the output and to allow disassembling raw bytes without the 0x prefix. You can now use the --hex option:
```sh
% echo 4829c390 | llvm-mc --cdis --hex --output-asm-variant=1
```
You can simplify this further with a bash/zsh function. The "here string" feature (<<<) of bash and zsh provides a clean way to pass the arguments on stdin.
```bash
disasm() {
  llvm-mc --cdis --hex --output-asm-variant=1 <<< "$*"
}
```
```sh
% disasm 4829c390
```
The --hex option conveniently ignores whitespace and #-style comments within the input.
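Annotated input like this is convenient to keep in a file, but a plain hex reader such as the xxd -r -p step in the llvm-objdump pipeline may have no notion of these comments, and hex-looking letters inside a comment could be misread as data. A minimal sketch, assuming comments run from # to the end of the line as they do for --hex; strip_hex_comments is a hypothetical helper:

```sh
#!/bin/sh
# strip_hex_comments: hypothetical helper, not part of LLVM or xxd.
# Reads annotated hex on stdin, removes '#' comments (to end of line)
# and all whitespace, and prints the remaining hex digits on one line.
strip_hex_comments() {
  sed 's/#.*//' | tr -d '[:space:]'
  echo  # terminate the output with a newline
}
```

Usage, with a hypothetical file insns.txt: strip_hex_comments < insns.txt | xxd -r -p > insns.bin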
Atomic blocks
llvm-mc handles decoding failures by skipping a number of bytes, as determined by the target-specific llvm::MCDisassembler::getInstruction. To treat a sequence of bytes as a single unit during disassembly, enclose them within [].
```sh
% echo '[f995ab99][f995ab99]' | llvm-mc --triple=riscv64 --cdis --hex
```
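When the instruction boundaries are already known, the brackets can be generated from a list of hex strings, one per intended unit. A sketch; atomic_blocks is a hypothetical helper that only wraps each argument in [] and does no validation:

```sh
#!/bin/sh
# atomic_blocks: hypothetical helper. Wraps each argument in [] so that
# llvm-mc --hex treats every argument as a single unit, e.g.
#   atomic_blocks f995ab99 f995ab99  ->  [f995ab99][f995ab99]
atomic_blocks() {
  printf '[%s]' "$@"  # printf reuses the format once per argument
  echo
}
```

Usage: atomic_blocks f995ab99 f995ab99 | llvm-mc --triple=riscv64 --cdis --hex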
llvm-mc can also function as an assembler:
```sh
% echo 'li t3, 42' | llvm-mc -show-encoding --triple=riscv64
```
(I've contributed a change to LLVM 20 that removes the previously printed .text directive.)
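Assembling and disassembling can be chained into a round trip. The sketch below assumes each instruction's bytes appear in a trailing comment of the form # encoding: [0x13,0x0e,0xa0,0x02]; encodings that still contain fixup placeholders are not handled, and lines without an encoding comment (such as fixup notes) are skipped. encoding_to_hex is a hypothetical name:

```sh
#!/bin/sh
# encoding_to_hex: hypothetical helper. Extracts the bytes from
# llvm-mc --show-encoding comments such as
#   li t3, 42   # encoding: [0x13,0x0e,0xa0,0x02]
# and prints them as raw hex ("130ea002"), one instruction per line.
# Assumes fully resolved encodings; fixup placeholders are not handled.
encoding_to_hex() {
  sed -n 's/.*# encoding: \[\(.*\)].*/\1/p' |  # keep only the bracketed list
    sed -e 's/0x//g' -e 's/,//g'               # drop 0x prefixes and commas
}
```

Usage: echo 'li t3, 42' | llvm-mc --show-encoding --triple=riscv64 | encoding_to_hex | llvm-mc --triple=riscv64 --cdis --hex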
llvm-objdump
llvm-mc does not print instruction addresses; for that, we need llvm-objdump. Here is a little fish script that takes raw hex bytes as input, converts them to binary (xxd -r -p), wraps the result in an x86-64 ELF relocatable file (llvm-objcopy -I binary), and finally disassembles the data section (.data) containing those bytes with llvm-objdump -D. The trailing sed deletes everything up to and including the <_binary__stdin__start>: label line.
```fish
llvm-objdump -D -j .data $opt (echo $argv | xxd -r -p | llvm-objcopy -I binary -O elf64-x86-64 - - | psub) | sed '1,/<_binary__stdin__start>:/d'
```
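The sed pattern depends on the symbol that llvm-objcopy -I binary defines at the start of the data. This sketch assumes the objcopy naming convention: _binary_, then the input name with every non-alphanumeric character replaced by _, then _start. For stdin the name is <stdin>, which gives _binary__stdin__start. binary_start_symbol is a hypothetical helper, useful if the script is adapted to read a named file:

```sh
#!/bin/sh
# binary_start_symbol: hypothetical helper. Computes the start symbol that
# "-I binary" conversion is assumed to define for an input named $1:
# "_binary_" + name with non-alphanumerics mapped to "_" + "_start".
binary_start_symbol() {
  printf '_binary_%s_start\n' "$(printf '%s' "$1" | tr -c '[:alnum:]' '_')"
}
```

For example, sed "1,/<$(binary_start_symbol foo.bin)>:/d" would skip the header when the bytes come from a file named foo.bin.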
Here is a more feature-rich script that supports multiple architectures:
```fish
#!/usr/bin/env fish
```
```sh
% ./disasm e8 00000000c3 e800000000 c3
```
Summary
- Assembler: llvm-mc --show-encoding
- Disassembler: llvm-mc --cdis --hex
- Disassembler with address information: xxd -r -p, llvm-objcopy, and llvm-objdump -D -j .data