Simplifying disassembly with llvm-mc

Both compiler developers and security researchers have built disassemblers, but they often prioritize different aspects. Compiler toolchains, benefiting from direct contributions from CPU vendors, tend to offer more accurate and robust decoding. Security-focused tools, on the other hand, often excel in user interface design.

For quick disassembly tasks, rizin provides a convenient command-line interface.

% rz-asm -a x86 -b 64 -d 4829c390
sub rbx, rax
nop

The -a x86 option can be omitted.

Within the LLVM ecosystem, llvm-objdump serves as a drop-in replacement for the traditional GNU objdump, leveraging instruction information from LLVM's TableGen files (llvm/lib/Target/*/*.td). Another LLVM tool, llvm-mc, was originally designed for internal testing of the Machine Code (MC) layer, particularly its assembler and disassembler components; llvm/test/MC contains numerous tests with RUN: llvm-mc ... lines. Despite its internal origins, llvm-mc is often distributed as part of the LLVM toolset, making it readily available to users.

However, using llvm-mc for simple disassembly tasks can be cumbersome. It requires each hexadecimal byte to be written with an explicit 0x prefix:

% echo 0x48 0x29 0xc3 0x90 | llvm-mc --triple=x86_64 --cdis --output-asm-variant=1
.text
sub rbx, rax
nop

Let's break down the options used in this command:

  • --triple=x86_64: This specifies the target architecture. If your LLVM build's default target triple is already x86_64-*-*, this option can be omitted.
  • --output-asm-variant=1: LLVM, like GCC, defaults to AT&T syntax for x86 assembly. This option switches to Intel syntax. See lhmouse/mcfgthread/wiki/Intel-syntax if you prefer Intel syntax in compiler toolchains as well.
  • --cdis: Introduced in LLVM 18, this option enables colored disassembly. With older LLVM versions, use --disassemble instead.
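
With an llvm-mc that can only read 0x-prefixed input, the prefixes can be generated mechanically. Below is a minimal sketch, not part of LLVM: a hypothetical shell function named hex2mc that drops whitespace and #-to-end-of-line comments from its input and prints each pair of hex digits as a 0x-prefixed byte, ready to pipe into llvm-mc. It assumes the input contains an even number of hex digits and nothing else.

```shell
# hex2mc: hypothetical helper (not an LLVM tool).
# Reads hex digits on stdin and prints them as space-separated 0xNN bytes,
# the form llvm-mc --disassemble accepts.
hex2mc() {
  # 1. sed strips '#' comments through end of line.
  # 2. tr deletes all whitespace, joining the digits into one string.
  # 3. echo restores a trailing newline so the final sed sees a full line.
  # 4. sed prefixes every two-character group with 0x, then trims the
  #    trailing space. An odd leftover digit is passed through unchanged,
  #    and llvm-mc will reject it.
  { sed 's/#.*//' | tr -d '[:space:]'; echo; } | sed 's/../0x& /g; s/ $//'
}
```

For example, echo 4829c390 | hex2mc prints 0x48 0x29 0xc3 0x90, which can then be piped into the llvm-mc command shown above.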

I have contributed patches that remove the .text line from the output and allow disassembling raw hex bytes without the 0x prefix. You can now use the --hex option:

% echo 4829c390 | llvm-mc --cdis --hex --output-asm-variant=1
sub rbx, rax
nop

You can further simplify this by creating a shell alias:

alias disasm="llvm-mc --cdis --hex --output-asm-variant=1"
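
An alias can only prepend fixed arguments. A shell function can additionally accept the bytes as command-line arguments (disasm 4829c390) and fall back to stdin otherwise. The sketch below assumes an llvm-mc recent enough to support --cdis and --hex. The LLVM_MC variable is a hypothetical knob introduced here for picking a versioned binary such as llvm-mc-18; llvm-mc itself does not read it. Define the function in place of the alias, not alongside it, since an existing alias named disasm would be expanded when the function definition is parsed.

```shell
# disasm: function variant of the alias.
# LLVM_MC (hypothetical, defaults to llvm-mc) selects which binary to run,
# e.g. LLVM_MC=llvm-mc-18 on distributions that ship versioned names.
disasm() {
  if [ "$#" -gt 0 ]; then
    # Bytes given as arguments: join them with spaces on one line.
    printf '%s\n' "$*" | "${LLVM_MC:-llvm-mc}" --cdis --hex --output-asm-variant=1
  else
    # No arguments: read hex bytes from stdin, exactly like the alias.
    "${LLVM_MC:-llvm-mc}" --cdis --hex --output-asm-variant=1
  fi
}
```

Both disasm 4829 c390 and disasm <<< 4829c390 then feed the same bytes to llvm-mc.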

The "here string" syntax (<<<) of bash and zsh provides a clean way to supply stdin:

% disasm <<< 4829c390
sub rbx, rax
nop
% disasm <<< $'4829 c3\n# comment\n90'
sub rbx, rax
nop

The --hex option conveniently ignores whitespace and #-style comments within the input.