Updated in 2023-05.
There are two libraries for processing command line options in LLVM.
llvm/Support/CommandLine.h
See https://llvm.org/docs/CommandLine.html for documentation.
Global variables (mostly llvm::cl::opt<type>, some llvm::cl::list<type>) are the most common way to represent command-line options. The llvm::cl::opt constructor registers the option in a global registry. The program calls llvm::cl::ParseCommandLineOptions(argc, argv, ...) in main to parse the command line options. opt supports various integer types, bool, std::string, etc. Custom class/enum types can be supported by defining some specialization.
static cl::OptionCategory cat("split-file Options");
Besides functional options, there are many internal command line options in LLVM. Typical uses:
- Selecting between code paths. Such an option is usually introduced while a feature is under development and considered experimental.
- Keeping an escape hatch. Once the functionality has been stable for a period of time, the default value is changed to true. Users who find a regression can set the option to false as a workaround.
- Providing more input to a pass for testing.
The llvm::cl library is easy to use. Adding a new option only requires adding a variable to a source file. The library provides some icing on the cake, e.g. suggesting options with similar spellings when an option is misspelled. But the customizability is poor. For example:
- A llvm::cl::opt<bool> option accepts various input forms such as -v 0 -v=0 -v false -v=false -v=False
- It is inconvenient to support both --long and --no-long at the same time. Occasionally, the workaround is to define a separate variable for --no-long. If you want the two options to override each other, you must determine their relative positions on the command line
User-oriented external tools often have such customization requirements. The style of GNU getopt_long is --long. In llvm/Support/CommandLine.h, -long and --long can be mixed, and requiring -- was not supported for a long time.
LLVM binary utilities (llvm-nm, llvm-objdump, llvm-readelf, etc.) aim to replace GNU binutils, so they need to provide grouped short options in the style of POSIX shell utilities (-ab means -a -b). This feature was not supported for a long time, which troubled users who wanted to migrate to LLVM binary utilities.
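To make the -ab example concrete, here is a minimal standalone sketch of expanding grouped short options before any real parsing happens. It is not how OptTable or llvm::cl implement the feature; the function name expandGroupedShortOptions and the shortFlags parameter are hypothetical, and the sketch assumes every short flag takes no value.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Expand POSIX-style grouped short options: "-ab" becomes "-a", "-b".
// `shortFlags` is a hypothetical set of known single-letter flags that take
// no value. Arguments that are not a pure group of known flags pass through.
std::vector<std::string>
expandGroupedShortOptions(const std::vector<std::string> &args,
                          const std::set<char> &shortFlags) {
  std::vector<std::string> out;
  bool afterDashDash = false;
  for (const std::string &arg : args) {
    // Arguments after "--" are positional. "--long" and a lone "-a" need no
    // expansion either.
    bool isGroup = !afterDashDash && arg.size() > 2 && arg[0] == '-' &&
                   arg[1] != '-';
    if (arg == "--")
      afterDashDash = true;
    // Treat the argument as a group only if every letter is a known flag.
    for (std::size_t i = 1; isGroup && i < arg.size(); ++i)
      if (!shortFlags.count(arg[i]))
        isGroup = false;
    if (!isGroup) {
      out.push_back(arg);
      continue;
    }
    for (std::size_t i = 1; i < arg.size(); ++i)
      out.push_back(std::string("-") + arg[i]);
  }
  return out;
}
```

Requiring every letter to be a known flag is one way to keep a -long style option from being split by accident, which is the confusion the mixing of -long and --long invites.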
In addition, llvm::cl::opt options live in a singleton registry. Local variables can also be defined to add options dynamically, but this usage is rare (llvm-readobj and llvm-cov). There is also a very peculiar usage: the legacy pass manager in the opt tool automatically obtains the pass name list and registers a large number of global options.
To prevent errors, llvm::cl::opt does not support defining the same option multiple times. If you link both shared object and archive LLVM libraries at the same time, a classic error will be triggered:
: CommandLine Error: Option 'help-list' registered more than once!
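The registration model and the duplicate check can be illustrated with a toy registry. This is a standalone sketch, not LLVM's implementation: ToyRegistry and ToyBoolOpt are hypothetical names, and an exception stands in for however the real library reports the failure.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for a process-wide option registry (not LLVM's real class).
class ToyRegistry {
public:
  static ToyRegistry &instance() {
    static ToyRegistry registry;
    return registry;
  }
  // Register an option; a duplicate name is treated as an error.
  void add(const std::string &name, bool *storage) {
    if (!options.emplace(name, storage).second)
      throw std::runtime_error("Option '" + name +
                               "' registered more than once!");
  }
  void remove(const std::string &name) { options.erase(name); }
  // Set a registered option; returns false for an unknown name.
  bool set(const std::string &name, bool value) {
    auto it = options.find(name);
    if (it == options.end())
      return false;
    *it->second = value;
    return true;
  }

private:
  std::map<std::string, bool *> options;
};

// Toy option: constructing it registers it in the global registry.
class ToyBoolOpt {
public:
  explicit ToyBoolOpt(std::string optName) : name(std::move(optName)) {
    ToyRegistry::instance().add(name, &value);
  }
  ~ToyBoolOpt() { ToyRegistry::instance().remove(name); }
  ToyBoolOpt(const ToyBoolOpt &) = delete;
  ToyBoolOpt &operator=(const ToyBoolOpt &) = delete;
  bool get() const { return value; }

private:
  std::string name;
  bool value = false;
};
```

If the same translation unit defining such an option ends up in the process twice, once through a shared object and once through an archive, both copies run their constructors against the same registry, and the second registration fails.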
If you want to set the value of an llvm::cl::opt variable in Clang, you can use -mllvm -option=value. To set these option values for ld.lld/LLVMgold.so Full/Thin LTO, use -plugin-opt=-option=value (ld.lld can also use -mllvm).
llvm/Option/OptTable.h
This library was originally developed for Clang. It was later moved to llvm and adopted by many components such as llvm-objcopy, lld, and most binutils counterparts (llvm-symbolizer, llvm-objdump, etc). OptTable uses a domain-specific language (TableGen) to describe command line options and generate a parser. The parsed options are organized into an object, and each option is represented by an integer. It is easy to check which of a pair of boolean options (--demangle --no-demangle) is in effect, even when the default value varies: Args.hasFlag(OPT_demangle, OPT_no_demangle, !IsAddr2Line).
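The hasFlag call resolves the pair by taking whichever of the two options appears last, and falls back to the default when neither is present. A standalone sketch of that rule, using plain strings instead of option IDs (lastFlagWins is a hypothetical helper, not part of the library):

```cpp
#include <string>
#include <vector>

// Return true if `pos` is the last of the two options on the command line,
// false if `neg` is, and `defaultValue` if neither appears.
bool lastFlagWins(const std::vector<std::string> &args, const std::string &pos,
                  const std::string &neg, bool defaultValue) {
  // Scan from the end so the first match is the last occurrence.
  for (auto it = args.rbegin(); it != args.rend(); ++it) {
    if (*it == pos)
      return true;
    if (*it == neg)
      return false;
  }
  return defaultValue;
}
```

With llvm::cl, getting the same behavior means defining two variables and comparing their positions by hand, which is the inconvenience described earlier.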
Here is an example TableGen file:
multiclass B<string name, string help1, string help2> {
In the C++ source file, one writes:
opt::InputArgList Args = parseOptions(argc, argv, IsAddr2Line, Saver, Tbl);
Grouped short options
Note that GCC's command line options do not support grouped short options, so Clang never needed them. Some binutils counterparts do need the support, so I added grouped short options (D83639) in July 2020.
Anecdote: LLD uses this library to parse command-line options. GNU ld actually supports grouped short options, for example, ld.bfd -vvv means -v -v -v. I pointed out that GNU ld also supports many -long style options, so supporting grouped short options can cause confusion.
% touch an ommand ':)'
binutils 2.36 is expected to deprecate grouped short options :)
Target-specific options
In Clang, clang/include/clang/Driver/Options.td declares both generic and target-specific options.
On the plus side, if a useful feature is machine-specific in GCC, it is easy to implement it as a target-agnostic option in Clang.
On the down side, if it is an inherently target-specific option, it is easy to forget to report an error for other targets. There will usually be a -Wunused-command-line-argument warning, but a warning may not be good enough.
For example, GCC's powerpc port doesn't support -march=, but Clang incorrectly parsed and ignored it, resulting in only a -Wunused-command-line-argument warning. https://reviews.llvm.org/D145141 changed the warning to an error.
To prevent the aforementioned issues, we should add the ability to annotate target-specific options in clang/include/clang/Driver/Options.td with the clang::driver::ToolChain classes they are compatible with.
Comparison with getopt_long
Many users of getopt_long use a switch and a large number of cases to process command line options, and it is easy to create various position-dependent behaviors. For large build systems, sometimes it is not clear where the compiler/linker options are added, and some position-dependent behaviors are quite annoying.
The Args.getLastArgValue(..., ...) pattern widely used in Clang has a limitation. For options that take a value, we typically verify just the last occurrence and ignore the earlier ones. For example, the invalid values of the non-last options in clang -ftls-model=xxx -ftls-model=initial-exec and clang -mstack-protector-guard=xxx -mstack-protector-guard=global cannot be detected.
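The difference between checking only the last occurrence and checking every occurrence can be sketched in standalone C++. The helpers below are hypothetical and work on raw strings with a joined prefix such as -ftls-model=; they are not Clang's driver code.

```cpp
#include <set>
#include <string>
#include <vector>

// Collect the values of every occurrence of a joined option such as
// "-ftls-model=<value>", in command-line order.
std::vector<std::string> allValues(const std::vector<std::string> &args,
                                   const std::string &prefix) {
  std::vector<std::string> values;
  for (const std::string &arg : args)
    if (arg.compare(0, prefix.size(), prefix) == 0)
      values.push_back(arg.substr(prefix.size()));
  return values;
}

// Last-wins lookup: earlier occurrences are never inspected.
std::string lastValue(const std::vector<std::string> &args,
                      const std::string &prefix, const std::string &def = "") {
  std::vector<std::string> values = allValues(args, prefix);
  return values.empty() ? def : values.back();
}

// Stricter alternative: report every value that is not in `allowed`,
// including values that a later occurrence overrides.
std::vector<std::string> invalidValues(const std::vector<std::string> &args,
                                       const std::string &prefix,
                                       const std::set<std::string> &allowed) {
  std::vector<std::string> bad;
  for (const std::string &value : allValues(args, prefix))
    if (!allowed.count(value))
      bad.push_back(value);
  return bad;
}
```

Validating only lastValue accepts the first command line above, while invalidValues would flag xxx.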
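Chinese version
An aside: without really noticing, I have reached 1900 commits in llvm-project.
There are two libraries for processing command line options in LLVM.
llvm/Support/CommandLine.h
For documentation, see https://llvm.org/docs/CommandLine.html
In short, command line options are represented by global variables (most commonly llvm::cl::opt<type> var, sometimes llvm::cl::list and others). The opt constructor registers the option in a global registry. main calls llvm::cl::ParseCommandLineOptions(argc, argv, ...) to parse the command line. opt supports many types, such as the various integer types, bool, and std::string, and also custom enum types.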
static cl::OptionCategory cat("split-file Options");
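LLVM has many command line options meant for developers. Besides functional options, there are options for cases such as:
- A pass is undergoing a major change. To guard against regressions during in-tree development, an enable variable defaulting to false is added
- The functionality has been stable for a while, and the default is changed to true. Users who hit a regression in some situation can use false as a workaround
- Providing more input to a pass for testing
This library is convenient to use: adding a new option only requires adding a variable to a local file. It also offers a few nice extras, such as suggesting options with similar spellings. But command line parsing is hard to customize. For example:
- A cl::opt<bool> option accepts several input forms, such as -v 0 -v=0 -v false -v=false -v=False
- It is inconvenient to support both --long and --no-long. When the need occasionally arises, the workaround is to define a variable for --no-long as well. To make the two options override each other, you have to compare their relative positions on the command line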
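User-facing external tools often have this kind of customization requirement. The style of GNU getopt_long is --long. In llvm/Support/CommandLine.h, -long and --long can be mixed, and for a long time requiring -- was not supported.
LLVM binary utilities (llvm-nm, llvm-objdump, llvm-readelf, etc.) aim to replace GNU binutils, so they need to provide grouped short options in the style of POSIX shell utilities (-ab means -a -b). For a long time this feature was not supported, which troubled users who wanted to migrate to LLVM binary utilities.
In addition, cl::opt options live in a singleton registry. Local variables can also be defined to add options dynamically, but this usage is rare (llvm-readobj and llvm-cov). There is also a very peculiar usage: the legacy pass manager in the opt tool automatically obtains the pass name list and registers a large number of global options.
To prevent errors, cl::opt does not support defining the same option multiple times. If you link both the shared object and the archive LLVM libraries, a classic error is triggered: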
: CommandLine Error: Option 'help-list' registered more than once!
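To set the value of a cl::opt variable in Clang, use -mllvm -option=value. These option values can also be set for ld.lld/LLVMgold.so Full/Thin LTO with -plugin-opt=-option=value (ld.lld can also use -mllvm).
llvm/Option/OptTable.h
This library was originally developed for Clang. It was later moved into llvm and adopted by llvm-objcopy, lld, llvm-symbolizer, and others. It uses a domain-specific language (TableGen) to describe options and generate a parser. The parsed options are organized into an object, and each option is represented by an integer. It is easy to check which of a pair of boolean options (--demangle --no-demangle) is in effect, even when the default value varies: Args.hasFlag(OPT_demangle, OPT_no_demangle, !IsAddr2Line).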
multiclass B<string name, string help1, string help2> {
opt::InputArgList Args = parseOptions(argc, argv, IsAddr2Line, Saver, Tbl);
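Note that GCC's command line options do not support grouped short options, so Clang had no need for them. For a long time, the lack of this feature limited where the library could be used. I added grouped short options (D83639) in July 2020.
Anecdote: LLD uses this library to parse command line options. GNU ld actually supports grouped short options, for example, ld.bfd -vvv means -v -v -v. I pointed out that GNU ld also supports many -long style options, so supporting grouped short options on top of that easily causes confusion.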
% touch an ommand ':)'
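binutils 2.36 is expected to deprecate grouped short options :)
To expand on this a bit, many getopt_long users process command line options with a switch and a large number of cases, which easily produces all sorts of position-dependent behaviors. For large build systems, it is sometimes unclear where the compiler/linker options are added, and some position-dependent behaviors are quite annoying.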