Updated in 2023-12.
GCC supports some function attributes for function multi-versioning: a way for a function to have multiple implementations, each using a different set of ISA extensions. A function attribute specifies different requirements of ISA extensions. The generated program decodes the CPU model and features at run-time, and picks the most restrictive implementation that is satisfied by the CPU, assuming that the most restrictive implementation has the best performance.
__attribute__((target(...)))
__attribute__((target(...))) has been available for a long time, even before attributes for function multi-versioning were introduced. Here are some links to relevant documentation.
- https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#:~:text=target%20(
- Attributes in Clang#target
- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html
- https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html
Usually we use different function names for different implementations
and define a dispatch function. This approach is like a manual ifunc.
extern int flags;
static __attribute__((target("default"))) int foo_default(int a) { return a & a-1; }
static __attribute__((target("arch=x86-64-v2"))) int foo_v2(int a) { return a & a-1; }
static __attribute__((target("arch=x86-64-v3"))) int foo_v3(int a) { return a & a-1; }
int foo(int a) {
  if (flags & 2) return foo_v3(a);
  if (flags & 1) return foo_v2(a);
  return foo_default(a);
}
The function bodies are duplicated. We can define a [[gnu::always_inline]] function shared by the different implementations.
__attribute__((always_inline)) static inline int foo_impl(int a) { return a & a-1; }
static __attribute__((target("default"))) int foo_default(int a) { return foo_impl(a); }
static __attribute__((target("arch=x86-64-v2"))) int foo_v2(int a) { return foo_impl(a); }
static __attribute__((target("arch=x86-64-v3"))) int foo_v3(int a) { return foo_impl(a); }
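The dispatch function reads flags without showing where its value comes from. As a minimal sketch, assuming an x86-64 target and a GCC or Clang toolchain, it could be computed at startup with __builtin_cpu_init and __builtin_cpu_supports. The name detect_flags and the bit assignments are hypothetical, and the feature checks only approximate the x86-64-v2 and x86-64-v3 levels, which require more features than are tested here.

```c
#include <stdio.h>

/* Hypothetical bit layout: bit 0 = roughly x86-64-v2, bit 1 = roughly x86-64-v3. */
int flags;

/* Returns a bit mask describing the running CPU; 0 on non-x86-64 targets. */
static int detect_flags(void) {
  int f = 0;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  __builtin_cpu_init(); /* populate the CPU model/feature data before querying it */
  /* Assumption: a couple of representative features stand in for each level. */
  if (__builtin_cpu_supports("sse4.2") && __builtin_cpu_supports("popcnt"))
    f |= 1;
  if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("bmi2") &&
      __builtin_cpu_supports("fma"))
    f |= 2;
#endif
  return f;
}
```

Assigning flags = detect_flags() early in main (or in a constructor) would let the dispatcher pick the matching implementation.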
Let's check the behavior with external linkage. In C++ mode, GCC and Clang emit two symbols, _Z3foov and _Z3foov.sse4.2, for the following program:
__attribute__((target("default"))) int foo(void) { return 0; }
__attribute__((target("sse4.2"))) int foo(void) { return 1; }
In C mode, GCC reports error: redefinition of ‘foo’. Clang emits two symbols, foo and foo.sse4.2.
With more than one declaration, the compiler merges the attributes.
int foo(void);
__attribute__((target("avx2"))) int foo(void) { return 0; }
//---
__attribute__((target("avx2"))) int foo(void);
int foo(void) { return 0; }
riscv-c-api-doc defined the target attribute in 2023-11.
__attribute__((target_clones(...)))
GCC added this attribute to make function multi-versioning more convenient. Since GCC 6, we can just define one function with the attribute specifying all supported targets. GCC 12 implements arch=x86-64, arch=x86-64-v2, arch=x86-64-v3, and arch=x86-64-v4 for micro-architecture levels. Clang implemented target_clones in 2021. I added support for x86 micro-architecture levels for Clang 18.
// b.c
See the GCC doc (Common Function Attributes) and Attributes in Clang#target_clones.
For the above function, GCC emits three implementations: foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3. foo is a dispatch function that selects one of the implementations. This is implemented as a GNU indirect function (ifunc). The ifunc resolver is called once by rtld at the relocation resolving phase. The resolver calls __cpu_indicator_init and inspects feature bits from __cpu_model and/or __cpu_features2.
.section .text.foo.resolver,"axG",@progbits,foo.resolver,comdat
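To make the mechanism more concrete, here is a hand-written approximation of what the compiler arranges for a target_clones function, assuming x86-64 with glibc (which supports ifunc). The names foo_default, foo_avx2, and resolve_foo are hypothetical, and a single avx2 check stands in for the micro-architecture level tests. The compiler-generated resolver differs in detail.

```c
#include <stdio.h> /* also defines libc feature macros such as __GLIBC__ */

static int foo_default(int a) { return a & (a - 1); }

#if defined(__x86_64__) && defined(__GLIBC__)
/* Same body, but the compiler may use AVX2 instructions here. */
__attribute__((target("avx2"))) static int foo_avx2(int a) { return a & (a - 1); }

/* Resolver: runs once during relocation processing and returns the chosen
   implementation. */
static int (*resolve_foo(void))(int) {
  __builtin_cpu_init(); /* needed before feature queries inside a resolver */
  return __builtin_cpu_supports("avx2") ? foo_avx2 : foo_default;
}

/* foo becomes an indirect function bound to whatever resolve_foo returns. */
int foo(int a) __attribute__((ifunc("resolve_foo")));
#else
/* Fallback for targets where ifunc is not assumed to be available. */
int foo(int a) { return foo_default(a); }
#endif
```

Callers simply call foo(a); the selection happens once, before the first call.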
The target_clones attribute also applies to non-definition declarations. foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3 are undefined symbols, while the dispatch symbol (GCC: foo, Clang: foo.ifunc) and foo.resolver remain as definitions.
// a.c
__attribute__((target_clones("default","arch=x86-64-v2","arch=x86-64-v3")))
int foo(int a);
int main(void) { foo(0); }
Drawbacks
Compilers largely don't know the semantics of ifunc and are very conservative. Ifunc defeats most interprocedural optimizations (feature request to enable more). We can see that the target_clones function foo is not inlined into foo_plus_1. Fortunately, functions called by a target_clones function are still inlinable.
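The inlining behavior can be explored with a small sketch like the following, assuming x86-64 with glibc. The helper clear_lowest_bit and the macro FOO_CLONES are hypothetical names, and sse4.2/avx2 are used instead of the micro-architecture levels so that older compilers accept the attribute string. Inspecting the assembly (for example with -O2 -S) should show whether the helper is inlined into each clone and whether foo_plus_1 calls foo indirectly.

```c
#include <stdio.h> /* also defines libc feature macros such as __GLIBC__ */

#if defined(__x86_64__) && defined(__GLIBC__)
#define FOO_CLONES __attribute__((target_clones("default", "sse4.2", "avx2")))
#else
#define FOO_CLONES /* no ifunc-based multi-versioning assumed on this target */
#endif

/* Hypothetical helper: small enough for each clone of foo to inline it. */
static inline int clear_lowest_bit(int a) { return a & (a - 1); }

/* One definition; the compiler emits one clone per listed target plus a
   dispatcher. */
FOO_CLONES int foo(int a) { return clear_lowest_bit(a); }

/* The call to foo goes through the ifunc symbol, so compilers typically do
   not inline it here. */
int foo_plus_1(int a) { return foo(a) + 1; }
```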
An ifunc call needs a PLT entry, regardless of whether the symbol is preemptible or not. In contrast, a call to a regular non-preemptible function does not need a PLT entry.
x86
libgcc.a(cpuinfo.o) defines __cpu_model and __cpu_features2. __cpu_indicator_init executes cpuid, extracts information about the x86 family model and available CPU features, and stores them into __cpu_model and __cpu_features2. The resolver decodes the information and selects the best implementation.
In libgcc.a and libclang_rt.builtins.a, __cpu_model and __cpu_features2 have hidden visibility, so a process may contain multiple copies: one from the main executable and others from loaded shared objects.
In llvm-project, compiler-rt/lib/builtins/cpu_model.c
provides an alternative implementation.
AArch64
The support is missing/incomplete as of GCC 12 and Clang 16.0. When implemented, features separated by + can be specified.
__attribute__((target_clones("sha2+memtag2", "fcma+sve2-pmull128")))
(compiler-rt/lib/builtins/cpu_model.c defines some symbols like __aarch64_have_lse_atomics. GCC commit)
__attribute__((cpu_dispatch(...))) and __attribute__((cpu_specific(...)))
These are supported by the Intel C++ Compiler and were later ported to Clang. GCC doesn't support the two attributes. They feel like a legacy feature and offer a subset of what target_clones provides.
The declaration and definition can be in different translation units like target_clones, but different attributes are used.
echo '__attribute__((cpu_dispatch(ivybridge, atom, sandybridge))) void foo(void); int main(void) { foo(); }' > a.c
__attribute__((target_version(...)))
Arm C Language Extensions introduced a new GNU attribute, target_version.
- https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
- Attributes in Clang#target_version
int __attribute__((target_version("default"))) tv(void) { return 0; }
This feature requires --rtlib=compiler-rt in Clang. GCC does not support the attribute as of 2023-08.