# Function multi-versioning

GCC supports several function attributes for function multi-versioning: a way for a function to have multiple implementations, each using a different set of ISA extensions. The attribute on each implementation specifies the ISA extensions it requires. At run time, the generated program decodes the CPU model and features and picks the most restrictive implementation that the CPU satisfies, assuming that the most restrictive implementation performs best.
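The selection rule can be sketched in portable code. This is a minimal model, not what GCC or Clang generate. The feature bits, the `Version` table, and `select_version` are hypothetical names. "Most restrictive" is approximated here by "requires the most features", whereas real compilers rank versions by their own priority orders.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical feature bits for illustration; real runtimes use their own encodings.
enum Feature : std::uint32_t {
  kSSE42 = 1u << 0,
  kAVX2 = 1u << 1,
  kAVX512F = 1u << 2,
};

using Impl = int (*)();

// One implementation together with the ISA extensions it requires.
struct Version {
  std::uint32_t required;
  Impl fn;
};

int impl_default() { return 0; }
int impl_sse42() { return 1; }
int impl_avx2() { return 2; }

// Among versions whose requirements are a subset of `available`, pick the one
// that requires the most features (a stand-in for "most restrictive").
Impl select_version(const Version *versions, std::size_t n, std::uint32_t available) {
  Impl best = nullptr;
  int best_count = -1;
  for (std::size_t i = 0; i < n; ++i) {
    std::uint32_t req = versions[i].required;
    if ((req & available) != req) continue;  // the CPU lacks a required feature
    int count = __builtin_popcount(req);
    if (count > best_count) {
      best = versions[i].fn;
      best_count = count;
    }
  }
  return best;
}
```

A version with `required == 0` plays the role of the default implementation. It is always satisfied, so the selection never comes back empty.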

## __attribute__((target(...)))

__attribute__((target(...))) has been available for a long time, even before the attributes dedicated to function multi-versioning were introduced.

Usually we give the implementations different function names and define a dispatch function that selects one of them. This approach is like a manual ifunc.
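Here is a minimal sketch of the manual approach. It assumes x86-64 and GCC or Clang, which provide `__builtin_cpu_init` and `__builtin_cpu_supports`. The names `foo_avx2`, `foo_sse42`, and `foo_default` are illustrative.

```cpp
// Each implementation gets its own name and its own target attribute.
__attribute__((target("avx2"))) int foo_avx2(int x) { return x + 1; }
__attribute__((target("sse4.2"))) int foo_sse42(int x) { return x + 1; }
int foo_default(int x) { return x + 1; }

// Hand-written dispatcher: checks CPU features on every call.
// A real program might cache the chosen function pointer instead.
int foo(int x) {
  __builtin_cpu_init();  // only required before constructors run; harmless here
  if (__builtin_cpu_supports("avx2")) return foo_avx2(x);
  if (__builtin_cpu_supports("sse4.2")) return foo_sse42(x);
  return foo_default(x);
}
```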

This duplicates the function body in every implementation. To avoid that, we can define the body once in a [[gnu::always_inline]] function and call it from each implementation.
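A sketch of the shared-body pattern follows, again assuming x86-64 with GCC or Clang. The names `sum_body`, `sum_avx2`, `sum_default`, and `sum` are hypothetical. The always_inline body uses the default target, which is a subset of the avx2 caller's target, so it can be inlined into both wrappers. Each copy is then compiled with its caller's ISA options.

```cpp
// The body is written once and force-inlined into every implementation.
[[gnu::always_inline]] inline int sum_body(const int *a, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i) s += a[i];
  return s;
}

// Each wrapper compiles the inlined body with its own target options.
__attribute__((target("avx2"))) int sum_avx2(const int *a, int n) { return sum_body(a, n); }
int sum_default(const int *a, int n) { return sum_body(a, n); }

// Hand-written dispatcher, as in the manual approach.
int sum(const int *a, int n) {
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx2") ? sum_avx2(a, n) : sum_default(a, n);
}
```

The reverse direction (an avx2 always_inline helper called from a default-target function) would fail to inline, because the callee would need ISA extensions the caller lacks.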

Let's check the behavior of functions with external linkage. In C++ mode, GCC and Clang emit two symbols, _Z3foov and _Z3foov.sse4.2, for a program that defines a default version and an sse4.2 version of foo under the same name.
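A minimal program of this kind might look like the sketch below. It assumes an x86-64 ELF target where the compiler can emit an ifunc for the dispatch. The helper `call_foo` is hypothetical and only exercises the generated dispatch.

```cpp
// Two versions of foo share one name; target("default") marks the fallback.
__attribute__((target("default"))) int foo() { return 1; }
__attribute__((target("sse4.2"))) int foo() { return 2; }

// Callers use the plain name; the compiler generates the dispatch.
int call_foo() { return foo(); }
```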

In C mode, GCC reports error: redefinition of ‘foo’, while Clang emits two symbols, foo and foo.sse4.2.

With more than one declaration, the compiler merges the attributes.

## __attribute__((target_clones(...)))

This is the first attribute that GCC introduced to make function multi-versioning convenient. Since GCC 6, we can define just one function, with the attribute listing all supported targets.

See the GCC doc (Common Function Attributes) and Attributes in Clang#target_clones. Clang only supports some basic forms, not arch=.

For a function foo declared with target_clones("default", "arch=x86-64-v2", "arch=x86-64-v3"), GCC emits three implementations: foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3. foo itself is a dispatch function that selects one of the implementations, implemented as a GNU indirect function (ifunc). The dynamic loader (rtld) calls the ifunc resolver once, while it resolves relocations. The resolver references a function and a variable defined in the runtime (libgcc).
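To make the ifunc mechanism concrete, here is a hand-written ifunc that mimics what the compiler generates. This is a sketch, not GCC's actual output. It assumes an ELF target with ifunc support (e.g. glibc on x86-64). `foo_resolver`, `foo_fn`, `foo_default`, and `foo_avx2` are hypothetical names. On x86, the `__builtin_cpu_*` builtins read CPU information that the runtime fills in.

```cpp
extern "C" {

static int foo_default(int x) { return x + 1; }
__attribute__((target("avx2"))) static int foo_avx2(int x) { return x + 1; }

typedef int (*foo_fn)(int);

// Resolver: called by the dynamic loader while it processes relocations,
// possibly before constructors run, so it initializes the CPU model itself.
foo_fn foo_resolver(void) {
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx2") ? foo_avx2 : foo_default;
}

// foo is an ifunc: calls go to whatever function foo_resolver returns.
int foo(int x) __attribute__((ifunc("foo_resolver")));

}  // extern "C"
```

The extern "C" block keeps the string in ifunc("foo_resolver") equal to the resolver's symbol name. Without it, C++ mangling would make the names differ.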

The attribute can also apply to a non-definition declaration. In that case foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3 are undefined symbols, while the dispatch function (GCC: foo, Clang: foo.ifunc) and foo.resolver remain definitions.

In llvm-project, compiler-rt provides an alternative implementation of this runtime.

### Drawbacks

Compilers largely don't understand the semantics of ifunc and treat it very conservatively, so an ifunc defeats most interprocedural optimizations. For example, a target_clones function foo is not inlined into its caller foo_plus_1. Fortunately, functions called by a target_clones function can still be inlined into it.
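The situation can be reproduced with a small example. It assumes x86-64 and GCC or Clang. `twice` is a hypothetical helper. Compiling with -O2 -S and reading the assembly should show foo_plus_1 still calling foo, while twice is folded into each clone of foo.

```cpp
// A helper called by the clones; it can still be inlined into each clone.
static inline int twice(int x) { return x * 2; }

__attribute__((target_clones("default", "avx2")))
int foo(int x) { return twice(x); }

// foo is reached through an ifunc, so it is expected to stay an out-of-line call here.
int foo_plus_1(int x) { return foo(x) + 1; }
```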

An ifunc call needs a PLT entry, regardless of whether the ifunc symbol is preemptible. In contrast, a call to a non-preemptible regular function does not need a PLT entry.

### x86

The runtime executes cpuid, extracts the x86 family, model, and available CPU features, and stores them in __cpu_model and __cpu_features2. The resolver decodes this information and selects the best implementation.
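User code can query the same runtime-provided information through the GCC/Clang builtins, as in the sketch below (x86-64 only). `CpuInfo`, `query_cpu`, `print_cpu_info`, and the bit layout of `features` are illustrative assumptions. `__builtin_cpu_is` tests the vendor or CPU model, and `__builtin_cpu_supports` tests individual features.

```cpp
#include <cstdio>

struct CpuInfo {
  bool intel;
  bool amd;
  unsigned features;  // bit 0: sse4.2, bit 1: avx2, bit 2: avx512f (illustrative encoding)
};

CpuInfo query_cpu() {
  __builtin_cpu_init();  // make sure the runtime has filled in the CPU model
  CpuInfo info{};
  info.intel = __builtin_cpu_is("intel") != 0;
  info.amd = __builtin_cpu_is("amd") != 0;
  if (__builtin_cpu_supports("sse4.2")) info.features |= 1u << 0;
  if (__builtin_cpu_supports("avx2")) info.features |= 1u << 1;
  if (__builtin_cpu_supports("avx512f")) info.features |= 1u << 2;
  return info;
}

void print_cpu_info() {
  CpuInfo info = query_cpu();
  std::printf("intel=%d amd=%d features=%#x\n", info.intel, info.amd, info.features);
}
```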

### AArch64

Support is missing or incomplete as of GCC 12 and Clang 16.0. Once it is implemented, multiple features can be specified, separated by +.

(compiler-rt/lib/builtins/cpu_model.c defines some symbols such as __aarch64_have_lse_atomics; see also the corresponding GCC commit.)

## __attribute__((cpu_dispatch(...))) and __attribute__((cpu_specific(...)))

These two attributes were supported by the Intel C++ Compiler and later ported to Clang. GCC doesn't support them. They feel like legacy, and their functionality is a subset of target_clones.

As with target_clones, the declaration and the definitions can be in different translation units, but different attributes are used: cpu_dispatch on the dispatching declaration and cpu_specific on each implementation.

## __attribute__((target_version(...)))

The Arm C Language Extensions (ACLE) introduced a new GNU attribute, target_version.

The semantics are not very clear in the latest Clang. GCC does not support the attribute as of 2023-02.