GNU indirect function

GNU indirect function (ifunc) is a mechanism that makes a direct function call resolve to an implementation picked by a resolver. It is mainly used in glibc but has also been adopted by FreeBSD.

For some performance-critical functions, e.g. memcpy/memset/strcpy, glibc provides multiple implementations optimized for different architecture levels. The application just uses memcpy(...), which compiles to call memcpy. The linker creates a PLT entry for memcpy and produces an associated special dynamic relocation referencing the resolver symbol/address. When relocations are resolved at runtime, the return value of the resolver is placed in the GOT entry, and the PLT entry loads the address from there.
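To make the flow concrete, here is a minimal sketch in plain C that emulates by hand what the resolver, the GOT entry, and the PLT entry do together. It is not how glibc is written: all names (sum_scalar, sum_unrolled, sum_resolver, sum_got_slot, relocate, cpu_has_wide_loads) are hypothetical, and the capability flag is a stand-in for a real hardware check.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Two interchangeable implementations (hypothetical). */
static long sum_scalar(const int *p, size_t n) {
  long s = 0;
  for (size_t i = 0; i < n; i++) s += p[i];
  return s;
}

static long sum_unrolled(const int *p, size_t n) {
  long s0 = 0, s1 = 0;
  size_t i = 0;
  for (; i + 1 < n; i += 2) { s0 += p[i]; s1 += p[i + 1]; }
  if (i < n) s0 += p[i];
  return s0 + s1;
}

/* Stand-in for a hardware capability check (assumption). */
static int cpu_has_wide_loads = 1;
static int resolver_calls;

/* The resolver returns the address of the chosen implementation. */
static sum_fn sum_resolver(void) {
  resolver_calls++;
  return cpu_has_wide_loads ? sum_unrolled : sum_scalar;
}

/* Stand-in for the GOT entry that the loader fills in. */
static sum_fn sum_got_slot;

/* Stand-in for relocation processing at startup. */
static void relocate(void) { sum_got_slot = sum_resolver(); }

/* Stand-in for the PLT entry: load the address from the slot and jump. */
static long sum(const int *p, size_t n) { return sum_got_slot(p, n); }
```

After relocate() has run once, every call to sum goes straight through the slot and the resolver is never consulted again. With a real ifunc, the linker and the loader provide the slot and the relocate step, so the source only contains the resolver and the implementations.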

On Mach-O, there is a similar feature, N_SYMBOL_RESOLVER.

Representation

ifunc has a dedicated symbol type, STT_GNU_IFUNC, to distinguish it from a regular function (STT_FUNC). Its value, 10, is in the OS-specific range (10~12). readelf -s tells you that the symbol is an ifunc if the OSABI is ELFOSABI_GNU or ELFOSABI_FREEBSD.

On Linux, GNU as uses ELFOSABI_NONE (0) by default. If ifunc is used, the OSABI is changed to ELFOSABI_GNU. Similarly, GNU ld sets the OSABI to ELFOSABI_GNU if ifunc is used. gold does not do this (PR17735).

Things are looser in LLVM. The integrated assembler and LLD do not set ELFOSABI_GNU. Currently the only problem I know of is the readelf -s display. Everything else works fine.
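The display issue comes down to how a tool interprets type value 10 from the OS-specific range. Below is a minimal sketch of that decision, assuming a readelf-like tool. The constants are the usual ELF values, defined locally so the sketch does not need <elf.h>. The helper names (sym_type, type_name) and the exact fallback strings are assumptions for illustration, not readelf's code.

```c
#include <stdio.h>

/* ELF constants, defined locally to keep the sketch self-contained. */
enum { ELFOSABI_NONE = 0, ELFOSABI_GNU = 3, ELFOSABI_FREEBSD = 9 };
enum { STT_FUNC = 2, STT_LOOS = 10, STT_GNU_IFUNC = 10, STT_HIOS = 12 };

/* The symbol type is stored in the low 4 bits of st_info. */
static unsigned sym_type(unsigned char st_info) { return st_info & 0xf; }

/* Return a display name for the symbol type; buf is used only when the
   name has to be formatted. */
static const char *type_name(unsigned char st_info, unsigned char osabi,
                             char *buf, size_t len) {
  unsigned t = sym_type(st_info);
  if (t == STT_FUNC) return "FUNC";
  /* Value 10 means ifunc only under an OSABI that defines it that way. */
  if (t == STT_GNU_IFUNC &&
      (osabi == ELFOSABI_GNU || osabi == ELFOSABI_FREEBSD))
    return "IFUNC";
  if (t >= STT_LOOS && t <= STT_HIOS) {
    snprintf(buf, len, "<OS specific>: %u", t);
    return buf;
  }
  snprintf(buf, len, "<other>: %u", t);
  return buf;
}
```

With st_info = (1 << 4) | 10 (an STB_GLOBAL symbol of type 10), this returns IFUNC for ELFOSABI_GNU but falls back to the generic OS-specific label for ELFOSABI_NONE, which is the situation for objects produced by the LLVM tools.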

Assembler behavior

In assembly, you can assign the type STT_GNU_IFUNC to a symbol via .type foo, @gnu_indirect_function. An ifunc symbol is typically STB_GLOBAL.

In the object file, st_shndx and st_value of an STT_GNU_IFUNC symbol indicate the resolver. After linking, if the symbol is still STT_GNU_IFUNC, its st_value field indicates the resolver address in the linked image.

Assemblers usually convert relocations referencing a local symbol to reference the section symbol, but this behavior needs to be inhibited for STT_GNU_IFUNC.

Example

cat > b.s <<'e'
.global ifunc
.type ifunc, @gnu_indirect_function
.set ifunc, resolver

resolver:
  leaq impl(%rip), %rax
  ret

impl:
  movq $42, %rax
  ret
e

cat > a.c <<e
int ifunc(void);
int main() { return ifunc(); }
e

cc a.c b.s
./a.out # exit code 42

GNU as also makes transitive aliases of an STT_GNU_IFUNC symbol ifunc. In the following example, foo2 and foo3 become STT_GNU_IFUNC as well.

.type foo,@gnu_indirect_function
.set foo, foo_resolver

.set foo2, foo
.set foo3, foo2

GCC and Clang support a function attribute which emits .type ifunc, @gnu_indirect_function; .set ifunc, resolver:

static int impl(void) { return 42; }
static void *resolver(void) { return (void *)impl; }
int ifunc(void) __attribute__((ifunc("resolver")));

Preemptible ifunc

A preemptible ifunc call is no different from a regular function call from the linker's perspective.

The linker creates a PLT entry, reserves an associated GOT entry, and emits an R_*_JUMP_SLOT relocation resolving the address into the GOT entry. The PLT code sequence is the same as a regular PLT for STT_FUNC.

If the ifunc is defined within the module, the symbol type in the linked image is STT_GNU_IFUNC; otherwise (the ifunc is defined in a DSO), the symbol type is STT_FUNC.

The difference resides in the loader.

At runtime, the relocation resolver checks whether the R_*_JUMP_SLOT relocation refers to an ifunc. If it does, instead of filling the GOT entry with the target address, the relocation resolver calls the target address as an indirect function, with ABI-specified additional parameters (hwcap related), and places the return value into the GOT entry.
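The loader-side check can be sketched in a few lines of C. This is a simplified model, not code from glibc or FreeBSD rtld: struct sym keeps only the two fields that matter here, the single hwcap parameter is an assumption (the real resolver arguments are ABI specific), and impl_generic, impl_fast, and pick_impl are hypothetical.

```c
#include <stdint.h>

/* Symbol types, defined locally so the sketch does not need <elf.h>. */
enum { STT_FUNC = 2, STT_GNU_IFUNC = 10 };

/* Simplified symbol: only the fields this sketch needs. */
struct sym {
  unsigned char type; /* STT_FUNC or STT_GNU_IFUNC */
  uintptr_t value;    /* function address, or resolver address for an ifunc */
};

typedef uintptr_t (*resolver_fn)(unsigned long hwcap);

/* Resolve one JUMP_SLOT relocation into its GOT entry. */
static void resolve_jump_slot(uintptr_t *got_entry, const struct sym *s,
                              unsigned long hwcap) {
  uintptr_t target = s->value;
  if (s->type == STT_GNU_IFUNC)
    target = ((resolver_fn)target)(hwcap); /* call the ifunc resolver */
  *got_entry = target;
}

/* Hypothetical implementations and resolver used to exercise the sketch. */
static int impl_generic(void) { return 1; }
static int impl_fast(void) { return 2; }
static uintptr_t pick_impl(unsigned long hwcap) {
  return (uintptr_t)((hwcap & 1) ? impl_fast : impl_generic);
}
```

For an STT_FUNC symbol the GOT entry simply receives the symbol value. For an STT_GNU_IFUNC symbol the value is treated as code to run, and whatever it returns ends up in the GOT entry.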

Non-preemptible ifunc

The non-preemptible ifunc case is where all sorts of complexity come from.

First, the R_*_JUMP_SLOT relocation type cannot be used in some cases:

  • A non-preemptible ifunc may not have a dynamic symbol table entry. It can be local. It can be defined in the executable without the need to export.
  • A non-local STV_DEFAULT symbol defined in a shared object is preemptible by default. Using R_*_JUMP_SLOT in such a case would make the ifunc look preemptible.

Therefore a new relocation type R_*_IRELATIVE was introduced. There is no associated symbol and the address indicates the resolver.

R_*_RELATIVE: B + A
R_*_IRELATIVE: call (B + A) as a function
R_*_JUMP_SLOT: S

When an R_*_JUMP_SLOT can be used, there is a trade-off between R_*_JUMP_SLOT and R_*_IRELATIVE: an R_*_JUMP_SLOT can be lazily resolved but needs a symbol lookup. Currently powerpc can use R_PPC64_JMP_SLOT in some cases (PR27203).

A PLT entry is needed for two reasons:

  • The call sites emit instructions like call foo. We need to forward them to a place to perform the indirection. Text relocations are usually not an option (exception: -z ifunc-noplt).
  • If the ifunc is exported, we need a place to mark its canonical address.

Such PLT entries are sometimes referred to as IPLT. They are placed in the synthetic section .iplt. In GNU ld, .iplt is placed in the output section .plt. In LLD, I decided that keeping .iplt as the output section name is better https://reviews.llvm.org/D71520.

On many architectures (e.g. AArch64/PowerPC/x86), the PLT code sequence is the same as a regular PLT, but it could be different.

On x86-64, the code sequence is:

jmp *got(%rip)
pushq $0
jmp .plt

Since there is no lazy binding, pushq $0; jmp .plt are not needed. However, keeping all PLT entries the same shape simplifies linker implementations and helps analyzers, so it is fine to keep it this way.

PowerPC32 -msecure-plt IPLT

As a design to work around the lack of PC-relative instructions, PowerPC32 uses multiple GOT sections, one per file in .got2. To support multiple GOT pointers, the addend on each R_PPC_PLTREL24 reloc will have the offset within .got2.

-msecure-plt has small/large PIC differences.

  • -fpic/-fpie: R_PPC_PLTREL24 r_addend=0. The call stub loads an address relative to _GLOBAL_OFFSET_TABLE_.
  • -fPIC/-fPIE: R_PPC_PLTREL24 r_addend=0x8000. (A partially linked object file may have an addend larger than 0x8000.) The call stub loads an address relative to .got2+0x8000.

If a non-preemptible ifunc is referenced in two object files, in -pie/-shared mode, the two object files cannot share the same IPLT entry. When I added non-preemptible ifunc support for PowerPC32 to LLD https://reviews.llvm.org/D71621, I did not handle this case.

.rela.dyn vs .rela.plt

LLD placed R_*_IRELATIVE in the .rela.plt section because many ports of GNU ld behaved this way for function calls. This turns out to be used to make PLT calls in an ifunc resolver work. See below.

While implementing ifunc for PowerPC, I noticed that GNU ld powerpc actually places R_*_IRELATIVE in .rela.dyn and glibc powerpc does not actually support R_*_IRELATIVE in .rela.plt. This makes a lot of sense to me because .rela.plt normally just contains R_*_JUMP_SLOT, which can be lazily resolved. ifunc relocations need to be eagerly resolved, so .rela.plt was the wrong place. Therefore I changed LLD to use .rela.dyn in https://reviews.llvm.org/D65651.

__rela_iplt_start and __rela_iplt_end

BZ #27164

A statically linked position dependent executable traditionally had no dynamic relocations. However, ifunc may lead to R_*_IRELATIVE relocations.

In glibc, csu/libc-start.c has undefined weak references to __rela_iplt_start/__rela_iplt_end. In static non-pie mode, the loader resolves ifunc relocations within the range [__rela_iplt_start,__rela_iplt_end). In static pie mode, however, self-relocation (_dl_relocate_static_pie) takes care of R_*_IRELATIVE. It is expected that __rela_iplt_start==__rela_iplt_end; otherwise some ifunc relocations may be applied repeatedly, causing a SIGSEGV from ARCH_SETUP_IREL.
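The range walk can be sketched as follows. This is a simplified model rather than glibc's code: struct irela drops the relocation type field (every entry is assumed to be R_*_IRELATIVE), r_offset is already a usable pointer and r_addend is already the resolver address (as if the load base were 0), and the resolvers take no arguments. resolve_a, resolve_b, impl_a, and impl_b are hypothetical.

```c
#include <stdint.h>

/* Simplified RELA entry for an R_*_IRELATIVE relocation. */
struct irela {
  uintptr_t *r_offset; /* where to store the resolved address */
  uintptr_t r_addend;  /* address of the resolver */
};

typedef uintptr_t (*resolver_fn)(void);

/* Apply every relocation in the half-open range [start, end). */
static void apply_irel(const struct irela *start, const struct irela *end) {
  for (const struct irela *r = start; r < end; r++)
    *r->r_offset = ((resolver_fn)r->r_addend)();
}

/* Hypothetical implementations and resolvers used to exercise the sketch. */
static int impl_a(void) { return 10; }
static int impl_b(void) { return 20; }
static int resolver_runs;
static uintptr_t resolve_a(void) { resolver_runs++; return (uintptr_t)impl_a; }
static uintptr_t resolve_b(void) { resolver_runs++; return (uintptr_t)impl_b; }
```

When start equals end the loop body never runs, which models the static pie expectation: the self-relocation code has already applied the relocations, so the startup code must see an empty range.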

GNU ld and gold define __rela_iplt_start in -no-pie mode, but not in -pie mode. For {gcc,clang} -fuse-ld=bfd -static-pie -fpie built static pie, the scheme above works because GNU ld leaves __rela_iplt_start=__rela_iplt_end=0 as unresolved weak symbols. Before 13.0.0, LLD defined __rela_iplt_start/__rela_iplt_end for -pie and therefore broke glibc loader's assumption, causing a program to crash.

I made a good argument and also pointed out that the -no-pie special case caused an unneeded difference in the output of diff -u =(ld.bfd --verbose) =(ld.bfd -pie --verbose). My arguments were dismissed by "Ulrich and I designed/implemented IFUNC on x86 and for x86. I consider x86 implementation of IFUC as the gold standard."

In the end, I conceded and, with some grumbling, changed LLD to define the two encapsulation symbols only for -no-pie. Otherwise, a static pie produced by {gcc,clang} -fuse-ld=lld -static-pie would crash, even if the glibc in use was built with GNU ld.

Interestingly, Android bionic turned out to rely on defined __rela_iplt_start/__rela_iplt_end for static pie. https://r.android.com/1809796 removed the reliance.

Address significance

A non-GOT-generating non-PLT-generating relocation referencing an STT_GNU_IFUNC symbol indicates a potential address-taken operation.

With a function attribute, the compiler knows that a symbol is an ifunc and will avoid generating such relocations. With assembly, such relocations may be unavoidable.

In most cases the linker needs to convert the symbol type to STT_FUNC and create a special PLT entry, which is called a "canonical PLT entry" in LLD. References from other modules will resolve to the PLT entry to keep pointer equality: the address taken from the defining module should match the address taken from another module.

This approach has pros and cons:

  • With a canonical PLT entry, the resolver of a symbol is called only once. There is exactly one R_*_IRELATIVE relocation.
  • If the relocation appears in a non-SHF_WRITE section, a text relocation can be avoided.
  • Relocation types which are not valid dynamic relocation types are supported. GNU ld may report an error such as "relocation R_X86_64_PC32 against STT_GNU_IFUNC symbol `ifunc' isn't supported".
  • References will bind to the canonical PLT entry. A function call needs to jump to the PLT, load the value from the GOT, and then do an indirect call.

For a symbolic relocation type (a special case of absolute relocation types where the width matches the word size) like R_X86_64_64, when the addend is 0 and the section has the SHF_WRITE flag, the linker can emit an R_X86_64_IRELATIVE. LLD dropped this special case in https://reviews.llvm.org/D65995.

For the following example, an a.out linked by GNU ld calls fff_resolver three times, while one linked by LLD calls it once.

// RUN: split-file %s %t
// RUN: clang -fuse-ld=bfd -fpic %t/dso.c -o %t/dso.so --shared
// RUN: clang -fuse-ld=bfd %t/main.c %t/dso.so -o %t/a.out
// RUN: %t/a.out

//--- dso.c
typedef void fptr(void);
extern void fff(void);

fptr *global_fptr0 = &fff;
fptr *global_fptr1 = &fff;

//--- main.c
#include <stdio.h>

static void fff_impl() { printf("fff_impl()\n"); }
static int z;
void *fff_resolver() { return (char *)&fff_impl + z++; }

__attribute__((ifunc("fff_resolver"))) void fff();
typedef void fptr(void);
fptr *local_fptr = fff;
extern fptr *global_fptr0, *global_fptr1;

int main() {
  printf("local %p global0 %p global1 %p\n", local_fptr, global_fptr0, global_fptr1);
  return 0;
}

-z ifunc-noplt

Mark Johnston introduced -z ifunc-noplt for FreeBSD https://reviews.llvm.org/D61613. With this option, all relocations referencing STT_GNU_IFUNC will be emitted as dynamic relocations (if .dynsym is created). The canonical PLT entry will not be used.

Miscellaneous

GNU ld has implemented a diagnostic ("i686 ifunc and non-default symbol visibility") to flag R_386_PC32 referencing a non-default visibility ifunc in -pie and -shared links. This diagnostic looks like the most prominent reason blocking my proposal to use R_386_PLT32 for call/jump foo. See "Copy relocations, canonical PLT entries and protected visibility" for details.

https://sourceware.org/glibc/wiki/GNU_IFUNC misses a lot of information. There are quite a few arch differences. I asked for clarification https://sourceware.org/pipermail/libc-alpha/2021-January/121752.html

Dynamic loader

In glibc, _dl_runtime_resolve needs to save and restore vector and floating point registers. ifunc resolvers add another reason that _dl_runtime_resolve cannot use only integer registers. (The other reasons are that ld.so has string function calls which may use vectors, and external calls to libc.so.)

Relocation resolving order

R_*_IRELATIVE relocations are resolved eagerly. In glibc, there used to be a problem where ifunc resolvers ran before GL(dl_hwcap) and GL(dl_hwcap2) were set up https://sourceware.org/bugzilla/show_bug.cgi?id=27072.

For the relocation resolver, the main executable needs to be processed last so that R_*_COPY relocations are handled. Without ifunc, the resolving order of shared objects can be arbitrary.

For ifunc, if the ifunc is defined in a processed module, it is fine. If the ifunc is defined in an unprocessed module, it may crash.

For an ifunc defined in an executable, calling it from a shared object can be problematic because the executable's relocations haven't been resolved yet. The issue can be circumvented by converting the non-preemptible ifunc defined in the executable to STT_FUNC. GNU ld's x86 port made this change (PR23169).


Let's see a case where an ifunc resolver has PLT calls.

cat > a.c <<eof
#include <stdio.h>

int a_impl() { return 42; }
void *a_resolver() {
puts("a_resolver");
return (void *)a_impl;
}
int a() __attribute__((ifunc("a_resolver")));

// .rela.dyn.rel => R_X86_64_64 referencing STT_GNU_IFUNC in .rela.dyn
int (*fptr_a)() = a;

int main() { printf("%d\n", a()); }
eof

cc -fpie -c a.c
cc -fuse-ld=bfd -pie a.o -o a

GNU ld produces an R_X86_64_IRELATIVE in .rela.dyn. In lazy PLT mode, glibc ld.so calls the ifunc resolver before the R_X86_64_JUMP_SLOT for puts is set up, and the program segfaults.


FreeBSD rtld uses multiple phases:

  • Resolve non-COPY non-IRELATIVE non-STT_GNU_IFUNC relocations in all objects. Record which categories of ifunc relocations have appeared.
  • Resolve COPY relocations
  • Initialize states needed by ifunc resolvers
  • Prepare a list used to call init functions (if A depends on B, B is ordered before A)
  • Resolve relocations in the init order
    • Resolve IRELATIVE relocations
    • Resolve other .rela.dyn relocations referencing STT_GNU_IFUNC (mostly absolute and GLOB_DAT relocations)
    • If LD_BIND_NOW or DF_1_NOW is in effect, resolve JUMP_SLOT relocations referencing STT_GNU_IFUNC
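The "B is ordered before A" rule of the init list can be sketched as a depth-first post-order walk over the dependency graph. This is only an illustration of the ordering constraint, not FreeBSD rtld's code, and rtld's own traversal may produce a different but equally valid order. struct module, MAX_DEPS, visit, and init_order are hypothetical.

```c
#include <stddef.h>

#define MAX_DEPS 4

struct module {
  const char *name;
  int deps[MAX_DEPS]; /* indices of needed modules, terminated by -1 */
  int visited;
};

/* Append the dependencies of module m first, then m itself (post-order). */
static void visit(struct module *mods, int m, int *order, size_t *n) {
  if (mods[m].visited) return;
  mods[m].visited = 1;
  for (int i = 0; i < MAX_DEPS && mods[m].deps[i] >= 0; i++)
    visit(mods, mods[m].deps[i], order, n);
  order[(*n)++] = m;
}

/* Fill order[] so that every module appears after the modules it needs.
   Returns the number of modules reachable from root. */
static size_t init_order(struct module *mods, int root, int *order) {
  size_t n = 0;
  visit(mods, root, order, &n);
  return n;
}
```

For a hypothetical program app that needs libx and liby, both of which need libz, the walk yields libz, libx, liby, app. Resolving ifunc-related relocations in that order means that a resolver in libx calling into libz finds libz already processed.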

In the lazy binding mode, when a PLT entry whose JUMP_SLOT relocation refers to an ifunc is first called, the PLT trampoline calls the ifunc resolver.

The approach turns out to be very robust. Many segfault examples on glibc work on FreeBSD.

cat > ./a.c <<eof
#include <stdio.h>

int a_impl() { return 42; }
void *a_resolver() {
puts("a_resolver");
return (void *)a_impl;
}
int a() __attribute__((ifunc("a_resolver")));

int (*fptr_a)() = a;
int b(); extern int (*fptr_b)();
int c(); extern int (*fptr_c)();

int main() {
printf("%d\n", a());
b();
printf("b: %p %p\n", fptr_b, b);
printf("c: %p %p\n", fptr_c, c);
}
eof
cat > ./b.c <<eof
#include <stdio.h>

void c();
int b_impl() { return 42; }
void *b_resolver() {
puts("b_resolver");
c();
return (void *)b_impl;
}
int b() __attribute__((ifunc("b_resolver")));
int (*fptr_b)() = b;
eof
cat > ./c.c <<eof
#include <stdio.h>

int d();
int c_impl() { return 42; }
void *c_resolver() {
puts("c_resolver");
d();
return (void *)c_impl;
}
int c() __attribute__((ifunc("c_resolver")));
int (*fptr_c)() = c;
eof
cat > ./d.c <<eof
#include <stdio.h>

int d_impl() { return 42; }
void *d_resolver() {
puts("d_resolver");
return (void *)d_impl;
}
int d() __attribute__((ifunc("d_resolver")));
int (*fptr_d)() = d;
eof
cat > ./Makefile <<'eof'
a: a.c b.so c.so d.so
	${CC} -g -fpie a.c -Wl,--no-as-needed ./b.so ./c.so ./d.so -pie -ldl -o $@

.SUFFIXES: .so
.c.so:
	${CC} -g -fpic $< -shared -o $@
eof

Note that in b.so, the ifunc resolver has a PLT call to c.so. In c.so, the ifunc resolver has a PLT call to d.so. The ifunc resolvers thus have dependencies. In real-world applications, d, which is called by other ifunc resolvers, may be a performance-critical function like memset.

% ./a
# resolve d.so in init order
d_resolver # R_X86_64_64

# resolve c.so in init order
c_resolver # R_X86_64_64
d_resolver # triggered lazy JUMP_SLOT

# resolve b.so in init order
b_resolver # R_X86_64_64
c_resolver # triggered lazy JUMP_SLOT

# resolve a in init order
a_resolver # R_X86_64_IRELATIVE
b_resolver # R_X86_64_GLOB_DAT
c_resolver # R_X86_64_GLOB_DAT

42

b_resolver # lazy JUMP_SLOT

b: 0x80106f670 0x80106f670
c: 0x801073670 0x801073670

% LD_BIND_NOW=1 ./a
# resolve d.so in init order
d_resolver

# resolve c.so in init order
d_resolver # eager JUMP_SLOT
c_resolver # R_X86_64_64

# resolve b.so in init order
c_resolver # eager JUMP_SLOT
b_resolver # R_X86_64_64

# resolve a in init order
a_resolver # R_X86_64_IRELATIVE
b_resolver # eager JUMP_SLOT
b_resolver # R_X86_64_GLOB_DAT
c_resolver # R_X86_64_GLOB_DAT

42
b: 0x80106f670 0x80106f670
c: 0x801073670 0x801073670

When a is built with -no-pie -fno-pic, copy relocations and canonical PLT entries are used. R_X86_64_64 relocations in b.so and c.so are bound to canonical PLT entries, so there are fewer resolver calls.

d_resolver
a_resolver
42
b_resolver
c_resolver
d_resolver
c: 0x201ca0 0x201ca0
b: 0x201c90 0x201c90

We can let b.so depend on c.so, or let c.so depend on b.so, or swap the link order of ./b.so and ./c.so. It still works.

On glibc, https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/fw/bug21041 can improve robustness of ifunc resolvers.