UNDER CONSTRUCTION
This article describes target-specific details about AArch32 in ELF linkers. I described AArch64 in a previous article.
AArch32 is the 32-bit execution state for the Arm architecture and runs the A32 and T32 instruction sets. A32 refers to the old ISA with a 32-bit fixed width, while T32 refers to the mixed 16-bit and 32-bit Thumb2 instructions.
"AArch32", "A32", and "T32" are new names. Many projects use "ARM", "Arm", or "arm" as their port name.
ABI documents
- ELF for the Arm® Architecture
- Procedure Call Standard for the Arm® Architecture
- C++ ABI for the Arm® Architecture
- Exception Handling ABI for the Arm® Architecture
Global Offset Table
The Global Offset Table consists of two sections:
- .got.plt holds code addresses for PLT.
- .got holds other addresses and offsets.
The symbol _GLOBAL_OFFSET_TABLE_ is defined at the beginning of the .got section. GNU ld reserves a single entry for .got and .got[0] holds the link-time address of _DYNAMIC for a legacy reason. Versions of glibc prior to 2.35 have the _DYNAMIC requirement. See All about Global Offset Table.
Procedure Linkage Table
The PLT header looks like:
L1: str lr, [sp, #-4]!
If .got.plt-.plt-4 exceeds the +-128MiB range, a long-form PLT header is needed.
Cortex-M Security Extensions
--cmse-implib
This option is for linker support for the Cortex-M Security Extensions (CMSE). It does two jobs:
- synthesize secure gateway veneers
- write a CMSE import library when --out-implib= is specified
If a non-local symbol __acle_se_$sym is present, report an error if $sym is not defined. Otherwise $sym is considered a secure gateway veneer. Both __acle_se_$sym and $sym must be non-absolute functions with an odd st_value.
If the addresses of __acle_se_$sym and $sym are not equal, the linker considers that there is an inline secure gateway and doesn't do anything special; otherwise the linker synthesizes a secure gateway veneer in a special section .gnu.sgstubs with the following logic.
The linker allocates an input section in .gnu.sgstubs and defines $sym relative to it. In the output file, $sym is moved to .gnu.sgstubs, a different text section.
<.gnu.sgstubs>:
...
$sym:
  sg
  b.w __acle_se_$sym
If --in-implib is specified and the library defines $sym (say the address is $addr), in the output $sym has a fixed address of $addr. Otherwise, the linker assigns an address (larger than all synthesized secure gateway veneers with fixed addresses).
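The rules above can be modeled in a few lines of Python. This is only an illustrative sketch, not code from any linker: the `Symbol` record, `classify_entry`, and `assign_veneer_addresses` are hypothetical names, violations of the "must be" conditions are assumed to be reported as errors, and the 8-byte veneer size is an assumption based on the two 4-byte instructions (`sg` and `b.w`) shown above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Symbol:
    value: int         # st_value (an odd value means a Thumb function)
    is_function: bool  # symbol type is STT_FUNC
    is_absolute: bool  # st_shndx == SHN_ABS


def classify_entry(symtab: Dict[str, Symbol], sym: str) -> str:
    """Classify $sym as 'not-cmse', 'error', 'inline', or 'synthesize'."""
    acle = symtab.get("__acle_se_" + sym)
    if acle is None:
        return "not-cmse"
    entry = symtab.get(sym)
    if entry is None:
        return "error"  # __acle_se_$sym present but $sym undefined
    for s in (acle, entry):
        # Assumption: failing the "must be" checks is treated as an error.
        if s.is_absolute or not s.is_function or s.value % 2 == 0:
            return "error"
    if acle.value != entry.value:
        return "inline"  # inline secure gateway, nothing to synthesize
    return "synthesize"


def assign_veneer_addresses(names: List[str], in_implib: Dict[str, int],
                            base: int, veneer_size: int = 8) -> Dict[str, int]:
    """Keep addresses fixed by the --in-implib library; place the rest above."""
    out = {n: in_implib[n] for n in names if n in in_implib}
    next_addr = max([base] + [a + veneer_size for a in out.values()])
    for n in names:
        if n not in out:
            out[n] = next_addr
            next_addr += veneer_size
    return out
```

For example, with `base=0x1000` and an input import library that fixes `b` at `0x1008`, the veneers for `a` and `c` are placed at `0x1010` and `0x1018`, above the fixed one.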
--out-implib=out.lib
Used with --cmse-implib. Write the CMSE import library to out.lib.
out.lib will have 3 sections: .symtab, .strtab, .shstrtab. For every synthesized Secure Gateway veneer, write a SHN_ABS symbol whose address is $addr (if specified by the --in-implib library) or the linker-assigned address. (The CMSE import library does not contain text sections, so a defined symbol has to use SHN_ABS.)
Thread Local Storage
AArch32 uses a variant of TLS Variant I: the static TLS blocks are placed above the thread pointer. The thread pointer points to the end of the thread control block.
The linker doesn't perform TLS optimization.
The traditional general dynamic and local dynamic TLS models are used by default. There is a TLSDESC ABI.
See All about thread-local storage.
Thunks
A destination not reachable by the branch instruction needs a range extension thunk. ARM and Thumb state changes need a thunk as well.
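As a rough illustration of these two conditions, here is a simplified Python model. The function names and parameters are hypothetical, and the model ignores details such as PLT entries and the PC alignment applied by BLX. The branch reaches are the architectural ones: a 26-bit signed offset for ARM B/BL, 21 bits for a Thumb conditional B.W, and 25 bits for Thumb B.W/BL with the J1 J2 encoding (23 bits without it).

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def needs_thunk(rel_type: str, src: int, dst: int, dst_is_thumb: bool,
                has_blx: bool = True, has_j1j2: bool = True) -> bool:
    """Simplified check: does the branch at src need a thunk to reach dst?"""
    in_thumb = rel_type.startswith("R_ARM_THM_")
    is_call = rel_type in ("R_ARM_CALL", "R_ARM_THM_CALL")

    # State change: only BL can be rewritten to BLX, and only if BLX exists.
    if dst_is_thumb != in_thumb and not (is_call and has_blx):
        return True

    # Range: PC reads as the instruction address plus 8 (ARM) or 4 (Thumb).
    offset = dst - (src + (4 if in_thumb else 8))
    if not in_thumb:
        bits = 26                      # +-32MiB
    elif rel_type == "R_ARM_THM_JUMP19":
        bits = 21                      # +-1MiB
    else:
        bits = 25 if has_j1j2 else 23  # +-16MiB or +-4MiB
    return not fits_signed(offset, bits)
```

For instance, `needs_thunk("R_ARM_JUMP24", 0x1000, 0x2000, dst_is_thumb=True)` is true because a plain B cannot switch state, while the same call with `R_ARM_CALL` is false when BLX is available.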
v4, v4T
ARMv4T introduced the 16-bit Thumb instruction set. These processors do not support BLX. ARM/Thumb state change must be done using BX.
In the ARM state, the relocation types R_ARM_PC24/R_ARM_PLT32/R_ARM_JUMP24/R_ARM_CALL may need range extension or state change. lld 16.0.0 has added thunk support for Thumb. We can build Game Boy Advance or Nintendo DS ROMs using Thumb code with ld.lld.
// ARM to ARM, absolute
R_ARM_THM_CALL
// Thumb to ARM, absolute
bx
b #-6
ldr pc, [pc, #-4]
L1: .word S

// Thumb to Thumb, absolute
bx
b #-6
ldr ip, [pc]
bx ip
L1: .word S

// Thumb to ARM, position-independent
bx
b #-6
ldr ip, [pc, #]
L1: add ip, pc, ip
L2: .word S - (L1 + 8)

// Thumb to Thumb, position-independent
bx
b #-6
ldr ip, [pc, #-4]
L1: add ip, pc, ip
bx ip
L2: .word S - (L1 + 8)
v5, v6, v6-K, v6-KZ
These are pre-Cortex processors that support BLX, but not Thumb branch range extension or MOVT/MOVW.
There is no Thumb branch instruction in ARMv5 that supports thunks. LDR can switch processor states.
// absolute (LDR can switch processor states)
v6-M
Only Thumb instructions are supported. These processors support BLX and J1 J2 encodings (branch range extension), but not MOVT/MOVW.
// near
v6T2, v7 (v7-A, v7-R, v7-M, v7E-M) and newer
ARMv6T2 introduced Thumb-2. These processors support BLX/MOVT/MOVW and Thumb branch range extension. (All architectures used in Cortex processors with the exception of v6-M and v6S-M have the MOVT/MOVW instructions.)
In the ARM state, the relocation types R_ARM_PC24, R_ARM_PLT32, R_ARM_JUMP24, and R_ARM_CALL may need a thunk for range extension or state change.
// near and the destination is in the ARM state
In the Thumb state, the relocation types R_ARM_THM_JUMP19, R_ARM_THM_JUMP24, and R_ARM_THM_CALL may need a thunk for range extension or state change.
// near and the destination is in the Thumb state
--fix-cortex-a8
This option enables a linker workaround for Arm Cortex-A8 Erratum 657417. Linkers scan for a 4-byte Thumb-2 branch instruction (Bcc.w, B.w, BLX.w, BL.w) that spans two 4KiB pages, whose target address falls within the first of the two regions, and which follows a 4-byte non-branch instruction. Such a sequence may result in an incorrect instruction fetch or processor deadlock.
Once an erratum condition is detected, linkers try to rewrite it into an alternative code sequence. See the comments in the implementations for details.
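The detection condition can be expressed as a small predicate. The following Python sketch is a simplified illustration under stated assumptions, not the scanning code of any linker: `is_a8_erratum_candidate` is a hypothetical name, and the caller is assumed to have already decoded the instruction at `branch_addr` as a 4-byte Thumb-2 branch and to know whether the preceding instruction is a 4-byte non-branch.

```python
PAGE_MASK = 0xFFF  # low 12 bits select the offset within a 4KiB page


def is_a8_erratum_candidate(branch_addr: int, target_addr: int,
                            prev_is_32bit_non_branch: bool) -> bool:
    """Check the address conditions for a 4-byte Thumb-2 branch."""
    # Thumb instructions are 2-byte aligned, so a 4-byte instruction spans
    # two 4KiB pages only when it starts at page offset 0xffe.
    spans_pages = (branch_addr & PAGE_MASK) == 0xFFE
    # The target must lie in the page holding the first halfword.
    target_in_first = (target_addr & ~PAGE_MASK) == (branch_addr & ~PAGE_MASK)
    return spans_pages and target_in_first and prev_is_32bit_non_branch
```

A branch at 0x1ffe targeting 0x1800 matches when it follows a 4-byte non-branch instruction; the same branch targeting 0x2004 does not, because the target lies in the second page.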
SHT_ARM_ATTRIBUTES
Usually named .ARM.attributes.
SHT_ARM_EXIDX (.ARM.exidx) and .ARM.extab
See Stack unwinding.
--be8
When linking big-endian images there are the deprecated BE-32 mode (word-invariant addressing big-endian mode) and the new BE-8 mode (byte-invariant addressing big-endian mode).
BE-32 is used by older architectures like arm7tdmi and arm926ej-s.
For ARMv6-M, ARMv7, and later architectures the default is BE-8. The relocatable object files have big-endian code and data. Compiler drivers pass --be8 to the linker to convert big-endian code to little-endian.
The linker uses the $a/$t/$d mapping symbols to locate ARM code, Thumb code, and data, and performs byte swapping. ARM code is reversed as 4-byte units, Thumb code is reversed as 2-byte units, while data is unchanged. The linker sets the EF_ARM_BE8 flag in the ELF header.
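The byte swapping can be illustrated with a short Python sketch. It is a model of the idea rather than a linker implementation: `be8_swap` is a hypothetical helper, and the mapping symbols are assumed to be given as a sorted list of `(offset, kind)` pairs, with data regions marked by `$d`.

```python
from typing import List, Tuple


def be8_swap(section: bytes, mapping: List[Tuple[int, str]]) -> bytes:
    """Reverse code bytes per mapping-symbol region; leave data untouched.

    mapping holds (offset, kind) pairs sorted by offset, where kind is
    "$a" (ARM code), "$t" (Thumb code), or "$d" (data).
    """
    unit_size = {"$a": 4, "$t": 2}  # ARM: 4-byte units, Thumb: 2-byte units
    out = bytearray(section)
    for i, (start, kind) in enumerate(mapping):
        # A region extends to the next mapping symbol or the section end.
        end = mapping[i + 1][0] if i + 1 < len(mapping) else len(section)
        unit = unit_size.get(kind)
        if unit is None:
            continue  # data keeps its big-endian byte order
        for off in range(start, end - unit + 1, unit):
            out[off:off + unit] = section[off:off + unit][::-1]
    return bytes(out)
```

Given the bytes `e1a00000 4770 bf00 12345678` with `$a` at 0, `$t` at 4, and `$d` at 8, the result is `0000a0e1 7047 00bf 12345678`: the ARM word and the two Thumb halfwords are reversed, and the data word is unchanged.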
cat > a.s <<e
a.o and a have data (.word) of the same endianness, but a has little-endian instructions.
% llvm-objdump -s -d -j .text -j .text.2 a.o