This article collects linker notes about the Portable Executable (PE) and Common Object File Format (COFF) formats used on Windows and in UEFI environments.
In ELF, an object file can be a relocatable file, an executable file, or a shared object file. On Windows, the term "object file" usually refers only to relocatable files. Such files use COFF, while image files (e.g. executables and DLLs) use the PE format.
Input files
The input files to the linker can be object files, archive files, and import libraries. GNU ld and lld-link allow linking against DLL files without an import library.
Object files
Import files
An import file (.lib) is a special archive file. Each member represents a symbol to be imported. The symbol __imp_$sym is inserted into the global symbol table.
The import header has a Type field indicating IMPORT_OBJECT_CODE, IMPORT_OBJECT_DATA, or IMPORT_OBJECT_CONST.
For an import type of IMPORT_OBJECT_CONST, the symbol $sym is defined as an alias for __imp_$sym.
For an import type of IMPORT_OBJECT_DATA, only __imp_$sym is defined.
For an import type of IMPORT_OBJECT_CODE, the symbol $sym is defined as an import thunk, which is like a PLT entry in ELF.
GNU ld and lld-link allow linking against DLL files without an import library. The behavior is as if the linker synthesizes an import library from a DLL file.
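The following C sketch models which global symbols one import file member contributes, depending on its Type field. It assumes lld-link's behavior for the three Type values (IMPORT_OBJECT_CODE adds a thunk named $sym, IMPORT_OBJECT_CONST adds $sym as an alias, IMPORT_OBJECT_DATA adds nothing beyond __imp_$sym). The enum values match winnt.h; the helper import_member_symbols is hypothetical and only produces names, it does not parse real .lib files.

```c
#include <stdio.h>
#include <string.h>

/* Values of the Type field in IMPORT_OBJECT_HEADER, as in winnt.h. */
enum import_object_type {
  IMPORT_OBJECT_CODE = 0,
  IMPORT_OBJECT_DATA = 1,
  IMPORT_OBJECT_CONST = 2,
};

/* Hypothetical helper: writes the names a linker would add to the global
 * symbol table for one import file member named `sym`. Returns the count. */
static int import_member_symbols(enum import_object_type type, const char *sym,
                                 char names[2][256], const char *kinds[2]) {
  int n = 0;
  /* Every member defines __imp_$sym, which names the IAT entry. */
  snprintf(names[n], 256, "__imp_%s", sym);
  kinds[n++] = "IAT entry";
  if (type == IMPORT_OBJECT_CODE) {
    /* $sym is an import thunk that jumps through __imp_$sym. */
    snprintf(names[n], 256, "%s", sym);
    kinds[n++] = "thunk";
  } else if (type == IMPORT_OBJECT_CONST) {
    /* $sym is an alias for __imp_$sym. */
    snprintf(names[n], 256, "%s", sym);
    kinds[n++] = "alias of __imp_";
  }
  /* IMPORT_OBJECT_DATA: nothing besides __imp_$sym. */
  return n;
}
```

For a code import named f, the sketch yields __imp_f (IAT entry) and f (thunk); for a data import named var, it yields only __imp_var.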
Symbols
An object file contributes defined and undefined symbols. An import file contributes defined symbols in a DLL that can be referenced by __imp_$sym.
A defined symbol can be any of the following kinds:
- special (ignored in the global symbol table)
- common (section number is IMAGE_SYM_UNDEFINED and value is not 0; the value is the size)
- absolute (section number is -1)
- regular (section number is positive)
An undefined symbol has a storage class of IMAGE_SYM_CLASS_EXTERNAL, a section number of IMAGE_SYM_UNDEFINED (zero), and a value of zero.
An undefined symbol with a storage class of IMAGE_SYM_CLASS_WEAK_EXTERNAL is a weak external. Its auxiliary record names a default symbol that is used when no other definition is found, so it actually behaves like a weak definition in ELF.
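The classification rules above can be written as a small C function over the SectionNumber, Value, and StorageClass fields of a symbol record. The constants come from the PE/COFF specification; the helper classify_symbol and the enum sym_kind are hypothetical. Treating IMAGE_SYM_DEBUG (-2) and other negative section numbers as "special" is an assumption of this sketch.

```c
#include <stdint.h>

/* Constants from the PE/COFF specification. */
#define IMAGE_SYM_UNDEFINED 0
#define IMAGE_SYM_ABSOLUTE (-1)
#define IMAGE_SYM_DEBUG (-2)
#define IMAGE_SYM_CLASS_EXTERNAL 2
#define IMAGE_SYM_CLASS_WEAK_EXTERNAL 105

enum sym_kind { SYM_SPECIAL, SYM_COMMON, SYM_ABSOLUTE, SYM_REGULAR,
                SYM_UNDEFINED, SYM_WEAK_EXTERNAL };

/* Hypothetical helper: classify an external symbol record. */
static enum sym_kind classify_symbol(int16_t section_number, uint32_t value,
                                     uint8_t storage_class) {
  if (section_number == IMAGE_SYM_UNDEFINED) {
    /* The default definition is named by the auxiliary record. */
    if (storage_class == IMAGE_SYM_CLASS_WEAK_EXTERNAL)
      return SYM_WEAK_EXTERNAL;
    /* A nonzero value on an undefined symbol means "common"; it is the size. */
    return value != 0 ? SYM_COMMON : SYM_UNDEFINED;
  }
  if (section_number == IMAGE_SYM_ABSOLUTE)
    return SYM_ABSOLUTE;
  if (section_number > 0)
    return SYM_REGULAR;
  /* Assumption: IMAGE_SYM_DEBUG and other reserved negative numbers are
   * special and kept out of the global symbol table. */
  return SYM_SPECIAL;
}
```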
PE requires explicit annotations for symbols exported from and imported from DLL files. Code symbols and data symbols are handled differently.
COMDAT
Refer to COMDAT and section group.
Imported code symbols
Linking b.dll also gives us the import library b.lib (see "Import files" above).
# b.dll
.globl f
f:
.section .drectve,"yni"
.ascii " -export:f"
a.obj has two function calls. The call to f references the prefixed symbol __imp_f.
# a.obj
callq local
callq *__imp_f(%rip)
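call *__imp_f(%rip) is like -fno-plt codegen for ELF. When the compiler knows that f is defined in another DLL, calling indirectly through the import address table entry avoids the extra jump through a thunk, so the generated code is more efficient.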
When linking a.exe, we need to pass the import file b.lib as an input file. The linker parses the import file and creates a definition for __imp_f pointing to the import address table entry.
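To make the indirection concrete, here is a portable C model of calling through an import address table entry. Everything is simulated: f_in_b_dll stands in for the function exported by b.dll, iat_entry_f for the slot that __imp_f names, and simulate_loader_binding for the work the Windows loader does at load time. All three names are hypothetical.

```c
/* Hypothetical stand-in for f, exported by b.dll. */
static int f_in_b_dll(void) { return 7; }

/* One import address table entry. __imp_f would name this slot; the
 * loader stores the real address of f into it when b.dll is loaded. */
static int (*iat_entry_f)(void);

/* Simulates the loader binding the IAT entry. */
static void simulate_loader_binding(void) { iat_entry_f = f_in_b_dll; }

/* C equivalent of `callq *__imp_f(%rip)`: load the pointer, then call it. */
static int call_f_through_iat(void) { return iat_entry_f(); }
```

The caller never needs to know where b.dll ends up in memory; only the slot is patched.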
TODO import table
Actually, when __imp_f is defined, the unprefixed symbol f is also defined. Normally, the unprefixed f is unused and will be discarded. However, if the user code calls the unprefixed symbol (e.g. call f; like ELF -fplt), the f definition will be retained in the linker output and point to a thunk:
call f               # generated code without using dllimport
f:                   # x86-64 thunk
  jmpq *__imp_f(%rip)
Different architectures have different thunk implementations.
// x86-32 and x86-64
jmp *0x0          // references an entry in the import address table
// AArch32
mov.w ip, #0
mov.t ip, #0
ldr.w pc, [ip]
// AArch64
adrp x16, #0
ldr x16, [x16]
br x16
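As an illustration of how the x86-64 thunk gets its target, the C sketch below encodes `jmpq *disp32(%rip)` (bytes FF 25 followed by a little-endian disp32). The displacement is measured from the end of the 6-byte instruction, so it is the IAT entry RVA minus the thunk RVA minus 6. The helper name and the RVAs in the usage are hypothetical. On x86-32 the same two opcode bytes take an absolute 32-bit address instead, which is not modeled here.

```c
#include <stdint.h>

/* Hypothetical helper: builds a 6-byte x86-64 import thunk jumping through
 * the IAT entry. Returns 0 on success, -1 if the displacement overflows. */
static int encode_x86_64_import_thunk(uint8_t out[6], uint32_t thunk_rva,
                                      uint32_t iat_entry_rva) {
  /* RIP points to the next instruction, i.e. thunk_rva + 6. */
  int64_t disp = (int64_t)iat_entry_rva - ((int64_t)thunk_rva + 6);
  if (disp < INT32_MIN || disp > INT32_MAX)
    return -1;
  uint32_t d = (uint32_t)(int32_t)disp;
  out[0] = 0xff; /* opcode: JMP r/m64 (/4) */
  out[1] = 0x25; /* ModRM: mod=00, reg=4, rm=101 means RIP+disp32 */
  out[2] = (uint8_t)(d & 0xff); /* disp32, little-endian */
  out[3] = (uint8_t)((d >> 8) & 0xff);
  out[4] = (uint8_t)((d >> 16) & 0xff);
  out[5] = (uint8_t)((d >> 24) & 0xff);
  return 0;
}
```

For a thunk at RVA 0x1000 and an IAT entry at RVA 0x3000, the displacement is 0x3000 - 0x1006 = 0x1ffa, giving the bytes ff 25 fa 1f 00 00.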
TODO link.exe will issue a warning.
Imported data symbols
The linker parses the import file and creates a definition for __imp_var pointing to the import address table entry. Unlike a code symbol, the linker does not create a definition for var (without the __imp_ prefix).
With dllimport, the code loads the address from the import address table entry and then the value:
movq __imp_var(%rip), %rax
movl (%rax), %eax
If dllimport is not specified, we get a reference to the unprefixed symbol:
movq var(%rip), %rax
link.exe will report an unresolved external symbol error.
MinGW implements runtime pseudo relocations to patch the text section so that absolute pointers and relative offsets to the symbol are rewritten to bind to the actual definition.
movq var(%rip), %rax   # the runtime will rewrite this to point to the definition in b.dll
If the variable is defined outside the +-2GiB range of the current location, the runtime pseudo relocation can't fix the issue. See crt: Check pseudo relocations for overflows and error out clearly.
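The arithmetic behind a 32-bit PC-relative pseudo relocation, including the overflow case, can be sketched in C. This is a simplified model, not the mingw-w64 runtime code: it assumes the linker bound var to the IAT entry, so the stored displacement currently targets that entry, and the runtime replaces the IAT entry address with the real address that the loader stored in it. The function name and the addresses in the usage are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper. *disp32 holds (iat_entry_addr - next_ip) on entry.
 * On success it holds (actual_addr - next_ip). Returns false, leaving
 * *disp32 untouched, if the definition is out of +-2GiB range. */
static bool apply_pc32_pseudo_reloc(int32_t *disp32, uint64_t iat_entry_addr,
                                    uint64_t actual_addr) {
  int64_t v = (int64_t)*disp32;
  v -= (int64_t)iat_entry_addr; /* remove the link-time target */
  v += (int64_t)actual_addr;    /* add the load-time target */
  if (v < INT32_MIN || v > INT32_MAX)
    return false; /* cannot be expressed as a rel32 */
  *disp32 = (int32_t)v;
  return true;
}
```

With the instruction ending at 0x140001000, an IAT entry at 0x140003000 (displacement 0x2000), and var at 0x140005000, the patched displacement becomes 0x4000. A definition at 0x7ff800000000 is out of range and the patch is refused.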
For a non-definition declaration, GCC conservatively assumes that the variable may be defined in a DLL and generates an indirection. This is similar to a GOT code sequence in ELF.
extern int extern_var;
int main() { return extern_var; }
Non-dllexport definition and dllimport
A dllimport symbol referenced by an object file is normally satisfied by an import file. link.exe allows another object file to provide the definition. In such a case, link.exe will issue a warning (Linker Tools Warning LNK4217). lld-link has implemented this feature for compatibility.
echo '__declspec(dllimport) int foo(); int main() { return foo(); }' > a.cc
lld-link: warning: a.obj: locally defined symbol imported: int __cdecl foo(void) (defined in b.obj) [LNK4217]
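Because a.obj calls foo through __imp_foo, a local definition alone is not enough: something pointer-sized must hold foo's address. Conceptually the linker synthesizes such a slot. The C sketch below models it; imp_foo is a hypothetical stand-in for the synthesized __imp_foo, and call_foo_as_dllimport mirrors the code a dllimport call compiles to.

```c
/* Local definition, standing in for foo defined in b.obj. */
static int foo(void) { return 42; }

/* Hypothetical stand-in for the pointer-sized __imp_foo slot that the
 * linker provides when a dllimport reference is satisfied locally. */
static int (*const imp_foo)(void) = foo;

/* What a dllimport call compiles to: load the slot, then call through it. */
static int call_foo_as_dllimport(void) { return imp_foo(); }
```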
MinGW
MinGW provides auto exporting and auto importing features to make PE DLL files work like ELF shared objects. When producing a DLL file, if no symbol is chosen to be exported, almost all symbols are exported by default (--export-all-symbols).
If an undefined symbol $sym remains unresolved and __imp_$sym is defined, $sym will be aliased to __imp_$sym. TODO: example
If the symbol .refptr.$sym is present, it will be aliased to __imp_$sym as well. mingw-w64 defaults to -mcmodel=medium and uses .refptr.$sym. TODO: example
https://github.com/ziglang/zig/issues/9845
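The auto-import rule can be sketched as a lookup over a toy symbol table. The table contents, is_defined, auto_import_target, and refptr_is_redirected are all hypothetical; a real linker works on symbol objects and relocations rather than strings, so this only illustrates the decision logic described above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical global symbol table, reduced to defined names. __imp_var
 * comes from an import file; .refptr.var from GCC-generated code. */
static const char *const defined_syms[] = {"__imp_var", ".refptr.var", "main"};

static bool is_defined(const char *name) {
  for (size_t i = 0; i < sizeof defined_syms / sizeof defined_syms[0]; i++)
    if (strcmp(defined_syms[i], name) == 0)
      return true;
  return false;
}

/* If `sym` is unresolved but __imp_<sym> is defined, return __imp_<sym>
 * (written into buf) as the alias target; otherwise return NULL. */
static const char *auto_import_target(const char *sym, char *buf, size_t n) {
  if (is_defined(sym))
    return NULL; /* already resolved, nothing to do */
  snprintf(buf, n, "__imp_%s", sym);
  return is_defined(buf) ? buf : NULL;
}

/* A present .refptr.<sym> is redirected to the same __imp_<sym> target. */
static bool refptr_is_redirected(const char *sym) {
  char imp[256], refptr[256];
  if (auto_import_target(sym, imp, sizeof imp) == NULL)
    return false;
  snprintf(refptr, sizeof refptr, ".refptr.%s", sym);
  return is_defined(refptr);
}
```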
Manual __imp_ definition
The user can define __imp_ instead of letting the linker do so.
https://github.com/llvm/llvm-project/issues/57982
$ cat lto-dllimp1.c
void __declspec(dllimport) importedFunc(void);
void other(void);
void entry(void) {
  importedFunc();
  other();
}
$ cat lto-dllimp2.c
static void importedFuncReplacement(void) {
}
void (*__imp_importedFunc)(void) = importedFuncReplacement;
void other(void) {
}
Range extension thunks
TODO
Shared library comparison with ELF
The design of shared libraries saw major advancements around 1988. Before 1988, there were shared library implementations in the a.out and COFF object file formats, but they had severe limitations, such as fixed addresses and the requirement of extra files like import files.
Such limitations are evidenced in Shared Libraries on UNIX System V from AT&T, published in the 1986 Summer USENIX Technical Conference & Exhibition Proceedings. Its shared libraries (presumably using the COFF object file format) must have fixed virtual addresses, which is what Linkers and Loaders calls "static shared libraries".
In 1988, SunOS 4.0 was released with an extended a.out binary format with dynamic shared library support. Unlike previous static shared library schemes, the a.out shared libraries are position independent and can be loaded at different addresses. The dynamic linker source code is available somewhere, and I find that its GOT and PLT schemes are exactly like what we have for ELF today.
AT&T and Sun collaborated to create the first System V release 4 ABI (using ELF). AT&T contributed the ELF object format. Sun contributed all of the dynamic linking implementation from SunOS 4.x. In 1992, SunOS 5.0 (Solaris 2.0) switched to ELF.
For ELF, the designers tried to make shared libraries similar to static libraries. There is no need to annotate export and import symbols to work with shared libraries.
I cannot find more information about System V release 3's shared library support, but the Windows DLL is assuredly inspired by it, given that the PE object file format is based on COFF and the PE specification refers to COFF in numerous places.
So, is the shared library design in ELF more advanced? It is. However, two aspects are worth deeper thought.
- The manual export and import annotations have their strengths.
- Choices made to make ELF shared libraries flexible had major downsides.
  - Performance downside due to symbol interposition on the compiler side. See -fno-semantic-interposition.
  - Performance downside due to symbol interposition on the linker and loader side. See ELF interposition and -Bsymbolic.
  - Underlinking problems exacerbated by the -z undefs default in linkers. See Dependency related linker options.
Limitations
The number of exported symbols cannot exceed 65535 because export ordinals are 16-bit. Several open-source projects have run into the problem that a DLL file cannot export more than 65535 symbols. (GNU ld has a diagnostic error: export ordinal too large:.)
A section header has only 8 bytes for the name field. link.exe truncates long section names to 8 bytes. For a section with a long name and the IMAGE_SCN_MEM_DISCARDABLE flag, lld uses a non-standard string table and issues a warning.
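The C sketch below fills the 8-byte Name field. For object files it uses the form from the PE/COFF specification: "/" followed by the decimal offset of the full name in the string table. For images it truncates, like link.exe. The helper encode_section_name is hypothetical, and offsets too large for the "/"-decimal form (which needs a different encoding) are simply rejected here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills the 8-byte Name field of a section header.
 * Returns 0 on success, -1 if the long-name form does not fit. */
static int encode_section_name(char field[8], const char *name,
                               unsigned strtab_offset, int is_object_file) {
  size_t len = strlen(name);
  memset(field, 0, 8);
  if (len <= 8) {
    /* Short names are stored inline; an 8-byte name has no NUL terminator. */
    memcpy(field, name, len);
    return 0;
  }
  if (is_object_file) {
    /* "/" plus the decimal string table offset of the full name. */
    char tmp[16];
    int m = snprintf(tmp, sizeof tmp, "/%u", strtab_offset);
    if (m < 0 || m > 8)
      return -1; /* offset needs a different encoding, not modeled here */
    memcpy(field, tmp, (size_t)m);
    return 0;
  }
  /* Images: truncate to 8 bytes, as link.exe does. */
  memcpy(field, name, 8);
  return 0;
}
```

For example, .debug_info becomes "/4" in an object file whose string table stores the name at offset 4, and ".debug_i" in an image.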
COMDAT limitation: MSVC link.exe will report a duplicate symbol error (error LNK2005) for an external symbol defined in an IMAGE_COMDAT_SELECT_ASSOCIATIVE section, even if it would be discarded after handling the leader symbol.