Programming language behavior
FORTRAN 77 COMMON blocks compiled to COMMON symbols. You could declare a COMMON block in more than one file, with each specifying the number, type, and size of the variable. The linker allocated enough space to satisfy the largest size.
This feature was somehow ported to C. Unix C compilers traditionally permitted tentative definitions of the same variable in different compilation units, and the linker would allocate enough space without reporting an error.
This behavior is in contrast to both the C and C++ standards, but GCC and Clang traditionally defaulted to -fcommon for C. GCC since 10 and Clang since 11 default to -fno-common.
% echo 'int x;' > a.c
Assembler behavior
The directive .comm identifier, size[, alignment]
instructs the assembler to define a COMMON symbol with the specified
size and the optional alignment.
In the ELF object file format, the symbol is represented as an STT_OBJECT STB_GLOBAL symbol whose st_shndx field holds SHN_COMMON. In readelf, the SHN_COMMON value is shown as COM.
typedef struct {
The st_value field holds the alignment. This is an interesting abuse. Regular definitions are relative to a section (st_value is a section offset) and the section alignment (sh_addralign) is sufficient to encode the symbol alignment information. For COMMON symbols, the section information is unavailable but fortunately st_value is vacant.
% cat a.s
The binding STB_WEAK is not allowed. Other types are not allowed:
% >err.s cat <<e
.comm x,4,4
.weak x
e
% as err.s
err.s: Assembler messages:
err.s: Error: symbol `x' can not be both weak and common
% >err.s cat <<e
.comm x,4,4
.type x,@function
e
% as err.s
err.s: Assembler messages:
err.s:2: Error: cannot change type of common symbol 'x'
The generic ABI supports STT_COMMON as another way to label a COMMON symbol. It says:
Symbols with type STT_COMMON label uninitialized common blocks. In relocatable objects, these symbols are not allocated and must have the special section index SHN_COMMON (see below). In shared objects and executables these symbols must be allocated to some section in the defining object.
In relocatable objects, symbols with type STT_COMMON are treated just as other symbols with index SHN_COMMON. If the link-editor allocates space for the SHN_COMMON symbol in an output section of the object it is producing, it must preserve the type of the output symbol as STT_COMMON.
When the dynamic linker encounters a reference to a symbol that resolves to a definition of type STT_COMMON, it may (but is not required to) change its symbol resolution rules as follows: instead of binding the reference to the first symbol found with the given name, the dynamic linker searches for the first symbol with that name with type other than STT_COMMON. If no such symbol is found, it looks for the STT_COMMON definition of that name that has the largest size.
--elf-stt-common=yes causes the GNU assembler to use STT_COMMON. It is super rare in the wild, though.
% as a.s --elf-stt-common=yes -o a.o
% readelf -Ws a.o
Symbol table '.symtab' contains 2 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000004 4 COMMON GLOBAL DEFAULT COM x
Linker behavior
Symbol resolution
The key point: a COMMON symbol never leads to a duplicate definition error, whatever kind of definition it is combined with.
# Two STB_GLOBAL definitions lead to a duplicate definition error.
% as -o b.o <<< '.data; .globl x; x:'
% ld.lld -e 0 b.o b.o
ld.lld: error: duplicate symbol: x
>>> defined at b.o:(.data+0x0)
>>> defined at b.o:(.data+0x0)
However, the size and alignment fields may be updated when two COMMON symbols are merged. The quoted generic ABI text describes the behavior when a COMMON symbol has different sizes in relocatable objects. The output symbol gets the largest size.
Platforms differ in how the alignment is selected. GNU ld and ld.lld pick the largest alignment.
as -o a.o <<< '.comm x,8,4'
as -o b.o <<< '.comm x,4,8'
ld a.o b.o # st_size==8, aligned by 8
Mach-O ld64 lets the copy with the largest size decide the alignment.
When a COMMON symbol is merged with a shared symbol, GNU ld and ld.lld (see D71161) increase st_size if the shared symbol has a larger st_size.
ld -shared a.o -o a.so
ld a.so b.o # st_size==8, aligned by 4
In ELF, the precedence is STB_GLOBAL > COMMON > STB_WEAK. The generic ABI says:
When the link editor combines several relocatable object files, it does not allow multiple definitions of STB_GLOBAL symbols with the same name. On the other hand, if a defined global symbol exists, the appearance of a weak symbol with the same name will not cause an error. The link editor honors the global definition and ignores the weak ones. Similarly, if a common symbol exists (that is, a symbol whose st_shndx field holds SHN_COMMON), the appearance of a weak symbol with the same name will not cause an error. The link editor honors the common definition and ignores the weak ones.
as -o a.o <<< '.comm x,8,4'
% ld.bfd -e 0 a.o b.o # b.o wins (COMMON < STB_GLOBAL)
In 1999-12, GNU ld ported a strange rule from Sun's linker (see "GNU-ld behaviour does not match native linker behaviour").
Here is a table showing when an element is pulled in from an archive with the Solaris 2.6 linker and ar program:
main program\archive undefined common defined
When a symbol is COMMON and ld sees an archive, ld checks whether the archive index provides a STB_GLOBAL definition of the symbol. If yes, ld extracts the archive member as well. This is contrary to the usual rule that only an undefined symbol leads to archive member extraction.
ld.lld since 12.0.0 has this behavior (D86142) with the enabled-by-default --fortran-common option.
Say b0.a and b1.a are mostly identical archives, but b0.a objects are compiled with -fcommon while b1.a objects are compiled with -fno-common. If a.o references a symbol in b0.a, this archive lookup behavior may cause a duplicate definition error for ld a.o b0.a b1.a, whereas without the rule b1.a would simply be shadowed by b0.a.
echo 'extern int ret; int main() { return ret; }' > a.c
# ret in b0.a(b0.o) is COMMON. b1.a(b1.o) is extracted to override the COMMON symbol with a STB_GLOBAL definition.
What I am most concerned with is how to parallelize symbol resolution in the presence of this archive lookup rule.
GNU ld and ld.lld treat COMMON symbols as though they are in an input section named COMMON. *(COMMON) in a linker script can match these symbols.
Error-prone COMMON symbols
With -fcommon, due to the linker symbol resolution rule, a tentative definition int x; may be overridden by a STB_GLOBAL definition in another compilation unit. This is error-prone since the user may assume an initial value of zero if unaware of int x = 1; elsewhere.
gcc -c -fcommon -xc - -o a.o <<< 'int x;'
GNU ld and ld.lld support --warn-common, which detects the error-prone overriding.
% gcc -shared -fuse-ld=bfd -Wl,--warn-common a.o b.o
/usr/bin/ld.bfd: b.o: warning: definition of `x' overriding common from a.o
Some legacy code may inadvertently rely on COMMON symbols by having something like int x; in a header file. Such code may fail to link when compiled with -fno-common.
.bss allocation
When producing an executable or shared object, the linker allocates space in .bss to hold COMMON symbols. In GNU ld, COMMON symbols are placed after .bss and .bss.* input sections.
.bss :
In a relocatable link, COMMON symbols remain COMMON.
Run-time behavior
// a.c
When a.c and b.c are in the same component (main executable or shared object), with -fcommon, it's clear that the two x resolve to the same copy and the output is 1.
% gcc -fcommon a.c b.c && ./a.out
1
If b.c is compiled and linked into a different component, this works with the help of ELF symbol interposition. When linking the shared object, x is preemptible (default visibility, non-local binding) and its access requires GOT indirection. When linking the executable, the linker exports x to the dynamic symbol table because it is used by an input shared object.
% gcc -fpic -shared -fcommon b.c -o b.so && gcc -fcommon a.c ./b.so && ./a.out
1
% readelf -W --dyn-syms a.out | egrep 'Num:| x'
Num: Value Size Type Bind Vis Ndx Name
8: 0000000000003af8 4 OBJECT GLOBAL DEFAULT 25 x
% readelf -W --dyn-syms b.so | egrep 'Num:| x'
Num: Value Size Type Bind Vis Ndx Name
5: 00000000000037a0 4 OBJECT GLOBAL DEFAULT 20 x
% readelf -Wr --dyn-syms b.so | egrep 'Num:| x'
0000000000002770 0000000500000006 R_X86_64_GLOB_DAT 00000000000037a0 x + 0
Num: Value Size Type Bind Vis Ndx Name
5: 00000000000037a0 4 OBJECT GLOBAL DEFAULT 20 x
If you make x non-preemptible (e.g. via -Bsymbolic) in b.so, b.so will get its own copy.
% gcc -fpic -shared -fcommon -Wl,-Bsymbolic b.c -o b.so && gcc -fcommon a.c ./b.so && ./a.out
0
Miscellaneous related linker options
--no-define-common
In 2001-09, the change "optionally postpone assignment of Common" added this option, to be used with -shared.
Here is my understanding: glibc around 2.1.3 had an ld.so bug where ELF symbol interposition might not work. Using --no-define-common with shared objects can make COMMON symbols undefined and circumvent the bug.
% gcc -fpic -Wl,--no-define-common -shared -fuse-ld=bfd -fcommon b.c -o b.so
% readelf -W --dyn-syms b.so | egrep 'Num:| x'
Num: Value Size Type Bind Vis Ndx Name
1: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND x
gold confuses --define-common with -d/FORCE_COMMON_ALLOCATION and implements --define-common with -d semantics. Its --no-define-common is incompatible with GNU ld.
-d, -dc, -dp
In a relocatable link, a COMMON symbol remains COMMON in the output. If -dc is specified, the linker will allocate space to COMMON symbols. -d and -dp are aliases for -dc.
% gcc -fpic -fcommon -r -fuse-ld=bfd b.c -o b.ro && readelf -Ws b.ro | egrep 'Num:| x'
The output has a regular STB_GLOBAL definition. Linking the relocatable output with another object which defines x will lead to a duplicate definition error.
ld.bfd -r b.o -o b.ro
ld.bfd b.ro b.ro # ok. COMMON symbols are merged
ld.bfd -r -dc b.o -o b.ro
ld.bfd b.ro b.ro # duplicate definition error
The options are obscure and might be used to work around some legacy programs. If the relocatable output is fed into the linker again, ignoring -dc should usually work as well. There may be a problem only when a program inspects the relocatable output by itself and does not recognize COMMON symbols. This implies that the program cannot process a relocatable object with COMMON symbols produced by the assembler either.
For ld.lld, I removed -dp and ignored -d/-dc for 15.0.0: https://github.com/llvm/llvm-project/issues/53660.
--sort-common
By sorting COMMON symbols by decreasing alignment, some padding can be saved. However, I think this hardly ever has any size benefit. For example, musl specifies --sort-common by default. With -fcommon, I see a 24-byte decrease of .bss. The total size of .bss is 11344 bytes.
Actually, this can degrade performance if COMMON symbols in an object file have locality and --sort-common breaks the locality.
edata, end, and etext
For legacy reasons, GNU ld's internal linker script has PROVIDE(edata = .); and similar symbol assignments for the other two symbols. In GNU ld, the definition precedence is: regular symbol assignment > relocatable object definition > PROVIDE symbol assignment.
If a relocatable object file defines end, it will take precedence over the internal linker script PROVIDE(end = .);. This makes sense because the global variable int end; is valid in C and C++.
Before ld.lld 15, int end; compiled with -fcommon was overridden by the linker definition. This is fixed by D120389.
LLVM IR
In LLVM IR, a COMMON symbol has the "common" linkage. It is an interposable linkage and some optimizations are suppressed. For example:
- InstCombine assumes that the addresses of a common global i8 and an external global i32 may be the same.
- llvm.objectsize intrinsic does not know the size. This may lead to conservative assumptions for some _chk functions.