# Competitive programming in Nim

UNDER CONSTRUCTION

In the afternoon, I came cross the Nim programming language again on Lobsters. I first learned some basics of the language in 2015, but had not touched it since then.

"Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula.", according to its website.

Basic features: parametric polymorphism. Advanced features: macros (including term-rewriting macros), compile-time function execution, effect system, concepts

An idea popped into my mind: why not solve some coding challenges in Nim?

As a niche language, it is not supported on many coding challenge websites. Fortunately, the Nim compiler generates C code. With a small amount of work, we can build a self-contained C source file suitable for submission.

Let's take a LeetCode challenge as an example. We write the main algorithm in Nim and use the emit pragma to write a C wrapper.

## Branch target

Many architectures encode a branch/jump/call instruction with PC-relative addressing, i.e. the distance to the target is encoded in the instruction. In an executable or shared object (called a component in ELF), if the target is bound to the same component, the instruction has a fixed encoding at link time; otherwise the target is unknown at link time and there are two choices:

• text relocation
• indirection

In All about Global Offset Table, I mentioned that linker/loader developers often frowned upon text relocations because the text segment will be unshareable. In addition, the number of relocations would be dependent on the number of calls, which can be large.

# Build glibc with LLD 13

LLD is the LLVM linker. It started at the end of 2011 as a work-in-progress rewrite of ld64 for the Mach-O binary format based on the atom model. COFF and ELF ports based on the atom model were contributed subsequently. They shared one symbol resolution model. (IMO due to Mach-O's unfortunate limitation of 255 section .subsections_via_symbols was invented. The atom model was an incarnation of the concept but it did not fit into ELF/PE where sections are the better basic units.)

In 2015, both COFF and ELF ports were rewritten. (See "LLD improvement plan") Today, LLD is a mature and fast linker supporting multiple binary formats (ELF, Mach-O, PE/COFF, WebAssembly). FreeBSD, Android, and Chrome OS have adopted it as the main linker.

As a main contributor of LLD's ELF port who has fixed numerous corner cases in recent years, I consider that its x86-64 support has been mature since the 8.0.0 release and is in a great shape since 9.0.0. The AArch64 and PowerPC32/PowerPC64 support has been great since the 10.0.0 release. The 11.0.0 release has very solid linker script support. (When people complain that GNU ld's linker script is not immediately usable with LLD, it is almost assuredly the problem of the script itself.) So, what's next? Build glibc with LLD!

# All about Global Offset Table

In an executable or shared object (called a component in ELF), a text section may need the absolute virtual address of a symbol (e.g. a function or a variable). The reference arises from an address taken operation or a PLT entry. The address may be:

• dependent on runtime computation by ld.so

# Extract an archive member to satisfy a DSO undef

In ELF linkers, undefined symbols from DSO participate in symbol resolution, just like undef from .o. The differences with undef from .o are:

• undef purely from DSO do not need .symtab entries
• --no-allow-shlib-undefined can error for such undef (-rpath-link ("load dependent libraries") behavior can suppress some errors.

ELF linkers extract an archive member to satisfy an undefined symbol from a shared object. Here is an example:

# Toolchain testing

Some random notes about toolchain testing.

Testing levels.

# COMDAT and section group

In C++, inline functions, template instantiations, and a few other things can be defined in multiple object files but need deduplication at link time. In the dark ages the functionality was implemented by weak definitions: the linker does not report duplicate definition errors and resolves the references to the first definition. The downside is that unneeded copies remained in the linked image.

# SECTIONS and OVERWRITE_SECTIONS

The main task of a linker script is to define extra symbols and define how input sections map into output sections. It has other miscellaneous features which can be implemented via command line options:

• The ENTRY command can be replaced by --entry.
• The OUTPUT_FORMAT command can usually be replaced by -m.
• The SEARCH_DIRS command can be replaced by -L.
• The VERSION command can be replaced by --version-script.
• The INPUT and GROUP commands can add other files as input. This provides a mechanism to split an archive/shared object into multiple files.

# Symbol processing

UNDER CONSTRUCTION (COFF, Mach-O)

Symbol processing is a major step in a linker. In most binary formats, the linker maintains a global symbol table and performs symbols resolution for each input file (object file, shared object, archive, LLVM bitcode file). Some command line options can define/undefine symbols as well. The symbol resolution can affect archive processing and many subsequent steps (LTO, relocation processing, as-needed shared objects, etc).