Linker (ld):
Description:
Expected linker behaviour except for the following caveats:
Symbol Resolution:
- Segments are considered undefined objects for a library.
If an empty symbol references a segment and that segment exists
then that object file will be linked.
Rationale:
- Allows library implementors to have a guaranteed way to have
initialization code linked.
Linker Script:
- The linker will follow the linker script commands. If that means
copying symbols to another section, it will do so. The default flags
for that section take precedence.
PHDRS:
- Headers with memsz = 0 are not included in the final executable.
Inclusion can be forced by using KEEP(<header_name>).
SECTIONS:
- Sections with a total size of 0 are not included in the final symbol
table.
Rationale:
- Keeps the symbol table minimal. Dummy symbols are often used
solely to force inclusion of other files along the search path.
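The two pruning rules above (PHDRS and SECTIONS) can be modelled in a few lines. This is only a sketch: the ProgramHeader and Section records, the function names, and the `kept` set standing in for KEEP(<header_name>) are hypothetical and not part of the linker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramHeader:
    name: str
    memsz: int


@dataclass(frozen=True)
class Section:
    name: str
    size: int


def emitted_headers(headers, kept=frozenset()):
    """Names of program headers that reach the final executable.

    Headers with memsz == 0 are dropped unless listed in `kept`,
    which stands in for KEEP(<header_name>) in the linker script.
    """
    return [h.name for h in headers if h.memsz > 0 or h.name in kept]


def symbol_table_sections(sections):
    """Names of sections that get an entry in the final symbol table."""
    return [s.name for s in sections if s.size > 0]
```

For example, a zero-sized header named "note" is dropped by emitted_headers(headers) but survives emitted_headers(headers, kept={"note"}).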
Static binaries:
TLS:
- TLS variables for the main thread are located in the data and bss
sections:
.data : {
*(.data .data.*)
*(.tdata .tdata.*)
}
.bss (NOLOAD) : {
*(.tbss .tbss.*)
*(.bss .bss.*)
}
Rationale:
- Static binaries using TLS need not initialize TLS, except to first
set the fs_base MSR.
Dynamic binaries:
** Unchanged. **
Process object files and static libraries left to right, in command
line order.
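Left-to-right processing can be sketched as follows. The sketch assumes the traditional Unix rule that an object file is always linked, while a static library member is pulled in only if it defines a symbol that is undefined at that point. The Obj record and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str
    defined: frozenset = frozenset()
    undefined: frozenset = frozenset()


def resolve_left_to_right(inputs):
    """inputs: list of ("obj", Obj) or ("lib", [Obj, ...]) in command line order.

    Returns (names of linked objects in link order, symbols still undefined).
    """
    defined, undefined, linked = set(), set(), []

    def add(obj):
        linked.append(obj.name)
        defined.update(obj.defined)
        undefined.update(obj.undefined)
        undefined.difference_update(defined)

    for kind, item in inputs:
        if kind == "obj":
            add(item)  # object files are always linked
            continue
        pulled = set()
        changed = True
        while changed:  # rescan the library until nothing new is pulled
            changed = False
            for member in item:
                if member.name not in pulled and member.defined & undefined:
                    pulled.add(member.name)
                    add(member)
                    changed = True
    return linked, undefined
```

Under this assumption a library listed before the object that needs it contributes nothing, which is why order on the command line matters.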
Final section ordering is typically: .text, .rodata, .data, .bss, etc.
Within a section, order of input object file contributions follows
command-line/object-file order and internal order of symbols
within object files.
Default library search path:
/usr/lib, /lib, /usr/local/lib
- Only the ELF format is supported.
- Options:
-B mode (mode = static or dynamic).
-e Name of entry point symbol.
-G Relocations will be position independent.
-l name (Link with libname.a or libname.so).
-L dir (Add directory dir to the library search path).
-o name (Output filename name).
-r Create relocatable object.
-s Strip symbol table and debug info from output.
-T scriptname (Use linker script scriptname).
-u symbol (force symbol to be undefined so it is pulled from library).
- Make simple linking decisions so that the -M flag is unnecessary.
- Can also use readelf or similar utilities to get info about linker decisions.
Notes:
- Linking under the goals below is NP-hard in general (set cover/SAT-like).
- But with careful library design and careful programming this
computationally hard problem can still be fast in practice.
Goals:
- Heuristic: Choose the set of object files that satisfies all undefined
symbols while pulling in the minimum number of symbols.
- Rationale: Should produce clean binaries with minimal fluff.
- Most object files in libraries are self-contained and usually don't
introduce extra undefined symbols.
- In the C library the only places where these decisions matter are in
the entry and exit code.
- The rest of the object files would just satisfy the final executable
requirements.
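One possible reading of the heuristic above is a greedy selection: for each undefined symbol, pick the candidate object that adds the fewest symbols not already needed, preferring the one that satisfies more pending symbols on a tie. This is a sketch under that assumption, not the linker's actual algorithm; the function name and the library layout (object name mapped to a pair of defined and undefined symbol sets) are hypothetical.

```python
def choose_objects(defined, undefined, library):
    """Greedily pick library objects until no symbol is left undefined.

    library maps an object name to (defined_symbols, undefined_symbols).
    Returns the chosen object names in the order they were picked.
    """
    defined = set(defined)
    pending = set(undefined) - defined
    chosen = []
    while pending:
        sym = min(pending)  # fixed order keeps the result deterministic
        candidates = [name for name, (defs, _) in library.items() if sym in defs]
        if not candidates:
            raise LookupError("undefined symbol: " + sym)

        def cost(name):
            defs, undefs = library[name]
            extra = (defs | undefs) - defined - pending  # symbols nobody asked for
            return (len(extra), -len(defs & pending), name)

        best = min(candidates, key=cost)
        defs, undefs = library[best]
        chosen.append(best)
        defined |= defs
        pending = (pending | undefs) - defined
    return chosen
```

Being greedy, this does not guarantee a global minimum, but it is cheap when most objects are self-contained.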
Design:
- Starting from the command line files, build a table of defined symbols
and undefined symbols.
- Go to each library in the linker search path and create a graph of
objects that satisfy the undefined symbols, minimizing the symbols
used as a heuristic.
- Once in the C library the linker will find the entry/exit points. The
linker should be able to decide which entry point to choose
because the main() object will have the undefined symbols it
needs.
- C Library entry points:
- Each entry point must initialize different variables before
calling into main().
- entry.s (main() does not use argv0 or environ).
- entry_argv0.s (main() only uses argv0).
- entry_argv0_environ.s (main() uses both argv0 and environ).
- entry_environ.s (main() only uses environ).
- _start() is a weak symbol and can be redefined by the user if
they want to implement their own entry point. Note:
the redefined _start symbol should call into main() and, once
main() returns, implement its own exit point or call into
_cexit() (although this is optional).
Also note the user must have knowledge of OS specifics as
the binary entry point has different calling conventions
than in the C runtime. Example: AMD64, SysV ABI: argc,
argv, and envp are all on the stack and must be moved into
registers rdi, rsi, rdx respectively before calling into
main().
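The stack arithmetic a user-defined _start has to perform on AMD64 can be modelled as follows. The model treats the initial stack as a list of 8-byte slots, argc first, then the argv pointers, a NULL, the envp pointers, and a NULL. The function names are hypothetical, and None stands in for a NULL pointer.

```python
WORD = 8  # bytes per stack slot on AMD64


def entry_offsets(argc):
    """Byte offsets from the initial rsp of argc, argv[0] and envp[0]."""
    return (0, WORD, WORD * (argc + 2))


def decode_entry_stack(stack):
    """Split the initial stack slots into (argc, argv, envp).

    A real _start would load argc into rdi, the address of argv[0]
    into rsi and the address of envp[0] into rdx before calling main().
    """
    argc = stack[0]
    argv = stack[1:1 + argc]
    envp_start = argc + 2  # skip argc, the argv entries, and argv's NULL
    envp_end = stack.index(None, envp_start)
    return argc, argv, stack[envp_start:envp_end]
```

With argc = 2, argv starts at rsp + 8 and envp at rsp + 32.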
- C Library exit points:
- cexit.s (main() does not use any standard streams or atexit();
therefore, _cexit() is an alias for _exit()).
- cexit_atexit.s (main() only uses atexit(), _cexit() is an alias
for the exit() version without standard stream cleanup).
- cexit_stdio.s (main() uses the standard streams, _cexit() is an
alias for the exit() version with standard stream
cleanup).
- _cexit() is defined as a weak symbol and can be redefined by
the user if they want to implement their own exit point.
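The choice among the entry and exit variants listed above can be expressed as a lookup on the undefined symbols of the main() object. This is a sketch only: the symbol spellings (argv0, environ, atexit, stdin, stdout, stderr) are assumptions, and so is the rule that the standard stream variant wins when main() uses both the standard streams and atexit().

```python
ENTRY_VARIANTS = {
    # (uses argv0, uses environ) -> entry object source
    (False, False): "entry.s",
    (True, False): "entry_argv0.s",
    (True, True): "entry_argv0_environ.s",
    (False, True): "entry_environ.s",
}

STDIO_SYMBOLS = {"stdin", "stdout", "stderr"}  # assumed symbol names


def pick_entry(undefined):
    """Select the entry variant from main()'s undefined symbols."""
    return ENTRY_VARIANTS[("argv0" in undefined, "environ" in undefined)]


def pick_exit(undefined):
    """Select the exit variant from main()'s undefined symbols."""
    if STDIO_SYMBOLS & set(undefined):
        return "cexit_stdio.s"  # assumed to take precedence over atexit
    if "atexit" in undefined:
        return "cexit_atexit.s"
    return "cexit.s"
```

A main() that references only environ and atexit would get entry_environ.s and cexit_atexit.s.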
-r
No dead code elimination (that is what the compiler is for).