Linker (ld):
- Only supports the ELF format.
- Options:
  -B mode (mode = static or dynamic).
  -e symbol (Name of the entry point symbol).
  -G (Make relocations position independent).
  -l name (Link with libname.a or libname.so).
  -L dir (Add directory dir to the library search path).
  -o name (Write the output to file name).
  -r (Create a relocatable object).
  -s (Strip the symbol table and debug info from the output).
  -T scriptname (Use linker script scriptname).
  -u symbol (Force symbol to be undefined so that it is pulled in from a library).
- Linking decisions are kept simple, so a -M flag is unnecessary.
- readelf or similar utilities can be used to inspect the linker's decisions.

Notes:
- Choosing the set of objects to link is NP-hard in general (it resembles set cover/SAT).
- With careful library design and careful programming, however, this computationally hard problem can still be solved quickly in practice.

Goals:
- Heuristic: Choose a set of object files that satisfies all undefined symbols while pulling in the minimum number of symbols.
- Rationale: This should produce clean binaries with minimal fluff.
- Most object files in libraries are self-contained and usually don't introduce extra undefined symbols.
- In the C library, the only places where these decisions matter are the entry and exit code.
- The remaining object files simply satisfy the final executable's requirements.

Design:
- Starting from the files on the command line, build a table of defined symbols and undefined symbols.
- Go to each library in the library search path and build a graph of the objects that satisfy the undefined symbols, using the minimum-symbols heuristic to choose among them.
- Once in the C library, the linker finds the entry and exit points. It should be able to decide which entry point to choose because the object that defines main() has undefined references to exactly the symbols it needs.
- C library entry points:
  - Each entry point initializes a different set of variables before calling main().
  - entry.s (main() uses neither argv0 nor environ).
  - entry_argv0.s (main() only uses argv0).
  - entry_argv0_environ.s (main() uses both argv0 and environ).
  - entry_environ.s (main() only uses environ).
  - _start() is a weak symbol and can be redefined by users who want to implement their own entry point. Note: a redefined _start should call main() and, once main() returns, either implement its own exit point or call _cexit() (although this is optional). The user must also know OS specifics, because the binary's entry point uses a different calling convention from ordinary C functions in the runtime. Example (AMD64, SysV ABI): at process entry, argc is at the top of the stack, followed by the argv and envp pointer arrays; argc and pointers to those two arrays must be moved into rdi, rsi, and rdx respectively before calling main().
- C library exit points:
  - cexit.s (main() uses neither the standard streams nor atexit(), so _cexit() is an alias for _exit()).
  - cexit_atexit.s (main() only uses atexit(), so _cexit() is an alias for the version of exit() without standard stream cleanup).
  - cexit_stdio.s (main() uses the standard streams, so _cexit() is an alias for the version of exit() with standard stream cleanup).
  - _cexit() is defined as a weak symbol and can be redefined by users who want to implement their own exit point.
- No dead code elimination (that is what the compiler is for).