cmd/link: use a two-pass approach for trampoline insertion

Currently the linker inserts trampolines in a single pass: it assigns
an address to each function and inserts trampolines as it goes. For
this to work without emitting too many unnecessary trampolines, the
functions need to be laid out in dependency order, so that a direct
call's target is always at a known address (or is known to be not
too far).

This mostly works, but there are a few exceptions:

- linkname can break the dependency tree and cause cycles.
- in internal linking mode, on some platforms, some calls are turned
  into calls via PLT, but the PLT stubs are inserted rather late.

The one-pass approach is also expensive, because it has to inspect
every CALL relocation.

This CL changes it to use a two-pass approach. The first pass just
assigns addresses without inserting any trampolines, assuming the
program is not too big. If this succeeds, no extra work needs to be
done. If this fails, start over and insert trampolines for too-far
targets as well as targets with unknown addresses. This should make
linking faster for small programs (most cases) and generate fewer
conservative trampolines.
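
For illustration only, the two-pass idea can be sketched on a toy
model. This is not the cmd/link code: the Func type, the layout
function, trampSize, and maxReach are hypothetical names, and the
call distance is approximated from the caller's start address.

```go
package main

import "fmt"

// Func is a hypothetical stand-in for a text symbol.
type Func struct {
	Name  string
	Size  int64
	Calls []string // names of direct call targets
}

// trampSize is an assumed trampoline size, chosen for illustration.
const trampSize = 16

// layout assigns addresses to funcs in order and reports how many
// trampolines were needed. maxReach models the direct-call range.
func layout(funcs []Func, maxReach int64) (map[string]int64, int) {
	// Pass 1: assign addresses only, assuming the program is small.
	addr := make(map[string]int64, len(funcs))
	var pc int64
	for _, f := range funcs {
		addr[f.Name] = pc
		pc += f.Size
	}
	if pc <= maxReach {
		return addr, 0 // everything is in range; no extra work
	}

	// Pass 2: start over, inserting a trampoline after the caller for
	// each target that is too far or whose address is not yet known.
	addr = make(map[string]int64, len(funcs))
	pc = 0
	tramps := 0
	for _, f := range funcs {
		start := pc
		addr[f.Name] = start
		pc += f.Size
		for _, callee := range f.Calls {
			target, known := addr[callee]
			dist := target - start
			if dist < 0 {
				dist = -dist
			}
			if known && dist <= maxReach {
				continue
			}
			pc += trampSize
			tramps++
		}
	}
	return addr, tramps
}

func main() {
	funcs := []Func{
		{Name: "a", Size: 100, Calls: []string{"b"}},
		{Name: "b", Size: 100, Calls: []string{"a"}},
	}
	for _, reach := range []int64{1000, 150} {
		addr, n := layout(funcs, reach)
		fmt.Println("reach", reach, "trampolines", n, "b at", addr["b"])
	}
}
```

With the larger reach the first pass succeeds and nothing else runs.
With the smaller one the layout is redone, and only the forward call
to a not-yet-placed target gets a trampoline.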