This per-symbol table was written with the strategy:
1. record offset and write fake header
2. write body
3. seek back to fake header
4. write real header
This CL collects the per-symbol body into a []byte, then writes the
real header followed by the body to the output file. This saves two
seeks per-symbol and overwriting the fake header.
Small performance improvement (3.5%) in best-of-ten links of godoc:
tip: real 0m1.132s user 0m1.256s
this: real 0m1.090s user 0m1.210s
I'm not sure if the performance measured here alone justifies it,
but I think this is an easier to read style of code.