Introduces a parallel OutBuf implementation, and POC on amd64. Due to
some of the weird behaviors I saw on MacOS (SIGBUS while calling msync),
I will wait for feedback to port to other architectures.
On my mac, sped up Asmb by ~78% for cmd/compile (below). Will likely
have an appreciable speedup on kubelet benchmark.