Reorder how register & stack allocation is done. We used to allocate
registers, then fix up merge edges, then allocate stack slots. This
lead to lots of unnecessary copies on merge edges:
v2 = LoadReg v1
v3 = StoreReg v2
If v1 and v3 are allocated to the same stack slot, then this code is
unnecessary. But at regalloc time we didn't know the homes of v1 and
v3.
To fix this problem, allocate all the stack slots before fixing up the
merge edges. That way, we know what stack slots values use so we know
what copies are required.
Use a good technique for shuffling values around on merge edges.
Improves performance of the go1 TimeParse benchmark by ~12%
Change-Id: I731f43e4ff1a7e0dc4cd4aa428fcdb97812b86fa
Reviewed-on: https://go-review.googlesource.com/17915 Reviewed-by: David Chase <drchase@google.com>