cmd/compile/internal/ssa: combine consecutive loads and stores on amd64

Sometimes (often for calls) we generate code like this:

	MOVQ (addr),AX
	MOVQ 8(addr),BX
	MOVQ AX,(otheraddr)
	MOVQ BX,8(otheraddr)

Replace it with:

	MOVUPS (addr),X0
	MOVUPS X0,(otheraddr)

For completeness, do the same for 8-, 16-, and 32-bit loads/stores too.
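
As an illustration only (not part of this change), the Go sketch below shows
the kind of source that can lead to the pattern above: both 8-byte loads
happen before both 8-byte stores, so a single 16-byte move gives the same
result. The type pair and the functions copyFields and copyWhole are
hypothetical names invented for this example. Whether the compiler emits the
two-MOVQ sequence or the combined MOVUPS for either function depends on the
compiler version and is an assumption here; the generated code can be
inspected with go build -gcflags=-S.

```go
package main

import "fmt"

// pair is a hypothetical 16-byte value made of two adjacent 8-byte words.
type pair struct {
	lo, hi uint64
}

// copyFields loads both words first and only then stores them, which
// mirrors the load, load, store, store shape in the commit message.
//
//go:noinline
func copyFields(dst, src *pair) {
	lo, hi := src.lo, src.hi
	dst.lo, dst.hi = lo, hi
}

// copyWhole copies all 16 bytes as a single assignment, the semantic
// equivalent of one wide load followed by one wide store.
//
//go:noinline
func copyWhole(dst, src *pair) {
	*dst = *src
}

func main() {
	src := pair{lo: 1, hi: 2}
	var a, b pair
	copyFields(&a, &src)
	copyWhole(&b, &src)
	// Both copies must produce the same 16 bytes.
	fmt.Println(a == b)
}
```

The ordering matters: because every load precedes every store, merging them
cannot change the result even if the source and destination overlap.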
Shaves about 1% off the code section of the go tool binary:

/localdisk/itocar/golang/bin/go 10293917
go_old 10334877 [40960 bytes]

read-only data = 682 bytes (0.040769%)
global text (code) = 38961 bytes (1.036503%)
Total difference 39643 bytes (0.674628%)

Updates #6853

Change-Id: I1f0d2f60273a63a079b58927cd1c4e3429d2e7ae
Reviewed-on: https://go-review.googlesource.com/57130
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>