cmd/6g: avoid MOVSD between registers
A register-to-register MOVSD copies only the low 64 bits and
leaves the upper half of the destination XMM register unchanged,
while MOVAPD copies all 128 bits. I assume the internal
register renaming works better with the latter, since it makes
the mandelbrot benchmark run about 25% faster.
Before:
mandelbrot 16000
	gcc -O2 mandelbrot.c	28.44u 0.00s 28.45r
	gc mandelbrot       	44.12u 0.00s 44.13r
	gc_B mandelbrot     	44.17u 0.01s 44.19r

After:
mandelbrot 16000
	gcc -O2 mandelbrot.c	28.22u 0.00s 28.23r
	gc mandelbrot       	32.81u 0.00s 32.82r
	gc_B mandelbrot     	32.82u 0.00s 32.83r
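
For reference, the "25% faster" figure can be checked against the
user times in the tables above. The following is only a sketch of
that arithmetic; the helper names reductionPercent and speedup are
made up for illustration and are not part of this change.

```go
package main

import "fmt"

// reductionPercent reports how much smaller after is than before,
// as a percentage of before. (Hypothetical helper for illustration.)
func reductionPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

// speedup reports the ratio of the old running time to the new one.
// (Hypothetical helper for illustration.)
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	// User times in seconds for "gc mandelbrot", copied from the
	// Before and After tables in this message.
	before, after := 44.12, 32.81

	fmt.Printf("time reduction: %.1f%%\n", reductionPercent(before, after))
	fmt.Printf("speedup: %.2fx\n", speedup(before, after))
}
```

With those inputs the running time drops by roughly a quarter, which
matches the 25% quoted above; expressed as a ratio of old to new time
the gain is somewhat larger.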
R=ken2
CC=golang-dev
https://golang.org/cl/6248068