runtime: use MOVSB instead of MOVSQ for unaligned moves
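
MOVSB is substantially faster than MOVSQ for unaligned moves: on the
4096-byte benchmarks below, time per op drops by 41-48%, while the
aligned case is essentially unchanged.

Possibly we should use MOVSB all of the time, but Intel folks say
MOVSQ might be a bit faster on some processors (though not on any
I have access to at the moment).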

benchmark                              old ns/op     new ns/op     delta
BenchmarkMemmove4096-8                 93.9          93.2          -0.75%
BenchmarkMemmoveUnalignedDst4096-8     256           151           -41.02%
BenchmarkMemmoveUnalignedSrc4096-8     175           90.5          -48.29%
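
For reference, a minimal sketch of how an unaligned 4096-byte copy can be
measured outside the runtime tests. The helper names copyUnalignedDst and
copyUnalignedSrc are hypothetical, not the runtime's benchmark functions,
and the sketch assumes the allocator returns word-aligned buffers, so that
slicing at offset 1 yields an unaligned pointer:

```go
package main

import (
	"fmt"
	"testing"
)

// copyUnalignedDst copies src into dst starting one byte in, so the
// destination pointer is (assuming an aligned allocation) unaligned.
func copyUnalignedDst(dst, src []byte) int {
	return copy(dst[1:], src)
}

// copyUnalignedSrc copies from src starting one byte in, so the
// source pointer is (assuming an aligned allocation) unaligned.
func copyUnalignedSrc(dst, src []byte) int {
	return copy(dst, src[1:])
}

func main() {
	const n = 4096 // size used by the benchmarks in this message

	// Unaligned destination: dst has one extra byte to absorb the offset.
	dst, src := make([]byte, n+1), make([]byte, n)
	r := testing.Benchmark(func(b *testing.B) {
		b.SetBytes(n)
		for i := 0; i < b.N; i++ {
			copyUnalignedDst(dst, src)
		}
	})
	fmt.Println("unaligned dst:", r)

	// Unaligned source: src has one extra byte to absorb the offset.
	dst2, src2 := make([]byte, n), make([]byte, n+1)
	r = testing.Benchmark(func(b *testing.B) {
		b.SetBytes(n)
		for i := 0; i < b.N; i++ {
			copyUnalignedSrc(dst2, src2)
		}
	})
	fmt.Println("unaligned src:", r)
}
```

Timings from this sketch will vary by processor and Go version and are
not expected to reproduce the numbers above.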

Fixes #14630

Change-Id: I568e6d6590eb3615e6a699fb474020596be665ff
Reviewed-on: https://go-review.googlesource.com/20293
Reviewed-by: Ian Lance Taylor <iant@golang.org>