Cypherpunks repositories - gostls13.git/commit

author	Rui Ueyama <ruiu@google.com>
	Mon, 23 Jun 2014 19:06:26 +0000 (12:06 -0700)
committer	Rui Ueyama <ruiu@google.com>
	Mon, 23 Jun 2014 19:06:26 +0000 (12:06 -0700)
commit	a712e20a1d1587e115c1295df0850120cdfa3e44
tree	0159ea6a16dd2dac7790c39402529d7c5ef68b85	tree
parent	e3e48cd075091d8f0e1265ae6a18e69ac83d2af4	commit \| diff

runtime: speed up amd64 memmove

MOV with SSE registers seems faster than REP MOVSQ if the
size being copied is less than about 2K. Previously we
didn't use MOV if the memory region is larger than 256
byte. This patch improves the performance of 257 ~ 2048
byte non-overlapping copy by using MOV.

Here is the benchmark result on Intel Xeon 3.5GHz (Nehalem).

benchmark               old ns/op    new ns/op    delta
BenchmarkMemmove16              4            4   +0.42%
BenchmarkMemmove32              5            5   -0.20%
BenchmarkMemmove64              6            6   -0.81%
BenchmarkMemmove128             7            7   -0.82%
BenchmarkMemmove256            10           10   +1.92%
BenchmarkMemmove512            29           16  -44.90%
BenchmarkMemmove1024           37           25  -31.55%
BenchmarkMemmove2048           55           44  -19.46%
BenchmarkMemmove4096           92           91   -0.76%

benchmark                old MB/s     new MB/s  speedup
BenchmarkMemmove16        3370.61      3356.88    1.00x
BenchmarkMemmove32        6368.68      6386.99    1.00x
BenchmarkMemmove64       10367.37     10462.62    1.01x
BenchmarkMemmove128      17551.16     17713.48    1.01x
BenchmarkMemmove256      24692.81     24142.99    0.98x
BenchmarkMemmove512      17428.70     31687.72    1.82x
BenchmarkMemmove1024     27401.82     40009.45    1.46x
BenchmarkMemmove2048     36884.86     45766.98    1.24x
BenchmarkMemmove4096     44295.91     44627.86    1.01x

LGTM=khr
R=golang-codereviews, gobot, khr
CC=golang-codereviews
https://golang.org/cl/90500043