runtime: improve IndexByte for ppc64x
This change adds a faster assembly implementation of IndexByte for ppc64x
that uses the vector registers and instructions.
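For readers unfamiliar with the idea, here is a minimal portable sketch of searching several bytes per step. It is not the assembly added by this change: it uses 8-byte words in plain Go (a SWAR trick) as a stand-in for vector registers, and the name indexByteSWAR is hypothetical.

```go
package main

import (
	"encoding/binary"
	"math/bits"
)

const (
	lo = 0x0101010101010101 // 0x01 in every byte lane
	hi = 0x8080808080808080 // 0x80 in every byte lane
)

// indexByteSWAR (hypothetical name) returns the index of the first
// occurrence of c in s, or -1 if c is not present. It examines 8 bytes
// per iteration, then falls back to a byte loop for the tail.
func indexByteSWAR(s []byte, c byte) int {
	pattern := uint64(c) * lo // c replicated into all 8 lanes
	i := 0
	for ; i+8 <= len(s); i += 8 {
		// After the XOR, lanes equal to c become zero.
		w := binary.LittleEndian.Uint64(s[i:]) ^ pattern
		// Classic zero-byte test: the lowest set bit of m marks the
		// first zero lane (higher lanes may carry false positives).
		if m := (w - lo) &^ w & hi; m != 0 {
			return i + bits.TrailingZeros64(m)/8
		}
	}
	for ; i < len(s); i++ {
		if s[i] == c {
			return i
		}
	}
	return -1
}
```

A vector implementation follows the same shape with wider registers: replicate the target byte, compare a whole block at once, and locate the first matching lane only when the block reports a hit.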
benchmark                     old ns/op     new ns/op     delta
BenchmarkIndexByte/10-8       9.70          9.37          -3.40%
BenchmarkIndexByte/32-8       10.9          10.9          +0.00%
BenchmarkIndexByte/4K-8       254           92.8          -63.46%
BenchmarkIndexByte/4M-8       249246        118435        -52.48%
BenchmarkIndexByte/64M-8      10737987      7383096       -31.24%
benchmark                     old MB/s     new MB/s     speedup
BenchmarkIndexByte/10-8       1030.63      1067.24      1.04x
BenchmarkIndexByte/32-8       2922.69      2928.53      1.00x
BenchmarkIndexByte/4K-8       16065.95     44156.45     2.75x
BenchmarkIndexByte/4M-8       16827.96     35414.21     2.10x
BenchmarkIndexByte/64M-8      6249.67      9089.53      1.45x
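The ns/op and MB/s columns describe the same measurements. A small sketch of the conversion, assuming 1 MB = 1e6 bytes and that the 4M and 64M cases search 4<<20 and 64<<20 bytes (the helper name throughputMBps is hypothetical):

```go
package main

// throughputMBps (hypothetical helper) converts the bytes processed per
// operation and the time per operation in nanoseconds into MB/s, taking
// 1 MB as 1e6 bytes. Bytes per nanosecond is GB/s, so multiply by 1e3.
func throughputMBps(bytesPerOp int64, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}
```

For example, throughputMBps(4<<20, 118435) comes to roughly 35414 MB/s, in line with the new 4M figure; small differences from the table are expected because the ns/op values shown are rounded.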
Change-Id: I81dbdd620f7bb4e395ce4d1f2a14e8e91e39f9a1
Reviewed-on: https://go-review.googlesource.com/71710
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>