crypto/sha256: improve performance for sha256.block on ppc64le
Adds an assembly implementation of sha256.block for ppc64le to improve its
performance. This implementation is largely based on the original amd64
implementation, which unrolls the 64 iterations of the inner loop.
Fixes #17652
benchmark old ns/op new ns/op delta
BenchmarkHash8Bytes 1263 767 -39.27%
BenchmarkHash1K 14048 7766 -44.72%
BenchmarkHash8K 102245 55626 -45.60%
benchmark old MB/s new MB/s speedup
BenchmarkHash8Bytes 6.33 10.43 1.65x
BenchmarkHash1K 72.89 131.85 1.81x
BenchmarkHash8K 80.12 147.27 1.84x
Change-Id: Ib4adf429423b20495580400be10bd7e171bcc70b
Reviewed-on: https://go-review.googlesource.com/32318 Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>