crypto/sha512: improve performance for sha512.block on ppc64le
Adds an assembly implementation of sha512.block for ppc64le to improve its
performance. This implementation is largely based on the original amd64
implementation, unrolling the 80 iterations of the inner loop.
Fixes #17660
benchmark old ns/op new ns/op delta
BenchmarkHash8Bytes 1715 1133 -33.94%
BenchmarkHash1K 10098 5513 -45.41%
BenchmarkHash8K 68004 35278 -48.12%
benchmark old MB/s new MB/s speedup
BenchmarkHash8Bytes 4.66 7.06 1.52x
BenchmarkHash1K 101.40 185.72 1.83x
BenchmarkHash8K 120.46 232.21 1.93x
Change-Id: Ifd55a49a24cb159b3a09a8e928c3f37727aca103
Reviewed-on: https://go-review.googlesource.com/32320 Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>