math/big: add assembly implementation of arith for ppc64{le}
The existing implementation was pure Go, leading to slow
cryptographic performance.
Implemented mulWW, subVV, mulAddVWW, addMulVVW, and bitLen for
ppc64{le}.
Implemented divWW for ppc64le only, as the DIVDEU instruction is only
available on Power8 or newer.
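
For reference, the semantics these routines must provide can be sketched in portable Go. This is only an illustration, not the assembly added by this change. It assumes 64-bit words (math/big uses its own Word type) and relies on the math/bits package, which is newer than this change.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulWW returns the 128-bit product x*y as (hi, lo).
func mulWW(x, y uint64) (hi, lo uint64) {
	return bits.Mul64(x, y)
}

// divWW divides the 128-bit value (hi, lo) by y and returns the
// quotient and remainder. The caller must ensure hi < y; otherwise
// the quotient does not fit in one word and bits.Div64 panics.
func divWW(hi, lo, y uint64) (q, r uint64) {
	return bits.Div64(hi, lo, y)
}

// addMulVVW computes z += x*y word by word and returns the final
// carry. It assumes len(x) >= len(z).
func addMulVVW(z, x []uint64, y uint64) (c uint64) {
	for i := range z {
		hi, lo := bits.Mul64(x[i], y)
		var carry uint64
		lo, carry = bits.Add64(lo, z[i], 0)
		hi += carry // cannot overflow: x*y + z + c < 2^128
		lo, carry = bits.Add64(lo, c, 0)
		hi += carry
		z[i] = lo
		c = hi
	}
	return c
}

func main() {
	hi, lo := mulWW(1<<63, 4)
	fmt.Println(hi, lo)

	q, r := divWW(hi, lo, 4)
	fmt.Println(q, r)

	z := []uint64{^uint64(0), 0}
	c := addMulVVW(z, []uint64{2, 0}, 3)
	fmt.Println(z, c)
}
```

The assembly versions perform the same word-level operations using the hardware multiply and divide instructions.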
benchcmp output:
benchmark                         old ns/op     new ns/op     delta
BenchmarkSignP384                 28934360      10877330      -62.41%
BenchmarkRSA2048Decrypt           41261033      5139930       -87.54%
BenchmarkRSA2048Sign              45231300      7610985       -83.17%
Benchmark3PrimeRSA2048Decrypt     20487300      2481408       -87.89%
Fixes #16621
Change-Id: If8b68963bb49909bde832f2bda08a3791c4f5b7a
Reviewed-on: https://go-review.googlesource.com/26951
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>