]> Cypherpunks repositories - gostls13.git/commit
math/big,crypto/internal/bigmod: unroll loop in addMulVVW for ppc64x
authorLynn Boger <laboger@linux.vnet.ibm.com>
Tue, 23 Jan 2024 18:46:05 +0000 (12:46 -0600)
committerLynn Boger <laboger@linux.vnet.ibm.com>
Thu, 25 Jan 2024 19:32:43 +0000 (19:32 +0000)
commit4fde3ef2acef11738c83f6ad9147eba03b5da6d1
tree98d5f89431a7e350455e49461361ca989111a637
parent67d30fc315f90207a7fd8cde465ab14687835f1c
math/big,crypto/internal/bigmod: unroll loop in addMulVVW for ppc64x

This updates the assembly implementation of AddMulVVW to
unroll the main loop to do 64 bytes at a time.

The code for addMulVVWx is based on the same code and has
also been updated to improve performance.

goos: linux
goarch: ppc64le
pkg: crypto/internal/bigmod
cpu: POWER10
               │ bg.orig.out │               bg.out               │
               │   sec/op    │   sec/op     vs base               │
ModAdd           116.3n ± 0%   116.9n ± 0%   +0.52% (p=0.002 n=6)
ModSub           111.5n ± 0%   111.5n ± 0%    0.00% (p=0.273 n=6)
MontgomeryRepr   2.195µ ± 0%   1.944µ ± 0%  -11.44% (p=0.002 n=6)
MontgomeryMul    2.195µ ± 0%   1.943µ ± 0%  -11.48% (p=0.002 n=6)
ModMul           4.418µ ± 0%   3.900µ ± 0%  -11.72% (p=0.002 n=6)
ExpBig           5.736m ± 0%   5.117m ± 0%  -10.78% (p=0.002 n=6)
Exp              5.891m ± 0%   5.237m ± 0%  -11.11% (p=0.002 n=6)
geomean          9.901µ        9.094µ        -8.15%

goos: linux
goarch: ppc64le
pkg: math/big
cpu: POWER10
                 │ am.orig.out  │               am.out               │
                 │    sec/op    │   sec/op     vs base               │
AddMulVVW/1         4.456n ± 1%   3.565n ± 0%  -20.00% (p=0.002 n=6)
AddMulVVW/2         4.875n ± 1%   5.938n ± 1%  +21.79% (p=0.002 n=6)
AddMulVVW/3         5.484n ± 0%   5.693n ± 0%   +3.80% (p=0.002 n=6)
AddMulVVW/4         6.370n ± 0%   6.065n ± 0%   -4.79% (p=0.002 n=6)
AddMulVVW/5         7.321n ± 0%   7.188n ± 0%   -1.82% (p=0.002 n=6)
AddMulVVW/10        12.26n ± 8%   11.41n ± 0%   -6.97% (p=0.002 n=6)
AddMulVVW/100      100.70n ± 0%   93.58n ± 0%   -7.08% (p=0.002 n=6)
AddMulVVW/1000      938.6n ± 0%   845.5n ± 0%   -9.92% (p=0.002 n=6)
AddMulVVW/10000     9.459µ ± 0%   8.415µ ± 0%  -11.04% (p=0.002 n=6)
AddMulVVW/100000    94.57µ ± 0%   84.01µ ± 0%  -11.16% (p=0.002 n=6)
geomean             75.17n        71.21n        -5.27%

Change-Id: Idd79f5f02387564f4c2cc28d50b1c12bcd9a400f
Reviewed-on: https://go-review.googlesource.com/c/go/+/557915
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
src/crypto/internal/bigmod/nat_ppc64x.s
src/math/big/arith_ppc64x.s