]> Cypherpunks repositories - gostls13.git/commit
crypto/internal/fips140/edwards25519/field: optimize carryPropagate
authorEgon Elbre <egonelbre@gmail.com>
Tue, 18 Feb 2025 11:56:35 +0000 (13:56 +0200)
committerGopher Robot <gobot@golang.org>
Tue, 25 Feb 2025 20:07:31 +0000 (12:07 -0800)
commitc578670dcbbf088d899dc3e18cd5cbf7146e5697
tree6a0b2a15bbb6520e3009ea36665ed802047dc236
parent6719336428da455c0708c7f1564c873d6f6e2c6d
crypto/internal/fips140/edwards25519/field: optimize carryPropagate

Using pure Go solution for ARM64 seems to perform better when the
operation order is slightly tweaked.

goos: linux
goarch: arm64
pkg: crypto/internal/fips140/edwards25519
                              │     OLD      │                 NEW                 │
                              │    sec/op    │   sec/op     vs base                │
EncodingDecoding-4               158.7µ ± 0%   141.4µ ± 0%  -10.88% (p=0.000 n=10)
ScalarBaseMult-4                 281.2µ ± 0%   260.5µ ± 0%   -7.35% (p=0.000 n=10)
ScalarMult-4                    1008.9µ ± 0%   916.6µ ± 0%   -9.15% (p=0.000 n=10)
VarTimeDoubleScalarBaseMult-4   1003.4µ ± 0%   909.6µ ± 0%   -9.36% (p=0.000 n=10)
geomean                          461.0µ        418.6µ        -9.19%

pkg: crypto/internal/fips140/edwards25519/field
           │     OLD     │                 NEW                 │
           │   sec/op    │   sec/op     vs base                │
Add-4        45.22n ± 0%   33.50n ± 0%  -25.91% (p=0.000 n=10)
Multiply-4   454.0n ± 0%   406.8n ± 0%  -10.41% (p=0.000 n=10)
Square-4     278.2n ± 0%   246.4n ± 0%  -11.43% (p=0.000 n=10)
Invert-4     75.83µ ± 0%   67.37µ ± 0%  -11.16% (p=0.000 n=10)
Mult32-4     78.66n ± 0%   78.68n ± 0%   +0.02% (p=0.022 n=10)
Bytes-4      120.6n ± 0%   110.6n ± 0%   -8.25% (p=0.000 n=10)
geomean      400.2n        354.0n       -11.54%

goos: darwin
goarch: arm64
pkg: crypto/internal/fips140/edwards25519
cpu: Apple M1 Pro
                               │     OLD      │                 NEW                 │
                               │    sec/op    │   sec/op     vs base                │
EncodingDecoding-10              10.095µ ± 0%   7.610µ ± 2%  -24.62% (p=0.000 n=10)
ScalarBaseMult-10                 12.65µ ± 0%   11.54µ ± 0%   -8.80% (p=0.000 n=10)
ScalarMult-10                     51.49µ ± 0%   38.59µ ± 2%  -25.06% (p=0.000 n=10)
VarTimeDoubleScalarBaseMult-10    49.41µ ± 0%   37.10µ ± 0%  -24.92% (p=0.000 n=10)
geomean                           23.88µ        18.83µ       -21.14%

pkg: crypto/internal/fips140/edwards25519/field
            │     OLD     │                 NEW                  │
            │   sec/op    │    sec/op     vs base                │
Add-10        6.009n ± 1%    5.116n ± 5%  -14.85% (p=0.000 n=10)
Multiply-10   19.59n ± 0%    18.00n ± 2%   -8.14% (p=0.000 n=10)
Square-10     18.14n ± 0%    13.66n ± 0%  -24.70% (p=0.000 n=10)
Invert-10     4.854µ ± 0%    3.629µ ± 0%  -25.24% (p=0.000 n=10)
Mult32-10     6.151n ± 0%    6.165n ± 2%        ~ (p=0.224 n=10)
Bytes-10      7.463n ± 1%   10.330n ± 8%  +38.43% (p=0.000 n=10)
geomean       27.94n         25.74n        -7.89%

tags: purego
goos: windows
goarch: amd64
pkg: crypto/internal/fips140/edwards25519
cpu: AMD Ryzen Threadripper 2950X 16-Core Processor
                               │     OLD      │                 NEW                 │
                               │    sec/op    │   sec/op     vs base                │
EncodingDecoding-32              12.856µ ± 0%   9.557µ ± 1%  -25.66% (p=0.000 n=10)
ScalarBaseMult-32                 21.28µ ± 1%   19.14µ ± 2%  -10.04% (p=0.000 n=10)
ScalarMult-32                     74.83µ ± 1%   64.61µ ± 1%  -13.65% (p=0.000 n=10)
VarTimeDoubleScalarBaseMult-32    73.85µ ± 0%   62.36µ ± 1%  -15.56% (p=0.000 n=10)
geomean                           35.06µ        29.30µ       -16.44%

pkg: crypto/internal/fips140/edwards25519/field
            │     OLD     │                 NEW                 │
            │   sec/op    │   sec/op     vs base                │
Add-32        5.700n ± 1%   4.879n ± 1%  -14.40% (p=0.000 n=10)
Multiply-32   29.24n ± 2%   22.75n ± 2%  -22.21% (p=0.000 n=10)
Square-32     23.06n ± 1%   16.46n ± 2%  -28.60% (p=0.000 n=10)
Invert-32     5.952µ ± 2%   4.466µ ± 1%  -24.97% (p=0.000 n=10)
Mult32-32     5.240n ± 1%   5.311n ± 1%   +1.35% (p=0.006 n=10)
Bytes-32      12.39n ± 1%   11.51n ± 1%   -7.10% (p=0.000 n=10)
geomean       33.78n        28.16n       -16.63%

Change-Id: I71fa40307e803caec56227607ee666198e4c0b03
Reviewed-on: https://go-review.googlesource.com/c/go/+/650278
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
src/crypto/internal/fips140/edwards25519/field/fe.go
src/crypto/internal/fips140/edwards25519/field/fe_arm64.go [deleted file]
src/crypto/internal/fips140/edwards25519/field/fe_arm64.s [deleted file]
src/crypto/internal/fips140/edwards25519/field/fe_arm64_noasm.go [deleted file]
src/crypto/internal/fips140/edwards25519/field/fe_generic.go
src/crypto/internal/fips140/edwards25519/field/fe_test.go