Cypherpunks repositories - gostls13.git/commit
crypto/internal/fips140/edwards25519/field: optimize carryPropagate

Using the pure Go implementation on ARM64 appears to perform better when
the order of operations is slightly tweaked.
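The sketch below shows what "tweaking the operation order" can mean for carry propagation. It assumes a field element modulo 2^255 - 19 stored as five 51-bit limbs. The type `element`, the methods `carryPropagateTemps` and `carryPropagateReordered`, and the `add` helper are all hypothetical. The reordered variant illustrates the kind of change described above and is not necessarily the exact code in this CL.

Both variants compute identical limbs. The first reads all five carries into temporaries before writing anything. The second updates l0 first and then walks from l4 down to l1, so it only needs to save the original l0. The `add` helper is assumed to mirror the shape of a limb-wise field addition: five additions followed by one carry pass. That shape would make addition especially sensitive to this step.

```go
package main

import (
	"fmt"
	"math/rand"
)

// maskLow51Bits keeps the low 51 bits of a limb.
const maskLow51Bits uint64 = (1 << 51) - 1

// element is a hypothetical stand-in for a field element modulo
// p = 2^255 - 19, stored as five 51-bit limbs:
// l0 + l1*2^51 + l2*2^102 + l3*2^153 + l4*2^204.
type element struct {
	l0, l1, l2, l3, l4 uint64
}

// carryPropagateTemps reads all five carries into temporaries first, then
// folds each one into the next limb. The carry out of l4 wraps around into
// l0 multiplied by 19, because 2^255 is congruent to 19 modulo p.
func (v *element) carryPropagateTemps() *element {
	c0 := v.l0 >> 51
	c1 := v.l1 >> 51
	c2 := v.l2 >> 51
	c3 := v.l3 >> 51
	c4 := v.l4 >> 51

	v.l0 = v.l0&maskLow51Bits + c4*19
	v.l1 = v.l1&maskLow51Bits + c0
	v.l2 = v.l2&maskLow51Bits + c1
	v.l3 = v.l3&maskLow51Bits + c2
	v.l4 = v.l4&maskLow51Bits + c3
	return v
}

// carryPropagateReordered computes the same limbs, but updates l0 first and
// then walks from l4 down to l1, so each remaining carry is read before the
// limb it comes from is overwritten. Only the original l0 has to be saved.
func (v *element) carryPropagateReordered() *element {
	l0 := v.l0
	v.l0 = v.l0&maskLow51Bits + (v.l4>>51)*19
	v.l4 = v.l4&maskLow51Bits + v.l3>>51
	v.l3 = v.l3&maskLow51Bits + v.l2>>51
	v.l2 = v.l2&maskLow51Bits + v.l1>>51
	v.l1 = v.l1&maskLow51Bits + l0>>51
	return v
}

// add (hypothetical) sets v = a + b limb by limb and then normalizes the
// limbs with a single carry-propagation pass.
func (v *element) add(a, b *element) *element {
	v.l0 = a.l0 + b.l0
	v.l1 = a.l1 + b.l1
	v.l2 = a.l2 + b.l2
	v.l3 = a.l3 + b.l3
	v.l4 = a.l4 + b.l4
	return v.carryPropagateReordered()
}

func main() {
	// Compare both orderings on random inputs, including limbs far wider
	// than 51 bits.
	r := rand.New(rand.NewSource(1))
	for i := 0; i < 1000; i++ {
		in := element{r.Uint64(), r.Uint64(), r.Uint64(), r.Uint64(), r.Uint64()}
		a, b := in, in
		a.carryPropagateTemps()
		b.carryPropagateReordered()
		if a != b {
			panic("orderings disagree")
		}
	}
	fmt.Println("both orderings agree")
}
```

Running it prints `both orderings agree`. Whether the reordering is faster depends on the compiler and the target. The benchmark results below are the actual evidence.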

goos: linux
goarch: arm64
pkg: crypto/internal/fips140/edwards25519
                               │     OLD     │                 NEW                 │
                               │ sec/op      │  sec/op       vs base               │
EncodingDecoding-4               158.7µ ± 0%    141.4µ ± 0%  -10.88% (p=0.000 n=10)
ScalarBaseMult-4                 281.2µ ± 0%    260.5µ ± 0%   -7.35% (p=0.000 n=10)
ScalarMult-4                    1008.9µ ± 0%    916.6µ ± 0%   -9.15% (p=0.000 n=10)
VarTimeDoubleScalarBaseMult-4   1003.4µ ± 0%    909.6µ ± 0%   -9.36% (p=0.000 n=10)
geomean                          461.0µ         418.6µ        -9.19%

pkg: crypto/internal/fips140/edwards25519/field
                               │     OLD     │                 NEW                 │
                               │ sec/op      │  sec/op       vs base               │
Add-4                            45.22n ± 0%    33.50n ± 0%  -25.91% (p=0.000 n=10)
Multiply-4                       454.0n ± 0%    406.8n ± 0%  -10.41% (p=0.000 n=10)
Square-4                         278.2n ± 0%    246.4n ± 0%  -11.43% (p=0.000 n=10)
Invert-4                         75.83µ ± 0%    67.37µ ± 0%  -11.16% (p=0.000 n=10)
Mult32-4                         78.66n ± 0%    78.68n ± 0%   +0.02% (p=0.022 n=10)
Bytes-4                          120.6n ± 0%    110.6n ± 0%   -8.25% (p=0.000 n=10)
geomean                          400.2n         354.0n       -11.54%

goos: darwin
goarch: arm64
pkg: crypto/internal/fips140/edwards25519
cpu: Apple M1 Pro
                               │     OLD     │                 NEW                 │
                               │ sec/op      │  sec/op       vs base               │
EncodingDecoding-10             10.095µ ± 0%    7.610µ ± 2%  -24.62% (p=0.000 n=10)
ScalarBaseMult-10                12.65µ ± 0%    11.54µ ± 0%   -8.80% (p=0.000 n=10)
ScalarMult-10                    51.49µ ± 0%    38.59µ ± 2%  -25.06% (p=0.000 n=10)
VarTimeDoubleScalarBaseMult-10   49.41µ ± 0%    37.10µ ± 0%  -24.92% (p=0.000 n=10)
geomean                          23.88µ         18.83µ       -21.14%

pkg: crypto/internal/fips140/edwards25519/field
                               │     OLD     │                 NEW                 │
                               │ sec/op      │  sec/op       vs base               │
Add-10                           6.009n ± 1%    5.116n ± 5%  -14.85% (p=0.000 n=10)
Multiply-10                      19.59n ± 0%    18.00n ± 2%   -8.14% (p=0.000 n=10)
Square-10                        18.14n ± 0%    13.66n ± 0%  -24.70% (p=0.000 n=10)
Invert-10                        4.854µ ± 0%    3.629µ ± 0%  -25.24% (p=0.000 n=10)
Mult32-10                        6.151n ± 0%    6.165n ± 2%        ~ (p=0.224 n=10)
Bytes-10                         7.463n ± 1%   10.330n ± 8%  +38.43% (p=0.000 n=10)
geomean                          27.94n         25.74n        -7.89%

tags: purego
goos: windows
goarch: amd64
pkg: crypto/internal/fips140/edwards25519
cpu: AMD Ryzen Threadripper 2950X 16-Core Processor
                               │     OLD     │                 NEW                 │
                               │ sec/op      │  sec/op       vs base               │
EncodingDecoding-32             12.856µ ± 0%    9.557µ ± 1%  -25.66% (p=0.000 n=10)
ScalarBaseMult-32                21.28µ ± 1%    19.14µ ± 2%  -10.04% (p=0.000 n=10)
ScalarMult-32                    74.83µ ± 1%    64.61µ ± 1%  -13.65% (p=0.000 n=10)
VarTimeDoubleScalarBaseMult-32   73.85µ ± 0%    62.36µ ± 1%  -15.56% (p=0.000 n=10)
geomean                          35.06µ         29.30µ       -16.44%

pkg: crypto/internal/fips140/edwards25519/field
                               │     OLD     │                 NEW                 │
                               │ sec/op      │  sec/op       vs base               │
Add-32                           5.700n ± 1%    4.879n ± 1%  -14.40% (p=0.000 n=10)
Multiply-32                      29.24n ± 2%    22.75n ± 2%  -22.21% (p=0.000 n=10)
Square-32                        23.06n ± 1%    16.46n ± 2%  -28.60% (p=0.000 n=10)
Invert-32                        5.952µ ± 2%    4.466µ ± 1%  -24.97% (p=0.000 n=10)
Mult32-32                        5.240n ± 1%    5.311n ± 1%   +1.35% (p=0.006 n=10)
Bytes-32                         12.39n ± 1%    11.51n ± 1%   -7.10% (p=0.000 n=10)
geomean                          33.78n         28.16n       -16.63%

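Each `geomean` row summarizes its table with a single number. The sketch below recomputes the linux/arm64 crypto/internal/fips140/edwards25519 summary row from the medians printed in that table. It assumes the summary is the geometric mean of the per-benchmark medians. The `geomean` helper is hypothetical and is not benchstat's API. Because the geometric mean of the ratios equals the ratio of the geometric means, the percentage can be derived either way.

```go
package main

import (
	"fmt"
	"math"
)

// geomean returns the geometric mean of xs, computed as exp(mean(ln x)).
// All values must be positive and xs must be non-empty.
func geomean(xs []float64) float64 {
	var sumLog float64
	for _, x := range xs {
		sumLog += math.Log(x)
	}
	return math.Exp(sumLog / float64(len(xs)))
}

func main() {
	// Median sec/op values in µs for EncodingDecoding, ScalarBaseMult,
	// ScalarMult and VarTimeDoubleScalarBaseMult on linux/arm64.
	oldUs := []float64{158.7, 281.2, 1008.9, 1003.4}
	newUs := []float64{141.4, 260.5, 916.6, 909.6}

	gOld, gNew := geomean(oldUs), geomean(newUs)
	change := (gNew/gOld - 1) * 100 // relative change in percent

	fmt.Printf("geomean %.1fµ %.1fµ %+.2f%%\n", gOld, gNew, change)
}
```

The recomputed means match the table's 461.0µ and 418.6µ. The percentage, recomputed from these rounded medians, lands near -9.2%. It can therefore differ from benchstat's -9.19% in the last digit, since the printed table values are rounded.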
Change-Id: I71fa40307e803caec56227607ee666198e4c0b03
Reviewed-on: https://go-review.googlesource.com/c/go/+/650278
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>