crypto/sha512: improve performance of riscv64 assembly
Implement optimised versions of Maj and Ch, which reduce the number of
instructions required per round. Reorder instructions for better
interleaving.
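
The commit message does not say which rewritten forms of Ch and Maj the assembly uses. As a sketch only, the Go code below compares the textbook SHA-2 definitions with two well-known algebraic identities that each need one fewer bitwise operation. The function names (chRef, chOpt, majRef, majOpt, identitiesHold) are hypothetical, and the identities shown are an assumption about the kind of rewrite involved, not a transcription of the riscv64 assembly.

```go
package main

import "fmt"

// chRef is the textbook definition: Ch(x, y, z) = (x AND y) XOR (NOT x AND z).
// Counted naively this is four operations (AND, NOT, AND, XOR).
func chRef(x, y, z uint64) uint64 { return (x & y) ^ (^x & z) }

// chOpt uses the identity Ch(x, y, z) = ((y XOR z) AND x) XOR z,
// which needs three operations (XOR, AND, XOR).
func chOpt(x, y, z uint64) uint64 { return ((y ^ z) & x) ^ z }

// majRef is the textbook definition:
// Maj(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z), five operations.
func majRef(x, y, z uint64) uint64 { return (x & y) ^ (x & z) ^ (y & z) }

// majOpt uses the identity Maj(x, y, z) = ((y XOR z) AND x) XOR (y AND z),
// which needs four operations.
func majOpt(x, y, z uint64) uint64 { return ((y ^ z) & x) ^ (y & z) }

// identitiesHold checks both identities. All operations are bitwise, so the
// patterns 0xF0, 0xCC and 0xAA, whose low eight bit positions enumerate every
// combination of (x, y, z) bits, are enough to cover all cases.
func identitiesHold() bool {
	const x, y, z uint64 = 0xF0, 0xCC, 0xAA
	return chOpt(x, y, z) == chRef(x, y, z) && majOpt(x, y, z) == majRef(x, y, z)
}

func main() {
	fmt.Println("identities hold:", identitiesHold())
}
```

SHA-512 runs 80 rounds per 128-byte block, so saving even one instruction in each of Ch and Maj per round adds up quickly.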
This gives around a 10% gain on a StarFive VisionFive 2:
                    │  sha512.1   │              sha512.2               │
                    │   sec/op    │   sec/op     vs base                │
Hash8Bytes/New-4      9.310µ ± 0%   8.564µ ± 0%   -8.01% (p=0.000 n=10)
Hash8Bytes/Sum384-4   8.833µ ± 0%   7.980µ ± 0%   -9.66% (p=0.000 n=10)
Hash8Bytes/Sum512-4   9.293µ ± 0%   8.162µ ± 0%  -12.17% (p=0.000 n=10)
Hash1K/New-4          49.60µ ± 0%   44.33µ ± 0%  -10.63% (p=0.000 n=10)
Hash1K/Sum384-4       48.93µ ± 0%   43.78µ ± 0%  -10.53% (p=0.000 n=10)
Hash1K/Sum512-4       49.48µ ± 0%   43.96µ ± 0%  -11.15% (p=0.000 n=10)
Hash8K/New-4          327.9µ ± 0%   292.6µ ± 0%  -10.78% (p=0.000 n=10)
Hash8K/Sum384-4       327.3µ ± 0%   292.0µ ± 0%  -10.77% (p=0.000 n=10)
Hash8K/Sum512-4       327.8µ ± 0%   292.2µ ± 0%  -10.85% (p=0.000 n=10)
geomean               52.87µ        47.31µ       -10.51%
                    │   sha512.1   │               sha512.2               │
                    │     B/s      │     B/s       vs base                │
Hash8Bytes/New-4      839.8Ki ± 0%   908.2Ki ± 0%   +8.14% (p=0.000 n=10)
Hash8Bytes/Sum384-4   888.7Ki ± 1%   976.6Ki ± 0%   +9.89% (p=0.000 n=10)
Hash8Bytes/Sum512-4   839.8Ki ± 0%   957.0Ki ± 0%  +13.95% (p=0.000 n=10)
Hash1K/New-4          19.69Mi ± 0%   22.03Mi ± 0%  +11.86% (p=0.000 n=10)
Hash1K/Sum384-4       19.96Mi ± 0%   22.31Mi ± 0%  +11.75% (p=0.000 n=10)
Hash1K/Sum512-4       19.74Mi ± 0%   22.21Mi ± 0%  +12.51% (p=0.000 n=10)
Hash8K/New-4          23.82Mi ± 0%   26.70Mi ± 0%  +12.09% (p=0.000 n=10)
Hash8K/Sum384-4       23.87Mi ± 0%   26.75Mi ± 0%  +12.07% (p=0.000 n=10)
Hash8K/Sum512-4       23.83Mi ± 0%   26.73Mi ± 0%  +12.16% (p=0.000 n=10)
geomean               7.334Mi        8.184Mi       +11.59%
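
The time and throughput tables report different percentages for the same change because one is the reciprocal of the other: a reduction in sec/op of d corresponds to a throughput gain of old/new - 1, not d. The sketch below shows that arithmetic on two rows from the tables above. The helper names (pctChange, throughputGain) are hypothetical, and the results only approximate the reported throughput figures, since the tool works from its own per-run measurements rather than from the rounded medians shown here.

```go
package main

import "fmt"

// pctChange returns the relative change from old to new, in percent.
// For sec/op a negative value means the new code is faster.
func pctChange(old, new float64) float64 {
	return (new - old) / old * 100
}

// throughputGain converts a pair of per-operation times into the
// corresponding throughput change in percent (throughput is 1/time).
func throughputGain(oldTime, newTime float64) float64 {
	return (oldTime/newTime - 1) * 100
}

func main() {
	// Hash8Bytes/New-4, sec/op in microseconds, taken from the table above.
	fmt.Printf("time change:     %+.2f%%\n", pctChange(9.310, 8.564))
	// Hash8K/Sum512-4, sec/op in microseconds, taken from the table above.
	fmt.Printf("throughput gain: %+.2f%%\n", throughputGain(327.8, 292.2))
}
```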
Change-Id: I66e359e96b25b38efbc4d840e6b2d6a1e5d417ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/605495
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>