math/big: implement subVV in riscv64 assembly
This provides an assembly implementation of subVV for riscv64 that
processes up to four words per loop iteration, reducing time per
operation by about 37% (geomean) in the benchmarks below.
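
For readers unfamiliar with the routine, the following Go sketch illustrates the general idea of a word-wise subtraction with borrow propagation, unrolled four words per iteration. It is only an illustration under stated assumptions: the name subVVSketch is hypothetical, the code uses math/bits.Sub rather than riscv64 instructions, and it is not the assembly added by this change.

```go
package main

import (
	"fmt"
	"math/bits"
)

// subVVSketch is a hypothetical, pure-Go illustration (not the riscv64
// assembly from this change). It computes z = x - y word by word and
// returns the final borrow (0 or 1). x and y must be at least len(z) long.
func subVVSketch(z, x, y []uint) (borrow uint) {
	n := len(z)
	x, y = x[:n], y[:n] // let the compiler see matching lengths

	i := 0
	// Main loop: four words per iteration, borrow chained through each step.
	for ; i+4 <= n; i += 4 {
		z[i], borrow = bits.Sub(x[i], y[i], borrow)
		z[i+1], borrow = bits.Sub(x[i+1], y[i+1], borrow)
		z[i+2], borrow = bits.Sub(x[i+2], y[i+2], borrow)
		z[i+3], borrow = bits.Sub(x[i+3], y[i+3], borrow)
	}
	// Tail loop: the remaining zero to three words, one at a time.
	for ; i < n; i++ {
		z[i], borrow = bits.Sub(x[i], y[i], borrow)
	}
	return borrow
}

func main() {
	// Little-endian words: x has only its fifth word set, y is 1.
	x := []uint{0, 0, 0, 0, 1}
	y := []uint{1, 0, 0, 0, 0}
	z := make([]uint, len(x))
	borrow := subVVSketch(z, x, y)
	fmt.Println(z, borrow)
}
```

Unrolling amortizes the loop counter and branch overhead over four words, which is consistent with the larger relative gains seen at mid-sized inputs in the benchmarks.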
On a StarFive VisionFive 2:
                │   subvv.1    │              subvv.2               │
                │    sec/op    │   sec/op     vs base               │
SubVV/1-4         73.46n ± 0%  48.08n ± 0%  -34.55% (p=0.000 n=10)
SubVV/2-4         88.13n ± 0%  58.76n ± 0%  -33.33% (p=0.000 n=10)
SubVV/3-4        102.80n ± 0%  69.45n ± 0%  -32.44% (p=0.000 n=10)
SubVV/4-4        117.50n ± 0%  72.11n ± 0%  -38.63% (p=0.000 n=10)
SubVV/5-4        132.20n ± 0%  82.80n ± 0%  -37.37% (p=0.000 n=10)
SubVV/10-4        216.3n ± 0%  126.9n ± 0%  -41.33% (p=0.000 n=10)
SubVV/100-4      1659.0n ± 0%  886.5n ± 0%  -46.56% (p=0.000 n=10)
SubVV/1000-4     16.089µ ± 0%  8.401µ ± 0%  -47.78% (p=0.000 n=10)
SubVV/10000-4     244.7µ ± 0%  176.8µ ± 0%  -27.74% (p=0.000 n=10)
SubVV/100000-4    2.562m ± 0%  1.871m ± 0%  -26.96% (p=0.000 n=10)
geomean           1.436µ       904.4n       -37.04%

                │   subvv.1    │               subvv.2                │
                │     B/s      │     B/s       vs base                │
SubVV/1-4        830.9Mi ± 0%  1269.5Mi ± 0%  +52.79% (p=0.000 n=10)
SubVV/2-4        1.353Gi ± 0%   2.029Gi ± 0%  +49.99% (p=0.000 n=10)
SubVV/3-4        1.739Gi ± 0%   2.575Gi ± 0%  +48.06% (p=0.000 n=10)
SubVV/4-4        2.029Gi ± 0%   3.306Gi ± 0%  +62.96% (p=0.000 n=10)
SubVV/5-4        2.254Gi ± 0%   3.600Gi ± 0%  +59.67% (p=0.000 n=10)
SubVV/10-4       2.755Gi ± 0%   4.699Gi ± 0%  +70.53% (p=0.000 n=10)
SubVV/100-4      3.594Gi ± 0%   6.723Gi ± 0%  +87.08% (p=0.000 n=10)
SubVV/1000-4     3.705Gi ± 0%   7.095Gi ± 0%  +91.52% (p=0.000 n=10)
SubVV/10000-4    2.436Gi ± 0%   3.372Gi ± 0%  +38.39% (p=0.000 n=10)
SubVV/100000-4   2.327Gi ± 0%   3.185Gi ± 0%  +36.91% (p=0.000 n=10)
geomean          2.118Gi        3.364Gi       +58.84%

Change-Id: I361cb3f4195b27a9f1e9486c9e1fdbeaa94d32b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/595396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>