math/big: implement addVV in riscv64 assembly
This provides an assembly implementation of addVV for riscv64 that
processes up to four words per loop iteration, reducing the geomean
time per operation by about 37% in the benchmarks below.
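For readers unfamiliar with the technique, the four-words-per-loop structure can be sketched in portable Go. This is only an illustration, not the assembly added by this change: the name addVV4 is hypothetical, and the sketch assumes x and y are at least as long as z. It uses bits.Add from math/bits to propagate the carry between words.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVV4 sets z = x + y word by word and returns the final carry.
// Hypothetical illustration: x and y must be at least as long as z.
func addVV4(z, x, y []uint) (carry uint) {
	n := len(z)
	x, y = x[:n], y[:n] // fail early if the inputs are too short

	i := 0
	// Main loop: four words per iteration, carry chained through each add.
	for ; i+4 <= n; i += 4 {
		z[i], carry = bits.Add(x[i], y[i], carry)
		z[i+1], carry = bits.Add(x[i+1], y[i+1], carry)
		z[i+2], carry = bits.Add(x[i+2], y[i+2], carry)
		z[i+3], carry = bits.Add(x[i+3], y[i+3], carry)
	}
	// Tail loop: the remaining zero to three words, one at a time.
	for ; i < n; i++ {
		z[i], carry = bits.Add(x[i], y[i], carry)
	}
	return carry
}

func main() {
	x := []uint{^uint(0), ^uint(0), 1, 2, 3}
	y := []uint{1, 0, 0, 0, 0}
	z := make([]uint, len(x))
	c := addVV4(z, x, y)
	fmt.Println(z, c) // prints "[0 0 2 2 3] 0"
}
```

Unrolling amortizes the loop-counter and branch overhead over four additions, which is consistent with the larger gains the benchmarks show for vectors of four or more words.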
On a StarFive VisionFive 2:
                │   addvv.1    │               addvv.2               │
                │    sec/op    │    sec/op     vs base               │
AddVV/1-4         73.45n ± 0%    48.08n ± 0%  -34.54% (p=0.000 n=10)
AddVV/2-4         88.14n ± 0%    58.76n ± 0%  -33.33% (p=0.000 n=10)
AddVV/3-4        102.80n ± 0%    69.44n ± 0%  -32.45% (p=0.000 n=10)
AddVV/4-4        117.50n ± 0%    72.18n ± 0%  -38.57% (p=0.000 n=10)
AddVV/5-4        132.20n ± 0%    82.79n ± 0%  -37.38% (p=0.000 n=10)
AddVV/10-4        216.3n ± 0%    126.8n ± 0%  -41.35% (p=0.000 n=10)
AddVV/100-4      1659.0n ± 0%    885.2n ± 0%  -46.64% (p=0.000 n=10)
AddVV/1000-4     16.089µ ± 0%    8.400µ ± 0%  -47.79% (p=0.000 n=10)
AddVV/10000-4     245.3µ ± 0%    176.9µ ± 0%  -27.88% (p=0.000 n=10)
AddVV/100000-4    2.537m ± 0%    1.873m ± 0%  -26.17% (p=0.000 n=10)
geomean           1.435µ         904.5n       -36.99%

                │    addvv.1    │               addvv.2                │
                │      B/s      │      B/s      vs base               │
AddVV/1-4         830.9Mi ± 0%   1269.5Mi ± 0%  +52.78% (p=0.000 n=10)
AddVV/2-4         1.353Gi ± 0%    2.029Gi ± 0%  +50.00% (p=0.000 n=10)
AddVV/3-4         1.739Gi ± 0%    2.575Gi ± 0%  +48.09% (p=0.000 n=10)
AddVV/4-4         2.029Gi ± 0%    3.303Gi ± 0%  +62.82% (p=0.000 n=10)
AddVV/5-4         2.254Gi ± 0%    3.600Gi ± 0%  +59.69% (p=0.000 n=10)
AddVV/10-4        2.755Gi ± 0%    4.699Gi ± 0%  +70.54% (p=0.000 n=10)
AddVV/100-4       3.594Gi ± 0%    6.734Gi ± 0%  +87.37% (p=0.000 n=10)
AddVV/1000-4      3.705Gi ± 0%    7.096Gi ± 0%  +91.54% (p=0.000 n=10)
AddVV/10000-4     2.430Gi ± 0%    3.369Gi ± 0%  +38.65% (p=0.000 n=10)
AddVV/100000-4    2.350Gi ± 0%    3.183Gi ± 0%  +35.44% (p=0.000 n=10)
geomean           2.119Gi         3.364Gi       +58.71%

Change-Id: I727b3d9f8ab01eada7270046480b1430d56d0a96
Reviewed-on: https://go-review.googlesource.com/c/go/+/595395
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: M Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>