]>
Cypherpunks repositories - gostls13.git/commit
cmd/compile: emit classify instructions for infinity tests on riscv64
The 'classify' instruction on RISC-V sets a bit in a mask to indicate
the class a floating point value belongs to (e.g. whether the value is
an infinity, a normal number, a subnormal number and so on). There are
other places this instruction is useful but for now I've just used it
for infinity tests.
The gains are relatively small (~1-2 instructions per IsInf call) but
using FCLASSD does potentially unlock further optimizations. It also
reduces the number of loads from memory and the number of moves
between general purpose and floating point register files.
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10)
Acosh 249.8n ± 0% 254.4n ± 0% +1.86% (p=0.000 n=10)
Asin 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10)
Asinh 292.2n ± 0% 283.0n ± 0% -3.15% (p=0.000 n=10)
Atan 119.1n ± 0% 119.0n ± 0% -0.08% (p=0.036 n=10)
Atanh 265.1n ± 0% 271.6n ± 0% +2.43% (p=0.000 n=10)
Atan2 194.9n ± 0% 186.7n ± 0% -4.23% (p=0.000 n=10)
Cbrt 216.3n ± 0% 203.1n ± 0% -6.10% (p=0.000 n=10)
Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.063 n=10)
Copysign 4.897n ± 0% 4.893n ± 3% -0.08% (p=0.038 n=10)
Cos 123.9n ± 0% 107.7n ± 1% -13.03% (p=0.000 n=10)
Cosh 293.0n ± 0% 264.6n ± 0% -9.68% (p=0.000 n=10)
Erf 150.0n ± 0% 133.8n ± 0% -10.80% (p=0.000 n=10)
Erfc 151.8n ± 0% 137.9n ± 0% -9.16% (p=0.000 n=10)
Erfinv 173.8n ± 0% 173.8n ± 0% ~ (p=0.820 n=10)
Erfcinv 173.8n ± 0% 173.8n ± 0% ~ (p=1.000 n=10)
Exp 247.7n ± 0% 220.4n ± 0% -11.04% (p=0.000 n=10)
ExpGo 261.4n ± 0% 232.5n ± 0% -11.04% (p=0.000 n=10)
Expm1 176.2n ± 0% 164.9n ± 0% -6.41% (p=0.000 n=10)
Exp2 220.4n ± 0% 190.2n ± 0% -13.70% (p=0.000 n=10)
Exp2Go 232.5n ± 0% 204.0n ± 0% -12.22% (p=0.000 n=10)
Abs 4.897n ± 0% 4.897n ± 0% ~ (p=0.726 n=10)
Dim 16.32n ± 0% 16.31n ± 0% ~ (p=0.770 n=10)
Floor 31.84n ± 0% 31.83n ± 0% ~ (p=0.677 n=10)
Max 26.11n ± 0% 26.13n ± 0% ~ (p=0.290 n=10)
Min 26.10n ± 0% 26.11n ± 0% ~ (p=0.424 n=10)
Mod 416.2n ± 0% 337.8n ± 0% -18.83% (p=0.000 n=10)
Frexp 63.65n ± 0% 50.60n ± 0% -20.50% (p=0.000 n=10)
Gamma 218.8n ± 0% 206.4n ± 0% -5.62% (p=0.000 n=10)
Hypot 92.20n ± 0% 94.69n ± 0% +2.70% (p=0.000 n=10)
HypotGo 107.7n ± 0% 109.3n ± 0% +1.49% (p=0.000 n=10)
Ilogb 59.54n ± 0% 44.04n ± 0% -26.04% (p=0.000 n=10)
J0 708.9n ± 0% 674.5n ± 0% -4.86% (p=0.000 n=10)
J1 707.6n ± 0% 676.1n ± 0% -4.44% (p=0.000 n=10)
Jn 1.513µ ± 0% 1.427µ ± 0% -5.68% (p=0.000 n=10)
Ldexp 70.20n ± 0% 57.09n ± 0% -18.68% (p=0.000 n=10)
Lgamma 201.5n ± 0% 185.3n ± 1% -8.01% (p=0.000 n=10)
Log 201.5n ± 0% 182.7n ± 0% -9.35% (p=0.000 n=10)
Logb 59.54n ± 0% 46.53n ± 0% -21.86% (p=0.000 n=10)
Log1p 178.8n ± 0% 173.9n ± 6% -2.74% (p=0.021 n=10)
Log10 201.4n ± 0% 184.3n ± 0% -8.49% (p=0.000 n=10)
Log2 79.17n ± 0% 66.07n ± 0% -16.54% (p=0.000 n=10)
Modf 34.27n ± 0% 34.25n ± 0% ~ (p=0.559 n=10)
Nextafter32 49.34n ± 0% 49.37n ± 0% +0.05% (p=0.040 n=10)
Nextafter64 43.66n ± 0% 43.66n ± 0% ~ (p=0.869 n=10)
PowInt 309.1n ± 0% 267.4n ± 0% -13.49% (p=0.000 n=10)
PowFrac 769.6n ± 0% 677.3n ± 0% -11.98% (p=0.000 n=10)
Pow10Pos 13.88n ± 0% 13.88n ± 0% ~ (p=0.811 n=10)
Pow10Neg 19.58n ± 0% 19.57n ± 0% ~ (p=0.993 n=10)
Round 23.65n ± 0% 23.66n ± 0% ~ (p=0.354 n=10)
RoundToEven 27.75n ± 0% 27.75n ± 0% ~ (p=0.971 n=10)
Remainder 380.0n ± 0% 309.9n ± 0% -18.45% (p=0.000 n=10)
Signbit 13.06n ± 0% 13.06n ± 0% ~ (p=1.000 n=10)
Sin 133.8n ± 0% 120.8n ± 0% -9.75% (p=0.000 n=10)
Sincos 160.7n ± 0% 147.7n ± 0% -8.12% (p=0.000 n=10)
Sinh 305.9n ± 0% 277.9n ± 0% -9.17% (p=0.000 n=10)
SqrtIndirect 3.265n ± 0% 3.264n ± 0% ~ (p=0.546 n=10)
SqrtLatency 19.58n ± 0% 19.58n ± 0% ~ (p=0.973 n=10)
SqrtIndirectLatency 19.59n ± 0% 19.58n ± 0% ~ (p=0.370 n=10)
SqrtGoLatency 205.7n ± 0% 202.7n ± 0% -1.46% (p=0.000 n=10)
SqrtPrime 4.953µ ± 0% 4.954µ ± 0% ~ (p=0.477 n=10)
Tan 163.2n ± 0% 150.2n ± 0% -7.99% (p=0.000 n=10)
Tanh 312.4n ± 0% 284.2n ± 0% -9.01% (p=0.000 n=10)
Trunc 31.83n ± 0% 31.83n ± 0% ~ (p=0.663 n=10)
Y0 701.0n ± 0% 669.2n ± 0% -4.54% (p=0.000 n=10)
Y1 704.5n ± 0% 672.4n ± 0% -4.55% (p=0.000 n=10)
Yn 1.490µ ± 0% 1.422µ ± 0% -4.60% (p=0.000 n=10)
Float64bits 5.713n ± 0% 5.710n ± 0% ~ (p=0.926 n=10)
Float64frombits 4.896n ± 0% 4.896n ± 0% ~ (p=0.663 n=10)
Float32bits 12.25n ± 0% 12.25n ± 0% ~ (p=0.571 n=10)
Float32frombits 4.898n ± 0% 4.896n ± 0% ~ (p=0.754 n=10)
FMA 4.895n ± 0% 4.895n ± 0% ~ (p=0.745 n=10)
geomean 94.40n 89.43n -5.27%
Change-Id: I4fe0f2e9f609e38d79463f9ba2519a3f9427432e
Reviewed-on: https://go-review.googlesource.com/c/go/+/348389
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>