Cypherpunks repositories - gostls13.git/commit
runtime: improve CALLFN macro for loong64
The previous CALLFN macro copied arguments one byte at a time,
which is inefficient on loong64. This CL copies 16 bytes or 8 bytes
at a time, depending on argsize, and copies any remaining bytes
one at a time.
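
As a rough illustration of the copy strategy described above, here is a minimal Go sketch. The actual change is written in loong64 assembly inside the CALLFN macro, and its exact instruction sequence is not shown in this message; the function name copyChunked and the order of the loops (16-byte blocks, then 8-byte blocks, then single bytes) are assumptions made for illustration only.

```go
package main

import "fmt"

// copyChunked is a hypothetical helper, not part of the runtime.
// It copies min(len(dst), len(src)) bytes from src to dst, first in
// 16-byte blocks, then in 8-byte blocks, then byte by byte for the
// remaining tail. It returns the number of bytes copied.
func copyChunked(dst, src []byte) int {
	n := len(src)
	if len(dst) < n {
		n = len(dst)
	}
	i := 0
	// Wide blocks: 16 bytes per iteration.
	for ; n-i >= 16; i += 16 {
		copy(dst[i:i+16], src[i:i+16])
	}
	// Medium blocks: 8 bytes per iteration.
	for ; n-i >= 8; i += 8 {
		copy(dst[i:i+8], src[i:i+8])
	}
	// Tail: at most 7 bytes, one at a time.
	for ; i < n; i++ {
		dst[i] = src[i]
	}
	return n
}

func main() {
	src := make([]byte, 43) // 2*16 + 1*8 + 3 bytes
	for i := range src {
		src[i] = byte(i)
	}
	dst := make([]byte, len(src))
	n := copyChunked(dst, src)
	fmt.Println(n, dst[0], dst[len(dst)-1])
}
```

With wider blocks, the number of loop iterations for a large argument frame drops by roughly a factor of 16 compared with a byte-at-a-time loop, which is consistent with the larger gains at larger sizes in the benchmarks below.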
Benchmark results for package reflect on the 3A6000 and 3A5000:
goos: linux
goarch: loong64
pkg: reflect
cpu: Loongson-3A6000 @ 2500.00MHz
                        |  bench.old   |               bench.new               |
                        |    sec/op    |    sec/op      vs base                |
CallArgCopy/size=128       360.2n ± 0%    266.9n ± 0%   -25.90% (p=0.000 n=20)
CallArgCopy/size=256       473.2n ± 0%    277.5n ± 0%   -41.35% (p=0.000 n=20)
CallArgCopy/size=1024     1128.0n ± 0%    332.9n ± 0%   -70.49% (p=0.000 n=20)
CallArgCopy/size=4096     3743.0n ± 0%    672.6n ± 0%   -82.03% (p=0.000 n=20)
CallArgCopy/size=65536    58.888µ ± 0%    9.667µ ± 0%   -83.58% (p=0.000 n=20)
geomean                    2.116µ         693.4n        -67.22%
                        |  bench.old   |               bench.new                |
                        |     B/s      |      B/s       vs base                 |
CallArgCopy/size=128      338.9Mi ± 0%    457.3Mi ± 0%   +34.94% (p=0.000 n=20)
CallArgCopy/size=256      516.0Mi ± 0%    879.8Mi ± 0%   +70.52% (p=0.000 n=20)
CallArgCopy/size=1024     865.5Mi ± 0%   2933.6Mi ± 0%  +238.94% (p=0.000 n=20)
CallArgCopy/size=4096     1.019Gi ± 0%    5.672Gi ± 0%  +456.52% (p=0.000 n=20)
CallArgCopy/size=65536    1.036Gi ± 0%    6.313Gi ± 0%  +509.13% (p=0.000 n=20)
geomean                   699.6Mi         2.085Gi       +205.10%
goos: linux
goarch: loong64
pkg: reflect
cpu: Loongson-3A5000 @ 2500.00MHz
                        |  bench.old   |               bench.new               |
                        |    sec/op    |    sec/op      vs base                |
CallArgCopy/size=128       466.6n ± 0%    368.7n ± 0%   -20.98% (p=0.000 n=20)
CallArgCopy/size=256       579.4n ± 0%    384.6n ± 0%   -33.62% (p=0.000 n=20)
CallArgCopy/size=1024     1273.0n ± 0%    492.0n ± 0%   -61.35% (p=0.000 n=20)
CallArgCopy/size=4096     4049.0n ± 0%    978.1n ± 0%   -75.84% (p=0.000 n=20)
CallArgCopy/size=65536     69.01µ ± 0%    14.50µ ± 0%   -78.99% (p=0.000 n=20)
geomean                    2.492µ         997.9n        -59.96%
                        |  bench.old   |               bench.new                |
                        |     B/s      |      B/s       vs base                 |
CallArgCopy/size=128      261.6Mi ± 0%    331.0Mi ± 0%   +26.54% (p=0.000 n=20)
CallArgCopy/size=256      421.4Mi ± 0%    634.8Mi ± 0%   +50.66% (p=0.000 n=20)
CallArgCopy/size=1024     767.2Mi ± 0%   1985.0Mi ± 0%  +158.75% (p=0.000 n=20)
CallArgCopy/size=4096     964.8Mi ± 0%   3993.8Mi ± 0%  +313.95% (p=0.000 n=20)
CallArgCopy/size=65536    905.7Mi ± 0%   4310.6Mi ± 0%  +375.97% (p=0.000 n=20)
geomean                   593.9Mi         1.449Gi       +149.76%
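
For readers cross-checking the tables, the following Go sketch shows how the columns appear to relate: "vs base" as the relative change from old to new, and B/s as bytes per operation divided by time per operation. The helper names deltaPercent and mibPerSec are hypothetical, and the assumption that the size in the benchmark name equals the bytes counted per operation is inferred from the figures, not stated in this message.

```go
package main

import "fmt"

// deltaPercent returns the relative change from oldV to newV, in percent.
func deltaPercent(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

// mibPerSec converts bytes per operation and nanoseconds per operation
// into throughput in MiB/s (1 MiB = 1<<20 bytes).
func mibPerSec(bytesPerOp, nsPerOp float64) float64 {
	return bytesPerOp / (nsPerOp * 1e-9) / (1 << 20)
}

func main() {
	// 3A6000, CallArgCopy/size=128: 360.2 ns/op before, 266.9 ns/op after.
	fmt.Printf("%.2f%%\n", deltaPercent(360.2, 266.9))
	fmt.Printf("%.1f MiB/s\n", mibPerSec(128, 360.2))
	fmt.Printf("%.1f MiB/s\n", mibPerSec(128, 266.9))
}
```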
Change-Id: I9570395af80b2e4b760058098a1b5b07d4b37ad7
Reviewed-on: https://go-review.googlesource.com/c/go/+/627175
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>