]> Cypherpunks repositories - gostls13.git/commit
cmd/compiler,internal/runtime/atomic: optimize Cas{64,32} on loong64
authorGuoqi Chen <chenguoqi@loongson.cn>
Fri, 20 Sep 2024 03:06:18 +0000 (11:06 +0800)
committerabner chenc <chenguoqi@loongson.cn>
Tue, 19 Nov 2024 01:15:07 +0000 (01:15 +0000)
commit5432cd96fd951bce01bbce9f9744b62871f79b17
treec39427705c0f47fd68575225c3509d4e9c9b824b
parentec7824b6bb12481e7ffe50b7f1cbaa1faf465a44
cmd/compiler,internal/runtime/atomic: optimize Cas{64,32} on loong64

In Loongson's new microstructure LA664 (Loongson-3A6000) and later, the atomic
compare-and-exchange instruction AMCAS[DB]{B,W,H,V} [1] is supported. Therefore,
the implementation of the atomic operation compare-and-swap can be selected according
to the CPUCFG flag LAMCAS: AMCASDB(full barrier) instruction is used on new
microstructures, and traditional LL-SC is used on LA464 (Loongson-3A5000) and older
microstructures. This can significantly improve the performance of Go programs on
new microstructures.

goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A6000 @ 2500.00MHz
         |  bench.old   |  bench.new                           |
         |   sec/op     |   sec/op       vs base               |
Cas        46.84n ±  0%   22.82n ±  0%  -51.28% (p=0.000 n=20)
Cas-2      47.58n ±  0%   29.57n ±  0%  -37.85% (p=0.000 n=20)
Cas-4      43.27n ± 20%   25.31n ± 13%  -41.50% (p=0.000 n=20)
Cas64      46.85n ±  0%   22.82n ±  0%  -51.29% (p=0.000 n=20)
Cas64-2    47.43n ±  0%   29.53n ±  0%  -37.74% (p=0.002 n=20)
Cas64-4    43.18n ±  0%   25.28n ±  2%  -41.46% (p=0.000 n=20)
geomean    45.82n         25.74n        -43.82%

goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A5000 @ 2500.00MHz
         |  bench.old  |  bench.new                         |
         |   sec/op    |   sec/op      vs base              |
Cas        50.05n ± 0%   51.26n ± 0%  +2.42% (p=0.000 n=20)
Cas-2      52.80n ± 0%   53.11n ± 0%  +0.59% (p=0.000 n=20)
Cas-4      55.97n ± 0%   57.31n ± 0%  +2.39% (p=0.000 n=20)
Cas64      50.05n ± 0%   51.26n ± 0%  +2.42% (p=0.000 n=20)
Cas64-2    52.68n ± 0%   53.11n ± 0%  +0.82% (p=0.000 n=20)
Cas64-4    55.96n ± 0%   57.26n ± 0%  +2.33% (p=0.000 n=20)
geomean    52.86n        53.83n       +1.82%

[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html

Change-Id: I9b777c63c124fb492f61c903f77061fa2b4e5322
Reviewed-on: https://go-review.googlesource.com/c/go/+/613396
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
15 files changed:
src/cmd/compile/internal/ir/symtab.go
src/cmd/compile/internal/loong64/ssa.go
src/cmd/compile/internal/ssa/_gen/LOONG64.rules
src/cmd/compile/internal/ssa/_gen/LOONG64Ops.go
src/cmd/compile/internal/ssa/opGen.go
src/cmd/compile/internal/ssa/rewriteLOONG64.go
src/cmd/compile/internal/ssagen/intrinsics.go
src/cmd/compile/internal/ssagen/ssa.go
src/cmd/compile/internal/typecheck/_builtin/runtime.go
src/cmd/compile/internal/typecheck/builtin.go
src/cmd/internal/goobj/builtinlist.go
src/internal/runtime/atomic/atomic_loong64.go
src/internal/runtime/atomic/atomic_loong64.s
src/runtime/cpuflags.go
src/runtime/proc.go