cmd/compile: add prefetch intrinsic support on loong64
This CL enables intrinsic support to emit the following prefetch
instructions for loong64 platform:
1.Prefetch - prefetches data from memory address to cache;
2.PrefetchStreamed - prefetches data from memory address, with a
hint that this data is being streamed.
Benchmarks picked from go/test/bench/garbage
Parameters tested with:
GOMAXPROCS=8
tree2 -heapsize=
1000000000 -cpus=8
tree -n=18
parser
peano
Benchmarks Loongson-3A6000-HV @ 2500.00MHz:
| bench.old | bench.new |
| sec/op | sec/op vs base |
Tree2-8 1238.2µ ± 24% 999.9µ ± 453% ~ (p=0.089 n=10)
Tree-8 277.4m ± 1% 275.5m ± 1% ~ (p=0.063 n=10)
Parser-8 3.564 ± 0% 3.509 ± 1% -1.56% (p=0.000 n=10)
Peano-8 39.12m ± 2% 38.85m ± 2% ~ (p=0.353 n=10)
geomean 83.19m 78.28m -5.90%
Change-Id: I59e9aa4f609a106d4f70706e6d6d1fe6738ab72a
Reviewed-on: https://go-review.googlesource.com/c/go/+/671876
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>