[release-branch.go1.19] cmd/compile: sign-extend the 2nd argument of the LoweredAtomicCas32 on loong64,mips64x,riscv64
The function LoweredAtomicCas32 is implemented using the LL-SC instruction pair
on loong64, mips64x, riscv64. However,the LL instruction on loong64, mips64x,
riscv64 is sign-extended, so it is necessary to sign-extend the 2nd parameter
"old" of the LoweredAtomicCas32, so that the instruction BNE after LL can get
the desired result.
The function prototype of LoweredAtomicCas32 in golang:
func Cas32(ptr *uint32, old, new uint32) bool
When using an intrinsify implementation:
case 1: (*ptr) <= 0x80000000 && old < 0x80000000
E.g: (*ptr) = 0x7FFFFFFF, old = Rarg1= 0x7FFFFFFF
After run the instruction "LL (Rarg0), Rtmp": Rtmp = 0x7FFFFFFF
Rtmp ! = Rarg1(old) is false, the result we expect
case 2: (*ptr) >= 0x80000000 && old >= 0x80000000
E.g: (*ptr) = 0x80000000, old = Rarg1= 0x80000000
After run the instruction "LL (Rarg0), Rtmp": Rtmp = 0xFFFFFFFF_80000000
Rtmp ! = Rarg1(old) is true, which we do not expect
When using an non-intrinsify implementation:
Because Rarg1 is loaded from the stack using sign-extended instructions
ld.w, the situation described in Case 2 above does not occur
Benchmarks on linux/loong64:
name old time/op new time/op delta
Cas 50.0ns ± 0% 50.1ns ± 0% ~ (p=1.000 n=1+1)
Cas64 50.0ns ± 0% 50.1ns ± 0% ~ (p=1.000 n=1+1)
Cas-4 56.0ns ± 0% 56.0ns ± 0% ~ (p=1.000 n=1+1)
Cas64-4 56.0ns ± 0% 56.0ns ± 0% ~ (p=1.000 n=1+1)