]> Cypherpunks repositories - gostls13.git/commit
cmd/compile, runtime: intrinsify atomic And8 and Or8 on s390x
authorMichael Munday <mike.munday@ibm.com>
Wed, 23 Oct 2019 13:43:23 +0000 (06:43 -0700)
committerBrad Fitzpatrick <bradfitz@golang.org>
Mon, 11 Nov 2019 15:23:59 +0000 (15:23 +0000)
commitb3885dbc93ceae1b12f7e80edd2696baf566edec
treec7f9c66dbe89b891b512347b0e56cf69d253a387
parent75c839af22a50cb027766ea54335e234dac32836
cmd/compile, runtime: intrinsify atomic And8 and Or8 on s390x

Intrinsify these functions to match other platforms. Update the
sequence of instructions used in the assembly implementations to
match the intrinsics.

Also, add a micro benchmark so we can more easily measure the
performance of these two functions:

name            old time/op  new time/op  delta
And8-8          5.33ns ± 7%  2.55ns ± 8%  -52.12%  (p=0.000 n=20+20)
And8Parallel-8  7.39ns ± 5%  3.74ns ± 4%  -49.34%  (p=0.000 n=20+20)
Or8-8           4.84ns ±15%  2.64ns ±11%  -45.50%  (p=0.000 n=20+20)
Or8Parallel-8   7.27ns ± 3%  3.84ns ± 4%  -47.10%  (p=0.000 n=19+20)

By using a 'rotate then xor selected bits' instruction combined with
either a 'load and and' or a 'load and or' instruction we can
implement And8 and Or8 with far fewer instructions. Replacing
'compare and swap' with atomic instructions may also improve
performance when there is contention.

Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
src/cmd/compile/internal/gc/ssa.go
src/cmd/compile/internal/s390x/ssa.go
src/cmd/compile/internal/ssa/gen/S390X.rules
src/cmd/compile/internal/ssa/gen/S390XOps.go
src/cmd/compile/internal/ssa/opGen.go
src/cmd/compile/internal/ssa/rewriteS390X.go
src/cmd/internal/obj/s390x/rotate.go [new file with mode: 0644]
src/runtime/internal/atomic/asm_s390x.s
src/runtime/internal/atomic/bench_test.go