]> Cypherpunks repositories - gostls13.git/commit
cmd/compile: emit fused multiply-{add,subtract} instructions on s390x
authorMichael Munday <munday@ca.ibm.com>
Mon, 13 Feb 2017 03:12:12 +0000 (22:12 -0500)
committerMichael Munday <munday@ca.ibm.com>
Tue, 28 Feb 2017 15:34:20 +0000 (15:34 +0000)
commitbd8a39b67a12ec3d271305105dea3b8521aa70bf
tree840138c7b3f628be2a3074d1ddf3eaac075f7070
parenta38a2d02d5c26adb34ec18b495a1d698517c1735
cmd/compile: emit fused multiply-{add,subtract} instructions on s390x

Explcitly block fused multiply-add pattern matching when a cast is used
after the multiplication, for example:

    - (a * b) + c        // can emit fused multiply-add
    - float64(a * b) + c // cannot emit fused multiply-add

float{32,64} and complex{64,128} casts of matching types are now kept
as OCONV operations rather than being replaced with OCONVNOP operations
because they now imply a rounding operation (and therefore aren't a
no-op anymore).

Operations (for example, multiplication) on complex types may utilize
fused multiply-add and -subtract instructions internally. There is no
way to disable this behavior at the moment.

Improves the performance of the floating point implementation of
poly1305:

name         old speed     new speed     delta
64           246MB/s ± 0%  275MB/s ± 0%  +11.48%   (p=0.000 n=10+8)
1K           312MB/s ± 0%  357MB/s ± 0%  +14.41%  (p=0.000 n=10+10)
64Unaligned  246MB/s ± 0%  274MB/s ± 0%  +11.43%  (p=0.000 n=10+10)
1KUnaligned  312MB/s ± 0%  357MB/s ± 0%  +14.39%   (p=0.000 n=10+8)

Updates #17895.

Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
Reviewed-on: https://go-review.googlesource.com/36963
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
28 files changed:
src/cmd/compile/internal/gc/asm_test.go
src/cmd/compile/internal/gc/const.go
src/cmd/compile/internal/gc/ssa.go
src/cmd/compile/internal/gc/testdata/fp.go
src/cmd/compile/internal/gc/typecheck.go
src/cmd/compile/internal/s390x/prog.go
src/cmd/compile/internal/s390x/ssa.go
src/cmd/compile/internal/ssa/gen/386.rules
src/cmd/compile/internal/ssa/gen/AMD64.rules
src/cmd/compile/internal/ssa/gen/ARM.rules
src/cmd/compile/internal/ssa/gen/ARM64.rules
src/cmd/compile/internal/ssa/gen/MIPS.rules
src/cmd/compile/internal/ssa/gen/MIPS64.rules
src/cmd/compile/internal/ssa/gen/PPC64.rules
src/cmd/compile/internal/ssa/gen/S390X.rules
src/cmd/compile/internal/ssa/gen/S390XOps.go
src/cmd/compile/internal/ssa/gen/generic.rules
src/cmd/compile/internal/ssa/gen/genericOps.go
src/cmd/compile/internal/ssa/opGen.go
src/cmd/compile/internal/ssa/rewrite386.go
src/cmd/compile/internal/ssa/rewriteAMD64.go
src/cmd/compile/internal/ssa/rewriteARM.go
src/cmd/compile/internal/ssa/rewriteARM64.go
src/cmd/compile/internal/ssa/rewriteMIPS.go
src/cmd/compile/internal/ssa/rewriteMIPS64.go
src/cmd/compile/internal/ssa/rewritePPC64.go
src/cmd/compile/internal/ssa/rewriteS390X.go
src/cmd/compile/internal/ssa/rewritegeneric.go