]> Cypherpunks repositories - gostls13.git/commit
cmd/compile: lower Add64/Sub64 into ssa on PPC64
authorPaul E. Murphy <murp@ibm.com>
Tue, 8 Jun 2021 18:16:01 +0000 (13:16 -0500)
committerPaul Murphy <murp@ibm.com>
Tue, 10 May 2022 20:03:53 +0000 (20:03 +0000)
commit0c43878baa035db39d9bbf84ce8721cd8a97c78a
tree27bc4f48cd69efdeb8aeb3510befd583dbf9801e
parent526de61c67add8400d73b28bbed3f3680586c472
cmd/compile: lower Add64/Sub64 into ssa on PPC64

math/bits.Add64 and math/bits.Sub64 now lower and optimize
directly in SSA form.

The optimization of carry chains focuses around eliding
XER<->GPR transfers of the CA bit when used exclusively as an
input to a single carry operations, or when the CA value is
known.

This also adds support for handling XER spills in the assembler
which could happen if carry chains contain inter-dependencies
on each other (which seems very unlikely with practical usage),
or a clobber happens (SRAW/SRAD/SUBFC operations clobber CA).

With PPC64 Add64/Sub64 lowering into SSA and this patch, the net
performance difference in crypto/elliptic benchmarks on P9/ppc64le
are:

name                                old time/op    new time/op    delta
ScalarBaseMult/P256                   46.3µs ± 0%    46.9µs ± 0%   +1.34%
ScalarBaseMult/P224                    356µs ± 0%     209µs ± 0%  -41.14%
ScalarBaseMult/P384                   1.20ms ± 0%    0.57ms ± 0%  -52.14%
ScalarBaseMult/P521                   3.38ms ± 0%    1.44ms ± 0%  -57.27%
ScalarMult/P256                        199µs ± 0%     199µs ± 0%   -0.17%
ScalarMult/P224                        357µs ± 0%     212µs ± 0%  -40.56%
ScalarMult/P384                       1.20ms ± 0%    0.58ms ± 0%  -51.86%
ScalarMult/P521                       3.37ms ± 0%    1.44ms ± 0%  -57.32%
MarshalUnmarshal/P256/Uncompressed    2.59µs ± 0%    2.52µs ± 0%   -2.63%
MarshalUnmarshal/P256/Compressed      2.58µs ± 0%    2.52µs ± 0%   -2.06%
MarshalUnmarshal/P224/Uncompressed    1.54µs ± 0%    1.40µs ± 0%   -9.42%
MarshalUnmarshal/P224/Compressed      1.54µs ± 0%    1.39µs ± 0%   -9.87%
MarshalUnmarshal/P384/Uncompressed    2.40µs ± 0%    1.80µs ± 0%  -24.93%
MarshalUnmarshal/P384/Compressed      2.35µs ± 0%    1.81µs ± 0%  -23.03%
MarshalUnmarshal/P521/Uncompressed    3.79µs ± 0%    2.58µs ± 0%  -31.81%
MarshalUnmarshal/P521/Compressed      3.80µs ± 0%    2.60µs ± 0%  -31.67%

Note, P256 uses an asm implementation, thus, little variation is expected.

Change-Id: I88a24f6bf0f4f285c649e40243b1ab69cc452b71
Reviewed-on: https://go-review.googlesource.com/c/go/+/346870
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
src/cmd/asm/internal/asm/testdata/ppc64.s
src/cmd/compile/internal/ppc64/ssa.go
src/cmd/compile/internal/ssa/gen/PPC64.rules
src/cmd/compile/internal/ssa/gen/PPC64Ops.go
src/cmd/compile/internal/ssa/opGen.go
src/cmd/compile/internal/ssa/rewritePPC64.go
src/cmd/compile/internal/ssagen/ssa.go
src/cmd/internal/obj/ppc64/asm9.go
test/codegen/mathbits.go