math: use SIMD to accelerate some scalar math functions on s390x
author Bill O'Farrell <billo@ca.ibm.com>
Sun, 30 Oct 2016 04:11:37 +0000 (00:11 -0400)
committer Michael Munday <munday@ca.ibm.com>
Fri, 11 Nov 2016 20:20:23 +0000 (20:20 +0000)
commit b6a15683f0c4d177b3711b55724506aebb03f764
tree 9b8a4802f885983a80eb9d7d647b4ca42cbec3db
parent 9f9d83404f938a0dfb98d3f4a4d420261606069a

Most math functions are structured to use stubs so that they can be
accelerated with assembly on any platform.
Sinh, cosh, and tanh were not structured with stubs, so this CL adds
them. This set of routines was chosen as likely to produce good speedups
with assembly on any platform.

The technique used is minimax polynomial approximation, using tables of
polynomial coefficients, with argument range reduction.
A table of scaling factors is also used for cosh and log10.

                     before       after      speedup
BenchmarkCos         22.1 ns/op   6.79 ns/op  3.25x
BenchmarkCosh       125   ns/op  11.7  ns/op 10.68x
BenchmarkLog10       48.4 ns/op  12.5  ns/op  3.87x
BenchmarkSin         22.2 ns/op   6.55 ns/op  3.39x
BenchmarkSinh       125   ns/op  14.2  ns/op  8.80x
BenchmarkTanh        65.0 ns/op  15.1  ns/op  4.30x

Accuracy was tested against a high-precision
reference function to determine the maximum error.
Approximately 4,000,000 points were tested for each function,
producing the following results.
Note: ulperr is the error in "units in the last place".

       max ulperr
sin    1.43 (returns NaN beyond +-2^50)
cos    1.79 (returns NaN beyond +-2^50)
cosh   1.05
sinh   3.02
tanh   3.69
log10  1.75

Also includes a set of tests that exercise the non-vector functions
even when SIMD is enabled.

Change-Id: Icb45f14d00864ee19ed973d209c3af21e4df4edc
Reviewed-on: https://go-review.googlesource.com/32352
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
16 files changed:
src/math/arith_s390x.go [new file with mode: 0644]
src/math/arith_s390x_test.go [new file with mode: 0644]
src/math/cosh_s390x.s [new file with mode: 0644]
src/math/export_s390x_test.go [new file with mode: 0644]
src/math/log10_s390x.s [new file with mode: 0644]
src/math/sin_s390x.s [new file with mode: 0644]
src/math/sinh.go
src/math/sinh_s390x.s [new file with mode: 0644]
src/math/sinh_stub.s [new file with mode: 0644]
src/math/stubs_arm64.s
src/math/stubs_mips64x.s
src/math/stubs_mipsx.s
src/math/stubs_ppc64x.s
src/math/stubs_s390x.s
src/math/tanh.go
src/math/tanh_s390x.s [new file with mode: 0644]