math/big: remove Direct Sqrt computation

The Float.Sqrt method switches (for performance reasons) between
direct (uses Quo) and inverse (doesn't) computation, depending on the
precision, with threshold 128.

Unfortunately the implementation of recursive division in CL 172018
made Quo slightly slower exactly in the range around and below the
threshold Sqrt is using, so this strategy is no longer profitable.

The new division algorithm allocates more, and this has increased the
amount of allocations performed by Sqrt when using the direct method;
on low precisions the computation is fast, so additional allocations
have a negative impact on performance.

Interestingly, using only the inverse method doesn't just reverse the
effects of the Quo algorithm change; it also seems to improve
performance overall at small precisions:

name                 old time/op    new time/op    delta
FloatSqrt/64-4          643ns ± 1%     635ns ± 1%   -1.24%  (p=0.000 n=10+10)
FloatSqrt/128-4        1.44µs ± 1%    1.02µs ± 1%  -29.25%  (p=0.000 n=10+10)
FloatSqrt/256-4        1.49µs ± 1%    1.49µs ± 1%     ~     (p=0.752 n=10+10)
FloatSqrt/1000-4       3.71µs ± 1%    3.74µs ± 1%   +0.87%  (p=0.001 n=10+10)
FloatSqrt/10000-4      35.3µs ± 1%    35.6µs ± 1%   +0.82%  (p=0.002 n=10+9)
FloatSqrt/100000-4      844µs ± 1%     844µs ± 0%     ~     (p=0.549 n=10+9)
FloatSqrt/1000000-4    69.5ms ± 0%    69.6ms ± 0%     ~     (p=0.222 n=9+9)

name                 old alloc/op   new alloc/op   delta
FloatSqrt/64-4           280B ± 0%      200B ± 0%  -28.57%  (p=0.000 n=10+10)
FloatSqrt/128-4          504B ± 0%      248B ± 0%  -50.79%  (p=0.000 n=10+10)
FloatSqrt/256-4          344B ± 0%      344B ± 0%     ~     (all equal)
FloatSqrt/1000-4       1.30kB ± 0%    1.30kB ± 0%     ~     (all equal)
FloatSqrt/10000-4      13.5kB ± 0%    13.5kB ± 0%     ~     (p=0.237 n=10+10)
FloatSqrt/100000-4      123kB ± 0%     123kB ± 0%     ~     (p=0.247 n=10+10)
FloatSqrt/1000000-4    1.83MB ± 1%    1.83MB ± 3%     ~     (p=0.779 n=8+10)

name                 old allocs/op  new allocs/op  delta
FloatSqrt/64-4           8.00 ± 0%      5.00 ± 0%  -37.50%  (p=0.000 n=10+10)
FloatSqrt/128-4          11.0 ± 0%       5.0 ± 0%  -54.55%  (p=0.000 n=10+10)
FloatSqrt/256-4          5.00 ± 0%      5.00 ± 0%     ~     (all equal)
FloatSqrt/1000-4         6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/10000-4        6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/100000-4       6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/1000000-4      10.3 ±13%      10.3 ±13%     ~     (p=1.000 n=10+10)

For example, 1.02µs for FloatSqrt/128 is actually better than what I
was getting on the same machine before the Quo changes.

The 0.8% slowdown on /1000 and /10000 appears to be real and it is
quite baffling (that codepath was not touched at all); it may be
caused by code alignment changes.
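
For readers who want to collect time and allocation figures of this kind themselves, the following is a minimal standalone sketch using testing.Benchmark. It is not the benchmark from the math/big test suite: the helper name benchSqrt, the input value 2 and the list of precisions are assumptions chosen for this example.

```go
package main

import (
	"fmt"
	"math/big"
	"testing"
)

// benchSqrt times z.Sqrt(x) for x = 2 at the given precision (in bits)
// and records allocations alongside the timing.
func benchSqrt(prec uint) testing.BenchmarkResult {
	x := big.NewFloat(2)
	z := new(big.Float).SetPrec(prec)
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			z.Sqrt(x)
		}
	})
}

func main() {
	// Only the small precisions, where the commit changes behaviour.
	for _, prec := range []uint{64, 128, 256} {
		r := benchSqrt(prec)
		fmt.Printf("FloatSqrt/%d\t%d ns/op\t%d B/op\t%d allocs/op\n",
			prec, r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
	}
}
```

A single run like this gives no variance or p-values; a before/after comparison would need repeated runs of each version and a statistical comparison of the results.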

Change-Id: Ib03761cdc1055674bc7526d4f3a23d7a25094029
Reviewed-on: https://go-review.googlesource.com/c/go/+/228062
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>

author	Alberto Donizetti <alb.donizetti@gmail.com>
	Wed, 15 Apr 2020 08:50:30 +0000 (10:50 +0200)
committer	Robert Griesemer <gri@golang.org>
	Wed, 15 Apr 2020 16:37:53 +0000 (16:37 +0000)
commit	813f8eae2738c75151d036906a9008525c1ba0fe
tree	3846225c465f826584e49eadfa4a901f32a93537
parent	435b9dd1a1bae81a32eafb59a9de7fb2873cd51e