]> Cypherpunks repositories - gostls13.git/commit
math/big: simplify, speed up Karatsuba multiplication
authorRuss Cox <rsc@golang.org>
Fri, 17 Jan 2025 22:26:59 +0000 (17:26 -0500)
committerGopher Robot <gobot@golang.org>
Wed, 12 Mar 2025 12:41:17 +0000 (05:41 -0700)
commitd037ed62bc583af358b2cc5aeb151872a6ba7c2e
treee5b881cc98e4da42322189a19efb2a3546cc22b2
parent26040b1dd7e4e8f7957b2a918c01f3343249c289
math/big: simplify, speed up Karatsuba multiplication

The old Karatsuba implementation only operated on lengths that are
a power of two times a number smaller than karatsubaThreshold.
For example, when karatsubaThreshold = 40, multiplying a pair
of 99-word numbers runs karatsuba on the low 96 (= 39<<2) words
and then has to fix up the answer to include the high 3 words of each.

I suspect this requirement was needed to make the analysis of
how many temporary words to reserve easier, back when the
answer was 3*n and depended on exactly halving the size at
each Karatsuba step.

Now that we have the more flexible temporary allocation stack,
we can change Karatsuba to accept operands of odd length.
Doing so avoids most of the fixup that the old approach required.
For example, multiplying a pair of 99-word numbers runs
karatsuba on all 99 words now.

This is simpler and about the same speed or, for large cases, faster.

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
                            │     old     │                 new                 │
                            │   sec/op    │   sec/op     vs base                │
GCD10x10/WithoutXY-16         99.62n ± 3%   99.10n ± 3%        ~ (p=0.009 n=15)
GCD10x10/WithXY-16            243.4n ± 1%   245.2n ± 1%        ~ (p=0.009 n=15)
GCD100x100/WithoutXY-16       921.9n ± 1%   919.2n ± 1%        ~ (p=0.076 n=15)
GCD100x100/WithXY-16          1.527µ ± 1%   1.526µ ± 0%        ~ (p=0.813 n=15)
GCD1000x1000/WithoutXY-16     9.704µ ± 1%   9.696µ ± 0%        ~ (p=0.532 n=15)
GCD1000x1000/WithXY-16        14.03µ ± 1%   13.96µ ± 0%        ~ (p=0.014 n=15)
GCD10000x10000/WithoutXY-16   206.5µ ± 2%   206.5µ ± 0%        ~ (p=0.967 n=15)
GCD10000x10000/WithXY-16      398.0µ ± 1%   397.4µ ± 0%        ~ (p=0.683 n=15)
Div/20/10-16                  22.22n ± 0%   22.23n ± 0%        ~ (p=0.105 n=15)
Div/40/20-16                  22.23n ± 0%   22.23n ± 0%        ~ (p=0.307 n=15)
Div/100/50-16                 55.47n ± 0%   55.47n ± 0%        ~ (p=0.573 n=15)
Div/200/100-16                174.9n ± 1%   174.6n ± 1%        ~ (p=0.814 n=15)
Div/400/200-16                209.5n ± 1%   210.5n ± 1%        ~ (p=0.454 n=15)
Div/1000/500-16               379.9n ± 0%   383.5n ± 2%        ~ (p=0.123 n=15)
Div/2000/1000-16              780.1n ± 0%   784.6n ± 1%   +0.58% (p=0.000 n=15)
Div/20000/10000-16            25.22µ ± 1%   25.15µ ± 0%        ~ (p=0.213 n=15)
Div/200000/100000-16          921.8µ ± 1%   926.1µ ± 0%        ~ (p=0.009 n=15)
Div/2000000/1000000-16        37.91m ± 0%   35.63m ± 0%   -6.02% (p=0.000 n=15)
Div/20000000/10000000-16       1.378 ± 0%    1.336 ± 0%   -3.03% (p=0.000 n=15)
NatMul/10-16                  166.8n ± 4%   168.9n ± 3%        ~ (p=0.008 n=15)
NatMul/100-16                 5.519µ ± 2%   5.548µ ± 4%        ~ (p=0.032 n=15)
NatMul/1000-16                230.4µ ± 1%   220.2µ ± 1%   -4.43% (p=0.000 n=15)
NatMul/10000-16               8.569m ± 1%   8.640m ± 1%        ~ (p=0.005 n=15)
NatMul/100000-16              376.5m ± 1%   334.1m ± 0%  -11.26% (p=0.000 n=15)
NatSqr/1-16                   27.85n ± 5%   28.60n ± 2%        ~ (p=0.123 n=15)
NatSqr/2-16                   47.99n ± 2%   48.84n ± 1%        ~ (p=0.008 n=15)
NatSqr/3-16                   59.41n ± 2%   60.87n ± 2%   +2.46% (p=0.001 n=15)
NatSqr/5-16                   87.27n ± 2%   89.31n ± 3%        ~ (p=0.087 n=15)
NatSqr/8-16                   124.6n ± 3%   128.9n ± 3%        ~ (p=0.006 n=15)
NatSqr/10-16                  166.3n ± 3%   172.7n ± 3%        ~ (p=0.002 n=15)
NatSqr/20-16                  385.2n ± 2%   394.7n ± 3%        ~ (p=0.036 n=15)
NatSqr/30-16                  622.7n ± 3%   642.9n ± 3%        ~ (p=0.032 n=15)
NatSqr/50-16                  1.274µ ± 3%   1.323µ ± 4%        ~ (p=0.003 n=15)
NatSqr/80-16                  2.606µ ± 4%   2.714µ ± 4%        ~ (p=0.044 n=15)
NatSqr/100-16                 3.731µ ± 4%   3.871µ ± 4%        ~ (p=0.038 n=15)
NatSqr/200-16                 12.99µ ± 2%   13.09µ ± 3%        ~ (p=0.838 n=15)
NatSqr/300-16                 22.87µ ± 2%   23.25µ ± 2%        ~ (p=0.285 n=15)
NatSqr/500-16                 58.43µ ± 1%   58.25µ ± 2%        ~ (p=0.345 n=15)
NatSqr/800-16                 115.3µ ± 3%   116.2µ ± 3%        ~ (p=0.126 n=15)
NatSqr/1000-16                173.9µ ± 1%   174.3µ ± 1%        ~ (p=0.935 n=15)
NatSqr/10000-16               6.133m ± 2%   6.034m ± 1%   -1.62% (p=0.000 n=15)
NatSqr/100000-16              253.8m ± 1%   241.5m ± 0%   -4.87% (p=0.000 n=15)
geomean                       7.745µ        7.760µ        +0.19%

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
                            │     old     │                 new                  │
                            │   sec/op    │    sec/op     vs base                │
GCD10x10/WithoutXY-88         62.17n ± 4%   61.44n ±  0%   -1.17% (p=0.000 n=15)
GCD10x10/WithXY-88            173.4n ± 2%   172.4n ±  4%        ~ (p=0.615 n=15)
GCD100x100/WithoutXY-88       584.0n ± 1%   582.9n ±  0%        ~ (p=0.009 n=15)
GCD100x100/WithXY-88          1.098µ ± 1%   1.091µ ±  2%        ~ (p=0.002 n=15)
GCD1000x1000/WithoutXY-88     6.055µ ± 0%   6.049µ ±  0%        ~ (p=0.007 n=15)
GCD1000x1000/WithXY-88        9.430µ ± 0%   9.417µ ±  1%        ~ (p=0.123 n=15)
GCD10000x10000/WithoutXY-88   153.4µ ± 2%   149.0µ ±  2%   -2.85% (p=0.000 n=15)
GCD10000x10000/WithXY-88      350.6µ ± 3%   349.0µ ±  2%        ~ (p=0.126 n=15)
Div/20/10-88                  13.12n ± 0%   13.12n ±  1%    0.00% (p=0.042 n=15)
Div/40/20-88                  13.12n ± 0%   13.13n ±  0%        ~ (p=0.004 n=15)
Div/100/50-88                 25.49n ± 0%   25.49n ±  0%        ~ (p=0.452 n=15)
Div/200/100-88                115.7n ± 2%   113.8n ±  2%        ~ (p=0.212 n=15)
Div/400/200-88                135.0n ± 1%   136.1n ±  1%        ~ (p=0.005 n=15)
Div/1000/500-88               257.5n ± 1%   259.9n ±  1%        ~ (p=0.004 n=15)
Div/2000/1000-88              567.5n ± 1%   572.4n ±  2%        ~ (p=0.616 n=15)
Div/20000/10000-88            25.65µ ± 0%   25.77µ ±  1%        ~ (p=0.032 n=15)
Div/200000/100000-88          777.4µ ± 1%   754.3µ ±  1%   -2.97% (p=0.000 n=15)
Div/2000000/1000000-88        33.66m ± 0%   31.37m ±  0%   -6.81% (p=0.000 n=15)
Div/20000000/10000000-88       1.320 ± 0%    1.266 ±  0%   -4.04% (p=0.000 n=15)
NatMul/10-88                  151.9n ± 7%   143.3n ±  7%        ~ (p=0.878 n=15)
NatMul/100-88                 4.418µ ± 2%   4.337µ ±  3%        ~ (p=0.512 n=15)
NatMul/1000-88                206.8µ ± 1%   189.8µ ±  1%   -8.25% (p=0.000 n=15)
NatMul/10000-88               8.531m ± 1%   8.095m ±  0%   -5.12% (p=0.000 n=15)
NatMul/100000-88              298.9m ± 0%   260.5m ±  1%  -12.85% (p=0.000 n=15)
NatSqr/1-88                   27.55n ± 6%   28.25n ±  7%        ~ (p=0.024 n=15)
NatSqr/2-88                   44.71n ± 6%   46.21n ±  9%        ~ (p=0.024 n=15)
NatSqr/3-88                   55.44n ± 4%   58.41n ± 10%        ~ (p=0.126 n=15)
NatSqr/5-88                   80.71n ± 5%   81.41n ±  5%        ~ (p=0.032 n=15)
NatSqr/8-88                   115.7n ± 4%   115.4n ±  5%        ~ (p=0.814 n=15)
NatSqr/10-88                  147.4n ± 4%   147.3n ±  4%        ~ (p=0.505 n=15)
NatSqr/20-88                  337.8n ± 3%   337.3n ±  4%        ~ (p=0.814 n=15)
NatSqr/30-88                  556.9n ± 3%   557.6n ±  4%        ~ (p=0.814 n=15)
NatSqr/50-88                  1.208µ ± 4%   1.208µ ±  3%        ~ (p=0.910 n=15)
NatSqr/80-88                  2.591µ ± 3%   2.581µ ±  3%        ~ (p=0.705 n=15)
NatSqr/100-88                 3.870µ ± 3%   3.858µ ±  3%        ~ (p=0.846 n=15)
NatSqr/200-88                 14.43µ ± 3%   14.28µ ±  2%        ~ (p=0.383 n=15)
NatSqr/300-88                 24.68µ ± 2%   24.49µ ±  2%        ~ (p=0.624 n=15)
NatSqr/500-88                 66.27µ ± 1%   66.18µ ±  1%        ~ (p=0.735 n=15)
NatSqr/800-88                 128.7µ ± 1%   127.4µ ±  1%        ~ (p=0.050 n=15)
NatSqr/1000-88                198.7µ ± 1%   197.7µ ±  1%        ~ (p=0.229 n=15)
NatSqr/10000-88               6.582m ± 1%   6.426m ±  1%   -2.37% (p=0.000 n=15)
NatSqr/100000-88              274.3m ± 0%   267.3m ±  0%   -2.57% (p=0.000 n=15)
geomean                       6.518µ        6.438µ         -1.22%

goos: linux
goarch: arm64
pkg: math/big
                            │     old     │                 new                 │
                            │   sec/op    │   sec/op     vs base                │
GCD10x10/WithoutXY-16         61.70n ± 1%   61.32n ± 1%        ~ (p=0.361 n=15)
GCD10x10/WithXY-16            217.3n ± 1%   217.0n ± 1%        ~ (p=0.395 n=15)
GCD100x100/WithoutXY-16       569.7n ± 0%   572.6n ± 2%        ~ (p=0.213 n=15)
GCD100x100/WithXY-16          1.241µ ± 1%   1.236µ ± 1%        ~ (p=0.157 n=15)
GCD1000x1000/WithoutXY-16     5.558µ ± 0%   5.566µ ± 0%        ~ (p=0.228 n=15)
GCD1000x1000/WithXY-16        9.319µ ± 0%   9.326µ ± 0%        ~ (p=0.233 n=15)
GCD10000x10000/WithoutXY-16   126.4µ ± 2%   128.7µ ± 3%        ~ (p=0.081 n=15)
GCD10000x10000/WithXY-16      279.3µ ± 0%   278.3µ ± 5%        ~ (p=0.187 n=15)
Div/20/10-16                  15.12n ± 1%   15.21n ± 1%        ~ (p=0.490 n=15)
Div/40/20-16                  15.11n ± 0%   15.23n ± 1%        ~ (p=0.107 n=15)
Div/100/50-16                 26.53n ± 0%   26.50n ± 0%        ~ (p=0.299 n=15)
Div/200/100-16                123.7n ± 0%   124.0n ± 0%        ~ (p=0.086 n=15)
Div/400/200-16                142.5n ± 0%   142.4n ± 0%        ~ (p=0.039 n=15)
Div/1000/500-16               259.9n ± 1%   261.2n ± 1%        ~ (p=0.044 n=15)
Div/2000/1000-16              539.4n ± 1%   532.3n ± 1%   -1.32% (p=0.001 n=15)
Div/20000/10000-16            22.43µ ± 0%   22.32µ ± 0%   -0.49% (p=0.000 n=15)
Div/200000/100000-16          898.3µ ± 0%   889.6µ ± 0%   -0.96% (p=0.000 n=15)
Div/2000000/1000000-16        38.37m ± 0%   35.11m ± 0%   -8.49% (p=0.000 n=15)
Div/20000000/10000000-16       1.449 ± 0%    1.384 ± 0%   -4.48% (p=0.000 n=15)
NatMul/10-16                  182.0n ± 1%   177.8n ± 1%   -2.31% (p=0.000 n=15)
NatMul/100-16                 5.537µ ± 0%   5.693µ ± 0%   +2.82% (p=0.000 n=15)
NatMul/1000-16                229.9µ ± 0%   224.8µ ± 0%   -2.24% (p=0.000 n=15)
NatMul/10000-16               8.985m ± 0%   8.751m ± 0%   -2.61% (p=0.000 n=15)
NatMul/100000-16              371.1m ± 0%   331.5m ± 0%  -10.66% (p=0.000 n=15)
NatSqr/1-16                   46.77n ± 6%   42.76n ± 1%   -8.57% (p=0.000 n=15)
NatSqr/2-16                   66.99n ± 4%   63.62n ± 1%   -5.03% (p=0.000 n=15)
NatSqr/3-16                   76.79n ± 4%   73.42n ± 1%        ~ (p=0.007 n=15)
NatSqr/5-16                   99.00n ± 3%   95.35n ± 1%   -3.69% (p=0.000 n=15)
NatSqr/8-16                   160.0n ± 3%   155.1n ± 1%   -3.06% (p=0.001 n=15)
NatSqr/10-16                  178.4n ± 2%   175.9n ± 0%   -1.40% (p=0.001 n=15)
NatSqr/20-16                  361.9n ± 2%   361.3n ± 0%        ~ (p=0.083 n=15)
NatSqr/30-16                  584.7n ± 0%   586.8n ± 0%   +0.36% (p=0.000 n=15)
NatSqr/50-16                  1.327µ ± 0%   1.329µ ± 0%        ~ (p=0.349 n=15)
NatSqr/80-16                  2.893µ ± 1%   2.925µ ± 0%   +1.11% (p=0.000 n=15)
NatSqr/100-16                 4.330µ ± 1%   4.381µ ± 0%   +1.18% (p=0.000 n=15)
NatSqr/200-16                 16.25µ ± 1%   16.43µ ± 0%   +1.07% (p=0.000 n=15)
NatSqr/300-16                 27.85µ ± 1%   28.06µ ± 0%   +0.77% (p=0.000 n=15)
NatSqr/500-16                 76.01µ ± 0%   76.34µ ± 0%        ~ (p=0.002 n=15)
NatSqr/800-16                 146.8µ ± 0%   148.1µ ± 0%   +0.83% (p=0.000 n=15)
NatSqr/1000-16                228.2µ ± 0%   228.6µ ± 0%        ~ (p=0.123 n=15)
NatSqr/10000-16               7.524m ± 0%   7.426m ± 0%   -1.31% (p=0.000 n=15)
NatSqr/100000-16              316.7m ± 0%   309.2m ± 0%   -2.36% (p=0.000 n=15)
geomean                       7.264µ        7.172µ        -1.27%

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                            │     old     │                new                 │
                            │   sec/op    │   sec/op     vs base               │
GCD10x10/WithoutXY-12         32.61n ± 1%   32.42n ± 1%       ~ (p=0.021 n=15)
GCD10x10/WithXY-12            87.70n ± 1%   88.42n ± 1%       ~ (p=0.010 n=15)
GCD100x100/WithoutXY-12       305.9n ± 0%   306.4n ± 0%       ~ (p=0.003 n=15)
GCD100x100/WithXY-12          560.3n ± 2%   556.6n ± 1%       ~ (p=0.018 n=15)
GCD1000x1000/WithoutXY-12     3.509µ ± 2%   3.464µ ± 1%       ~ (p=0.145 n=15)
GCD1000x1000/WithXY-12        5.347µ ± 2%   5.372µ ± 1%       ~ (p=0.046 n=15)
GCD10000x10000/WithoutXY-12   73.75µ ± 1%   73.99µ ± 1%       ~ (p=0.004 n=15)
GCD10000x10000/WithXY-12      148.4µ ± 0%   147.8µ ± 1%       ~ (p=0.076 n=15)
Div/20/10-12                  9.481n ± 0%   9.462n ± 1%       ~ (p=0.631 n=15)
Div/40/20-12                  9.457n ± 0%   9.462n ± 1%       ~ (p=0.798 n=15)
Div/100/50-12                 14.91n ± 0%   14.79n ± 1%  -0.80% (p=0.000 n=15)
Div/200/100-12                84.56n ± 1%   84.60n ± 1%       ~ (p=0.271 n=15)
Div/400/200-12                103.8n ± 0%   102.8n ± 0%  -0.96% (p=0.000 n=15)
Div/1000/500-12               181.3n ± 1%   184.2n ± 2%       ~ (p=0.091 n=15)
Div/2000/1000-12              397.5n ± 0%   397.4n ± 0%       ~ (p=0.299 n=15)
Div/20000/10000-12            14.04µ ± 1%   13.99µ ± 0%       ~ (p=0.221 n=15)
Div/200000/100000-12          523.1µ ± 0%   514.0µ ± 3%       ~ (p=0.775 n=15)
Div/2000000/1000000-12        21.58m ± 0%   20.01m ± 1%  -7.29% (p=0.000 n=15)
Div/20000000/10000000-12      813.5m ± 0%   796.2m ± 1%  -2.13% (p=0.000 n=15)
NatMul/10-12                  80.46n ± 1%   80.02n ± 1%       ~ (p=0.063 n=15)
NatMul/100-12                 2.904µ ± 0%   2.979µ ± 1%  +2.58% (p=0.000 n=15)
NatMul/1000-12                127.8µ ± 0%   122.3µ ± 0%  -4.28% (p=0.000 n=15)
NatMul/10000-12               5.141m ± 0%   4.975m ± 1%  -3.23% (p=0.000 n=15)
NatMul/100000-12              208.8m ± 0%   189.6m ± 3%  -9.21% (p=0.000 n=15)
NatSqr/1-12                   11.90n ± 1%   11.76n ± 1%       ~ (p=0.059 n=15)
NatSqr/2-12                   21.33n ± 1%   21.12n ± 0%       ~ (p=0.063 n=15)
NatSqr/3-12                   26.05n ± 1%   25.79n ± 0%       ~ (p=0.002 n=15)
NatSqr/5-12                   37.31n ± 0%   36.98n ± 1%       ~ (p=0.008 n=15)
NatSqr/8-12                   63.07n ± 0%   62.75n ± 1%       ~ (p=0.061 n=15)
NatSqr/10-12                  79.48n ± 0%   79.59n ± 0%       ~ (p=0.455 n=15)
NatSqr/20-12                  173.1n ± 0%   173.2n ± 1%       ~ (p=0.518 n=15)
NatSqr/30-12                  288.6n ± 1%   289.2n ± 0%       ~ (p=0.030 n=15)
NatSqr/50-12                  653.3n ± 0%   653.3n ± 0%       ~ (p=0.361 n=15)
NatSqr/80-12                  1.492µ ± 0%   1.496µ ± 0%       ~ (p=0.018 n=15)
NatSqr/100-12                 2.270µ ± 1%   2.270µ ± 0%       ~ (p=0.326 n=15)
NatSqr/200-12                 8.776µ ± 1%   8.784µ ± 1%       ~ (p=0.083 n=15)
NatSqr/300-12                 15.07µ ± 0%   15.09µ ± 0%       ~ (p=0.455 n=15)
NatSqr/500-12                 41.71µ ± 0%   41.77µ ± 1%       ~ (p=0.305 n=15)
NatSqr/800-12                 80.77µ ± 1%   80.59µ ± 0%       ~ (p=0.113 n=15)
NatSqr/1000-12                126.4µ ± 1%   126.5µ ± 0%       ~ (p=0.683 n=15)
NatSqr/10000-12               4.204m ± 0%   4.119m ± 0%  -2.02% (p=0.000 n=15)
NatSqr/100000-12              177.0m ± 0%   172.9m ± 0%  -2.31% (p=0.000 n=15)
geomean                       3.790µ        3.757µ       -0.87%

Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652057
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
src/math/big/int.go
src/math/big/nat.go
src/math/big/nat_test.go
src/math/big/natdiv.go
src/math/big/natmul.go