math/big: simplify, speed up Karatsuba multiplication

The old Karatsuba implementation only operated on lengths that are
a power of two times a number smaller than karatsubaThreshold.
For example, when karatsubaThreshold = 40, multiplying a pair
of 99-word numbers runs karatsuba on the low 96 (= 24<<2) words
and then has to fix up the answer to include the high 3 words of each.
I suspect this requirement existed to simplify the analysis of
how many temporary words to reserve, back when the answer was
3*n and depended on exactly halving the size at each
Karatsuba step.
Now that we have the more flexible temporary allocation stack,
we can change Karatsuba to accept operands of odd length.
Doing so avoids most of the fixup that the old approach required.
For example, multiplying a pair of 99-word numbers runs
karatsuba on all 99 words now.
This is simpler and, in the benchmarks below, about the same speed
for small operands and faster for large ones
(NatMul/100000 improves by 9% to 13%).
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-16 99.62n ± 3% 99.10n ± 3% ~ (p=0.009 n=15)
GCD10x10/WithXY-16 243.4n ± 1% 245.2n ± 1% ~ (p=0.009 n=15)
GCD100x100/WithoutXY-16 921.9n ± 1% 919.2n ± 1% ~ (p=0.076 n=15)
GCD100x100/WithXY-16 1.527µ ± 1% 1.526µ ± 0% ~ (p=0.813 n=15)
GCD1000x1000/WithoutXY-16 9.704µ ± 1% 9.696µ ± 0% ~ (p=0.532 n=15)
GCD1000x1000/WithXY-16 14.03µ ± 1% 13.96µ ± 0% ~ (p=0.014 n=15)
GCD10000x10000/WithoutXY-16 206.5µ ± 2% 206.5µ ± 0% ~ (p=0.967 n=15)
GCD10000x10000/WithXY-16 398.0µ ± 1% 397.4µ ± 0% ~ (p=0.683 n=15)
Div/20/10-16 22.22n ± 0% 22.23n ± 0% ~ (p=0.105 n=15)
Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.307 n=15)
Div/100/50-16 55.47n ± 0% 55.47n ± 0% ~ (p=0.573 n=15)
Div/200/100-16 174.9n ± 1% 174.6n ± 1% ~ (p=0.814 n=15)
Div/400/200-16 209.5n ± 1% 210.5n ± 1% ~ (p=0.454 n=15)
Div/1000/500-16 379.9n ± 0% 383.5n ± 2% ~ (p=0.123 n=15)
Div/2000/1000-16 780.1n ± 0% 784.6n ± 1% +0.58% (p=0.000 n=15)
Div/20000/10000-16 25.22µ ± 1% 25.15µ ± 0% ~ (p=0.213 n=15)
Div/200000/100000-16 921.8µ ± 1% 926.1µ ± 0% ~ (p=0.009 n=15)
Div/2000000/1000000-16 37.91m ± 0% 35.63m ± 0% -6.02% (p=0.000 n=15)
Div/20000000/10000000-16 1.378 ± 0% 1.336 ± 0% -3.03% (p=0.000 n=15)
NatMul/10-16 166.8n ± 4% 168.9n ± 3% ~ (p=0.008 n=15)
NatMul/100-16 5.519µ ± 2% 5.548µ ± 4% ~ (p=0.032 n=15)
NatMul/1000-16 230.4µ ± 1% 220.2µ ± 1% -4.43% (p=0.000 n=15)
NatMul/10000-16 8.569m ± 1% 8.640m ± 1% ~ (p=0.005 n=15)
NatMul/100000-16 376.5m ± 1% 334.1m ± 0% -11.26% (p=0.000 n=15)
NatSqr/1-16 27.85n ± 5% 28.60n ± 2% ~ (p=0.123 n=15)
NatSqr/2-16 47.99n ± 2% 48.84n ± 1% ~ (p=0.008 n=15)
NatSqr/3-16 59.41n ± 2% 60.87n ± 2% +2.46% (p=0.001 n=15)
NatSqr/5-16 87.27n ± 2% 89.31n ± 3% ~ (p=0.087 n=15)
NatSqr/8-16 124.6n ± 3% 128.9n ± 3% ~ (p=0.006 n=15)
NatSqr/10-16 166.3n ± 3% 172.7n ± 3% ~ (p=0.002 n=15)
NatSqr/20-16 385.2n ± 2% 394.7n ± 3% ~ (p=0.036 n=15)
NatSqr/30-16 622.7n ± 3% 642.9n ± 3% ~ (p=0.032 n=15)
NatSqr/50-16 1.274µ ± 3% 1.323µ ± 4% ~ (p=0.003 n=15)
NatSqr/80-16 2.606µ ± 4% 2.714µ ± 4% ~ (p=0.044 n=15)
NatSqr/100-16 3.731µ ± 4% 3.871µ ± 4% ~ (p=0.038 n=15)
NatSqr/200-16 12.99µ ± 2% 13.09µ ± 3% ~ (p=0.838 n=15)
NatSqr/300-16 22.87µ ± 2% 23.25µ ± 2% ~ (p=0.285 n=15)
NatSqr/500-16 58.43µ ± 1% 58.25µ ± 2% ~ (p=0.345 n=15)
NatSqr/800-16 115.3µ ± 3% 116.2µ ± 3% ~ (p=0.126 n=15)
NatSqr/1000-16 173.9µ ± 1% 174.3µ ± 1% ~ (p=0.935 n=15)
NatSqr/10000-16 6.133m ± 2% 6.034m ± 1% -1.62% (p=0.000 n=15)
NatSqr/100000-16 253.8m ± 1% 241.5m ± 0% -4.87% (p=0.000 n=15)
geomean 7.745µ 7.760µ +0.19%
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-88 62.17n ± 4% 61.44n ± 0% -1.17% (p=0.000 n=15)
GCD10x10/WithXY-88 173.4n ± 2% 172.4n ± 4% ~ (p=0.615 n=15)
GCD100x100/WithoutXY-88 584.0n ± 1% 582.9n ± 0% ~ (p=0.009 n=15)
GCD100x100/WithXY-88 1.098µ ± 1% 1.091µ ± 2% ~ (p=0.002 n=15)
GCD1000x1000/WithoutXY-88 6.055µ ± 0% 6.049µ ± 0% ~ (p=0.007 n=15)
GCD1000x1000/WithXY-88 9.430µ ± 0% 9.417µ ± 1% ~ (p=0.123 n=15)
GCD10000x10000/WithoutXY-88 153.4µ ± 2% 149.0µ ± 2% -2.85% (p=0.000 n=15)
GCD10000x10000/WithXY-88 350.6µ ± 3% 349.0µ ± 2% ~ (p=0.126 n=15)
Div/20/10-88 13.12n ± 0% 13.12n ± 1% 0.00% (p=0.042 n=15)
Div/40/20-88 13.12n ± 0% 13.13n ± 0% ~ (p=0.004 n=15)
Div/100/50-88 25.49n ± 0% 25.49n ± 0% ~ (p=0.452 n=15)
Div/200/100-88 115.7n ± 2% 113.8n ± 2% ~ (p=0.212 n=15)
Div/400/200-88 135.0n ± 1% 136.1n ± 1% ~ (p=0.005 n=15)
Div/1000/500-88 257.5n ± 1% 259.9n ± 1% ~ (p=0.004 n=15)
Div/2000/1000-88 567.5n ± 1% 572.4n ± 2% ~ (p=0.616 n=15)
Div/20000/10000-88 25.65µ ± 0% 25.77µ ± 1% ~ (p=0.032 n=15)
Div/200000/100000-88 777.4µ ± 1% 754.3µ ± 1% -2.97% (p=0.000 n=15)
Div/2000000/1000000-88 33.66m ± 0% 31.37m ± 0% -6.81% (p=0.000 n=15)
Div/20000000/10000000-88 1.320 ± 0% 1.266 ± 0% -4.04% (p=0.000 n=15)
NatMul/10-88 151.9n ± 7% 143.3n ± 7% ~ (p=0.878 n=15)
NatMul/100-88 4.418µ ± 2% 4.337µ ± 3% ~ (p=0.512 n=15)
NatMul/1000-88 206.8µ ± 1% 189.8µ ± 1% -8.25% (p=0.000 n=15)
NatMul/10000-88 8.531m ± 1% 8.095m ± 0% -5.12% (p=0.000 n=15)
NatMul/100000-88 298.9m ± 0% 260.5m ± 1% -12.85% (p=0.000 n=15)
NatSqr/1-88 27.55n ± 6% 28.25n ± 7% ~ (p=0.024 n=15)
NatSqr/2-88 44.71n ± 6% 46.21n ± 9% ~ (p=0.024 n=15)
NatSqr/3-88 55.44n ± 4% 58.41n ± 10% ~ (p=0.126 n=15)
NatSqr/5-88 80.71n ± 5% 81.41n ± 5% ~ (p=0.032 n=15)
NatSqr/8-88 115.7n ± 4% 115.4n ± 5% ~ (p=0.814 n=15)
NatSqr/10-88 147.4n ± 4% 147.3n ± 4% ~ (p=0.505 n=15)
NatSqr/20-88 337.8n ± 3% 337.3n ± 4% ~ (p=0.814 n=15)
NatSqr/30-88 556.9n ± 3% 557.6n ± 4% ~ (p=0.814 n=15)
NatSqr/50-88 1.208µ ± 4% 1.208µ ± 3% ~ (p=0.910 n=15)
NatSqr/80-88 2.591µ ± 3% 2.581µ ± 3% ~ (p=0.705 n=15)
NatSqr/100-88 3.870µ ± 3% 3.858µ ± 3% ~ (p=0.846 n=15)
NatSqr/200-88 14.43µ ± 3% 14.28µ ± 2% ~ (p=0.383 n=15)
NatSqr/300-88 24.68µ ± 2% 24.49µ ± 2% ~ (p=0.624 n=15)
NatSqr/500-88 66.27µ ± 1% 66.18µ ± 1% ~ (p=0.735 n=15)
NatSqr/800-88 128.7µ ± 1% 127.4µ ± 1% ~ (p=0.050 n=15)
NatSqr/1000-88 198.7µ ± 1% 197.7µ ± 1% ~ (p=0.229 n=15)
NatSqr/10000-88 6.582m ± 1% 6.426m ± 1% -2.37% (p=0.000 n=15)
NatSqr/100000-88 274.3m ± 0% 267.3m ± 0% -2.57% (p=0.000 n=15)
geomean 6.518µ 6.438µ -1.22%
goos: linux
goarch: arm64
pkg: math/big
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-16 61.70n ± 1% 61.32n ± 1% ~ (p=0.361 n=15)
GCD10x10/WithXY-16 217.3n ± 1% 217.0n ± 1% ~ (p=0.395 n=15)
GCD100x100/WithoutXY-16 569.7n ± 0% 572.6n ± 2% ~ (p=0.213 n=15)
GCD100x100/WithXY-16 1.241µ ± 1% 1.236µ ± 1% ~ (p=0.157 n=15)
GCD1000x1000/WithoutXY-16 5.558µ ± 0% 5.566µ ± 0% ~ (p=0.228 n=15)
GCD1000x1000/WithXY-16 9.319µ ± 0% 9.326µ ± 0% ~ (p=0.233 n=15)
GCD10000x10000/WithoutXY-16 126.4µ ± 2% 128.7µ ± 3% ~ (p=0.081 n=15)
GCD10000x10000/WithXY-16 279.3µ ± 0% 278.3µ ± 5% ~ (p=0.187 n=15)
Div/20/10-16 15.12n ± 1% 15.21n ± 1% ~ (p=0.490 n=15)
Div/40/20-16 15.11n ± 0% 15.23n ± 1% ~ (p=0.107 n=15)
Div/100/50-16 26.53n ± 0% 26.50n ± 0% ~ (p=0.299 n=15)
Div/200/100-16 123.7n ± 0% 124.0n ± 0% ~ (p=0.086 n=15)
Div/400/200-16 142.5n ± 0% 142.4n ± 0% ~ (p=0.039 n=15)
Div/1000/500-16 259.9n ± 1% 261.2n ± 1% ~ (p=0.044 n=15)
Div/2000/1000-16 539.4n ± 1% 532.3n ± 1% -1.32% (p=0.001 n=15)
Div/20000/10000-16 22.43µ ± 0% 22.32µ ± 0% -0.49% (p=0.000 n=15)
Div/200000/100000-16 898.3µ ± 0% 889.6µ ± 0% -0.96% (p=0.000 n=15)
Div/2000000/1000000-16 38.37m ± 0% 35.11m ± 0% -8.49% (p=0.000 n=15)
Div/20000000/10000000-16 1.449 ± 0% 1.384 ± 0% -4.48% (p=0.000 n=15)
NatMul/10-16 182.0n ± 1% 177.8n ± 1% -2.31% (p=0.000 n=15)
NatMul/100-16 5.537µ ± 0% 5.693µ ± 0% +2.82% (p=0.000 n=15)
NatMul/1000-16 229.9µ ± 0% 224.8µ ± 0% -2.24% (p=0.000 n=15)
NatMul/10000-16 8.985m ± 0% 8.751m ± 0% -2.61% (p=0.000 n=15)
NatMul/100000-16 371.1m ± 0% 331.5m ± 0% -10.66% (p=0.000 n=15)
NatSqr/1-16 46.77n ± 6% 42.76n ± 1% -8.57% (p=0.000 n=15)
NatSqr/2-16 66.99n ± 4% 63.62n ± 1% -5.03% (p=0.000 n=15)
NatSqr/3-16 76.79n ± 4% 73.42n ± 1% ~ (p=0.007 n=15)
NatSqr/5-16 99.00n ± 3% 95.35n ± 1% -3.69% (p=0.000 n=15)
NatSqr/8-16 160.0n ± 3% 155.1n ± 1% -3.06% (p=0.001 n=15)
NatSqr/10-16 178.4n ± 2% 175.9n ± 0% -1.40% (p=0.001 n=15)
NatSqr/20-16 361.9n ± 2% 361.3n ± 0% ~ (p=0.083 n=15)
NatSqr/30-16 584.7n ± 0% 586.8n ± 0% +0.36% (p=0.000 n=15)
NatSqr/50-16 1.327µ ± 0% 1.329µ ± 0% ~ (p=0.349 n=15)
NatSqr/80-16 2.893µ ± 1% 2.925µ ± 0% +1.11% (p=0.000 n=15)
NatSqr/100-16 4.330µ ± 1% 4.381µ ± 0% +1.18% (p=0.000 n=15)
NatSqr/200-16 16.25µ ± 1% 16.43µ ± 0% +1.07% (p=0.000 n=15)
NatSqr/300-16 27.85µ ± 1% 28.06µ ± 0% +0.77% (p=0.000 n=15)
NatSqr/500-16 76.01µ ± 0% 76.34µ ± 0% ~ (p=0.002 n=15)
NatSqr/800-16 146.8µ ± 0% 148.1µ ± 0% +0.83% (p=0.000 n=15)
NatSqr/1000-16 228.2µ ± 0% 228.6µ ± 0% ~ (p=0.123 n=15)
NatSqr/10000-16 7.524m ± 0% 7.426m ± 0% -1.31% (p=0.000 n=15)
NatSqr/100000-16 316.7m ± 0% 309.2m ± 0% -2.36% (p=0.000 n=15)
geomean 7.264µ 7.172µ -1.27%
goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-12 32.61n ± 1% 32.42n ± 1% ~ (p=0.021 n=15)
GCD10x10/WithXY-12 87.70n ± 1% 88.42n ± 1% ~ (p=0.010 n=15)
GCD100x100/WithoutXY-12 305.9n ± 0% 306.4n ± 0% ~ (p=0.003 n=15)
GCD100x100/WithXY-12 560.3n ± 2% 556.6n ± 1% ~ (p=0.018 n=15)
GCD1000x1000/WithoutXY-12 3.509µ ± 2% 3.464µ ± 1% ~ (p=0.145 n=15)
GCD1000x1000/WithXY-12 5.347µ ± 2% 5.372µ ± 1% ~ (p=0.046 n=15)
GCD10000x10000/WithoutXY-12 73.75µ ± 1% 73.99µ ± 1% ~ (p=0.004 n=15)
GCD10000x10000/WithXY-12 148.4µ ± 0% 147.8µ ± 1% ~ (p=0.076 n=15)
Div/20/10-12 9.481n ± 0% 9.462n ± 1% ~ (p=0.631 n=15)
Div/40/20-12 9.457n ± 0% 9.462n ± 1% ~ (p=0.798 n=15)
Div/100/50-12 14.91n ± 0% 14.79n ± 1% -0.80% (p=0.000 n=15)
Div/200/100-12 84.56n ± 1% 84.60n ± 1% ~ (p=0.271 n=15)
Div/400/200-12 103.8n ± 0% 102.8n ± 0% -0.96% (p=0.000 n=15)
Div/1000/500-12 181.3n ± 1% 184.2n ± 2% ~ (p=0.091 n=15)
Div/2000/1000-12 397.5n ± 0% 397.4n ± 0% ~ (p=0.299 n=15)
Div/20000/10000-12 14.04µ ± 1% 13.99µ ± 0% ~ (p=0.221 n=15)
Div/200000/100000-12 523.1µ ± 0% 514.0µ ± 3% ~ (p=0.775 n=15)
Div/2000000/1000000-12 21.58m ± 0% 20.01m ± 1% -7.29% (p=0.000 n=15)
Div/20000000/10000000-12 813.5m ± 0% 796.2m ± 1% -2.13% (p=0.000 n=15)
NatMul/10-12 80.46n ± 1% 80.02n ± 1% ~ (p=0.063 n=15)
NatMul/100-12 2.904µ ± 0% 2.979µ ± 1% +2.58% (p=0.000 n=15)
NatMul/1000-12 127.8µ ± 0% 122.3µ ± 0% -4.28% (p=0.000 n=15)
NatMul/10000-12 5.141m ± 0% 4.975m ± 1% -3.23% (p=0.000 n=15)
NatMul/100000-12 208.8m ± 0% 189.6m ± 3% -9.21% (p=0.000 n=15)
NatSqr/1-12 11.90n ± 1% 11.76n ± 1% ~ (p=0.059 n=15)
NatSqr/2-12 21.33n ± 1% 21.12n ± 0% ~ (p=0.063 n=15)
NatSqr/3-12 26.05n ± 1% 25.79n ± 0% ~ (p=0.002 n=15)
NatSqr/5-12 37.31n ± 0% 36.98n ± 1% ~ (p=0.008 n=15)
NatSqr/8-12 63.07n ± 0% 62.75n ± 1% ~ (p=0.061 n=15)
NatSqr/10-12 79.48n ± 0% 79.59n ± 0% ~ (p=0.455 n=15)
NatSqr/20-12 173.1n ± 0% 173.2n ± 1% ~ (p=0.518 n=15)
NatSqr/30-12 288.6n ± 1% 289.2n ± 0% ~ (p=0.030 n=15)
NatSqr/50-12 653.3n ± 0% 653.3n ± 0% ~ (p=0.361 n=15)
NatSqr/80-12 1.492µ ± 0% 1.496µ ± 0% ~ (p=0.018 n=15)
NatSqr/100-12 2.270µ ± 1% 2.270µ ± 0% ~ (p=0.326 n=15)
NatSqr/200-12 8.776µ ± 1% 8.784µ ± 1% ~ (p=0.083 n=15)
NatSqr/300-12 15.07µ ± 0% 15.09µ ± 0% ~ (p=0.455 n=15)
NatSqr/500-12 41.71µ ± 0% 41.77µ ± 1% ~ (p=0.305 n=15)
NatSqr/800-12 80.77µ ± 1% 80.59µ ± 0% ~ (p=0.113 n=15)
NatSqr/1000-12 126.4µ ± 1% 126.5µ ± 0% ~ (p=0.683 n=15)
NatSqr/10000-12 4.204m ± 0% 4.119m ± 0% -2.02% (p=0.000 n=15)
NatSqr/100000-12 177.0m ± 0% 172.9m ± 0% -2.31% (p=0.000 n=15)
geomean 3.790µ 3.757µ -0.87%
Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652057
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>