Cypherpunks repositories - gostls13.git/log

cmd/compile/internal/ssa: simplify riscv64 FCLASSD rewrite rules

We don't need to check that the bit patterns of the constants
match, it is sufficient to just check the constant is equal to the
given value.

While we're here also change the FCLASSD rules to use a bit pattern
for the mask. I think this improves readability, particularly as
more uses of FCLASSD get added (e.g. CL 717560).

These changes should not affect codegen.

Change-Id: I92a6338dc71e6a71e04306f67d7d86016c6e9c47
Reviewed-on: https://go-review.googlesource.com/c/go/+/717580
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>

runtime: amend doc for setPinned

Change-Id: I9c5a8f8a031e368bda312c830dc266f5986e8b1a
GitHub-Last-Rev: 23145e8fdbc020ae17503e3abe4fe230ba1082ef
GitHub-Pull-Request: golang/go#76160
Reviewed-on: https://go-review.googlesource.com/c/go/+/717341
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>

cmd/compile/internal/ssa: more aggressive on dead auto elim

Propagate "unread" across OpMoves. If the addr of this auto is only used
by an OpMove as its source arg, and the OpMove's target arg is the addr
of another auto. If the 2nd auto can be eliminated, this one can also be
eliminated.

This CL eliminates unnecessary memory copies and makes the frame smaller
in the following code snippet:

func contains(m map[string][16]int, k string) bool {
        _, ok := m[k]
        return ok
}

These are the benchmark results followed by the benchmark code:

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
                │   old.txt   │              new.txt                │
                │   sec/op    │   sec/op     vs base                │
Map1Access2Ok-8   9.582n ± 2%   9.226n ± 0%   -3.72% (p=0.000 n=20)
Map2Access2Ok-8   13.79n ± 1%   10.24n ± 1%  -25.77% (p=0.000 n=20)
Map3Access2Ok-8   68.68n ± 1%   12.65n ± 1%  -81.58% (p=0.000 n=20)

package main_test

import "testing"

var (
        m1 = map[int]int{}
        m2 = map[int][16]int{}
        m3 = map[int][256]int{}
)

func init() {
        for i := range 1000 {
                m1[i] = i
                m2[i] = [16]int{15:i}
                m3[i] = [256]int{255:i}
        }
}

func BenchmarkMap1Access2Ok(b *testing.B) {
        for i := range b.N {
                _, ok := m1[i%1000]
                if !ok {
                        b.Errorf("%d not found", i)
                }
        }
}

func BenchmarkMap2Access2Ok(b *testing.B) {
        for i := range b.N {
                _, ok := m2[i%1000]
                if !ok {
                        b.Errorf("%d not found", i)
                }
        }
}

func BenchmarkMap3Access2Ok(b *testing.B) {
        for i := range b.N {
                _, ok := m3[i%1000]
                if !ok {
                        b.Errorf("%d not found", i)
                }
        }
}

Fixes #75398

Change-Id: If75e9caaa50d460efc31a94565b9ba28c8158771
Reviewed-on: https://go-review.googlesource.com/c/go/+/702875
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>

cmd/cgo: drop pre-1.18 support

Now that the bootstrap compiler is 1.24, it's no longer needed.

Change-Id: I9b3d6b7176af10fbc580173d50130120b542e7f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/717060
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

internal/strconv: handle %f with fixedFtoa when possible

Everyone writes papers about fast shortest-output formatting.
Eventually we also sped up fixed-length formatting %e and %g.
But we've neglected %f, which falls back to the slow general code
even for relatively trivial things like %.2f on 1.23.

This CL uses the fast path fixedFtoa for %f when possible by
estimating the number of digits needed.

benchmark \ host                  linux-arm64    local  linux-amd64       s7  linux-386  s7:GOARCH=386
                                      vs base  vs base      vs base  vs base    vs base        vs base
AppendFloat/Decimal                         ~        ~            ~   +0.30%          ~              ~
AppendFloat/Float                      -0.45%        ~       -2.20%        ~     -2.19%              ~
AppendFloat/Exp                        +0.12%        ~       +4.11%        ~          ~              ~
AppendFloat/NegExp                     +0.53%        ~            ~        ~          ~              ~
AppendFloat/LongExp                    +0.41%   -1.42%       +4.50%        ~          ~              ~
AppendFloat/Big                             ~   -1.25%       +3.69%        ~          ~              ~
AppendFloat/BinaryExp                  +0.38%   +1.68%            ~        ~     +2.65%         +0.97%
AppendFloat/32Integer                       ~        ~            ~        ~          ~              ~
AppendFloat/32ExactFraction                 ~        ~       -2.61%        ~          ~              ~
AppendFloat/32Point                    -0.41%        ~       -2.65%        ~          ~              ~
AppendFloat/32Exp                           ~        ~       +5.35%        ~     +1.44%         +0.39%
AppendFloat/32NegExp                   +0.30%        ~       +2.31%        ~          ~         +0.82%
AppendFloat/32Shortest                 +0.28%   -0.85%            ~        ~     -3.20%              ~
AppendFloat/32Fixed8Hard               -0.29%        ~            ~   -1.75%          ~         +4.30%
AppendFloat/32Fixed9Hard                    ~        ~            ~        ~          ~         +1.52%
AppendFloat/64Fixed1                   +0.61%   -2.03%            ~        ~          ~         +4.36%
AppendFloat/64Fixed2                        ~   -3.43%            ~        ~          ~         +1.03%
AppendFloat/64Fixed2.5                 +0.57%   -2.23%            ~        ~          ~         +2.66%
AppendFloat/64Fixed3                        ~   -1.64%            ~   +0.31%     +2.32%         +2.10%
AppendFloat/64Fixed4                   +0.15%   -2.11%            ~        ~     +1.48%         +1.58%
AppendFloat/64Fixed5Hard               +0.45%        ~       +1.58%        ~          ~         +1.73%
AppendFloat/64Fixed12                  -0.16%        ~       +1.63%   -1.23%     +3.93%         +2.42%
AppendFloat/64Fixed16                  -0.33%   -0.49%            ~        ~     +3.67%         +2.33%
AppendFloat/64Fixed12Hard              -0.58%        ~            ~        ~     +4.98%         +0.62%
AppendFloat/64Fixed17Hard              +0.27%   -0.94%            ~        ~     +2.07%         +1.79%
AppendFloat/64Fixed18Hard                   ~        ~            ~        ~          ~              ~
AppendFloat/64FixedF1                 -69.59%  -76.08%      -70.94%  -68.26%    -75.27%        -69.88%
AppendFloat/64FixedF2                 -76.28%  -81.82%      -76.95%  -77.34%    -83.53%        -80.04%
AppendFloat/64FixedF3                 -77.30%  -84.51%      -77.82%  -77.81%    -78.77%        -73.69%
AppendFloat/Slowpath64                      ~   -1.30%       +1.64%        ~     -2.66%         -0.44%
AppendFloat/SlowpathDenormal64         +0.11%   -1.69%            ~        ~     -2.90%              ~

host: linux-arm64
goos: linux
goarch: arm64
pkg: internal/strconv
cpu: unknown
                                 │ 1cc918cc725  │             b66c604f523              │
                                 │    sec/op    │    sec/op     vs base                │
AppendFloat/Decimal-8               60.22n ± 0%    60.21n ± 0%        ~ (p=0.416 n=20)
AppendFloat/Float-8                 88.93n ± 0%    88.53n ± 0%   -0.45% (p=0.000 n=20)
AppendFloat/Exp-8                   93.09n ± 0%    93.20n ± 0%   +0.12% (p=0.000 n=20)
AppendFloat/NegExp-8                93.06n ± 0%    93.56n ± 0%   +0.53% (p=0.000 n=20)
AppendFloat/LongExp-8               99.79n ± 0%   100.20n ± 0%   +0.41% (p=0.000 n=20)
AppendFloat/Big-8                   103.9n ± 0%    104.0n ± 0%        ~ (p=0.004 n=20)
AppendFloat/BinaryExp-8             47.34n ± 0%    47.52n ± 0%   +0.38% (p=0.000 n=20)
AppendFloat/32Integer-8             60.43n ± 0%    60.40n ± 0%        ~ (p=0.006 n=20)
AppendFloat/32ExactFraction-8       86.21n ± 0%    86.24n ± 0%        ~ (p=0.634 n=20)
AppendFloat/32Point-8               83.20n ± 0%    82.87n ± 0%   -0.41% (p=0.000 n=20)
AppendFloat/32Exp-8                 89.43n ± 0%    89.45n ± 0%        ~ (p=0.193 n=20)
AppendFloat/32NegExp-8              87.31n ± 0%    87.58n ± 0%   +0.30% (p=0.000 n=20)
AppendFloat/32Shortest-8            76.28n ± 0%    76.49n ± 0%   +0.28% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-8          52.44n ± 0%    52.29n ± 0%   -0.29% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-8          60.57n ± 0%    60.54n ± 0%        ~ (p=0.285 n=20)
AppendFloat/64Fixed1-8              46.27n ± 0%    46.55n ± 0%   +0.61% (p=0.000 n=20)
AppendFloat/64Fixed2-8              46.77n ± 0%    46.80n ± 0%        ~ (p=0.060 n=20)
AppendFloat/64Fixed2.5-8            43.70n ± 0%    43.95n ± 0%   +0.57% (p=0.000 n=20)
AppendFloat/64Fixed3-8              47.22n ± 0%    47.19n ± 0%        ~ (p=0.008 n=20)
AppendFloat/64Fixed4-8              44.07n ± 0%    44.13n ± 0%   +0.15% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-8          51.81n ± 0%    52.04n ± 0%   +0.45% (p=0.000 n=20)
AppendFloat/64Fixed12-8             78.41n ± 0%    78.29n ± 0%   -0.16% (p=0.000 n=20)
AppendFloat/64Fixed16-8             65.14n ± 0%    64.93n ± 0%   -0.33% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-8         62.12n ± 0%    61.76n ± 0%   -0.58% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-8         73.93n ± 0%    74.13n ± 0%   +0.27% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-8         4.285µ ± 0%    4.283µ ± 0%        ~ (p=0.039 n=20)
AppendFloat/64FixedF1-8            216.10n ± 0%    65.71n ± 0%  -69.59% (p=0.000 n=20)
AppendFloat/64FixedF2-8            227.70n ± 0%    54.02n ± 0%  -76.28% (p=0.000 n=20)
AppendFloat/64FixedF3-8            208.20n ± 1%    47.25n ± 0%  -77.30% (p=0.000 n=20)
AppendFloat/Slowpath64-8            97.40n ± 0%    97.45n ± 0%        ~ (p=0.018 n=20)
AppendFloat/SlowpathDenormal64-8    94.75n ± 0%    94.86n ± 0%   +0.11% (p=0.000 n=20)
geomean                             87.86n         76.99n       -12.37%

host: local
goos: darwin
cpu: Apple M3 Pro
                                  │ 1cc918cc725  │             b66c604f523             │
                                  │    sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-12               21.05n ± 1%   20.91n ± 1%        ~ (p=0.051 n=20)
AppendFloat/Float-12                 32.13n ± 0%   32.04n ± 1%        ~ (p=0.457 n=20)
AppendFloat/Exp-12                   31.84n ± 0%   31.72n ± 0%        ~ (p=0.151 n=20)
AppendFloat/NegExp-12                31.78n ± 1%   31.79n ± 1%        ~ (p=0.867 n=20)
AppendFloat/LongExp-12               33.70n ± 0%   33.22n ± 1%   -1.42% (p=0.000 n=20)
AppendFloat/Big-12                   35.52n ± 1%   35.07n ± 1%   -1.25% (p=0.000 n=20)
AppendFloat/BinaryExp-12             19.32n ± 1%   19.64n ± 0%   +1.68% (p=0.000 n=20)
AppendFloat/32Integer-12             21.32n ± 0%   21.18n ± 1%        ~ (p=0.025 n=20)
AppendFloat/32ExactFraction-12       30.88n ± 0%   31.07n ± 0%        ~ (p=0.087 n=20)
AppendFloat/32Point-12               30.88n ± 0%   30.95n ± 1%        ~ (p=0.250 n=20)
AppendFloat/32Exp-12                 31.57n ± 0%   31.67n ± 2%        ~ (p=0.126 n=20)
AppendFloat/32NegExp-12              30.50n ± 1%   30.76n ± 1%        ~ (p=0.087 n=20)
AppendFloat/32Shortest-12            27.14n ± 0%   26.91n ± 1%   -0.85% (p=0.001 n=20)
AppendFloat/32Fixed8Hard-12          17.11n ± 0%   17.08n ± 0%        ~ (p=0.027 n=20)
AppendFloat/32Fixed9Hard-12          19.16n ± 1%   19.31n ± 1%        ~ (p=0.062 n=20)
AppendFloat/64Fixed1-12              15.50n ± 0%   15.18n ± 1%   -2.03% (p=0.000 n=20)
AppendFloat/64Fixed2-12              15.46n ± 0%   14.93n ± 0%   -3.43% (p=0.000 n=20)
AppendFloat/64Fixed2.5-12            15.28n ± 0%   14.94n ± 1%   -2.23% (p=0.000 n=20)
AppendFloat/64Fixed3-12              15.58n ± 0%   15.32n ± 1%   -1.64% (p=0.000 n=20)
AppendFloat/64Fixed4-12              15.39n ± 0%   15.06n ± 1%   -2.11% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-12          18.00n ± 0%   18.07n ± 1%        ~ (p=0.011 n=20)
AppendFloat/64Fixed12-12             27.97n ± 8%   29.05n ± 3%        ~ (p=0.107 n=20)
AppendFloat/64Fixed16-12             21.48n ± 0%   21.38n ± 0%   -0.49% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-12         20.79n ± 1%   21.05n ± 2%        ~ (p=0.784 n=20)
AppendFloat/64Fixed17Hard-12         27.21n ± 1%   26.95n ± 1%   -0.94% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-12         2.166µ ± 1%   2.182µ ± 1%        ~ (p=0.031 n=20)
AppendFloat/64FixedF1-12            103.35n ± 0%   24.72n ± 0%  -76.08% (p=0.000 n=20)
AppendFloat/64FixedF2-12            114.30n ± 1%   20.78n ± 0%  -81.82% (p=0.000 n=20)
AppendFloat/64FixedF3-12            107.10n ± 0%   16.58n ± 0%  -84.51% (p=0.000 n=20)
AppendFloat/Slowpath64-12            32.01n ± 0%   31.59n ± 0%   -1.30% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-12    30.21n ± 0%   29.70n ± 0%   -1.69% (p=0.000 n=20)
geomean                              31.84n        27.00n       -15.20%

host: linux-amd64
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                                  │ 1cc918cc725  │             b66c604f523              │
                                  │    sec/op    │    sec/op     vs base                │
AppendFloat/Decimal-16               63.62n ± 1%    64.05n ± 1%        ~ (p=0.753 n=20)
AppendFloat/Float-16                 97.12n ± 1%    94.98n ± 1%   -2.20% (p=0.000 n=20)
AppendFloat/Exp-16                   98.12n ± 1%   102.15n ± 1%   +4.11% (p=0.000 n=20)
AppendFloat/NegExp-16                101.1n ± 1%    101.5n ± 1%        ~ (p=0.089 n=20)
AppendFloat/LongExp-16               104.5n ± 1%    109.2n ± 1%   +4.50% (p=0.000 n=20)
AppendFloat/Big-16                   108.5n ± 0%    112.5n ± 1%   +3.69% (p=0.000 n=20)
AppendFloat/BinaryExp-16             47.68n ± 1%    47.44n ± 1%        ~ (p=0.143 n=20)
AppendFloat/32Integer-16             63.77n ± 2%    63.45n ± 1%        ~ (p=0.015 n=20)
AppendFloat/32ExactFraction-16       97.69n ± 1%    95.14n ± 1%   -2.61% (p=0.000 n=20)
AppendFloat/32Point-16               92.17n ± 1%    89.72n ± 1%   -2.65% (p=0.000 n=20)
AppendFloat/32Exp-16                 95.63n ± 1%   100.75n ± 1%   +5.35% (p=0.000 n=20)
AppendFloat/32NegExp-16              94.53n ± 1%    96.72n ± 0%   +2.31% (p=0.000 n=20)
AppendFloat/32Shortest-16            86.43n ± 0%    86.95n ± 0%        ~ (p=0.010 n=20)
AppendFloat/32Fixed8Hard-16          57.75n ± 1%    57.95n ± 1%        ~ (p=0.098 n=20)
AppendFloat/32Fixed9Hard-16          66.56n ± 2%    66.97n ± 1%        ~ (p=0.380 n=20)
AppendFloat/64Fixed1-16              51.02n ± 1%    50.99n ± 1%        ~ (p=0.473 n=20)
AppendFloat/64Fixed2-16              50.94n ± 1%    51.01n ± 1%        ~ (p=0.136 n=20)
AppendFloat/64Fixed2.5-16            49.27n ± 1%    49.37n ± 1%        ~ (p=0.218 n=20)
AppendFloat/64Fixed3-16              51.85n ± 1%    52.55n ± 1%        ~ (p=0.045 n=20)
AppendFloat/64Fixed4-16              50.30n ± 1%    50.43n ± 1%        ~ (p=0.794 n=20)
AppendFloat/64Fixed5Hard-16          57.57n ± 1%    58.48n ± 1%   +1.58% (p=0.000 n=20)
AppendFloat/64Fixed12-16             82.67n ± 1%    84.02n ± 1%   +1.63% (p=0.000 n=20)
AppendFloat/64Fixed16-16             71.10n ± 1%    70.94n ± 1%        ~ (p=0.569 n=20)
AppendFloat/64Fixed12Hard-16         68.36n ± 1%    68.64n ± 1%        ~ (p=0.155 n=20)
AppendFloat/64Fixed17Hard-16         80.16n ± 1%    80.10n ± 1%        ~ (p=0.836 n=20)
AppendFloat/64Fixed18Hard-16         4.916µ ± 1%    4.919µ ± 1%        ~ (p=0.507 n=20)
AppendFloat/64FixedF1-16            239.75n ± 1%    69.67n ± 1%  -70.94% (p=0.000 n=20)
AppendFloat/64FixedF2-16            252.50n ± 1%    58.20n ± 1%  -76.95% (p=0.000 n=20)
AppendFloat/64FixedF3-16            238.00n ± 1%    52.79n ± 1%  -77.82% (p=0.000 n=20)
AppendFloat/Slowpath64-16            100.4n ± 1%    102.0n ± 1%   +1.64% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-16    97.92n ± 1%    98.01n ± 1%        ~ (p=0.304 n=20)
geomean                              95.58n         84.00n       -12.12%

host: s7
cpu: AMD Ryzen 9 7950X 16-Core Processor
                                  │ 1cc918cc725 │             b66c604f523             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-32              22.00n ± 0%   22.06n ± 0%   +0.30% (p=0.001 n=20)
AppendFloat/Float-32                34.83n ± 0%   34.76n ± 0%        ~ (p=0.159 n=20)
AppendFloat/Exp-32                  34.91n ± 0%   34.89n ± 0%        ~ (p=0.188 n=20)
AppendFloat/NegExp-32               35.24n ± 0%   35.32n ± 0%        ~ (p=0.026 n=20)
AppendFloat/LongExp-32              37.02n ± 0%   37.02n ± 0%        ~ (p=0.317 n=20)
AppendFloat/Big-32                  38.51n ± 0%   38.43n ± 0%        ~ (p=0.060 n=20)
AppendFloat/BinaryExp-32            17.57n ± 0%   17.59n ± 0%        ~ (p=0.278 n=20)
AppendFloat/32Integer-32            22.06n ± 0%   22.09n ± 0%        ~ (p=0.762 n=20)
AppendFloat/32ExactFraction-32      32.91n ± 0%   33.00n ± 0%        ~ (p=0.055 n=20)
AppendFloat/32Point-32              33.24n ± 0%   33.18n ± 0%        ~ (p=0.068 n=20)
AppendFloat/32Exp-32                34.50n ± 0%   34.55n ± 0%        ~ (p=0.030 n=20)
AppendFloat/32NegExp-32             33.53n ± 0%   33.61n ± 0%        ~ (p=0.045 n=20)
AppendFloat/32Shortest-32           30.10n ± 0%   30.10n ± 0%        ~ (p=0.931 n=20)
AppendFloat/32Fixed8Hard-32         22.89n ± 0%   22.49n ± 0%   -1.75% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-32         25.82n ± 0%   25.75n ± 1%        ~ (p=0.143 n=20)
AppendFloat/64Fixed1-32             18.80n ± 0%   18.70n ± 0%        ~ (p=0.004 n=20)
AppendFloat/64Fixed2-32             18.64n ± 1%   18.54n ± 0%        ~ (p=0.001 n=20)
AppendFloat/64Fixed2.5-32           17.89n ± 0%   17.81n ± 0%        ~ (p=0.001 n=20)
AppendFloat/64Fixed3-32             19.62n ± 0%   19.68n ± 0%   +0.31% (p=0.000 n=20)
AppendFloat/64Fixed4-32             18.64n ± 0%   18.82n ± 0%        ~ (p=0.010 n=20)
AppendFloat/64Fixed5Hard-32         21.62n ± 0%   21.57n ± 0%        ~ (p=0.058 n=20)
AppendFloat/64Fixed12-32            30.98n ± 1%   30.61n ± 1%   -1.23% (p=0.000 n=20)
AppendFloat/64Fixed16-32            26.89n ± 0%   27.08n ± 1%        ~ (p=0.003 n=20)
AppendFloat/64Fixed12Hard-32        26.03n ± 0%   26.20n ± 1%        ~ (p=0.344 n=20)
AppendFloat/64Fixed17Hard-32        30.03n ± 1%   29.72n ± 1%        ~ (p=0.001 n=20)
AppendFloat/64Fixed18Hard-32        1.824µ ± 0%   1.825µ ± 1%        ~ (p=0.567 n=20)
AppendFloat/64FixedF1-32            83.58n ± 1%   26.52n ± 0%  -68.26% (p=0.000 n=20)
AppendFloat/64FixedF2-32            89.68n ± 1%   20.32n ± 1%  -77.34% (p=0.000 n=20)
AppendFloat/64FixedF3-32            84.84n ± 0%   18.82n ± 0%  -77.81% (p=0.000 n=20)
AppendFloat/Slowpath64-32           35.55n ± 0%   35.61n ± 0%        ~ (p=0.394 n=20)
AppendFloat/SlowpathDenormal64-32   35.03n ± 0%   35.02n ± 0%        ~ (p=0.733 n=20)
geomean                             34.67n        30.31n       -12.56%

host: linux-386
goarch: 386
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                                  │ 1cc918cc725 │             b66c604f523             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-16              133.6n ± 1%   130.5n ± 1%        ~ (p=0.002 n=20)
AppendFloat/Float-16                242.3n ± 1%   237.0n ± 1%   -2.19% (p=0.000 n=20)
AppendFloat/Exp-16                  249.1n ± 3%   252.5n ± 1%        ~ (p=0.005 n=20)
AppendFloat/NegExp-16               248.7n ± 3%   253.8n ± 2%        ~ (p=0.006 n=20)
AppendFloat/LongExp-16              258.4n ± 2%   253.0n ± 6%        ~ (p=0.185 n=20)
AppendFloat/Big-16                  285.6n ± 1%   279.2n ± 5%        ~ (p=0.012 n=20)
AppendFloat/BinaryExp-16            89.47n ± 1%   91.85n ± 2%   +2.65% (p=0.000 n=20)
AppendFloat/32Integer-16            133.5n ± 1%   129.9n ± 1%        ~ (p=0.004 n=20)
AppendFloat/32ExactFraction-16      213.7n ± 1%   212.2n ± 2%        ~ (p=0.071 n=20)
AppendFloat/32Point-16              202.0n ± 0%   200.4n ± 1%        ~ (p=0.223 n=20)
AppendFloat/32Exp-16                236.4n ± 1%   239.8n ± 1%   +1.44% (p=0.000 n=20)
AppendFloat/32NegExp-16             212.5n ± 1%   211.9n ± 1%        ~ (p=0.995 n=20)
AppendFloat/32Shortest-16           200.3n ± 1%   193.9n ± 1%   -3.20% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-16         136.0n ± 1%   133.2n ± 4%        ~ (p=0.323 n=20)
AppendFloat/32Fixed9Hard-16         155.6n ± 1%   156.7n ± 2%        ~ (p=0.022 n=20)
AppendFloat/64Fixed1-16             132.8n ± 1%   133.0n ± 3%        ~ (p=0.199 n=20)
AppendFloat/64Fixed2-16             128.9n ± 1%   129.7n ± 3%        ~ (p=0.018 n=20)
AppendFloat/64Fixed2.5-16           127.0n ± 1%   126.5n ± 3%        ~ (p=0.825 n=20)
AppendFloat/64Fixed3-16             127.3n ± 1%   130.3n ± 4%   +2.32% (p=0.001 n=20)
AppendFloat/64Fixed4-16             121.4n ± 1%   123.2n ± 2%   +1.48% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-16         136.2n ± 1%   136.2n ± 3%        ~ (p=0.256 n=20)
AppendFloat/64Fixed12-16            159.0n ± 1%   165.2n ± 2%   +3.93% (p=0.000 n=20)
AppendFloat/64Fixed16-16            151.4n ± 0%   156.9n ± 1%   +3.67% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-16        146.5n ± 1%   153.8n ± 1%   +4.98% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-16        166.3n ± 1%   169.8n ± 1%   +2.07% (p=0.001 n=20)
AppendFloat/64Fixed18Hard-16        10.59µ ± 2%   10.60µ ± 0%        ~ (p=0.499 n=20)
AppendFloat/64FixedF1-16            614.4n ± 1%   152.0n ± 1%  -75.27% (p=0.000 n=20)
AppendFloat/64FixedF2-16            845.0n ± 0%   139.1n ± 1%  -83.53% (p=0.000 n=20)
AppendFloat/64FixedF3-16            608.8n ± 1%   129.3n ± 1%  -78.77% (p=0.000 n=20)
AppendFloat/Slowpath64-16           251.7n ± 1%   245.0n ± 1%   -2.66% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-16   248.4n ± 1%   241.2n ± 1%   -2.90% (p=0.000 n=20)
geomean                             225.7n        193.8n       -14.14%

host: s7:GOARCH=386
cpu: AMD Ryzen 9 7950X 16-Core Processor
                                  │ 1cc918cc725  │             b66c604f523             │
                                  │    sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-32               41.88n ± 0%   42.02n ± 1%        ~ (p=0.004 n=20)
AppendFloat/Float-32                 71.05n ± 0%   71.24n ± 0%        ~ (p=0.044 n=20)
AppendFloat/Exp-32                   74.91n ± 1%   74.80n ± 0%        ~ (p=0.433 n=20)
AppendFloat/NegExp-32                74.10n ± 0%   74.20n ± 0%        ~ (p=0.867 n=20)
AppendFloat/LongExp-32               75.73n ± 0%   75.84n ± 0%        ~ (p=0.147 n=20)
AppendFloat/Big-32                   82.47n ± 0%   82.36n ± 0%        ~ (p=0.490 n=20)
AppendFloat/BinaryExp-32             32.31n ± 1%   32.62n ± 0%   +0.97% (p=0.000 n=20)
AppendFloat/32Integer-32             41.38n ± 1%   41.40n ± 1%        ~ (p=0.106 n=20)
AppendFloat/32ExactFraction-32       62.72n ± 0%   62.92n ± 0%        ~ (p=0.009 n=20)
AppendFloat/32Point-32               60.36n ± 0%   60.33n ± 0%        ~ (p=0.050 n=20)
AppendFloat/32Exp-32                 68.97n ± 0%   69.24n ± 0%   +0.39% (p=0.000 n=20)
AppendFloat/32NegExp-32              62.63n ± 0%   63.15n ± 0%   +0.82% (p=0.000 n=20)
AppendFloat/32Shortest-32            58.76n ± 0%   58.87n ± 0%        ~ (p=0.053 n=20)
AppendFloat/32Fixed8Hard-32          41.67n ± 1%   43.46n ± 1%   +4.30% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-32          49.78n ± 1%   50.53n ± 1%   +1.52% (p=0.000 n=20)
AppendFloat/64Fixed1-32              41.15n ± 0%   42.95n ± 1%   +4.36% (p=0.000 n=20)
AppendFloat/64Fixed2-32              40.83n ± 1%   41.24n ± 1%   +1.03% (p=0.000 n=20)
AppendFloat/64Fixed2.5-32            39.42n ± 0%   40.47n ± 1%   +2.66% (p=0.000 n=20)
AppendFloat/64Fixed3-32              40.73n ± 1%   41.58n ± 1%   +2.10% (p=0.000 n=20)
AppendFloat/64Fixed4-32              38.68n ± 0%   39.29n ± 0%   +1.58% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-32          42.88n ± 1%   43.62n ± 1%   +1.73% (p=0.000 n=20)
AppendFloat/64Fixed12-32             51.67n ± 1%   52.92n ± 1%   +2.42% (p=0.000 n=20)
AppendFloat/64Fixed16-32             49.15n ± 0%   50.30n ± 0%   +2.33% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-32         48.51n ± 0%   48.81n ± 0%   +0.62% (p=0.001 n=20)
AppendFloat/64Fixed17Hard-32         54.62n ± 1%   55.60n ± 1%   +1.79% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-32         3.979µ ± 1%   3.980µ ± 1%        ~ (p=0.569 n=20)
AppendFloat/64FixedF1-32            165.90n ± 1%   49.97n ± 0%  -69.88% (p=0.000 n=20)
AppendFloat/64FixedF2-32            225.50n ± 0%   45.02n ± 1%  -80.04% (p=0.000 n=20)
AppendFloat/64FixedF3-32            160.20n ± 1%   42.16n ± 1%  -73.69% (p=0.000 n=20)
AppendFloat/Slowpath64-32            75.55n ± 0%   75.23n ± 0%   -0.44% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-32    74.84n ± 0%   75.00n ± 0%        ~ (p=0.268 n=20)
geomean                              69.22n        61.13n       -11.69%

Change-Id: I722d2e2621e74e32cb3fc34a2df5b16cc595715c
Reviewed-on: https://go-review.googlesource.com/c/go/+/717183
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>

cmd/compile: implement Avg64u, Hmul64, Hmul64u for wasm

This lets us remove useAvg and useHmul from the division rules.
The compiler is simpler and the generated code is faster.

goos: wasip1
goarch: wasm
pkg: internal/strconv
                               │   old.txt   │               new.txt               │
                               │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal              192.8n ± 1%   194.6n ± 0%   +0.91% (p=0.000 n=10)
AppendFloat/Float                328.6n ± 0%   279.6n ± 0%  -14.93% (p=0.000 n=10)
AppendFloat/Exp                  335.6n ± 1%   289.2n ± 1%  -13.80% (p=0.000 n=10)
AppendFloat/NegExp               336.0n ± 0%   289.1n ± 1%  -13.97% (p=0.000 n=10)
AppendFloat/LongExp              332.4n ± 0%   285.2n ± 1%  -14.20% (p=0.000 n=10)
AppendFloat/Big                  348.2n ± 0%   300.1n ± 0%  -13.83% (p=0.000 n=10)
AppendFloat/BinaryExp            137.4n ± 0%   138.2n ± 0%   +0.55% (p=0.001 n=10)
AppendFloat/32Integer            193.3n ± 1%   196.5n ± 0%   +1.66% (p=0.000 n=10)
AppendFloat/32ExactFraction      283.3n ± 0%   268.9n ± 1%   -5.08% (p=0.000 n=10)
AppendFloat/32Point              279.9n ± 0%   266.5n ± 0%   -4.80% (p=0.000 n=10)
AppendFloat/32Exp                300.1n ± 0%   288.3n ± 1%   -3.90% (p=0.000 n=10)
AppendFloat/32NegExp             288.2n ± 1%   277.9n ± 1%   -3.59% (p=0.000 n=10)
AppendFloat/32Shortest           261.7n ± 0%   250.2n ± 0%   -4.39% (p=0.000 n=10)
AppendFloat/32Fixed8Hard         173.3n ± 1%   158.9n ± 1%   -8.31% (p=0.000 n=10)
AppendFloat/32Fixed9Hard         180.0n ± 0%   167.9n ± 2%   -6.70% (p=0.000 n=10)
AppendFloat/64Fixed1             167.1n ± 0%   149.6n ± 1%  -10.50% (p=0.000 n=10)
AppendFloat/64Fixed2             162.4n ± 1%   146.5n ± 0%   -9.73% (p=0.000 n=10)
AppendFloat/64Fixed2.5           165.5n ± 0%   149.4n ± 1%   -9.70% (p=0.000 n=10)
AppendFloat/64Fixed3             166.4n ± 1%   150.2n ± 0%   -9.74% (p=0.000 n=10)
AppendFloat/64Fixed4             163.7n ± 0%   149.6n ± 1%   -8.62% (p=0.000 n=10)
AppendFloat/64Fixed5Hard         182.8n ± 1%   167.1n ± 1%   -8.61% (p=0.000 n=10)
AppendFloat/64Fixed12            222.2n ± 0%   208.8n ± 0%   -6.05% (p=0.000 n=10)
AppendFloat/64Fixed16            197.6n ± 1%   181.7n ± 0%   -8.02% (p=0.000 n=10)
AppendFloat/64Fixed12Hard        194.5n ± 0%   181.0n ± 0%   -6.99% (p=0.000 n=10)
AppendFloat/64Fixed17Hard        205.1n ± 1%   191.9n ± 0%   -6.44% (p=0.000 n=10)
AppendFloat/64Fixed18Hard        6.269µ ± 0%   6.643µ ± 0%   +5.97% (p=0.000 n=10)
AppendFloat/64FixedF1            211.7n ± 1%   197.0n ± 0%   -6.95% (p=0.000 n=10)
AppendFloat/64FixedF2            189.4n ± 0%   174.2n ± 0%   -8.08% (p=0.000 n=10)
AppendFloat/64FixedF3            169.0n ± 0%   154.9n ± 0%   -8.32% (p=0.000 n=10)
AppendFloat/Slowpath64           321.2n ± 0%   274.2n ± 1%  -14.63% (p=0.000 n=10)
AppendFloat/SlowpathDenormal64   307.4n ± 1%   261.2n ± 0%  -15.03% (p=0.000 n=10)
AppendInt                        3.367µ ± 1%   3.376µ ± 0%        ~ (p=0.517 n=10)
AppendUint                       675.5n ± 0%   676.9n ± 0%        ~ (p=0.196 n=10)
AppendIntSmall                   28.13n ± 1%   28.17n ± 0%   +0.14% (p=0.015 n=10)
AppendUintVarlen/digits=1        20.70n ± 0%   20.51n ± 1%   -0.89% (p=0.018 n=10)
AppendUintVarlen/digits=2        20.43n ± 0%   20.27n ± 0%   -0.81% (p=0.001 n=10)
AppendUintVarlen/digits=3        38.48n ± 0%   37.93n ± 0%   -1.43% (p=0.000 n=10)
AppendUintVarlen/digits=4        41.10n ± 0%   38.78n ± 1%   -5.62% (p=0.000 n=10)
AppendUintVarlen/digits=5        42.25n ± 1%   42.11n ± 0%   -0.32% (p=0.041 n=10)
AppendUintVarlen/digits=6        45.40n ± 1%   43.14n ± 0%   -4.98% (p=0.000 n=10)
AppendUintVarlen/digits=7        46.81n ± 1%   46.03n ± 0%   -1.66% (p=0.000 n=10)
AppendUintVarlen/digits=8        48.88n ± 1%   46.59n ± 1%   -4.68% (p=0.000 n=10)
AppendUintVarlen/digits=9        49.94n ± 2%   49.41n ± 1%   -1.06% (p=0.000 n=10)
AppendUintVarlen/digits=10       57.28n ± 1%   56.92n ± 1%   -0.62% (p=0.045 n=10)
AppendUintVarlen/digits=11       60.09n ± 1%   58.11n ± 2%   -3.30% (p=0.000 n=10)
AppendUintVarlen/digits=12       62.22n ± 0%   61.85n ± 0%   -0.59% (p=0.000 n=10)
AppendUintVarlen/digits=13       64.94n ± 0%   62.92n ± 0%   -3.10% (p=0.000 n=10)
AppendUintVarlen/digits=14       65.42n ± 1%   65.19n ± 1%   -0.34% (p=0.005 n=10)
AppendUintVarlen/digits=15       68.17n ± 0%   66.13n ± 0%   -2.99% (p=0.000 n=10)
AppendUintVarlen/digits=16       70.21n ± 1%   70.09n ± 1%        ~ (p=0.517 n=10)
AppendUintVarlen/digits=17       72.93n ± 0%   70.49n ± 0%   -3.34% (p=0.000 n=10)
AppendUintVarlen/digits=18       73.01n ± 0%   72.75n ± 0%   -0.35% (p=0.000 n=10)
AppendUintVarlen/digits=19       79.27n ± 1%   79.49n ± 1%        ~ (p=0.671 n=10)
AppendUintVarlen/digits=20       82.18n ± 0%   80.43n ± 1%   -2.14% (p=0.000 n=10)
geomean                          143.4n        136.0n        -5.20%

Change-Id: I8245814a0259ad13cf9225f57db8e9fe3d2e4267
Reviewed-on: https://go-review.googlesource.com/c/go/+/717407
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>

encoding/pem: don't reslice in failure modes

We re-slice the data being processed at the stat of each loop. If the
var that we use to calculate where to re-slice is < 0 or > the length
of the remaining data, return instead of attempting to re-slice.

Change-Id: I1d6c2b6c596feedeea8feeaace370ea73ba02c4c
Reviewed-on: https://go-review.googlesource.com/c/go/+/715260
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>

internal/strconv: extract fixed-precision ftoa from ftoaryu.go

The fixed-precision ftoa algorithm is not actually
documented in the Ryū paper, and it is fairly
straightforward: multiply by a power of 10 to get
an integer that contains the digits we need.
There is also no need for separate float32 and float64
implementations.

This CL implements a new fixedFtoa, separate from Ryū.
The overall algorithm is the same, but the new code
is simpler, faster, and better documented.

Now ftoaryu.go is only about shortest-output formatting,
so if and when yet another algorithm comes along, it will
be clearer what should be replaced (all of ftoaryu.go)
and what should not (all of ftoafixed.go).

benchmark \ host                  linux-arm64    local  linux-amd64       s7  linux-386  s7:GOARCH=386
                                      vs base  vs base      vs base  vs base    vs base        vs base
AppendFloat/Decimal                    -0.18%        ~            ~   -0.68%     +0.49%         -0.79%
AppendFloat/Float                      +0.09%        ~       +1.50%   +0.84%     -0.37%         -0.69%
AppendFloat/Exp                        -0.51%        ~            ~   +1.20%     -1.27%         -1.01%
AppendFloat/NegExp                     -1.01%        ~       +3.43%   +1.35%     -2.33%              ~
AppendFloat/LongExp                    -1.22%   +0.77%            ~        ~     -1.48%              ~
AppendFloat/Big                        -2.07%        ~       -2.07%   -1.97%     -2.89%         -2.93%
AppendFloat/BinaryExp                  -0.28%   +1.06%            ~   +1.35%     -0.64%         -1.64%
AppendFloat/32Integer                       ~        ~            ~   -0.79%          ~         -0.66%
AppendFloat/32ExactFraction            -0.50%        ~       +5.69%        ~     -1.24%         +0.69%
AppendFloat/32Point                         ~   -1.19%       +2.59%   +1.03%     -1.37%         +0.80%
AppendFloat/32Exp                      -3.39%   -2.79%       -8.36%   -0.94%     -5.72%         -5.92%
AppendFloat/32NegExp                   -0.63%        ~            ~   +0.98%     -1.34%         -0.73%
AppendFloat/32Shortest                 -1.00%   +1.36%       +2.94%        ~          ~              ~
AppendFloat/32Fixed8Hard               -5.91%  -12.45%       -6.62%        ~    +18.46%        +11.61%
AppendFloat/32Fixed9Hard               -6.53%  -11.35%       -6.01%   -0.97%    -18.31%         -9.16%
AppendFloat/64Fixed1                  -13.84%  -16.90%      -13.13%  -10.71%    -24.52%        -18.94%
AppendFloat/64Fixed2                  -11.12%  -16.97%      -12.13%   -9.88%    -22.73%        -15.48%
AppendFloat/64Fixed2.5                -21.98%  -20.75%      -19.08%  -14.74%    -28.11%        -24.92%
AppendFloat/64Fixed3                  -11.53%  -16.21%      -10.75%   -7.53%    -23.11%        -15.78%
AppendFloat/64Fixed4                  -12.89%  -12.36%      -11.07%   -9.79%    -14.51%        -13.44%
AppendFloat/64Fixed5Hard              -47.62%  -38.59%      -40.83%  -37.06%    -60.51%        -55.29%
AppendFloat/64Fixed12                  -7.40%        ~       -8.56%   -4.31%    -13.82%         -8.61%
AppendFloat/64Fixed16                  -9.10%   -8.95%       -6.92%   -3.92%    -12.99%         -9.03%
AppendFloat/64Fixed12Hard              -9.14%   -5.24%       -6.23%   -4.82%    -13.58%         -8.99%
AppendFloat/64Fixed17Hard              -6.80%        ~       -4.03%   -2.84%    -19.81%        -10.27%
AppendFloat/64Fixed18Hard              -0.12%        ~            ~        ~          ~              ~
AppendFloat/64FixedF1                       ~        ~            ~        ~     -0.40%         +2.72%
AppendFloat/64FixedF2                  -0.18%        ~       -1.98%   -0.95%          ~         +1.25%
AppendFloat/64FixedF3                  -0.29%        ~            ~        ~          ~         +1.22%
AppendFloat/Slowpath64                 -1.16%        ~            ~        ~          ~         -2.16%
AppendFloat/SlowpathDenormal64         -1.09%        ~            ~   -0.88%     -0.83%              ~

host: linux-arm64
goos: linux
goarch: arm64
pkg: internal/strconv
cpu: unknown
                                 │ 14b7e09f493  │             f9bf7fcb8e2             │
                                 │    sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-8               60.35n ± 0%   60.24n ± 0%   -0.18% (p=0.000 n=20)
AppendFloat/Float-8                 88.83n ± 0%   88.91n ± 0%   +0.09% (p=0.000 n=20)
AppendFloat/Exp-8                   93.55n ± 0%   93.06n ± 0%   -0.51% (p=0.000 n=20)
AppendFloat/NegExp-8                94.01n ± 0%   93.06n ± 0%   -1.01% (p=0.000 n=20)
AppendFloat/LongExp-8              101.00n ± 0%   99.77n ± 0%   -1.22% (p=0.000 n=20)
AppendFloat/Big-8                   106.1n ± 0%   103.9n ± 0%   -2.07% (p=0.000 n=20)
AppendFloat/BinaryExp-8             47.48n ± 0%   47.35n ± 0%   -0.28% (p=0.000 n=20)
AppendFloat/32Integer-8             60.45n ± 0%   60.43n ± 0%        ~ (p=0.150 n=20)
AppendFloat/32ExactFraction-8       86.65n ± 0%   86.22n ± 0%   -0.50% (p=0.000 n=20)
AppendFloat/32Point-8               83.26n ± 0%   83.21n ± 0%        ~ (p=0.046 n=20)
AppendFloat/32Exp-8                 92.55n ± 0%   89.42n ± 0%   -3.39% (p=0.000 n=20)
AppendFloat/32NegExp-8              87.89n ± 0%   87.34n ± 0%   -0.63% (p=0.000 n=20)
AppendFloat/32Shortest-8            77.05n ± 0%   76.28n ± 0%   -1.00% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-8          55.73n ± 0%   52.44n ± 0%   -5.91% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-8          64.80n ± 0%   60.57n ± 0%   -6.53% (p=0.000 n=20)
AppendFloat/64Fixed1-8              53.72n ± 0%   46.29n ± 0%  -13.84% (p=0.000 n=20)
AppendFloat/64Fixed2-8              52.64n ± 0%   46.79n ± 0%  -11.12% (p=0.000 n=20)
AppendFloat/64Fixed2.5-8            56.01n ± 0%   43.70n ± 0%  -21.98% (p=0.000 n=20)
AppendFloat/64Fixed3-8              53.38n ± 0%   47.23n ± 0%  -11.53% (p=0.000 n=20)
AppendFloat/64Fixed4-8              50.62n ± 0%   44.10n ± 0%  -12.89% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-8          98.94n ± 0%   51.82n ± 0%  -47.62% (p=0.000 n=20)
AppendFloat/64Fixed12-8             84.70n ± 0%   78.44n ± 0%   -7.40% (p=0.000 n=20)
AppendFloat/64Fixed16-8             71.68n ± 0%   65.16n ± 0%   -9.10% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-8         68.41n ± 0%   62.16n ± 0%   -9.14% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-8         79.31n ± 0%   73.92n ± 0%   -6.80% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-8         4.290µ ± 0%   4.285µ ± 0%   -0.12% (p=0.000 n=20)
AppendFloat/64FixedF1-8             216.0n ± 0%   216.1n ± 0%        ~ (p=0.090 n=20)
AppendFloat/64FixedF2-8             228.2n ± 0%   227.8n ± 0%   -0.18% (p=0.000 n=20)
AppendFloat/64FixedF3-8             208.8n ± 0%   208.2n ± 0%   -0.29% (p=0.000 n=20)
AppendFloat/Slowpath64-8            98.56n ± 0%   97.42n ± 0%   -1.16% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-8    95.81n ± 0%   94.77n ± 0%   -1.09% (p=0.000 n=20)
geomean                             93.81n        87.87n        -6.33%

host: local
goos: darwin
cpu: Apple M3 Pro
                                  │ 14b7e09f493 │             f9bf7fcb8e2              │
                                  │   sec/op    │    sec/op     vs base                │
AppendFloat/Decimal-12              21.14n ± 0%   21.15n ±  0%        ~ (p=0.963 n=20)
AppendFloat/Float-12                32.48n ± 1%   32.43n ±  0%        ~ (p=0.358 n=20)
AppendFloat/Exp-12                  31.85n ± 0%   31.94n ±  1%        ~ (p=0.634 n=20)
AppendFloat/NegExp-12               31.75n ± 0%   32.04n ±  0%        ~ (p=0.004 n=20)
AppendFloat/LongExp-12              33.55n ± 0%   33.81n ±  0%   +0.77% (p=0.000 n=20)
AppendFloat/Big-12                  35.62n ± 1%   35.73n ±  1%        ~ (p=0.888 n=20)
AppendFloat/BinaryExp-12            19.26n ± 0%   19.46n ±  1%   +1.06% (p=0.000 n=20)
AppendFloat/32Integer-12            21.41n ± 0%   21.46n ±  1%        ~ (p=0.733 n=20)
AppendFloat/32ExactFraction-12      31.23n ± 1%   31.30n ±  1%        ~ (p=0.857 n=20)
AppendFloat/32Point-12              31.39n ± 1%   31.02n ±  0%   -1.19% (p=0.000 n=20)
AppendFloat/32Exp-12                32.42n ± 1%   31.52n ±  1%   -2.79% (p=0.000 n=20)
AppendFloat/32NegExp-12             30.66n ± 1%   30.66n ±  1%        ~ (p=0.380 n=20)
AppendFloat/32Shortest-12           26.88n ± 1%   27.25n ±  1%   +1.36% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-12         19.52n ± 0%   17.09n ±  1%  -12.45% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-12         21.55n ± 2%   19.11n ±  1%  -11.35% (p=0.000 n=20)
AppendFloat/64Fixed1-12             18.64n ± 0%   15.49n ±  0%  -16.90% (p=0.000 n=20)
AppendFloat/64Fixed2-12             18.65n ± 0%   15.49n ±  0%  -16.97% (p=0.000 n=20)
AppendFloat/64Fixed2.5-12           19.23n ± 1%   15.24n ±  0%  -20.75% (p=0.000 n=20)
AppendFloat/64Fixed3-12             18.61n ± 0%   15.59n ±  1%  -16.21% (p=0.000 n=20)
AppendFloat/64Fixed4-12             17.55n ± 1%   15.38n ±  0%  -12.36% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-12         29.27n ± 1%   17.97n ±  0%  -38.59% (p=0.000 n=20)
AppendFloat/64Fixed12-12            28.26n ± 1%   28.17n ± 10%        ~ (p=0.941 n=20)
AppendFloat/64Fixed16-12            23.56n ± 0%   21.46n ±  0%   -8.95% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-12        21.85n ± 2%   20.70n ±  1%   -5.24% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-12        26.91n ± 1%   27.10n ±  0%        ~ (p=0.059 n=20)
AppendFloat/64Fixed18Hard-12        2.197µ ± 1%   2.169µ ±  1%        ~ (p=0.013 n=20)
AppendFloat/64FixedF1-12            103.7n ± 1%   103.3n ±  0%        ~ (p=0.035 n=20)
AppendFloat/64FixedF2-12            114.8n ± 1%   114.1n ±  1%        ~ (p=0.234 n=20)
AppendFloat/64FixedF3-12            107.8n ± 1%   107.1n ±  1%        ~ (p=0.180 n=20)
AppendFloat/Slowpath64-12           32.05n ± 1%   32.00n ±  0%        ~ (p=0.952 n=20)
AppendFloat/SlowpathDenormal64-12   29.98n ± 1%   30.20n ±  0%        ~ (p=0.004 n=20)
geomean                             33.83n        31.91n         -5.68%

host: linux-amd64
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                                  │ 14b7e09f493  │             f9bf7fcb8e2              │
                                  │    sec/op    │    sec/op     vs base                │
AppendFloat/Decimal-16               64.00n ± 1%    63.67n ± 1%        ~ (p=0.784 n=20)
AppendFloat/Float-16                 95.99n ± 1%    97.42n ± 1%   +1.50% (p=0.000 n=20)
AppendFloat/Exp-16                   97.59n ± 1%    97.72n ± 1%        ~ (p=0.984 n=20)
AppendFloat/NegExp-16                97.80n ± 1%   101.15n ± 1%   +3.43% (p=0.000 n=20)
AppendFloat/LongExp-16               103.1n ± 1%    104.5n ± 1%        ~ (p=0.006 n=20)
AppendFloat/Big-16                   110.8n ± 1%    108.5n ± 1%   -2.07% (p=0.000 n=20)
AppendFloat/BinaryExp-16             47.82n ± 1%    47.33n ± 1%        ~ (p=0.007 n=20)
AppendFloat/32Integer-16             63.65n ± 1%    63.51n ± 0%        ~ (p=0.560 n=20)
AppendFloat/32ExactFraction-16       91.81n ± 1%    97.03n ± 1%   +5.69% (p=0.000 n=20)
AppendFloat/32Point-16               89.84n ± 1%    92.16n ± 1%   +2.59% (p=0.000 n=20)
AppendFloat/32Exp-16                103.80n ± 1%    95.12n ± 1%   -8.36% (p=0.000 n=20)
AppendFloat/32NegExp-16              93.70n ± 1%    94.87n ± 1%        ~ (p=0.003 n=20)
AppendFloat/32Shortest-16            83.98n ± 1%    86.45n ± 1%   +2.94% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-16          61.91n ± 1%    57.81n ± 1%   -6.62% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-16          71.08n ± 0%    66.81n ± 1%   -6.01% (p=0.000 n=20)
AppendFloat/64Fixed1-16              59.27n ± 2%    51.49n ± 1%  -13.13% (p=0.000 n=20)
AppendFloat/64Fixed2-16              57.89n ± 1%    50.87n ± 1%  -12.13% (p=0.000 n=20)
AppendFloat/64Fixed2.5-16            61.04n ± 1%    49.40n ± 1%  -19.08% (p=0.000 n=20)
AppendFloat/64Fixed3-16              58.42n ± 1%    52.14n ± 1%  -10.75% (p=0.000 n=20)
AppendFloat/64Fixed4-16              56.52n ± 1%    50.27n ± 1%  -11.07% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-16          97.79n ± 1%    57.86n ± 1%  -40.83% (p=0.000 n=20)
AppendFloat/64Fixed12-16             90.78n ± 1%    83.01n ± 1%   -8.56% (p=0.000 n=20)
AppendFloat/64Fixed16-16             76.11n ± 1%    70.84n ± 0%   -6.92% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-16         73.56n ± 1%    68.98n ± 2%   -6.23% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-16         83.20n ± 1%    79.85n ± 1%   -4.03% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-16         4.947µ ± 1%    4.915µ ± 1%        ~ (p=0.229 n=20)
AppendFloat/64FixedF1-16             242.4n ± 1%    239.4n ± 1%        ~ (p=0.038 n=20)
AppendFloat/64FixedF2-16             257.7n ± 2%    252.6n ± 1%   -1.98% (p=0.000 n=20)
AppendFloat/64FixedF3-16             237.5n ± 0%    237.5n ± 1%        ~ (p=0.440 n=20)
AppendFloat/Slowpath64-16            99.75n ± 1%    99.78n ± 1%        ~ (p=0.995 n=20)
AppendFloat/SlowpathDenormal64-16    97.41n ± 1%    98.20n ± 1%        ~ (p=0.006 n=20)
geomean                              100.7n         95.60n        -5.05%

host: s7
cpu: AMD Ryzen 9 7950X 16-Core Processor
                                  │ 14b7e09f493 │             f9bf7fcb8e2             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-32              22.19n ± 0%   22.04n ± 0%   -0.68% (p=0.000 n=20)
AppendFloat/Float-32                34.59n ± 0%   34.88n ± 0%   +0.84% (p=0.000 n=20)
AppendFloat/Exp-32                  34.47n ± 0%   34.88n ± 0%   +1.20% (p=0.000 n=20)
AppendFloat/NegExp-32               34.85n ± 0%   35.32n ± 0%   +1.35% (p=0.000 n=20)
AppendFloat/LongExp-32              37.23n ± 0%   37.09n ± 0%        ~ (p=0.003 n=20)
AppendFloat/Big-32                  39.27n ± 0%   38.50n ± 0%   -1.97% (p=0.000 n=20)
AppendFloat/BinaryExp-32            17.38n ± 0%   17.61n ± 0%   +1.35% (p=0.000 n=20)
AppendFloat/32Integer-32            22.26n ± 0%   22.08n ± 0%   -0.79% (p=0.000 n=20)
AppendFloat/32ExactFraction-32      32.82n ± 0%   32.91n ± 0%        ~ (p=0.018 n=20)
AppendFloat/32Point-32              32.88n ± 0%   33.22n ± 0%   +1.03% (p=0.000 n=20)
AppendFloat/32Exp-32                34.95n ± 0%   34.62n ± 0%   -0.94% (p=0.000 n=20)
AppendFloat/32NegExp-32             33.23n ± 0%   33.55n ± 0%   +0.98% (p=0.000 n=20)
AppendFloat/32Shortest-32           30.19n ± 0%   30.12n ± 0%        ~ (p=0.122 n=20)
AppendFloat/32Fixed8Hard-32         22.94n ± 0%   22.88n ± 0%        ~ (p=0.124 n=20)
AppendFloat/32Fixed9Hard-32         26.20n ± 0%   25.94n ± 1%   -0.97% (p=0.000 n=20)
AppendFloat/64Fixed1-32             21.10n ± 0%   18.84n ± 0%  -10.71% (p=0.000 n=20)
AppendFloat/64Fixed2-32             20.75n ± 0%   18.70n ± 0%   -9.88% (p=0.000 n=20)
AppendFloat/64Fixed2.5-32           21.07n ± 0%   17.96n ± 0%  -14.74% (p=0.000 n=20)
AppendFloat/64Fixed3-32             21.24n ± 0%   19.64n ± 0%   -7.53% (p=0.000 n=20)
AppendFloat/64Fixed4-32             20.63n ± 0%   18.61n ± 0%   -9.79% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-32         34.48n ± 0%   21.70n ± 0%  -37.06% (p=0.000 n=20)
AppendFloat/64Fixed12-32            32.26n ± 0%   30.87n ± 1%   -4.31% (p=0.000 n=20)
AppendFloat/64Fixed16-32            27.95n ± 0%   26.86n ± 0%   -3.92% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-32        27.30n ± 0%   25.98n ± 1%   -4.82% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-32        30.80n ± 0%   29.93n ± 0%   -2.84% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-32        1.833µ ± 0%   1.831µ ± 0%        ~ (p=0.663 n=20)
AppendFloat/64FixedF1-32            83.42n ± 1%   84.00n ± 1%        ~ (p=0.003 n=20)
AppendFloat/64FixedF2-32            90.10n ± 0%   89.23n ± 1%   -0.95% (p=0.001 n=20)
AppendFloat/64FixedF3-32            84.42n ± 1%   84.39n ± 0%        ~ (p=0.878 n=20)
AppendFloat/Slowpath64-32           35.72n ± 0%   35.59n ± 0%        ~ (p=0.007 n=20)
AppendFloat/SlowpathDenormal64-32   35.36n ± 0%   35.05n ± 0%   -0.88% (p=0.000 n=20)
geomean                             36.05n        34.69n        -3.77%

host: linux-386
goarch: 386
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                                  │ 14b7e09f493 │             f9bf7fcb8e2             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-16              132.8n ± 0%   133.5n ± 0%   +0.49% (p=0.001 n=20)
AppendFloat/Float-16                242.6n ± 0%   241.7n ± 0%   -0.37% (p=0.000 n=20)
AppendFloat/Exp-16                  252.2n ± 0%   249.1n ± 0%   -1.27% (p=0.000 n=20)
AppendFloat/NegExp-16               253.6n ± 0%   247.7n ± 0%   -2.33% (p=0.000 n=20)
AppendFloat/LongExp-16              260.9n ± 0%   257.1n ± 0%   -1.48% (p=0.000 n=20)
AppendFloat/Big-16                  293.7n ± 0%   285.2n ± 0%   -2.89% (p=0.000 n=20)
AppendFloat/BinaryExp-16            89.63n ± 1%   89.06n ± 0%   -0.64% (p=0.000 n=20)
AppendFloat/32Integer-16            132.6n ± 0%   133.2n ± 0%        ~ (p=0.016 n=20)
AppendFloat/32ExactFraction-16      216.9n ± 0%   214.2n ± 0%   -1.24% (p=0.000 n=20)
AppendFloat/32Point-16              205.0n ± 0%   202.2n ± 0%   -1.37% (p=0.000 n=20)
AppendFloat/32Exp-16                250.2n ± 0%   235.9n ± 0%   -5.72% (p=0.000 n=20)
AppendFloat/32NegExp-16             213.5n ± 0%   210.6n ± 0%   -1.34% (p=0.000 n=20)
AppendFloat/32Shortest-16           198.3n ± 0%   197.8n ± 0%        ~ (p=0.147 n=20)
AppendFloat/32Fixed8Hard-16         114.9n ± 1%   136.0n ± 1%  +18.46% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-16         189.8n ± 0%   155.0n ± 1%  -18.31% (p=0.000 n=20)
AppendFloat/64Fixed1-16             175.8n ± 0%   132.7n ± 0%  -24.52% (p=0.000 n=20)
AppendFloat/64Fixed2-16             166.6n ± 0%   128.7n ± 0%  -22.73% (p=0.000 n=20)
AppendFloat/64Fixed2.5-16           176.5n ± 0%   126.8n ± 0%  -28.11% (p=0.000 n=20)
AppendFloat/64Fixed3-16             165.3n ± 0%   127.1n ± 0%  -23.11% (p=0.000 n=20)
AppendFloat/64Fixed4-16             141.3n ± 0%   120.8n ± 1%  -14.51% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-16         344.6n ± 0%   136.0n ± 0%  -60.51% (p=0.000 n=20)
AppendFloat/64Fixed12-16            184.2n ± 0%   158.7n ± 0%  -13.82% (p=0.000 n=20)
AppendFloat/64Fixed16-16            174.0n ± 0%   151.3n ± 0%  -12.99% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-16        169.7n ± 0%   146.7n ± 0%  -13.58% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-16        207.7n ± 0%   166.6n ± 0%  -19.81% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-16        10.66µ ± 0%   10.63µ ± 0%        ~ (p=0.030 n=20)
AppendFloat/64FixedF1-16            615.9n ± 0%   613.5n ± 0%   -0.40% (p=0.000 n=20)
AppendFloat/64FixedF2-16            846.6n ± 0%   847.4n ± 0%        ~ (p=0.551 n=20)
AppendFloat/64FixedF3-16            609.9n ± 0%   609.5n ± 0%        ~ (p=0.213 n=20)
AppendFloat/Slowpath64-16           254.1n ± 0%   252.6n ± 1%        ~ (p=0.048 n=20)
AppendFloat/SlowpathDenormal64-16   251.5n ± 0%   249.4n ± 0%   -0.83% (p=0.000 n=20)
geomean                             249.2n        225.4n        -9.54%

host: s7:GOARCH=386
cpu: AMD Ryzen 9 7950X 16-Core Processor
                                  │ 14b7e09f493 │             f9bf7fcb8e2             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-32              42.65n ± 0%   42.31n ± 0%   -0.79% (p=0.000 n=20)
AppendFloat/Float-32                71.56n ± 0%   71.06n ± 0%   -0.69% (p=0.000 n=20)
AppendFloat/Exp-32                  75.61n ± 1%   74.85n ± 1%   -1.01% (p=0.000 n=20)
AppendFloat/NegExp-32               74.36n ± 0%   74.30n ± 0%        ~ (p=0.482 n=20)
AppendFloat/LongExp-32              75.82n ± 0%   75.73n ± 0%        ~ (p=0.490 n=20)
AppendFloat/Big-32                  85.10n ± 0%   82.61n ± 0%   -2.93% (p=0.000 n=20)
AppendFloat/BinaryExp-32            33.02n ± 0%   32.48n ± 1%   -1.64% (p=0.000 n=20)
AppendFloat/32Integer-32            41.54n ± 1%   41.27n ± 1%   -0.66% (p=0.000 n=20)
AppendFloat/32ExactFraction-32      62.48n ± 0%   62.91n ± 0%   +0.69% (p=0.000 n=20)
AppendFloat/32Point-32              60.17n ± 0%   60.65n ± 0%   +0.80% (p=0.000 n=20)
AppendFloat/32Exp-32                73.34n ± 0%   68.99n ± 0%   -5.92% (p=0.000 n=20)
AppendFloat/32NegExp-32             63.29n ± 0%   62.83n ± 0%   -0.73% (p=0.000 n=20)
AppendFloat/32Shortest-32           58.97n ± 0%   59.07n ± 0%        ~ (p=0.029 n=20)
AppendFloat/32Fixed8Hard-32         37.42n ± 0%   41.76n ± 1%  +11.61% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-32         55.18n ± 0%   50.13n ± 1%   -9.16% (p=0.000 n=20)
AppendFloat/64Fixed1-32             50.89n ± 1%   41.25n ± 0%  -18.94% (p=0.000 n=20)
AppendFloat/64Fixed2-32             48.33n ± 1%   40.85n ± 1%  -15.48% (p=0.000 n=20)
AppendFloat/64Fixed2.5-32           52.46n ± 0%   39.39n ± 0%  -24.92% (p=0.000 n=20)
AppendFloat/64Fixed3-32             48.28n ± 1%   40.66n ± 0%  -15.78% (p=0.000 n=20)
AppendFloat/64Fixed4-32             44.57n ± 0%   38.58n ± 0%  -13.44% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-32         96.16n ± 0%   42.99n ± 1%  -55.29% (p=0.000 n=20)
AppendFloat/64Fixed12-32            56.84n ± 0%   51.95n ± 1%   -8.61% (p=0.000 n=20)
AppendFloat/64Fixed16-32            54.23n ± 0%   49.33n ± 0%   -9.03% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-32        53.47n ± 0%   48.67n ± 0%   -8.99% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-32        61.76n ± 0%   55.42n ± 1%  -10.27% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-32        3.998µ ± 1%   4.001µ ± 0%        ~ (p=0.449 n=20)
AppendFloat/64FixedF1-32            161.8n ± 0%   166.2n ± 1%   +2.72% (p=0.000 n=20)
AppendFloat/64FixedF2-32            223.4n ± 2%   226.2n ± 1%   +1.25% (p=0.000 n=20)
AppendFloat/64FixedF3-32            159.6n ± 0%   161.6n ± 1%   +1.22% (p=0.000 n=20)
AppendFloat/Slowpath64-32           76.69n ± 0%   75.03n ± 0%   -2.16% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-32   75.02n ± 0%   74.36n ± 1%        ~ (p=0.003 n=20)
geomean                             74.66n        69.39n        -7.06%

Change-Id: I9db46471a93bd2aab3c2796e563d154cb531d4cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/717182
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>

internal/strconv: add tests and benchmarks for ftoaFixed

ftoaFixed is in the next CL; this proves the tests are correct
against the current implementation, and it adds a benchmark
for comparison with the new implementation.

Change-Id: I7ac8a1f699b693ea6d11a7122b22fc70cc135af6
Reviewed-on: https://go-review.googlesource.com/c/go/+/717181
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

internal/strconv: fix pow10 off-by-one in exponent result

The exact meaning of pow10 was not defined nor tested directly.
Define it as pow10(e) returns mant, exp where mant/2^128 * 2**exp = 10^e.
This is the most natural definition but is off-by-one from what
it had been returning. Fix the off-by-one and then adjust the
call sites to stop compensating for it.

Change-Id: I9ee475854f30be4bd0d4f4d770a6b12ec68281fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/717180
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>

cmd/internal/obj/loong64: using {xv,v}slli.d to perform copying between vector registers

Go asm syntax:
VMOVQ     Vj, Vd
XVMOVQ    Xj, Xd

Equivalent platform assembler syntax:
vslli.d   vd, vj, 0x0
xvslli.d  xd, xj, 0x0

Change-Id: Ifddc3d4d3fbaa6fee2e079bf2ebfe96a2febaa1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/716801
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/internal/obj/loong64: add VPERMI.W, XVPERMI.{W,V,Q} instruction support

Go asm syntax:
VPERMIW        $0x1b, vj, vd
XVPERMI{W,V,Q}  $0x1b, xj, xd

Equivalent platform assembler syntax:
vpermi.w       vd, vj, $0x1b
xvpermi.{w,d,q} xd, xj, $0x1b

Change-Id: Ie23b2fdd09b4c93801dc804913206f1c5a496268
Reviewed-on: https://go-review.googlesource.com/c/go/+/716800
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

runtime: avoid append in printint, printuint

Should make cmd/link/internal/ld.TestAbstractOriginSanity happier.

Change-Id: I121927d42e527ff23d996e7387066f149b11cc59
Reviewed-on: https://go-review.googlesource.com/c/go/+/717480
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

runtime: allow Stack to traceback goroutines in syscall _Grunning window

net/http/cgi.TestCopyError calls runtime.Stack to take a stack trace of
all goroutines, and searches for a specific line in that stack trace.

It currently sometimes fails because it encounters the goroutine its
looking for in the small window where a goroutine might be in _Grunning
while in a syscall, introduced in CL 646198. In that case, the traceback
will give up, failing to print the stack TestCopyError is expecting.

This represents a general regression, since previously runtime.Stack
could never fail to take a goroutine's stack; giving up was only
possible in fatal panic cases.

Fix this the same way we fixed goroutine profiles: allow the stack trace
to proceed if the g's syscallsp != 0. This is safe in any
stop-the-world-related context, because syscallsp won't be mutated while
the goroutine fails to acquire a P, and thus fails to fully exit the
syscall context. This also means the stack below syscallsp won't be
mutated, and thus taking a traceback is also safe.

Fixes #66639.

Change-Id: Ie6f4b0661d9f8df02c9b8434e99bc95f26fe5f0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/716680
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

runtime: don't panic in castogscanstatus

The panic calls gopanic which may have write barriers, but
castogscanstatus is called from //go:nowritebarrier contexts.

The panic is dead code anyway, and appears immediately before a call to
'throw'.

Change-Id: I4a8e296b71bf002295a3aa1db4f723c305ed939a
Reviewed-on: https://go-review.googlesource.com/c/go/+/717406
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>

cmd/cgo: use the export'ed file/line in error messages

When an unpinned Go pointer (or a pointer to an unpinned Go pointer) is
returned from Go to C,

     1  package main
     2
     3  import (
     4          "C"
     5  )
     6
     7  //export foo
     8  func foo(CLine *C.char) string {
     9          return C.GoString(CLine)
    10  }
    11
    12
    13  func main() {
    14  }

The error message mentions the file/line of the cgo wrapper,

    panic: runtime error: cgo result is unpinned Go pointer or points to unpinned Go pointer

    goroutine 17 [running, locked to thread]:
    panic({0x798f2341a4c0?, 0xc000112000?})
            /usr/lib/go/src/runtime/panic.go:802 +0x168
    runtime.cgoCheckArg(0x798f23417e20, 0xc000066e50, 0x0?, 0x0, {0x798f233f5a62, 0x42})
            /usr/lib/go/src/runtime/cgocall.go:679 +0x35b
    runtime.cgoCheckResult({0x798f23417e20, 0xc000066e50})
            /usr/lib/go/src/runtime/cgocall.go:795 +0x4b
    _cgoexp_3c910ddb72c4_foo(0x7ffc9fa9bfa0)
            _cgo_gotypes.go:65 +0x5d
    runtime.cgocallbackg1(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0)
            /usr/lib/go/src/runtime/cgocall.go:446 +0x289
    runtime.cgocallbackg(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0)
            /usr/lib/go/src/runtime/cgocall.go:350 +0x132
    runtime.cgocallbackg(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0)
            <autogenerated>:1 +0x2b
    runtime.cgocallback(0x0, 0x0, 0x0)
            /usr/lib/go/src/runtime/asm_amd64.s:1082 +0xcd
    runtime.goexit({})
            /usr/lib/go/src/runtime/asm_amd64.s:1693 +0x1

The cgo wrapper (_cgoexp_3c910ddb72c4_foo) is located in a temporary
build artifact (_cgo_gotypes.go)

    $ go tool cgo -objdir objdir parse.go
    $ cat -n objdir/_cgo_gotypes.go | sed -n '55,70p'
        55  //go:cgo_export_dynamic foo
        56  //go:linkname _cgoexp_d48770e267d1_foo _cgoexp_d48770e267d1_foo
        57  //go:cgo_export_static _cgoexp_d48770e267d1_foo
        58  func _cgoexp_d48770e267d1_foo(a *struct {
        59                  p0 *_Ctype_char
        60                  r0 string
        61          }) {
        62          a.r0 = foo(a.p0)
        63          _cgoCheckResult(a.r0)
        64  }

The file/line of the export'ed function is expected in the error message.

Use it in error messages.

    panic: runtime error: cgo result is unpinned Go pointer or points to unpinned Go pointer

    goroutine 17 [running, locked to thread]:
    panic({0x7df72b1d8ae0?, 0x3ec8a1790030?})
            /mnt/go/src/runtime/panic.go:877 +0x16f
    runtime.cgoCheckArg(0x7df72b1d62c0, 0x3ec8a16eee50, 0x68?, 0x0, {0x7df72b1ad44c, 0x42})
            /mnt/go/src/runtime/cgocall.go:679 +0x35b
    runtime.cgoCheckResult({0x7df72b1d62c0, 0x3ec8a16eee50})
            /mnt/go/src/runtime/cgocall.go:795 +0x4b
    _cgoexp_3c910ddb72c4_foo(0x7ffca1b21020)
            /mnt/tmp/parse.go:8 +0x5d
    runtime.cgocallbackg1(0x7df72b1a4360, 0x7ffca1b21020, 0x0)
            /mnt/go/src/runtime/cgocall.go:446 +0x289
    runtime.cgocallbackg(0x7df72b1a4360, 0x7ffca1b21020, 0x0)
            /mnt/go/src/runtime/cgocall.go:350 +0x132
    runtime.cgocallbackg(0x7df72b1a4360, 0x7ffca1b21020, 0x0)
            <autogenerated>:1 +0x2b
    runtime.cgocallback(0x0, 0x0, 0x0)
            /mnt/go/src/runtime/asm_amd64.s:1101 +0xcd
    runtime.goexit({})
            /mnt/go/src/runtime/asm_amd64.s:1712 +0x1

So doing, fix typos in comments.

Link: https://web.archive.org/web/20251008114504/https://dave.cheney.net/2018/01/08/gos-hidden-pragmas
Suggested-by: Keith Randall <khr@golang.org>
For #75856

Change-Id: I0bf36d5c8c5c0c7df13b00818bc4641009058979
GitHub-Last-Rev: e65839cfb2e28a879beac67c5c550de871b00018
GitHub-Pull-Request: golang/go#76118
Reviewed-on: https://go-review.googlesource.com/c/go/+/716441
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/go: link to go.dev/doc/godebug for removed GODEBUG settings

This makes the user experience better, before users would receive
an unknown godebug error message, now we explicitly mention that
it was removed and link to go.dev/doc/godebug where users can find
more information about the removal.

Additionally we keep all the removed GODEBUGs in the source, making
sure we do not reuse such GODEBUG after it is removed.

Updates #72111
Updates #75316

Change-Id: I6a6a6964cce1c100108fdba4bfba7d13cd9a893a
Reviewed-on: https://go-review.googlesource.com/c/go/+/701875
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Mateusz Poliwczak <mpoliwczak34@gmail.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Michael Matloob <matloob@google.com>

crypto/tls: add BetterTLS test coverage

This commit adds test coverage of path building and name constraint
verification using the suite of test data provided by Netflix's
BetterTLS project.

Since the uncompressed raw JSON test data exported by BetterTLS for
external test integrations is ~31MB we use a similar approach to the
BoGo and ACVP test integrations and fetch the BetterTLS Go module, and
run its export tool on-the-fly to generate the test data in a tempdir.

As expected, all tests pass currently and this coverage is mainly
helpful in catching regressions, especially with tricky/cursed name
constraints.

Change-Id: I23d7c24232e314aece86bcbfd133b7f02c9e71b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/717420
TryBot-Bypass: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: Michael Pratt <mpratt@google.com>

cmd/internal/obj: support arm64 FMOVQ large offset encoding

Support arm64 FMOVQ with large offset in immediate which is encoded
using register offset instruction in opldrr or opstrr. This will help
allowing folding immediate into new ssa ops FMOVQload and FMOVQstore.

For example: FMOVQ F0, -20000(R0) is encoded as following:
  MOVD 3(PC), R27
  FMOVQ F0, (R0)(R27)
  RET
  ffff b1e0 # constant value

Change-Id: Ib71f92f6ff4b310bda004a440b1df41ffe164523
Reviewed-on: https://go-review.googlesource.com/c/go/+/716960
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>

cmd/go/testdata/script: loosen list_empty_importpath for freebsd

We've been seeing the flakes where we get a 'no errors' output on
freebsd in addition to windows and solaris. Also allow that case to
avoid flakes.

For #73976

Change-Id: I6a6a696445ec908b55520d8d75e7c1f867b9c092
Reviewed-on: https://go-review.googlesource.com/c/go/+/715640
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@google.com>
Reviewed-by: Ian Alexander <jitsu@google.com>

internal/runtime/cgobench: add cgo callback benchmark

Change-Id: I43ea575aff87a3e420477cb26d35185d03df5ccc
Reviewed-on: https://go-review.googlesource.com/c/go/+/713283
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/go: move functions to methods

[git-generate]
cd src/cmd/go/internal/modload
rf '
  mv InitWorkfile State.InitWorkfile
  mv FindGoWork State.FindGoWork
  mv WillBeEnabled State.WillBeEnabled
  mv Enabled State.Enabled
  mv inWorkspaceMode State.inWorkspaceMode
  mv HasModRoot State.HasModRoot
  mv MustHaveModRoot State.MustHaveModRoot
  mv ModFilePath State.ModFilePath
'

Change-Id: I207113868af037c9c0049f4207c3d3b4c19468bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/716602
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/go: eliminate additional global variable

Move global variable to a field on the State type.

Change-Id: I1edd32e1d28ce814bcd75501098ee4b22227546b
Reviewed-on: https://go-review.googlesource.com/c/go/+/716162
Reviewed-by: Michael Matloob <matloob@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@google.com>

cmd/go/internal/telemetrystats: count cgo usage

Knowing how many times cgo is used is useful information to have in the
local telemetry database.

It also opens the door for uploading them in the future if desired.

Change-Id: Ia92b11fc489f015bbface7f28ed5a5c2871c44f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/707055
Reviewed-by: Michael Matloob <matloob@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Reviewed-by: Michael Matloob <matloob@google.com>
Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>

runtime: update outdated comments for deferprocStack

Change-Id: I0ea4d15da163cec6fe2a703376ce5a6032e15484
Reviewed-on: https://go-review.googlesource.com/c/go/+/714861
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>

all: remove extra space in the comments

Change-Id: I26302d801732f40b1fe6b30ff69d222047bca490
Reviewed-on: https://go-review.googlesource.com/c/go/+/716740
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

internal/profile: optimize Parse allocs

In our case, it greatly improves the performance of continuously collecting diff profiles from the net/http/pprof endpoint, such as /debug/pprof/allocs?seconds=30.

This CL is a cherry-pick of my PR upstream: https://github.com/google/pprof/pull/951

Benchmark of profile Parse func:

goos: linux
goarch: amd64
pkg: github.com/google/pprof/profile
cpu: 13th Gen Intel(R) Core(TM) i7-1360P
         │ old-parse.txt │            new-parse.txt             │
         │    sec/op     │    sec/op     vs base                │
Parse-16    62.07m ± 13%   55.54m ± 13%  -10.52% (p=0.035 n=10)

         │ old-parse.txt │            new-parse.txt             │
         │     B/op      │     B/op      vs base                │
Parse-16    47.56Mi ± 0%   41.09Mi ± 0%  -13.59% (p=0.000 n=10)

         │ old-parse.txt │            new-parse.txt            │
         │   allocs/op   │  allocs/op   vs base                │
Parse-16     272.9k ± 0%   175.8k ± 0%  -35.58% (p=0.000 n=10)

Change-Id: I737ff9b9f815fdc56bc3b5743403717c4b6f07fd
GitHub-Last-Rev: a09108f7ff7e6a27509f300d617e7adb36a9eb8a
GitHub-Pull-Request: golang/go#76145
Reviewed-on: https://go-review.googlesource.com/c/go/+/717081
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>

bytes: add Buffer.Peek

Fixes #73794

Change-Id: I0a57db05aacfa805213fe8278fc727e76eb8a65e
GitHub-Last-Rev: 3494d93f803f21905dfd5a9d593644da69279f16
GitHub-Pull-Request: golang/go#73795
Reviewed-on: https://go-review.googlesource.com/c/go/+/674415
Reviewed-by: Sean Liao <sean@liao.dev>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>

runtime: remove the pc field of _defer struct

Since we always can get the address of `CALL runtime.deferreturn(SB)`
from the unwinder, so it is not necessary to record the caller's pc
in the _defer struct. For the stack allocated _defer, this CL makes
the frame smaller.

Change-Id: I0fd347e4bc07cf8a9b954816323df30fc52552b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/716720
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

crypto/internal/constanttime: expose intrinsics to the FIPS 140-3 packages

Intrinsifying things inside the module (crypto/internal/fips140/subtle)
is asking for trouble, as the import paths are rewritten by the
GOFIPS140 mechanism, and we might have to support multiple modules
in the future.

Importing crypto/subtle from inside a FIPS 140-3 module is not allowed,
and is basically asking for circular dependencies.

Instead, break off the intrinsics into their own package
(crypto/internal/constanttime), and keep the byte slice operations
in crypto/internal/fips140/subtle. crypto/subtle then becomes a thin
dispatch layer.

Change-Id: I6a6a6964cd5cb5ad06e9d1679201447f5a811da4
Reviewed-on: https://go-review.googlesource.com/c/go/+/716120
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>

cmd/go: skip git sha256 tests if git < 2.29

Fix test building on older Ubuntu LTS releases (that are still
supported). Git SHA256 support was only included in 2.29, which came out
in 2021. Check the output of `git version` and skip these tests if the
version is older than that introduction.

Thanks to @ianlancetaylor for flagging this.

Updates: #73704
Change-Id: I9d413a63fa43f34f94c274bba7f7b883c80433b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/698835
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Michael Matloob <matloob@google.com>
Auto-Submit: Michael Matloob <matloob@google.com>
Reviewed-by: Ian Alexander <jitsu@google.com>

runtime: prevent time.Timer.Reset(0) from deadlocking testing/synctest tests

In Go 1.23+, timer channels behave synchronously. When we have a timer
channel (i.e. !async && t.isChan) we would lock the
runtime.timer.sendLock mutex at the beginning of
runtime.timer.modify()'s execution.

Calling time.Timer.Reset(0) within a testing/synctest test,
unfortunately, causes it to hang indefinitely. This is because the
runtime.timer.sendLock mutex ends up being locked twice before it could
be unlocked:

- When calling time.Timer.Reset(), runtime.timer.modify() would lock the
  mutex per usual.
- Due to the 0 argument, runtime.timer.modify() would also try to
  execute the bubbled timer immediately rather than adding them to a
  heap. However, in doing so, it uses runtime.timer.unlockAndRun(),
  which also locks the same mutex.

This CL solves this issue by making sure that a locked
runtime.timer.sendLock mutex is unlocked first, whenever we try to
execute bubbled timer immediately in the stack.

Fixes #76052

Change-Id: I66429b9bf6971400de95dcf2d5dc9670c3135492
Reviewed-on: https://go-review.googlesource.com/c/go/+/716883
Reviewed-by: Damien Neil <dneil@google.com>
Auto-Submit: Nicholas Husin <nsh@golang.org>
Reviewed-by: Nicholas Husin <husin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/go: document purego convention

Fixes #23172

Change-Id: I854e399471dfa22e62fbdec9719e561c4501e5ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/660136
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

runtime: avoid zeroing scavenged memory

On Linux, memory returned to the kernel via MADV_DONTNEED is guaranteed
to be zero-filled on its next use.

This commit leverages this kernel behavior to avoid a redundant software
zeroing pass in the runtime, improving performance.

Change-Id: Ia14343b447a2cec7af87644fe8050e23e983c787
GitHub-Last-Rev: 6c8df322836e70922c69ca3c5aac36e4b8a0839a
GitHub-Pull-Request: golang/go#76063
Reviewed-on: https://go-review.googlesource.com/c/go/+/715160
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>

runtime: prioritize panic output over racefini

For some reason CL 646198 uncovered #3934 and #20018 again, but only in
race mode. It turns out that because racefini does not return, and
racefini is called early after main returns, we would not properly wait
for a concurrent panic to complete. This would result in fairly
consistent failures of TestPanicRace, which specifically looks for the
panic output to appear if main concurrently exits.

The important part of this change is that race mode will no longer have
the bug described in #3934 and #20018. A byproduct, however, is that
racefini is that we're essentially prioritizing the panic output over
racefini in this scenario. If racefini were to reveal a latent race
condition and fail, we'll prefer to surface the panic. Such a case is
probably fine, because the panic is always an crashing, unrecoverable
panic.

For #3934.
For #20018.

Change-Id: I0674a75c918563c5ec4ee1eec057dfd096fcfbc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/691795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>

runtime: optimistically CAS atomicstatus directly in enter/exitsyscall

This change steals the performance trick from the coro implementation to
try to do the CAS directly first before calling into casgstatus, a much
more heavyweight function. We have to be careful about synctest
bubbling, but overall it's a good bit faster, and easy low-hanging
fruit.

goos: linux
goarch: amd64
pkg: internal/runtime/cgobench
cpu: AMD EPYC 7B13
           │ after-2-2.out │            after-3.out             │
           │    sec/op     │   sec/op     vs base               │
CgoCall-64     34.62n ± 1%   30.55n ± 1%  -11.76% (p=0.002 n=6)

Change-Id: Ic38620233b55f58b8a07510666aa18648373e2e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/708596
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>

runtime: don't track scheduling latency for _Grunning <-> _Gsyscall

The current logic causes much more tracking than necessary, when really
_Grunning and _Gsyscall are both sort of "running" from the perspective
of tracking scheduling latency.

This makes cgo calls and syscalls a little faster in the single-threaded
case, and shows much larger improvement in the multi-threaded case
by removing updates of shared variables (though this parallel
microbenchmark is a little unrealistic, so don't ascribe too much weight
to it).

goos: linux
goarch: amd64
pkg: internal/runtime/cgobench
cpu: AMD EPYC 7B13
                   │  after.out  │            after-2.out             │
                   │   sec/op    │   sec/op     vs base               │
CgoCall-64           35.83n ± 1%   34.69n ± 1%   -3.20% (p=0.002 n=6)
CgoCallParallel-64   5.338n ± 1%   1.352n ± 4%  -74.67% (p=0.002 n=6)

Change-Id: I2ea494dd5ebbbfb457373549986fbe2fbe318d45
Reviewed-on: https://go-review.googlesource.com/c/go/+/646275
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

runtime: document tracer invariants explicitly

This change is a documentation update for the execution tracer. Far too
much is left to small comments scattered around places. This change
accumulates the big important trace invariants, with rationale, into one
file: trace.go.

Change-Id: I5fd1402a3d8fdf14a0051e305b3a8fb5dfeafcb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/708398
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

runtime: eliminate _Psyscall

This change eliminates the _Psyscall state by using synchronization on
the G status _Gsyscall to make syscalls work instead. This removes an
atomic Store and an atomic CAS on the syscall path, which reduces
syscall and cgo overheads. It also simplifies the syscall paths quite a
bit.

The one danger with this change is that we have a new combination of
states that was previously impossible. There are brief windows where
it's possible to observe a goroutine in _Grunning but without a P. This
change is careful to hide this detail from the execution tracer, but it
may have unexpected effects in the rest of the runtime, making this
change somewhat risky.

goos: linux
goarch: amd64
pkg: internal/runtime/cgobench
cpu: AMD EPYC 7B13
                   │ before.out  │             after.out              │
                   │   sec/op    │   sec/op     vs base               │
CgoCall-64           43.69n ± 1%   35.83n ± 1%  -17.99% (p=0.002 n=6)
CgoCallParallel-64   5.306n ± 1%   5.338n ± 1%        ~ (p=0.132 n=6)

Change-Id: I4551afc1eea0c1b67a0b2dd26b0d49aa47bf1fb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/646198
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>

strconv: delete divmod1e9

The compiler is just as good now, even on 32-bit systems.

Change-Id: Ifee72c0e84a68703c0721a7a9f4ca5aa637ad5e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/716464
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>

runtime: delete timediv

Now that the compiler handles constant 64-bit divisions
without function calls on 32-bit systems, we no longer need
to maintain and test a bad custom implementation of 64-bit division.

Change-Id: If28807ad4f86507267ae69bc8f0b09ec18e98b66
Reviewed-on: https://go-review.googlesource.com/c/go/+/716463
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>

strconv: remove arch-specific decision in formatBase10

There is only one architecture-specific code segment left in formatBase10.
Remove it for simplicity.

The only affected system is ppc64le, which does add 10-20% to the
runtime, but that's a ppc64le problem, not a strconv problem.
Changing the "uint32" to "uint" makes ppc64le not slower anymore,
meaning that somehow uint32 divide-by-constant is slower than
uint divide-by-constant on ppc64le. If this minor slowdown matters,
it should be addressed by improving the generated code for
ppc64le division, not by complicating strconv.
Even though some percentages look big, the geomean is +6% and
the worst case slowdown is only about 6ns/call.

benchmark \ host                    local       s7  s7:GOARCH=386  linux-amd64  linux-386  linux-ppc64le
                                  vs base  vs base        vs base      vs base    vs base        vs base
AppendFloat/Decimal                     ~        ~         -0.75%            ~          ~              ~
AppendFloat/Float                       ~        ~              ~            ~          ~              ~
AppendFloat/Exp                         ~        ~              ~            ~          ~              ~
AppendFloat/NegExp                      ~        ~              ~            ~          ~              ~
AppendFloat/LongExp                     ~        ~              ~            ~          ~              ~
AppendFloat/Big                         ~        ~              ~            ~          ~              ~
AppendFloat/BinaryExp                   ~        ~              ~            ~     -2.92%        +10.31%
AppendFloat/32Integer                   ~        ~              ~            ~          ~              ~
AppendFloat/32ExactFraction             ~        ~         -0.43%            ~          ~              ~
AppendFloat/32Point                     ~        ~              ~            ~          ~              ~
AppendFloat/32Exp                       ~        ~              ~            ~          ~              ~
AppendFloat/32NegExp                    ~        ~              ~            ~          ~              ~
AppendFloat/32Shortest                  ~        ~              ~            ~          ~              ~
AppendFloat/32Fixed8Hard                ~        ~         -1.12%            ~     -4.19%         +5.39%
AppendFloat/32Fixed9Hard                ~        ~              ~            ~          ~        +12.09%
AppendFloat/64Fixed1                    ~        ~              ~            ~     -0.78%              ~
AppendFloat/64Fixed2                    ~        ~         +0.53%            ~     -0.74%              ~
AppendFloat/64Fixed3                    ~        ~              ~            ~     -0.82%              ~
AppendFloat/64Fixed4                    ~        ~              ~            ~     -0.63%         +2.55%
AppendFloat/64Fixed12                   ~        ~         +0.63%       -1.83%     -1.01%         +5.68%
AppendFloat/64Fixed16                   ~        ~         +0.63%            ~     -1.33%              ~
AppendFloat/64Fixed12Hard               ~        ~         +1.31%            ~     -0.34%         +4.43%
AppendFloat/64Fixed17Hard               ~        ~              ~            ~     -1.66%         +4.63%
AppendFloat/64Fixed18Hard               ~        ~              ~            ~          ~              ~
AppendFloat/Slowpath64                  ~        ~              ~            ~          ~              ~
AppendFloat/SlowpathDenormal64          ~        ~              ~            ~          ~              ~
AppendInt                               ~        ~              ~            ~     -0.77%        +11.06%
AppendUint                              ~        ~         +0.42%            ~     -1.35%         +7.27%
AppendIntSmall                          ~        ~              ~            ~          ~              ~
AppendUintVarlen/digits=1               ~        ~              ~            ~          ~              ~
AppendUintVarlen/digits=2               ~        ~              ~            ~          ~         +4.80%
AppendUintVarlen/digits=3               ~        ~         -1.61%            ~     -4.49%        +10.04%
AppendUintVarlen/digits=4               ~        ~         -2.09%            ~     -2.88%         +9.99%
AppendUintVarlen/digits=5               ~        ~         -3.97%            ~     -8.92%         +7.19%
AppendUintVarlen/digits=6               ~        ~         -3.88%            ~     -7.84%         +5.25%
AppendUintVarlen/digits=7               ~        ~         -2.93%            ~    -11.08%              ~
AppendUintVarlen/digits=8               ~        ~         -2.88%            ~    -10.09%        +10.74%
AppendUintVarlen/digits=9               ~        ~         -3.69%            ~    -10.98%        +11.56%
AppendUintVarlen/digits=10              ~        ~         +0.90%            ~     -1.09%        +34.49%
AppendUintVarlen/digits=11              ~        ~         +2.11%            ~     +0.72%        +20.63%
AppendUintVarlen/digits=12              ~        ~         +1.21%            ~     -2.26%        +27.38%
AppendUintVarlen/digits=13              ~        ~         +2.07%            ~     -2.13%        +15.37%
AppendUintVarlen/digits=14              ~        ~              ~            ~     -3.27%        +19.32%
AppendUintVarlen/digits=15              ~        ~         +0.56%            ~     -3.18%         +8.55%
AppendUintVarlen/digits=16              ~        ~              ~            ~     -4.31%        +15.69%
AppendUintVarlen/digits=17              ~        ~              ~            ~     -4.50%         +7.61%
AppendUintVarlen/digits=18              ~        ~              ~            ~     -5.54%        +12.74%
AppendUintVarlen/digits=19              ~        ~         +0.87%            ~     +0.41%        +13.82%
AppendUintVarlen/digits=20              ~        ~         +1.40%            ~     +0.82%        +15.80%

host: local
                                  │ e8ffcf3095c │            3c5742b667a             │
                                  │   sec/op    │   sec/op     vs base               │
AppendFloat/Decimal-12              21.04n ± 2%   21.10n ± 0%       ~ (p=0.542 n=20)
AppendFloat/Float-12                32.77n ± 1%   32.56n ± 0%       ~ (p=0.213 n=20)
AppendFloat/Exp-12                  32.42n ± 1%   32.42n ± 0%       ~ (p=0.995 n=20)
AppendFloat/NegExp-12               31.78n ± 2%   31.79n ± 2%       ~ (p=0.542 n=20)
AppendFloat/LongExp-12              31.54n ± 1%   31.45n ± 1%       ~ (p=0.507 n=20)
AppendFloat/Big-12                  33.75n ± 1%   33.59n ± 1%       ~ (p=0.597 n=20)
AppendFloat/BinaryExp-12            19.16n ± 1%   19.14n ± 1%       ~ (p=0.635 n=20)
AppendFloat/32Integer-12            21.59n ± 1%   21.63n ± 1%       ~ (p=0.569 n=20)
AppendFloat/32ExactFraction-12      31.12n ± 1%   31.17n ± 1%       ~ (p=0.337 n=20)
AppendFloat/32Point-12              31.11n ± 1%   31.13n ± 0%       ~ (p=0.899 n=20)
AppendFloat/32Exp-12                32.27n ± 0%   32.24n ± 1%       ~ (p=0.794 n=20)
AppendFloat/32NegExp-12             30.56n ± 1%   30.64n ± 1%       ~ (p=0.899 n=20)
AppendFloat/32Shortest-12           26.88n ± 0%   27.11n ± 1%       ~ (p=0.001 n=20)
AppendFloat/32Fixed8Hard-12         20.63n ± 1%   20.63n ± 0%       ~ (p=0.596 n=20)
AppendFloat/32Fixed9Hard-12         22.69n ± 1%   22.81n ± 1%       ~ (p=0.763 n=20)
AppendFloat/64Fixed1-12             18.65n ± 0%   18.73n ± 1%       ~ (p=0.112 n=20)
AppendFloat/64Fixed2-12             18.91n ± 1%   18.88n ± 0%       ~ (p=0.569 n=20)
AppendFloat/64Fixed3-12             18.73n ± 1%   18.60n ± 0%       ~ (p=0.003 n=20)
AppendFloat/64Fixed4-12             17.87n ± 3%   17.84n ± 1%       ~ (p=0.569 n=20)
AppendFloat/64Fixed12-12            28.18n ± 0%   28.30n ± 0%       ~ (p=0.010 n=20)
AppendFloat/64Fixed16-12            24.38n ± 1%   24.34n ± 1%       ~ (p=0.683 n=20)
AppendFloat/64Fixed12Hard-12        21.61n ± 0%   21.63n ± 1%       ~ (p=0.542 n=20)
AppendFloat/64Fixed17Hard-12        27.02n ± 1%   26.94n ± 0%       ~ (p=0.077 n=20)
AppendFloat/64Fixed18Hard-12        2.179µ ± 1%   2.189µ ± 1%       ~ (p=0.337 n=20)
AppendFloat/Slowpath64-12           29.93n ± 1%   30.01n ± 1%       ~ (p=0.402 n=20)
AppendFloat/SlowpathDenormal64-12   28.29n ± 1%   28.35n ± 0%       ~ (p=0.952 n=20)
AppendInt-12                        400.7n ± 1%   401.8n ± 1%       ~ (p=0.176 n=20)
AppendUint-12                       95.87n ± 1%   96.06n ± 1%       ~ (p=0.208 n=20)
AppendIntSmall-12                   3.450n ± 0%   3.455n ± 1%       ~ (p=0.203 n=20)
AppendUintVarlen/digits=1-12        3.150n ± 1%   3.175n ± 1%       ~ (p=0.147 n=20)
AppendUintVarlen/digits=2-12        2.654n ± 0%   2.667n ± 1%       ~ (p=0.003 n=20)
AppendUintVarlen/digits=3-12        5.044n ± 0%   5.069n ± 0%       ~ (p=0.026 n=20)
AppendUintVarlen/digits=4-12        5.046n ± 0%   5.047n ± 1%       ~ (p=0.408 n=20)
AppendUintVarlen/digits=5-12        5.851n ± 0%   5.860n ± 1%       ~ (p=0.132 n=20)
AppendUintVarlen/digits=6-12        6.118n ± 0%   6.120n ± 0%       ~ (p=0.888 n=20)
AppendUintVarlen/digits=7-12        7.333n ± 0%   7.334n ± 0%       ~ (p=0.588 n=20)
AppendUintVarlen/digits=8-12        7.312n ± 1%   7.322n ± 0%       ~ (p=0.805 n=20)
AppendUintVarlen/digits=9-12        7.832n ± 1%   7.838n ± 1%       ~ (p=0.465 n=20)
AppendUintVarlen/digits=10-12       8.606n ± 1%   8.602n ± 1%       ~ (p=0.974 n=20)
AppendUintVarlen/digits=11-12       8.908n ± 0%   8.899n ± 1%       ~ (p=0.931 n=20)
AppendUintVarlen/digits=12-12       9.178n ± 0%   9.118n ± 0%       ~ (p=0.007 n=20)
AppendUintVarlen/digits=13-12       9.398n ± 1%   9.405n ± 1%       ~ (p=0.888 n=20)
AppendUintVarlen/digits=14-12       10.24n ± 0%   10.22n ± 0%       ~ (p=0.154 n=20)
AppendUintVarlen/digits=15-12       10.50n ± 0%   10.48n ± 0%       ~ (p=0.692 n=20)
AppendUintVarlen/digits=16-12       11.66n ± 0%   11.75n ± 2%       ~ (p=0.087 n=20)
AppendUintVarlen/digits=17-12       11.65n ± 0%   11.72n ± 1%       ~ (p=0.079 n=20)
AppendUintVarlen/digits=18-12       12.38n ± 1%   12.42n ± 2%       ~ (p=0.035 n=20)
AppendUintVarlen/digits=19-12       13.59n ± 1%   13.60n ± 0%       ~ (p=0.673 n=20)
AppendUintVarlen/digits=20-12       13.85n ± 1%   13.84n ± 0%       ~ (p=0.634 n=20)
geomean                             17.99n        18.01n       +0.09%

host: s7
                                  │ e8ffcf3095c │            3c5742b667a             │
                                  │   sec/op    │   sec/op     vs base               │
AppendFloat/Decimal-32              22.24n ± 0%   22.20n ± 0%       ~ (p=0.143 n=20)
AppendFloat/Float-32                34.70n ± 0%   34.59n ± 0%       ~ (p=0.057 n=20)
AppendFloat/Exp-32                  34.85n ± 0%   34.77n ± 0%       ~ (p=0.051 n=20)
AppendFloat/NegExp-32               35.19n ± 0%   35.17n ± 0%       ~ (p=0.533 n=20)
AppendFloat/LongExp-32              36.80n ± 0%   36.78n ± 0%       ~ (p=0.941 n=20)
AppendFloat/Big-32                  38.63n ± 0%   38.71n ± 0%       ~ (p=0.143 n=20)
AppendFloat/BinaryExp-32            17.39n ± 0%   17.36n ± 0%       ~ (p=0.049 n=20)
AppendFloat/32Integer-32            22.29n ± 0%   22.23n ± 0%       ~ (p=0.068 n=20)
AppendFloat/32ExactFraction-32      33.52n ± 0%   33.44n ± 0%       ~ (p=0.058 n=20)
AppendFloat/32Point-32              32.96n ± 0%   32.93n ± 0%       ~ (p=0.256 n=20)
AppendFloat/32Exp-32                35.31n ± 0%   35.24n ± 0%       ~ (p=0.014 n=20)
AppendFloat/32NegExp-32             33.56n ± 0%   33.50n ± 0%       ~ (p=0.032 n=20)
AppendFloat/32Shortest-32           30.03n ± 0%   29.97n ± 0%       ~ (p=0.351 n=20)
AppendFloat/32Fixed8Hard-32         22.93n ± 0%   22.94n ± 0%       ~ (p=0.920 n=20)
AppendFloat/32Fixed9Hard-32         26.27n ± 0%   26.22n ± 0%       ~ (p=0.693 n=20)
AppendFloat/64Fixed1-32             21.15n ± 0%   21.09n ± 0%       ~ (p=0.006 n=20)
AppendFloat/64Fixed2-32             20.75n ± 0%   20.75n ± 0%       ~ (p=0.898 n=20)
AppendFloat/64Fixed3-32             21.29n ± 0%   21.28n ± 0%       ~ (p=0.524 n=20)
AppendFloat/64Fixed4-32             20.59n ± 0%   20.63n ± 0%       ~ (p=0.350 n=20)
AppendFloat/64Fixed12-32            32.53n ± 0%   32.51n ± 0%       ~ (p=0.804 n=20)
AppendFloat/64Fixed16-32            28.43n ± 0%   28.34n ± 0%       ~ (p=0.060 n=20)
AppendFloat/64Fixed12Hard-32        27.27n ± 0%   27.22n ± 0%       ~ (p=0.056 n=20)
AppendFloat/64Fixed17Hard-32        30.76n ± 0%   30.70n ± 0%       ~ (p=0.304 n=20)
AppendFloat/64Fixed18Hard-32        1.812µ ± 0%   1.806µ ± 1%       ~ (p=0.216 n=20)
AppendFloat/Slowpath64-32           35.48n ± 0%   35.43n ± 0%       ~ (p=0.129 n=20)
AppendFloat/SlowpathDenormal64-32   35.62n ± 0%   35.60n ± 0%       ~ (p=0.273 n=20)
AppendInt-32                        387.5n ± 0%   387.9n ± 0%       ~ (p=0.417 n=20)
AppendUint-32                       104.9n ± 0%   104.9n ± 0%       ~ (p=0.641 n=20)
AppendIntSmall-32                   3.329n ± 5%   3.489n ± 5%       ~ (p=0.723 n=20)
AppendUintVarlen/digits=1-32        2.396n ± 7%   2.398n ± 8%       ~ (p=0.409 n=20)
AppendUintVarlen/digits=2-32        2.399n ± 7%   2.315n ± 4%       ~ (p=0.386 n=20)
AppendUintVarlen/digits=3-32        5.516n ± 0%   5.506n ± 0%       ~ (p=0.026 n=20)
AppendUintVarlen/digits=4-32        5.515n ± 0%   5.510n ± 0%       ~ (p=0.291 n=20)
AppendUintVarlen/digits=5-32        5.748n ± 1%   5.731n ± 0%       ~ (p=0.014 n=20)
AppendUintVarlen/digits=6-32        5.873n ± 0%   5.853n ± 0%       ~ (p=0.011 n=20)
AppendUintVarlen/digits=7-32        6.460n ± 0%   6.444n ± 0%       ~ (p=0.041 n=20)
AppendUintVarlen/digits=8-32        6.457n ± 0%   6.446n ± 0%       ~ (p=0.047 n=20)
AppendUintVarlen/digits=9-32        7.377n ± 0%   7.373n ± 0%       ~ (p=0.569 n=20)
AppendUintVarlen/digits=10-32       8.496n ± 0%   8.492n ± 0%       ~ (p=0.525 n=20)
AppendUintVarlen/digits=11-32       8.674n ± 0%   8.653n ± 0%       ~ (p=0.060 n=20)
AppendUintVarlen/digits=12-32       9.416n ± 0%   9.386n ± 0%       ~ (p=0.035 n=20)
AppendUintVarlen/digits=13-32       9.621n ± 0%   9.594n ± 0%       ~ (p=0.069 n=20)
AppendUintVarlen/digits=14-32       10.34n ± 0%   10.35n ± 0%       ~ (p=0.332 n=20)
AppendUintVarlen/digits=15-32       10.48n ± 0%   10.52n ± 0%       ~ (p=0.196 n=20)
AppendUintVarlen/digits=16-32       11.27n ± 0%   11.27n ± 0%       ~ (p=0.445 n=20)
AppendUintVarlen/digits=17-32       11.50n ± 0%   11.48n ± 0%       ~ (p=0.283 n=20)
AppendUintVarlen/digits=18-32       12.15n ± 0%   12.18n ± 0%       ~ (p=0.414 n=20)
AppendUintVarlen/digits=19-32       13.42n ± 0%   13.42n ± 0%       ~ (p=0.782 n=20)
AppendUintVarlen/digits=20-32       13.70n ± 0%   13.71n ± 0%       ~ (p=0.540 n=20)
geomean                             18.71n        18.70n       -0.08%

host: s7:GOARCH=386
                                  │ e8ffcf3095c  │            3c5742b667a             │
                                  │    sec/op    │   sec/op     vs base               │
AppendFloat/Decimal-32               42.03n ± 0%   41.71n ± 1%  -0.75% (p=0.000 n=20)
AppendFloat/Float-32                 69.99n ± 0%   69.85n ± 0%       ~ (p=0.062 n=20)
AppendFloat/Exp-32                   73.07n ± 0%   73.13n ± 0%       ~ (p=0.240 n=20)
AppendFloat/NegExp-32                72.75n ± 0%   72.72n ± 0%       ~ (p=0.298 n=20)
AppendFloat/LongExp-32               74.20n ± 0%   74.13n ± 0%       ~ (p=0.952 n=20)
AppendFloat/Big-32                   82.88n ± 0%   82.91n ± 0%       ~ (p=0.909 n=20)
AppendFloat/BinaryExp-32             32.12n ± 0%   32.23n ± 0%       ~ (p=0.014 n=20)
AppendFloat/32Integer-32             41.18n ± 0%   41.15n ± 0%       ~ (p=0.212 n=20)
AppendFloat/32ExactFraction-32       62.65n ± 0%   62.38n ± 0%  -0.43% (p=0.000 n=20)
AppendFloat/32Point-32               59.98n ± 0%   59.82n ± 0%       ~ (p=0.109 n=20)
AppendFloat/32Exp-32                 72.62n ± 0%   72.69n ± 0%       ~ (p=0.417 n=20)
AppendFloat/32NegExp-32              62.36n ± 0%   62.49n ± 0%       ~ (p=0.578 n=20)
AppendFloat/32Shortest-32            58.03n ± 0%   57.95n ± 0%       ~ (p=0.615 n=20)
AppendFloat/32Fixed8Hard-32          36.75n ± 0%   36.34n ± 0%  -1.12% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-32          53.83n ± 0%   53.96n ± 0%       ~ (p=0.126 n=20)
AppendFloat/64Fixed1-32              48.52n ± 0%   48.72n ± 0%       ~ (p=0.075 n=20)
AppendFloat/64Fixed2-32              46.50n ± 0%   46.74n ± 0%  +0.53% (p=0.000 n=20)
AppendFloat/64Fixed3-32              46.43n ± 0%   46.24n ± 1%       ~ (p=0.147 n=20)
AppendFloat/64Fixed4-32              42.52n ± 0%   42.69n ± 0%       ~ (p=0.038 n=20)
AppendFloat/64Fixed12-32             54.51n ± 1%   54.85n ± 0%  +0.63% (p=0.000 n=20)
AppendFloat/64Fixed16-32             52.61n ± 0%   52.94n ± 0%  +0.63% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-32         50.78n ± 0%   51.45n ± 0%  +1.31% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-32         59.91n ± 0%   59.91n ± 0%       ~ (p=0.713 n=20)
AppendFloat/64Fixed18Hard-32         3.984µ ± 1%   4.000µ ± 1%       ~ (p=0.569 n=20)
AppendFloat/Slowpath64-32            73.42n ± 0%   73.37n ± 0%       ~ (p=0.425 n=20)
AppendFloat/SlowpathDenormal64-32    73.27n ± 0%   73.36n ± 0%       ~ (p=0.298 n=20)
AppendInt-32                         1.036µ ± 1%   1.041µ ± 1%       ~ (p=0.024 n=20)
AppendUint-32                        261.1n ± 0%   262.1n ± 1%  +0.42% (p=0.000 n=20)
AppendIntSmall-32                    5.515n ± 0%   5.506n ± 0%       ~ (p=0.394 n=20)
AppendUintVarlen/digits=1-32         3.679n ± 0%   3.685n ± 0%       ~ (p=0.815 n=20)
AppendUintVarlen/digits=2-32         3.683n ± 0%   3.687n ± 0%       ~ (p=0.899 n=20)
AppendUintVarlen/digits=3-32         8.988n ± 2%   8.843n ± 1%  -1.61% (p=0.000 n=20)
AppendUintVarlen/digits=4-32         9.300n ± 1%   9.106n ± 1%  -2.09% (p=0.000 n=20)
AppendUintVarlen/digits=5-32        10.235n ± 1%   9.829n ± 1%  -3.97% (p=0.000 n=20)
AppendUintVarlen/digits=6-32         10.44n ± 1%   10.03n ± 0%  -3.88% (p=0.000 n=20)
AppendUintVarlen/digits=7-32         11.27n ± 1%   10.94n ± 1%  -2.93% (p=0.000 n=20)
AppendUintVarlen/digits=8-32         11.46n ± 1%   11.13n ± 1%  -2.88% (p=0.000 n=20)
AppendUintVarlen/digits=9-32         12.74n ± 0%   12.27n ± 1%  -3.69% (p=0.000 n=20)
AppendUintVarlen/digits=10-32        15.57n ± 0%   15.71n ± 1%  +0.90% (p=0.000 n=20)
AppendUintVarlen/digits=11-32        15.65n ± 1%   15.98n ± 0%  +2.11% (p=0.000 n=20)
AppendUintVarlen/digits=12-32        16.54n ± 0%   16.73n ± 0%  +1.21% (p=0.000 n=20)
AppendUintVarlen/digits=13-32        16.69n ± 0%   17.04n ± 0%  +2.07% (p=0.000 n=20)
AppendUintVarlen/digits=14-32        17.52n ± 1%   17.62n ± 0%       ~ (p=0.005 n=20)
AppendUintVarlen/digits=15-32        17.76n ± 0%   17.86n ± 0%  +0.56% (p=0.001 n=20)
AppendUintVarlen/digits=16-32        18.68n ± 1%   18.59n ± 0%       ~ (p=0.006 n=20)
AppendUintVarlen/digits=17-32        18.78n ± 1%   18.74n ± 0%       ~ (p=0.040 n=20)
AppendUintVarlen/digits=18-32        19.63n ± 1%   19.54n ± 0%       ~ (p=0.202 n=20)
AppendUintVarlen/digits=19-32        24.02n ± 1%   24.23n ± 1%  +0.87% (p=0.000 n=20)
AppendUintVarlen/digits=20-32        23.87n ± 1%   24.20n ± 1%  +1.40% (p=0.000 n=20)
geomean                              35.08n        35.00n       -0.21%

host: linux-amd64
                                  │ e8ffcf3095c │            3c5742b667a             │
                                  │   sec/op    │   sec/op     vs base               │
AppendFloat/Decimal-16              62.16n ± 1%   61.81n ± 0%       ~ (p=0.060 n=20)
AppendFloat/Float-16                91.66n ± 1%   91.22n ± 1%       ~ (p=0.606 n=20)
AppendFloat/Exp-16                  95.42n ± 2%   95.22n ± 1%       ~ (p=0.417 n=20)
AppendFloat/NegExp-16               96.88n ± 1%   95.89n ± 1%       ~ (p=0.185 n=20)
AppendFloat/LongExp-16              102.0n ± 1%   101.6n ± 1%       ~ (p=0.794 n=20)
AppendFloat/Big-16                  110.8n ± 1%   110.8n ± 1%       ~ (p=0.984 n=20)
AppendFloat/BinaryExp-16            46.33n ± 1%   46.19n ± 1%       ~ (p=0.723 n=20)
AppendFloat/32Integer-16            62.62n ± 2%   61.88n ± 1%       ~ (p=0.286 n=20)
AppendFloat/32ExactFraction-16      89.09n ± 3%   88.25n ± 2%       ~ (p=0.101 n=20)
AppendFloat/32Point-16              86.17n ± 1%   86.90n ± 1%       ~ (p=0.251 n=20)
AppendFloat/32Exp-16                101.7n ± 1%   101.2n ± 1%       ~ (p=0.379 n=20)
AppendFloat/32NegExp-16             92.17n ± 1%   92.81n ± 1%       ~ (p=0.165 n=20)
AppendFloat/32Shortest-16           82.38n ± 1%   82.89n ± 1%       ~ (p=0.185 n=20)
AppendFloat/32Fixed8Hard-16         60.32n ± 1%   60.41n ± 1%       ~ (p=0.337 n=20)
AppendFloat/32Fixed9Hard-16         69.58n ± 1%   69.22n ± 1%       ~ (p=0.794 n=20)
AppendFloat/64Fixed1-16             57.56n ± 1%   57.65n ± 1%       ~ (p=0.597 n=20)
AppendFloat/64Fixed2-16             55.84n ± 1%   55.54n ± 1%       ~ (p=0.059 n=20)
AppendFloat/64Fixed3-16             57.12n ± 2%   56.55n ± 1%       ~ (p=0.449 n=20)
AppendFloat/64Fixed4-16             55.38n ± 2%   54.50n ± 2%       ~ (p=0.010 n=20)
AppendFloat/64Fixed12-16            88.28n ± 1%   86.67n ± 1%  -1.83% (p=0.000 n=20)
AppendFloat/64Fixed16-16            73.41n ± 1%   73.56n ± 2%       ~ (p=0.794 n=20)
AppendFloat/64Fixed12Hard-16        72.08n ± 1%   71.71n ± 1%       ~ (p=0.116 n=20)
AppendFloat/64Fixed17Hard-16        79.78n ± 1%   79.78n ± 1%       ~ (p=0.224 n=20)
AppendFloat/64Fixed18Hard-16        4.720µ ± 1%   4.746µ ± 1%       ~ (p=0.877 n=20)
AppendFloat/Slowpath64-16           98.71n ± 1%   98.79n ± 1%       ~ (p=0.678 n=20)
AppendFloat/SlowpathDenormal64-16   97.58n ± 2%   98.16n ± 2%       ~ (p=0.862 n=20)
AppendInt-16                        1.185µ ± 1%   1.194µ ± 1%       ~ (p=0.457 n=20)
AppendUint-16                       292.6n ± 1%   293.9n ± 1%       ~ (p=0.351 n=20)
AppendIntSmall-16                   7.197n ± 1%   7.203n ± 1%       ~ (p=0.952 n=20)
AppendUintVarlen/digits=1-16        5.905n ± 1%   5.897n ± 2%       ~ (p=0.474 n=20)
AppendUintVarlen/digits=2-16        5.770n ± 1%   5.733n ± 1%       ~ (p=0.180 n=20)
AppendUintVarlen/digits=3-16        14.00n ± 2%   14.00n ± 1%       ~ (p=0.909 n=20)
AppendUintVarlen/digits=4-16        13.99n ± 1%   14.03n ± 1%       ~ (p=0.578 n=20)
AppendUintVarlen/digits=5-16        15.88n ± 2%   15.83n ± 1%       ~ (p=0.542 n=20)
AppendUintVarlen/digits=6-16        16.25n ± 1%   16.30n ± 1%       ~ (p=0.899 n=20)
AppendUintVarlen/digits=7-16        17.66n ± 1%   17.70n ± 1%       ~ (p=0.440 n=20)
AppendUintVarlen/digits=8-16        18.54n ± 1%   18.58n ± 2%       ~ (p=0.262 n=20)
AppendUintVarlen/digits=9-16        20.25n ± 2%   20.43n ± 1%       ~ (p=0.007 n=20)
AppendUintVarlen/digits=10-16       23.88n ± 1%   24.00n ± 1%       ~ (p=0.473 n=20)
AppendUintVarlen/digits=11-16       24.42n ± 1%   24.57n ± 2%       ~ (p=0.743 n=20)
AppendUintVarlen/digits=12-16       25.54n ± 2%   25.88n ± 1%       ~ (p=0.499 n=20)
AppendUintVarlen/digits=13-16       26.55n ± 1%   26.60n ± 1%       ~ (p=0.774 n=20)
AppendUintVarlen/digits=14-16       28.18n ± 1%   28.16n ± 1%       ~ (p=0.606 n=20)
AppendUintVarlen/digits=15-16       28.66n ± 1%   28.85n ± 0%       ~ (p=0.031 n=20)
AppendUintVarlen/digits=16-16       29.84n ± 1%   30.00n ± 1%       ~ (p=0.151 n=20)
AppendUintVarlen/digits=17-16       30.54n ± 1%   30.60n ± 1%       ~ (p=0.952 n=20)
AppendUintVarlen/digits=18-16       32.01n ± 1%   32.26n ± 2%       ~ (p=0.239 n=20)
AppendUintVarlen/digits=19-16       35.50n ± 1%   35.56n ± 1%       ~ (p=0.256 n=20)
AppendUintVarlen/digits=20-16       35.94n ± 1%   35.96n ± 1%       ~ (p=0.867 n=20)
geomean                             50.35n        50.35n       -0.01%

host: linux-386
                                  │ e8ffcf3095c │             3c5742b667a             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-16              122.9n ± 0%   122.9n ± 0%        ~ (p=0.351 n=20)
AppendFloat/Float-16                226.7n ± 0%   227.1n ± 0%        ~ (p=0.003 n=20)
AppendFloat/Exp-16                  238.3n ± 0%   238.3n ± 0%        ~ (p=0.301 n=20)
AppendFloat/NegExp-16               239.2n ± 0%   239.5n ± 0%        ~ (p=0.015 n=20)
AppendFloat/LongExp-16              237.6n ± 0%   237.8n ± 0%        ~ (p=0.357 n=20)
AppendFloat/Big-16                  275.6n ± 0%   275.7n ± 0%        ~ (p=0.576 n=20)
AppendFloat/BinaryExp-16            90.33n ± 0%   87.69n ± 0%   -2.92% (p=0.000 n=20)
AppendFloat/32Integer-16            121.7n ± 0%   121.8n ± 0%        ~ (p=0.340 n=20)
AppendFloat/32ExactFraction-16      209.4n ± 0%   209.4n ± 0%        ~ (p=0.566 n=20)
AppendFloat/32Point-16              196.1n ± 0%   196.5n ± 0%        ~ (p=0.036 n=20)
AppendFloat/32Exp-16                246.3n ± 0%   246.4n ± 0%        ~ (p=0.407 n=20)
AppendFloat/32NegExp-16             205.4n ± 0%   205.4n ± 0%        ~ (p=0.221 n=20)
AppendFloat/32Shortest-16           187.8n ± 0%   187.8n ± 0%        ~ (p=0.671 n=20)
AppendFloat/32Fixed8Hard-16         112.3n ± 0%   107.6n ± 0%   -4.19% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-16         182.3n ± 0%   182.1n ± 0%        ~ (p=0.098 n=20)
AppendFloat/64Fixed1-16             167.0n ± 0%   165.7n ± 0%   -0.78% (p=0.000 n=20)
AppendFloat/64Fixed2-16             161.9n ± 0%   160.7n ± 0%   -0.74% (p=0.000 n=20)
AppendFloat/64Fixed3-16             157.7n ± 0%   156.4n ± 0%   -0.82% (p=0.000 n=20)
AppendFloat/64Fixed4-16             134.4n ± 0%   133.5n ± 0%   -0.63% (p=0.000 n=20)
AppendFloat/64Fixed12-16            178.2n ± 0%   176.4n ± 0%   -1.01% (p=0.000 n=20)
AppendFloat/64Fixed16-16            168.7n ± 0%   166.4n ± 0%   -1.33% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-16        162.8n ± 0%   162.2n ± 0%   -0.34% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-16        201.9n ± 0%   198.6n ± 0%   -1.66% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-16        14.45µ ± 0%   14.46µ ± 0%        ~ (p=0.189 n=20)
AppendFloat/Slowpath64-16           232.8n ± 0%   232.8n ± 0%        ~ (p=0.995 n=20)
AppendFloat/SlowpathDenormal64-16   229.0n ± 0%   228.9n ± 0%        ~ (p=0.605 n=20)
AppendInt-16                        3.123µ ± 0%   3.099µ ± 0%   -0.77% (p=0.000 n=20)
AppendUint-16                       832.0n ± 0%   820.8n ± 0%   -1.35% (p=0.000 n=20)
AppendIntSmall-16                   13.27n ± 0%   13.28n ± 0%        ~ (p=0.127 n=20)
AppendUintVarlen/digits=1-16        10.19n ± 0%   10.20n ± 0%        ~ (p=0.414 n=20)
AppendUintVarlen/digits=2-16        10.19n ± 0%   10.19n ± 0%        ~ (p=0.700 n=20)
AppendUintVarlen/digits=3-16        21.47n ± 0%   20.51n ± 0%   -4.49% (p=0.000 n=20)
AppendUintVarlen/digits=4-16        21.68n ± 0%   21.05n ± 0%   -2.88% (p=0.000 n=20)
AppendUintVarlen/digits=5-16        26.28n ± 0%   23.94n ± 0%   -8.92% (p=0.000 n=20)
AppendUintVarlen/digits=6-16        26.77n ± 0%   24.67n ± 0%   -7.84% (p=0.000 n=20)
AppendUintVarlen/digits=7-16        30.65n ± 0%   27.25n ± 0%  -11.08% (p=0.000 n=20)
AppendUintVarlen/digits=8-16        31.27n ± 0%   28.11n ± 0%  -10.09% (p=0.000 n=20)
AppendUintVarlen/digits=9-16        36.55n ± 0%   32.53n ± 0%  -10.98% (p=0.000 n=20)
AppendUintVarlen/digits=10-16       42.53n ± 0%   42.06n ± 0%   -1.09% (p=0.000 n=20)
AppendUintVarlen/digits=11-16       42.21n ± 0%   42.51n ± 0%   +0.72% (p=0.000 n=20)
AppendUintVarlen/digits=12-16       45.44n ± 0%   44.41n ± 0%   -2.26% (p=0.000 n=20)
AppendUintVarlen/digits=13-16       46.04n ± 0%   45.06n ± 0%   -2.13% (p=0.000 n=20)
AppendUintVarlen/digits=14-16       49.35n ± 0%   47.73n ± 0%   -3.27% (p=0.000 n=20)
AppendUintVarlen/digits=15-16       50.04n ± 0%   48.45n ± 0%   -3.18% (p=0.000 n=20)
AppendUintVarlen/digits=16-16       53.51n ± 0%   51.21n ± 0%   -4.31% (p=0.000 n=20)
AppendUintVarlen/digits=17-16       53.60n ± 0%   51.19n ± 0%   -4.50% (p=0.000 n=20)
AppendUintVarlen/digits=18-16       57.21n ± 0%   54.03n ± 0%   -5.54% (p=0.000 n=20)
AppendUintVarlen/digits=19-16       64.41n ± 0%   64.68n ± 0%   +0.41% (p=0.000 n=20)
AppendUintVarlen/digits=20-16       64.51n ± 0%   65.04n ± 0%   +0.82% (p=0.000 n=20)
geomean                             104.9n        102.8n        -2.02%

host: linux-ppc64le
                                 │ e8ffcf3095c │             3c5742b667a             │
                                 │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-5              53.78n ± 1%   53.62n ± 1%        ~ (p=0.588 n=20)
AppendFloat/Float-5                76.29n ± 2%   75.77n ± 2%        ~ (p=0.402 n=20)
AppendFloat/Exp-5                  83.73n ± 0%   82.25n ± 2%        ~ (p=0.009 n=20)
AppendFloat/NegExp-5               84.22n ± 1%   83.15n ± 1%        ~ (p=0.031 n=20)
AppendFloat/LongExp-5              88.66n ± 1%   88.34n ± 1%        ~ (p=0.213 n=20)
AppendFloat/Big-5                  93.25n ± 0%   92.73n ± 1%        ~ (p=0.007 n=20)
AppendFloat/BinaryExp-5            40.69n ± 0%   44.88n ± 1%  +10.31% (p=0.000 n=20)
AppendFloat/32Integer-5            54.17n ± 0%   54.39n ± 1%        ~ (p=0.245 n=20)
AppendFloat/32ExactFraction-5      79.89n ± 1%   79.41n ± 1%        ~ (p=0.457 n=20)
AppendFloat/32Point-5              74.81n ± 1%   74.89n ± 0%        ~ (p=0.625 n=20)
AppendFloat/32Exp-5                85.59n ± 1%   85.42n ± 1%        ~ (p=0.331 n=20)
AppendFloat/32NegExp-5             82.74n ± 1%   82.22n ± 1%        ~ (p=0.025 n=20)
AppendFloat/32Shortest-5           69.72n ± 0%   69.79n ± 1%        ~ (p=0.625 n=20)
AppendFloat/32Fixed8Hard-5         54.60n ± 0%   57.54n ± 0%   +5.39% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-5         58.49n ± 1%   65.56n ± 0%  +12.09% (p=0.000 n=20)
AppendFloat/64Fixed1-5             47.07n ± 1%   47.81n ± 1%        ~ (p=0.002 n=20)
AppendFloat/64Fixed2-5             47.88n ± 1%   48.52n ± 1%        ~ (p=0.007 n=20)
AppendFloat/64Fixed3-5             47.47n ± 1%   48.32n ± 1%        ~ (p=0.007 n=20)
AppendFloat/64Fixed4-5             45.12n ± 1%   46.26n ± 1%   +2.55% (p=0.000 n=20)
AppendFloat/64Fixed12-5            79.27n ± 0%   83.78n ± 1%   +5.68% (p=0.000 n=20)
AppendFloat/64Fixed16-5            69.28n ± 1%   70.69n ± 0%        ~ (p=0.001 n=20)
AppendFloat/64Fixed12Hard-5        65.42n ± 0%   68.31n ± 1%   +4.43% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-5        73.89n ± 0%   77.30n ± 1%   +4.63% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-5        5.163µ ± 1%   5.169µ ± 0%        ~ (p=0.733 n=20)
AppendFloat/Slowpath64-5           86.31n ± 1%   86.70n ± 2%        ~ (p=0.578 n=20)
AppendFloat/SlowpathDenormal64-5   83.10n ± 1%   82.97n ± 1%        ~ (p=0.218 n=20)
AppendInt-5                        1.013µ ± 1%   1.125µ ± 0%  +11.06% (p=0.000 n=20)
AppendUint-5                       268.2n ± 0%   287.7n ± 1%   +7.27% (p=0.000 n=20)
AppendIntSmall-5                   6.644n ± 3%   6.936n ± 2%        ~ (p=0.023 n=20)
AppendUintVarlen/digits=1-5        5.556n ± 4%   5.381n ± 3%        ~ (p=0.108 n=20)
AppendUintVarlen/digits=2-5        5.165n ± 1%   5.413n ± 1%   +4.80% (p=0.000 n=20)
AppendUintVarlen/digits=3-5        10.26n ± 1%   11.29n ± 2%  +10.04% (p=0.000 n=20)
AppendUintVarlen/digits=4-5        10.11n ± 1%   11.12n ± 1%   +9.99% (p=0.000 n=20)
AppendUintVarlen/digits=5-5        12.37n ± 2%   13.26n ± 1%   +7.19% (p=0.000 n=20)
AppendUintVarlen/digits=6-5        12.85n ± 3%   13.52n ± 1%   +5.25% (p=0.000 n=20)
AppendUintVarlen/digits=7-5        15.46n ± 7%   16.52n ± 1%        ~ (p=0.005 n=20)
AppendUintVarlen/digits=8-5        14.99n ± 1%   16.60n ± 1%  +10.74% (p=0.000 n=20)
AppendUintVarlen/digits=9-5        18.13n ± 1%   20.23n ± 1%  +11.56% (p=0.000 n=20)
AppendUintVarlen/digits=10-5       19.05n ± 3%   25.62n ± 1%  +34.49% (p=0.000 n=20)
AppendUintVarlen/digits=11-5       21.64n ± 2%   26.11n ± 1%  +20.63% (p=0.000 n=20)
AppendUintVarlen/digits=12-5       21.91n ± 1%   27.91n ± 0%  +27.38% (p=0.000 n=20)
AppendUintVarlen/digits=13-5       24.60n ± 1%   28.38n ± 1%  +15.37% (p=0.000 n=20)
AppendUintVarlen/digits=14-5       25.80n ± 1%   30.79n ± 0%  +19.32% (p=0.000 n=20)
AppendUintVarlen/digits=15-5       28.90n ± 1%   31.38n ± 3%   +8.55% (p=0.000 n=20)
AppendUintVarlen/digits=16-5       28.13n ± 2%   32.54n ± 0%  +15.69% (p=0.000 n=20)
AppendUintVarlen/digits=17-5       30.82n ± 1%   33.16n ± 1%   +7.61% (p=0.000 n=20)
AppendUintVarlen/digits=18-5       32.03n ± 0%   36.12n ± 1%  +12.74% (p=0.000 n=20)
AppendUintVarlen/digits=19-5       35.99n ± 3%   40.97n ± 1%  +13.82% (p=0.000 n=20)
AppendUintVarlen/digits=20-5       35.15n ± 1%   40.71n ± 0%  +15.80% (p=0.000 n=20)
geomean                            44.34n        47.15n        +6.34%

Change-Id: Ia6a3971a76f39d6187b10d6944071ee1c1b47316
Reviewed-on: https://go-review.googlesource.com/c/go/+/716462
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>

reflect: correct internal docs for uncommonType

This updates the doc to reflect the change in CL 19790 from 2016.

Change-Id: I1017babf6660aa3b4929755e2eccbe3168b7860c
Reviewed-on: https://go-review.googlesource.com/c/go/+/714880
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>

cmd/compile/internal/ssa: model right shift more precisely

Prove currently checks for 0 sign bit extraction (x>>63) at the
end of the pass, but it is more general and more useful
(and not really more work) to model right shift during
value range tracking. This handles sign bit extraction (both 0 and -1)
but also makes the value ranges available for proving bounds checks.

'go build -a -gcflags=-d=ssa/prove/debug=1 std'
finds 105 new things to prove.
https://gist.github.com/rsc/8ac41176e53ed9c2f1a664fc668e8336

For example, the compiler now recognizes that this code in
strconv does not need to check the second shift for being ≥ 64.

msb := xHi >> 63
retMantissa := xHi >> (msb + 38)

nor does this code in regexp:

return b < utf8.RuneSelf && specialBytes[b%16]&(1<<(b/16)) != 0

This code in math no longer has a bounds check on the first index:

if 0 <= n && n <= 308 {
return pow10postab32[uint(n)/32] * pow10tab[uint(n)%32]
}

The diff shows one "lost" proof in ycbcr.go but it's not really lost:
the expression was folded to a constant instead, and that only shows
up with debug=2. A diff of that output is at
https://gist.github.com/rsc/9139ed46c6019ae007f5a1ba4bb3250f

Change-Id: I84087311e0a303f00e2820d957a6f8b29ee22519
Reviewed-on: https://go-review.googlesource.com/c/go/+/716140
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: David Chase <drchase@google.com>

go/token: fix a typo in a comment

Fixes #75632

Change-Id: I71f891eb837147b6ff818ec4b2133c8c07091931
GitHub-Last-Rev: 3eeeea2dc28ef13eaef0fee7abf00ad418218f83
GitHub-Pull-Request: golang/go#76117
Reviewed-on: https://go-review.googlesource.com/c/go/+/716440
Reviewed-by: t hepudds <thepudds1460@gmail.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: t hepudds <thepudds1460@gmail.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Alan Donovan <adonovan@google.com>

strconv: remove hand-written divide on 32-bit systems

The compiler now generates code that is just as good.

host: s7:GOARCH=386
goos: linux
goarch: 386
pkg: strconv
cpu: AMD Ryzen 9 7950X 16-Core Processor
                                 │ 8d2b4ce71b3 │            d5524a1f38c             │
                                 │   sec/op    │   sec/op     vs base               │
AppendFloat/Decimal-4              43.91n ± 2%   44.23n ± 1%       ~ (p=0.654 n=20)
AppendFloat/Float-4                73.73n ± 0%   74.17n ± 1%       ~ (p=0.062 n=20)
AppendFloat/Exp-4                  77.33n ± 1%   77.11n ± 1%       ~ (p=0.234 n=20)
AppendFloat/NegExp-4               77.56n ± 1%   77.00n ± 1%       ~ (p=0.136 n=20)
AppendFloat/LongExp-4              79.01n ± 1%   79.62n ± 1%       ~ (p=0.213 n=20)
AppendFloat/Big-4                  88.02n ± 2%   88.88n ± 1%       ~ (p=0.159 n=20)
AppendFloat/BinaryExp-4            34.43n ± 1%   34.58n ± 1%       ~ (p=0.683 n=20)
AppendFloat/32Integer-4            44.05n ± 2%   43.74n ± 1%       ~ (p=0.055 n=20)
AppendFloat/32ExactFraction-4      66.55n ± 1%   66.62n ± 1%       ~ (p=0.753 n=20)
AppendFloat/32Point-4              64.01n ± 1%   63.39n ± 1%       ~ (p=0.032 n=20)
AppendFloat/32Exp-4                77.05n ± 1%   77.84n ± 1%       ~ (p=0.055 n=20)
AppendFloat/32NegExp-4             66.96n ± 1%   67.41n ± 1%       ~ (p=0.569 n=20)
AppendFloat/32Shortest-4           61.73n ± 1%   62.00n ± 1%       ~ (p=0.457 n=20)
AppendFloat/32Fixed8Hard-4         39.09n ± 1%   39.06n ± 1%       ~ (p=0.588 n=20)
AppendFloat/32Fixed9Hard-4         57.66n ± 0%   57.39n ± 1%       ~ (p=0.167 n=20)
AppendFloat/64Fixed1-4             52.52n ± 1%   52.45n ± 1%       ~ (p=0.867 n=20)
AppendFloat/64Fixed2-4             50.64n ± 1%   50.12n ± 4%       ~ (p=0.208 n=20)
AppendFloat/64Fixed3-4             49.54n ± 1%   50.84n ± 1%  +2.62% (p=0.000 n=20)
AppendFloat/64Fixed4-4             45.60n ± 1%   45.25n ± 1%       ~ (p=0.034 n=20)
AppendFloat/64Fixed12-4            57.70n ± 1%   57.70n ± 1%       ~ (p=0.394 n=20)
AppendFloat/64Fixed16-4            56.49n ± 1%   56.15n ± 1%       ~ (p=0.044 n=20)
AppendFloat/64Fixed12Hard-4        53.99n ± 1%   53.79n ± 1%       ~ (p=0.358 n=20)
AppendFloat/64Fixed17Hard-4        64.51n ± 1%   63.18n ± 1%  -2.06% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-4        4.281µ ± 1%   4.294µ ± 1%       ~ (p=0.995 n=20)
AppendFloat/Slowpath64-4           77.94n ± 1%   78.66n ± 2%       ~ (p=0.136 n=20)
AppendFloat/SlowpathDenormal64-4   77.64n ± 1%   77.96n ± 1%       ~ (p=0.229 n=20)
AppendInt-4                        1.122µ ± 1%   1.116µ ± 1%       ~ (p=0.115 n=20)
AppendUint-4                       287.9n ± 1%   286.9n ± 1%       ~ (p=0.185 n=20)
AppendIntSmall-4                   5.845n ± 1%   5.819n ± 1%       ~ (p=0.516 n=20)
AppendUintVarlen/digits=1-4        3.924n ± 1%   3.905n ± 1%       ~ (p=0.317 n=20)
AppendUintVarlen/digits=2-4        3.909n ± 1%   3.940n ± 1%       ~ (p=0.995 n=20)
AppendUintVarlen/digits=3-4        9.543n ± 1%   9.567n ± 2%       ~ (p=0.606 n=20)
AppendUintVarlen/digits=4-4        9.710n ± 1%   9.748n ± 1%       ~ (p=0.602 n=20)
AppendUintVarlen/digits=5-4        10.84n ± 1%   10.88n ± 2%       ~ (p=0.425 n=20)
AppendUintVarlen/digits=6-4        11.06n ± 1%   11.06n ± 1%       ~ (p=0.506 n=20)
AppendUintVarlen/digits=7-4        11.97n ± 1%   12.05n ± 1%       ~ (p=0.218 n=20)
AppendUintVarlen/digits=8-4        12.27n ± 1%   12.32n ± 2%       ~ (p=0.358 n=20)
AppendUintVarlen/digits=9-4        13.57n ± 1%   13.57n ± 1%       ~ (p=0.952 n=20)
AppendUintVarlen/digits=10-4       16.88n ± 1%   16.52n ± 1%  -2.13% (p=0.000 n=20)
AppendUintVarlen/digits=11-4       16.83n ± 1%   16.72n ± 1%       ~ (p=0.012 n=20)
AppendUintVarlen/digits=12-4       17.93n ± 1%   17.63n ± 1%  -1.65% (p=0.000 n=20)
AppendUintVarlen/digits=13-4       18.38n ± 2%   17.80n ± 1%  -3.16% (p=0.000 n=20)
AppendUintVarlen/digits=14-4       19.20n ± 1%   18.65n ± 1%  -2.89% (p=0.000 n=20)
AppendUintVarlen/digits=15-4       19.41n ± 1%   18.85n ± 1%  -2.86% (p=0.000 n=20)
AppendUintVarlen/digits=16-4       20.33n ± 1%   19.79n ± 1%  -2.63% (p=0.000 n=20)
AppendUintVarlen/digits=17-4       20.32n ± 2%   19.79n ± 0%  -2.61% (p=0.000 n=20)
AppendUintVarlen/digits=18-4       21.09n ± 1%   20.84n ± 1%  -1.16% (p=0.000 n=20)
AppendUintVarlen/digits=19-4       25.68n ± 1%   25.24n ± 0%  -1.69% (p=0.000 n=20)
AppendUintVarlen/digits=20-4       25.42n ± 1%   25.15n ± 1%  -1.06% (p=0.000 n=20)
geomean                            37.54n        37.39n       -0.40%
%

Change-Id: I0dba26d1f6fbadc2a951dc0bbc8cf30d1391e10f
Reviewed-on: https://go-review.googlesource.com/c/go/+/716062
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>

cmd/compile: implement bits.Mul64 on 32-bit systems

This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.

Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF were
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:

(386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
(arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
(mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)

For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:

(386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
(arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
(mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)

The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.

The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.

This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.

pkg: cmd/compile/internal/test

benchmark \ host                      local  linux-amd64       s7  linux-386  s7:GOARCH=386
                                    vs base      vs base  vs base    vs base        vs base
DivconstI64                               ~            ~        ~    -49.66%        -21.02%
ModconstI64                               ~            ~        ~    -13.45%        +14.52%
DivisiblePow2constI64                     ~            ~        ~     +0.97%         -1.32%
DivisibleconstI64                         ~            ~        ~    -20.01%        -48.28%
DivisibleWDivconstI64                     ~            ~   -1.76%    -38.59%        -42.74%
DivconstU64/3                             ~            ~        ~    -13.82%         -4.09%
DivconstU64/5                             ~            ~        ~    -14.10%         -3.54%
DivconstU64/37                       -2.07%       -4.45%        ~    -19.60%         -9.55%
DivconstU64/1234567                       ~            ~        ~    -61.55%        -56.93%
ModconstU64                               ~            ~        ~     -6.25%              ~
DivisibleconstU64                         ~            ~        ~     -2.78%         -7.82%
DivisibleWDivconstU64                     ~            ~        ~     +4.23%         +2.56%

pkg: math/bits

benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Add                       ~            ~          ~              ~
Add32                +1.59%            ~          ~              ~
Add64                     ~            ~          ~              ~
Add64multiple             ~            ~          ~              ~
Sub                       ~            ~          ~              ~
Sub32                     ~            ~          ~              ~
Sub64                     ~            ~     -9.20%              ~
Sub64multiple             ~            ~          ~              ~
Mul                       ~            ~          ~              ~
Mul32                     ~            ~          ~              ~
Mul64                     ~            ~    -41.58%        -53.21%
Div                       ~            ~          ~              ~
Div32                     ~            ~          ~              ~
Div64                     ~            ~          ~              ~

pkg: strconv

benchmark \ host                       s7  linux-amd64  linux-386  s7:GOARCH=386
                                  vs base      vs base    vs base        vs base
ParseInt/Pos/7bit                       ~            ~    -11.08%         -6.75%
ParseInt/Pos/26bit                      ~            ~    -13.65%        -11.02%
ParseInt/Pos/31bit                      ~            ~    -14.65%         -9.71%
ParseInt/Pos/56bit                 -1.80%            ~    -17.97%        -10.78%
ParseInt/Pos/63bit                      ~            ~    -13.85%         -9.63%
ParseInt/Neg/7bit                       ~            ~    -12.14%         -7.26%
ParseInt/Neg/26bit                      ~            ~    -14.18%         -9.81%
ParseInt/Neg/31bit                      ~            ~    -14.51%         -9.02%
ParseInt/Neg/56bit                      ~            ~    -15.79%         -9.79%
ParseInt/Neg/63bit                      ~            ~    -15.68%        -11.07%
AppendFloat/Decimal                     ~            ~     -7.25%        -12.26%
AppendFloat/Float                       ~            ~    -15.96%        -19.45%
AppendFloat/Exp                         ~            ~    -13.96%        -17.76%
AppendFloat/NegExp                      ~            ~    -14.89%        -20.27%
AppendFloat/LongExp                     ~            ~    -12.68%        -17.97%
AppendFloat/Big                         ~            ~    -11.10%        -16.64%
AppendFloat/BinaryExp                   ~            ~          ~              ~
AppendFloat/32Integer                   ~            ~    -10.05%        -10.91%
AppendFloat/32ExactFraction             ~            ~     -8.93%        -13.00%
AppendFloat/32Point                     ~            ~    -10.36%        -14.89%
AppendFloat/32Exp                       ~            ~     -9.88%        -13.54%
AppendFloat/32NegExp                    ~            ~    -10.16%        -14.26%
AppendFloat/32Shortest                  ~            ~    -11.39%        -14.96%
AppendFloat/32Fixed8Hard                ~            ~          ~         -2.31%
AppendFloat/32Fixed9Hard                ~            ~          ~         -7.01%
AppendFloat/64Fixed1                    ~            ~     -2.83%         -8.23%
AppendFloat/64Fixed2                    ~            ~          ~         -7.94%
AppendFloat/64Fixed3                    ~            ~     -4.07%         -7.22%
AppendFloat/64Fixed4                    ~            ~     -7.24%         -7.62%
AppendFloat/64Fixed12                   ~            ~     -6.57%         -4.82%
AppendFloat/64Fixed16                   ~            ~     -4.00%         -5.81%
AppendFloat/64Fixed12Hard          -2.22%            ~     -4.07%         -6.35%
AppendFloat/64Fixed17Hard          -2.12%            ~          ~         -3.79%
AppendFloat/64Fixed18Hard          -1.89%            ~     +2.48%              ~
AppendFloat/Slowpath64             -1.85%            ~    -14.49%        -18.21%
AppendFloat/SlowpathDenormal64          ~            ~    -13.08%        -19.41%

pkg: crypto/internal/fips140/nistec/fiat

benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Mul/P224                  ~            ~    -29.95%        -39.60%
Mul/P384                  ~            ~    -37.11%        -63.33%
Mul/P521                  ~            ~    -26.62%        -12.42%
Square/P224          +1.46%            ~    -40.62%        -49.18%
Square/P384               ~            ~    -45.51%        -69.68%
Square/P521         +90.37%            ~    -25.26%        -11.23%

(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)

pkg: crypto/internal/fips140/edwards25519

benchmark \ host                    s7  linux-amd64  linux-386  s7:GOARCH=386
                               vs base      vs base    vs base        vs base
EncodingDecoding                     ~            ~    -34.67%        -35.75%
ScalarBaseMult                       ~            ~    -31.25%        -30.29%
ScalarMult                           ~            ~    -33.45%        -32.54%
VarTimeDoubleScalarBaseMult          ~            ~    -33.78%        -33.68%

Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>

crypto/internal/fips140/aes: fix CTR generator

Fixed two issues in AVO based generator of amd64 asm code.

1. Updated golang.org/x/tools dependency to prevent build issue in Go 1.25.

> golang.org/x/tools@v0.24.0/internal/tokeninternal/tokeninternal.go:64:9:
> invalid array length -delta * delta (constant -256 of type int64)

This error was caused by changes in layout of data structures in Go. Package
golang.org/x/tools has a mirror of that struct and a static assert that it
matches the Go's struct.

2. Changed the package name from crypto/aes to crypto/internal/fips140/aes.

This fixed run time error:

> ctr_amd64_asm.go:31: could not find function "ctrBlocks1Asm"
and other errors

Now the following works as expected:

$ cd src/crypto/internal/fips140/aes/_asm/ctr/
$ go generate

The command re-generates file "src/crypto/internal/fips140/aes/ctr_amd64.s".

Fixes #75972

Change-Id: I28e4c9ebb5bf72506a524e36a0c81a1b50367a84
GitHub-Last-Rev: afc9f506e50df6dc25fd285d5a597b0e5c93b5d9
GitHub-Pull-Request: golang/go#75973
Reviewed-on: https://go-review.googlesource.com/c/go/+/712920
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

go/types, types: proceed with correct (invalid) type in case of a selector error

Fixes #76103.

Change-Id: Idc2f5d1d7aeb4a9b468e7c268e3bf5b85d1c3777
Reviewed-on: https://go-review.googlesource.com/c/go/+/716300
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>

strconv: remove &0xFF trick in formatBase10

The compiler is now smart enough to remove the bounds check itself.

                              │ 731f809c971  │            48fabc7d33b             │
                              │    sec/op    │   sec/op     vs base               │
AppendUint-12                    98.69n ± 1%   95.81n ± 1%  -2.91% (p=0.000 n=20)
AppendUintVarlen/digits=1-12     3.119n ± 1%   3.099n ± 1%       ~ (p=0.743 n=20)
AppendUintVarlen/digits=2-12     2.654n ± 0%   2.653n ± 0%       ~ (p=0.825 n=20)
AppendUintVarlen/digits=3-12     5.042n ± 0%   5.055n ± 1%       ~ (p=0.005 n=20)
AppendUintVarlen/digits=4-12     5.062n ± 1%   5.044n ± 0%       ~ (p=0.011 n=20)
AppendUintVarlen/digits=5-12     5.863n ± 0%   5.908n ± 1%       ~ (p=0.075 n=20)
AppendUintVarlen/digits=6-12     6.137n ± 0%   6.117n ± 1%       ~ (p=0.857 n=20)
AppendUintVarlen/digits=7-12     7.367n ± 0%   7.366n ± 0%       ~ (p=0.784 n=20)
AppendUintVarlen/digits=8-12     7.369n ± 0%   7.381n ± 0%       ~ (p=0.159 n=20)
AppendUintVarlen/digits=9-12     7.795n ± 2%   7.749n ± 0%       ~ (p=0.180 n=20)
AppendUintVarlen/digits=10-12    9.208n ± 1%   8.661n ± 0%  -5.94% (p=0.000 n=20)
AppendUintVarlen/digits=11-12    9.479n ± 1%   8.984n ± 0%  -5.22% (p=0.000 n=20)
AppendUintVarlen/digits=12-12    9.784n ± 0%   9.229n ± 1%  -5.67% (p=0.000 n=20)
AppendUintVarlen/digits=13-12   10.035n ± 1%   9.504n ± 0%  -5.29% (p=0.000 n=20)
AppendUintVarlen/digits=14-12    10.89n ± 1%   10.35n ± 0%  -4.96% (p=0.000 n=20)
AppendUintVarlen/digits=15-12    11.12n ± 0%   10.61n ± 1%  -4.67% (p=0.000 n=20)
AppendUintVarlen/digits=16-12    12.29n ± 0%   11.85n ± 1%  -3.62% (p=0.000 n=20)
AppendUintVarlen/digits=17-12    12.32n ± 0%   11.85n ± 1%  -3.85% (p=0.000 n=20)
AppendUintVarlen/digits=18-12    12.80n ± 0%   12.32n ± 1%  -3.79% (p=0.000 n=20)
AppendUintVarlen/digits=19-12    14.62n ± 1%   13.71n ± 1%  -6.29% (p=0.000 n=20)
AppendUintVarlen/digits=20-12    14.83n ± 0%   13.93n ± 0%  -6.10% (p=0.000 n=20)
geomean                          9.102n        8.843n       -2.84%

Change-Id: Ic8c79b472d5c30dccc1d974b47647f6425618e00
Reviewed-on: https://go-review.googlesource.com/c/go/+/714161
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>

cmd/compile: make prove understand div, mod better

This CL introduces new divisible and divmod passes that rewrite
divisibility checks and div, mod, and mul. These happen after
prove, so that prove can make better sense of the code for
deriving bounds, and they must run before decompose, so that
64-bit ops can be lowered to 32-bit ops on 32-bit systems.
And then they need another generic pass as well, to optimize
the generated code before decomposing.

The three opt passes are "opt", "middle opt", and "late opt".
(Perhaps instead they should be "generic", "opt", and "late opt"?)

The "late opt" pass repeats the "middle opt" work on any new code
that has been generated in the interim.
There will not be new divs or mods, but there may be new muls.

The x%c==0 rewrite rules are much simpler now, since they can
match before divs have been rewritten. This has the effect of
applying them more consistently and making the rewrite rules
independent of the exact div rewrites.

Prove is also now charged with marking signed div/mod as
unsigned when the arguments call for it, allowing simpler
code to be emitted in various cases. For example,
t.Seconds()/2 and len(x)/2 are now recognized as unsigned,
meaning they compile to a simple shift (unsigned division),
avoiding the more complex fixup we need for signed values.

https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32
shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std'
output before and after. "Proved Rsh64x64 shifts to zero" is replaced
by the higher-level "Proved Div64 is unsigned" (the shift was in the
signed expansion of div by constant), but otherwise prove is only
finding more things to prove.

One short example, in code that does x[i%len(x)]:

< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds

A longer example:

< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds

These diffs are due to the smaller opt being better
and taking work away from prove:

< image/jpeg/dct.go:307:5: Proved IsInBounds
< image/jpeg/dct.go:308:5: Proved IsInBounds
...
< image/jpeg/dct.go:442:5: Proved IsInBounds

In the old opt, Mul by 8 was rewritten to Lsh by 3 early.
This CL delays that rule to help prove recognize mods,
but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, computing the length, opt can now do:

(Sub64 (Add (Mul 8 i) 8) (Add (Mul 8 i) 8)) ->
(Add 8 (Sub (Mul 8 i) (Mul 8 i))) ->
(Add 8 (Mul 8 (Sub i i))) ->
(Add 8 (Mul 8 0)) ->
(Add 8 0) ->
8

The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)),
Leaving the multiply as Mul enables using that step; the old
rewrite to Lsh blocked it, leaving prove to figure out the length
and then remove the bounds checks. But now opt can evaluate
the length down to a constant 8 and then constant-fold away
the bounds checks 0 < 8, 1 < 8, and so on. After that,
the compiler has nothing left to prove.

Benchmarks are noisy in general; I checked the assembly for the many
large increases below, and the vast majority are unchanged and
presumably hitting the caches differently in some way.

The divisibility optimizations were not reliably triggering before.
This leads to a very large improvement in some cases, like
DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems
and DivisbleconstU64 on 32-bit systems.

Another way the divisibility optimizations were unreliable before
was incorrectly triggering for x/3, x%3 even though they are
written not to do that. There is a real but small slowdown
in the DivisibleWDivconst benchmarks on Mac because in the cases
used in the benchmark, it is still faster (on Mac) to do the
divisibility check than to remultiply.
This may be worth further study. Perhaps when there is no rotate
(meaning the divisor is odd), the divisibility optimization
should be enabled always. In any event, this CL makes it possible
to study that.

benchmark \ host                          s7  linux-amd64      mac  linux-arm64  linux-ppc64le  linux-386  s7:GOARCH=386  linux-arm
                                     vs base      vs base  vs base      vs base        vs base    vs base        vs base    vs base
LoadAdd                                    ~            ~        ~            ~              ~     -1.59%              ~          ~
ExtShift                                   ~            ~  -42.14%       +0.10%              ~     +1.44%         +5.66%     +8.50%
Modify                                     ~            ~        ~            ~              ~          ~              ~     -1.53%
MullImm                                    ~            ~        ~            ~              ~    +37.90%        -21.87%     +3.05%
ConstModify                                ~            ~        ~            ~        -49.14%          ~              ~          ~
BitSet                                     ~            ~        ~            ~        -15.86%    -14.57%         +6.44%     +0.06%
BitClear                                   ~            ~        ~            ~              ~     +1.78%         +3.50%     +0.06%
BitToggle                                  ~            ~        ~            ~              ~    -16.09%         +2.91%          ~
BitSetConst                                ~            ~        ~            ~              ~          ~              ~     -0.49%
BitClearConst                              ~            ~        ~            ~        -28.29%          ~              ~     -0.40%
BitToggleConst                             ~            ~        ~       +8.89%        -31.19%          ~              ~     -0.77%
MulNeg                                     ~            ~        ~            ~              ~          ~              ~          ~
Mul2Neg                                    ~            ~   -4.83%            ~              ~    -13.75%         -5.92%          ~
DivconstI64                                ~            ~        ~            ~              ~    -30.12%              ~     +0.50%
ModconstI64                                ~            ~   -9.94%       -4.63%              ~     +3.15%              ~     +5.32%
DivisiblePow2constI64                -34.49%      -12.58%        ~            ~        -12.25%          ~              ~          ~
DivisibleconstI64                    -24.69%      -25.06%   -0.40%       -2.27%        -42.61%     -3.31%              ~     +1.63%
DivisibleWDivconstI64                      ~            ~        ~            ~              ~    -17.55%              ~     -0.60%
DivconstU64/3                              ~            ~        ~            ~              ~     +1.51%              ~          ~
DivconstU64/5                              ~            ~        ~            ~              ~          ~              ~          ~
DivconstU64/37                             ~            ~   -0.18%            ~              ~     +2.70%              ~          ~
DivconstU64/1234567                        ~            ~        ~            ~              ~          ~              ~     +0.12%
ModconstU64                                ~            ~        ~       -0.24%              ~     -5.10%         -1.07%     -1.56%
DivisibleconstU64                          ~            ~        ~            ~              ~    -29.01%        -59.13%    -50.72%
DivisibleWDivconstU64                      ~            ~  -12.18%      -18.88%              ~     -5.50%         -3.91%     +5.17%
DivconstI32                                ~            ~   -0.48%            ~        -34.69%    +89.01%         -6.01%    -16.67%
ModconstI32                                ~       +2.95%   -0.33%            ~              ~     -2.98%         -5.40%     -8.30%
DivisiblePow2constI32                      ~            ~        ~            ~              ~          ~              ~    -16.22%
DivisibleconstI32                          ~            ~        ~            ~              ~    -37.27%        -47.75%    -25.03%
DivisibleWDivconstI32                -11.59%       +5.22%  -12.99%      -23.83%              ~    +45.95%         -7.03%    -10.01%
DivconstU32                                ~            ~        ~            ~              ~    +74.71%         +4.81%          ~
ModconstU32                                ~            ~   +0.53%       +0.18%              ~    +51.16%              ~          ~
DivisibleconstU32                          ~            ~        ~       -0.62%              ~     -4.25%              ~          ~
DivisibleWDivconstU32                 -2.77%       +5.56%  +11.12%       -5.15%              ~    +48.70%        +25.11%     -4.07%
DivconstI16                           -6.06%            ~   -0.33%       +0.22%              ~          ~         -9.68%     +5.47%
ModconstI16                                ~            ~   +4.44%       +2.82%              ~          ~              ~     +5.06%
DivisiblePow2constI16                      ~            ~        ~            ~              ~          ~              ~     -0.17%
DivisibleconstI16                          ~            ~   -0.23%            ~              ~          ~         +4.60%     +6.64%
DivisibleWDivconstI16                 -1.44%       -0.43%  +13.48%       -5.76%              ~     +1.62%        -23.15%     -9.06%
DivconstU16                           +1.61%            ~   -0.35%       -0.47%              ~          ~        +15.59%          ~
ModconstU16                                ~            ~        ~            ~              ~     -0.72%              ~    +14.23%
DivisibleconstU16                          ~            ~   -0.05%       +3.00%              ~          ~              ~     +5.06%
DivisibleWDivconstU16                +52.10%       +0.75%  +17.28%       +4.79%              ~    -37.39%         +5.28%     -9.06%
DivconstI8                                 ~            ~   -0.34%       -0.96%              ~          ~         -9.20%          ~
ModconstI8                            +2.29%            ~   +4.38%       +2.96%              ~          ~              ~          ~
DivisiblePow2constI8                       ~            ~        ~            ~              ~          ~              ~          ~
DivisibleconstI8                           ~            ~        ~            ~              ~          ~         +6.04%          ~
DivisibleWDivconstI8                 -26.44%       +1.69%  +17.03%       +4.05%              ~    +32.48%        -24.90%          ~
DivconstU8                            -4.50%      +14.06%   -0.28%            ~              ~          ~         +4.16%     +0.88%
ModconstU8                                 ~            ~  +25.84%       -0.64%              ~          ~              ~          ~
DivisibleconstU8                           ~            ~   -5.70%            ~              ~          ~              ~          ~
DivisibleWDivconstU8                 +49.55%       +9.07%        ~       +4.03%        +53.87%    -40.03%        +39.72%     -3.01%
Mul2                                       ~            ~        ~            ~              ~          ~              ~          ~
MulNeg2                                    ~            ~        ~            ~        -11.73%          ~              ~     -0.02%
EfaceInteger                               ~            ~        ~            ~              ~    +18.11%              ~     +2.53%
TypeAssert                           +33.90%       +2.86%        ~            ~              ~     -1.07%         -5.29%     -1.04%
Div64UnsignedSmall                         ~            ~        ~            ~              ~          ~              ~          ~
Div64Small                                 ~            ~        ~            ~              ~     -0.88%              ~     +2.39%
Div64SmallNegDivisor                       ~            ~        ~            ~              ~          ~              ~     +0.35%
Div64SmallNegDividend                      ~            ~        ~            ~              ~     -0.84%              ~     +3.57%
Div64SmallNegBoth                          ~            ~        ~            ~              ~     -0.86%              ~     +3.55%
Div64Unsigned                              ~            ~        ~            ~              ~          ~              ~     -0.11%
Div64                                      ~            ~        ~            ~              ~          ~              ~     +0.11%
Div64NegDivisor                            ~            ~        ~            ~              ~     -1.29%              ~          ~
Div64NegDividend                           ~            ~        ~            ~              ~     -1.44%              ~          ~
Div64NegBoth                               ~            ~        ~            ~              ~          ~              ~     +0.28%
Mod64UnsignedSmall                         ~            ~        ~            ~              ~     +0.48%              ~     +0.93%
Mod64Small                                 ~            ~        ~            ~              ~          ~              ~          ~
Mod64SmallNegDivisor                       ~            ~        ~            ~              ~          ~              ~     +1.44%
Mod64SmallNegDividend                      ~            ~        ~            ~              ~     +0.22%              ~     +1.37%
Mod64SmallNegBoth                          ~            ~        ~            ~              ~          ~              ~     -2.22%
Mod64Unsigned                              ~            ~        ~            ~              ~     -0.95%              ~     +0.11%
Mod64                                      ~            ~        ~            ~              ~          ~              ~          ~
Mod64NegDivisor                            ~            ~        ~            ~              ~          ~              ~     -0.02%
Mod64NegDividend                           ~            ~        ~            ~              ~          ~              ~          ~
Mod64NegBoth                               ~            ~        ~            ~              ~          ~              ~     -0.02%
MulconstI32/3                              ~            ~        ~      -25.00%              ~          ~              ~    +47.37%
MulconstI32/5                              ~            ~        ~      +33.28%              ~          ~              ~    +32.21%
MulconstI32/12                             ~            ~        ~       -2.13%              ~          ~              ~     -0.02%
MulconstI32/120                            ~            ~        ~       +2.93%              ~          ~              ~     -0.03%
MulconstI32/-120                           ~            ~        ~       -2.17%              ~          ~              ~     -0.03%
MulconstI32/65537                          ~            ~        ~            ~              ~          ~              ~     +0.03%
MulconstI32/65538                          ~            ~        ~            ~              ~    -33.38%              ~     +0.04%
MulconstI64/3                              ~            ~        ~      +33.35%              ~     -0.37%              ~     -0.13%
MulconstI64/5                              ~            ~        ~      -25.00%              ~     -0.34%              ~          ~
MulconstI64/12                             ~            ~        ~       +2.13%              ~    +11.62%              ~     +2.30%
MulconstI64/120                            ~            ~        ~       -1.98%              ~          ~              ~          ~
MulconstI64/-120                           ~            ~        ~       +0.75%              ~          ~              ~          ~
MulconstI64/65537                          ~            ~        ~            ~              ~     +5.61%              ~          ~
MulconstI64/65538                          ~            ~        ~            ~              ~     +5.25%              ~          ~
MulconstU32/3                              ~       +0.81%        ~      +33.39%              ~    +77.92%              ~    -32.31%
MulconstU32/5                              ~            ~        ~      -24.97%              ~    +77.92%              ~    -24.47%
MulconstU32/12                             ~            ~        ~       +2.06%              ~          ~              ~     +0.03%
MulconstU32/120                            ~            ~        ~       -2.74%              ~          ~              ~     +0.03%
MulconstU32/65537                          ~            ~        ~            ~              ~          ~              ~     +0.03%
MulconstU32/65538                          ~            ~        ~            ~              ~    -33.42%              ~     -0.03%
MulconstU64/3                              ~            ~        ~      +33.33%              ~     -0.28%              ~     +1.22%
MulconstU64/5                              ~            ~        ~      -25.00%              ~          ~              ~     -0.64%
MulconstU64/12                             ~            ~        ~       +2.30%              ~    +11.59%              ~     +0.14%
MulconstU64/120                            ~            ~        ~       -2.82%              ~          ~              ~     +0.04%
MulconstU64/65537                          ~       +0.37%        ~            ~              ~     +5.58%              ~          ~
MulconstU64/65538                          ~            ~        ~            ~              ~     +5.16%              ~          ~
ShiftArithmeticRight                       ~            ~        ~            ~              ~    -10.81%              ~     +0.31%
Switch8Predictable                   +14.69%            ~        ~            ~              ~    -24.85%              ~          ~
Switch8Unpredictable                       ~       -0.58%   -3.80%            ~              ~    -11.78%              ~     -0.79%
Switch32Predictable                  -10.33%      +17.89%        ~            ~              ~     +5.76%              ~          ~
Switch32Unpredictable                 -3.15%       +1.19%   +9.42%            ~              ~    -10.30%         -5.09%     +0.44%
SwitchStringPredictable              +70.88%      +20.48%        ~            ~              ~     +2.39%              ~     +0.31%
SwitchStringUnpredictable                  ~       +3.91%   -5.06%       -0.98%              ~     +0.61%         +2.03%          ~
SwitchTypePredictable               +146.58%       -1.10%        ~      -12.45%              ~     -0.46%         -3.81%          ~
SwitchTypeUnpredictable               +0.46%       -0.83%        ~       +4.18%              ~     +0.43%              ~     +0.62%
SwitchInterfaceTypePredictable       -13.41%      -10.13%  +11.03%            ~              ~     -4.38%              ~     +0.75%
SwitchInterfaceTypeUnpredictable      -6.37%       -2.14%        ~       -3.21%              ~     -4.20%              ~     +1.08%

Fixes #63110.
Fixes #75954.

Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

test/codegen: simplify asmcheck pattern matching

Separate patterns in asmcheck by spaces instead of commas.
Many patterns end in comma (like "MOV [$]123,") so separating
patterns by comma is not great; they're already quoted, so spaces are fine.

Also replace all tabs in the assembly lines with spaces before matching.
Finally, replace \$ or \\$ with [$] as the matching idiom.
The effect of all these is to make the patterns look like:

// amd64:"BSFQ" "ORQ [$]256"

instead of the old:

// amd64:"BSFQ","ORQ\t\\$256"

Update all tests as well.

Change-Id: Ia39febe5d7f67ba115846422789e11b185d5c807
Reviewed-on: https://go-review.googlesource.com/c/go/+/716060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>

runtime: tweak example code for gorecover

Change-Id: Ie545610fab008f7855bcb1a608f98e0c5109ec20
GitHub-Last-Rev: 66401b913ad8e85c1dc6c9c0e785f57d5dc5c47e
GitHub-Pull-Request: golang/go#76100
Reviewed-on: https://go-review.googlesource.com/c/go/+/716040
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>

crypto/internal/fips140/bigmod: fix extendedGCD comment

Change-Id: I6a6a6964642991dc46929bfc47e411bb7563e425
Reviewed-on: https://go-review.googlesource.com/c/go/+/716080
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

runtime: use internal/strconv

Runtime doing its own number formatting dates back to
when runtime was the bottom-most Go package.
Those days are long gone. Use internal/strconv to avoid
duplicating code and also to get better floating-point
formatting:

% go1.24.6 run x.go
+1.234568e+004
% go run x.go
12345.678
%

With accurate floating point it becomes necessary to
introduce separate printers for float32 vs float64 and
for complex64 vs complex128. Otherwise float32(93.7)
prints as 93.69999694824219.

Change-Id: I25ae3f09519342dc3d1dcabf4711651423e00128
Reviewed-on: https://go-review.googlesource.com/c/go/+/716002
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

internal/itoa, internal/runtime/strconv: delete

Replaced by internal/strconv.

Change-Id: I0656a9ad5075e60339e963fbae7d194d2f3e16be
Reviewed-on: https://go-review.googlesource.com/c/go/+/716001
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

strconv: move all but Quote to internal/strconv

This will let low-level things depend on the canonical routines,
even for floating-point printing.

Change-Id: I31207dc6584ad90d4e365dbe6eaf20f8662ed22d
Reviewed-on: https://go-review.googlesource.com/c/go/+/716000
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

internal/runtime/gc/scan: avoid memory destination on VPCOMPRESSQ

On AMD Genoa / Zen 4, VPCOMPRESSQ with a memory destination imposes a
severe performance penalty of another an order of magnitude compared to
a register destination.

We can trivially work around this penalty with a register destination
and an additional move to memory.

Benchmark results from:

$ go test -bench=BenchmarkScanSpanPacked/.*/.*/.*/.*/impl=Platform internal/runtime/gc/scan

I've only included the summarized geomean here because there are ~2500
unique test cases.

AMD Genoa (Zen 4):

cpu: AMD EPYC 9B14 96-Core Processor
         │      mem      │                 reg        │
         │    sec/op     │    sec/op     vs base      │
geomean     1.039µ         310.1n        -70.16%

         │      mem      │                  reg       │
         │      B/s      │      B/s        vs base    │
geomean    2.906Gi          10.99Gi        +278.27%

As expected, we see a massive performance improvement on Genoa.

AMD Turin (Zen 5):

cpu: AMD EPYC 9B45 128-Core Processor
         │      mem      │                 reg        │
         │    sec/op     │    sec/op      vs base     │
geomean     231.9n          237.3n         +2.32%

         │      mem       │                  reg      │
         │      B/s       │      B/s        vs base   │
geomean     14.79Gi          14.43Gi         -2.50%

On Turin there is a minor regression. This is primarily due to a fairly
large regression (~15%) in very small microbenchmark cases where the
entire memory fits in L1 cache. This regression disappears as memory
access slows down with larger memories. The latter should be more common
in real workloads.

Intel Sapphire Rapids:

cpu: Intel(R) Xeon(R) Platinum 8481C
         │      mem      │                 reg        │
         │    sec/op     │    sec/op      vs base     │
geomean     254.9n          246.8n         -3.18%

         │      mem       │                  reg      │
         │      B/s       │      B/s        vs base   │
geomean     13.65Gi          14.15Gi         +3.69%

On Sapphire Rapids there is a minor improvement. Here results are fairly
noisy. Most cases are a wash, but some are arbitrary 20% slower or 20%
faster for unclear reasons.

For #73581.

Change-Id: I6a6a636cfd294a0dcdc4f34c9ece1bc9a6e5e4c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/715362
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>

cmd/compile: extend ppc64 MADDLD to match const ADDconst & MULLDconst

Fixes #76084

I was focused on restoring the old behavior and fixing the failing
test/codegen/arithmetic.go:MergeMuls2 test.

It is probable this same bug hides elsewhere in this file.

Change-Id: I17f2ee6b97a1e33b8132648d9d750749d006f7e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/715560
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>

cmd/compile: name change isDirect -> isDirectAndComparable

Now that it also restricts to comparable types. Followon to CL 713840.

Change-Id: Idd975c3fd16fb51f55360f2fa0b89ab0bf1d00ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/715700
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mateusz Poliwczak <mpoliwczak34@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/compile: don't optimize away a panicing interface comparison

We can't do direct pointer comparisons if the type is not a
comparable type.

Fixes #76008

Change-Id: I1687acff21832d2c2e8f3b875e7b5ec125702ef3
Reviewed-on: https://go-review.googlesource.com/c/go/+/713840
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>

cmd/compile: extend loong MOV*idx rules to match ADDshiftLLV

Fixes #76085

I was focused on restoring the old behavior and fixing the failing
test/codegen/floats.go:index* tests.

It is probable this same bug hides elsewhere in this file.

Change-Id: Ibb2cb2be5c7bbeb5eafa9705d998a67380f2b04c
Reviewed-on: https://go-review.googlesource.com/c/go/+/715580
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

runtime: define PanicBounds in funcdata.h

The comment in funcdata.h says that the constants must agree
with those in internal/abi/symtab.go. Make that so.

Change-Id: Ib64146bfb31fdecfc1cc6ae03ae746a1b4a4d22e
Reviewed-on: https://go-review.googlesource.com/c/go/+/715521
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>

crypto/internal/fips140test: collect 300M entropy samples for ESV

Change-Id: I6a6a69649df8f576df62e22c16db7813cde75224
Reviewed-on: https://go-review.googlesource.com/c/go/+/715401
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

runtime: amend comments a bit

Change-Id: I3cabc57f6b8f803f966221f9583a5edb8828ca12
GitHub-Last-Rev: 57569ace50ab8ce3d39e17ddf25ad161dffcc19d
GitHub-Pull-Request: golang/go#76086
Reviewed-on: https://go-review.googlesource.com/c/go/+/715600
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>

errors: document that the target of Is must be comparable

If target is not comparable, then errors.Is(err, target) can panic.
(Put another way, if target == target panics, then Is can panic.)

Document that the target must be comparable.

For #74488

Change-Id: I694dc4c91a608b80f044f06dd1c6ac32b8e77c9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/715440
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
TryBot-Bypass: Damien Neil <dneil@google.com>

go/types, types2: pull up package-level object sort to a separate phase

This step allows future additional phases to reuse the sorted object
list. Preparation for upcoming CLs.

Change-Id: I22eaffd5bbe39c7cc101c6d860011dc3cb98ce37
Reviewed-on: https://go-review.googlesource.com/c/go/+/715480
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Mark Freeman <markfreeman@google.com>

go/types, types2: reduce locks held at once in resolveUnderlying

There is no need to hold locks for the entire chain of Named types in
resolveUnderlying. This change moves the locking / unlocking right to
where t.underlying is set.

This change consolidates logic into resolveUnderlying where possible
and makes minor stylistic / documentation adjustments.

Change-Id: Ic5ec5a7e9a0da8bc34954bf456e4e23a28df296d
Reviewed-on: https://go-review.googlesource.com/c/go/+/714403
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/compile: rewrite proved multiplies by 0 or 1 into CondSelect

Updates #76056

Change-Id: I64fe631ab381c74f902f877392530d7cc91860ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/715044
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/compile: move branchelim supported arches to Config

Change-Id: I8d10399ba71e5fa97ead06a717fc972c806c0856
Reviewed-on: https://go-review.googlesource.com/c/go/+/715042
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>

crypto/subtle,cmd/compile: add intrinsics for ConstantTimeSelect and *Eq

Targeting crypto/subtle rather than
crypto/internal/fips140/subtle after discussion with Filippo.

goos: linux
goarch: amd64
pkg: crypto/subtle
cpu: AMD Ryzen 5 3600 6-Core Processor
                        │ /tmp/old.logs │            /tmp/new.logs             │
                        │    sec/op     │    sec/op     vs base                │
ConstantTimeSelect-12      0.5246n ± 1%   0.5217n ± 2%        ~ (p=0.118 n=10)
ConstantTimeByteEq-12      1.0415n ± 1%   0.5202n ± 2%  -50.05% (p=0.000 n=10)
ConstantTimeEq-12          0.7813n ± 2%   0.7819n ± 0%        ~ (p=0.897 n=10)
ConstantTimeLessOrEq-12    1.0415n ± 3%   0.7813n ± 1%  -24.98% (p=0.000 n=10)
geomean                    0.8166n        0.6381n       -21.86%

The last three will become 1 lat-cycle (0.25ns) faster once #76066 is fixed.

The Select being that fast with the old code is really impressive.
I am pretty sure this happens because my CPU has BMI1&2 support and
a fusing unit able to translate non BMI code into BMI code.
This benchmark doesn't capture the CACHE gains from the shorter assembly.

It currently compiles as:
v17 = TESTQ <flags> v31 v31 // v != 0
v20 = CMOVQNE <int> v32 v33 v17 (y[int])

It is possible to remove the `TESTQ` by compiletime fusing it with the
compare in a pattern like this:
subtle.ConstantTimeSelect(subtle.ConstantTimeLessOrEq(left, right), right, left)

Saving 2 latency-cycles (1 with #76066 fixed).

Updates #76056

Change-Id: I61a1df99e97a1506f75dae13db529f43846d8f1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/715045
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>

cmd/compile: add generic rules to remove bool → int → bool roundtrips

Change-Id: I8b0a3b64c89fe167d304f901a5d38470f35400ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/715200
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>

cmd/compile: do not Zext bools to 64bits in amd64 CMOV generation rules

Change-Id: I77b714ed767e50d13183f4307f65e47ca7577f9f
Reviewed-on: https://go-review.googlesource.com/c/go/+/715380
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/compile: introduce bytesizeToConst to cleanup switches in prove

Change-Id: I32b45d9632a8131911cb9bd6eff075eb8312ccfd
Reviewed-on: https://go-review.googlesource.com/c/go/+/715043
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>

cmd/link: internal linking support for windows/arm64

The internal linker was missing some pieces to support windows/arm64.

Closes #75485

Cq-Include-Trybots: luci.golang.try:gotip-windows-arm64
Change-Id: I5c18a47e63e09b8ae22c9b24832249b54f544b7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/704295
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

internal/runtime/gc/scan: correct size class size check

This check intends to skip size classes that are too big for scanSpan,
but it compares the size class index with a byte size. It must do the
conversion first.

For #73581.

Change-Id: I6a6a636c8d19fa3bf2a2b609870d67d33f47f66e
Reviewed-on: https://go-review.googlesource.com/c/go/+/715460
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

cmd/compile: add position info to sccp debug messages

Change-Id: Ic568dd3b2e3ebebb1b6aaa41ee78a12d4e8d3f06
Reviewed-on: https://go-review.googlesource.com/c/go/+/714221
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>

cmd/compile: teach prove about unsigned rounding-up divide

Change-Id: Ia7b5242c723f83ba85d12e4ca64a19fbbd126016
Reviewed-on: https://go-review.googlesource.com/c/go/+/714622
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

cmd/compile: change !l.nonzero() into l.maybezero()

if l.maybezero()
is easier to read than
if !l.nonzero()

Change-Id: I1183b0c0dc51fa1eed26dfc7a5a996783806a991
Reviewed-on: https://go-review.googlesource.com/c/go/+/714621
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

cmd/compile: optimize Add64carry with unused carries into plain Add64

Change-Id: I8a63f567cfc574bb066ad6269eec6929760cb9c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/656338
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>

cmd/compile: remove 68857 ModU flowLimit workaround in prove

We can know this is correct because all the testcases added by CL 605156 are still passing.
Partial revert of CL 605156 (everything but the testcases).

Change-Id: I5d8daadb4cb35a9de29daaabc22baee642511fe0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714941
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/compile: remove 68857 min & max flowLimit workaround in prove

We can know this is correct because all the testcases added by CL 656157 are still passing.
Partial revert of CL 656157 (everything but the testcases).

Change-Id: I24931fa1affba7e9e92233b3de74ebade3d48a09
Reviewed-on: https://go-review.googlesource.com/c/go/+/714921
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/compile: use topo-sort in prove to correctly learn facts while walking once

Fixes #68857

Change-Id: Ideb359cc6f1550afb4c79f02d25a00d0ae5e5c50
Reviewed-on: https://go-review.googlesource.com/c/go/+/714920
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>

runtime: avoid bound check in freebsd binuptime

Fixes #76062

Change-Id: I683c1232aaeac12b0b3688472bb277adb95ad542
Reviewed-on: https://go-review.googlesource.com/c/go/+/715180
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>

cmd/internal/obj, cmd/asm: reclassify the offset of memory access operations on loong64

This CL also fixes the encoding error of LL/SC[V] instruction and
adds the handling of offset greater than 16 bits in MOV{W/V}P instructions.

Change-Id: I7a8fab4b68a6839da81c5e59af1f42289d00ef61
Reviewed-on: https://go-review.googlesource.com/c/go/+/706435
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>

cmd/go: remove global loader state variable

Prior commits have removed all dependencies on the global module
loader state variable, so it is now unnecessary and thus removed.

This commit is part of the overall effort to eliminate global
modloader state.

Change-Id: Idaf033ee703a18ffd29e40fad80ef13c7b33dbec
Reviewed-on: https://go-review.googlesource.com/c/go/+/711139
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@google.com>
Reviewed-by: Michael Matloob <matloob@golang.org>

cmd/go: use local state for telemetry

This change adds a new loader state to satisfy the requirements for
the (currently unused) telemetry stats for the go command.

This commit is part of the overall effort to eliminate global
modloader state.

Change-Id: I6d4e38c91e5413d7649dfc6301e3ba35ee36c9b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/711136
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/go: use tagged switch

Minor modernization to use a tagged switch statement in lieu of
if-else chain.

Change-Id: I132d279d421b4a609403f85f9f1ddfc2605a5399
Reviewed-on: https://go-review.googlesource.com/c/go/+/715341
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>

cmd/go: increase stmt threshold on amd64

This change slightly increases the stmt threshold on the amd64
platform.

Change-Id: I87e39753b52d6d72f2cd77f1cb8015b1e550921a
Reviewed-on: https://go-review.googlesource.com/c/go/+/715340
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/go: removed unused code in toolchain.Exec

This change removes the unused module loader state in the function
`toolchain.Exec`.

This commit is part of the overall effort to eliminate global
modloader state.

Change-Id: I8935f14447db4669457becc5a96db7f45132772f
Reviewed-on: https://go-review.googlesource.com/c/go/+/709980
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@golang.org>

go/types, types2: clarify docs for resolveUnderlying

The resolveUnderlying method only detects cycles among type names, where
no type literal or predeclared type can be found (which would yield an
underlying type).

Change-Id: I203f3856eaf63a8a9d317c22521755390f9c1023
Reviewed-on: https://go-review.googlesource.com/c/go/+/714402
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

go/types, types2: wrap Named.fromRHS into Named.rhs

In debug mode, the Named.rhs method asserts that Named is in a state
with Named.fromRHS populated.

This caught a missing call to Named.unpack in validtype, which has been
added.

Change-Id: Id3f95f78f03d98a6efe87af6ac24f2ac2e285f96
Reviewed-on: https://go-review.googlesource.com/c/go/+/714242
Auto-Submit: Mark Freeman <markfreeman@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

go/types, types2: verify stateMask transitions in debug mode

Recently, we've changed the representation of Named type state from
an integer to a bit mask, which is a bit more complicated. To make
sure we uphold state invariants, we are adding a verification step
on each state transition.

This uncovered a few places where we do not uphold the transition
invariants; those are patched in this CL.

Change-Id: I76569e4326b2d362d7a1f078641029ffb3dca531
Reviewed-on: https://go-review.googlesource.com/c/go/+/714241
Auto-Submit: Mark Freeman <markfreeman@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

net/url: further speed up escape and unescape

This change is a follow-up to CL 712200. It further simplifies and speeds up
functions escape and unescape.

Here are some benchmark results (no change to allocations):

goos: darwin
goarch: amd64
pkg: net/url
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                    │ go/src/old  │             go/src/new             │
                    │   sec/op    │   sec/op     vs base               │
QueryEscape/#00-8     34.58n ± 1%   31.97n ± 1%  -7.55% (p=0.000 n=20)
QueryEscape/#01-8     92.92n ± 0%   94.63n ± 0%  +1.84% (p=0.000 n=20)
QueryEscape/#02-8     75.44n ± 0%   73.32n ± 0%  -2.80% (p=0.000 n=20)
QueryEscape/#03-8     143.4n ± 0%   136.6n ± 0%  -4.71% (p=0.000 n=20)
QueryEscape/#04-8     918.8n ± 1%   838.3n ± 0%  -8.76% (p=0.000 n=20)
PathEscape/#00-8      43.93n ± 0%   42.86n ± 0%  -2.44% (p=0.000 n=20)
PathEscape/#01-8      94.99n ± 0%   95.86n ± 0%  +0.91% (p=0.000 n=20)
PathEscape/#02-8      75.40n ± 1%   71.50n ± 1%  -5.18% (p=0.000 n=20)
PathEscape/#03-8      143.4n ± 0%   136.2n ± 0%  -4.99% (p=0.000 n=20)
PathEscape/#04-8      871.8n ± 0%   822.7n ± 0%  -5.63% (p=0.000 n=20)
QueryUnescape/#00-8   52.64n ± 1%   51.19n ± 0%  -2.75% (p=0.000 n=20)
QueryUnescape/#01-8   137.4n ± 1%   137.9n ± 1%       ~ (p=0.297 n=20)
QueryUnescape/#02-8   114.0n ± 0%   122.3n ± 1%  +7.24% (p=0.000 n=20)
QueryUnescape/#03-8   271.8n ± 0%   260.7n ± 1%  -4.08% (p=0.000 n=20)
QueryUnescape/#04-8   1.390µ ± 1%   1.355µ ± 0%  -2.52% (p=0.000 n=20)
PathUnescape/#00-8    52.45n ± 1%   53.03n ± 1%  +1.10% (p=0.008 n=20)
PathUnescape/#01-8    138.5n ± 1%   141.3n ± 0%  +2.06% (p=0.000 n=20)
PathUnescape/#02-8    114.0n ± 0%   121.5n ± 0%  +6.62% (p=0.000 n=20)
PathUnescape/#03-8    273.1n ± 1%   260.1n ± 0%  -4.76% (p=0.000 n=20)
PathUnescape/#04-8    1.431µ ± 1%   1.359µ ± 0%  -5.07% (p=0.000 n=20)
geomean               160.4n        156.9n       -2.14%

Updates #17860

Change-Id: If64ac3e9c62c41f672db06cfd7eab7357e934e6d
GitHub-Last-Rev: 1da047ac75f9a710baf75a45d105db4dc7b81810
GitHub-Pull-Request: golang/go#76048
Reviewed-on: https://go-review.googlesource.com/c/go/+/714900
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>

runtime: remove unused cgoCheckUsingType function

The only calls to it were removed in CL 616255.

Change-Id: I6c6b01e2e98d54300b6323fd74ccc45fa1d433dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/714820
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>

time: rewrite IsZero method to use wall and ext fields

Using wall and ext fields will be more efficient.

Fixes #76001

Change-Id: If2b9f597562e0d0d3f8ab300556fa559926480a0
GitHub-Last-Rev: 4a91948413079047cb6c382ed29844f456f3064d
GitHub-Pull-Request: golang/go#76006
Reviewed-on: https://go-review.googlesource.com/c/go/+/713720
Reviewed-by: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>

cmd/go: reorder parameters so that context is first

This change simply reorders parameters so that the context.Context is
the first, as per standard practice.

This commit is part of the overall effort to eliminate global
modloader state.

Change-Id: I22d366fb2d2457f44d668409da3fe76fb87180cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/714780
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@golang.org>

sync: update comments for Once.done

Sync with CL 666895.

Change-Id: I49c4a7f88d87cee9c30a858facd3cd8348efdf94
GitHub-Last-Rev: 88ac1c9c4131aa3f5dfb9c7923e49e46808c409d
GitHub-Pull-Request: golang/go#76026
Reviewed-on: https://go-review.googlesource.com/c/go/+/714360
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>

runtime: add GOEXPERIMENT=runtimefree

This CL is part of a series of CLs to triangulate between the runtime,
compiler, and standard library to reduce how much work the GC must do.

An overall design document is in CL 700255.

This CL stack implements a runtime.free within the runtime, and
then uses it via automatic calls inserted by the compiler when
the compiler proves it is safe to do so. In the future, we can
also consider possibly a limited set of explicit calls from certain
low-level portions of the standard library.

When called, runtime.free allows immediate reuse of memory
without waiting for a GC cycle. The goals include less overall
CPU usage by the GC, longer times between GC cycles
(with less overall time with the write barrier enabled),
and more cache-friendly allocations for user code.

Here, we just add the GOEXPERIMENT=runtimefree flag. It currently
defaults to on, but can be disabled with GOEXPERIMENT=noruntimefree.

The actual implementation starts in CL 673695.

Updates #74299

Change-Id: I2f1f04dbdca51f4aaa735fd65bb2719c298d922e
Reviewed-on: https://go-review.googlesource.com/c/go/+/700235
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

cmd/compile: use MOV(D|F) with const for Const(64|32)F on riscv64

The original Const64F using: AUIPC + LD + FMVDX to load
float64 const, we can use AUIPC + FLD instead, same as Const32F.

Change-Id: I8ca0a0e90d820a26e69b74cd25df3cc662132bf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/703215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>