We don't need to check that the bit patterns of the constants
match; it is sufficient to check that the constant is equal to the
given value.
While we're here also change the FCLASSD rules to use a bit pattern
for the mask. I think this improves readability, particularly as
more uses of FCLASSD get added (e.g. CL 717560).
These changes should not affect codegen.
Change-Id: I92a6338dc71e6a71e04306f67d7d86016c6e9c47
Reviewed-on: https://go-review.googlesource.com/c/go/+/717580
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
cmd/compile/internal/ssa: more aggressive on dead auto elim
Propagate "unread" across OpMoves. Suppose the address of this auto is
only used by an OpMove as its source arg, and the OpMove's target arg is
the address of another auto. If the second auto can be eliminated, this
one can be eliminated too.
This CL eliminates unnecessary memory copies and makes the frame smaller
in the following code snippet:
	func contains(m map[string][16]int, k string) bool {
		_, ok := m[k]
		return ok
	}
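A toy model of the propagation rule may help. The sketch below assumes a hypothetical representation: autos are named by strings, and `moveTo` maps each auto whose address is used only as an OpMove source to the auto that the move targets. It marks a source unread whenever its target is unread, and repeats until nothing changes. This is not the SSA pass itself.

```go
package main

import "fmt"

// propagateUnread is a toy model, not the SSA pass. unread holds autos
// already known to be unread (eliminable). moveTo maps an auto whose
// address is used only as the source of an OpMove to the auto that the
// move targets. A source whose target is unread is itself unread. The
// loop runs to a fixed point so that chains of moves (a -> b -> c) are handled.
func propagateUnread(unread map[string]bool, moveTo map[string]string) {
	for changed := true; changed; {
		changed = false
		for src, dst := range moveTo {
			if unread[dst] && !unread[src] {
				unread[src] = true
				changed = true
			}
		}
	}
}

func main() {
	unread := map[string]bool{"c": true} // c is known to be unread
	moveTo := map[string]string{"a": "b", "b": "c", "x": "y"}
	propagateUnread(unread, moveTo)
	fmt.Println(unread["a"], unread["b"], unread["x"]) // true true false
}
```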
These are the benchmark results followed by the benchmark code:
	package bench_test

	import "testing"

	// Assumed declarations; the original snippet did not include them.
	var (
		m1 = map[int]int{}
		m2 = map[int][16]int{}
		m3 = map[int][256]int{}
	)

	func init() {
		for i := range 1000 {
			m1[i] = i
			m2[i] = [16]int{15: i}
			m3[i] = [256]int{255: i}
		}
	}

	func BenchmarkMap1Access2Ok(b *testing.B) {
		for i := range b.N {
			_, ok := m1[i%1000]
			if !ok {
				b.Errorf("%d not found", i)
			}
		}
	}

	func BenchmarkMap2Access2Ok(b *testing.B) {
		for i := range b.N {
			_, ok := m2[i%1000]
			if !ok {
				b.Errorf("%d not found", i)
			}
		}
	}

	func BenchmarkMap3Access2Ok(b *testing.B) {
		for i := range b.N {
			_, ok := m3[i%1000]
			if !ok {
				b.Errorf("%d not found", i)
			}
		}
	}
Fixes #75398
Change-Id: If75e9caaa50d460efc31a94565b9ba28c8158771
Reviewed-on: https://go-review.googlesource.com/c/go/+/702875
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Ian Lance Taylor [Sun, 2 Nov 2025 04:34:53 +0000 (21:34 -0700)]
cmd/cgo: drop pre-1.18 support
Now that the bootstrap compiler is 1.24, it's no longer needed.
Change-Id: I9b3d6b7176af10fbc580173d50130120b542e7f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/717060
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Russ Cox [Sun, 2 Nov 2025 17:32:01 +0000 (12:32 -0500)]
internal/strconv: handle %f with fixedFtoa when possible
Everyone writes papers about fast shortest-output formatting.
Eventually we also sped up fixed-length formatting %e and %g.
But we've neglected %f, which falls back to the slow general code
even for relatively trivial things like %.2f on 1.23.
This CL uses the fast path fixedFtoa for %f when possible by
estimating the number of digits needed.
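As a rough illustration of "estimating the number of digits needed", the hypothetical helper below bounds the number of significant digits that %.<prec>f produces. It uses only the binary exponent from math.Frexp. This is a sketch under my own assumptions, not the CL's estimate. It never underestimates, but it can overestimate by one (for 999 it returns 4).

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// estimateFixedDigits is a hypothetical helper, not the strconv code. It
// returns an upper bound on the number of significant decimal digits that
// formatting x with %.<prec>f produces, using only x's binary exponent.
func estimateFixedDigits(x float64, prec int) int {
	// x = frac * 2^exp with 0.5 <= |frac| < 1, so |x| < 2^exp <= 10^d
	// where d = ceil(exp * log10(2)). Every significant digit therefore
	// sits between the 10^(d-1) and 10^(-prec) positions: d + prec digits.
	_, exp := math.Frexp(x)
	d := int(math.Ceil(float64(exp) * math.Log10(2)))
	return d + prec
}

func main() {
	fmt.Println(estimateFixedDigits(1.23, 2), strconv.FormatFloat(1.23, 'f', 2, 64)) // 3 1.23
}
```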
Roland Shoemaker [Mon, 27 Oct 2025 15:15:48 +0000 (08:15 -0700)]
encoding/pem: don't reslice in failure modes
We reslice the data being processed at the start of each loop. If the
variable we use to compute where to reslice is < 0 or > the length
of the remaining data, return instead of attempting to reslice.
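The guard amounts to a bounds check before reslicing. Here is a minimal sketch using a hypothetical helper, `advance`, which is not part of encoding/pem:

```go
package main

import "fmt"

// advance is a hypothetical helper, not encoding/pem code. It reslices
// rest past the first n bytes only when n is a valid index, and reports
// failure instead of letting rest[n:] panic.
func advance(rest []byte, n int) ([]byte, bool) {
	if n < 0 || n > len(rest) {
		return nil, false
	}
	return rest[n:], true
}

func main() {
	data := []byte("-----BEGIN")
	if r, ok := advance(data, 5); ok {
		fmt.Printf("%q\n", r) // "BEGIN"
	}
	_, ok := advance(data, 42)
	fmt.Println(ok) // false
}
```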
Change-Id: I1d6c2b6c596feedeea8feeaace370ea73ba02c4c
Reviewed-on: https://go-review.googlesource.com/c/go/+/715260
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Russ Cox [Sat, 1 Nov 2025 13:41:40 +0000 (09:41 -0400)]
internal/strconv: extract fixed-precision ftoa from ftoaryu.go
The fixed-precision ftoa algorithm is not actually
documented in the Ryū paper, and it is fairly
straightforward: multiply by a power of 10 to get
an integer that contains the digits we need.
There is also no need for separate float32 and float64
implementations.
This CL implements a new fixedFtoa, separate from Ryū.
The overall algorithm is the same, but the new code
is simpler, faster, and better documented.
Now ftoaryu.go is only about shortest-output formatting,
so if and when yet another algorithm comes along, it will
be clearer what should be replaced (all of ftoaryu.go)
and what should not (all of ftoafixed.go).
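The core idea (multiply by a power of 10 to get an integer that contains the digits we need) can be checked with exact arithmetic. The sketch below uses math/big, not whatever fixed-width arithmetic fixedFtoa uses, so it is much slower. The name `scaledDigits` is hypothetical. It assumes finite x and k >= 0, and it rounds ties to even.

```go
package main

import (
	"fmt"
	"math"
	"math/big"
)

// scaledDigits returns round(|x| * 10^k) for finite x and k >= 0, computed
// exactly and rounding ties to even. It illustrates the idea only; it is
// not the CL's fixedFtoa.
func scaledDigits(x float64, k int) *big.Int {
	r := new(big.Rat).SetFloat64(math.Abs(x)) // exact value of the float64
	pow := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(k)), nil)
	r.Mul(r, new(big.Rat).SetInt(pow))

	// Split r = q + m/d with 0 <= m < d, then round to nearest, ties to even.
	d := r.Denom()
	q, m := new(big.Int).QuoRem(r.Num(), d, new(big.Int))
	switch new(big.Int).Lsh(m, 1).Cmp(d) {
	case 1: // remainder above one half
		q.Add(q, big.NewInt(1))
	case 0: // exactly one half: round to even
		if q.Bit(0) == 1 {
			q.Add(q, big.NewInt(1))
		}
	}
	return q
}

func main() {
	fmt.Println(scaledDigits(1.23, 2)) // 123, the digits of "1.23"
	fmt.Println(scaledDigits(2.5, 0))  // 2
}
```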
Russ Cox [Sun, 2 Nov 2025 14:59:59 +0000 (09:59 -0500)]
internal/strconv: add tests and benchmarks for ftoaFixed
ftoaFixed is in the next CL; this proves the tests are correct
against the current implementation, and it adds a benchmark
for comparison with the new implementation.
Change-Id: I7ac8a1f699b693ea6d11a7122b22fc70cc135af6
Reviewed-on: https://go-review.googlesource.com/c/go/+/717181
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Russ Cox [Sun, 2 Nov 2025 03:26:17 +0000 (23:26 -0400)]
internal/strconv: fix pow10 off-by-one in exponent result
The exact meaning of pow10 was not defined nor tested directly.
Define it as: pow10(e) returns mant, exp where mant/2^128 * 2^exp = 10^e.
This is the most natural definition but is off-by-one from what
it had been returning. Fix the off-by-one and then adjust the
call sites to stop compensating for it.
Change-Id: I9ee475854f30be4bd0d4f4d770a6b12ec68281fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/717180
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Change-Id: Ifddc3d4d3fbaa6fee2e079bf2ebfe96a2febaa1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/716801
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change-Id: Ie23b2fdd09b4c93801dc804913206f1c5a496268
Reviewed-on: https://go-review.googlesource.com/c/go/+/716800
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Michael Anthony Knyszek [Thu, 30 Oct 2025 20:26:56 +0000 (20:26 +0000)]
runtime: allow Stack to traceback goroutines in syscall _Grunning window
net/http/cgi.TestCopyError calls runtime.Stack to take a stack trace of
all goroutines, and searches for a specific line in that stack trace.
It currently sometimes fails because it encounters the goroutine it's
looking for in the small window where a goroutine might be in _Grunning
while in a syscall, introduced in CL 646198. In that case, the traceback
will give up, failing to print the stack TestCopyError is expecting.
This represents a general regression, since previously runtime.Stack
could never fail to take a goroutine's stack; giving up was only
possible in fatal panic cases.
Fix this the same way we fixed goroutine profiles: allow the stack trace
to proceed if the g's syscallsp != 0. This is safe in any
stop-the-world-related context, because syscallsp won't be mutated while
the goroutine fails to acquire a P, and thus fails to fully exit the
syscall context. This also means the stack below syscallsp won't be
mutated, and thus taking a traceback is also safe.
Fixes #66639.
Change-Id: Ie6f4b0661d9f8df02c9b8434e99bc95f26fe5f0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/716680
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
cmd/go: link to go.dev/doc/godebug for removed GODEBUG settings
This improves the user experience: previously, users would receive an
"unknown godebug" error message; now we explicitly mention that the
setting was removed and link to go.dev/doc/godebug, where users can find
more information about the removal.
Additionally, we keep all the removed GODEBUGs in the source, to make
sure we do not reuse a GODEBUG name after it is removed.
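Here is a sketch of the lookup order described above. Known settings pass, removed settings get a pointer to go.dev/doc/godebug, and anything else is unknown. The table contents, the name `checkGodebug`, and the exact error wording are all hypothetical, not cmd/go's.

```go
package main

import "fmt"

// Hypothetical tables; the names are placeholders, not real GODEBUG settings.
var (
	knownGodebugs   = map[string]bool{"othersetting": true}
	removedGodebugs = map[string]bool{"examplesetting": true} // kept so the name is never reused
)

// checkGodebug sketches the lookup order; the error wording is illustrative.
func checkGodebug(name string) error {
	switch {
	case knownGodebugs[name]:
		return nil
	case removedGodebugs[name]:
		return fmt.Errorf("godebug %q has been removed; see https://go.dev/doc/godebug", name)
	default:
		return fmt.Errorf("unknown godebug %q", name)
	}
}

func main() {
	fmt.Println(checkGodebug("examplesetting"))
}
```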
Updates #72111
Updates #75316
Change-Id: I6a6a6964cce1c100108fdba4bfba7d13cd9a893a
Reviewed-on: https://go-review.googlesource.com/c/go/+/701875
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Mateusz Poliwczak <mpoliwczak34@gmail.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Michael Matloob <matloob@google.com>
Daniel McCarney [Mon, 3 Nov 2025 18:00:37 +0000 (13:00 -0500)]
crypto/tls: add BetterTLS test coverage
This commit adds test coverage of path building and name constraint
verification using the suite of test data provided by Netflix's
BetterTLS project.
Since the uncompressed raw JSON test data exported by BetterTLS for
external test integrations is ~31MB, we use a similar approach to the
BoGo and ACVP test integrations: fetch the BetterTLS Go module and
run its export tool on-the-fly to generate the test data in a tempdir.
As expected, all tests pass currently and this coverage is mainly
helpful in catching regressions, especially with tricky/cursed name
constraints.
Change-Id: I23d7c24232e314aece86bcbfd133b7f02c9e71b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/717420
TryBot-Bypass: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: Michael Pratt <mpratt@google.com>
Alexander Musman [Sat, 1 Nov 2025 11:44:39 +0000 (14:44 +0300)]
cmd/internal/obj: support arm64 FMOVQ large offset encoding
Support arm64 FMOVQ with a large immediate offset, which is encoded
using the register-offset instruction in opldrr or opstrr. This will
allow folding immediates into the new SSA ops FMOVQload and FMOVQstore.
For example, FMOVQ F0, -20000(R0) is encoded as follows:
	MOVD 3(PC), R27
	FMOVQ F0, (R0)(R27)
	RET
	ffff b1e0 # constant value
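The constant in the listing matches the offset's 32-bit two's-complement bit pattern: 20000 is 0x4e20, and 0x100000000 - 0x4e20 = 0xffffb1e0. Here is a quick check in Go, using a hypothetical helper `hex32`:

```go
package main

import "fmt"

// hex32 returns the 8 hex digits of v's 32-bit two's-complement encoding.
func hex32(v int32) string {
	return fmt.Sprintf("%08x", uint32(v))
}

func main() {
	fmt.Println(hex32(-20000)) // ffffb1e0
}
```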
Change-Id: Ib71f92f6ff4b310bda004a440b1df41ffe164523
Reviewed-on: https://go-review.googlesource.com/c/go/+/716960
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
matloob@golang.org [Tue, 28 Oct 2025 15:18:02 +0000 (11:18 -0400)]
cmd/go/testdata/script: loosen list_empty_importpath for freebsd
We've been seeing flakes where we get a 'no errors' output on
freebsd, in addition to windows and solaris. Allow that case too, to
avoid flakes.
For #73976
Change-Id: I6a6a696445ec908b55520d8d75e7c1f867b9c092
Reviewed-on: https://go-review.googlesource.com/c/go/+/715640
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@google.com>
Reviewed-by: Ian Alexander <jitsu@google.com>
Youlin Feng [Sat, 25 Oct 2025 03:49:30 +0000 (11:49 +0800)]
runtime: update outdated comments for deferprocStack
Change-Id: I0ea4d15da163cec6fe2a703376ce5a6032e15484
Reviewed-on: https://go-review.googlesource.com/c/go/+/714861
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Maxim Merzhanov [Sun, 2 Nov 2025 11:28:31 +0000 (11:28 +0000)]
internal/profile: optimize Parse allocs
In our case, it greatly improves the performance of continuously collecting diff profiles from the net/http/pprof endpoint, such as /debug/pprof/allocs?seconds=30.
This CL is a cherry-pick of my PR upstream: https://github.com/google/pprof/pull/951
Benchmark of profile Parse func:
goos: linux
goarch: amd64
pkg: github.com/google/pprof/profile
cpu: 13th Gen Intel(R) Core(TM) i7-1360P
│ old-parse.txt │ new-parse.txt │
│ sec/op │ sec/op vs base │
Parse-16 62.07m ± 13% 55.54m ± 13% -10.52% (p=0.035 n=10)
Youlin Feng [Fri, 31 Oct 2025 02:45:26 +0000 (10:45 +0800)]
runtime: remove the pc field of _defer struct
Since we can always get the address of `CALL runtime.deferreturn(SB)`
from the unwinder, it is not necessary to record the caller's pc
in the _defer struct. For stack-allocated _defers, this CL makes
the frame smaller.
Change-Id: I0fd347e4bc07cf8a9b954816323df30fc52552b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/716720
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Filippo Valsorda [Wed, 29 Oct 2025 12:05:19 +0000 (13:05 +0100)]
crypto/internal/constanttime: expose intrinsics to the FIPS 140-3 packages
Intrinsifying things inside the module (crypto/internal/fips140/subtle)
is asking for trouble, as the import paths are rewritten by the
GOFIPS140 mechanism, and we might have to support multiple modules
in the future.
Importing crypto/subtle from inside a FIPS 140-3 module is not allowed,
and is basically asking for circular dependencies.
Instead, break off the intrinsics into their own package
(crypto/internal/constanttime), and keep the byte slice operations
in crypto/internal/fips140/subtle. crypto/subtle then becomes a thin
dispatch layer.
Change-Id: I6a6a6964cd5cb5ad06e9d1679201447f5a811da4
Reviewed-on: https://go-review.googlesource.com/c/go/+/716120
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
David Finkel [Sun, 24 Aug 2025 19:15:06 +0000 (15:15 -0400)]
cmd/go: skip git sha256 tests if git < 2.29
Fix test building on older Ubuntu LTS releases that are still
supported. Git SHA256 support was only added in 2.29, which came out
in 2020. Check the output of `git version` and skip these tests if the
version predates that.
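The version check could be sketched as follows. `gitAtLeast` is a hypothetical helper, not the cmd/go test code. It parses output of the form "git version 2.39.2" and compares major.minor against a threshold.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gitAtLeast is a hypothetical helper, not the cmd/go test code. It
// reports whether `git version` output such as "git version 2.39.2"
// names a version >= wantMajor.wantMinor.
func gitAtLeast(out string, wantMajor, wantMinor int) (bool, error) {
	f := strings.Fields(out)
	if len(f) < 3 || f[0] != "git" || f[1] != "version" {
		return false, fmt.Errorf("unexpected git version output %q", out)
	}
	parts := strings.SplitN(f[2], ".", 3) // "2.29.0.windows.1" -> "2", "29", "0.windows.1"
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected git version %q", f[2])
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	return major > wantMajor || (major == wantMajor && minor >= wantMinor), nil
}

func main() {
	ok, err := gitAtLeast("git version 2.25.1", 2, 29)
	fmt.Println(ok, err) // false <nil>
}
```

In a test, the input would come from something like exec.Command("git", "version").Output(), followed by t.Skip when the result is false.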
Thanks to @ianlancetaylor for flagging this.
Updates #73704
Change-Id: I9d413a63fa43f34f94c274bba7f7b883c80433b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/698835
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Michael Matloob <matloob@google.com>
Auto-Submit: Michael Matloob <matloob@google.com>
Reviewed-by: Ian Alexander <jitsu@google.com>
Nicholas S. Husin [Sat, 1 Nov 2025 15:15:58 +0000 (11:15 -0400)]
runtime: prevent time.Timer.Reset(0) from deadlocking testing/synctest tests
In Go 1.23+, timer channels behave synchronously. When we have a timer
channel (i.e. !async && t.isChan), we lock the
runtime.timer.sendLock mutex at the beginning of
runtime.timer.modify()'s execution.
Calling time.Timer.Reset(0) within a testing/synctest test,
unfortunately, causes it to hang indefinitely. This is because the
runtime.timer.sendLock mutex ends up being locked twice before it could
be unlocked:
- When calling time.Timer.Reset(), runtime.timer.modify() locks the
  mutex as usual.
- Due to the 0 argument, runtime.timer.modify() also tries to
  execute the bubbled timer immediately rather than adding it to a
  heap. However, in doing so, it uses runtime.timer.unlockAndRun(),
  which also locks the same mutex.
This CL solves this issue by making sure that a locked
runtime.timer.sendLock mutex is unlocked first whenever we try to
execute a bubbled timer immediately.
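The shape of the deadlock and of the fix can be modeled with an ordinary, non-reentrant sync.Mutex. `fakeTimer`, `runNow`, and `modify` below are a hypothetical toy model, not runtime.timer. The model only shows why the lock must be released before calling code that acquires it again.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeTimer is a hypothetical toy model, not runtime.timer. Like the
// runtime's sendLock, sync.Mutex is not reentrant: locking it twice from
// the same goroutine blocks forever.
type fakeTimer struct {
	sendLock sync.Mutex
	fired    int
}

// runNow stands in for the helper that acquires sendLock itself.
func (t *fakeTimer) runNow() {
	t.sendLock.Lock()
	defer t.sendLock.Unlock()
	t.fired++
}

// modify stands in for Reset: it takes sendLock first and, for a
// non-positive duration, releases it before running the timer immediately.
func (t *fakeTimer) modify(d int64) {
	t.sendLock.Lock()
	if d <= 0 {
		t.sendLock.Unlock() // without this, runNow would deadlock
		t.runNow()
		return
	}
	// Scheduling for later is omitted from this model.
	t.sendLock.Unlock()
}

func main() {
	t := &fakeTimer{}
	t.modify(0)
	fmt.Println(t.fired) // 1
}
```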
Fixes #76052
Change-Id: I66429b9bf6971400de95dcf2d5dc9670c3135492
Reviewed-on: https://go-review.googlesource.com/c/go/+/716883
Reviewed-by: Damien Neil <dneil@google.com>
Auto-Submit: Nicholas Husin <nsh@golang.org>
Reviewed-by: Nicholas Husin <husin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Michael Anthony Knyszek [Wed, 30 Jul 2025 00:36:40 +0000 (00:36 +0000)]
runtime: prioritize panic output over racefini
For some reason CL 646198 uncovered #3934 and #20018 again, but only in
race mode. It turns out that because racefini does not return, and
racefini is called early after main returns, we would not properly wait
for a concurrent panic to complete. This would result in fairly
consistent failures of TestPanicRace, which specifically looks for the
panic output to appear if main concurrently exits.
The important part of this change is that race mode will no longer have
the bug described in #3934 and #20018. A byproduct, however, is that
racefini is that we're essentially prioritizing the panic output over
racefini in this scenario. If racefini were to reveal a latent race
condition and fail, we'll prefer to surface the panic. Such a case is
probably fine, because the panic is always an crashing, unrecoverable
panic.
For #3934.
For #20018.
Change-Id: I0674a75c918563c5ec4ee1eec057dfd096fcfbc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/691795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Michael Anthony Knyszek [Thu, 2 Oct 2025 17:16:49 +0000 (17:16 +0000)]
runtime: optimistically CAS atomicstatus directly in enter/exitsyscall
This change steals the performance trick from the coro implementation to
try to do the CAS directly first before calling into casgstatus, a much
more heavyweight function. We have to be careful about synctest
bubbling, but overall it's a good bit faster, and easy low-hanging
fruit.
Michael Anthony Knyszek [Mon, 3 Feb 2025 16:53:47 +0000 (16:53 +0000)]
runtime: don't track scheduling latency for _Grunning <-> _Gsyscall
The current logic causes much more tracking than necessary, when really
_Grunning and _Gsyscall are both sort of "running" from the perspective
of tracking scheduling latency.
This makes cgo calls and syscalls a little faster in the single-threaded
case, and shows much larger improvement in the multi-threaded case
by removing updates of shared variables (though this parallel
microbenchmark is a little unrealistic, so don't ascribe too much weight
to it).
Michael Anthony Knyszek [Wed, 1 Oct 2025 20:50:57 +0000 (20:50 +0000)]
runtime: document tracer invariants explicitly
This change is a documentation update for the execution tracer. Far too
much is left to small comments scattered around places. This change
accumulates the big important trace invariants, with rationale, into one
file: trace.go.
Change-Id: I5fd1402a3d8fdf14a0051e305b3a8fb5dfeafcb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/708398
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Michael Anthony Knyszek [Sun, 2 Feb 2025 19:50:39 +0000 (19:50 +0000)]
runtime: eliminate _Psyscall
This change eliminates the _Psyscall state by using synchronization on
the G status _Gsyscall to make syscalls work instead. This removes an
atomic Store and an atomic CAS on the syscall path, which reduces
syscall and cgo overheads. It also simplifies the syscall paths quite a
bit.
The one danger with this change is that we have a new combination of
states that was previously impossible. There are brief windows where
it's possible to observe a goroutine in _Grunning but without a P. This
change is careful to hide this detail from the execution tracer, but it
may have unexpected effects in the rest of the runtime, making this
change somewhat risky.
Russ Cox [Wed, 29 Oct 2025 17:37:52 +0000 (13:37 -0400)]
runtime: delete timediv
Now that the compiler handles constant 64-bit divisions
without function calls on 32-bit systems, we no longer need
to maintain and test a bad custom implementation of 64-bit division.
Change-Id: If28807ad4f86507267ae69bc8f0b09ec18e98b66
Reviewed-on: https://go-review.googlesource.com/c/go/+/716463
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
Russ Cox [Wed, 29 Oct 2025 16:09:18 +0000 (12:09 -0400)]
strconv: remove arch-specific decision in formatBase10
There is only one architecture-specific code segment left in formatBase10.
Remove it for simplicity.
The only affected system is ppc64le, where this adds 10-20% to the
run time, but that's a ppc64le problem, not a strconv problem.
Changing the "uint32" to "uint" makes ppc64le not slower anymore,
meaning that somehow uint32 divide-by-constant is slower than
uint divide-by-constant on ppc64le. If this minor slowdown matters,
it should be addressed by improving the generated code for
ppc64le division, not by complicating strconv.
Even though some percentages look big, the geomean is +6% and
the worst case slowdown is only about 6ns/call.
Ian Lance Taylor [Sat, 25 Oct 2025 04:41:52 +0000 (21:41 -0700)]
reflect: correct internal docs for uncommonType
This updates the doc to reflect the change in CL 19790 from 2016.
Change-Id: I1017babf6660aa3b4929755e2eccbe3168b7860c
Reviewed-on: https://go-review.googlesource.com/c/go/+/714880
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Russ Cox [Wed, 29 Oct 2025 11:27:38 +0000 (07:27 -0400)]
cmd/compile/internal/ssa: model right shift more precisely
Prove currently checks for 0 sign bit extraction (x>>63) at the
end of the pass, but it is more general and more useful
(and not really more work) to model right shift during
value range tracking. This handles sign bit extraction (both 0 and -1)
but also makes the value ranges available for proving bounds checks.
'go build -a -gcflags=-d=ssa/prove/debug=1 std'
finds 105 new things to prove.
https://gist.github.com/rsc/8ac41176e53ed9c2f1a664fc668e8336
For example, the compiler now recognizes that this code in
strconv does not need to check the second shift for being ≥ 64.
	msb := xHi >> 63
	retMantissa := xHi >> (msb + 38)
nor does this code in regexp:
	return b < utf8.RuneSelf && specialBytes[b%16]&(1<<(b/16)) != 0
This code in math no longer has a bounds check on the first index:
	if 0 <= n && n <= 308 {
		return pow10postab32[uint(n)/32] * pow10tab[uint(n)%32]
	}
The diff shows one "lost" proof in ycbcr.go but it's not really lost:
the expression was folded to a constant instead, and that only shows
up with debug=2. A diff of that output is at
https://gist.github.com/rsc/9139ed46c6019ae007f5a1ba4bb3250f
Change-Id: I84087311e0a303f00e2820d957a6f8b29ee22519
Reviewed-on: https://go-review.googlesource.com/c/go/+/716140
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Russ Cox [Mon, 27 Oct 2025 23:41:39 +0000 (19:41 -0400)]
cmd/compile: implement bits.Mul64 on 32-bit systems
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.
Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF were
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:
For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:
The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.
The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.
This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.
pkg: cmd/compile/internal/test
benchmark \ host local linux-amd64 s7 linux-386 s7:GOARCH=386
vs base vs base vs base vs base vs base
DivconstI64 ~ ~ ~ -49.66% -21.02%
ModconstI64 ~ ~ ~ -13.45% +14.52%
DivisiblePow2constI64 ~ ~ ~ +0.97% -1.32%
DivisibleconstI64 ~ ~ ~ -20.01% -48.28%
DivisibleWDivconstI64 ~ ~ -1.76% -38.59% -42.74%
DivconstU64/3 ~ ~ ~ -13.82% -4.09%
DivconstU64/5 ~ ~ ~ -14.10% -3.54%
DivconstU64/37 -2.07% -4.45% ~ -19.60% -9.55%
DivconstU64/1234567 ~ ~ ~ -61.55% -56.93%
ModconstU64 ~ ~ ~ -6.25% ~
DivisibleconstU64 ~ ~ ~ -2.78% -7.82%
DivisibleWDivconstU64 ~ ~ ~ +4.23% +2.56%
pkg: math/bits
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
Add ~ ~ ~ ~
Add32 +1.59% ~ ~ ~
Add64 ~ ~ ~ ~
Add64multiple ~ ~ ~ ~
Sub ~ ~ ~ ~
Sub32 ~ ~ ~ ~
Sub64 ~ ~ -9.20% ~
Sub64multiple ~ ~ ~ ~
Mul ~ ~ ~ ~
Mul32 ~ ~ ~ ~
Mul64 ~ ~ -41.58% -53.21%
Div ~ ~ ~ ~
Div32 ~ ~ ~ ~
Div64 ~ ~ ~ ~
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
Mul/P224 ~ ~ -29.95% -39.60%
Mul/P384 ~ ~ -37.11% -63.33%
Mul/P521 ~ ~ -26.62% -12.42%
Square/P224 +1.46% ~ -40.62% -49.18%
Square/P384 ~ ~ -45.51% -69.68%
Square/P521 +90.37% ~ -25.26% -11.23%
(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)
pkg: crypto/internal/fips140/edwards25519
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
EncodingDecoding ~ ~ -34.67% -35.75%
ScalarBaseMult ~ ~ -31.25% -30.29%
ScalarMult ~ ~ -33.45% -32.54%
VarTimeDoubleScalarBaseMult ~ ~ -33.78% -33.68%
Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Boris Nagaev [Wed, 22 Oct 2025 14:26:30 +0000 (14:26 +0000)]
crypto/internal/fips140/aes: fix CTR generator
Fixed two issues in the Avo-based generator of the amd64 assembly code.
1. Updated the golang.org/x/tools dependency to prevent a build failure in Go 1.25.
> golang.org/x/tools@v0.24.0/internal/tokeninternal/tokeninternal.go:64:9:
> invalid array length -delta * delta (constant -256 of type int64)
This error was caused by changes in the layout of data structures in Go. Package
golang.org/x/tools has a mirror of that struct and a static assert that it
matches Go's struct.
2. Changed the package name from crypto/aes to crypto/internal/fips140/aes.
This fixed the runtime error:
> ctr_amd64_asm.go:31: could not find function "ctrBlocks1Asm"
and other errors
Now the following works as expected:
$ cd src/crypto/internal/fips140/aes/_asm/ctr/
$ go generate
The command re-generates file "src/crypto/internal/fips140/aes/ctr_amd64.s".
Fixes #75972
Change-Id: I28e4c9ebb5bf72506a524e36a0c81a1b50367a84
GitHub-Last-Rev: afc9f506e50df6dc25fd285d5a597b0e5c93b5d9
GitHub-Pull-Request: golang/go#75973
Reviewed-on: https://go-review.googlesource.com/c/go/+/712920
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Robert Griesemer [Wed, 29 Oct 2025 22:22:14 +0000 (15:22 -0700)]
go/types, types2: proceed with correct (invalid) type in case of a selector error
Fixes #76103.
Change-Id: Idc2f5d1d7aeb4a9b468e7c268e3bf5b85d1c3777
Reviewed-on: https://go-review.googlesource.com/c/go/+/716300
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Russ Cox [Thu, 23 Oct 2025 02:22:51 +0000 (22:22 -0400)]
cmd/compile: make prove understand div, mod better
This CL introduces new divisible and divmod passes that rewrite
divisibility checks and div, mod, and mul. These happen after
prove, so that prove can make better sense of the code for
deriving bounds, and they must run before decompose, so that
64-bit ops can be lowered to 32-bit ops on 32-bit systems.
And then they need another generic pass as well, to optimize
the generated code before decomposing.
The three opt passes are "opt", "middle opt", and "late opt".
(Perhaps instead they should be "generic", "opt", and "late opt"?)
The "late opt" pass repeats the "middle opt" work on any new code
that has been generated in the interim.
There will not be new divs or mods, but there may be new muls.
The x%c==0 rewrite rules are much simpler now, since they can
match before divs have been rewritten. This has the effect of
applying them more consistently and making the rewrite rules
independent of the exact div rewrites.
Prove is also now charged with marking signed div/mod as
unsigned when the arguments call for it, allowing simpler
code to be emitted in various cases. For example,
t.Seconds()/2 and len(x)/2 are now recognized as unsigned,
meaning they compile to a simple shift (unsigned division),
avoiding the more complex fixup we need for signed values.
https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32
shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std'
output before and after. "Proved Rsh64x64 shifts to zero" is replaced
by the higher-level "Proved Div64 is unsigned" (the shift was in the
signed expansion of div by constant), but otherwise prove is only
finding more things to prove.
One short example, in code that does x[i%len(x)]:
< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds
A longer example:
< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds
These diffs are due to the smaller opt being better
and taking work away from prove:
In the old opt, Mul by 8 was rewritten to Lsh by 3 early.
This CL delays that rule to help prove recognize mods,
but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, when computing the length, opt can now simplify the
expression algebraically. The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)).
Leaving the multiply as Mul enables using that step; the old
rewrite to Lsh blocked it, leaving prove to figure out the length
and then remove the bounds checks. But now opt can evaluate
the length down to a constant 8 and then constant-fold away
the bounds checks 0 < 8, 1 < 8, and so on. After that,
the compiler has nothing left to prove.
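A small sketch of the length computation: `block` is a hypothetical helper with the x[8*i:8*i+8:8*i+8] shape, and `factorSub` evaluates both sides of the (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)) rewrite by hand. Neither is compiler code. The point is that the rewrite is sound even when the products overflow, because Go integer arithmetic wraps modulo 2^64, so the length (8*i+8) - 8*i is always 8.

```go
package main

import "fmt"

// block returns the i'th 8-element window of x using a full slice
// expression. Its length and capacity are (8*i+8) - 8*i, i.e. 8.
func block(x []byte, i int) []byte {
	return x[8*i : 8*i+8 : 8*i+8]
}

// factorSub computes both sides of the rewrite
// (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)).
// The two results agree even when the multiplications overflow,
// because uint64 arithmetic is arithmetic modulo 2^64.
func factorSub(x, y, z uint64) (before, after uint64) {
	return x*y - x*z, x * (y - z)
}

func main() {
	b := block(make([]byte, 32), 2)
	fmt.Println(len(b), cap(b)) // 8 8

	before, after := factorSub(1<<63+5, ^uint64(0), 3)
	fmt.Println(before == after) // true
}
```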
Benchmarks are noisy in general; I checked the assembly for the many
large increases below, and the vast majority are unchanged and
presumably hitting the caches differently in some way.
The divisibility optimizations were not reliably triggering before.
This leads to a very large improvement in some cases, like
DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems
and DivisibleconstU64 on 32-bit systems.
Another way the divisibility optimizations were unreliable before
was that they incorrectly triggered when code computes both x/3 and x%3,
even though the rules are written not to do that. There is a real but small
slowdown in the DivisibleWDivconst benchmarks on Mac because, in the cases
used in the benchmark, it is still faster (on Mac) to do the
divisibility check than to remultiply.
This may be worth further study. Perhaps when there is no rotate
(meaning the divisor is odd), the divisibility optimization
should be enabled always. In any event, this CL makes it possible
to study that.
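For readers unfamiliar with the "rotate" mentioned above, the sketch below shows the textbook divisibility test (Granlund and Montgomery; Hacker's Delight, section 10-17), not the compiler's actual rewrite rules. Writing d = d0 * 2^k with d0 odd, x is divisible by d exactly when rotating x*inverse(d0) right by k yields a value no larger than floor((2^64-1)/d). When d is odd, k is 0 and the rotate disappears, which is the case the paragraph above suggests might always be worth enabling.

```go
package main

import (
	"fmt"
	"math/bits"
)

// inverse returns the multiplicative inverse of an odd d modulo 2^64 using
// Newton's iteration. Starting from x = d (correct to 3 bits, because
// d*d ≡ 1 mod 8 for odd d), each step doubles the number of correct bits.
func inverse(d uint64) uint64 {
	x := d
	for i := 0; i < 5; i++ { // 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits
		x *= 2 - d*x
	}
	return x
}

// divisible reports whether d divides x (d != 0) without a division.
// d = d0 * 2^k with d0 odd: multiply by the inverse of d0, rotate right
// by k, and compare against floor((2^64-1)/d).
func divisible(x, d uint64) bool {
	k := bits.TrailingZeros64(d)
	inv := inverse(d >> k)
	limit := ^uint64(0) / d
	return bits.RotateLeft64(x*inv, -k) <= limit // RotateLeft64 by -k rotates right by k
}

func main() {
	fmt.Println(divisible(21, 7), divisible(22, 7), divisible(48, 12), divisible(50, 12))
	// true false true false
}
```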
Russ Cox [Mon, 27 Oct 2025 02:51:14 +0000 (22:51 -0400)]
test/codegen: simplify asmcheck pattern matching
Separate patterns in asmcheck by spaces instead of commas.
Many patterns end in a comma (like "MOV [$]123,"), so separating
patterns by commas is awkward; they're already quoted, so spaces are fine.
Also replace all tabs in the assembly lines with spaces before matching.
Finally, replace \$ or \\$ with [$] as the matching idiom.
The effect of all these is to make the patterns look like:
// amd64:"BSFQ" "ORQ [$]256"
instead of the old:
// amd64:"BSFQ","ORQ\t\\$256"
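A minimal sketch of the new matching scheme, using hypothetical helpers `parsePatterns` and `matches` rather than the real asmcheck harness (which does more). Patterns are Go-quoted strings separated by spaces, so a trailing comma inside a pattern is harmless, and tabs in the assembly are replaced by spaces before matching. `[$]` also needs no escaping inside a double-quoted Go string, whereas `\$` is not a valid Go escape and had to be written `\\$`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// parsePatterns splits a space-separated list of Go-quoted regexps,
// such as `"BSFQ" "ORQ [$]256"`, into the unquoted patterns.
func parsePatterns(s string) ([]string, error) {
	var pats []string
	for {
		s = strings.TrimLeft(s, " ")
		if s == "" {
			return pats, nil
		}
		q, err := strconv.QuotedPrefix(s) // the leading quoted string
		if err != nil {
			return nil, err
		}
		p, err := strconv.Unquote(q)
		if err != nil {
			return nil, err
		}
		pats = append(pats, p)
		s = s[len(q):]
	}
}

// matches reports whether every pattern matches some assembly line,
// after tabs in the assembly have been replaced by spaces.
func matches(pats []string, asm []string) bool {
	for _, p := range pats {
		re := regexp.MustCompile(p)
		found := false
		for _, line := range asm {
			if re.MatchString(strings.ReplaceAll(line, "\t", " ")) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	pats, err := parsePatterns(`"BSFQ" "ORQ [$]256"`)
	if err != nil {
		panic(err)
	}
	asm := []string{"\tORQ\t$256, AX", "\tBSFQ\tAX, AX"}
	fmt.Println(len(pats), matches(pats, asm)) // 2 true
}
```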
Update all tests as well.
Change-Id: Ia39febe5d7f67ba115846422789e11b185d5c807
Reviewed-on: https://go-review.googlesource.com/c/go/+/716060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Russ Cox [Wed, 29 Oct 2025 02:00:26 +0000 (22:00 -0400)]
runtime: use internal/strconv
The runtime doing its own number formatting dates back to
when the runtime was the bottom-most Go package.
Those days are long gone. Use internal/strconv to avoid
duplicating code and also to get better floating-point
formatting:
% go1.24.6 run x.go
+1.234568e+004
% go run x.go
12345.678
%
With accurate floating-point formatting, it becomes necessary to
introduce separate printers for float32 vs float64 and
for complex64 vs complex128. Otherwise float32(93.7)
prints as 93.69999694824219.
Change-Id: I25ae3f09519342dc3d1dcabf4711651423e00128
Reviewed-on: https://go-review.googlesource.com/c/go/+/716002 Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Michael Pratt [Mon, 27 Oct 2025 19:34:18 +0000 (15:34 -0400)]
internal/runtime/gc/scan: avoid memory destination on VPCOMPRESSQ
On AMD Genoa / Zen 4, VPCOMPRESSQ with a memory destination imposes a
severe performance penalty, roughly an order of magnitude, compared to
a register destination.
We can trivially work around this penalty with a register destination
and an additional move to memory.
Benchmark results from:
$ go test -bench=BenchmarkScanSpanPacked/.*/.*/.*/.*/impl=Platform internal/runtime/gc/scan
I've only included the summarized geomean here because there are ~2500
unique test cases.
AMD Genoa (Zen 4):
cpu: AMD EPYC 9B14 96-Core Processor
        │   mem    │         reg          │
        │  sec/op  │  sec/op    vs base   │
geomean    1.039µ     310.1n    -70.16%
        │   mem    │         reg          │
        │   B/s    │   B/s      vs base   │
geomean   2.906Gi    10.99Gi   +278.27%
As expected, we see a massive performance improvement on Genoa.
AMD Turin (Zen 5):
cpu: AMD EPYC 9B45 128-Core Processor
        │   mem    │         reg          │
        │  sec/op  │  sec/op    vs base   │
geomean    231.9n     237.3n     +2.32%
        │   mem    │         reg          │
        │   B/s    │   B/s      vs base   │
geomean   14.79Gi    14.43Gi     -2.50%
On Turin there is a minor regression. This is primarily due to a fairly
large regression (~15%) in very small microbenchmark cases where the
entire memory fits in L1 cache. This regression disappears as memory
access slows down with larger memories. The latter should be more common
in real workloads.
Intel Sapphire Rapids:
cpu: Intel(R) Xeon(R) Platinum 8481C
        │   mem    │         reg          │
        │  sec/op  │  sec/op    vs base   │
geomean    254.9n     246.8n     -3.18%
        │   mem    │         reg          │
        │   B/s    │   B/s      vs base   │
geomean   13.65Gi    14.15Gi     +3.69%
On Sapphire Rapids there is a minor improvement. Here results are fairly
noisy. Most cases are a wash, but some are arbitrarily 20% slower or 20%
faster for unclear reasons.
For #73581.
Change-Id: I6a6a636cfd294a0dcdc4f34c9ece1bc9a6e5e4c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/715362 Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
Ian Lance Taylor [Mon, 27 Oct 2025 23:51:01 +0000 (16:51 -0700)]
runtime: define PanicBounds in funcdata.h
The comment in funcdata.h says that the constants must agree
with those in internal/abi/symtab.go. Make that so.
Change-Id: Ib64146bfb31fdecfc1cc6ae03ae746a1b4a4d22e
Reviewed-on: https://go-review.googlesource.com/c/go/+/715521
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Mark Freeman [Thu, 23 Oct 2025 20:48:00 +0000 (16:48 -0400)]
go/types, types2: reduce locks held at once in resolveUnderlying
There is no need to hold locks for the entire chain of Named types in
resolveUnderlying. This change moves the locking / unlocking right to
where t.underlying is set.
This change consolidates logic into resolveUnderlying where possible
and makes minor stylistic / documentation adjustments.
Change-Id: Ic5ec5a7e9a0da8bc34954bf456e4e23a28df296d
Reviewed-on: https://go-review.googlesource.com/c/go/+/714403 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The last three will become 1 latency cycle (0.25ns) faster once #76066 is fixed.
The Select being that fast with the old code is really impressive.
I am pretty sure this happens because my CPU has BMI1&2 support and
a fusing unit able to translate non-BMI code into BMI code.
This benchmark doesn't capture the cache gains from the shorter assembly.
It currently compiles as:
v17 = TESTQ <flags> v31 v31 // v != 0
v20 = CMOVQNE <int> v32 v33 v17 (y[int])
It is possible to remove the `TESTQ` by fusing it at compile time with the
compare in a pattern like this:
subtle.ConstantTimeSelect(subtle.ConstantTimeLessOrEq(left, right), right, left)
This would save 2 latency cycles (1 with #76066 fixed).
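For context, the select/compare pattern from this message computes max(left, right) without a data-dependent branch. The sketch below wraps it in a hypothetical `ctMax` helper using the real crypto/subtle functions; the crypto/subtle documentation restricts ConstantTimeLessOrEq to non-negative arguments below 2^31. `maskSelect` is a hypothetical, hand-written mask formulation of a select, shown to illustrate why no branch is needed; it is not presented as the library's source.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// ctMax returns max(left, right) using ConstantTimeLessOrEq to produce
// a 0/1 flag and ConstantTimeSelect to pick a value from it.
// Arguments must be in [0, 2^31-1] per the crypto/subtle documentation.
func ctMax(left, right int) int {
	return subtle.ConstantTimeSelect(subtle.ConstantTimeLessOrEq(left, right), right, left)
}

// maskSelect returns x when v == 1 and y when v == 0, using masks
// instead of a branch.
func maskSelect(v, x, y int) int {
	mask := v - 1 // 0 when v == 1, all ones when v == 0
	return ^mask&x | mask&y
}

func main() {
	fmt.Println(ctMax(3, 7), ctMax(9, 2), maskSelect(1, 5, 6), maskSelect(0, 5, 6)) // 7 9 5 6
}
```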
Updates #76056
Change-Id: I61a1df99e97a1506f75dae13db529f43846d8f1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/715045 Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com>
Michael Pratt [Mon, 27 Oct 2025 20:17:31 +0000 (16:17 -0400)]
internal/runtime/gc/scan: correct size class size check
This check intends to skip size classes that are too big for scanSpan,
but it compares the size class index with a byte size. It must do the
conversion first.
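To illustrate the bug class, here is a sketch with a hypothetical, truncated size-class table and a hypothetical byte limit (not the real scanSpan constants). Comparing the class index directly against a byte size lets oversized classes through; converting the index to its object size first gives the intended check.

```go
package main

import "fmt"

// classToSize maps a size class index to its object size in bytes.
// Hypothetical, truncated table for illustration only.
var classToSize = [...]uintptr{0, 8, 16, 24, 32, 48, 64, 80}

// maxScanBytes is a hypothetical byte limit for the fast scanning path.
const maxScanBytes = 32

// tooBigBuggy compares the class index itself against a byte size.
func tooBigBuggy(sizeclass int) bool { return uintptr(sizeclass) > maxScanBytes }

// tooBig converts the index to a byte size before comparing.
func tooBig(sizeclass int) bool { return classToSize[sizeclass] > maxScanBytes }

func main() {
	for sc := 4; sc <= 6; sc++ {
		fmt.Println(sc, classToSize[sc], tooBigBuggy(sc), tooBig(sc))
	}
}
```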
For #73581.
Change-Id: I6a6a636c8d19fa3bf2a2b609870d67d33f47f66e
Reviewed-on: https://go-review.googlesource.com/c/go/+/715460
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Jorropo [Sat, 25 Oct 2025 16:38:23 +0000 (18:38 +0200)]
cmd/compile: remove 68857 ModU flowLimit workaround in prove
We know this is correct because all the test cases added by CL 605156 still pass.
Partial revert of CL 605156 (everything but the testcases).
Change-Id: I5d8daadb4cb35a9de29daaabc22baee642511fe0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714941 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Jorropo [Sat, 25 Oct 2025 13:34:02 +0000 (15:34 +0200)]
cmd/compile: remove 68857 min & max flowLimit workaround in prove
We know this is correct because all the test cases added by CL 656157 still pass.
Partial revert of CL 656157 (everything but the testcases).
Change-Id: I24931fa1affba7e9e92233b3de74ebade3d48a09
Reviewed-on: https://go-review.googlesource.com/c/go/+/714921
Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Mark Freeman [Thu, 23 Oct 2025 20:25:28 +0000 (16:25 -0400)]
go/types, types2: clarify docs for resolveUnderlying
The resolveUnderlying method only detects cycles among type names, where
no type literal or predeclared type can be found (which would yield an
underlying type).
Change-Id: I203f3856eaf63a8a9d317c22521755390f9c1023
Reviewed-on: https://go-review.googlesource.com/c/go/+/714402 Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Mark Freeman [Wed, 22 Oct 2025 18:04:09 +0000 (14:04 -0400)]
go/types, types2: verify stateMask transitions in debug mode
Recently, we've changed the representation of Named type state from
an integer to a bit mask, which is a bit more complicated. To make
sure we uphold state invariants, we are adding a verification step
on each state transition.
This uncovered a few places where we do not uphold the transition
invariants; those are patched in this CL.
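A minimal sketch of the general technique of verifying bit-mask state transitions in debug mode. The bit names, the prerequisite table, and the `set` method are all hypothetical and do not reflect the actual go/types representation or locking; the idea is only that setting a bit first checks, when debugging is on, that its prerequisite bits are already set.

```go
package main

import "fmt"

// stateMask records which lazily computed parts of a type are ready.
// The bits and their ordering rules are hypothetical.
type stateMask uint8

const (
	stateLoaded stateMask = 1 << iota
	stateUnpacked
	stateUnderlying
)

// requires lists, for each bit, the bits that must already be set.
var requires = map[stateMask]stateMask{
	stateUnpacked:   stateLoaded,
	stateUnderlying: stateLoaded | stateUnpacked,
}

const debug = true

// set adds bit b, verifying in debug mode that the transition is legal.
func (m *stateMask) set(b stateMask) {
	if debug {
		if need := requires[b]; *m&need != need {
			panic(fmt.Sprintf("invalid transition: state %03b, setting %03b", *m, b))
		}
	}
	*m |= b
}

func main() {
	var m stateMask
	m.set(stateLoaded)
	m.set(stateUnpacked)
	m.set(stateUnderlying)
	fmt.Printf("%03b\n", m) // 111
}
```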
Change-Id: I76569e4326b2d362d7a1f078641029ffb3dca531
Reviewed-on: https://go-review.googlesource.com/c/go/+/714241
Auto-Submit: Mark Freeman <markfreeman@google.com> Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Ian Lance Taylor [Fri, 24 Oct 2025 21:57:55 +0000 (14:57 -0700)]
runtime: remove unused cgoCheckUsingType function
The only calls to it were removed in CL 616255.
Change-Id: I6c6b01e2e98d54300b6323fd74ccc45fa1d433dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/714820 Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
baycore [Mon, 27 Oct 2025 04:43:32 +0000 (04:43 +0000)]
time: rewrite IsZero method to use wall and ext fields
Using wall and ext fields will be more efficient.
Fixes #76001
Change-Id: If2b9f597562e0d0d3f8ab300556fa559926480a0
GitHub-Last-Rev: 4a91948413079047cb6c382ed29844f456f3064d
GitHub-Pull-Request: golang/go#76006
Reviewed-on: https://go-review.googlesource.com/c/go/+/713720 Reviewed-by: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>
thepudds [Sat, 25 Oct 2025 04:49:45 +0000 (00:49 -0400)]
runtime: add GOEXPERIMENT=runtimefree
This CL is part of a series of CLs to triangulate between the runtime,
compiler, and standard library to reduce how much work the GC must do.
An overall design document is in CL 700255.
This CL stack implements a runtime.free within the runtime, and
then uses it via automatic calls inserted by the compiler when
the compiler proves it is safe to do so. In the future, we may
also consider a limited set of explicit calls from certain
low-level portions of the standard library.
When called, runtime.free allows immediate reuse of memory
without waiting for a GC cycle. The goals include less overall
CPU usage by the GC, longer times between GC cycles
(with less overall time with the write barrier enabled),
and more cache-friendly allocations for user code.
Here, we just add the GOEXPERIMENT=runtimefree flag. It currently
defaults to on, but can be disabled with GOEXPERIMENT=noruntimefree.
The actual implementation starts in CL 673695.
Updates #74299
Change-Id: I2f1f04dbdca51f4aaa735fd65bb2719c298d922e
Reviewed-on: https://go-review.googlesource.com/c/go/+/700235 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
cmd/compile: use MOV(D|F) with const for Const(64|32)F on riscv64
Const64F previously loaded a float64 constant with AUIPC + LD + FMVDX;
we can use AUIPC + FLD instead. The same applies to Const32F.
Change-Id: I8ca0a0e90d820a26e69b74cd25df3cc662132bf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/703215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>