mohanson [Tue, 4 Nov 2025 14:19:03 +0000 (22:19 +0800)]
cmd/go: remove redundant AVX regex in security flag checks
Remove "-m(no-)?avx[0-9a-z]*" from validCompilerFlags in security.go,
because "-m(no-)?avx[0-9a-z.]*" (see line 108) already covers its
matching range.
Change-Id: Ic86a45eefa7639ed3a8cdee95f08021d9515678e
Reviewed-on: https://go-review.googlesource.com/c/go/+/717740 Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
Large integer constants can take up to 4 instructions to encode.
We can encode some large constants with a single instruction, namely
those which are bit patterns (repetitions of certain runs of 0s and 1s).
Often the constants we want to encode are *close* to those bit patterns,
but don't exactly match. For those, we can use 2 instructions, one to
load the close-by bit pattern and one to fix up any mismatches.
The constants we use to strength reduce divides often fit this pattern.
For unsigned divides by 1 through 15, this CL applies to the constant
for N=3,5,6,10,12,15.
Triggers 17 times in hello world.
Change-Id: I623abf32961fb3e74d0a163f6822f0647cd94499
Reviewed-on: https://go-review.googlesource.com/c/go/+/717900
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
khr@golang.org [Sat, 8 Nov 2025 19:11:10 +0000 (11:11 -0800)]
runtime/msan: use different msan routine for copying
__msan_memmove records the fact that we're copying memory, and
actually does the copy. Use instead __msan_copy_shadow, which
records the fact that we're copying memory, but doesn't actually
do the copy itself.
We're doing the copy ourselves, so we don't need msan to do it also.
More importantly, msan doing the copy clobbers the target before
we issue the write barrier, which causes pointers to get lost.
Fixes #76138
Change-Id: I17aea739f9444de21fac2bbfd81e48534a39481d
Reviewed-on: https://go-review.googlesource.com/c/go/+/719020 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: t hepudds <thepudds1460@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Radu Berinde <radu@cockroachlabs.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Ian Alexander [Fri, 12 Sep 2025 22:00:02 +0000 (18:00 -0400)]
cmd/go: update goSum if necessary
During investigation into #74691, we found a case where the hash was
missing from the goSum global var. This change updates DownloadZip to
update the goSum hash if necessary when both the zipfile and
ziphashfile are already present.
For #74691
Change-Id: I904b7265f667256d0c60b66503a7954b467c6454
Reviewed-on: https://go-review.googlesource.com/c/go/+/705215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Matloob <matloob@golang.org> Reviewed-by: Michael Matloob <matloob@google.com>
Ian Lance Taylor [Thu, 6 Nov 2025 22:03:59 +0000 (14:03 -0800)]
cmd/link: clean up some comments to Go standards
Also drop a couple of unused names.
Change-Id: I0f09775a276c34ece7809cf5d3f7c6b8cd1fa487
Reviewed-on: https://go-review.googlesource.com/c/go/+/718580
Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
runtime: correctly print panics before fatal-ing on defer
Fixes #67792
Change-Id: I93d16580cb31e54cee7c8490212404e4d0dec446
Reviewed-on: https://go-review.googlesource.com/c/go/+/613757 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Ariel Otilibili [Wed, 5 Nov 2025 21:00:52 +0000 (21:00 +0000)]
runtime/cgo: improve error messages after pointer panic
This CL improves the error messages after panics due to the
sharing of an unpinned Go pointer (or a pointer to an unpinned Go
pointer) between Go and C.
This occurs when it is:
1. returned from Go to C (through cgoCheckResult)
2. passed as argument to a C function (through cgoCheckPointer).
An unpinned Go pointer refers to a memory location that might be moved or
freed by the garbage collector.
Therefore:
- change the signature of cgoCheckArg (it does the real work behind
cgoCheckResult and cgoCheckPointer)
- change the signature of cgoCheckUnknownPointer (called by cgoCheckArg
for checking unexpected pointers)
- introduce cgoFormatErr (it is called by cgoCheckArg and
cgoCheckUnknownPointer to format panic error messages)
- update the cgo pointer tests (add new tests, and a field errTextRegexp
to the struct ptrTest)
- remove a loop variable in TestPointerChecks (similar to CL 711640).
1. cgoCheckResult
When an unpinned Go pointer (or a pointer to an unpinned Go pointer) is
returned from Go to C,
Ian Lance Taylor [Thu, 6 Nov 2025 00:13:22 +0000 (16:13 -0800)]
cmd/link: move pclntab out of relro section
The .gopclntab section should not have any relocations.
Move it out of relro to regular rodata.
Note that this is tested by tests like TestNoTextrel in
cmd/cgo/internal/testshared.
The existing test TestMachoSectionsReadOnly looks for sections in a
Mach-O file that are read-only after relocations are applied
(this is marked by a segment with a flags field set to 0x10).
We remove the __gopclntab section, as that section is now read-only
at all times, not only after relocating.
For #76038
Change-Id: I7f837e423bf1e802509277f5dc7fdd1ed0228e32
Reviewed-on: https://go-review.googlesource.com/c/go/+/718065 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Change-Id: I8983467495f587cf46083fd81cb024400c7dc2a7
Reviewed-on: https://go-review.googlesource.com/c/go/+/716804 Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Change-Id: Ibc0bf926befaa2f810cfedd9a40f7ad9a6a9d7fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/716803
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Guoqi Chen [Thu, 30 Oct 2025 11:47:25 +0000 (19:47 +0800)]
internal/chacha8rand: replace VORV with instruction VMOVQ on loong64
Change-Id: Id67623f403abfca54a04fc4c47792cd5b6d5ab73
Reviewed-on: https://go-review.googlesource.com/c/go/+/716802 Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> Reviewed-by: Meidan Li <limeidan@loongson.cn>
Guoqi Chen [Tue, 4 Nov 2025 09:22:24 +0000 (17:22 +0800)]
cmd/compile: fix error message on loong64
Change-Id: I90428330b17ab9f93ae94a77cefc24464e225df5
Reviewed-on: https://go-review.googlesource.com/c/go/+/717700
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Russ Cox [Wed, 5 Nov 2025 19:46:35 +0000 (14:46 -0500)]
cmd/go: fix TestScript/govcs
On my Mac, TestScript/govcs was failing because hg prints a URL
using 1.0.0.127.in-addr.arpa instead of 127.0.0.1, and my Mac
cannot resolve that name back into an IP address.
Russ Cox [Wed, 5 Nov 2025 18:42:44 +0000 (13:42 -0500)]
cmd/go: fix TestCgoPkgConfig on darwin with pkg-config installed
Most darwin systems don't have pkg-config installed and skip this test.
(And it doesn't run in all.bash because it is skipped during -short.)
But for those systems that have pkg-config and run the non-short tests,
it fails because the code was not writing out the bar.pc on darwin and
then expecting pkg-config to be able to tell us about the bar package.
Change the code to write out the bar entry always.
Fixes one failing case in 'go test cmd/go' on my Mac.
Change-Id: Ibc4e9826a652ce2e7c609b905b159ccf2d5a6444
Reviewed-on: https://go-review.googlesource.com/c/go/+/718182 Reviewed-by: Michael Matloob <matloob@golang.org> Reviewed-by: Michael Matloob <matloob@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Russ Cox [Wed, 5 Nov 2025 18:49:48 +0000 (13:49 -0500)]
cmd/go: fix TestScript/vet_flags
Test caching can cause an incorrect test failure when the vet step result
is reused, leading to not printing a vet command line at all.
Avoid that with -a (we are also using -n so no real work is done).
Fixes one failing case in 'go test cmd/go' on my Mac.
Change-Id: I028284436b1ecc77145c886db902845b524f4b97
Reviewed-on: https://go-review.googlesource.com/c/go/+/718181
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Matloob <matloob@golang.org> Reviewed-by: Michael Matloob <matloob@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Russ Cox [Wed, 5 Nov 2025 19:05:45 +0000 (14:05 -0500)]
cmd/go: fix TestScript/tool_build_as_needed
This test assumes that changing GOOS/GOARCH results in an
unrunnable binary, but that's not true if the user has
go_GOOS_GOARCH_exec programs in their path (like I do).
This test was timing out waiting to create a gomote to run
a windows/amd64 binary.
Rewrite the test not to assume that alternate GOOS/GOARCH
binaries are unrunnable.
Fixes one failing case in 'go test cmd/go' on my Mac.
Change-Id: Ib5f721f91e10d285820efb5995a3a9bc29833214
Reviewed-on: https://go-review.googlesource.com/c/go/+/718180
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Matloob <matloob@golang.org> Reviewed-by: Michael Matloob <matloob@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Change-Id: I9aac0b6fb2985f6833976099e7eead1f28971bee
GitHub-Last-Rev: 1aacde448c922903980420cf8a38a4827d76ad28
GitHub-Pull-Request: golang/go#76186
Reviewed-on: https://go-review.googlesource.com/c/go/+/718060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
qmuntal [Thu, 30 Oct 2025 11:36:42 +0000 (12:36 +0100)]
os: ignore O_TRUNC errors on named pipes and terminal devices on Windows
Prior to Go 1.24, os.OpenFile used to support O_TRUNC on named pipes and
terminal devices, even when the truncation was really ignored. This
behavior was consistent with Unix semantics.
CL 618836 changed the implementation of os.OpenFile on Windows and
unintentionally started returning an error when O_TRUNC was used on such
files.
Fixes #76071
Change-Id: Id10d3d8120ae9aa0548ef05423a172ff4e502ff9
Reviewed-on: https://go-review.googlesource.com/c/go/+/716420 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Damien Neil <dneil@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
Ian Lance Taylor [Mon, 3 Nov 2025 04:04:57 +0000 (20:04 -0800)]
cmd/link, runtime: don't store text start in pcHeader
The textStart field requires a relocation, the only relocation in pclntab.
And nothing uses it. So remove it. Replace it with a zero,
which can itself be removed at some point in coordination with Delve.
For #76038
Change-Id: I35675c0868c5d957bb375e40b804c516ae0300ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/717240 Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Ian Lance Taylor [Sun, 2 Nov 2025 21:54:22 +0000 (13:54 -0800)]
cmd/link: don't generate .gosymtab section
Since Go 1.2 the section is always empty.
Also remove the code looking for .gosymtab in cmd/internal/objfile.
For #76038
Change-Id: Icd34c870ed0c6da8001e8d32305f79905ee2b066
Reviewed-on: https://go-review.googlesource.com/c/go/+/717200
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Commit-Queue: Ian Lance Taylor <iant@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
Ian Lance Taylor [Tue, 28 Oct 2025 21:01:48 +0000 (14:01 -0700)]
cmd/link: add and use new SymKind SFirstUnallocated
The linker sources in several places used SXREF to mark the first
SymKind which is not allocated in memory. This is cryptic.
Instead use SFirstUnallocated, following the example of the
existing SFirstWritable.
Change-Id: If326ad63027402699094bcc49ef860db3772f82a
Reviewed-on: https://go-review.googlesource.com/c/go/+/715623 Reviewed-by: Than McIntosh <thanm@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Ian Lance Taylor [Tue, 28 Oct 2025 20:35:53 +0000 (13:35 -0700)]
cmd/link: remove misleading comment
The comment suggests that the text section is briefly writable.
That is not the case. As the earlier part of the comment explains,
part of the text section is mapped twice, once r-x and once rw-.
It is never the case that there is writable executable memory.
Change-Id: I56841e19a8a08f2515f29752536a5c8f180ac8c9
Reviewed-on: https://go-review.googlesource.com/c/go/+/715622
Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Than McIntosh <thanm@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Ian Lance Taylor [Tue, 28 Oct 2025 05:45:45 +0000 (22:45 -0700)]
cmd/link: remove unused SFILEPATH symbol kind
Last reference appears to have been removed in CL 227759.
Change-Id: Ieb9da0a69a8beb96dcb5309ca43cf1df61d39bce
Reviewed-on: https://go-review.googlesource.com/c/go/+/715541
Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
Ian Lance Taylor [Tue, 28 Oct 2025 05:37:13 +0000 (22:37 -0700)]
cmd/link: add comments for SymKind values
Change-Id: Ie297a19a59362e0f32eae20e511e298a0a87ab6b
Reviewed-on: https://go-review.googlesource.com/c/go/+/715540 Reviewed-by: Than McIntosh <thanm@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
Daniel Morsing [Sat, 9 Aug 2025 19:30:30 +0000 (20:30 +0100)]
cmd/compile: faster liveness analysis in regalloc
Instead of iterating until dataflow stabilization, fill in values known to be
live at the beginning of loop headers. This reduces the number of passes
over the CFG from the depth of the loopnest to just 2.
In a test instrumented version of this change, run against
cmd/compile/internal/ssa, it brought the time spent in liveness analysis
down to 150.52ms from 225.49ms on my machine.
Change-Id: Ic72762eedfd1f10b1ba74c430ed62ab4ebd3ec5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/695255 Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
We don't need to check that the bit patterns of the constants
match, it is sufficient to just check the constant is equal to the
given value.
While we're here also change the FCLASSD rules to use a bit pattern
for the mask. I think this improves readability, particularly as
more uses of FCLASSD get added (e.g. CL 717560).
These changes should not affect codegen.
Change-Id: I92a6338dc71e6a71e04306f67d7d86016c6e9c47
Reviewed-on: https://go-review.googlesource.com/c/go/+/717580 Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
cmd/compile/internal/ssa: more aggressive on dead auto elim
Propagate "unread" across OpMoves. If the addr of this auto is only used
by an OpMove as its source arg, and the OpMove's target arg is the addr
of another auto. If the 2nd auto can be eliminated, this one can also be
eliminated.
This CL eliminates unnecessary memory copies and makes the frame smaller
in the following code snippet:
func contains(m map[string][16]int, k string) bool {
_, ok := m[k]
return ok
}
These are the benchmark results followed by the benchmark code:
func init() {
for i := range 1000 {
m1[i] = i
m2[i] = [16]int{15:i}
m3[i] = [256]int{255:i}
}
}
func BenchmarkMap1Access2Ok(b *testing.B) {
for i := range b.N {
_, ok := m1[i%1000]
if !ok {
b.Errorf("%d not found", i)
}
}
}
func BenchmarkMap2Access2Ok(b *testing.B) {
for i := range b.N {
_, ok := m2[i%1000]
if !ok {
b.Errorf("%d not found", i)
}
}
}
func BenchmarkMap3Access2Ok(b *testing.B) {
for i := range b.N {
_, ok := m3[i%1000]
if !ok {
b.Errorf("%d not found", i)
}
}
}
Fixes #75398
Change-Id: If75e9caaa50d460efc31a94565b9ba28c8158771
Reviewed-on: https://go-review.googlesource.com/c/go/+/702875 Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
Ian Lance Taylor [Sun, 2 Nov 2025 04:34:53 +0000 (21:34 -0700)]
cmd/cgo: drop pre-1.18 support
Now that the bootstrap compiler is 1.24, it's no longer needed.
Change-Id: I9b3d6b7176af10fbc580173d50130120b542e7f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/717060 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Russ Cox [Sun, 2 Nov 2025 17:32:01 +0000 (12:32 -0500)]
internal/strconv: handle %f with fixedFtoa when possible
Everyone writes papers about fast shortest-output formatting.
Eventually we also sped up fixed-length formatting %e and %g.
But we've neglected %f, which falls back to the slow general code
even for relatively trivial things like %.2f on 1.23.
This CL uses the fast path fixedFtoa for %f when possible by
estimating the number of digits needed.
Roland Shoemaker [Mon, 27 Oct 2025 15:15:48 +0000 (08:15 -0700)]
encoding/pem: don't reslice in failure modes
We re-slice the data being processed at the stat of each loop. If the
var that we use to calculate where to re-slice is < 0 or > the length
of the remaining data, return instead of attempting to re-slice.
Change-Id: I1d6c2b6c596feedeea8feeaace370ea73ba02c4c
Reviewed-on: https://go-review.googlesource.com/c/go/+/715260
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Roland Shoemaker <roland@golang.org> Reviewed-by: Damien Neil <dneil@google.com>
Russ Cox [Sat, 1 Nov 2025 13:41:40 +0000 (09:41 -0400)]
internal/strconv: extract fixed-precision ftoa from ftoaryu.go
The fixed-precision ftoa algorithm is not actually
documented in the Ryū paper, and it is fairly
straightforward: multiply by a power of 10 to get
an integer that contains the digits we need.
There is also no need for separate float32 and float64
implementations.
This CL implements a new fixedFtoa, separate from Ryū.
The overall algorithm is the same, but the new code
is simpler, faster, and better documented.
Now ftoaryu.go is only about shortest-output formatting,
so if and when yet another algorithm comes along, it will
be clearer what should be replaced (all of ftoaryu.go)
and what should not (all of ftoafixed.go).
Russ Cox [Sun, 2 Nov 2025 14:59:59 +0000 (09:59 -0500)]
internal/strconv: add tests and benchmarks for ftoaFixed
ftoaFixed is in the next CL; this proves the tests are correct
against the current implementation, and it adds a benchmark
for comparison with the new implementation.
Change-Id: I7ac8a1f699b693ea6d11a7122b22fc70cc135af6
Reviewed-on: https://go-review.googlesource.com/c/go/+/717181
Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Russ Cox [Sun, 2 Nov 2025 03:26:17 +0000 (23:26 -0400)]
internal/strconv: fix pow10 off-by-one in exponent result
The exact meaning of pow10 was not defined nor tested directly.
Define it as pow10(e) returns mant, exp where mant/2^128 * 2**exp = 10^e.
This is the most natural definition but is off-by-one from what
it had been returning. Fix the off-by-one and then adjust the
call sites to stop compensating for it.
Change-Id: I9ee475854f30be4bd0d4f4d770a6b12ec68281fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/717180
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Change-Id: Ifddc3d4d3fbaa6fee2e079bf2ebfe96a2febaa1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/716801 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change-Id: Ie23b2fdd09b4c93801dc804913206f1c5a496268
Reviewed-on: https://go-review.googlesource.com/c/go/+/716800 Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Michael Anthony Knyszek [Thu, 30 Oct 2025 20:26:56 +0000 (20:26 +0000)]
runtime: allow Stack to traceback goroutines in syscall _Grunning window
net/http/cgi.TestCopyError calls runtime.Stack to take a stack trace of
all goroutines, and searches for a specific line in that stack trace.
It currently sometimes fails because it encounters the goroutine its
looking for in the small window where a goroutine might be in _Grunning
while in a syscall, introduced in CL 646198. In that case, the traceback
will give up, failing to print the stack TestCopyError is expecting.
This represents a general regression, since previously runtime.Stack
could never fail to take a goroutine's stack; giving up was only
possible in fatal panic cases.
Fix this the same way we fixed goroutine profiles: allow the stack trace
to proceed if the g's syscallsp != 0. This is safe in any
stop-the-world-related context, because syscallsp won't be mutated while
the goroutine fails to acquire a P, and thus fails to fully exit the
syscall context. This also means the stack below syscallsp won't be
mutated, and thus taking a traceback is also safe.
Fixes #66639.
Change-Id: Ie6f4b0661d9f8df02c9b8434e99bc95f26fe5f0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/716680 Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
cmd/go: link to go.dev/doc/godebug for removed GODEBUG settings
This makes the user experience better, before users would receive
an unknown godebug error message, now we explicitly mention that
it was removed and link to go.dev/doc/godebug where users can find
more information about the removal.
Additionally we keep all the removed GODEBUGs in the source, making
sure we do not reuse such GODEBUG after it is removed.
Updates #72111
Updates #75316
Change-Id: I6a6a6964cce1c100108fdba4bfba7d13cd9a893a
Reviewed-on: https://go-review.googlesource.com/c/go/+/701875 Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Mateusz Poliwczak <mpoliwczak34@gmail.com> Reviewed-by: Michael Matloob <matloob@golang.org> Reviewed-by: Michael Matloob <matloob@google.com>
Daniel McCarney [Mon, 3 Nov 2025 18:00:37 +0000 (13:00 -0500)]
crypto/tls: add BetterTLS test coverage
This commit adds test coverage of path building and name constraint
verification using the suite of test data provided by Netflix's
BetterTLS project.
Since the uncompressed raw JSON test data exported by BetterTLS for
external test integrations is ~31MB we use a similar approach to the
BoGo and ACVP test integrations and fetch the BetterTLS Go module, and
run its export tool on-the-fly to generate the test data in a tempdir.
As expected, all tests pass currently and this coverage is mainly
helpful in catching regressions, especially with tricky/cursed name
constraints.
Change-Id: I23d7c24232e314aece86bcbfd133b7f02c9e71b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/717420
TryBot-Bypass: Daniel McCarney <daniel@binaryparadox.net> Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Daniel McCarney <daniel@binaryparadox.net> Reviewed-by: Michael Pratt <mpratt@google.com>
Alexander Musman [Sat, 1 Nov 2025 11:44:39 +0000 (14:44 +0300)]
cmd/internal/obj: support arm64 FMOVQ large offset encoding
Support arm64 FMOVQ with large offset in immediate which is encoded
using register offset instruction in opldrr or opstrr. This will help
allowing folding immediate into new ssa ops FMOVQload and FMOVQstore.
For example: FMOVQ F0, -20000(R0) is encoded as following:
MOVD 3(PC), R27
FMOVQ F0, (R0)(R27)
RET
ffff b1e0 # constant value
Change-Id: Ib71f92f6ff4b310bda004a440b1df41ffe164523
Reviewed-on: https://go-review.googlesource.com/c/go/+/716960 Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
matloob@golang.org [Tue, 28 Oct 2025 15:18:02 +0000 (11:18 -0400)]
cmd/go/testdata/script: loosen list_empty_importpath for freebsd
We've been seeing the flakes where we get a 'no errors' output on
freebsd in addition to windows and solaris. Also allow that case to
avoid flakes.
For #73976
Change-Id: I6a6a696445ec908b55520d8d75e7c1f867b9c092
Reviewed-on: https://go-review.googlesource.com/c/go/+/715640 Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Matloob <matloob@google.com> Reviewed-by: Ian Alexander <jitsu@google.com>
Youlin Feng [Sat, 25 Oct 2025 03:49:30 +0000 (11:49 +0800)]
runtime: update outdated comments for deferprocStack
Change-Id: I0ea4d15da163cec6fe2a703376ce5a6032e15484
Reviewed-on: https://go-review.googlesource.com/c/go/+/714861 Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Maxim Merzhanov [Sun, 2 Nov 2025 11:28:31 +0000 (11:28 +0000)]
internal/profile: optimize Parse allocs
In our case, it greatly improves the performance of continuously collecting diff profiles from the net/http/pprof endpoint, such as /debug/pprof/allocs?seconds=30.
This CL is a cherry-pick of my PR upstream: https://github.com/google/pprof/pull/951
Benchmark of profile Parse func:
goos: linux
goarch: amd64
pkg: github.com/google/pprof/profile
cpu: 13th Gen Intel(R) Core(TM) i7-1360P
│ old-parse.txt │ new-parse.txt │
│ sec/op │ sec/op vs base │
Parse-16 62.07m ± 13% 55.54m ± 13% -10.52% (p=0.035 n=10)
Youlin Feng [Fri, 31 Oct 2025 02:45:26 +0000 (10:45 +0800)]
runtime: remove the pc field of _defer struct
Since we always can get the address of `CALL runtime.deferreturn(SB)`
from the unwinder, so it is not necessary to record the caller's pc
in the _defer struct. For the stack allocated _defer, this CL makes
the frame smaller.
Change-Id: I0fd347e4bc07cf8a9b954816323df30fc52552b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/716720 Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Filippo Valsorda [Wed, 29 Oct 2025 12:05:19 +0000 (13:05 +0100)]
crypto/internal/constanttime: expose intrinsics to the FIPS 140-3 packages
Intrinsifying things inside the module (crypto/internal/fips140/subtle)
is asking for trouble, as the import paths are rewritten by the
GOFIPS140 mechanism, and we might have to support multiple modules
in the future.
Importing crypto/subtle from inside a FIPS 140-3 module is not allowed,
and is basically asking for circular dependencies.
Instead, break off the intrinsics into their own package
(crypto/internal/constanttime), and keep the byte slice operations
in crypto/internal/fips140/subtle. crypto/subtle then becomes a thin
dispatch layer.
Change-Id: I6a6a6964cd5cb5ad06e9d1679201447f5a811da4
Reviewed-on: https://go-review.googlesource.com/c/go/+/716120 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Filippo Valsorda <filippo@golang.org> Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
David Finkel [Sun, 24 Aug 2025 19:15:06 +0000 (15:15 -0400)]
cmd/go: skip git sha256 tests if git < 2.29
Fix test building on older Ubuntu LTS releases (that are still
supported). Git SHA256 support was only included in 2.29, which came out
in 2021. Check the output of `git version` and skip these tests if the
version is older than that introduction.
Thanks to @ianlancetaylor for flagging this.
Updates: #73704
Change-Id: I9d413a63fa43f34f94c274bba7f7b883c80433b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/698835
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Matloob <matloob@golang.org> Reviewed-by: Michael Matloob <matloob@google.com>
Auto-Submit: Michael Matloob <matloob@google.com> Reviewed-by: Ian Alexander <jitsu@google.com>
Nicholas S. Husin [Sat, 1 Nov 2025 15:15:58 +0000 (11:15 -0400)]
runtime: prevent time.Timer.Reset(0) from deadlocking testing/synctest tests
In Go 1.23+, timer channels behave synchronously. When we have a timer
channel (i.e. !async && t.isChan) we would lock the
runtime.timer.sendLock mutex at the beginning of
runtime.timer.modify()'s execution.
Calling time.Timer.Reset(0) within a testing/synctest test,
unfortunately, causes it to hang indefinitely. This is because the
runtime.timer.sendLock mutex ends up being locked twice before it could
be unlocked:
- When calling time.Timer.Reset(), runtime.timer.modify() would lock the
mutex per usual.
- Due to the 0 argument, runtime.timer.modify() would also try to
execute the bubbled timer immediately rather than adding them to a
heap. However, in doing so, it uses runtime.timer.unlockAndRun(),
which also locks the same mutex.
This CL solves this issue by making sure that a locked
runtime.timer.sendLock mutex is unlocked first, whenever we try to
execute bubbled timer immediately in the stack.
Fixes #76052
Change-Id: I66429b9bf6971400de95dcf2d5dc9670c3135492
Reviewed-on: https://go-review.googlesource.com/c/go/+/716883 Reviewed-by: Damien Neil <dneil@google.com>
Auto-Submit: Nicholas Husin <nsh@golang.org> Reviewed-by: Nicholas Husin <husin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Michael Anthony Knyszek [Wed, 30 Jul 2025 00:36:40 +0000 (00:36 +0000)]
runtime: prioritize panic output over racefini
For some reason CL 646198 uncovered #3934 and #20018 again, but only in
race mode. It turns out that because racefini does not return, and
racefini is called early after main returns, we would not properly wait
for a concurrent panic to complete. This would result in fairly
consistent failures of TestPanicRace, which specifically looks for the
panic output to appear if main concurrently exits.
The important part of this change is that race mode will no longer have
the bug described in #3934 and #20018. A byproduct, however, is that
racefini is that we're essentially prioritizing the panic output over
racefini in this scenario. If racefini were to reveal a latent race
condition and fail, we'll prefer to surface the panic. Such a case is
probably fine, because the panic is always an crashing, unrecoverable
panic.
For #3934.
For #20018.
Change-Id: I0674a75c918563c5ec4ee1eec057dfd096fcfbc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/691795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
Michael Anthony Knyszek [Thu, 2 Oct 2025 17:16:49 +0000 (17:16 +0000)]
runtime: optimistically CAS atomicstatus directly in enter/exitsyscall
This change steals the performance trick from the coro implementation to
try to do the CAS directly first before calling into casgstatus, a much
more heavyweight function. We have to be careful about synctest
bubbling, but overall it's a good bit faster, and easy low-hanging
fruit.
Michael Anthony Knyszek [Mon, 3 Feb 2025 16:53:47 +0000 (16:53 +0000)]
runtime: don't track scheduling latency for _Grunning <-> _Gsyscall
The current logic causes much more tracking than necessary, when really
_Grunning and _Gsyscall are both sort of "running" from the perspective
of tracking scheduling latency.
This makes cgo calls and syscalls a little faster in the single-threaded
case, and shows much larger improvement in the multi-threaded case
by removing updates of shared variables (though this parallel
microbenchmark is a little unrealistic, so don't ascribe too much weight
to it).
Michael Anthony Knyszek [Wed, 1 Oct 2025 20:50:57 +0000 (20:50 +0000)]
runtime: document tracer invariants explicitly
This change is a documentation update for the execution tracer. Far too
much is left to small comments scattered around places. This change
accumulates the big important trace invariants, with rationale, into one
file: trace.go.
Change-Id: I5fd1402a3d8fdf14a0051e305b3a8fb5dfeafcb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/708398
Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Michael Anthony Knyszek [Sun, 2 Feb 2025 19:50:39 +0000 (19:50 +0000)]
runtime: eliminate _Psyscall
This change eliminates the _Psyscall state by using synchronization on
the G status _Gsyscall to make syscalls work instead. This removes an
atomic Store and an atomic CAS on the syscall path, which reduces
syscall and cgo overheads. It also simplifies the syscall paths quite a
bit.
The one danger with this change is that we have a new combination of
states that was previously impossible. There are brief windows where
it's possible to observe a goroutine in _Grunning but without a P. This
change is careful to hide this detail from the execution tracer, but it
may have unexpected effects in the rest of the runtime, making this
change somewhat risky.
Russ Cox [Wed, 29 Oct 2025 17:37:52 +0000 (13:37 -0400)]
runtime: delete timediv
Now that the compiler handles constant 64-bit divisions
without function calls on 32-bit systems, we no longer need
to maintain and test a bad custom implementation of 64-bit division.
Change-Id: If28807ad4f86507267ae69bc8f0b09ec18e98b66
Reviewed-on: https://go-review.googlesource.com/c/go/+/716463
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>
Russ Cox [Wed, 29 Oct 2025 16:09:18 +0000 (12:09 -0400)]
strconv: remove arch-specific decision in formatBase10
There is only one architecture-specific code segment left in formatBase10.
Remove it for simplicity.
The only affected system is ppc64le, which does add 10-20% to the
runtime, but that's a ppc64le problem, not a strconv problem.
Changing the "uint32" to "uint" makes ppc64le not slower anymore,
meaning that somehow uint32 divide-by-constant is slower than
uint divide-by-constant on ppc64le. If this minor slowdown matters,
it should be addressed by improving the generated code for
ppc64le division, not by complicating strconv.
Even though some percentages look big, the geomean is +6% and
the worst case slowdown is only about 6ns/call.
Ian Lance Taylor [Sat, 25 Oct 2025 04:41:52 +0000 (21:41 -0700)]
reflect: correct internal docs for uncommonType
This updates the doc to reflect the change in CL 19790 from 2016.
Change-Id: I1017babf6660aa3b4929755e2eccbe3168b7860c
Reviewed-on: https://go-review.googlesource.com/c/go/+/714880 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Russ Cox [Wed, 29 Oct 2025 11:27:38 +0000 (07:27 -0400)]
cmd/compile/internal/ssa: model right shift more precisely
Prove currently checks for 0 sign bit extraction (x>>63) at the
end of the pass, but it is more general and more useful
(and not really more work) to model right shift during
value range tracking. This handles sign bit extraction (both 0 and -1)
but also makes the value ranges available for proving bounds checks.
'go build -a -gcflags=-d=ssa/prove/debug=1 std'
finds 105 new things to prove.
https://gist.github.com/rsc/8ac41176e53ed9c2f1a664fc668e8336
For example, the compiler now recognizes that this code in
strconv does not need to check the second shift for being ≥ 64.
msb := xHi >> 63
retMantissa := xHi >> (msb + 38)
nor does this code in regexp:
return b < utf8.RuneSelf && specialBytes[b%16]&(1<<(b/16)) != 0
This code in math no longer has a bounds check on the first index:
if 0 <= n && n <= 308 {
return pow10postab32[uint(n)/32] * pow10tab[uint(n)%32]
}
The diff shows one "lost" proof in ycbcr.go but it's not really lost:
the expression was folded to a constant instead, and that only shows
up with debug=2. A diff of that output is at
https://gist.github.com/rsc/9139ed46c6019ae007f5a1ba4bb3250f
Change-Id: I84087311e0a303f00e2820d957a6f8b29ee22519
Reviewed-on: https://go-review.googlesource.com/c/go/+/716140
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: David Chase <drchase@google.com>
Russ Cox [Mon, 27 Oct 2025 23:41:39 +0000 (19:41 -0400)]
cmd/compile: implement bits.Mul64 on 32-bit systems
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.
Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF were
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:
For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:
The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.
The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.
This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.
pkg: cmd/compile/internal/test
benchmark \ host local linux-amd64 s7 linux-386 s7:GOARCH=386
vs base vs base vs base vs base vs base
DivconstI64 ~ ~ ~ -49.66% -21.02%
ModconstI64 ~ ~ ~ -13.45% +14.52%
DivisiblePow2constI64 ~ ~ ~ +0.97% -1.32%
DivisibleconstI64 ~ ~ ~ -20.01% -48.28%
DivisibleWDivconstI64 ~ ~ -1.76% -38.59% -42.74%
DivconstU64/3 ~ ~ ~ -13.82% -4.09%
DivconstU64/5 ~ ~ ~ -14.10% -3.54%
DivconstU64/37 -2.07% -4.45% ~ -19.60% -9.55%
DivconstU64/1234567 ~ ~ ~ -61.55% -56.93%
ModconstU64 ~ ~ ~ -6.25% ~
DivisibleconstU64 ~ ~ ~ -2.78% -7.82%
DivisibleWDivconstU64 ~ ~ ~ +4.23% +2.56%
pkg: math/bits
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
Add ~ ~ ~ ~
Add32 +1.59% ~ ~ ~
Add64 ~ ~ ~ ~
Add64multiple ~ ~ ~ ~
Sub ~ ~ ~ ~
Sub32 ~ ~ ~ ~
Sub64 ~ ~ -9.20% ~
Sub64multiple ~ ~ ~ ~
Mul ~ ~ ~ ~
Mul32 ~ ~ ~ ~
Mul64 ~ ~ -41.58% -53.21%
Div ~ ~ ~ ~
Div32 ~ ~ ~ ~
Div64 ~ ~ ~ ~
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
Mul/P224 ~ ~ -29.95% -39.60%
Mul/P384 ~ ~ -37.11% -63.33%
Mul/P521 ~ ~ -26.62% -12.42%
Square/P224 +1.46% ~ -40.62% -49.18%
Square/P384 ~ ~ -45.51% -69.68%
Square/P521 +90.37% ~ -25.26% -11.23%
(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)
pkg: crypto/internal/fips140/edwards25519
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
EncodingDecoding ~ ~ -34.67% -35.75%
ScalarBaseMult ~ ~ -31.25% -30.29%
ScalarMult ~ ~ -33.45% -32.54%
VarTimeDoubleScalarBaseMult ~ ~ -33.78% -33.68%
Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
Boris Nagaev [Wed, 22 Oct 2025 14:26:30 +0000 (14:26 +0000)]
crypto/internal/fips140/aes: fix CTR generator
Fixed two issues in AVO based generator of amd64 asm code.
1. Updated golang.org/x/tools dependency to prevent build issue in Go 1.25.
> golang.org/x/tools@v0.24.0/internal/tokeninternal/tokeninternal.go:64:9:
> invalid array length -delta * delta (constant -256 of type int64)
This error was caused by changes in layout of data structures in Go. Package
golang.org/x/tools has a mirror of that struct and a static assert that it
matches the Go's struct.
2. Changed the package name from crypto/aes to crypto/internal/fips140/aes.
This fixed run time error:
> ctr_amd64_asm.go:31: could not find function "ctrBlocks1Asm"
and other errors
Now the following works as expected:
$ cd src/crypto/internal/fips140/aes/_asm/ctr/
$ go generate
The command re-generates file "src/crypto/internal/fips140/aes/ctr_amd64.s".
Fixes #75972
Change-Id: I28e4c9ebb5bf72506a524e36a0c81a1b50367a84
GitHub-Last-Rev: afc9f506e50df6dc25fd285d5a597b0e5c93b5d9
GitHub-Pull-Request: golang/go#75973
Reviewed-on: https://go-review.googlesource.com/c/go/+/712920 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Roland Shoemaker <roland@golang.org> Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Robert Griesemer [Wed, 29 Oct 2025 22:22:14 +0000 (15:22 -0700)]
go/types, types: proceed with correct (invalid) type in case of a selector error
Fixes #76103.
Change-Id: Idc2f5d1d7aeb4a9b468e7c268e3bf5b85d1c3777
Reviewed-on: https://go-review.googlesource.com/c/go/+/716300 Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Russ Cox [Thu, 23 Oct 2025 02:22:51 +0000 (22:22 -0400)]
cmd/compile: make prove understand div, mod better
This CL introduces new divisible and divmod passes that rewrite
divisibility checks and div, mod, and mul. These happen after
prove, so that prove can make better sense of the code for
deriving bounds, and they must run before decompose, so that
64-bit ops can be lowered to 32-bit ops on 32-bit systems.
And then they need another generic pass as well, to optimize
the generated code before decomposing.
The three opt passes are "opt", "middle opt", and "late opt".
(Perhaps instead they should be "generic", "opt", and "late opt"?)
The "late opt" pass repeats the "middle opt" work on any new code
that has been generated in the interim.
There will not be new divs or mods, but there may be new muls.
The x%c==0 rewrite rules are much simpler now, since they can
match before divs have been rewritten. This has the effect of
applying them more consistently and making the rewrite rules
independent of the exact div rewrites.
Prove is also now charged with marking signed div/mod as
unsigned when the arguments call for it, allowing simpler
code to be emitted in various cases. For example,
t.Seconds()/2 and len(x)/2 are now recognized as unsigned,
meaning they compile to a simple shift (unsigned division),
avoiding the more complex fixup we need for signed values.
https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32
shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std'
output before and after. "Proved Rsh64x64 shifts to zero" is replaced
by the higher-level "Proved Div64 is unsigned" (the shift was in the
signed expansion of div by constant), but otherwise prove is only
finding more things to prove.
One short example, in code that does x[i%len(x)]:
< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds
A longer example:
< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds
These diffs are due to the smaller opt being better
and taking work away from prove:
In the old opt, Mul by 8 was rewritten to Lsh by 3 early.
This CL delays that rule to help prove recognize mods,
but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, computing the length, opt can now do:
The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)),
Leaving the multiply as Mul enables using that step; the old
rewrite to Lsh blocked it, leaving prove to figure out the length
and then remove the bounds checks. But now opt can evaluate
the length down to a constant 8 and then constant-fold away
the bounds checks 0 < 8, 1 < 8, and so on. After that,
the compiler has nothing left to prove.
Benchmarks are noisy in general; I checked the assembly for the many
large increases below, and the vast majority are unchanged and
presumably hitting the caches differently in some way.
The divisibility optimizations were not reliably triggering before.
This leads to a very large improvement in some cases, like
DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems
and DivisbleconstU64 on 32-bit systems.
Another way the divisibility optimizations were unreliable before
was incorrectly triggering for x/3, x%3 even though they are
written not to do that. There is a real but small slowdown
in the DivisibleWDivconst benchmarks on Mac because in the cases
used in the benchmark, it is still faster (on Mac) to do the
divisibility check than to remultiply.
This may be worth further study. Perhaps when there is no rotate
(meaning the divisor is odd), the divisibility optimization
should be enabled always. In any event, this CL makes it possible
to study that.
Russ Cox [Mon, 27 Oct 2025 02:51:14 +0000 (22:51 -0400)]
test/codegen: simplify asmcheck pattern matching
Separate patterns in asmcheck by spaces instead of commas.
Many patterns end in comma (like "MOV [$]123,") so separating
patterns by comma is not great; they're already quoted, so spaces are fine.
Also replace all tabs in the assembly lines with spaces before matching.
Finally, replace \$ or \\$ with [$] as the matching idiom.
The effect of all these is to make the patterns look like:
// amd64:"BSFQ" "ORQ [$]256"
instead of the old:
// amd64:"BSFQ","ORQ\t\\$256"
Update all tests as well.
Change-Id: Ia39febe5d7f67ba115846422789e11b185d5c807
Reviewed-on: https://go-review.googlesource.com/c/go/+/716060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Russ Cox [Wed, 29 Oct 2025 02:00:26 +0000 (22:00 -0400)]
runtime: use internal/strconv
Runtime doing its own number formatting dates back to
when runtime was the bottom-most Go package.
Those days are long gone. Use internal/strconv to avoid
duplicating code and also to get better floating-point
formatting:
% go1.24.6 run x.go
+1.234568e+004
% go run x.go
12345.678
%
With accurate floating point it becomes necessary to
introduce separate printers for float32 vs float64 and
for complex64 vs complex128. Otherwise float32(93.7)
prints as 93.69999694824219.
Change-Id: I25ae3f09519342dc3d1dcabf4711651423e00128
Reviewed-on: https://go-review.googlesource.com/c/go/+/716002 Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Michael Pratt [Mon, 27 Oct 2025 19:34:18 +0000 (15:34 -0400)]
internal/runtime/gc/scan: avoid memory destination on VPCOMPRESSQ
On AMD Genoa / Zen 4, VPCOMPRESSQ with a memory destination imposes a
severe performance penalty of another an order of magnitude compared to
a register destination.
We can trivially work around this penalty with a register destination
and an additional move to memory.
Benchmark results from:
$ go test -bench=BenchmarkScanSpanPacked/.*/.*/.*/.*/impl=Platform internal/runtime/gc/scan
I've only included the summarized geomean here because there are ~2500
unique test cases.
AMD Genoa (Zen 4):
cpu: AMD EPYC 9B14 96-Core Processor
│ mem │ reg │
│ sec/op │ sec/op vs base │
geomean 1.039µ 310.1n -70.16%
│ mem │ reg │
│ B/s │ B/s vs base │
geomean 2.906Gi 10.99Gi +278.27%
As expected, we see a massive performance improvement on Genoa.
AMD Turin (Zen 5):
cpu: AMD EPYC 9B45 128-Core Processor
│ mem │ reg │
│ sec/op │ sec/op vs base │
geomean 231.9n 237.3n +2.32%
│ mem │ reg │
│ B/s │ B/s vs base │
geomean 14.79Gi 14.43Gi -2.50%
On Turin there is a minor regression. This is primarily due to a fairly
large regression (~15%) in very small microbenchmark cases where the
entire memory fits in L1 cache. This regression disappears as memory
access slows down with larger memories. The latter should be more common
in real workloads.
Intel Sapphire Rapids:
cpu: Intel(R) Xeon(R) Platinum 8481C
│ mem │ reg │
│ sec/op │ sec/op vs base │
geomean 254.9n 246.8n -3.18%
│ mem │ reg │
│ B/s │ B/s vs base │
geomean 13.65Gi 14.15Gi +3.69%
On Sapphire Rapids there is a minor improvement. Here results are fairly
noisy. Most cases are a wash, but some are arbitrary 20% slower or 20%
faster for unclear reasons.
For #73581.
Change-Id: I6a6a636cfd294a0dcdc4f34c9ece1bc9a6e5e4c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/715362 Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>