cmd/compile: CSE loads across disjoint stores
author amusman <alexander.musman@gmail.com>
Fri, 23 Aug 2024 21:25:32 +0000 (00:25 +0300)
committer Keith Randall <khr@golang.org>
Thu, 5 Feb 2026 19:50:32 +0000 (11:50 -0800)
commit c34b99a5e20307a55a047543f6b48d8a28d830b5
tree 770109a6071b6bb884a6fa789418dc222a0efea4
parent 99d7121934a9cfa7963d3a9bfd840779fd2869f6

Allow CSE to place memory-using instructions, such as regular loads,
in the same partition even when they are separated by disjoint
memory-defining instructions (currently only stores). Keep a memory
table that records the appropriate memory definition for each
supported memory-using instruction. This lets CSE match more load
instructions, and it could be improved further by handling additional
cases in the common utility `disjoint`.
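
As an illustration only, the kind of source pattern this targets might
look like the sketch below. The type `point` and the function `update`
are hypothetical names, not taken from this change. The store to p.y
touches a field that does not overlap p.x, so the second load of p.x
is a candidate for being matched with the first. Whether the compiler
actually merges the two loads depends on the rest of the pipeline.

```go
package main

import "fmt"

// point is a hypothetical type with two non-overlapping fields.
type point struct{ x, y int64 }

// update loads p.x, stores to p.y, then loads p.x again.
// The store writes a field disjoint from p.x, so both loads
// observe the same value of p.x.
func update(p *point) int64 {
	a := p.x    // first load of p.x
	p.y = a + 1 // store to p.y, which does not overlap p.x
	b := p.x    // second load of p.x, same value as the first
	return a + b
}

func main() {
	p := &point{x: 3}
	fmt.Println(update(p), p.y)
}
```

If the store went to p.x instead of p.y, the two loads would no longer
be equivalent and must stay separate.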

In general this change reduces code size. For example, here is the
.text size difference on linux_arm64:

Executable            Old .text  New .text     Change
-------------------------------------------------------
asm                     1963124    1961972     -0.06%
cgo                     1734228    1733140     -0.06%
compile                 8948740    8948516     -0.00%
cover                   1864500    1863588     -0.05%
link                    2555700    2552676     -0.12%
preprofile               863636     862980     -0.08%
vet                     2869220    2867556     -0.06%

Some benchmark results from a local run:

shortname: aws_jsonutil
pkg: github.com/aws/aws-sdk-go/private/protocol/json/jsonutil
             │ Orig-rand.stdout │          Cse1-rand.stdout          │
             │      sec/op      │   sec/op     vs base               │
BuildJSON-8         1.511µ ± 0%   1.516µ ± 0%  +0.33% (p=0.003 n=15)
StdlibJSON-8        1.254µ ± 0%   1.227µ ± 0%  -2.15% (p=0.000 n=15)
geomean             1.377µ        1.364µ       -0.92%

shortname: kanzi
toolchain: Cse1-rand
goos: linux
goarch: arm64
pkg: github.com/flanglet/kanzi-go/benchmark
        │ Orig-rand.stdout │          Cse1-rand.stdout          │
        │      sec/op      │   sec/op     vs base               │
FPAQ-4         26.11m ± 0%   25.61m ± 0%  -1.93% (p=0.000 n=10)
LZ-4           1.461m ± 1%   1.445m ± 1%       ~ (p=0.105 n=10)
MTFT-4         1.197m ± 0%   1.201m ± 0%  +0.36% (p=0.000 n=10)
geomean        3.574m        3.543m       -0.88%

This change also tends to increase the number of NilChecks matched,
which led to moving the statement boundary mark from the OpNilCheck
to its user instruction (such as OpOffPtr), where the mark is more
likely to be lost during subsequent optimizations; see #75249 for an
example. Because cse does not remove the nil checks, also update it
to not move statement boundary marks from OpNilCheck; the later
related optimizations can handle that better.

Change-Id: Iddf4aa13d44de78ffecf6ccb4c0fd1d35533e844
Reviewed-on: https://go-review.googlesource.com/c/go/+/608115
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
src/cmd/compile/internal/ssa/cse.go
src/cmd/compile/internal/ssa/rewrite.go
test/prove.go