From: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Date: Tue, 28 Nov 2023 18:42:43 +0000 (+0000)
Subject: runtime: short path for equal pointers in arm64 memequal
X-Git-Tag: go1.23rc1~1397
X-Git-Url: http://www.git.cypherpunks.su/?a=commitdiff_plain;h=66b776b0250dd980d8f6aac264b5e3443ec465dc;p=gostls13.git

runtime: short path for equal pointers in arm64 memequal

If memequal is invoked with the same pointers as arguments it ends up
comparing the whole memory contents, instead of just comparing the pointers.

This effectively makes an operation that could be O(1) into O(n). All the
other architectures already have this optimization in place. For
instance, arm64 also have it, in memequal_varlen.

Such optimization is very specific, one case that it will probably benefit is
programs that rely heavily on interning of strings.

goos: darwin
goarch: arm64
pkg: bytes
                 │      old.txt       │               new.txt                │
                 │       sec/op       │    sec/op     vs base                │
Equal/same/1-8           2.678n ± ∞ ¹   2.400n ± ∞ ¹   -10.38% (p=0.008 n=5)
Equal/same/6-8           3.267n ± ∞ ¹   2.431n ± ∞ ¹   -25.59% (p=0.008 n=5)
Equal/same/9-8           2.981n ± ∞ ¹   2.385n ± ∞ ¹   -19.99% (p=0.008 n=5)
Equal/same/15-8          2.974n ± ∞ ¹   2.390n ± ∞ ¹   -19.64% (p=0.008 n=5)
Equal/same/16-8          2.983n ± ∞ ¹   2.380n ± ∞ ¹   -20.21% (p=0.008 n=5)
Equal/same/20-8          3.567n ± ∞ ¹   2.384n ± ∞ ¹   -33.17% (p=0.008 n=5)
Equal/same/32-8          3.568n ± ∞ ¹   2.385n ± ∞ ¹   -33.16% (p=0.008 n=5)
Equal/same/4K-8         78.040n ± ∞ ¹   2.378n ± ∞ ¹   -96.95% (p=0.008 n=5)
Equal/same/4M-8      78713.000n ± ∞ ¹   2.385n ± ∞ ¹  -100.00% (p=0.008 n=5)
Equal/same/64M-8   1348095.000n ± ∞ ¹   2.381n ± ∞ ¹  -100.00% (p=0.008 n=5)
geomean                  43.52n         2.390n         -94.51%
¹ need >= 6 samples for confidence interval at level 0.95

                 │    old.txt    │                     new.txt                      │
                 │      B/s      │         B/s          vs base                     │
Equal/same/1-8     356.1Mi ± ∞ ¹         397.3Mi ± ∞ ¹        +11.57% (p=0.008 n=5)
Equal/same/6-8     1.711Gi ± ∞ ¹         2.298Gi ± ∞ ¹        +34.35% (p=0.008 n=5)
Equal/same/9-8     2.812Gi ± ∞ ¹         3.515Gi ± ∞ ¹        +24.99% (p=0.008 n=5)
Equal/same/15-8    4.698Gi ± ∞ ¹         5.844Gi ± ∞ ¹        +24.41% (p=0.008 n=5)
Equal/same/16-8    4.995Gi ± ∞ ¹         6.260Gi ± ∞ ¹        +25.34% (p=0.008 n=5)
Equal/same/20-8    5.222Gi ± ∞ ¹         7.814Gi ± ∞ ¹        +49.63% (p=0.008 n=5)
Equal/same/32-8    8.353Gi ± ∞ ¹        12.496Gi ± ∞ ¹        +49.59% (p=0.008 n=5)
Equal/same/4K-8    48.88Gi ± ∞ ¹       1603.96Gi ± ∞ ¹      +3181.17% (p=0.008 n=5)
Equal/same/4M-8    49.63Gi ± ∞ ¹    1637911.85Gi ± ∞ ¹   +3300381.91% (p=0.008 n=5)
Equal/same/64M-8   46.36Gi ± ∞ ¹   26253069.97Gi ± ∞ ¹  +56626517.99% (p=0.008 n=5)
geomean            6.737Gi               122.7Gi            +1721.01%
¹ need >= 6 samples for confidence interval at level 0.95

Fixes #64381

Change-Id: I7d423930a688edd88c4ba60d45e097296d9be852
GitHub-Last-Rev: ae8189fafb1cba87b5394f09f971746ae9299273
GitHub-Pull-Request: golang/go#64419
Reviewed-on: https://go-review.googlesource.com/c/go/+/545416
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
---

diff --git a/src/bytes/bytes_test.go b/src/bytes/bytes_test.go
index f0733edd3f..5e8cf85fd9 100644
--- a/src/bytes/bytes_test.go
+++ b/src/bytes/bytes_test.go
@@ -629,6 +629,11 @@ func BenchmarkEqual(b *testing.B) {
 	})
 
 	sizes := []int{1, 6, 9, 15, 16, 20, 32, 4 << 10, 4 << 20, 64 << 20}
+
+	b.Run("same", func(b *testing.B) {
+		benchBytes(b, sizes, bmEqual(func(a, b []byte) bool { return Equal(a, a) }))
+	})
+
 	benchBytes(b, sizes, bmEqual(Equal))
 }
 
diff --git a/src/internal/bytealg/equal_arm64.s b/src/internal/bytealg/equal_arm64.s
index d3aabba587..4db9515474 100644
--- a/src/internal/bytealg/equal_arm64.s
+++ b/src/internal/bytealg/equal_arm64.s
@@ -9,6 +9,9 @@
 TEXT runtimeÂ·memequal<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-25
 	// short path to handle 0-byte case
 	CBZ	R2, equal
+	// short path to handle equal pointers
+	CMP	R0, R1
+	BEQ	equal
 	B	memeqbody<>(SB)
 equal:
 	MOVD	$1, R0