Cypherpunks repositories - gostls13.git/commitdiff
runtime: improve scan inner loop
author Keith Randall <khr@golang.org>
Thu, 24 Apr 2025 18:10:05 +0000 (11:10 -0700)
committer Keith Randall <khr@golang.org>
Thu, 15 May 2025 01:11:51 +0000 (18:11 -0700)
On every arch except amd64, it is faster to do x&(x-1) than x^(1<<n).

Most archs need 3 instructions for the latter: MOV $1, R; SLL n, R;
ANDN R, x. Maybe 4 if there's no ANDN.

Most archs need only 2 instructions to do x&(x-1). It takes 3 on
x86/amd64 because NEG only works in place.

Only amd64 can do x^(1<<n) in a single instruction.
(We could on 386 also, but that's currently not implemented.)
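The two forms above are equivalent ways of clearing the lowest set bit of a nonzero word: toggling bit i with x^(1<<i) where i is the trailing-zero count, or computing x&(x-1) directly. A minimal standalone sketch of that equivalence (function names are illustrative, not from the runtime; like the runtime's scan loop, it assumes x != 0):

```go
package main

import (
	"fmt"
	"math/bits"
)

// clearLowestXor toggles bit i, where i is the index of the lowest
// set bit. This is the pattern amd64 compiles to a single BTCQ.
// Requires x != 0 (TrailingZeros64(0) == 64).
func clearLowestXor(x uint64) uint64 {
	i := bits.TrailingZeros64(x)
	return x ^ (uint64(1) << (i & 63))
}

// clearLowestSub clears the lowest set bit with x&(x-1),
// which most architectures do in two instructions (SUB, AND).
func clearLowestSub(x uint64) uint64 {
	return x & (x - 1)
}

func main() {
	for _, x := range []uint64{0b1011000, 1, 1 << 63, 0xdeadbeef} {
		a, b := clearLowestXor(x), clearLowestSub(x)
		fmt.Printf("%#x -> %#x %#x equal=%v\n", x, a, b, a == b)
	}
}
```

Both produce the same result for every nonzero input; the CL below just picks the cheaper encoding per architecture.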

Change-Id: I3b74b7a466ab972b20a25dbb21b572baf95c3467
Reviewed-on: https://go-review.googlesource.com/c/go/+/672956
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

src/runtime/mbitmap.go

index 7d528b94b43cbca8b62d02e85080c8a0c47119a5..f9a4c4ce3d7d6bf1eb69a39399659897617f99fd 100644 (file)
@@ -219,8 +219,13 @@ func (tp typePointers) nextFast() (typePointers, uintptr) {
        } else {
                i = sys.TrailingZeros32(uint32(tp.mask))
        }
-       // BTCQ
-       tp.mask ^= uintptr(1) << (i & (ptrBits - 1))
+       if GOARCH == "amd64" {
+               // BTCQ
+               tp.mask ^= uintptr(1) << (i & (ptrBits - 1))
+       } else {
+               // SUB, AND
+               tp.mask &= tp.mask - 1
+       }
        // LEAQ (XX)(XX*8)
        return tp, tp.addr + uintptr(i)*goarch.PtrSize
 }