Previously, the implementation used bit manipulation to approximate
counters to the nearest power of two.
Given the 0-255 range of byte, we can precompute values at
initialization and use a lookup table, reducing runtime computation.
Benchmarks show an 18% performance gain on AMD64 and 5% on ARM64.
* net/netip/FuzzParse (n=10, t=60s, state reset per run)
* AMD64 (Intel Alder Lake i5-12600k):
17,349,217 -> 20,487,756 execs/s
* ARM64 (M3 Pro):
19,606,471 -> 20,657,041 execs/s
* compress/gzip/FuzzReader (n=10, t=60s, mature corpus)
* AMD64 (Intel Alder Lake i5-12600k):
5,655,956 -> 6,707,035 execs/s
Change-Id: If11f7fe866f54c7cd2c5a48e251c027b67980df7
Reviewed-on: https://go-review.googlesource.com/c/go/+/627378
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
func SnapshotCoverage() {
cov := coverage()
for i, b := range cov {
- b |= b >> 1
- b |= b >> 2
- b |= b >> 4
- b -= b >> 1
- coverageSnapshot[i] = b
+ coverageSnapshot[i] = pow2Table[b]
}
}
// the 8-bit coverage counters reside in memory. They're known to cmd/link,
// which specially assigns their addresses for this purpose.
_counters, _ecounters [0]byte
+
+ // lookup table for faster power of two rounding
+ pow2Table [256]byte
)
+
+func init() {
+ for i := range pow2Table {
+ b := byte(i)
+ b |= b >> 1
+ b |= b >> 2
+ b |= b >> 4
+ b -= b >> 1
+ pow2Table[i] = b
+ }
+}