bytes, strings: replace asciiSet bitset with lookup table
Replace the 32-byte bitset implementation with a 256-byte lookup table
for simpler and faster ASCII character membership testing.
The bitset implementation used bit manipulation (shifts, masks, AND/OR)
requiring multiple CPU operations per lookup. The lookup table uses
direct array indexing with a single load and compare, reducing CPU
overhead significantly.
Using [256]bool instead of [256]byte allows the compiler to eliminate
the comparison instruction entirely, as bool values are guaranteed to be
either 0 or 1.
The full 256-element array (rather than 128 elements) is used because it
eliminates branches entirely. Testing shows [256]bool is 68% faster than
[128]bool with an explicit bounds check (488µs vs 821µs) due to avoiding
branch misprediction penalties in the hot path. Using [128]bool with bit
masking (c&0x7f) eliminates bounds checks but still costs ~10% performance
due to the AND operation.
The 224-byte increase in memory usage is acceptable for modern systems,
and the simpler implementation is easier to understand and maintain.
Full benchmark results demonstrating ~1.5x improvements across all
affected functions are available at:
https://github.com/golang/go/issues/77194#issuecomment-
3814095806
This supersedes CL 737920 with a simpler approach that improves
performance for all architectures without requiring SIMD instructions.
Updates #77194
Change-Id: I272ee6de05b963a8efc62e7e8838735fb0c4f41b
Reviewed-on: https://go-review.googlesource.com/c/go/+/739982
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>