]>
Cypherpunks repositories - gostls13.git/commit
byte,strings: improve IndexRune performance by ~45%
Change IndexRune to search for the last byte of a multi-byte rune
instead of using the first byte. This improves search performance
by 45% on average when dealing with Unicode text.
The rationale here is that the last byte of a UTF-8 encoded multi-byte
rune is significantly more unique (evenly distributed) than the first
byte which has a 78% chance of being [240, 243, 244].
This approach is typically much faster, but can be slower when there
are a large number of false positives (see Han benchmarks) because
the more even distribution of bytes can delay/prevent falling back
to a brute-force search using bytealg.Index, which is particularly
powerful on amd64/x86_64 (particularly Skylake, but less so with
newer processors).
bytes package benchmarks:
goos: darwin
goarch: arm64
pkg: bytes
cpu: Apple M1 Max
│ base.10.txt │ new.10.txt │
│ sec/op │ sec/op vs base │
IndexRune/10-10 9.784n ± 0% 8.470n ± 0% -13.43% (p=0.000 n=10)
IndexRune/32-10 11.660n ± 0% 8.473n ± 0% -27.34% (p=0.000 n=10)
IndexRune/4K-10 83.96n ± 0% 81.08n ± 0% -3.44% (p=0.000 n=10)
IndexRune/4M-10 63.92µ ± 0% 64.67µ ± 0% +1.17% (p=0.000 n=10)
IndexRune/64M-10 1.121m ± 1% 1.125m ± 1% ~ (p=0.218 n=10)
IndexRuneUnicode/Latin/10-10 10.125n ± 0% 7.347n ± 0% -27.43% (p=0.000 n=10)
IndexRuneUnicode/Latin/32-10 11.435n ± 0% 7.349n ± 0% -35.73% (p=0.000 n=10)
IndexRuneUnicode/Latin/4K-10 882.6n ± 0% 334.9n ± 1% -62.06% (p=0.000 n=10)
IndexRuneUnicode/Latin/4M-10 977.2µ ± 0% 370.9µ ± 1% -62.04% (p=0.000 n=10)
IndexRuneUnicode/Latin/64M-10 15.649m ± 1% 6.028m ± 1% -61.48% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/10-10 10.070n ± 0% 8.701n ± 0% -13.59% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/32-10 19.045n ± 0% 8.704n ± 1% -54.30% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/4K-10 2.734µ ± 0% 1.046µ ± 1% -61.75% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/4M-10 2.671m ± 0% 1.143m ± 1% -57.22% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/64M-10 43.12m ± 1% 18.26m ± 1% -57.64% (p=0.000 n=10)
IndexRuneUnicode/Han/10-10 10.10n ± 0% 10.82n ± 1% +7.08% (p=0.000 n=10)
IndexRuneUnicode/Han/32-10 38.29n ± 1% 10.87n ± 1% -71.62% (p=0.000 n=10)
IndexRuneUnicode/Han/4K-10 1409.0n ± 0% 489.1n ± 1% -65.28% (p=0.000 n=10)
IndexRuneUnicode/Han/4M-10 1338.4µ ± 0% 821.1µ ± 2% -38.65% (p=0.000 n=10)
IndexRuneUnicode/Han/64M-10 21.42m ± 1% 13.42m ± 2% -37.34% (p=0.000 n=10)
geomean 3.983µ 2.305µ -42.14%
│ base.10.txt │ new.10.txt │
│ B/s │ B/s vs base │
IndexRune/10-10 974.8Mi ± 0% 1126.1Mi ± 0% +15.52% (p=0.000 n=10)
IndexRune/32-10 2.556Gi ± 0% 3.517Gi ± 0% +37.62% (p=0.000 n=10)
IndexRune/4K-10 45.43Gi ± 0% 47.05Gi ± 0% +3.56% (p=0.000 n=10)
IndexRune/4M-10 61.12Gi ± 0% 60.41Gi ± 0% -1.16% (p=0.000 n=10)
IndexRune/64M-10 55.74Gi ± 1% 55.57Gi ± 1% ~ (p=0.218 n=10)
IndexRuneUnicode/Latin/10-10 942.0Mi ± 0% 1297.9Mi ± 0% +37.78% (p=0.000 n=10)
IndexRuneUnicode/Latin/32-10 2.606Gi ± 0% 4.055Gi ± 0% +55.61% (p=0.000 n=10)
IndexRuneUnicode/Latin/4K-10 4.322Gi ± 0% 11.392Gi ± 1% +163.57% (p=0.000 n=10)
IndexRuneUnicode/Latin/4M-10 3.998Gi ± 0% 10.532Gi ± 1% +163.47% (p=0.000 n=10)
IndexRuneUnicode/Latin/64M-10 3.994Gi ± 1% 10.369Gi ± 1% +159.61% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/10-10 947.2Mi ± 0% 1096.1Mi ± 0% +15.72% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/32-10 1.565Gi ± 0% 3.424Gi ± 1% +118.80% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/4K-10 1.396Gi ± 0% 3.649Gi ± 1% +161.43% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/4M-10 1.462Gi ± 0% 3.418Gi ± 1% +133.76% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/64M-10 1.450Gi ± 1% 3.422Gi ± 1% +136.08% (p=0.000 n=10)
IndexRuneUnicode/Han/10-10 944.6Mi ± 0% 881.7Mi ± 1% -6.66% (p=0.000 n=10)
IndexRuneUnicode/Han/32-10 797.0Mi ± 1% 2809.3Mi ± 1% +252.47% (p=0.000 n=10)
IndexRuneUnicode/Han/4K-10 2.707Gi ± 0% 7.798Gi ± 1% +188.04% (p=0.000 n=10)
IndexRuneUnicode/Han/4M-10 2.919Gi ± 0% 4.757Gi ± 2% +63.01% (p=0.000 n=10)
IndexRuneUnicode/Han/64M-10 2.917Gi ± 1% 4.656Gi ± 2% +59.60% (p=0.000 n=10)
geomean 3.036Gi 5.246Gi +72.82%
goos: linux
goarch: amd64
pkg: bytes
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
IndexRune/10-4 10.805n ± 0% 6.999n ± 0% -35.22% (p=0.000 n=10)
IndexRune/32-4 12.515n ± 0% 7.539n ± 0% -39.76% (p=0.000 n=10)
IndexRune/4K-4 71.69n ± 0% 68.39n ± 0% -4.60% (p=0.000 n=10)
IndexRune/4M-4 125.19µ ± 2% 63.05µ ± 0% -49.63% (p=0.000 n=10)
IndexRune/64M-4 1.050m ± 1% 1.053m ± 0% ~ (p=0.353 n=10)
IndexRuneUnicode/Latin/10-4 9.471n ± 0% 6.144n ± 1% -35.13% (p=0.000 n=10)
IndexRuneUnicode/Latin/32-4 12.540n ± 0% 6.655n ± 0% -46.93% (p=0.000 n=10)
IndexRuneUnicode/Latin/4K-4 522.1n ± 0% 207.2n ± 0% -60.32% (p=0.000 n=10)
IndexRuneUnicode/Latin/4M-4 626.1µ ± 0% 297.2µ ± 0% -52.54% (p=0.000 n=10)
IndexRuneUnicode/Latin/64M-4 13.866m ± 3% 5.069m ± 4% -63.44% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/10-4 10.920n ± 0% 7.213n ± 0% -33.95% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/32-4 12.515n ± 0% 7.780n ± 0% -37.83% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/4K-4 2650.0n ± 0% 621.5n ± 0% -76.55% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/4M-4 2744.7µ ± 0% 723.2µ ± 0% -73.65% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/64M-4 44.18m ± 0% 14.22m ± 14% -67.82% (p=0.000 n=10)
IndexRuneUnicode/Han/10-4 10.795n ± 0% 9.734n ± 1% -9.83% (p=0.000 n=10)
IndexRuneUnicode/Han/32-4 12.79n ± 0% 10.42n ± 1% -18.46% (p=0.000 n=10)
IndexRuneUnicode/Han/4K-4 519.7n ± 0% 288.4n ± 0% -44.51% (p=0.000 n=10)
IndexRuneUnicode/Han/4M-4 498.2µ ± 0% 443.0µ ± 0% -11.07% (p=0.000 n=10)
IndexRuneUnicode/Han/64M-4 9.654m ± 2% 12.223m ± 1% +26.61% (p=0.000 n=10)
geomean 3.168µ 1.828µ -42.30%
│ old.txt │ new.txt │
│ B/s │ B/s vs base │
IndexRune/10-4 882.5Mi ± 0% 1362.6Mi ± 0% +54.41% (p=0.000 n=10)
IndexRune/32-4 2.381Gi ± 0% 3.953Gi ± 0% +66.00% (p=0.000 n=10)
IndexRune/4K-4 53.21Gi ± 0% 55.77Gi ± 0% +4.82% (p=0.000 n=10)
IndexRune/4M-4 31.20Gi ± 2% 61.95Gi ± 0% +98.55% (p=0.000 n=10)
IndexRune/64M-4 59.54Gi ± 1% 59.37Gi ± 0% ~ (p=0.353 n=10)
IndexRuneUnicode/Latin/10-4 1006.9Mi ± 0% 1552.3Mi ± 1% +54.17% (p=0.000 n=10)
IndexRuneUnicode/Latin/32-4 2.376Gi ± 0% 4.478Gi ± 0% +88.45% (p=0.000 n=10)
IndexRuneUnicode/Latin/4K-4 7.306Gi ± 0% 18.411Gi ± 0% +152.01% (p=0.000 n=10)
IndexRuneUnicode/Latin/4M-4 6.239Gi ± 0% 13.145Gi ± 0% +110.70% (p=0.000 n=10)
IndexRuneUnicode/Latin/64M-4 4.507Gi ± 3% 12.329Gi ± 4% +173.54% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/10-4 873.0Mi ± 0% 1322.2Mi ± 0% +51.46% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/32-4 2.382Gi ± 0% 3.831Gi ± 0% +60.84% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/4K-4 1.439Gi ± 0% 6.138Gi ± 0% +326.43% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/4M-4 1.423Gi ± 0% 5.401Gi ± 0% +279.52% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/64M-4 1.415Gi ± 0% 4.396Gi ± 17% +210.79% (p=0.000 n=10)
IndexRuneUnicode/Han/10-4 883.4Mi ± 0% 979.7Mi ± 1% +10.90% (p=0.000 n=10)
IndexRuneUnicode/Han/32-4 2.331Gi ± 0% 2.858Gi ± 1% +22.61% (p=0.000 n=10)
IndexRuneUnicode/Han/4K-4 7.340Gi ± 0% 13.226Gi ± 0% +80.19% (p=0.000 n=10)
IndexRuneUnicode/Han/4M-4 7.841Gi ± 0% 8.817Gi ± 0% +12.44% (p=0.000 n=10)
IndexRuneUnicode/Han/64M-4 6.474Gi ± 2% 5.113Gi ± 1% -21.02% (p=0.000 n=10)
geomean 3.816Gi 6.614Gi +73.32%
strings package benchmarks:
goos: darwin
goarch: arm64
pkg: strings
│ base.index_rune.10.txt │ new.index_rune.10.txt │
│ sec/op │ sec/op vs base │
IndexRune-10 11.905n ± 5% 6.633n ± 6% -44.28% (p=0.000 n=10)
IndexRuneLongString-10 13.800n ± 1% 7.330n ± 2% -46.88% (p=0.000 n=10)
IndexRuneFastPath-10 3.477n ± 0% 3.481n ± 1% ~ (p=0.468 n=10)
geomean 8.297n 5.531n -33.34%
Change-Id: I59357fda1c8ac85315b759930f620dbce1ba4721
Reviewed-on: https://go-review.googlesource.com/c/go/+/539116
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>