internal/bytealg: optimize Count/CountString on arm64
Add ABIInternal support to Count and CountString.
Move the block that handles inputs shorter than 32 bytes from the end
of the function to the beginning, so that it acts as a fast path.
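
For illustration only, the control flow described above can be sketched
in pure Go. The real routine is arm64 assembly; the name countByte, the
scalar inner loops, and the 32-byte chunking of the bulk path are
assumptions made for this sketch and are not taken from the change.

```go
package main

import "fmt"

// countByte is a hypothetical pure-Go sketch of the control flow only.
// It is not the assembly routine from this change.
func countByte(s []byte, c byte) int {
	n := 0

	// Fast path: inputs shorter than 32 bytes are handled first,
	// before any setup for the bulk loop.
	if len(s) < 32 {
		for _, b := range s {
			if b == c {
				n++
			}
		}
		return n
	}

	// Bulk path: walk the input in 32-byte chunks. This sketch uses a
	// plain scalar loop per chunk (an assumption for readability).
	i := 0
	for ; i+32 <= len(s); i += 32 {
		for _, b := range s[i : i+32] {
			if b == c {
				n++
			}
		}
	}

	// Tail: count matches in the bytes left over after the last full chunk.
	for _, b := range s[i:] {
		if b == c {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countByte([]byte("hello"), 'l')) // prints 2
}
```

Checking the short case first means small inputs return without
passing through the bulk loop, which is consistent with the larger
gains at sizes 10 and 32 in the results below.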
goos: linux
goarch: arm64
pkg: strings
                   │   base.txt   │               new.txt               │
                   │     B/s      │     B/s       vs base               │
CountByte/10         672.5Mi ± 0%   692.9Mi ± 0%   +3.04% (p=0.000 n=10)
CountByte/32         3.592Gi ± 0%   3.970Gi ± 0%  +10.53% (p=0.000 n=10)
CountByte/4096       16.63Gi ± 0%   16.73Gi ± 0%   +0.64% (p=0.000 n=10)
CountByte/4194304    14.97Gi ± 2%   15.02Gi ± 1%        ~ (p=0.190 n=10)
CountByte/67108864   12.50Gi ± 0%   12.50Gi ± 0%        ~ (p=0.853 n=10)
geomean              5.931Gi        6.099Gi        +2.83%
Change-Id: I5af1be2b117d9fb8d570739637499923de62251c
Reviewed-on: https://go-review.googlesource.com/c/go/+/662395
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Commit-Queue: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>