Add optimization patterns for MemEq with small constant sizes
(3-32 bytes). These patterns help to avoid runtime calls for
small sizes.
For sizes 3-16, combine two chunks loading and comparison.
For sizes 17-32, combine a 16-byte comparison with the remaining bytes.
This change may increase binary size slightly due to inline expansion,
but improves performance for code with many small memequals,
e.g. DecodehealingTracker benchmark on arm64:
Change-Id: If3d7e7395656d5f36e3ab303a71044293d17bc3e
Reviewed-on: https://go-review.googlesource.com/c/go/+/688195 Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Keith Randall <khr@google.com>