cmd/internal/obj/loong64: auto-align loop heads to 16-byte boundaries
CL 479816 took care of loops in hand-written assembly, but did not
account for those written in Go, that may become performance-sensitive
as well.
In this patch, all loop heads are automatically identified and aligned
to 16-byte boundaries, by inserting a synthetic `PCALIGN $16` before
them. "Loop heads" are defined as targets of backward branches.
While at it, tweak some of the local comments so the flow is hopefully
clearer.
Because LoongArch instructions are all 32 bits long, at most 3 NOOPs
can be inserted for each target Prog. This may sound excessive, but
benchmark results indicate the current approach is overall profitable
anyway.
The slight regression on the Regexp cases is likely because the previous
numbers are just coincidental: indeed, large regressions or improvements
(of roughly ±10%) happen with definitely irrelevant changes during
development. This CL should (hopefully) bring such random performance
fluctuations down a bit.
Change-Id: I8bdda6e65336da00d4ad79650937b3eeb9db0e7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/479817 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: WANG Xuerui <git@xen0n.name>