cmd/compile, cmd/internal: fine-grained fiddling with loop alignment
This appears to be useful only on amd64, and was specifically
benchmarked on Apple Silicon and did not produce any benefit there.
This CL adds the assembly instruction `PCALIGNMAX align,amount`
which aligns to `align` if that can be achieved with `amount`
or fewer bytes of padding. (0 means never, but will align the
enclosing function.)
Specifically, if low-order-address-bits + amount are
greater than or equal to align; thus, `PCALIGNMAX 64,63` is
the same as `PCALIGN 64` and `PCALIGNMAX 64,0` will never
emit any alignment, but will still cause the function itself
to be aligned to (at least) 64 bytes.
Change-Id: Id51a056f1672f8095e8f755e01f72836c9686aa3
Reviewed-on: https://go-review.googlesource.com/c/go/+/577935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>