The ported cryptogam implementation uses a subtle and tricky mechanism
using lxv/vperm/lvsl to load unaligned vectors. This is difficult to
read, and may read and write unrelated bytes if reading from an
unaligned address.
Instead, POWER8 instructions can be used to load from unaligned memory
with much less overhead. Alignment interrupts only occur when reading
or writing cache-inhibited memory, which we assume isn't used in go
today, otherwise alignment penalties are usually marginal.
Instead lxvd2x+xxpermdi and xxpermdi+stxvd2x can be used to emulate
unaligned LE bytewise loads, similar to lxv/stxv on POWER9 in
little-endian mode.
Likewise, a custom permute vector is used to emulate BE bytewise
storage operations, lxvb16x/stxvb16x, on POWER9.
This greatly simplifies the code, and it makes it much easier to store
the keys in reverse (which is exactly how the decrypt keys are expected
to be stored).
Change-Id: I2334337e31a8fdf8d13ba96231142a039f237098
Reviewed-on: https://go-review.googlesource.com/c/go/+/395494 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Paul Murphy <murp@ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>