This reworks how we load/store vector registers using the new
bi-endian P9 instruction emulation macros. This also removes
quite a bit of asm used to align and reorder vector registers.
This is also a slight improvement on P9 ppc64le/linux:
name old speed new speed delta
AESCBCEncrypt1K 936MB/s ± 0% 943MB/s ± 0% +0.80%
AESCBCDecrypt1K 1.28GB/s ± 0% 1.37GB/s ± 0% +6.76%
Updates #18499
Change-Id: Ic5ff71d217d7302b6ae4e8d877c25004bfda5ecd
Reviewed-on: https://go-review.googlesource.com/c/go/+/405134
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com>