crypto/aes: improve performance for aes-cbc on ppc64le
This adds an asm implementation of aes-cbc for ppc64le to
improve performance. This is ported from the
cryptogams implementation as are other functions in
crypto/aes with further description at the top of
the asm file.
Improvements on a power10:
name old time/op new time/op delta
AESCBCEncrypt1K 1.67µs ± 0% 0.87µs ±-48.15%
AESCBCDecrypt1K 1.35µs ± 0% 0.43µs ±-68.48%
name old speed new speed delta
AESCBCEncrypt1K 614MB/s ± 0% 1184MB/s ± 0%+92.84%
AESCBCDecrypt1K 757MB/s ± 0% 2403M/s ± 0 +217.21%
A fuzz test to compare the generic Go implemenation
against the asm implementation has been added.