crypto/sha512: add BE support to PPC64 asm implementation
This adds big endian support for the assembly implementation of
sha512. There was a recent request to do this for sha256 for
AIX users; for completeness, the same is being done for sha512.
The majority of the code is common between big and little
endian with a few differences controlled by ifdefs: with LE
the generation of a mask is needed along with VPERM instructions
to put bytes in the correct order; some VPERMs need the V
registers in a different order.
name old time/op new time/op delta
Hash8Bytes 1.02µs ± 0% 0.38µs ± 0% -62.68%
Hash1K 7.01µs ± 0% 2.43µs ± 0% -65.42%
Hash8K 50.2µs ± 0% 14.6µs ± 0% -70.89%