crypto/subtle: improve xorBytes assembler on PPC64

This change makes several improvements to the PPC64 assembly
implementation of xorBytes.

The loops that process large inputs now handle 64 bytes per
iteration. Additional changes avoid performance regressions for
common sizes such as 8 and 16 bytes.

On power10, inputs shorter than 8 bytes are now handled with the
LXVL and STXVL (Load/Store VSX Vector with Length) instructions.