crypto/internal/nistec: re-enable ppc64le asm for P-256
Add support for ppc64le assembler to p256. Most of the changes
are due to the change in nistec interfaces.
There is a change to p256MovCond based on a reviewer's comment.
LXVD2X replaces the use of LXVW4X in one function.
In addition, some refactoring has been done to this file to
reduce size and improve readability:
- Eliminate the use of defines to switch between V and VSX
registers. V regs can be used for instructions some that
previously required VSX.
- Use XXPERMDI instead of VPERM to swap bytes loaded and
stored with LXVD2X and STXVD2X instructions. This eliminates
the need to load the byte swap string into a vector.
- Use VMRGEW and VMRGOW instead of VPERM in the VMULT
macros. This also avoids the need to load byte strings to
swap the high and low values.
These changes reduce the file by about 10% and shows an
improvement of about 2% at runtime.