internal/bytealg: improve asm for memequal on ppc64x

This makes two changes to the memequal function.

Previously the asm implementation of Equal on ppc64x called the internal
function memequal using BL, whereas the other asm implementations of
bytes functions on ppc64x used BR. BR is preferred because BL overwrites
the link register, so the calling function has to set up a stack frame
to save it. This changes Equal to use BR, consistent with the others.

This also uses VSX instructions where possible to improve the
performance of compares for sizes over 32 bytes.
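
For illustration only, the shape of the compare for larger sizes can be
modelled in portable Go. This is a rough sketch, not the ppc64x
assembly: equalChunked is a hypothetical name, the 32-byte step per
iteration is an assumption based on the size threshold above, and the
four 8-byte loads merely stand in for vector compares.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// equalChunked is a hypothetical portable model of a chunked compare.
// It is not the ppc64x implementation; it only mirrors the structure:
// a wide main loop followed by a narrow tail.
func equalChunked(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	i := 0
	// Main loop: 32 bytes per iteration (assumed chunk size).
	// Four 8-byte loads stand in for the wider vector compares.
	for ; len(a)-i >= 32; i += 32 {
		for j := i; j < i+32; j += 8 {
			if binary.LittleEndian.Uint64(a[j:]) != binary.LittleEndian.Uint64(b[j:]) {
				return false
			}
		}
	}
	// Tail: fewer than 32 bytes remain, compare byte by byte.
	for ; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	a := make([]byte, 70)
	b := make([]byte, 70)
	fmt.Println(equalChunked(a, b)) // identical contents
	b[69] = 1
	fmt.Println(equalChunked(a, b)) // differs in the tail
}
```

Inputs shorter than 32 bytes skip the main loop entirely and are handled
by the tail, which is why only larger sizes would benefit in this model.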