Rewrite the equal asm function to use the lxvl and stxvl instructions
(load and store with variable length) on power10, which simplifies
the comparison of the tail-end bytes. Also clean up the CR register
usage.
On power9 and power8 the code remains unchanged. Performance improves
on power10 for multiple sizes <= 16 with this change.
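
For illustration only, a minimal Python model of why a length-controlled
load simplifies the tail comparison. It is not the assembly in this
change: the names load_with_length and equal are hypothetical, and the
model assumes the load zero-fills the bytes beyond the requested length,
so a tail of fewer than 16 bytes can be compared as one full vector.

```python
VEC_BYTES = 16  # width of one vector register in this model


def load_with_length(buf: bytes, offset: int, n: int) -> bytes:
    """Model a length-controlled vector load (hypothetical helper).

    Copies at most 16 bytes starting at offset and zero-fills the rest
    of the 16-byte result.
    """
    n = min(n, VEC_BYTES)
    return buf[offset:offset + n].ljust(VEC_BYTES, b"\x00")


def equal(a: bytes, b: bytes) -> bool:
    """Compare two byte strings 16 bytes at a time."""
    if len(a) != len(b):
        return False
    offset, remaining = 0, len(a)
    # Full 16-byte chunks: ordinary vector compares.
    while remaining >= VEC_BYTES:
        if a[offset:offset + VEC_BYTES] != b[offset:offset + VEC_BYTES]:
            return False
        offset += VEC_BYTES
        remaining -= VEC_BYTES
    # Tail of 0..15 bytes: one length-controlled load per operand and a
    # single compare, instead of separate 8/4/2/1-byte steps.
    return (load_with_length(a, offset, remaining)
            == load_with_length(b, offset, remaining))
```

Both tails are padded with the same zero bytes, so the two padded vectors
are equal exactly when the remaining bytes are equal.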