internal/bytealg: optimize index function for ppc64le/power9
Optimized index2to16 loop by unrolling the loop by 4.
Multiple benchmark tests show performance improvement on
POWER9. Similar improvements are seen on POWER10. Added
tests to ensure changes work fine.