The code wasn't checking to see if the data was still >= 64 bytes
long after aligning it.
Aligning the data is an optimization and we don't actually need
to do it. In fact for smaller sizes it slows things down due to
the overhead of calling the generic function. Therefore for now
I have simply removed the alignment stage. I have also added a
check into the assembly to deliberately trigger a segmentation
fault if the data is too short.