Where possible replace ANDQ with MOV?ZX.
Takes care that we don't regress wrt bounds checking,
for example [1000]int{}[i&255].
According to "Intel 64 and IA-32 Architectures Optimization Reference
Manual" Section: "3.5.1.13 Zero-Latency MOV Instructions"
MOV?ZX instructions have zero latency on newer processors.