Dim performance has regressed by 14% vs 1.9 on amd64.
Current pure go version of Dim is faster and,
what is even more important for performance, is inlinable, so
instead of tweaking asm implementation, just remove it.
I had to update BenchmarkDim, because it was simply reloading
constant(answer) in a loop.
Perf data below:
name old time/op new time/op delta
Dim-6 6.79ns ± 0% 1.60ns ± 1% -76.39% (p=0.000 n=7+10)
If I modify benchmark to be the same as in this CL results are even better:
name old time/op new time/op delta
Dim-6 10.2ns ± 0% 1.6ns ± 1% -84.27% (p=0.000 n=8+10)