cmd/compile: use magic multiply for unsigned values less than 1<<16 on 32-bit architectures
This is done by decomposing the number to be divided in 32-bit
components and using the 32-bit magic multiply. For the lowering to be
effective the constant must fit in 16 bits.
On ARM the expression n / 5 compiles to 25 instructions.
Benchmark for GOARCH=arm (Cortex-A53)
name old time/op new time/op delta
DivconstU64/3-6 1.19µs ± 0% 0.03µs ± 1% -97.40% (p=0.000 n=9+9)
DivconstU64/5-6 1.18µs ± 1% 0.03µs ± 1% -97.38% (p=0.000 n=10+8)
DivconstU64/37-6 1.13µs ± 1% 0.04µs ± 1% -96.51% (p=0.000 n=10+8)
DivconstU64/
1234567-6 852ns ± 0% 901ns ± 1% +5.73% (p=0.000 n=8+9)
Benchmark for GOARCH=386 (Haswell)
name old time/op new time/op delta
DivconstU64/3-4 18.0ns ± 2% 5.6ns ± 1% -69.06% (p=0.000 n=10+10)
DivconstU64/5-4 17.8ns ± 1% 5.5ns ± 1% -68.87% (p=0.000 n=9+10)
DivconstU64/37-4 17.8ns ± 1% 7.3ns ± 0% -58.90% (p=0.000 n=10+10)
DivconstU64/
1234567-4 17.5ns ± 1% 16.0ns ± 0% -8.55% (p=0.000 n=10+9)
Change-Id: I38a19b4d59093ec021ef2e5241364a3dad4eae73
Reviewed-on: https://go-review.googlesource.com/c/go/+/264683
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>