Cypherpunks repositories - gostls13.git/commit
cmd/compile: use LZCNT instruction for GOAMD64>=3
author Wayne Zuo <wdvxdr@golangcn.org>
Wed, 30 Mar 2022 13:44:44 +0000 (21:44 +0800)
committer Emmanuel Odeke <emmanuel@orijtech.com>
Mon, 4 Apr 2022 04:01:17 +0000 (04:01 +0000)
commit a92ca515077e5cf54673eb8c5c2d9db4824330db
tree 7dc63db107f5cef14d2819196e7a85b082bc2dc3
parent ba6df85c7c94c7b26d4979e92fdb9ec7fa4cc1e4

LZCNT is similar to BSR, but the result of BSR(x) is undefined when
x == 0, so using LZCNT avoids the special case for zero input. For
nonzero x, LZCNTQ(x) == 63-BSRQ(x) and LZCNTL(x) == 31-BSRL(x).

According to https://www.agner.org/optimize/instruction_tables.pdf,
LZCNT is also much faster than BSR on AMD CPUs.

name              old time/op  new time/op  delta
LeadingZeros-8    0.91ns ± 1%  0.80ns ± 7%  -11.68%  (p=0.000 n=9+9)
LeadingZeros8-8   0.98ns ±15%  0.91ns ± 1%   -7.34%  (p=0.000 n=9+9)
LeadingZeros16-8  0.94ns ± 3%  0.92ns ± 2%   -2.36%  (p=0.001 n=10+10)
LeadingZeros32-8  0.89ns ± 1%  0.78ns ± 2%  -12.49%  (p=0.000 n=10+10)
LeadingZeros64-8  0.92ns ± 1%  0.78ns ± 1%  -14.48%  (p=0.000 n=10+10)

Change-Id: I125147fe3d6994a4cfe558432780408e9a27557a
Reviewed-on: https://go-review.googlesource.com/c/go/+/396794
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
src/cmd/compile/internal/amd64/ssa.go
src/cmd/compile/internal/amd64/versions_test.go
src/cmd/compile/internal/ssa/gen/AMD64.rules
src/cmd/compile/internal/ssa/gen/AMD64Ops.go
src/cmd/compile/internal/ssa/opGen.go
src/cmd/compile/internal/ssa/rewriteAMD64.go
test/codegen/mathbits.go