Cypherpunks repositories - gostls13.git/commit
cmd/compile: use MOVBQZX for OpAMD64LoweredHasCPUFeature
author: Josh Bleecher Snyder <josharian@gmail.com>
Sun, 5 Apr 2020 02:22:28 +0000 (19:22 -0700)
committer: Josh Bleecher Snyder <josharian@gmail.com>
Tue, 7 Apr 2020 18:19:55 +0000 (18:19 +0000)
commit: 7ee8467b276fbe442df8c84c3d13a99e80519c24
tree: 9c85ac85cbcf84fd378abc93c694c00cb258e16b
parent: 64f19d70805a6da347a55dab5ab4f4c57ddb3278

In the commit message of CL 212360, I wrote:

> This new intrinsic ... generates MOVB+TESTB+NE.
> (It is possible that MOVBQZX+TESTQ+NE would be better.)

I should have tested. MOVBQZX+TESTQ+NE does in fact appear to be better.

For the benchmark in #36196, on my machine:

name      old time/op  new time/op  delta
FMA-8     0.86ns ± 6%  0.70ns ± 5%  -18.79%  (p=0.000 n=98+97)
NonFMA-8  0.61ns ± 5%  0.60ns ± 4%   -0.74%  (p=0.001 n=100+97)

Interestingly, these are both considerably faster than
the measurements I took a couple of months ago (1.4ns/2ns).
It appears that CL 219131 (clearing VZEROUPPER in asyncPreempt) helped a lot.
And FMA is now once again slower than NonFMA, although this change
helps it regain some ground.

Updates #15808
Updates #36351
Updates #36196

Change-Id: I8a326289a963b1939aaa7eaa2fab2ec536467c7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/227238
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
src/cmd/compile/internal/amd64/ssa.go
src/cmd/compile/internal/ssa/gen/AMD64.rules
src/cmd/compile/internal/ssa/gen/AMD64Ops.go
src/cmd/compile/internal/ssa/rewriteAMD64.go