Alberto Donizetti [Fri, 2 Mar 2018 14:16:27 +0000 (15:16 +0100)]
test: port bits.Len intrinsics tests to the new codegen harness
This change move bits.Len* intrinsification tests to the new codegen
test harness, removing them from the old ssa_test file. Five different
test functions (one for each bit.Len function tested) was used, to
avoid possible unwanted interactions between multiple calls inside one
function.
Change-Id: Iffd5be55b58e88597fa30a562a28dacb01236d8b
Reviewed-on: https://go-review.googlesource.com/98156
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>
Giovanni Bajo [Sat, 3 Mar 2018 15:00:43 +0000 (16:00 +0100)]
test: move load/store combines into asmcheck
This CL moves the load/store combining tests into asmcheck.
In addition at being more compact, it's also now easier to
spot what it is missing in each architecture.
While doing so, I think I uncovered a bug in ppc64le and arm64
rules, because they fail to load/store combine in non-trivial
functions. Not sure why, I'll open an issue.
Change-Id: Ia1572d53c0553d9104f3e52b95e4d1768a8440a3
Reviewed-on: https://go-review.googlesource.com/98441
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Giovanni Bajo [Sat, 3 Mar 2018 18:44:47 +0000 (19:44 +0100)]
test: in asmcheck, dump only the functions which fail
Before this change, in case of any failure, asmcheck was
dumping to stderr the whole output of compile -S, which
can be very long if it contains multiple functions.
Make it so it filters the output to only display the
assembly output of functions for which at least one opcode
check failed. This greatly simplifies debugging.
Change-Id: I1bbf54473b8252a3384e2c1dade82d926afc119d
Reviewed-on: https://go-review.googlesource.com/98444
Run-TryBot: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>
Giovanni Bajo [Sat, 3 Mar 2018 13:47:20 +0000 (14:47 +0100)]
test: use the version of Go used to run run.go
Currently, the top-level testsuite always uses whatever version
of Go is found in the PATH to execute all the tests. This
forces the developers to tweak the PATH to run the testsuite.
Change it to use the same version of Go used to run run.go.
This allows developers to run the testsuite using the tip
compiler by simply saying "../bin/go run run.go".
I think this is a better solution compared to always forcing
"../bin/go", because it allows developers to run the testsuite
using different Go versions, for instance to check if a new
test is fixed in tip compared to the installed compiler.
Fixes #24217
Change-Id: I41b299c753b6e77c41e28be9091b2b630efea9d2
Reviewed-on: https://go-review.googlesource.com/98439
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Tobias Klauser [Fri, 2 Mar 2018 10:27:15 +0000 (11:27 +0100)]
runtime: use vDSO for clock_gettime on linux/arm
Use the __vdso_clock_gettime fast path via the vDSO on linux/arm to
speed up nanotime and walltime. This results in the following
performance improvement for time.Now on a RaspberryPi 3 (running
32bit Raspbian, i.e. GOOS=linux/GOARCH=arm):
name old time/op new time/op delta
TimeNow 0.99µs ± 0% 0.39µs ± 1% -60.74% (p=0.000 n=12+20)
Change-Id: I3598278a6c88d7f6a6ce66c56b9d25f9dd2f4c9a
Reviewed-on: https://go-review.googlesource.com/98095 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Joe Tsai [Fri, 10 Nov 2017 01:33:12 +0000 (17:33 -0800)]
encoding/json: use sync.Map for field cache
The previous type cache is quadratic in time in the situation where
new types are continually encountered. Now that it is possible to dynamically
create new types with the reflect package, this can cause json to
perform very poorly.
Switch to sync.Map which does well when the cache has hit steady state,
but also handles occasional updates in better than quadratic time.
Keith Randall [Fri, 2 Mar 2018 00:38:41 +0000 (16:38 -0800)]
internal/bytealg: move IndexByte asssembly to the new bytealg package
Move the IndexByte function from the runtime to a new bytealg package.
The new package will eventually hold all the optimized assembly for
groveling through byte slices and strings. It seems a better home for
this code than randomly keeping it in runtime.
Once this is in, the next step is to move the other functions
(Compare, Equal, ...).
Update #19792
This change seems complicated enough that we might just declare
"not worth it" and abandon. Opinions welcome.
The core assembly is all unchanged, except minor modifications where
the code reads cpu feature bits.
The wrapper functions have been cleaned up as they are now actually
checked by vet.
Brad Fitzpatrick [Tue, 27 Feb 2018 17:23:14 +0000 (17:23 +0000)]
net: skip flaky TestLookupLongTXT for now
Flaky tests failing trybots help nobody.
Updates #22857
Change-Id: I87bc018651ab4fe02560a6d24c08a1d7ccd8ba37
Reviewed-on: https://go-review.googlesource.com/97416 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Hana Kim [Thu, 22 Feb 2018 18:27:39 +0000 (13:27 -0500)]
internal/trace: remove backlinks from span/task end to start
Even though undocumented, the assumption is the Event's link field
points to the following event in the future. The new span/task event
processing breaks the assumption.
Michael Fraenkel [Thu, 1 Mar 2018 14:26:38 +0000 (09:26 -0500)]
cmd/compile: convert type during finishcompare
When recursively calling walkexpr, r.Type is still the untyped value.
It then sometimes recursively calls finishcompare, which complains that
you can't compare the resulting expression to that untyped value.
Updates #23834.
Change-Id: I6b7acd3970ceaff8da9216bfa0ae24aca5dee828
Reviewed-on: https://go-review.googlesource.com/97856 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Than McIntosh [Fri, 1 Dec 2017 17:24:56 +0000 (12:24 -0500)]
cmd/compile: add DWARF register mappings for ARM64.
Add DWARF register mappings for ARM64, so that that arch will become
usable with "-dwarflocationlists". [NB: I've plugged in a set of
numbers from the doc, but this will require additional manual testing.]
Alessandro Arzilli [Mon, 19 Feb 2018 14:26:49 +0000 (15:26 +0100)]
cmd/link: fix up debug_range for dsymutil (revert CL 72371)
Dsymutil, an utility used on macOS when externally linking executables,
does not support base address selector entries in debug_ranges.
CL 73271 worked around this problem by removing base address selectors
and emitting CU-relative relocations for each list entry.
This commit, as an optimization, reintroduces the base address
selectors and changes the linker to remove them again, but only when it
knows that it will have to invoke the external linker on macOS.
Compilecmp comparing master with a branch that has scope tracking
always enabled:
Heschi Kreinick [Wed, 28 Feb 2018 22:53:31 +0000 (17:53 -0500)]
cmd/compile/internal/ssa: batch up all zero-width instructions
When generating location lists, batch up changes for all zero-width
instructions, not just phis. This prevents the creation of location list
entries that don't actually cover any instructions.
This isn't perfect because of the caveats in the prior CL (Copy is
zero-width sometimes) but in practice this seems to fix all of the empty
lists in std.
Heschi Kreinick [Wed, 28 Feb 2018 21:30:07 +0000 (16:30 -0500)]
cmd/compile/internal/ssa: note zero-width Ops
Add a bool to opInfo to indicate if an Op never results in any
instructions. This is a conservative approximation: some operations,
like Copy, may or may not generate code depending on their arguments.
I built the list by reading each arch's ssaGenValue function. Hopefully
I got them all.
Change-Id: I130b251b65f18208294e129bb7ddc3f91d57d31d
Reviewed-on: https://go-review.googlesource.com/97957 Reviewed-by: Keith Randall <khr@golang.org>
Alessandro Arzilli [Thu, 15 Feb 2018 06:40:44 +0000 (07:40 +0100)]
cmd/compile: optimize scope tracking
1. Detect and remove the markers of lexical scopes that don't contain
any variables early in noder, instead of waiting until the end of DWARF
generation.
This saves memory by never allocating some of the markers and optimizes
some of the algorithms that depend on the number of scopes.
2. Assign scopes to Progs by doing, for each Prog, a binary search over
the markers array. This is faster, compared to sorting the Prog list
because there are fewer markers than there are Progs.
Heschi Kreinick [Wed, 24 Jan 2018 18:26:15 +0000 (13:26 -0500)]
cmd/link: fix up location lists for dsymutil
LLVM tools, particularly lldb and dsymutil, don't support base address
selection entries in location lists. When targeting GOOS=darwin,
mode, have the linker translate location lists to CU-relative form
instead.
Technically, this isn't necessary when linking internally, as long as
nobody plans to use anything other than Delve to look at the DWARF. But
someone might want to use lldb, and it's really confusing when dwarfdump
shows gibberish for the location entries. The performance cost isn't
noticeable, so enable it even for internal linking.
Doing this in the linker is a little weird, but it was more expensive in
the compiler, probably because the compiler is much more stressful to
the GC. Also, if we decide to only do it for external linking, the
compiler can't see the link mode.
Benchmark before and after this commit on Mac with -dwarflocationlists=1:
name old time/op new time/op delta
StdCmd 21.3s ± 1% 21.3s ± 1% ~ (p=0.310 n=27+27)
Only StdCmd is relevant, because only StdCmd runs the linker. Whatever
the cost is here, it's not very large.
Heschi Kreinick [Wed, 28 Feb 2018 21:32:11 +0000 (16:32 -0500)]
cmd/compile/internal/ssa: avoid accidental list ends
Some SSA values don't translate into any instructions. If a function
began with two of them, and both modified the storage of the same
variable, we'd end up with a location list entry that started and ended
at 0. That looks like an end-of-list entry, which would then confuse
downstream tools, particularly the fixup in the linker.
"Fix" this by changing the end of such entries to 1. Should be harmless,
since AFAIK we don't generate any 1-byte instructions. Later CLs will
reduce the frequency of these entries anyway.
Ilya Tocar [Thu, 1 Mar 2018 18:52:21 +0000 (12:52 -0600)]
crypto: remove hand encoded amd64 instructions
Replace BYTE.. encodings with asm. This is possible due to asm
implementing more instructions and removal of
MOV $0, reg -> XOR reg, reg transformation from asm.
Ilya Tocar [Thu, 1 Mar 2018 18:30:19 +0000 (12:30 -0600)]
math: remove unused variable
useSSE41 was used inside asm implementation of floor to select between base and ss4 code path.
We intrinsified floor and left asm functions as a backup for non-sse4 systems.
This made variable unused, so remove it.
Hana Kim [Thu, 1 Mar 2018 16:42:09 +0000 (11:42 -0500)]
runtime/trace: skip TestUserTaskSpan upon timestamp error
Change-Id: I030baaa0a0abf1e43449faaf676d389a28a868a3
Reviewed-on: https://go-review.googlesource.com/97857
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com> Reviewed-by: Peter Weinberger <pjw@google.com>
Giovanni Bajo [Thu, 1 Mar 2018 00:55:33 +0000 (01:55 +0100)]
test: in asmcheck, regexp must match from beginning of line
This avoid simple bugs like "ADD" matching "FADD". Obviously
"ADD" will still match "ADDQ" so some care is still required
in this regard, but at least a first class of possible errors
is taken care of.
Change-Id: I7deb04c31de30bedac9c026d9889ace4a1d2adcb
Reviewed-on: https://go-review.googlesource.com/97817 Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>
Giovanni Bajo [Thu, 1 Mar 2018 00:39:01 +0000 (01:39 +0100)]
test: improve asmcheck syntax
asmcheck comments now support a compact form of specifying
multiple checks for each platform, using the following syntax:
amd64:"SHL\t[$]4","SHR\t[$]4"
Negative checks are also parsed using the following syntax:
amd64:-"ROR"
though they are still not working.
Moreover, out-of-line comments have been implemented. This
allows to specify asmchecks on comment-only lines, that will
be matched on the first subsequent non-comment non-empty line.
// amd64:"XOR"
// arm:"EOR"
x ^= 1
Change-Id: I110c7462fc6a5c70fd4af0d42f516016ae7f2760
Reviewed-on: https://go-review.googlesource.com/97816 Reviewed-by: Keith Randall <khr@golang.org>
runtime: fix amd64p32 indexbytes in presence of overflow
When the slice/string length is very large,
probably artifically large as in CL 97523,
adding BX (length) to R11 (pointer) overflows.
As a result, checking DI < R11 yields the wrong result.
Since they will be equal when the loop is done,
just check DI != R11 instead.
Yes, the pointer itself could overflow, but if that happens,
something else has gone pretty wrong; not our concern here.
Chad Rosier [Tue, 27 Feb 2018 15:35:17 +0000 (10:35 -0500)]
cmd/compile/internal/ssa: combine consecutive LittleEndian stores on arm64
This optimization mirrors that which is already implemented for AMD64. The
optimization specifically targets the binary.LittleEndian.PutUint* functions.
Marcel van Lohuizen [Wed, 28 Feb 2018 12:14:44 +0000 (13:14 +0100)]
testing: gracefully handle subtest failing parent’s T
Don’t panic if a subtest inadvertently calls FailNow
on a parent’s T. Instead, report the offending subtest
while still reporting the error with the ancestor test and
keep exiting goroutines\10.
Note that this implementation has a race if parallel
subtests are failing the parent concurrently.
This is fine:
Calling FailNow on a parent is considered an error
in principle, at the moment, and is reported if it is
detected. Having the race allows the race detector
to detect the error as well.
Fixes #22882
Change-Id: Ifa6d5e55bb88f6bcbb562fc8c99f1f77e320015a
Reviewed-on: https://go-review.googlesource.com/97635
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> Reviewed-by: Kunpei Sakai <namusyaka@gmail.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Giovanni Bajo [Tue, 27 Feb 2018 00:59:58 +0000 (01:59 +0100)]
test: add support for code generation tests (asmcheck)
The top-level test harness is modified to support a new kind
of test: "asmcheck". This is meant to replace asm_test.go
as an easier and more readable way to test code generation.
I've added a couple of codegen tests to get initial feedback
on the syntax. I've created them under a common "codegen"
subdirectory, so that it's easier to run them all with
"go run run.go -v codegen".
The asmcheck syntax allows to insert line comments that
can specify a regular expression to match in the assembly code,
for multiple architectures (the testsuite will automatically
build each testfile multiple times, one per mentioned architecture).
Negative matches are unsupported for now, so this cannot fully
replace asm_test yet.
Change-Id: Ifdbba389f01d55e63e73c99e5f5449e642101d55
Reviewed-on: https://go-review.googlesource.com/97355
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
Consider the following:
type child struct{ Field string }
type parent struct{ child }
p := new(parent)
v := reflect.ValueOf(p).Elem().Field(0)
v.Field(0).SetString("hello") // v.Field = "hello"
v = v.Addr().Elem() // v = *(&v)
v.Field(0).SetString("goodbye") // v.Field = "goodbye"
It would appear that v.Addr().Elem() should have the same value, and
that it would be safe to set "goodbye".
However, after CL 66331, any interspersed calls between Field calls
causes the RO flag to be set.
Thus, setting to "goodbye" actually causes a panic.
That CL affects decodeState.indirect which assumes that back-to-back
Value.Addr().Elem() is side-effect free. We fix that logic to keep
track of the Addr() and Elem() calls and set v back to the original
after a full round-trip has occured.
Fixes #24152
Updates #24153
Change-Id: Ie50f8fe963f00cef8515d89d1d5cbc43b76d9f9c
Reviewed-on: https://go-review.googlesource.com/97796 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
erifan01 [Tue, 23 Jan 2018 11:35:54 +0000 (11:35 +0000)]
cmd/asm: enable several arm64 load & store instructions
Instructions LDARB, LDARH, LDAXPW, LDAXP, STLRB, STLRH, STLXP, STLXPW, STXP,
STXPW have been added before, but they are not enabled. This CL enabled them.
Change the form of LDXP and LDXPW to the form of LDP, and fix a bug of STLXP.
Ben Shi [Sun, 25 Feb 2018 09:10:54 +0000 (09:10 +0000)]
cmd/compile: optimize ARM64 code with EON/ORN
EON and ORN are efficient ARM64 instructions. EON combines (x ^ ^y)
into a single operation, and so ORN does for (x | ^y).
This CL implements that optimization. And here are benchmark results
with RaspberryPi3/ArchLinux.
1. A specific test gets about 13% improvement.
EONORN 181µs ± 0% 157µs ± 0% -13.26% (p=0.000 n=26+23)
(https://github.com/benshi001/ugo1/blob/master/eonorn_test.go)
Matthew Dempsky [Wed, 28 Feb 2018 19:22:05 +0000 (11:22 -0800)]
cmd/compile: fix unexpected type alias crash
OCOMPLIT stores the pre-typechecked type in n.Right, and then moves it
to n.Type. However, it wasn't clearing n.Right, so n.Right continued
to point to the OTYPE node. (Exception: slice literals reused n.Right
to store the array length.)
When exporting inline function bodies, we don't expect to need to save
any type aliases. Doing so wouldn't be wrong per se, but it's
completely unnecessary and would just bloat the export data.
However, reexportdep (whose role is to identify types needed by inline
function bodies) uses a generic tree traversal mechanism, which visits
n.Right even for O{ARRAY,MAP,STRUCT}LIT nodes. This means it finds the
OTYPE node, and mistakenly interpreted that the type alias needs to be
exported.
The straight forward fix is to just clear n.Right when typechecking
composite literals.
Fixes #24173.
Change-Id: Ia2d556bfdd806c83695b08e18b6cd71eff0772fc
Reviewed-on: https://go-review.googlesource.com/97719
Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org>
Adam Langley [Thu, 22 Feb 2018 20:05:29 +0000 (12:05 -0800)]
crypto/x509: parse invalid DNS names and email addresses.
Go 1.10 requires that SANs in certificates are valid. However, a
non-trivial number of (generally non-WebPKI) certificates have invalid
strings in dnsName fields and some have even put those dnsName SANs in
CA certificates.
This change defers validity checking until name constraints are checked.
The tests checking for empty interfaces so that they can be fast-
tracked in the code actually didn't test the right field and the
fast track code never executed. Doing it now.
Change-Id: I58b2951efb3fb40b3366874c79fd653591ae0e99
Reviewed-on: https://go-review.googlesource.com/97519 Reviewed-by: Alan Donovan <adonovan@google.com>
Robert Griesemer [Tue, 27 Feb 2018 22:02:09 +0000 (14:02 -0800)]
go/types: fix incorrect context when type-checking interfaces
Regression, introduced by https://go-review.googlesource.com/c/go/+/79575
which meant to be more conservative but ended up destroying an important
context.
Fixes #24140.
Change-Id: Id428dbb295ce9f11ab7cd54ec5ab51ef4291ac3f
Reviewed-on: https://go-review.googlesource.com/97535 Reviewed-by: Alan Donovan <adonovan@google.com>
Richard Miller [Wed, 28 Feb 2018 14:29:27 +0000 (14:29 +0000)]
syscall: reduce redundant getwd tracking in Plan 9
In Plan 9, each M is implemented as a separate OS process with
its own working directory. To keep the wd consistent across
goroutines (or rescheduling of the same goroutine), CL 6350
introduced a Fixwd procedure which checks using getwd and calls
chdir if necessary before any syscall operating on a pathname.
This wd checking will not be necessary if the pathname is absolute
(starts with '/' or '#'). Getwd is a fairly expensive operation
in Plan 9 (implemented by opening "." and calling Fd2path on the
file descriptor). Eliminating the redundant getwd calls can
significantly reduce overhead for common operations like
"dist test --list" which perform many syscalls on absolute pathnames.
Updates #9428.
Change-Id: I13fd9380779de27b0ac2f2b488229778d6839255
Reviewed-on: https://go-review.googlesource.com/97675 Reviewed-by: David du Colombier <0intro@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: David du Colombier <0intro@gmail.com>
Richard Miller [Wed, 28 Feb 2018 09:35:55 +0000 (09:35 +0000)]
runtime: don't try to shrink address space with brk in Plan 9
Plan 9 won't let brk shrink the data segment if it's shared with
other processes (which it is in the go runtime). So we keep track
of the notional end of the segment as it moves up and down, and
call brk only when it grows.
Corrects CL 94776.
Updates #23860.
Fixes #24013.
Change-Id: I754232decab81dfd71d690f77ee6097a17d9be11
Reviewed-on: https://go-review.googlesource.com/97595 Reviewed-by: David du Colombier <0intro@gmail.com> Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Robert Griesemer [Tue, 27 Feb 2018 01:35:29 +0000 (17:35 -0800)]
cmd/compile, cmd/compile/internal/syntax: print relative column info
This change enables printing of relative column information if a
prior line directive specified a valid column. If there was no
line directive, or the line directive didn't specify a column
(or the -C flag is specified), no column information is shown in
file positions.
Implementation: Column values (and line values, for that matter)
that are zero are interpreted as "unknown". A line directive that
doesn't specify a column records that as a zero column in the
respective PosBase data structure. When computing relative columns,
a relative value is zero of the base's column value is zero.
When formatting a position, a zero column value is not printed.
To make this work without special cases, the PosBase for a file
is given a concrete (non-0:0) position 1:1 with the PosBase's
line and column also being 1:1. In other words, at the position
1:1 of a file, it's relative positions are starting with 1:1 as
one would expect.
In the package syntax, this requires self-recursive PosBases for
file bases, matching what cmd/internal/src.PosBase was already
doing. In src.PosBase, file and inlining bases also need to be
based at 1:1 to indicate "known" positions.
This change completes the cmd/compiler part of the issue below.
Fixes #22662.
Change-Id: I6c3d2dee26709581fba0d0261b1d12e93f1cba1a
Reviewed-on: https://go-review.googlesource.com/97375 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Keith Randall [Tue, 27 Feb 2018 21:46:03 +0000 (13:46 -0800)]
cmd/compile: mark the first word of an interface as a uintptr
The first word of an interface is a pointer, but for the purposes
of GC we don't need to treat it as such.
1. If it is a non-empty interface, the pointer points to an itab
which is always in persistentalloc space.
2. If it is an empty interface, the pointer points to a _type.
a. If it is a compile-time-allocated type, it points into
the read-only data section.
b. If it is a reflect-allocated type, it points into the Go heap.
Reflect is responsible for keeping a reference to
the underlying type so it won't be GCd.
If we ever have a moving GC, we need to change this for 2b (as
well as scan itabs to update their itab._type fields).
Write barriers on the first word of interfaces have already been removed.
Change-Id: I643e91d7ac4de980ac2717436eff94097c65d959
Reviewed-on: https://go-review.googlesource.com/97518
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
Explicitly whitelist args of OpSelect{1|2} that zero upper 32 bits.
Use better values in corresponding test.
This should have been a part of CL 96815, but it was submitted, before
relevant comments.
Ilya Tocar [Tue, 20 Feb 2018 16:59:19 +0000 (10:59 -0600)]
cmd/compile/internal/ssa: combine byte stores on amd64
On amd64 we optimize encoding/binary.BigEndian.PutUint{16,32,64}
into bswap + single store, but strangely enough not LittleEndian.PutUint{16,32}.
We have similar rules, but they use 64-bit shifts everywhere,
and fail for 16/32-bit case. Add rules that matchLittleEndian.PutUint,
and relevant tests. Performance results:
Heschi Kreinick [Wed, 24 Jan 2018 18:26:15 +0000 (13:26 -0500)]
cmd/link: fix up location lists for dsymutil
LLVM tools, particularly lldb and dsymutil, don't support base address
selection entries in location lists. When targeting GOOS=darwin,
mode, have the linker translate location lists to CU-relative form
instead.
Technically, this isn't necessary when linking internally, as long as
nobody plans to use anything other than Delve to look at the DWARF. But
someone might want to use lldb, and it's really confusing when dwarfdump
shows gibberish for the location entries. The performance cost isn't
noticeable, so enable it even for internal linking.
Doing this in the linker is a little weird, but it was more expensive in
the compiler, probably because the compiler is much more stressful to
the GC. Also, if we decide to only do it for external linking, the
compiler can't see the link mode.
Benchmark before and after this commit on Mac with -dwarflocationlists=1:
name old time/op new time/op delta
StdCmd 21.3s ± 1% 21.3s ± 1% ~ (p=0.310 n=27+27)
Only StdCmd is relevant, because only StdCmd runs the linker. Whatever
the cost is here, it's not very large.
Change-Id: I200246dedaee4f824966f7551ac95f8d7123d3b1
Reviewed-on: https://go-review.googlesource.com/89535 Reviewed-by: David Chase <drchase@google.com>
Philip Hofer [Wed, 21 Feb 2018 21:35:08 +0000 (13:35 -0800)]
cmd/compile/internal/ssa: clear branch likeliness in clobberBlock
The branchelim pass makes some blocks unreachable, but does not
remove them from Func.Values. Consequently, ssacheck complains
when it finds a block with a non-zero likeliness value but no
successors.
Joe Tsai [Fri, 16 Feb 2018 23:35:35 +0000 (15:35 -0800)]
go/doc: replace unexported values with underscore if necessary
When a var or const declaration contains a mixture of exported and unexported
identifiers, replace the unexported identifiers with underscore.
Otherwise, the LHS and the RHS may mismatch or the declaration may mismatch
with an iota from above.
Fixes #22426
Change-Id: Icd5fb81b4ece647232a9f7d05cb140227091e9cb
Reviewed-on: https://go-review.googlesource.com/94877
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
Giovanni Bajo [Tue, 20 Feb 2018 08:39:09 +0000 (09:39 +0100)]
cmd/compile: fix bit-test rules for highest bit
Bit-test rules failed to match when matching the highest bit
of a word because operands in SSA are signed int64. Fix
them by treating them as unsigned (and correctly handling
32-bit operands as well).
Tests will be added in next CL.
Change-Id: I491c4e88e7e2f87e9bb72bd0d9fa5d4025b90736
Reviewed-on: https://go-review.googlesource.com/94765 Reviewed-by: Keith Randall <khr@golang.org>
Moving tighten after lowering benefits from the removal of values by
lowering and lowered CSE. It lets us make better decisions about
which values are rematerializable and which generate flags.
Empirically, it lowers stack usage (by avoiding spills)
and generates slightly smaller and faster binaries.
Keith Randall [Wed, 3 Jan 2018 22:38:55 +0000 (14:38 -0800)]
cmd/compile: implement comparisons directly with memory
Allow the compiler to generate code like CMPQ 16(AX), $7
It's tricky because it's difficult to spill such a comparison during
flagalloc, because the same memory state might not be available at
the restore locations.
Solve this problem by decomposing the compare+load back into its parts
if it needs to be spilled.
The big win is that the write barrier test goes from:
MOVL runtime.writeBarrier(SB), CX
TESTL CX, CX
JNE 60
Kunpei Sakai [Sun, 25 Feb 2018 09:14:20 +0000 (18:14 +0900)]
cmd/compile: fix typechecking in finishcompare
Previously, finishcompare just used SetTypecheck, but this didn't
recursively update any untyped bool typed subexpressions. This CL
changes it to call typecheck, which correctly handles this.
Also cleaned up outdated code for simplifying logic.
Updates #23834
Change-Id: Ic7f92d2a77c2eb74024ee97815205371761c1c90
Reviewed-on: https://go-review.googlesource.com/97035 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Ilya Tocar [Fri, 23 Feb 2018 19:46:44 +0000 (13:46 -0600)]
cmd/compile/internal/amd64: use appropriate NEG for div
Currently we generate NEGQ for DIV{Q,L,W}. By generating NEGL and NEGW,
we will reduce code size, because NEGL doesn't require rex prefix.
This also guarantees that upper 32 bits are zeroed, so we can revert CL 85736,
and remove zero-extensions of DIVL results.
Also adds test for redundant zero extend elimination.