Jan Mercl [Tue, 2 Aug 2016 11:00:46 +0000 (13:00 +0200)]
go/token: Fix race in FileSet.PositionFor.
Methods of FileSet are documented to be safe for concurrent use by
multiple goroutines, so FileSet is protected by a mutex and all its
methods use it to prevent concurrent mutations. All methods of File that
mutate the respective FileSet, including AddLine, do also lock its
mutex, but that does not help when PositionFor is invoked concurrently
and reads without synchronization what AddLine mutates.
The change adds acquiring a RLock around the racy call of File.position
and the respective test.
Michael Hudson-Doyle [Thu, 11 Aug 2016 22:31:17 +0000 (10:31 +1200)]
cmd/link: when dynlinking, do not mangle short symbol names
When dynamically linking, a type symbol's name is replaced with a name based on
the SHA1 of the name as type symbol's names can be very long. However, this
can make a type's symbol name longer in some cases. So skip it in that case.
One of the symbols this changes the treatment of is 'type.string' and that fixes a
bug where -X doesn't work when dynamically linking.
Fixes #16671
Change-Id: If5269038261b76fb0ec52e25a9c1d64129631e3c
Reviewed-on: https://go-review.googlesource.com/26890
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org>
Robert Griesemer [Wed, 15 Jun 2016 00:35:36 +0000 (17:35 -0700)]
go/types: fix computation of initialization order
The old algorithm operated on a dependency graph that included
all objects (including functions) for simplicity: it was based
directly on the dependencies collected for each object during
type checking an object's initialization expression. It also
used that graph to compute the objects involved in an erroneous
initialization cycle.
Cycles that consist only of (mutually recursive) functions are
permitted in initialization code; so those cycles were silently
ignored if encountered. However, such cycles still inflated the
number of dependencies a variable might have (due to the cycle),
which in some cases lead to the wrong variable being scheduled
for initialization before the one with the inflated dependency
count.
Correcting for the cycle when it is found is too late since at
that point another variable may have already been scheduled.
The new algorithm computes the initialization dependency graph as
before but adds an extra pass during which functions are eliminated
from the graph (and their dependencies are "back-propagated").
This eliminates the problem of cycles only involving functions
(there are no functions).
When a cycle is found, the new code computes the cycle path from
the original object dependencies so it can still include functions
on the path as before, for the same detailed error message.
The new code also more clearly distinguishes between objects that
can be in the dependency graph (constants, variables, functions),
and objects that cannot, by introducing the dependency type, a new
subtype of Object. As a consequence, the dependency graph is smaller.
Fixes #10709.
Change-Id: Ib58d6ea65cfb279041a0286a2c8e865f11d244eb
Reviewed-on: https://go-review.googlesource.com/24131 Reviewed-by: Alan Donovan <adonovan@google.com>
Brad Fitzpatrick [Wed, 10 Aug 2016 19:46:48 +0000 (19:46 +0000)]
syscall: test Gettimeofday everywhere, not just on Darwin
The Darwin-only restriction was because we were late in the Go 1.7
cycle when the test was added.
In the process, I noticed Gettimeofday wasn't in the "unimplemented
midden heap" section of syscall_nacl.go, despite this line in the
original go1.txt:
pkg syscall, func Gettimeofday(*Timeval) error
So, add it, returning ENOSYS like the others.
Change-Id: Id7e02e857b753f8d079bee335c22368734e92254
Reviewed-on: https://go-review.googlesource.com/26772
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Quentin Smith <quentin@golang.org>
David Chase [Wed, 10 Aug 2016 18:44:57 +0000 (11:44 -0700)]
[dev.ssa] cmd/compile: PPC64, FP to/from int conversions.
Passes ssa_test.
Requires a few new instructions and some scratchpad
memory to move data between G and F registers.
Also fixed comparisons to be correct in case of NaN.
Added missing instructions for run.bash.
Removed some FP registers that are apparently "reserved"
(but that are also apparently also unused except for a
gratuitous multiplication by two when y = x+x would work
just as well).
Cherry Zhang [Wed, 10 Aug 2016 17:24:03 +0000 (13:24 -0400)]
[dev.ssa] cmd/compile, etc.: more ARM64 optimizations, and enable SSA by default
Add more ARM64 optimizations:
- use hardware zero register when it is possible.
- use shifted ops.
The assembler supports shifted ops but not documented, nor knows
how to print it. This CL adds them.
- enable fast division.
This was disabled because it makes the old backend generate slower
code. But with SSA it generates faster code.
Keith Randall [Thu, 11 Aug 2016 19:56:36 +0000 (12:56 -0700)]
[dev.ssa] cmd/compile: simplify 386+PIC+globals a bit
We shouldn't issue instructions like MOVL foo(SB), AX directly from the
SSA backend. Instead we should do LEAL foo(SB), AX; MOVL (AX), AX.
This simplifies obj logic because now only LEAL needs to be treated
specially. The register allocator uses the LEAL to in effect allocate
the temporary register required for the shared library thunk calls.
Also, the LEALs can now be CSEd. So code like
var g int
func f() { g += 5 }
Requires only one thunk call instead of 2.
Change-Id: Ib87d465f617f73af437445871d0ea91a630b2355
Reviewed-on: https://go-review.googlesource.com/26814
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
Keith Randall [Thu, 11 Aug 2016 16:49:48 +0000 (09:49 -0700)]
[dev.ssa] cmd/compile: fix fp constant loads for 386+PIC
In position-independent 386 code, loading floating-point constants from
the constant pool requires two steps: materializing the address of
the constant pool entry (requires calling a thunk) and then loading
from that address.
Before this CL, the materializing happened implicitly in CX, which
clobbered that register.
Change-Id: Id094e0fb2d3be211089f299e8f7c89c315de0a87
Reviewed-on: https://go-review.googlesource.com/26811
Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org> Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Cherry Zhang [Wed, 3 Aug 2016 15:47:40 +0000 (11:47 -0400)]
[dev.ssa] cmd/internal/obj/arm64: fix encoding constant into some instructions
When a constant can be encoded in a logical instruction (BITCON), do
it this way instead of using the constant pool. The BITCON testing
code runs faster than table lookup (using map):
(on AMD64 machine, with pseudo random input)
BenchmarkIsBitcon-4 300000000 4.04 ns/op
BenchmarkTable-4 50000000 27.3 ns/op
The equivalent C code of BITCON testing is formally verified with
model checker CBMC against linear search of the lookup table.
Also handle cases when a constant can be encoded in a MOV instruction.
In this case, materializa the constant into REGTMP without using the
constant pool.
When constants need to be added to the constant pool, make sure to
check whether it fits in 32-bit. If not, store 64-bit.
Both legacy and SSA compiler backends are happy with this.
Keith Randall [Tue, 26 Jul 2016 18:51:33 +0000 (11:51 -0700)]
[dev.ssa] cmd/compile: implement GO386=387
Last part of the 386 SSA port.
Modify the x86 backend to simulate SSE registers and
instructions with 387 registers and instructions.
The simulation isn't terribly performant, but it works,
and the old implementation wasn't very performant either.
Leaving to people who care about 387 to optimize if they want.
Turn on SSA backend for 386 by default.
Fixes #16358
Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0
Reviewed-on: https://go-review.googlesource.com/25271
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Keith Randall [Tue, 9 Aug 2016 20:58:06 +0000 (13:58 -0700)]
[dev.ssa] cmd/compile: more fixes for 386 shared libraries
Use the destination register for materializing the pc
for GOT references also. See https://go-review.googlesource.com/c/25442/
The SSA backend assumes CX does not get clobbered for these instructions.
Mark duffzero as clobbering CX. The linker needs to clobber CX
to materialize the address to call. (This affects the non-shared-library
duffzero also, but hopefully forbidding one register across duffzero
won't be a big deal.)
Hopefully this is all the cases where the linker is clobbering CX
under the hood and SSA assumes it isn't.
Change-Id: I080c938170193df57cd5ce1f2a956b68a34cc886
Reviewed-on: https://go-review.googlesource.com/26611
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
The call returns the PC of the next instruction in a register.
The next instruction then offsets from that register to get the
address required. The tricky part is the allocation of the
temp register. The legacy compiler always used CX, and forbid
the register allocator from allocating CX when in PIC mode.
We can't easily do that in SSA because CX is actually a required
register for shift instructions. (I think the old backend got away
with this because the register allocator never uses CX, only
codegen knows that shifts must use CX.)
Instead, we allow the temp register to be anything. When the
destination of the MOV (or LEA) is an integer register, we can
use that register. Otherwise, we make sure to compile the
operation using an LEA to reference the global. So
MOVL AX, foo(SB)
is never generated directly. Instead, SSA generates:
So this CL modifies the obj package to use different thunks
to materialize the pc into different registers. We use the
registers that regalloc chose so that SSA can still allocate
the full set of registers.
Change-Id: Ie095644f7164a026c62e95baf9d18a8bcaed0bba
Reviewed-on: https://go-review.googlesource.com/25442
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
Brad Fitzpatrick [Mon, 8 Aug 2016 17:14:01 +0000 (17:14 +0000)]
net/http: make Transport use new connection if over HTTP/2 concurrency limit
The Go HTTP/1 client will make as many new TCP connections as the user requests.
The HTTP/2 client tried to have that behavior, but the policy of
whether a connection is re-usable didn't take into account the extra 1
stream counting against SETTINGS_MAX_CONCURRENT_STREAMS so in practice
users were getting errors.
For example, if the server's advertised max concurrent streams is 100
and 200 concurrrent Go HTTP requests ask for a connection at once, all
200 will think they can reuse that TCP connection, but then 100 will
fail later when the number of concurrent streams exceeds 100.
Instead, recognize the "no cached connections" error value in the
shouldRetryRequest method, so those 100 will retry a new connection.
This is the conservative fix for Go 1.7 so users don't get errors, and
to match the HTTP/1 behavior. Issues #13957 and #13774 are the more
involved bugs for Go 1.8.
Updates #16582
Updates #13957
Change-Id: I1f15a7ce60c07a4baebca87675836d6fe03993e8
Reviewed-on: https://go-review.googlesource.com/25580
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Chris Broadfoot <cbro@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Brad Fitzpatrick [Sat, 6 Aug 2016 17:12:03 +0000 (10:12 -0700)]
doc: fix required OS X version inconsistency for binary downloads
Updates #16625
Change-Id: Icac6705828bd9b29379596ba64b34d922b9002c3
Reviewed-on: https://go-review.googlesource.com/25548 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Shenghou Ma [Fri, 5 Aug 2016 23:16:07 +0000 (19:16 -0400)]
runtime: make stack 16-byte aligned for external code in _rt0_amd64_linux_lib
Fixes #16618.
Change-Id: Iffada12e8672bbdbcf2e787782c497e2c45701b1
Reviewed-on: https://go-review.googlesource.com/25550
Run-TryBot: Minux Ma <minux@golang.org> Reviewed-by: Arjan Van De Ven <arjan.van.de.ven@intel.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
David Chase [Tue, 2 Aug 2016 20:17:09 +0000 (13:17 -0700)]
[dev.ssa] cmd/compile: PPC64, add cmp->bool, some shifts, hmul
Includes hmul (all widths)
compare for boolean result and simplifications
shift operations plus changes/additions for implementation
(ORN, ADDME, ADDC)
David Crawshaw [Thu, 4 Aug 2016 17:09:29 +0000 (13:09 -0400)]
runtime: initialize hash algs before typemap
When compiling with -buildmode=shared, a map[int32]*_type is created for
each extra module mapping duplicate types back to a canonical object.
This is done in the function typelinksinit, which is called before the
init function that sets up the hash functions for the map
implementation. The result is typemap becomes unusable after
runtime initialization.
The fix in this CL is to move algorithm init before typelinksinit in
the runtime setup process. (For 1.8, we may want to turn typemap into
a sorted slice of types and use binary search.)
Manually tested on GOOS=linux with:
GOHOSTARCH=386 GOARCH=386 ./make.bash && \
go install -buildmode=shared std && \
cd ../test && \
go run run.go -linkshared
Fixes #16590
Change-Id: Idc08c50cc70d20028276fbf564509d2cd5405210
Reviewed-on: https://go-review.googlesource.com/25469
Run-TryBot: David Crawshaw <crawshaw@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
[dev.ssa] cmd/compile: refactor out rulegen value parsing
Previously, genMatch0 and genResult0 contained
lots of duplication: locating the op, parsing
the value, validation, etc.
Parsing and validation was mixed in with code gen.
Extract a helper, parseValue. It is responsible
for parsing the value, locating the op, and doing
shared validation.
As a bonus (and possibly as my original motivation),
make op selection pay attention to the number
of args present.
This allows arch-specific ops to share a name
with generic ops as long as there is no ambiguity.
It also detects and reports unresolved ambiguity,
unlike before, where it would simply always
pick the generic op, with no warning.
Also use parseValue when generating the top-level
op dispatch, to ensure its opinion about ops
matches genMatch0 and genResult0.
The order of statements in the generated code used
to depend on the exact rule. It is now somewhat
independent of the rule. That is the source
of some of the generated code changes in this CL.
See rewritedec64 and rewritegeneric for examples.
It is a one-time change.
The op dispatch switch and functions used to be
sorted by opname without architecture. The sort
now includes the architecture, leading to further
generated code changes.
See rewriteARM and rewriteAMD64 for examples.
Again, it is a one-time change.
Brad Fitzpatrick [Tue, 2 Aug 2016 04:54:40 +0000 (21:54 -0700)]
runtime: fix nanotime for macOS Sierra, again.
macOS Sierra beta4 changed the kernel interface for getting time.
DX now optionally points to an address for additional info.
Set it to zero to avoid corrupting memory.
Fixes #16570
Change-Id: I9f537e552682045325cdbb68b7d0b4ddafade14a
Reviewed-on: https://go-review.googlesource.com/25400 Reviewed-by: David Crawshaw <crawshaw@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Quentin Smith <quentin@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Joe Tsai [Tue, 2 Aug 2016 03:04:25 +0000 (20:04 -0700)]
cmd/doc: ensure functions with unexported return values are shown
The commit in golang.org/cl/22354 groups constructors functions under
the type that they construct to. However, this caused a minor regression
where functions that had unexported return values were not being printed
at all. Thus, we forgo the grouping logic if the type the constructor falls
under is not going to be printed.
Fixes #16568
Change-Id: Idc14f5d03770282a519dc22187646bda676af612
Reviewed-on: https://go-review.googlesource.com/25369
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Joe Tsai [Mon, 1 Aug 2016 21:33:19 +0000 (14:33 -0700)]
cmd/doc: handle embedded interfaces properly
Changes made:
* Disallow star expression on interfaces as this is not possible.
* Show an embedded "error" in an interface as public similar to
how godoc does it.
* Properly handle selector expressions in both structs and interfaces.
This is possible since a type may refer to something defined in
another package (e.g. io.Reader).
Before:
<<<
$ go doc runtime.Error
type Error interface {
// RuntimeError is a no-op function but
// serves to distinguish types that are run time
// errors from ordinary errors: a type is a
// run time error if it has a RuntimeError method.
RuntimeError()
// Has unexported methods.
}
$ go doc compress/flate Reader
doc: invalid program: unexpected type for embedded field
doc: invalid program: unexpected type for embedded field
type Reader interface {
io.Reader
io.ByteReader
}
>>>
After:
<<<
$ go doc runtime.Error
type Error interface {
error
// RuntimeError is a no-op function but
// serves to distinguish types that are run time
// errors from ordinary errors: a type is a
// run time error if it has a RuntimeError method.
RuntimeError()
}
$ go doc compress/flate Reader
type Reader interface {
io.Reader
io.ByteReader
}
>>>
Fixes #16567
Change-Id: I272dede971eee9f43173966233eb8810e4a8c907
Reviewed-on: https://go-review.googlesource.com/25365 Reviewed-by: Rob Pike <r@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
cmd/compile: fix possible spill of invalid pointer with DUFFZERO on AMD64
SSA compiler on AMD64 may spill Duff-adjusted address as scalar. If
the object is on stack and the stack moves, the spilled address become
invalid.
Making the spill pointer-typed does not work. The Duff-adjusted address
points to the memory before the area to be zeroed and may be invalid.
This may cause stack scanning code panic.
Fix it by doing Duff-adjustment in genValue, so the intermediate value
is not seen by the reg allocator, and will not be spilled.
Add a test to cover both cases. As it depends on allocation, it may
be not always triggered.
doc/go1.7.html: add known issues section for FreeBSD crashes
Updates #16396
Change-Id: I7b4f85610e66f2c77c17cf8898cc41d81b2efc8c
Reviewed-on: https://go-review.googlesource.com/25283 Reviewed-by: Chris Broadfoot <cbro@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Andrew Gerrand <adg@golang.org>
[dev.ssa] cmd/compile: fix build for old backend on ARM64
Apparently the old backend needs NEG instruction having RegRead set,
even this instruction does not take a Reg field... I don't think SSA
uses this flag, so just leave it as it was. SSA is still happy.
Fix ARM64 build on https://build.golang.org/?branch=dev.ssa
Change-Id: Ia7e7f2ca217ddae9af314d346af5406bbafb68e8
Reviewed-on: https://go-review.googlesource.com/25302 Reviewed-by: David Chase <drchase@google.com>
Rhys Hiltner [Fri, 22 Jul 2016 23:36:30 +0000 (16:36 -0700)]
runtime: reduce GC assist extra credit
Mutator goroutines that allocate memory during the concurrent mark
phase are required to spend some time assisting the garbage
collector. The magnitude of this mandatory assistance is proportional
to the goroutine's allocation debt and subject to the assistance
ratio as calculated by the pacer.
When assisting the garbage collector, a mutator goroutine will go
beyond paying off its allocation debt. It will build up extra credit
to amortize the overhead of the assist.
In fast-allocating applications with high assist ratios, building up
this credit can take the affected goroutine's entire time slice.
Reduce the penalty on each goroutine being selected to assist the GC
in two ways, to spread the responsibility more evenly.
First, do a consistent amount of extra scan work without regard for
the pacer's assistance ratio. Second, reduce the magnitude of the
extra scan work so it can be completed within a few hundred
microseconds.
Commentary on gcOverAssistWork is by Austin Clements, originally in
https://golang.org/cl/24704
[dev.ssa] cmd/compile: fix possible invalid pointer spill in large Zero/Move on ARM
Instead of comparing the address of the end of the memory to zero/copy,
comparing the address of the last element, which is a valid pointer.
Also unify large and unaligned Zero/Move, by passing alignment as AuxInt.
Support the following:
- Shifts. ARM64 machine instructions only use lowest 6 bits of the
shift (i.e. mod 64). Use conditional selection instruction to
ensure Go semantics.
- Zero/Move. Alignment is ensured.
- Hmul, Avg64u, Sqrt.
- reserve R18 (platform register in ARM64 ABI) and R29 (frame pointer
in ARM64 ABI).
Everything compiles, all.bash passed (with non-SSA test disabled).
Currently the pprof package gives almost no guidance for how to use it
and, despite the standard boilerplate used to create CPU and memory
profiles, this boilerplate appears nowhere in the pprof documentation.
Update the pprof package documentation to give the standard
boilerplate in a form people can copy, paste, and tweak. This
boilerplate is based on rsc's 2011 blog post on profiling Go programs
at https://blog.golang.org/profiling-go-programs, which is where I
always go when I need to copy-paste the boilerplate.
Change-Id: I74021e494ea4dcc6b56d6fb5e59829ad4bb7b0be
Reviewed-on: https://go-review.googlesource.com/25182 Reviewed-by: Rick Hudson <rlh@golang.org>
Jack Lindamood [Fri, 15 Jul 2016 20:28:27 +0000 (13:28 -0700)]
context: add test for WithDeadline in the past
Adds a test case for calling context.WithDeadline() where the deadline
exists in the past. This change increases the code coverage of the
context package.
net/http: make Transport.RoundTrip return raw Conn.Read error on peek failure
From at least Go 1.4 to Go 1.6, Transport.RoundTrip would return the
error value from net.Conn.Read directly when the initial Read (1 byte
Peek) failed while reading the HTTP response, if a request was
outstanding. While never a documented or tested promise, Go 1.7 changed the
behavior (starting at https://golang.org/cl/23160).
This restores the old behavior and adds a test (but no documentation
promises yet) while keeping the fix for spammy logging reported in #15446.
This looks larger than it is: it just changes errServerClosedConn from
a variable to a type, where the type preserves the underlying
net.Conn.Read error, for unwrapping later in Transport.RoundTrip.
Fixes #16465
Change-Id: I6fa018991221e93c0cfe3e4129cb168fbd98bd27
Reviewed-on: https://go-review.googlesource.com/25153 Reviewed-by: Andrew Gerrand <adg@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Because PPC lacks store-immediate, remove the instruction
that implies that it exists. Replace it with storezero for
the special case of storing zero, because R0 is reserved zero
for Go (though the assembler knows this, do it in SSA).
Also added address folding for storezero.
(Now corrected to use right-sized stores in bulk-zero code.)
Keith Randall [Thu, 21 Jul 2016 17:37:59 +0000 (10:37 -0700)]
[dev.ssa] cmd/compile: 386 port now works
GOARCH=386 SSATEST=1 ./all.bash passes
Caveat: still needs changes to test/ files to use *_ssa.go versions. I
won't check those changes in with this CL because the builders will
complain as they don't have SSATEST=1.
Mostly minor fixes.
Implement float <-> uint32 in assembly. It seems the simplest option
for now.
GO386=387 does not work. That's why I can't make SSA the default for
386 yet.
It was removed in upstream Chrome https://codereview.chromium.org/2016863004
Rather than update to the latest version, make the minimal change for Go 1.7 and
change the "showToUser" boolean from true to false.
Tested by hand that it goes away after this change.
Updates #16247
Change-Id: I051f49da878e554b1a34a88e9abc70ab50e18780
Reviewed-on: https://go-review.googlesource.com/25117 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
David Chase [Wed, 20 Jul 2016 14:44:49 +0000 (10:44 -0400)]
cmd/compile: change phi location to be optimistic at backedges
This is:
(1) a simple trick that cuts the number of phi-nodes
(temporarily) inserted into the ssa representation by a factor
of 10, and can cut the user time to compile tricky inputs like
gogo/protobuf tests from 13 user minutes to 9.5, and memory
allocation from 3.4GB to 2.4GB.
(2) a fix to sparse lookup, that does not rely on
an assumption proven false by at least one pathological
input "etldlen".
These two changes fix unrelated compiler performance bugs,
both necessary to obtain good performance compiling etldlen.
Without them it takes 20 minutes or longer, with them it
completes in 2 minutes, without a gigantic memory footprint.
Updates #16407
Change-Id: Iaa8aaa8c706858b3d49de1c4865a7fd79e6f4ff7
Reviewed-on: https://go-review.googlesource.com/23136 Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Keith Randall [Tue, 19 Jul 2016 06:06:04 +0000 (23:06 -0700)]
cmd/compile: move phi args which are constants closer to the phi
entry:
x = MOVQconst [7]
...
b1:
goto b2
b2:
v = Phi(x, y, z)
Transform that program to:
entry:
...
b1:
x = MOVQconst [7]
goto b2
b2:
v = Phi(x, y, z)
This CL moves constant-generating instructions used by a phi to the
appropriate immediate predecessor of the phi's block.
We used to put all constants in the entry block. Unfortunately, in
large functions we have lots of constants at the start of the
function, all of which are used by lots of phis throughout the
function. This leads to the constants being live through most of the
function (especially if there is an outer loop). That's an O(n^2)
problem.
Note that most of the non-phi uses of constants have already been
folded into instructions (ADDQconst, MOVQstoreconst, etc.).
This CL may be generally useful for other instances of compiler
slowness, I'll have to check. It may cause some programs to run
slower, but probably not by much, as rematerializeable values like
these constants are allocated late (not at their originally scheduled
location) anyway.
This CL is definitely a minimal change that can be considered for 1.7.
We probably want to do a better job in the tighten pass generally, not
just for phi args. Leaving that for 1.8.
Ian Lance Taylor [Wed, 20 Jul 2016 22:40:10 +0000 (15:40 -0700)]
runtime: add explicit `INT $3` at end of Darwin amd64 sigtramp
The omission of this instruction could confuse the traceback code if a
SIGPROF occurred during a signal handler. The traceback code would
trace up to sigtramp, but would then get confused because it would see a
PC address that did not appear to be in the function.
runtime: support smaller physical pages than PhysPageSize
Most operations need an upper bound on the physical page size, which
is what sys.PhysPageSize is for (this is checked at runtime init on
Linux). However, a few operations need a *lower* bound on the physical
page size. Introduce a "minPhysPageSize" constant to act as this lower
bound and use it where it makes sense:
1) In addrspace_free, we have to query each page in the given range.
Currently we increment by the upper bound on the physical page
size, which means we may skip over pages if the true size is
smaller. Worse, we currently pass a result buffer that only has
enough room for one page. If there are actually multiple pages in
the range passed to mincore, the kernel will overflow this buffer.
Fix these problems by incrementing by the lower-bound on the
physical page size and by passing "1" for the length, which the
kernel will round up to the true physical page size.
2) In the write barrier, the bad pointer check tests for pointers to
the first physical page, which are presumably small integers
masquerading as pointers. However, if physical pages are smaller
than we think, we may have legitimate pointers below
sys.PhysPageSize. Hence, use minPhysPageSize for this test since
pointers should never fall below that.
In particular, this applies to ARM64 and MIPS. The runtime is
configured to use 64kB pages on ARM64, but by default Linux uses 4kB
pages. Similarly, the runtime assumes 16kB pages on MIPS, but both 4kB
and 16kB kernel configurations are common. This also applies to ARM on
systems where the runtime is recompiled to deal with a larger page
size. It is also a step toward making the runtime use only a
dynamically-queried page size.
Change-Id: I1fdfd18f6e7cbca170cc100354b9faa22fde8a69
Reviewed-on: https://go-review.googlesource.com/25020 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
The leak was reported internally on a sever canary that runs for days.
After a day server consumes 5.6GB, after 6 days -- 12.2GB.
The leak is exposed by the added benchmark.
The leak is fixed upstream in :
http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/tsan/rtl/tsan_rtl_thread.cc?view=diff&r1=276102&r2=276103&pathrev=276103
Fixes #16441
Change-Id: I9d4f0adef48ca6cf2cd781b9a6990ad4661ba49b
Reviewed-on: https://go-review.googlesource.com/25091 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Ian Lance Taylor [Tue, 19 Jul 2016 06:00:43 +0000 (23:00 -0700)]
runtime: add as many extra M's as needed
When a non-Go thread calls into Go, the runtime needs an M to run the Go
code. The runtime keeps a list of extra M's available. When the last
extra M is allocated, the needextram field is set to tell it to allocate
a new extra M as soon as it is running in Go. This ensures that an extra
M will always be available for the next thread.
However, if many threads need an extra M at the same time, this
serializes them all. One thread will get an extra M with the needextram
field set. All the other threads will see that there is no M available
and will go to sleep. The one thread that succeeded will create a new
extra M. One lucky thread will get it. All the other threads will see
that there is no M available and will go to sleep. The effect is
thundering herd, as all the threads looking for an extra M go through
the process one by one. This seems to have a particularly bad effect on
the FreeBSD scheduler for some reason.
With this change, we track the number of threads waiting for an M, and
create all of them as soon as one thread gets through. This still means
that all the threads will fight for the lock to pick up the next M. But
at least each thread that gets the lock will succeed, instead of going
to sleep only to fight again.
This smooths out the performance greatly on FreeBSD, reducing the
average wall time of `testprogcgo CgoCallbackGC` by 74%. On GNU/Linux
the average wall time goes down by 9%.
Fixes #13926
Fixes #16396
Change-Id: I6dc42a4156085a7ed4e5334c60b39db8f8ef8fea
Reviewed-on: https://go-review.googlesource.com/25047
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>