Tamir Duberstein [Sun, 8 Nov 2015 00:30:25 +0000 (19:30 -0500)]
cmd/internal/ld: skip dwarf output if dsymutil no-ops
Fixes #11994.
Change-Id: Icee6ffa6e3a9d15b68b4ae9b2716d65ecbdba73a
Reviewed-on: https://go-review.googlesource.com/16702 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Brad Fitzpatrick [Sun, 8 Nov 2015 10:45:04 +0000 (11:45 +0100)]
net/http: update bundled http2 revision
Updates to git rev 042ba42f (https://golang.org/cl/16734)
This moves all the code for glueing the HTTP1 and HTTP2 transports
together out of net/http and into x/net/http2 where others can use it,
and where it has tests.
Keith Randall [Thu, 5 Nov 2015 20:39:56 +0000 (12:39 -0800)]
runtime: memmove/memclr pointers atomically
Make sure that we're moving or zeroing pointers atomically.
Anything that is a multiple of pointer size and at least
pointer aligned might have pointers in it. All the code looks
ok except for the 1-pointer-sized moves.
Keith Randall [Fri, 6 Nov 2015 08:06:52 +0000 (00:06 -0800)]
runtime: teach peephole optimizer that duffcopy clobbers X0
Duffcopy now uses X0, as of 5cf281a. Teach the peephole
optimizer that duffcopy clobbers X0 so that it does not
rename registers use X0 across the duffcopy instruction.
Fixes #13171
Change-Id: I389cbf1982cb6eb2f51e6152ac96736a8589f085
Reviewed-on: https://go-review.googlesource.com/16715
Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Minux Ma <minux@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Joe Tsai [Thu, 1 Oct 2015 09:30:29 +0000 (02:30 -0700)]
archive/tar: detect truncated files
Motivation:
* Reader.skipUnread never reports io.ErrUnexpectedEOF. This is strange
given that io.ErrUnexpectedEOF is given through Reader.Read if the
user manually reads the file.
* Reader.skipUnread fails to detect truncated files since io.Seeker
is lazy about reporting errors. Thus, the behavior of Reader differs
whether the input io.Reader also satisfies io.Seeker or not.
To solve this, we seek to one before the end of the data section and
always rely on at least one call to io.CopyN. If the tr.r satisfies
io.Seeker, this is guarunteed to never read more than blockSize.
David du Colombier [Thu, 5 Nov 2015 09:20:32 +0000 (10:20 +0100)]
cmd/go: skip TestBuildOutputToDevNull on Plan 9
TestBuildOutputToDevNull was added in CL 16585.
However, copying to /dev/null couldn't work on Plan 9,
because /dev/null is a regular file. Since it's not
different from any other file, the logic in copyFile
couldn't distinguish it from another, already existing,
file, that we wouldn't want to overwrite.
Change-Id: Ie8d353f318fedfc7cfb9541fed00a2397e232592
Reviewed-on: https://go-review.googlesource.com/16691 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: David du Colombier <0intro@gmail.com>
Michael Hudson-Doyle [Thu, 5 Nov 2015 21:10:07 +0000 (10:10 +1300)]
cmd/internal/obj/ppc64: fix assembly of SRADCC with immediate
sradi and sradi. hide the top bit of their immediate argument apart from the
rest of it, but the code only handled the sradi case.
I'm pretty sure this is the only instruction missing (a couple of the rotate
instructions encode their immediate the same way but their handling looks OK).
This fixes the failure of "GOARCH=amd64 ~/go/bin/go install -v runtime" as
reported in the bug.
Fixes #11987
Change-Id: I0cdefcd7a04e0e8fce45827e7054ffde9a83f589
Reviewed-on: https://go-review.googlesource.com/16710 Reviewed-by: Minux Ma <minux@golang.org>
David du Colombier [Thu, 5 Nov 2015 08:41:55 +0000 (09:41 +0100)]
cmd/go: skip TestGoGenerateEnv on Plan 9
TestGoGenerateEnv was added in CL 16537.
However, Plan 9 doesn't have the env command.
Change-Id: I5f0c937a1b9b456dcea41ceac7865112f2f65c45
Reviewed-on: https://go-review.googlesource.com/16690 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David du Colombier <0intro@gmail.com>
Austin Clements [Tue, 27 Oct 2015 21:34:11 +0000 (17:34 -0400)]
runtime: remove GC start up/shutdown workaround in mallocgc
Currently mallocgc detects if the GC is in a state where it can't
assist, but also can't allocate uncontrolled and yields to help out
the GC. This was a workaround for periods when we were trying to
schedule the GC coordinator. It is no longer necessary because there
is no GC coordinator and malloc can always assist with any GC
transitions that are necessary.
Austin Clements [Mon, 26 Oct 2015 15:27:37 +0000 (11:27 -0400)]
runtime: decentralize mark done and mark termination
This moves all of the mark 1 to mark 2 transition and mark termination
to the mark done transition function. This means these transitions are
now handled on the goroutine that detected mark completion. This also
means that the GC coordinator and the background completion barriers
are no longer used and various workarounds to yield to the coordinator
are no longer necessary. These will be removed in follow-up commits.
One consequence of this is that mark workers now need to be
preemptible when performing the mark done transition. This allows them
to stop the world and to perform the final clean-up steps of GC after
restarting the world. They are only made preemptible while performing
this transition, so if the worker findRunnableGCWorker would schedule
isn't available, we didn't want to schedule it anyway.
Austin Clements [Mon, 26 Oct 2015 20:48:36 +0000 (16:48 -0400)]
runtime: account mark worker time before gcMarkDone
Currently gcMarkDone takes basically no time, so it's okay to account
the worker time after calling it. However, gcMarkDone is about to take
potentially *much* longer because it may perform all of mark
termination. Prepare for this by swapping the order so we account the
time before calling gcMarkDone.
Austin Clements [Sun, 25 Oct 2015 01:30:59 +0000 (21:30 -0400)]
runtime: factor mark done transition
Currently the code for completion of mark 1/mark 2 is duplicated in
background workers and assists. Factor this in to a single function
that will serve as the transition function for concurrent mark.
Austin Clements [Fri, 23 Oct 2015 21:44:11 +0000 (17:44 -0400)]
runtime: eliminate mark completion in scheduler
Currently, findRunnableGCWorker will perform mark completion if there
is no remaining work and no running workers. This used to be necessary
to resolve a race in the transition from mark 1 to mark 2 where we
would enter mark 2 with no mark work (and no dedicated workers), so no
workers would run, so no worker would signal mark completion.
However, we're about to make mark completion also perform the entire
follow-on process, which includes mark termination. We really don't
want to do that in the scheduler if it happens to detect completion.
Conveniently, this hack is no longer necessary because we always
enqueue root scanning work at the beginning of both mark 1 and mark 2,
so a mark worker will always run. Hence, we can simply eliminate it.
Austin Clements [Wed, 4 Nov 2015 00:47:07 +0000 (19:47 -0500)]
runtime: don't start idle mark workers when barriers are cleared
Currently, we don't start dedicated or fractional mark workers unless
the mark 1 or mark 2 barriers have been cleared. One intended
consequence of this is that no background workers run between the
forEachP that disposes all gcWork caches and the beginning of mark 2.
However, we (unintentionally) did not apply this restriction to idle
mark workers. As a result, these can start in the interim between mark
1 completion and mark 2 starting. This explains why it was necessary
to reset the root marking jobs using carefully ordered atomic writes
when setting up mark 2. It also means that, even though we definitely
enqueue work before starting mark 2, it may be drained by the time we
reset the mark 2 barrier. If this happens, currently the only thing
preventing the runtime from deadlocking is that the scheduler itself
also checks for mark completion and will signal mark 2 completion.
Were it not for the odd behavior of idle workers, this check in the
scheduler would not be necessary.
Clean all of this up and prepare to remove this check in the scheduler
by applying the same restriction to starting idle mark workers.
Austin Clements [Fri, 23 Oct 2015 19:55:03 +0000 (15:55 -0400)]
runtime: decentralize sweep termination and mark transition
This moves all of GC initialization, sweep termination, and the
transition to concurrent marking in to the off->mark transition
function. This means it's now handled on the goroutine that detected
the state exit condition.
As a result, malloc no longer needs to Gosched() at the beginning of
the GC cycle to prevent over-allocation while the GC is starting up
because it will now *help* the GC to start up. The Gosched hack is
still necessary during GC shutdown (this is easy to test by enabling
gctrace and hitting Ctrl-S to block the gctrace output).
At this point, the GC coordinator still handles later phases. This
requires a small tweak to how we start the GC coordinator. Currently,
starting the GC coordinator is best-effort and may fail if the
coordinator is about to park from the previous cycle but hasn't yet.
We fix this by replacing the park/ready to wake up the coordinator
with a semaphore. This is temporary since the coordinator will be
going away in a few commits.
This moves concurrent sweep termination from the coordinator to the
off->mark transition. This allows it to be performed by all Gs
attempting to start the GC.
Austin Clements [Fri, 23 Oct 2015 18:15:18 +0000 (14:15 -0400)]
runtime: beginning of decentralized off->mark transition
This begins the conversion of the centralized GC coordinator to a
decentralized state machine by introducing the internal API that
triggers the first state transition from _GCoff to _GCmark (or
_GCmarktermination).
This change introduces the transition lock, the off->mark transition
condition (which is very similar to shouldtriggergc()), and the
general structure of a state transition. Since we're doing this
conversion in stages, it then falls back to the GC coordinator to
actually execute the cycle. We'll start moving logic out of the GC
coordinator and in to transition functions next.
This fixes a minor bug in gcstoptheworld debug mode where passing the
heap trigger once could trigger multiple STW GCs.
Austin Clements [Sun, 25 Oct 2015 01:19:52 +0000 (21:19 -0400)]
runtime: move concurrent mark setup off system stack
For historical reasons we currently do a lot of the concurrent mark
setup on the system stack. In fact, at this point the one and only
thing that needs to happen on the system stack is the start-the-world.
Clean up this code by lifting everything other than the
start-the-world off the system stack.
The diff for this change looks large, but the only code change is to
narrow the systemstack call. Everything else is re-indentation.
Austin Clements [Fri, 23 Oct 2015 19:17:04 +0000 (15:17 -0400)]
runtime: lift state variables from func gc to var work
We're about to split func gc across several functions, so lift the
local variables it uses for tracking statistics and state across the
cycle into the global "work" variable.
Ian Lance Taylor [Thu, 5 Nov 2015 01:38:53 +0000 (17:38 -0800)]
cmd/go: change ar argument to rc
Put 'r' first because that is the command, and 'c' is the modifier.
Keep 'c' because it means to not warn when creating an archive.
Drop 'u' because it is unnecessary and fails on Arch Linux.
No test because this is only for gccgo (I tested it manually).
Fixes #12310.
Change-Id: Id740257fb1c347dfaa60f7d613af2897dae2c059
Reviewed-on: https://go-review.googlesource.com/16664
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org>
Keith Randall [Fri, 20 Mar 2015 21:12:05 +0000 (14:12 -0700)]
runtime: simplify buffered channels.
This change removes the retry mechanism we use for buffered channels.
Instead, any sender waking up a receiver or vice versa completes the
full protocol with its counterpart. This means the counterpart does
not need to relock the channel when it wakes up. (Currently
buffered channels need to relock on wakeup.)
For sends on a channel with waiting receivers, this change replaces
two copies (sender->queue, queue->receiver) with one (sender->receiver).
For receives on channels with a waiting sender, two copies are still required.
This change unifies to a large degree the algorithm for buffered
and unbuffered channels, simplifying the overall implementation.
Shenghou Ma [Thu, 5 Nov 2015 02:20:33 +0000 (21:20 -0500)]
crypto/x509: add /etc/ssl/certs to certificate directories
Fixes #12139.
Change-Id: Ied760ac37e2fc21ef951ae872136dc3bfd49bf9f
Reviewed-on: https://go-review.googlesource.com/16671 Reviewed-by: Ian Lance Taylor <iant@golang.org>
While here, enable getrandom on arm64 too (using the value found in
include/uapi/asm-generic/unistd.h, which seems to match up with other
GOARCH=arm64 syscall numbers).
Updates #10848.
Change-Id: I5ab36ccf6ee8d5cc6f0e1a61d09f0da7410288b9
Reviewed-on: https://go-review.googlesource.com/16662
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Austin Clements [Mon, 26 Oct 2015 21:07:02 +0000 (17:07 -0400)]
runtime: make putfull start mark workers
Currently we depend on the good graces and timing of the scheduler to
get opportunities to start dedicated mark workers. In the worst case,
it may take 10ms to get dedicated mark workers going at the beginning
of mark 1 and mark 2 or after the amount of available work has dropped
and gone back up.
Instead of waiting for the regular preemption logic to get around to
us, make putfull enlist a random P if we're not already running enough
dedicated workers. This should improve performance stability of the
garbage collector and is likely to improve the overall performance
somewhat.
No overall effect on the go1 benchmarks. It speeds up the garbage
benchmark by 12%, which more than counters the performance loss from
the previous commit.
name old time/op new time/op delta
XBenchGarbage-12 6.32ms ± 4% 5.58ms ± 2% -11.68% (p=0.000 n=20+16)
Austin Clements [Mon, 26 Oct 2015 20:29:25 +0000 (16:29 -0400)]
runtime: eliminate getfull barrier from concurrent mark
Currently dedicated mark workers participate in the getfull barrier
during concurrent mark. However, the getfull barrier wasn't designed
for concurrent work and this causes no end of headaches.
In the concurrent setting, participants come and go. This makes mark
completion susceptible to live-lock: since dedicated workers are only
periodically polling for completion, it's possible for the program to
be in some transient worker each time one of the dedicated workers
wakes up to check if it can exit the getfull barrier. It also
complicates reasoning about the system because dedicated workers
participate directly in the getfull barrier, but transient workers
must instead use trygetfull because they have exit conditions that
aren't captured by getfull (e.g., fractional workers exit when
preempted). The complexity of implementing these exit conditions
contributed to #11677. Furthermore, the getfull barrier is inefficient
because we could be running user code instead of spinning on a P. In
effect, we're dedicating 25% of the CPU to marking even if that means
we have to spin to make that 25%. It also causes issues on Windows
because we can't actually sleep for 100µs (#8687).
Fix this by making dedicated workers no longer participate in the
getfull barrier. Instead, dedicated workers simply return to the
scheduler when they fail to get more work, regardless of what others
workers are doing, and the scheduler only starts new dedicated workers
if there's work available. Everything that needs to be handled by this
barrier is already handled by detection of mark completion.
This makes the system much more symmetric because all workers and
assists now use trygetfull during concurrent mark. It also loosens the
25% CPU target so that we can give some of that 25% back to user code
if there isn't enough work to keep the mark worker busy. And it
eliminates the problematic 100µs sleep on Windows during concurrent
mark (though not during mark termination).
The downside of this is that if we hit a bottleneck in the heap graph
that then expands back out, the system may shut down dedicated workers
and take a while to start them back up. We'll address this in the next
commit.
Updates #12041 and #8687.
No effect on the go1 benchmarks. This slows down the garbage benchmark
by 9%, but we'll more than make it up in the next commit.
name old time/op new time/op delta
XBenchGarbage-12 5.80ms ± 2% 6.32ms ± 4% +9.03% (p=0.000 n=20+20)
Change-Id: I65100a9ba005a8b5cf97940798918672ea9dd09b
Reviewed-on: https://go-review.googlesource.com/16297 Reviewed-by: Rick Hudson <rlh@golang.org>
David Crawshaw [Wed, 4 Nov 2015 16:21:55 +0000 (11:21 -0500)]
misc/ios: keep whole buffer in go_darwin_arm_exec
The existing go_darwin_arm_exec.go script does not work with Xcode 7,
not due to any significant changes, but just ordering and timing of
statements from lldb. Unfortunately the current design of
go_darwin_arm_exec.go makes it not obvious what gets stuck where, so
this moves from a moving buffer window to a complete buffer of the
lldb output.
The result is easier code to follow, and it works with Xcode 7.
Updates #12660.
Change-Id: I3b8b890b0bf4474119482e95d84e821a86d1eaed
Reviewed-on: https://go-review.googlesource.com/16634 Reviewed-by: Michael Matloob <matloob@golang.org>
Brad Fitzpatrick [Wed, 4 Nov 2015 07:17:57 +0000 (02:17 -0500)]
syscall: allow nacl's fake network code to Listen twice on the same address
Noticed from nacl trybot failures on new tests in
https://golang.org/cl/16630
Related earlier fix of mine to nacl's listen code:
syscall: fix nacl listener to not accept connections once closed
https://go-review.googlesource.com/15940
Perhaps a better fix (in the future?) would be to remove the listener
from the map at close, but that didn't seem entirely straightforward
last time I looked into it. It's not my code, but it seems that the
map entry continues to have a purpose even after Listener close. (?)
But given that this code is only really used for running tests and the
playground, this seems fine.
Change-Id: I43bfedc57c07f215f4d79c18f588d3650687a48f
Reviewed-on: https://go-review.googlesource.com/16650
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
David Crawshaw [Wed, 4 Nov 2015 15:41:05 +0000 (10:41 -0500)]
cmd/vet: use testenv
Fix for iOS builder.
Change-Id: I5b6c977b187446c848182a9294d5bed6b5f9f6e4
Reviewed-on: https://go-review.googlesource.com/16633
Run-TryBot: David Crawshaw <crawshaw@golang.org> Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
unknown [Wed, 4 Nov 2015 12:55:21 +0000 (15:55 +0300)]
crypto/md5: uniform Write func
Unification of implementation of existing md5.Write function
with other implementations (sha1, sha256, sha512).
Change-Id: I58ae02d165b17fc221953a5b4b986048b46c0508
Reviewed-on: https://go-review.googlesource.com/16621 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Mohit Agarwal [Mon, 2 Nov 2015 18:34:46 +0000 (00:04 +0530)]
cmd/go: check if destination is a regular file
builder.copyFile ensures that the destination is an object file. This
wouldn't be true if we are not writing to a regular file and the copy
fails. Check if the destination is an object file only if we are
writing to a regular file. While removing the file, ensure that it is a
regular file so that device files and such aren't removed when running
as a user with suggicient privileges.
Fixes #12407
Change-Id: Ie86ce9770fa59aa56fc486a5962287859b69db3d
Reviewed-on: https://go-review.googlesource.com/16585 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Austin Clements [Mon, 2 Nov 2015 21:45:07 +0000 (16:45 -0500)]
cmd/compile: add go:nowritebarrierrec annotation
This introduces a recursive variant of the go:nowritebarrier
annotation that prohibits write barriers not only in the annotated
function, but in all functions it calls, recursively. The error
message gives the shortest call stack from the annotated function to
the function containing the prohibited write barrier, including the
names of the functions and the line numbers of the calls.
To demonstrate the annotation, we apply it to gcmarkwb_m, the write
barrier itself.
This is a new annotation rather than a modification of the existing
go:nowritebarrier annotation because, for better or worse, there are
many go:nowritebarrier functions that do call functions with write
barriers. In most of these cases this is benign because the annotation
was conservative, but it prohibits simply coopting the existing
annotation.
Change-Id: I225ca483c8f699e8436373ed96349e80ca2c2479
Reviewed-on: https://go-review.googlesource.com/16554 Reviewed-by: Keith Randall <khr@golang.org>
Brad Fitzpatrick [Tue, 3 Nov 2015 20:04:20 +0000 (12:04 -0800)]
net/http: don't panic after request if Handler sets Request.Body to nil
The Server's server goroutine was panicing (but recovering) when
cleaning up after handling a request. It was pretty harmless (it just
closed that one connection and didn't kill the whole process) but it
was distracting.
Ian Lance Taylor [Mon, 26 Oct 2015 20:58:23 +0000 (13:58 -0700)]
cmd/compile: make sure instrumented call has type width
The width of the type of an external variable defined with a type
literal may not be set when the instrumentation pass is run. There are
two cases in the standard library that fail without the call to dowidth:
Austin Clements [Fri, 16 Oct 2015 13:53:11 +0000 (09:53 -0400)]
runtime: cache two workbufs to reduce contention
Currently the gcWork abstraction caches a single work buffer. As a
result, if a worker is putting and getting pointers right at the
boundary of a work buffer, it can flap between work buffers and
(potentially significantly) increase contention on the global work
buffer lists.
This change modifies gcWork to instead cache two work buffers and
switch off between them. This introduces one buffers' worth of
hysteresis and eliminates the above performance worst case by
amortizing the cost of getting or putting a work buffer over at least
one buffers' worth of work.
In practice, it's difficult to trigger this worst case with reasonably
large work buffers. On the garbage benchmark, this reduces the max
writes/sec to the global work list from 32K to 25K and the median from
6K to 5K. However, if a workload were to trigger this worst case
behavior, it could significantly drive up this contention.
This has negligible effects on the go1 benchmarks and slightly speeds
up the garbage benchmark.
name old time/op new time/op delta
XBenchGarbage-12 5.90ms ± 3% 5.83ms ± 4% -1.18% (p=0.011 n=18+18)
Dmitry Vyukov [Tue, 3 Nov 2015 11:19:15 +0000 (12:19 +0100)]
runtime: fix finalization and profiling of tiny allocations
Handling of special records for tiny allocations has two problems:
1. Once we queue a finalizer we mark the object. As the result any
subsequent finalizers for the same object will not be queued
during this GC cycle. If we have 16 finalizers setup (the worst case),
finalization will take 16 GC cycles. This is what caused misbehave
of tinyfin.go. The actual flakiness was caused by the fact that fing
is asynchronous and don't always run before the check.
2. If a tiny block has both finalizer and profile specials,
it is possible that we both queue finalizer, preserve the object live
and free the profile record. As the result heap profile can be skewed.
Fix both issues by analyzing all special records for a single object at once.
Also, make tinyfin test stricter and remove reliance on real time.
Also, add a test for the problem 2. Currently heap profile missed about
a half of live memory.
Austin Clements [Fri, 2 Oct 2015 13:51:12 +0000 (09:51 -0400)]
runtime: enlarge GC work buffer size
Currently the GC work buffers are only 256 bytes and hence can record
only 24 64-bit pointer. They were reduced from 4K in commits db7fd1c
and a15818f as a way to minimize the amount of work the per-P workbuf
caches could "hide" from the mark phase and carry in to the mark
termination phase. However, this approach wasn't very robust and we
later added a "mark 2" phase to address this problem head-on.
Because of mark 2, there's now no benefit to having very small work
buffers. But there are plenty of downsides: small work buffers
increase contention on the work lists, increase the frequency and
hence net overhead of acquiring and releasing work buffers, and
somewhat increase memory overhead of the GC.
This commit expands work buffers back to 4K (504 64-bit pointers).
This reduces the rate of writes to work.full in the garbage benchmark
from a peak of ~780,000 writes/sec to a peak of ~32,000 writes/sec.
This has negligible effect on the go1 benchmarks. It slightly slows
down the garbage benchmark.
name old time/op new time/op delta
XBenchGarbage-12 5.37ms ± 5% 5.60ms ± 2% +4.37% (p=0.000 n=20+20)
Austin Clements [Thu, 15 Oct 2015 21:58:17 +0000 (17:58 -0400)]
runtime: make assists preemptible
Currently, assists are non-preemptible, which means a heavily
assisting G can block other Gs from running. At the beginning of a GC
cycle, it can also delay scang, which will spin until the assist is
done. Since scanning is currently done sequentially, this can
seriously extend the length of the scan phase.
Fix this by making assists preemptible. Since the assist holds work
buffers and runs on the system stack, this must be done cooperatively:
we make gcDrainN return on preemption, and make the assist return from
the system stack and voluntarily Gosched.
This is prerequisite to enlarging the work buffers. Without this
change, the delays and spinning in scang increase significantly.
This has no effect on the go1 benchmarks.
name old time/op new time/op delta
XBenchGarbage-12 5.72ms ± 4% 5.37ms ± 5% -6.11% (p=0.000 n=20+20)
Austin Clements [Thu, 15 Oct 2015 01:31:33 +0000 (21:31 -0400)]
runtime: replace assist sleep loop with park/ready
GC assists must block until the assist can be satisfied (either
through stealing credit or doing work) or the GC cycle ends.
Currently, this is implemented as a retry loop with a 100 µs delay.
This obviously isn't ideal, as it wastes CPU and delays mutator
execution. It also has the somewhat peculiar downside that sleeping a
G requires allocation, and this requires working around recursive
allocation.
Replace this timed delay with a proper scheduling queue. When an
assist can't be satisfied immediately, it adds the allocating G to a
queue and parks it. Any time background scan credit is flushed, it
consults this queue, directly satisfies the debt of queued assists,
and wakes up satisfied assists before flushing any remaining credit to
the background credit pool.
No effect on the go1 benchmarks. Slightly speeds up the garbage
benchmark.
name old time/op new time/op delta
XBenchGarbage-12 5.81ms ± 1% 5.72ms ± 4% -1.65% (p=0.011 n=20+20)
Austin Clements [Mon, 2 Nov 2015 21:59:39 +0000 (16:59 -0500)]
runtime: change p.runq from []*g to []guintptr
This eliminates many write barriers in the scheduler code that are
unnecessary and will interfere with upcoming changes where the garbage
collector will have to invoke run queue functions in contexts that
must not have write barriers.
Michael Hudson-Doyle [Mon, 2 Nov 2015 20:36:53 +0000 (09:36 +1300)]
cmd/link: remove duplicate symtab entry for global functions
golang.org/cl/16436 added a local symbol for every global function, but also
added a duplicate entry for the global symbol. Surprisingly this hasn't caused
any noticeable problems, but it's still wrong.
Change-Id: Icd3906760f8aaf7bef31ffd4f2d866d73d36dc2c
Reviewed-on: https://go-review.googlesource.com/16581 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Brad Fitzpatrick [Mon, 2 Nov 2015 15:46:44 +0000 (07:46 -0800)]
test: update tinyfin test
* use new(int32) to be pedantic about documented SetFinalizer rules:
"The argument x must be a pointer to an object allocated by calling
new or by taking the address of a composite literal"
* remove the amd64-only restriction. The GC is fully precise everywhere
now, even on 32-bit. (keep the gccgo restriction, though)
* remove a data race (perhaps the actual bug) and use atomic.LoadInt32
for the final check. The race detector is now happy, too.
Michael Hudson-Doyle [Thu, 29 Oct 2015 22:48:43 +0000 (11:48 +1300)]
cmd/go, runtime: define GOBUILDMODE_shared rather than shared when dynamically linking
To avoid collisions with what existing code may already be doing.
Change-Id: Ice639440aafc0724714c25333d90a49954372230
Reviewed-on: https://go-review.googlesource.com/16503 Reviewed-by: Ian Lance Taylor <iant@golang.org>
net: make Dial, Listen{,Packet} for TCP/UDP with invalid port fail
This change makes Dial, Listen and ListenPacket with invalid port fail
whatever GODEBUG=netdns is.
Please be informed that cgoLookupPort with an out of range literal
number may return either the lower or upper bound value, 0 or 65535,
with no error on some platform.
Fixes #11715.
Change-Id: I43f9c4fb5526d1bf50b97698e0eb39d29fd74c35
Reviewed-on: https://go-review.googlesource.com/12447 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Ian Lance Taylor [Sat, 31 Oct 2015 21:44:48 +0000 (14:44 -0700)]
cmd/link: support new 386/amd64 relocations
The GNU binutils recently picked up support for new 386/amd64
relocations. Add support for them in the Go linker when doing an
internal link.
The 386 relocation R_386_GOT32X was proposed in
https://groups.google.com/forum/#!topic/ia32-abi/GbJJskkid4I . It can
be treated as identical to the R_386_GOT32 relocation.
The amd64 relocations R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX were
proposed in
https://groups.google.com/forum/#!topic/x86-64-abi/n9AWHogmVY0 . They
can both be treated as identical to the R_X86_64_GOTPCREL relocation.
The purpose of the new relocations is to permit additional linker
relaxations in some cases. We do not attempt to support those cases.
While we're at it, remove the unused and in some cases out of date
_COUNT names from ld/elf.go.
Fixes #13114.
Change-Id: I34ef07f6fcd00cdd2996038ecf46bb77a49e968b
Reviewed-on: https://go-review.googlesource.com/16529 Reviewed-by: Minux Ma <minux@golang.org>
Matthew Dempsky [Thu, 29 Oct 2015 01:55:30 +0000 (18:55 -0700)]
go/types: fix TypeString(nil, nil)
The code is meant to return "<nil>", but because of a make([]Type, 8)
call that should be make([]Type, 0, 8), the nil Type happens to
already appear in the array.
Change-Id: I2db140046e52f27db1b0ac84bde2b6680677dd95
Reviewed-on: https://go-review.googlesource.com/16464
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
Matthew Dempsky [Fri, 30 Oct 2015 02:16:20 +0000 (19:16 -0700)]
math: fix bad shift in Expm1
Noticed by cmd/vet.
Expected values array produced by Python instead of Keisan because:
1) Keisan's website calculator is painfully difficult to copy/paste
values into and out of, and
2) after tediously computing e^(vf[i] * 10) - 1 via Keisan I
discovered that Keisan computing vf[i]*10 in a higher precision was
giving substantially different output values.
Also, testing uses "close" instead of "veryclose" because 386's
assembly implementation produces values for some of the test cases
that fail "veryclose". Curiously, Expm1(vf[i]*10) is identical to
Exp(vf[i]*10)-1 on 386, whereas with the portable implementation
they're only "veryclose".
Investigating these questions is left to someone else. I just wanted
to fix the cmd/vet warning.
Fixes #13101.
Change-Id: Ica8f6c267d01aa4cc31f53593e95812746942fbc
Reviewed-on: https://go-review.googlesource.com/16505
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Klaus Post <klauspost@gmail.com> Reviewed-by: Robert Griesemer <gri@golang.org>
Austin Clements [Mon, 19 Oct 2015 17:46:32 +0000 (13:46 -0400)]
runtime: perform concurrent scan in GC workers
Currently the concurrent root scan is performed in its entirety by the
GC coordinator before entering concurrent mark (which enables GC
workers). This scan is done sequentially, which can prolong the scan
phase, delay the mark phase, and means that the scan phase does not
obey the 25% CPU goal. Furthermore, there's no need to complete the
root scan before starting marking (in fact, we already allow GC
assists to happen during the scan phase), so this acts as an
unnecessary barrier between root scanning and marking.
This change shifts the root scan work out of the GC coordinator and in
to the GC workers. The coordinator simply sets up the scan state and
enqueues the right number of root scan jobs. The GC workers then drain
the root scan jobs prior to draining heap scan jobs.
This parallelizes the root scan process, makes it obey the 25% CPU
goal, and effectively eliminates root scanning as an isolated phase,
allowing the system to smoothly transition from root scanning to heap
marking. This also eliminates a major non-STW responsibility of the GC
coordinator, which will make it easier to switch to a decentralized
state machine. Finally, it puts us in a good position to perform root
scanning in assists as well, which will help satisfy assists at the
beginning of the GC cycle.
This is mostly straightforward. One tricky aspect is that we have to
deal with preemption deadlock: where two non-preemptible gorountines
are trying to preempt each other to perform a stack scan. Given the
context where this happens, the only instance of this is two
background workers trying to scan each other. We avoid this by simply
not scanning the stacks of background workers during the concurrent
phase; this is safe because we'll scan them during mark termination
(and their stacks are *very* small and should not contain any new
pointers).
This change also switches the root marking during mark termination to
use the same gcDrain-based code path as concurrent mark. This
shouldn't affect performance because STW root marking was already
parallel and tasks switched to heap marking immediately when no more
root marking tasks were available. However, it simplifies the code and
unifies these code paths.
This has negligible effect on the go1 benchmarks. It slightly slows
down the garbage benchmark, possibly by making GC run slightly more
frequently.
name old time/op new time/op delta
XBenchGarbage-12 5.10ms ± 1% 5.24ms ± 1% +2.87% (p=0.000 n=18+18)
Austin Clements [Mon, 19 Oct 2015 17:35:25 +0000 (13:35 -0400)]
runtime: consolidate "out of GC work" checks
We already have gcMarkWorkAvailable, but the check for GC mark work is
open-coded in several places. Generalize gcMarkWorkAvailable slightly
and replace these open-coded checks with calls to gcMarkWorkAvailable.
In addition to cleaning up the code, this puts us in a better position
to make this check slightly more complicated.
cmd/compile/internal: named types for Etype and Op in struct Node
Type Op is enfored now.
Type EType will need further CLs.
Added TODOs where Node.EType is used as a union type.
The TODOs have the format `TODO(marvin): Fix Node.EType union type.`.
Furthermore:
-The flag of Econv function in fmt.go is removed, since unused.
-Some cleaning along the way, e.g. declare vars first when getting initialized.
Passes go build -toolexec 'toolstash -cmp' -a std.
Russ Cox [Fri, 30 Oct 2015 15:03:02 +0000 (11:03 -0400)]
runtime: introduce GOTRACEBACK=single, now the default
Abandon (but still support) the old numbering system.
GOTRACEBACK=none is old 0
GOTRACEBACK=single is the new behavior
GOTRACEBACK=all is old 1
GOTRACEBACK=system is old 2
GOTRACEBACK=crash is unchanged
See doc comment change in runtime1.go for details.
Filed #13107 to decide whether to change default back to GOTRACEBACK=all for Go 1.6 release.
If you run into programs where printing only the current goroutine omits
needed information, please add details in a comment on that issue.
Todd Neal [Tue, 25 Aug 2015 00:45:59 +0000 (19:45 -0500)]
cmd/compile: add support for a go:noinline directive
Some tests need to disable inlining of a function. It's currently done
in one of a few ways (adding a function call, an empty switch, or a
defer). Add support for a less fragile 'go:noinline' directive that
prevents inlining.
Michael Hudson-Doyle [Thu, 29 Oct 2015 21:13:13 +0000 (10:13 +1300)]
cmd/internal/obj, cmd/link, runtime: handle TLS more like a platform linker on ppc64
On ppc64x, the thread pointer, held in R13, points 0x7000 bytes past where
thread-local storage begins (presumably to maximize the amount of storage that
can be accessed with a 16-bit signed displacement). The relocations used to
indicate thread-local storage to the platform linker account for this, so to be
able to support external linking we need to change things so the linker applies
this offset instead of the runtime assembly.
Brad Fitzpatrick [Thu, 29 Oct 2015 20:02:28 +0000 (13:02 -0700)]
math: fix typo and braino in my earlier commit
The bug number was a typo, and I forgot to switch the implementation
back to if statements after the change from Float64bits in the first
patchset back to branching.
if statements can currently be inlined, but switch cannot (#13071)
Taru Karttunen [Fri, 16 Oct 2015 10:26:20 +0000 (13:26 +0300)]
net/http: extra documentation for Redirect and RedirectHandler
Errors with http.Redirect and http.StatusOk seem
to occur from time to time on the irc channel.
This change adds documentation suggesting
to use one of the 3xx codes and not StatusOk
with Redirect.
Michael Hudson-Doyle [Thu, 29 Oct 2015 07:24:29 +0000 (20:24 +1300)]
cmd/compile, cmd/go, cmd/link: enable -buildmode=shared and related flags on linux/arm64
Change-Id: Ibddbbf6f4a5bd336a8b234d40fad0fcea574cd6e
Reviewed-on: https://go-review.googlesource.com/13994 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Michael Hudson-Doyle [Wed, 28 Oct 2015 23:17:43 +0000 (12:17 +1300)]
cmd/link: always resolve functions locally when linking dynamically
When dynamically linking, we want references to functions defined
in this module to always be to the function object, not to the
PLT. We force this by writing an additional local symbol for
every global function symbol and making all relocations against
the global symbol refer to this local symbol instead. This is
approximately equivalent to the ELF linker -Bsymbolic-functions
option, but that is buggy on several platforms.
Change-Id: Ie6983eb4d1947f8543736fd349f9a90df3cce91a
Reviewed-on: https://go-review.googlesource.com/16436 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Michael Hudson-Doyle [Thu, 3 Sep 2015 20:31:03 +0000 (08:31 +1200)]
runtime/cgo: export _cgo_reginit on ppc64x
This is needed to make external linking work.
Change-Id: I4cf7edb4ea318849cab92a697952f8745eed40c4
Reviewed-on: https://go-review.googlesource.com/14237 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Michael Hudson-Doyle [Thu, 3 Sep 2015 00:33:06 +0000 (12:33 +1200)]
cmd/go: implicitly include math in a shared library on arm
In the same manner in which runtime/cgo is included on other architectures.
Change-Id: I90a5ad8585248b2566d763d33994a600508d89cb
Reviewed-on: https://go-review.googlesource.com/14221 Reviewed-by: Ian Lance Taylor <iant@golang.org>