Robert Griesemer [Tue, 26 May 2015 18:44:18 +0000 (11:44 -0700)]
spec: removed TODOs (invisible html comment) in favor of issues
- no "visible" change to spec but for updated date
- retired several outdated TODO items
- filed non-urgent issues 10953, 10954, 10955 for current TODOs
Change-Id: If87ad0fb546c6955a6d4b5801e06e5c7d5695ea2
Reviewed-on: https://go-review.googlesource.com/10382 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Ian Lance Taylor [Sun, 24 May 2015 18:29:38 +0000 (11:29 -0700)]
cmd/link/internal/ld: if -v, display external linker output
It can be useful when debugging to be able to see what the external
linker is doing even when it succeeds. In particular this permits
passing -v to the external linker to see precisely what it is doing.
Change-Id: Ifed441912d97bbebea20303fdb899e140b380215
Reviewed-on: https://go-review.googlesource.com/10363 Reviewed-by: Minux Ma <minux@golang.org>
Elias Naur [Sat, 23 May 2015 09:26:22 +0000 (11:26 +0200)]
runtime: don't always block all signals on OpenBSD
Implement the changes from CL 10173 on OpenBSD.
Change-Id: I2db1cd8141fd392a34753a1b8113e2e0401173b9
Reviewed-on: https://go-review.googlesource.com/10342
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Mikio Hara [Thu, 14 May 2015 01:18:10 +0000 (10:18 +0900)]
net: adjust dual stack support on dragonfly
As mentioned in
http://gitweb.dragonflybsd.org/dragonfly.git/commit/727ccde8cce813911d885b7f6ed749dcea68a886,
DragonFly BSD is dropping support for IPv6 IPv4-mapped address.
Unfortunately, on some released versions we see the kernels pretend to
support the feature but actually not (unless tweaking some kernel states
via sysctl.)
To avoid unpredictable behavior, the net package assumes that all
DragonFly BSD kernels don't support IPv6 IPv4-mapped address.
Fixes #10764.
Change-Id: Ic7af3651e0372ec03774432fbb6b2eb0c455e994
Reviewed-on: https://go-review.googlesource.com/10071 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Mikio Hara [Fri, 15 May 2015 03:10:10 +0000 (12:10 +0900)]
net: don't show verbose information when -test.v=false
Updates #10845.
Change-Id: I4cec670c7db88c50a6e5619e611744e161d73b3c
Reviewed-on: https://go-review.googlesource.com/10131 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Robert Griesemer [Fri, 22 May 2015 20:58:03 +0000 (13:58 -0700)]
math/big: Always print exponent sign when using 'p' exponent for Floats.
Float.Format supports the 'b' and 'p' format, both of which print
a binary ('p') exponent. The 'b' format always printed a sign ('+'
or '-') for the exponent; the 'p' format only printed a negative
sign for the exponent. This change makes the two consistent. It
also makes the 'p' format easier to read if the exponent is >= 0.
Also:
- Comments added elsewhere.
Change-Id: Ifd2e01bdafb3043345972ca22a90248d055bd29b
Reviewed-on: https://go-review.googlesource.com/10359 Reviewed-by: Alan Donovan <adonovan@google.com>
Elias Naur [Fri, 22 May 2015 22:46:10 +0000 (00:46 +0200)]
misc/cgo/test: fix build for CC=clang
Fix build error when CL=clang introduced by CL 10173.
Change-Id: I8edf210787a9803280c0779ff710c7e634a820d6
Reviewed-on: https://go-review.googlesource.com/10341 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Robert Griesemer [Fri, 22 May 2015 22:21:56 +0000 (15:21 -0700)]
cmd/compile/internal/big: update and apply vendor.bash
Package-external tests must use the vendored math/big package, not
the original one, otherwise tests may fail if there are discrepancies
in the implementation.
Change-Id: Ic5f0489aa6420ffea1f488633453f871ce1f0f66
Reviewed-on: https://go-review.googlesource.com/10380 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Robert Griesemer [Fri, 22 May 2015 00:52:49 +0000 (17:52 -0700)]
math/big: add more Float.Float64 conversion tests
- structure the Float64 conversion tests the same way as for Float32
- add additional test cases, including one that exposes a current issue
(currently disabled, same issue as was fixed for Float32)
The Float64 fix will be in a subsequent change for easier reviewing.
Change-Id: I95dc9e8d1f6b6073a98c7bc2289e6d3248fc3420
Reviewed-on: https://go-review.googlesource.com/10351 Reviewed-by: Alan Donovan <adonovan@google.com>
Robert Griesemer [Fri, 22 May 2015 00:00:37 +0000 (17:00 -0700)]
math/big: fix Float.Float32 conversion for denormal corner cases
The existing code was incorrect for numbers that after rounding would
become the smallest denormal float32 (instead the result was 0). This
caused all.bash to fail if Float32() were used in the compiler for
constant arithmetic (there's currently a work-around - see also issue
10321.
This change fixes the implementation of Float.Float32 and adds
corresponding test cases. Float32 and Float64 diverge at this point.
For ease of review, this change only fixes Float32. Float64 will be
made to match in a subsequent change.
Fixes #10321.
Change-Id: Iccafe37c1593a4946bc552e4ad2045f69be62d80
Reviewed-on: https://go-review.googlesource.com/10350 Reviewed-by: Alan Donovan <adonovan@google.com>
Elias Naur [Mon, 18 May 2015 09:00:24 +0000 (11:00 +0200)]
runtime: don't always unblock all signals
Ian proposed an improved way of handling signals masks in Go, motivated
by a problem where the Android java runtime expects certain signals to
be blocked for all JVM threads. Discussion here
A Go program always needs to have the synchronous signals enabled.
These are the signals for which _SigPanic is set in sigtable, namely
SIGSEGV, SIGBUS, SIGFPE.
A Go program that uses the os/signal package, and calls signal.Notify,
needs to have at least one thread which is not blocking that signal,
but it doesn't matter much which one.
Unix programs do not change signal mask across execve. They inherit
signal masks across fork. The shell uses this fact to some extent;
for example, the job control signals (SIGTTIN, SIGTTOU, SIGTSTP) are
blocked for commands run due to backquote quoting or $().
Our current position on signal masks was not thought out. We wandered
into step by step, e.g., http://golang.org/cl/7323067 .
This CL does the following:
Introduce a new platform hook, msigsave, that saves the signal mask of
the current thread to m.sigsave.
Call msigsave from needm and newm.
In minit grab set up the signal mask from m.sigsave and unblock the
essential synchronous signals, and SIGILL, SIGTRAP, SIGPROF, SIGSTKFLT
(for systems that have it).
In unminit, restore the signal mask from m.sigsave.
The first time that os/signal.Notify is called, start a new thread whose
only purpose is to update its signal mask to make sure signals for
signal.Notify are unblocked on at least one thread.
The effect on Go programs will be that if they are invoked with some
non-synchronous signals blocked, those signals will normally be
ignored. Previously, those signals would mostly be ignored. A change
in behaviour will occur for programs started with any of these signals
blocked, if they receive the signal: SIGHUP, SIGINT, SIGQUIT, SIGABRT,
SIGTERM. Previously those signals would always cause a crash (unless
using the os/signal package); with this change, they will be ignored
if the program is started with the signal blocked (and does not use
the os/signal package).
./all.bash completes successfully on linux/amd64.
OpenBSD is missing the implementation.
Change-Id: I188098ba7eb85eae4c14861269cc466f2aa40e8c
Reviewed-on: https://go-review.googlesource.com/10173 Reviewed-by: Ian Lance Taylor <iant@golang.org>
David Chase [Wed, 20 May 2015 19:16:34 +0000 (15:16 -0400)]
cmd/internal/gc: move check for large-hence-heap-allocated types into escape analysis
Before this change, the check for too-large arrays (and other large
types) occurred after escape analysis. If the data moved off stack
and onto the heap contained any pointers, it would therefore escape,
but because the too-large check occurred after escape analysis this
would not be recorded and a stack pointer would leak to the heap
(see the modified escape_array.go for an example).
Some of these appear to remain, in calls to typecheck from within walk.
Also corrected a few comments in escape_array.go about "BAD"
analysis that is now done correctly.
Enhanced to move aditional EscNone-but-large-so-heap checks into esc.c.
math/big: Simple Montgomery Multiplication to accelerate Mod-Exp
On Haswell I measure anywhere between 2X to 3.5X speedup for RSA.
I believe other architectures will also greatly improve.
In the future may be upgraded by dedicated assembly routine.
Built-in benchmarks i5-4278U turbo off:
benchmark old ns/op new ns/op delta
BenchmarkRSA2048Decrypt 66966493073769 -54.10%
Benchmark3PrimeRSA2048Decrypt 44723401669080 -62.68%
Change-Id: I17df84f85e34208f990665f9f90ea671695b2add
Reviewed-on: https://go-review.googlesource.com/9253 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Adam Langley <agl@golang.org> Reviewed-by: Vlad Krasnov <vlad@cloudflare.com>
Run-TryBot: Adam Langley <agl@golang.org>
Russ Cox [Wed, 20 May 2015 02:58:10 +0000 (22:58 -0400)]
runtime: fix callwritebarrier
Given a call frame F of size N where the return values start at offset R,
callwritebarrier was instructing heapBitsBulkBarrier to scan the block
of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra N-R
bytes scanned might lead into the next allocated block in memory.
Because the scan was consulting the heap bitmap for type information,
scanning into the next block normally "just worked" in the sense of
not crashing.
Scanning the extra N-R bytes of memory is a problem mainly because
it causes the GC to consider pointers that might otherwise not be
considered, leading it to retain objects that should actually be freed.
This is very difficult to detect.
Luckily, juju turned up a case where the heap bitmap and the memory
were out of sync for the block immediately after the call frame, so that
heapBitsBulkBarrier saw an obvious non-pointer where it expected a
pointer, causing a loud crash.
Why is there a non-pointer in memory that the heap bitmap records as
a pointer? That is more difficult to answer. At least one way that it
could happen is that allocations containing no pointers at all do not
update the heap bitmap. So if heapBitsBulkBarrier walked out of the
current object and into a no-pointer object and consulted those bitmap
bits, it would be misled. This doesn't happen in general because all
the paths to heapBitsBulkBarrier first check for the no-pointer case.
This may or may not be what happened, but it's the only scenario
I've been able to construct.
I tried for quite a while to write a simple test for this and could not.
It does fix the juju crash, and it is clearly an improvement over the
old code.
Austin Clements [Wed, 20 May 2015 15:57:02 +0000 (11:57 -0400)]
runtime: eliminate write barrier from adjustpointers
Currently adjustpointers invokes a write barrier for every stack slot
it updates. This is safe---the write barrier always does nothing
because the new value is never a heap pointer---but it's unnecessary
overhead in performance and complexity.
Fix this by rewriting adjustpointers to work with *uintptrs instead of
*unsafe.Pointers. As an added bonus, this makes the code cleaner.
Russ Cox [Thu, 21 May 2015 17:28:17 +0000 (13:28 -0400)]
all: retire architecture letter in file names, public API
This CL removes the remaining visible uses of the "architecture letter" concept.
(They are no longer in tool names nor in source directory names.)
Because the architecture letter concept is now gone, delete GOCHAR
from "go env" output, and change go/build.ArchChar to return an
error always.
The architecture letter is still used in the compiler and linker sources
as a clumsy architecture enumeration, but that use is not visible to
Go users and can be cleaned up separately.
Change-Id: I4d97a38f372003fb610c9c5241bea440d9dbeb8d
Reviewed-on: https://go-review.googlesource.com/10289 Reviewed-by: Rob Pike <r@golang.org>
Russ Cox [Thu, 21 May 2015 17:28:10 +0000 (13:28 -0400)]
cmd/compile, cmd/link: create from 5g, 5l, etc
Trivial merging of 5g, 6g, ... into go tool compile,
and similarlly 5l, 6l, ... into go tool link.
The files compile/main.go and link/main.go are new.
Everything else in those directories is a move followed by
change of imports and package name.
This CL breaks the build. Manual fixups are in the next CL.
See golang-dev thread titled "go tool compile, etc" for background.
Change-Id: Id35ff5a5859ad9037c61275d637b1bd51df6828b
Reviewed-on: https://go-review.googlesource.com/10287 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Rob Pike <r@golang.org>
Rick Hudson [Thu, 21 May 2015 13:09:35 +0000 (09:09 -0400)]
runtime: turn work buffer tracing off by default
During development we ran with monitoring code turned
on by default. This CL turns the work buffer monitoring
off. Performance change on most go1 benchmarks is small
or insignificant.
Austin Clements [Wed, 20 May 2015 15:50:48 +0000 (11:50 -0400)]
runtime: make runtime.callers walk calling G, not g0
Currently runtime.callers invokes gentraceback with the pc and sp of
the G it is called from, but always passes g0 even if it was called
from a regular g. Right now this has no ill effects because
runtime.callers does not use either callback argument or the
_TraceJumpStack flag, but it makes the code fragile and will break
some upcoming changes.
Fix this by lifting the getg() call outside of the systemstack in
runtime.callers.
Change-Id: I4e1e927961c0e0cd4dcf28693be47df7bae9e122
Reviewed-on: https://go-review.googlesource.com/10292 Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com> Reviewed-by: Rick Hudson <rlh@golang.org>
Mikio Hara [Thu, 7 May 2015 10:10:29 +0000 (19:10 +0900)]
net: document that ListenMulticastUDP is for simple applications
Also mentions golang.org/x/net/ipv4 and golang.org/x/net/ipv6.
Change-Id: I653deac7a5e2b129237655a72d6c91207f1b1685
Reviewed-on: https://go-review.googlesource.com/9779 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Alan Donovan [Wed, 20 May 2015 19:49:23 +0000 (15:49 -0400)]
go/parser: parse incomplete selection "fmt." as a blank selection "fmt._"
Formerly it would return a BadExpr.
This prevents partial syntax from being discarded, and makes the error
recovery logic more consistent with other places where an identifier
was expected but not found.
+ test
Change-Id: I223c0c0589e7ceb7207ae951b8f71b9275a1eb73
Reviewed-on: https://go-review.googlesource.com/10269 Reviewed-by: Robert Griesemer <gri@golang.org>
Michael Hudson-Doyle [Mon, 18 May 2015 01:03:22 +0000 (13:03 +1200)]
cmd/go: change Package.Shlib to be the absolute path of the shared library
Makes little difference internally but makes go list output more useful.
Change-Id: I1fa1f839107de08818427382b2aef8dc4d765b36
Reviewed-on: https://go-review.googlesource.com/10192 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Mikio Hara [Wed, 13 May 2015 03:44:45 +0000 (12:44 +0900)]
net: fix the series of TestLookup and external tests
On Windows, we need to make sure that the node under test has external
connectivity.
Fixes #10795.
Change-Id: I99f2336180c7b56474fa90a4a6cdd5a6c4dd3805
Reviewed-on: https://go-review.googlesource.com/10006 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Mikio Hara [Mon, 18 May 2015 08:27:33 +0000 (17:27 +0900)]
net: fix data race in TestSocket{Conn,PacketConn}
Fixes #10891.
Change-Id: Ie432c9c5520ac29cea8fe6452628ec467567eea5
Reviewed-on: https://go-review.googlesource.com/10194 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Shenghou Ma [Tue, 19 May 2015 06:48:15 +0000 (02:48 -0400)]
misc/cgo/testshared: when checking for RPATHs also look for DT_RUNPATH
On my systems, ld -rpath sets DT_RUNPATH instead of DT_RPATH.
Change-Id: I5047e795fb7ef9336f5fa13ba24bb6245c0b0582
Reviewed-on: https://go-review.googlesource.com/10260 Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Didier Spezia [Thu, 14 May 2015 22:36:59 +0000 (22:36 +0000)]
html/template: fix string iteration in replacement operations
In css, js, and html, the replacement operations are implemented
by iterating on strings (rune by rune). The for/range
statement is used. The length of the rune is required
and added to the index to properly slice the string.
This is potentially wrong because there is a discrepancy between
the result of utf8.RuneLen and the increment of the index
(set by the for/range statement). For invalid strings,
utf8.RuneLen('\ufffd') == 3, while the index is incremented
only by 1 byte.
htmlReplacer triggers a panic at slicing time for some
invalid strings.
Use a more robust iteration mechanism based on
utf8.DecodeRuneInString, and make sure the same
pattern is used for all similar functions in this
package.
Fixes #10799
Change-Id: Ibad3857b2819435d9fa564f06fc2ca8774102841
Reviewed-on: https://go-review.googlesource.com/10105 Reviewed-by: Rob Pike <r@golang.org>
Commit 9c9e36b pushed these errors down to where the write barriers
are actually emitted, but forgot to remove the original error that was
being pushed down.
Alexander Zolotov [Sat, 9 May 2015 13:41:32 +0000 (16:41 +0300)]
cmd/go: run gofmt from current GOROOT
The existing implementation executes `gofmt` binary from PATH
environment variable on invocation `go fmt` command.
Relying on PATH might lead to confusions for users with several Go installations.
It's more appropriate to run `gofmt` from GOBIN (if defined) or GOROOT.
Fixes #10755
Change-Id: I56d42a747319c766f2911508fab3994c3a366d12
Reviewed-on: https://go-review.googlesource.com/9900 Reviewed-by: Rob Pike <r@golang.org>
Rick Hudson [Mon, 18 May 2015 20:02:37 +0000 (16:02 -0400)]
runtime: run background mark helpers only if work is available
Prior to this CL whenever the GC marking was enabled and
a P was looking for work we supplied a G to help
the GC do its marking tasks. Once this G finished all
the marking available it would release the P to find another
available G. In the case where there was no work the P would drop
into findrunnable which would execute the mark helper G which would
immediately return and the P would drop into findrunnable again repeating
the process. Since the P was always given a G to run it never blocks.
This CL first checks if the GC mark helper G has available work and if
not the P immediately falls through to its blocking logic.
Austin Clements [Sun, 17 May 2015 01:14:37 +0000 (21:14 -0400)]
runtime: minor clean up to heapminimum
Currently setGCPercent sets heapminimum to heapminimum*GOGC/100. The
real intent is to set heapminimum to a scaled multiple of a fixed
default heap minimum, not to scale heapminimum based on its current
value. This turns out to be okay because setGCPercent is only called
once and heapminimum is initially set to this default heap minimum.
However, the code as written is confusing, especially since
setGCPercent is otherwise written so it could be called again to
change GOGC. Fix this by introducing a defaultHeapMinimum constant and
using this instead of the current value of heapminimum to compute the
scaled heap minimum.
As part of this, this commit improves the documentation on
heapminimum.
Russ Cox [Mon, 18 May 2015 15:40:29 +0000 (11:40 -0400)]
runtime: add fast check for self-loop pointer in scanobject
Addresses a problem reported on the mailing list.
This will come up mainly in programs custom allocators that batch allocations,
but it still helps in our programs, which mainly do not have such allocations.
In contrast to the one in test/bench/go1 (above), the binarytree program on the
shootout site uses more goroutines, batches allocations, and sets GOMAXPROCS
to runtime.NumCPU()*2.
Using that version, before vs after:
name old mean new mean delta
BinaryTree20 18.6s × (0.96,1.05) 11.3s × (0.98,1.02) -39.46% (p=0.000)
And Go 1.4 vs after:
name old mean new mean delta
BinaryTree20 13.0s × (0.97,1.02) 11.3s × (0.98,1.02) -13.21% (p=0.000)
There is still a scheduling problem - the raw run times are hiding the fact that
this chews up 2x the CPU - but we'll take care of that separately.
Rob Pike [Fri, 15 May 2015 20:30:42 +0000 (13:30 -0700)]
cmd/doc: put blank lines around comment for types, etc.
Better layout.
Fixes #10859.
The issue suggests rearranging so the comment comes out
after the methods. I tried this and it looks good but it is less
useful, since the stuff you're probably looking for - the methods
- are scrolled away by the comment. The most important
information should be last because that leaves it on your
screen after the print if the output is long.
David Chase [Fri, 15 May 2015 16:19:07 +0000 (12:19 -0400)]
cmd/internal/gc: extend escape analysis to pointers in slices
Modified esc.go to allow slice literals (before append)
to be non-escaping. Modified tests to account for changes
in escape behavior and to also test the two cases that
were previously not tested.
Also minor cleanups to debug-printing within esc.go
Allocation stats for running compiler
( cd src/html/template;
for i in {1..5} ; do
go tool 6g -memprofile=testzz.${i}.prof -memprofilerate=1 *.go ;
go tool pprof -alloc_objects -text testzz.${i}.prof ;
done ; )
before about 86k allocations
after about 83k allocations
Austin Clements [Fri, 15 May 2015 20:31:17 +0000 (16:31 -0400)]
runtime: use separate count and note for forEachP
Currently, forEachP reuses the stopwait and stopnote fields from
stopTheWorld to track how many Ps have not responded to the safe-point
request and to sleep until all Ps have responded.
It was assumed this was safe because both stopTheWorld and forEachP
must occur under the worlsema and hence stopwait and stopnote cannot
be used for both purposes simultaneously and callers could always
determine the appropriate use based on sched.gcwaiting (which is only
set by stopTheWorld). However, this is not the case, since it's
possible for there to be a window between when an M observes that
gcwaiting is set and when it checks stopwait during which stopwait
could have changed meanings. When this happens, the M decrements
stopwait and may wakeup stopnote, but does not otherwise participate
in the forEachP protocol. As a result, stopwait is decremented too
many times, so it may reach zero before all Ps have run the safe-point
function, causing forEachP to wake up early. It will then either
observe that some P has not run the safe-point function and panic with
"P did not run fn", or the remaining P (or Ps) will run the safe-point
function before it wakes up and it will observe that stopwait is
negative and panic with "not stopped".
Fix this problem by giving forEachP its own safePointWait and
safePointNote fields.
One known sequence of events that can cause this race is as
follows. It involves three actors:
G1 is running on M1 on P1. P1 has an empty run queue.
G2/M2 is in a blocked syscall and has lost its P. (The details of this
don't matter, it just needs to be in a position where it needs to grab
an idle P.)
GC just started on G3/M3/P3. (These aren't very involved, they just
have to be separate from the other G's, M's, and P's.)
1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.
Now G1/M1 begins to enter a syscall:
2. G1/M1 invokes reentersyscall, which sets the P1's status to
_Psyscall.
3. G1/M1's reentersyscall observes gcwaiting != 0 and calls
entersyscall_gcwait.
5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and
returns.
6. GC does stuff and then calls startTheWorld().
7. startTheWorld() calls procresize(), which sets P1's status to
_Pidle and puts P1 on the idle list.
Now G2/M2 returns from its syscall and takes over P1:
8. G2/M2 returns from its blocked syscall and gets P1 from the idle
list.
9. G2/M2 acquires P1, which sets P1's status to _Prunning.
10. G2/M2 starts a new syscall and invokes reentersyscall, which sets
P1's status to _Psyscall.
Back on G1/M1:
11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.
At this point, G1/M1 still thinks it's running on P1. P1's status is
_Psyscall, which is consistent with what G1/M1 is doing, but it's
_Psyscall because *G2/M2* put it in to _Psyscall, not G1/M1. This is
basically an ABA race on P1's status.
Because forEachP currently shares stopwait with stopTheWorld. G1/M1's
entersyscall_gcwait observes the non-zero stopwait set by forEachP,
but mistakes it for a stopTheWorld. It cas's P1's status from
_Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement
stopwait one more time than forEachP was expecting.
Fixes #10618. (See the issue for details on why the above race is safe
when forEachP is not involved.)
Prior to this commit, the command
stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile
would reliably fail after a few hundred runs. With this commit, it
ran for over 2 million runs and never crashed.
Austin Clements [Fri, 15 May 2015 20:13:14 +0000 (16:13 -0400)]
runtime: hold worldsema while starting the world
Currently, startTheWorld releases worldsema before starting the
world. Since startTheWorld can change gomaxprocs after allowing Ps to
run, this means that gomaxprocs can change while another P holds
worldsema.
Unfortunately, the garbage collector and forEachP assume that holding
worldsema protects against changes in gomaxprocs (which it *almost*
does). In particular, this is causing somewhat frequent "P did not run
fn" crashes in forEachP in the runtime tests because gomaxprocs is
changing between the several loops that forEachP does over all the Ps.
Fix this by only releasing worldsema after the world is started.
This relates to issue #10618. forEachP still fails under stress
testing, but much less frequently.
Austin Clements [Fri, 15 May 2015 20:10:00 +0000 (16:10 -0400)]
runtime: disallow preemption during startTheWorld
Currently, startTheWorld clears preemptoff for the current M before
starting the world. A few callers increment m.locks around
startTheWorld, presumably to prevent preemption any time during
starting the world. This is almost certainly pointless (none of the
other callers do this), but there's no harm in making startTheWorld
keep preemption disabled until it's all done, which definitely lets us
drop these m.locks manipulations.
There are several steps to stopping and starting the world and
currently they're open-coded in several places. The garbage collector
is the only thing that needs to stop and start the world in a
non-trivial pattern. Replace all other uses with calls to higher-level
functions that implement the entire pattern necessary to stop and
start the world.
This is a pure refectoring and should not change any code semantics.
In the following commits, we'll make changes that are easier to do
with this abstraction in place.
This commit renames the old starttheworld to startTheWorldWithSema.
This is a slight misnomer right now because the callers release
worldsema just before calling this. However, a later commit will swap
these and I don't want to think of another name in the mean time.
Austin Clements [Fri, 15 May 2015 20:03:27 +0000 (16:03 -0400)]
runtime: don't start GC if preemptoff is set
In order to avoid deadlocks, startGC avoids kicking off GC if locks
are held by the calling M. However, it currently fails to check
preemptoff, which is the other way to disable preemption.
Austin Clements [Wed, 13 May 2015 21:08:16 +0000 (17:08 -0400)]
runtime: eliminate runqvictims and a copy from runqsteal
Currently, runqsteal steals Gs from another P into an intermediate
buffer and then copies those Gs into the current P's run queue. This
intermediate buffer itself was moved from the stack to the P in commit c4fe503 to eliminate the cost of zeroing it on every steal.
This commit follows up c4fe503 by stealing directly into the current
P's run queue, which eliminates the copy and the need for the
intermediate buffer. The update to the tail pointer is only committed
once the entire steal operation has succeeded, so the semantics of
stealing do not change.
Russ Cox [Fri, 8 May 2015 05:43:18 +0000 (01:43 -0400)]
runtime: replace GC programs with simpler encoding, faster decoder
Small types record the location of pointers in their memory layout
by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries,
and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using
a bitmap for a large type containing arrays does not make sense:
if someone refers to the type [1<<28]*byte in a program in such
a way that the type information makes it into the binary, it would be
a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB
(for 1-bit entries) bitmap full of 1s into the binary or even to keep
one in memory during the execution of the program.
For large types containing arrays, it is much more compact to describe
the locations of pointers using a notation that can express repetition
than to lay out a bitmap of pointers. Go 1.4 included such a notation,
called ``GC programs'' but it was complex, required recursion during
decoding, and was generally slow. Dmitriy measured the execution of
these programs writing directly to the heap bitmap as being 7x slower
than copying from a preunrolled 4-bit mask (and frankly that code was
not terribly fast either). For some tests, unrollgcprog1 was seen costing
as much as 3x more than the rest of malloc combined.
This CL introduces a different form for the GC programs. They use a
simple Lempel-Ziv-style encoding of the 1-bit pointer information,
in which the only operations are (1) emit the following n bits
and (2) repeat the last n bits c more times. This encoding can be
generated directly from the Go type information (using repetition
only for arrays or large runs of non-pointer data) and it can be decoded
very efficiently. In particular the decoding requires little state and
no recursion, so that the entire decoding can run without any memory
accesses other than the reads of the encoding and the writes of the
decoded form to the heap bitmap. For recursive types like arrays of
arrays of arrays, the inner instructions are only executed once, not
n times, so that large repetitions run at full speed. (In contrast, large
repetitions in the old programs repeated the individual bit-level layout
of the inner data over and over.) The result is as much as 25x faster
decoding compared to the old form.
Because the old decoder was so slow, Go 1.4 had three (or so) cases
for how to set the heap bitmap bits for an allocation of a given type:
(1) If the type had an even number of words up to 32 words, then
the 4-bit pointer mask for the type fit in no more than 16 bytes;
store the 4-bit pointer mask directly in the binary and copy from it.
(1b) If the type had an odd number of words up to 15 words, then
the 4-bit pointer mask for the type, doubled to end on a byte boundary,
fit in no more than 16 bytes; store that doubled mask directly in the
binary and copy from it.
(2) If the type had an even number of words up to 128 words,
or an odd number of words up to 63 words (again due to doubling),
then the 4-bit pointer mask would fit in a 64-byte unrolled mask.
Store a GC program in the binary, but leave space in the BSS for
the unrolled mask. Execute the GC program to construct the mask the
first time it is needed, and thereafter copy from the mask.
(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
(This is the case that was 7x slower than the other two.)
Because the new pointer masks store 1-bit entries instead of 4-bit
entries and because using the decoder no longer carries a significant
overhead, after this CL (that is, for Go 1.5) there are only two cases:
(1) If the type is 128 words or less (no condition about odd or even),
store the 1-bit pointer mask directly in the binary and use it to
initialize the heap bitmap during malloc. (Implemented in CL 9702.)
(2) There is no case 2 anymore.
(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
Executing the GC program directly into the heap bitmap (case (3) above)
was disabled for the Go 1.5 dev cycle, both to avoid needing to use
GC programs for typedmemmove and to avoid updating that code as
the heap bitmap format changed. Typedmemmove no longer uses this
type information; as of CL 9886 it uses the heap bitmap directly.
Now that the heap bitmap format is stable, we reintroduce GC programs
and their space savings.
Benchmarks for heapBitsSetType, before this CL vs this CL:
The above compares always using a cached pointer mask (and the
corresponding waste of memory) against using the programs directly.
Some slowdown is expected, in exchange for having a better general algorithm.
The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024,
along with the slice variants of those.
It is possible that the cutoff of 128 words (bits) should be raised
in a followup CL, but even with this low cutoff the GC programs are
faster than Go 1.4's "fast path" non-GC program case.
Benchmarks for heapBitsSetType, Go 1.4 vs this CL:
SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation.
Both Go 1.4 and this CL are using pointer bitmaps for this case,
so that's an overall 3x speedup for using pointer bitmaps.
SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation.
Both Go 1.4 and this CL are running the GC program for this case,
so that's an overall 17x speedup when using GC programs (and
I've seen >20x on other systems).
Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against
this CL's SetTypeNode128 (GC program), the slow path in the
code in this CL is 2x faster than the fast path in Go 1.4.
The Go 1 benchmarks are basically unaffected compared to just before this CL.
This CL is a bit larger than I would like, but the compiler, linker, runtime,
and package reflect all need to be in sync about the format of these programs,
so there is no easy way to split this into independent changes (at least
while keeping the build working at each change).
Didier Spezia [Thu, 14 May 2015 16:44:58 +0000 (16:44 +0000)]
text/template: fix race condition on function maps
The Template objects are supposed to be goroutine-safe once they
have been parsed. This includes the text and html ones.
For html/template, the escape mechanism is triggered at execution
time. It may alter the internal structures of the template, so
a mutex protects them against concurrent accesses.
The text/template package is free of any synchronization primitive.
A race condition may occur when nested templates are escaped:
the escape algorithm alters the function maps of the associated
text templates, while a concurrent template execution may access
the function maps in read mode.
The less invasive fix I have found is to introduce a RWMutex in
text/template to protect the function maps. This is unfortunate
but it should be effective.
Fixes #9945
Change-Id: I1edb73c0ed0f1fcddd2f1516230b548b92ab1269
Reviewed-on: https://go-review.googlesource.com/10101 Reviewed-by: Rob Pike <r@golang.org>
Michael Hudson-Doyle [Tue, 12 May 2015 04:07:05 +0000 (16:07 +1200)]
cmd/internal/ld: prevent creation of .dynamic and .dynsym symbols when externally linking
This allows the removal of a fudge in data.go.
We have to defer the calls to adddynlib on non-Darwin until after we have
decided whether we are externally or internally linking. The Macho/ELF
separation could do with some cleaning up, but: code freeze.
Fixing this once rather than per-arch is what inspired the previous CLs.
Alex A Skinner [Wed, 13 May 2015 03:56:56 +0000 (23:56 -0400)]
net: redo resolv.conf recheck implementation
The previous implementation spawned an extra goroutine to handle
rechecking resolv.conf for changes.
This change eliminates the extra goroutine, and has rechecking
done as part of a lookup. A side effect of this change is that the
first lookup after a resolv.conf change will now succeed, whereas
previously it would have failed. It also fixes rechecking logic to
ignore resolv.conf parsing errors as it should.
Matthew Dempsky [Tue, 7 Apr 2015 18:57:52 +0000 (11:57 -0700)]
cmd/internal/gc, cmd/yacc: merge yaccerrors.go into cmd/yacc
This extends cmd/yacc with support for
%error { tokens } : message
syntax to specify custom error messages to use instead of the default
generic ones. This allows merging go.errors into go.y and removing
the yaccerrors.go tool.
encoding/json: fix decoding of types with '[]byte' as underlying type
All slice types which have elements of kind reflect.Uint8 are marshalled
into base64 for compactness. When decoding such data into a custom type
based on []byte the decoder checked the slice kind instead of the slice
element kind, so no appropriate decoder was found.
Fixed by letting the decoder check slice element kind like the encoder.
This guarantees that already encoded data can still be successfully
decoded.
Name will be converted from an anonymous to a
named field in a subsequent, automated CL.
No functional changes. Passes toolstash -cmp.
This reduces the size of gc.Node from 424 to 400 bytes.
This in turn reduces the permanent (pprof -inuse_space)
memory usage while compiling the test/rotate?.go tests:
This CL was generated by updating Val in go.go
and then running:
sed -i "" 's/\.U\.[SBXFC]val = /.U = /' *.go
sed -i "" 's/\.U\.Sval/.U.\(string\)/g' *.go *.y
sed -i "" 's/\.U\.Bval/.U.\(bool\)/g' *.go *.y
sed -i "" 's/\.U\.Xval/.U.\(\*Mpint\)/g' *.go *.y
sed -i "" 's/\.U\.Fval/.U.\(\*Mpflt\)/g' *.go *.y
sed -i "" 's/\.U\.Cval/.U.\(\*Mpcplx\)/g' *.go *.y
No functional changes. Passes toolstash -cmp.
This reduces the size of gc.Node from 424 to 392 bytes.
This in turn reduces the permanent (pprof -inuse_space)
memory usage while compiling the test/rotate?.go tests:
CL 8445 was similar to this; gri asked that Val's implementation
be hidden first. CLs 8912, 9263, and 9267 have at least
isolated the changes to the cmd/internal/gc package.