David Crawshaw [Wed, 29 Apr 2015 14:59:22 +0000 (10:59 -0400)]
cmd/internal/ld: use a simpler cout writer
Removes the unused *bufio.Reader from the object controlling the
linker's primary output.
Change-Id: If91d9f60752f3dc4b280f35d6eb441f3c47574b2
Reviewed-on: https://go-review.googlesource.com/9362 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Ian Lance Taylor [Wed, 29 Apr 2015 00:58:17 +0000 (17:58 -0700)]
misc/cgo/test/issue9400: fix to build with gccgo
This doesn't test much with gccgo, but at least it builds now, and the
test does, unsurprisingly, pass. A proper test would require adding
assembly files in GCC syntax for all platforms that gccgo supports,
which would be infeasible.
Also added copyright headers to the asm files.
Change-Id: Icea5af29d7d521a0681506ddb617a79705b76d33
Reviewed-on: https://go-review.googlesource.com/9417 Reviewed-by: Minux Ma <minux@golang.org>
Shenghou Ma [Thu, 2 Apr 2015 01:38:05 +0000 (21:38 -0400)]
cmd/internal/obj: do not generate data for $f32. and $f64. symbols at assemble time
When reading the object files for linking, liblink takes care of
generating the data for them.
This is a port of https://golang.org/cl/3101 to Go.
Change-Id: Ie3e2d6515bd7d253a8c1e25c70ef8fed064436d8 Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/8383 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Keith Randall [Tue, 21 Apr 2015 21:22:41 +0000 (14:22 -0700)]
runtime: tail call into memeq/cmp body implementations
There's no need to call/ret to the body implementation.
It can write the result to the right place. Just jump to
it and have it return to our caller.
Old:
  call body implementation
    compute result
    put result in a register
    return
  write register to result location
  return
New:
  load address of result location into a register
  jump to body implementation
    compute result
    write result to passed-in address
    return
It's a bit tricky on 386 because there is no free register
with which to pass the result location. Free up a register
by keeping around blen-alen instead of both alen and blen.
Nigel Tao [Wed, 29 Apr 2015 03:51:49 +0000 (13:51 +1000)]
image/gif: check that individual frame's bounds are within the overall
GIF's bounds.
Also change the implicit Config Width and Height to be the
Rectangle.Max, not the Dx and Dy, of the first frame's bounds. For the
case where the first frame's bounds is something like (5,5)-(8,8), the
overall width should be 8, not 3.
Change-Id: I3affc484f5e32941a36f15517a92ca8d189d9c22
Reviewed-on: https://go-review.googlesource.com/9465 Reviewed-by: Rob Pike <r@golang.org>
Also note that the Write method on both connected and
unconnected-mode sockets may return a positive number of bytes written
together with a timeout or use-of-closed-network-connection error.
Michael Hudson-Doyle [Thu, 23 Apr 2015 02:38:05 +0000 (14:38 +1200)]
cmd/internal/obj: Delete Link.Symmorestack
This started out as trying to remove Bool2int calls, which it does a bit, but
mostly it ended up being removing the Link.Symmorestack array which seemed a
pointless bit of caching.
Change-Id: I91a51eb08cb4b08f3f9f093b575306499267b67a
Reviewed-on: https://go-review.googlesource.com/9239 Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Shenghou Ma [Wed, 29 Apr 2015 03:02:49 +0000 (23:02 -0400)]
cmd/internal/ld, runtime: unify stack reservation in PE header and runtime
With a 128KB stack reservation, on 32-bit Windows, the maximum number
of threads is ~9000.
The original 65535-byte stack commit causes a problem on Windows
XP, where it makes the stack reservation 1MB despite the fact
that the runtime specified 128KB.
While we're here, also fix the extra spaces in the unable to
create more OS thread error message: println inserts a space
between each argument.
See #9457 for more information.
Change-Id: I3a82f7d9717d3d55211b6eb1c34b00b0eaad83ed
Reviewed-on: https://go-review.googlesource.com/2237 Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Minux Ma <minux@golang.org>
Ian Lance Taylor [Tue, 28 Apr 2015 20:58:32 +0000 (13:58 -0700)]
runtime/cgo: use PTHREAD_{MUTEX,COND}_INITIALIZER
Technically you must initialize static pthread_mutex_t and
pthread_cond_t variables with the appropriate INITIALIZER macro. In
practice the default initializers are zero anyhow, but it's still good
code hygiene.
Change-Id: I517304b16c2c7943b3880855c1b47a9a506b4bdf
Reviewed-on: https://go-review.googlesource.com/9433 Reviewed-by: David Crawshaw <crawshaw@golang.org>
Fixes #10366 (how to set custom headers)
Fixes #9836 (PATCH in PostForm)
Fixes #9276 (generating a server-side Request for testing)
Update #8991 (clarify Response.Write for now; export ReverseProxy's copy later?)
Change-Id: I95a11bf3bb3eeeeb72775b6ebfbc761641addc35
Reviewed-on: https://go-review.googlesource.com/9410 Reviewed-by: David Crawshaw <crawshaw@golang.org>
Rob Pike [Tue, 28 Apr 2015 19:35:06 +0000 (12:35 -0700)]
cmd/asm: add comments back for aliases on jumps for x86
These were lost in the transition from 8a/6a to asm.
Also, in the process, discover more aliases. I'm betting the missing
ones were a casualty of the recent merge of 386 and amd64.
Richard Barnes [Thu, 12 Feb 2015 07:35:16 +0000 (23:35 -0800)]
encoding/asn1: Improved control of flags and times
This change corrects the serialization of asn1.Flag values, so that
when set, they serialize to an empty value, and when unset, they are
omitted. It also adds a format parameter that allows calling code
to control whether time.Time values are serialized as UTCTime or
GeneralizedTime.
Change-Id: I6d97abf009ea317338dab30c80f35a2de7e07104
Reviewed-on: https://go-review.googlesource.com/5970 Reviewed-by: Adam Langley <agl@golang.org>
Run-TryBot: Adam Langley <agl@golang.org>
Adam Langley [Sun, 26 Apr 2015 22:18:41 +0000 (15:18 -0700)]
crypto/x509: allow parsing of certificates with unknown critical extensions.
Previously, unknown critical extensions were a parse error. However, for
some cases one wishes to parse and use a certificate that may contain
these extensions. For example, when using a certificate in a TLS server:
it's the client's concern whether it understands the critical extensions
but the server still wishes to parse SNI values out of the certificate
etc.
This change moves the rejection of unknown critical extensions from
ParseCertificate to Certificate.Verify. The former will now record the
OIDs of unknown critical extensions in the Certificate and the latter
will fail to verify certificates with them. If a user of this package
wishes to handle any unknown critical extensions themselves, they can
extract the extensions from Certificate.Extensions, process them and
remove known OIDs from Certificate.UnknownCriticalExtensions.
See discussion at
https://groups.google.com/forum/#!msg/golang-nuts/IrzoZlwalTQ/qdK1k-ogeHIJ
and in the linked bug.
runtime: eliminate one heapBitsForObject from scanobject
scanobject with ptrmask!=nil is only ever called with the base
pointer of a heap object. Currently, scanobject calls
heapBitsForObject, which goes to a great deal of trouble to check
that the pointer points into the heap and to find the base of the
object it points to, both of which are completely unnecessary in
this case.
Replace this call to heapBitsForObject with much simpler logic to
fetch the span and compute the heap bits.
cmd/internal/gc: emit typedmemmove write barrier from sgen
Emitting it here instead of rewriting the tree earlier sets us up
to generate an inline check, like we do for single pointers.
But even without the inline check, generating at this level lets
us generate significantly more efficient code, probably due to
having fewer temporaries and less complex high-level code
for the compiler to churn through.
Revcomp is worse, almost certainly due to register pressure.
cmd/internal/gc: inline writeBarrierEnabled check before calling writebarrierptr
I believe the benchmarks that get slower are under register pressure,
and not making the call unconditionally makes the pressure worse,
and the register allocator doesn't do a great job. But part of the point
of this sequence is to get the write barriers out of the way so I can work
on the register allocator, so that's okay.
runtime: replace needwb() with writeBarrierEnabled
Reduce the write barrier check to a single load and compare
so that it can be inlined into write barrier use sites.
Makes the standard write barrier a little faster too.
runtime: change unused argument in fat write barriers from pointer to scalar
The argument is unused, only present for alignment of the
following argument. The compiler today always passes a zero
but I'd rather not write anything there during the call sequence,
so mark it as a scalar so the garbage collector won't look at it.
cmd/internal/gc: accept comma-separated list of name=value for -d
This should obviously have no performance impact.
Listing numbers just as a sanity check for the benchmark
comparison program: it should (and does) find nothing
to report.
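For illustration, a comma-separated name=value list of the kind the -d change above accepts could be parsed roughly as follows. This is not the compiler's code: parseDebugFlags, the rule that a bare name means 1, and the flag names in main are all assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseDebugFlags parses a list such as "name=2,other" into a map.
// A bare name without "=value" is treated as 1 (an assumption).
func parseDebugFlags(arg string) (map[string]int, error) {
	flags := make(map[string]int)
	for _, item := range strings.Split(arg, ",") {
		if item == "" {
			continue // tolerate empty elements such as a trailing comma
		}
		name, val := item, 1
		if i := strings.Index(item, "="); i >= 0 {
			n, err := strconv.Atoi(item[i+1:])
			if err != nil {
				return nil, fmt.Errorf("bad value in %q: %v", item, err)
			}
			name, val = item[:i], n
		}
		flags[name] = val
	}
	return flags, nil
}

func main() {
	// "alpha" and "beta" are made-up flag names.
	flags, err := parseDebugFlags("alpha=2,beta")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(flags["alpha"], flags["beta"])
}
```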
cmd/internal/gc: use MOV R0, R1 instead of LEA 0(R0), R1 in Agen
Minor code generation optimization I've been meaning to do
for a while and noticed while working on the emitted write
barrier code. Using MOV lets the compiler and maybe the
processor do copy propagation.
Change-Id: Ib870e7b9d26eb118eefdaa3e76dcec4a4d459584
Reviewed-on: https://go-review.googlesource.com/9398 Reviewed-by: Ian Lance Taylor <iant@golang.org>
net: don't miss testing server teardowns when test fails early
Change-Id: I9fa678e43b4ae3970323cac474b5f86d4d933997
Reviewed-on: https://go-review.googlesource.com/9382 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Rob Pike [Fri, 24 Apr 2015 19:28:18 +0000 (12:28 -0700)]
cmd/go,cmd/doc: add "go doc"
Add the new go doc command to the go command, installed in
the tool directory.
(Still to do: tests)
Fix cmd/dist to remove old "package documentation" code that was
stopping it from including cmd/go/doc.go in the build.
Implement the doc command. Here is the help info from "go help doc":
===
usage: go doc [-u] [package|[package.]symbol[.method]]
Doc accepts at most one argument, indicating either a package, a symbol within a
package, or a method of a symbol.
    go doc
    go doc <pkg>
    go doc <sym>[.<method>]
    go doc [<pkg>].<sym>[.<method>]
Doc interprets the argument to see what it represents, determined by its syntax
and which packages and symbols are present in the source directories of GOROOT and
GOPATH.
The first item in this list that succeeds is the one whose documentation is printed.
For packages, the order of scanning is determined by the file system, however the
GOROOT tree is always scanned before GOPATH.
If there is no package specified or matched, the package in the current directory
is selected, so "go doc" shows the documentation for the current package and
"go doc Foo" shows the documentation for symbol Foo in the current package.
Doc prints the documentation comments associated with the top-level item the
argument identifies (package, type, method) followed by a one-line summary of each
of the first-level items "under" that item (package-level declarations for a
package, methods for a type, etc.)
The package paths must be either a qualified path or a proper suffix of a path
(see examples below). The go tool's usual package mechanism does not apply: package
path elements like . and ... are not implemented by go doc.
When matching symbols, lower-case letters match either case but upper-case letters
match exactly.
Examples:
    go doc
        Show documentation for current package.
    go doc Foo
        Show documentation for Foo in the current package.
        (Foo starts with a capital letter so it cannot match a package path.)
    go doc json
        Show documentation for the encoding/json package.
    go doc json
        Shorthand for encoding/json assuming only one json package
        is present in the tree.
    go doc json.Number (or go doc json.number)
        Show documentation and method summary for json.Number.
    go doc json.Number.Int64 (or go doc json.number.int64)
        Show documentation for the Int64 method of json.Number.
Flags:
    -u
        Show documentation for unexported as well as exported
        symbols and methods.
===
Still to do:
Tests.
Disambiguation when there is both foo and Foo.
Flag for case-sensitive matching.
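The symbol-matching rule in the help text above (lower-case letters match either case, upper-case letters match exactly) can be sketched as below. This is an illustrative reading of the rule, not the cmd/doc source; the function name match is an assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// match reports whether the text typed by the user matches an identifier in
// the program. A lower-case letter in user matches either case of the same
// letter; every other character must match exactly.
func match(user, program string) bool {
	u, p := []rune(user), []rune(program)
	if len(u) != len(p) {
		return false
	}
	for i, ur := range u {
		pr := p[i]
		if ur == pr {
			continue
		}
		if unicode.IsLower(ur) && unicode.ToLower(pr) == ur {
			continue
		}
		return false
	}
	return true
}

func main() {
	fmt.Println(match("number", "Number")) // lower-case input, upper-case symbol
	fmt.Println(match("Number", "number")) // upper-case input must match exactly
	fmt.Println(match("int64", "Int64"))
}
```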
Some race tests were sensitive to the goroutine scheduling order.
When this changed in commit e870f06, these tests started to fail.
Fix TestRaceHeapParam by ensuring that the racing goroutine has
run before the test exits. Fix TestRaceRWMutexMultipleReaders by
adding a third reader to ensure that two readers wind up on the
same side of the writer (and race with each other) regardless of
the schedule. Fix TestRaceRange by ensuring that the racing
goroutine runs before the main goroutine exits the loop it races
with.
ReadMemStats accounts for stacks slightly differently than the runtime
does internally. Internally, only stacks allocated by newosproc0 are
accounted in memstats.stacks_sys and other stacks are accounted in
heap_sys. readmemstats_m shuffles the statistics so all stacks are
accounted in StackSys rather than HeapSys.
However, currently, readmemstats_m assumes StackSys will be zero when
it does this shuffle. This was true until commit 6ad33be. If it isn't
(e.g., if something called newosproc0), StackSys+HeapSys will be
different before and after this shuffle, and the Sys sum that was
computed earlier will no longer agree with the sum of its components.
Fix this by making the shuffle in readmemstats_m not assume that
StackSys is zero.
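The arithmetic behind the fix above, as a sketch. The stats struct and the heapStacks field (stack memory that is accounted under HeapSys internally) are hypothetical stand-ins for the runtime's own bookkeeping.

```go
package main

import "fmt"

// stats is a hypothetical, simplified view of the relevant counters.
type stats struct {
	HeapSys    uint64
	StackSys   uint64 // may already be non-zero, e.g. from newosproc0
	heapStacks uint64 // stack bytes currently counted in HeapSys
}

// shuffleStacks moves stack memory from HeapSys to StackSys. It adds to
// StackSys instead of overwriting it, so HeapSys+StackSys is unchanged.
func shuffleStacks(s *stats) {
	s.StackSys += s.heapStacks
	s.HeapSys -= s.heapStacks
}

func main() {
	s := &stats{HeapSys: 100, StackSys: 8, heapStacks: 30}
	before := s.HeapSys + s.StackSys
	shuffleStacks(s)
	fmt.Println(before == s.HeapSys+s.StackSys, s.HeapSys, s.StackSys)
}
```

Had the shuffle assigned StackSys = heapStacks instead, the 8 bytes already in StackSys would have been lost and the Sys total would no longer equal the sum of its parts.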
David Crawshaw [Mon, 27 Apr 2015 14:56:38 +0000 (10:56 -0400)]
runtime: remove unnecessary noescape to fix netbsd
I introduced this build failure in golang.org/cl/9302 but failed to
notice due to the other failures on the dashboard.
Change-Id: I84bf00f664ba572c1ca722e0136d8a2cf21613ca
Reviewed-on: https://go-review.googlesource.com/9363 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Minux Ma <minux@golang.org>
Currently TestRaceCrawl fails to call wg.Done for every wg.Add if the
depth ever reaches 0. This causes the test to deadlock. Under the race
detector, this deadlock is not detected, so the test eventually times
out.
This only recently became a problem. Prior to commit e870f06 the depth
would never reach 0 because the strict round-robin goroutine schedule
ensured that all of the URLs were already "seen" by depth 2. Now that
the runtime prefers scheduling the most recently started goroutine,
the test is able to reach depth 0 and trigger this deadlock.
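A sketch of the pattern that avoids this kind of deadlock: pair every wg.Add with a deferred wg.Done so that the early return at depth 0 is covered too. This is not the test's code; crawl, visit, and the link graph are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// crawl visits url and its links up to the given depth. The deferred
// wg.Done runs on every return path, including depth == 0.
func crawl(wg *sync.WaitGroup, mu *sync.Mutex, graph map[string][]string,
	seen map[string]bool, url string, depth int) {
	defer wg.Done()
	if depth == 0 {
		return
	}
	mu.Lock()
	if seen[url] {
		mu.Unlock()
		return
	}
	seen[url] = true
	mu.Unlock()
	for _, next := range graph[url] {
		wg.Add(1) // always before starting the goroutine
		go crawl(wg, mu, graph, seen, next, depth-1)
	}
}

// visit crawls from start and returns how many URLs were marked seen.
func visit(graph map[string][]string, start string, depth int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	seen := make(map[string]bool)
	wg.Add(1)
	go crawl(&wg, &mu, graph, seen, start, depth)
	wg.Wait()
	return len(seen)
}

func main() {
	graph := map[string][]string{"a": {"b", "c"}, "b": {"a"}}
	fmt.Println(visit(graph, "a", 2))
}
```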
regexp: trivial change in comments to update code.google.com link
Replaced code.google.com/p/re2/ with github.com/google/re2/ and
updated the file names (re2-exhaustive.txt.bz2 not re2.txt.gz)
as well as the re2 make command (make log).
The master goroutine was returning before
the child goroutine had done its final i < b.N
(the one that fails and causes it to exit the loop)
and then the benchmark harness was updating
b.N, causing a read+write race on b.N.
runtime: remove a modulus calculation from pollorder
This is a follow-up to CL 9269, as suggested
by dvyukov.
There is probably even more that can be done
to speed up this shuffle. It will matter more
once CL 7570 (fine-grained locking in select)
is in and can be revisited then, with benchmarks.
Austin Clements [Fri, 27 Mar 2015 21:01:53 +0000 (17:01 -0400)]
runtime: replace STW for enabling write barriers with ragged barrier
Currently, we use a full stop-the-world around enabling write
barriers. This is to ensure that all Gs have enabled write barriers
before any blackening occurs (either in gcBgMarkWorker() or in
gcAssistAlloc()).
However, there's no need to bring the whole world to a synchronous
stop to ensure this. This change replaces the STW with a ragged
barrier that ensures each P has individually observed that write
barriers should be enabled before GC performs any blackening.
Austin Clements [Fri, 27 Mar 2015 20:49:12 +0000 (16:49 -0400)]
runtime: add ragged global barrier function
This adds forEachP, which performs a general-purpose ragged global
barrier. forEachP takes a callback and invokes it for every P at a GC
safe point.
Ps that are idle or in a syscall are considered to be at a continuous
safe point. forEachP ensures that these Ps do not change state by
forcing all syscall Ps into idle and holding the sched.lock.
To ensure that Ps do not enter syscall or idle without running the
safe-point function, this adds checks for a pending callback every
place there is currently a gcwaiting check.
We'll use forEachP to replace the STW around enabling the write
barrier and to replace the current asynchronous per-M wbuf cache with
a cooperatively managed per-P gcWork cache.
runtime: reset spinning in mspinning if work was ready()ed
This fixes a bug where the runtime ready()s a goroutine while setting
up a new M that's initially marked as spinning, causing the scheduler
to later panic when it finds work in the run queue of a P associated
with a spinning M. Specifically, the sequence of events that can lead
to this is:
1) sysmon calls handoffp to hand off a P stolen from a syscall.
2) handoffp sees no pending work on the P, so it calls startm with
spinning set.
3) startm calls newm, which in turn calls allocm to allocate a new M.
4) allocm "borrows" the P we're handing off in order to do allocation
and performs this allocation.
5) This allocation may assist the garbage collector, and this assist
may detect the end of concurrent mark and ready() the main GC
goroutine to signal this.
6) This ready()ing puts the GC goroutine on the run queue of the
borrowed P.
7) newm starts the OS thread, which runs mstart and subsequently
mstart1, which marks the M spinning because startm was called with
spinning set.
8) mstart1 enters the scheduler, which panics because there's work on
the run queue, but the M is marked spinning.
To fix this, before marking the M spinning in step 7, add a check to
see if work has been added to the P's run queue. If this is the case,
undo the spinning instead.
Jonathan Rudenberg [Sun, 26 Apr 2015 16:05:37 +0000 (12:05 -0400)]
crypto/tls: add OCSP response to ConnectionState
The OCSP response is currently only exposed via a method on Conn,
which makes it inaccessible when using wrappers like net/http. The
ConnectionState structure is typically available even when using
wrappers and contains many of the other handshake details, so this
change exposes the stapled OCSP response in that structure.
Change-Id: If8dab49292566912c615d816321b4353e711f71f
Reviewed-on: https://go-review.googlesource.com/9361 Reviewed-by: Adam Langley <agl@golang.org>
Run-TryBot: Adam Langley <agl@golang.org>
David Leon Gil [Wed, 7 Jan 2015 05:07:24 +0000 (21:07 -0800)]
crypto/elliptic: don't unmarshal points that are off the curve
At present, Unmarshal does not check that the point it unmarshals
is actually *on* the curve. (It may be on the curve's twist.)
This can, as Daniel Bernstein has pointed out at great length,
lead to quite devastating attacks. And 3 out of the 4 curves
supported by crypto/elliptic have twists with cofactor != 1;
P-224, in particular, has a sufficiently large cofactor that it
is likely that conventional dlog attacks might be useful.
This closes #2445, filed by Watson Ladd.
To explain why this was (partially) rejected before being accepted:
In the general case, for curves with cofactor != 1, verifying subgroup
membership is required. (This is expensive and hard-to-implement.)
But, as recent discussion during the CFRG standardization process
has brought out, small-subgroup attacks are much less damaging than
a twist attack.
Change-Id: I284042eb9954ff9b7cde80b8b693b1d468c7e1e8
Reviewed-on: https://go-review.googlesource.com/2421 Reviewed-by: Adam Langley <agl@golang.org>
This implements a method for x509.CertificateRequest to prevent
certain attacks and to allow a CA/RA to properly check the validity
of the binding between an end entity and a key pair, to prove that
it has possession of (i.e., is able to use) the private key
corresponding to the public key for which a certificate is requested.
RFC 2986 section 3 states:
"A certification authority fulfills the request by authenticating the
requesting entity and verifying the entity's signature, and, if the
request is valid, constructing an X.509 certificate from the
distinguished name and public key, the issuer name, and the
certification authority's choice of serial number, validity period,
and signature algorithm."
Change-Id: I37795c3b1dfdfdd455d870e499b63885eb9bda4f
Reviewed-on: https://go-review.googlesource.com/7371 Reviewed-by: Adam Langley <agl@golang.org>
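The message above does not name the method; the sketch below assumes it is CertificateRequest.CheckSignature, as found in released crypto/x509. The newCSR helper and the subject name are made up.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
)

// newCSR creates a certificate request signed with a fresh ECDSA key and
// parses it back, as a CA would after receiving it. Hypothetical helper.
func newCSR(commonName string) (*x509.CertificateRequest, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.CertificateRequest{Subject: pkix.Name{CommonName: commonName}}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificateRequest(der)
}

func main() {
	csr, err := newCSR("example.test")
	if err != nil {
		fmt.Println("creating request failed:", err)
		return
	}
	// A valid self-signature proves possession of the private key.
	fmt.Println("signature valid:", csr.CheckSignature() == nil)

	// Corrupt the signed portion; the check should now fail.
	raw := csr.RawTBSCertificateRequest
	raw[len(raw)-1] ^= 0xff
	fmt.Println("tampered request rejected:", csr.CheckSignature() != nil)
}
```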
Jonathan Rudenberg [Sat, 18 Apr 2015 01:32:11 +0000 (21:32 -0400)]
crypto/tls: add support for session ticket key rotation
This change adds a new method to tls.Config, SetSessionTicketKeys, that
changes the key used to encrypt session tickets while the server is
running. Additional keys may be provided that will be used to maintain
continuity while rotating keys. If a ticket encrypted with an old key is
provided by the client, the server will resume the session and provide
the client with a ticket encrypted using the new key.
Fixes #9994
Change-Id: Idbc16b10ff39616109a51ed39a6fa208faad5b4e
Reviewed-on: https://go-review.googlesource.com/9072 Reviewed-by: Jonathan Rudenberg <jonathan@titanous.com> Reviewed-by: Adam Langley <agl@golang.org>
Jonathan Rudenberg [Thu, 16 Apr 2015 18:59:22 +0000 (14:59 -0400)]
crypto/tls: add support for Certificate Transparency
This change adds support for serving and receiving Signed Certificate
Timestamps as described in RFC 6962.
The server is now capable of serving SCTs listed in the Certificate
structure. The client now asks for SCTs and, if any are received,
they are exposed in the ConnectionState structure.
Fixes #10201
Change-Id: Ib3adae98cb4f173bc85cec04d2bdd3aa0fec70bb
Reviewed-on: https://go-review.googlesource.com/8988 Reviewed-by: Adam Langley <agl@golang.org>
Run-TryBot: Adam Langley <agl@golang.org> Reviewed-by: Jonathan Rudenberg <jonathan@titanous.com>
Currently parseRecord will always start with a nil
slice and then resize the slice on append. For input
with a fixed number of fields per record we can preallocate
the slice to avoid having to resize the slice.
This change implements this optimization by using
FieldsPerRecord as capacity if it's > 0 and also adds a
benchmark to better show the differences.
benchmark         old ns/op      new ns/op      delta
BenchmarkRead     19741          17909          -9.28%

benchmark         old allocs     new allocs     delta
BenchmarkRead     59             41             -30.51%

benchmark         old bytes      new bytes      delta
BenchmarkRead     6276           5844           -6.88%
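The preallocation idea, sketched outside encoding/csv. splitRecord is a hypothetical, naive splitter (no quoting support) whose only purpose is to show the capacity hint; it is not the package's parseRecord.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRecord splits an unquoted comma-separated line. When the number of
// fields per record is known (> 0), it is used as the slice capacity so
// append does not need to grow the slice.
func splitRecord(line string, fieldsPerRecord int) []string {
	var fields []string
	if fieldsPerRecord > 0 {
		fields = make([]string, 0, fieldsPerRecord)
	}
	for {
		i := strings.IndexByte(line, ',')
		if i < 0 {
			return append(fields, line)
		}
		fields = append(fields, line[:i])
		line = line[i+1:]
	}
}

func main() {
	rec := splitRecord("a,b,c", 3)
	fmt.Println(rec, len(rec), cap(rec))
}
```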
David Crawshaw [Fri, 24 Apr 2015 16:47:46 +0000 (12:47 -0400)]
runtime: signal forwarding for darwin/amd64
Follows the linux signal forwarding semantics from
http://golang.org/cl/8712, sharing the implementation of sigfwdgo.
Forwarding for 386, arm, and arm64 will follow.
Change-Id: I6bf30d563d19da39b6aec6900c7fe12d82ed4f62
Reviewed-on: https://go-review.googlesource.com/9302 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Matt T. Proud [Sun, 12 Apr 2015 17:50:52 +0000 (19:50 +0200)]
testing/quick: align tests with reflect.Kind.
This commit is largely cosmetic in the sense that it is the remnants
of a change proposal I had prepared for testing/quick, until I
discovered that 3e9ed27 already implemented the feature I was looking
for: quick.Value() for reflect.Kind Array. What you see is a merger
and manual cleanup; the cosmetic cleanups are as follows:
(1.) Keeping the TestCheckEqual and its associated input functions
in the same order as type kinds defined in reflect.Kind. Since
3e9ed27 was committed, the test case began to diverge from the
constant's ordering.
(2.) The `Intptr` derivatives existed to exercise quick.Value with
reflect.Kind's `Ptr` constant. All `Intptr` (unrelated to `uintptr`)
in the test have been migrated to ensure the parallelism of the
listings and to convey that `Intptr` is not special.
(3.) Correct a misspelling (transposition) of "alias", whereby it is
named as "Alais".
Michael Hudson-Doyle [Thu, 23 Apr 2015 09:53:48 +0000 (21:53 +1200)]
cmd/8l, cmd/internal/ld, cmd/internal/obj/x86: stop incorrectly using the term "initial exec"
The long comment block in obj6.go:progedit talked about the two code sequences
for accessing g as "local exec" and "initial exec", but really they are both forms
of local exec. This stuff is confusing enough without using the wrong words for
things, so rewrite it to talk about 2-instruction and 1-instruction sequences.
Unfortunately the confusion has made it into code, with the R_TLS_IE relocation
now doing double duty as meaning actual initial exec when externally linking and
boring old local exec when linking internally (half of this is my fault). So this
stops using R_TLS_IE in the local exec case. There is a chance this might break
plan9 or windows, but I don't think so. Next step is working out what the heck is
going on on ARM...
Change-Id: I09da4388210cf49dbc99fd25f5172bbe517cee57
Reviewed-on: https://go-review.googlesource.com/9273 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
runtime: replace per-M workbuf cache with per-P gcWork cache
Currently, each M has a cache of the most recently used *workbuf. This
is used primarily by the write barrier so it doesn't have to access
the global workbuf lists on every write barrier. It's also used by
stack scanning because it's convenient.
This cache is important for write barrier performance, but this
particular approach has several downsides. It's faster than no cache,
but far from optimal (as the benchmarks below show). It's complex:
access to the cache is sprinkled through most of the workbuf list
operations and it requires special care to transform into and back out
of the gcWork cache that's actually used for scanning and marking. It
requires atomic exchanges to take ownership of the cached workbuf and
to return it to the M's cache even though it's almost always used by
only the current M. Since it's per-M, flushing these caches is O(# of
Ms), which may be high. And it has some significant subtleties: for
example, in general the cache shouldn't be used after the
harvestwbufs() in mark termination because it could hide work from
mark termination, but stack scanning can happen after this and *will*
use the cache (but it turns out this is okay because it will always be
followed by a getfull(), which drains the cache).
This change replaces this cache with a per-P gcWork object. This
gcWork cache can be used directly by scanning and marking (as long as
preemption is disabled, which is a general requirement of gcWork).
Since it's per-P, it doesn't require synchronization, which simplifies
things and means the only atomic operations in the write barrier are
occasionally fetching new work buffers and setting a mark bit if the
object isn't already marked. This cache can be flushed in O(# of Ps),
which is generally small. It follows a simple flushing rule: the cache
can be used during any phase, but during mark termination it must be
flushed before allowing preemption. This also makes the dispose during
mutator assist no longer necessary, which eliminates the vast majority
of gcWork dispose calls and reduces contention on the global workbuf
lists. And it's a lot faster on some benchmarks:
(Best out of 10 runs. The delta of averages is similar.)
This also puts us in a good position to flush these caches when
nearing the end of concurrent marking, which will let us increase the
size of the work buffers while still controlling mark termination
pause time.
When findRunnable considers running a fractional mark worker, it first
checks if there's any work to be done; if there isn't there's no point
in running the worker because it will just reschedule immediately.
However, currently findRunnable just checks work.full and
work.partial, whereas getfull can *also* draw work from m.currentwbuf.
As a result, findRunnable may not start a worker even though there
actually is work.
This problem manifests itself in occasional failures of the
test/init1.go test. This test is unusual because it performs a large
amount of allocation without executing any write barriers, which means
there's nothing to force the pointers in currentwbuf out to the
work.partial/full lists where findRunnable can see them.
This change fixes this problem by making findRunnable also check for a
currentwbuf. This aligns findRunnable with trygetfull's notion of
whether or not there's work.
runtime: start dedicated mark workers even if there's no work
Currently, findRunnable only considers running a mark worker if
there's work in the work queue. In principle, this can delay the start
of the desired number of dedicated mark workers if there's no work
pending. This is unlikely to occur in practice, since there should be
work queued from the scan phase, but if it were to come up, a CPU hog
mutator could slow down or delay garbage collection.
This check makes sense for fractional mark workers, since they'll just
return to the scheduler immediately if there's no work, but we want
the scheduler to start all of the dedicated mark workers promptly,
even if there's currently no queued work. Hence, this change moves the
pending work check after the check for starting a dedicated worker.
Change-Id: I52b851cc9e41f508a0955b3f905ca80f109ea101
Reviewed-on: https://go-review.googlesource.com/9298 Reviewed-by: Rick Hudson <rlh@golang.org>
Hyang-Ah Hana Kim [Fri, 24 Apr 2015 18:11:29 +0000 (14:11 -0400)]
misc/cgo/testcshared: make test.bash resilient against noise.
Instead of comparing against the entire output that may include
verbose warning messages, use the last line of the output and check
it includes the expected success message (PASS).
Change-Id: Iafd583ee5529a8aef5439b9f1f6ce0185e4b1331
Reviewed-on: https://go-review.googlesource.com/9304 Reviewed-by: David Crawshaw <crawshaw@golang.org>
runtime: implement xadduintptr and update system mstats using it
The motivation is that sysAlloc/Free() currently aren't safe to be
called without a valid G, because arm's xadd64() uses locks that require
a valid G.
The solution here was proposed by Dmitry Vyukov: use xadduintptr()
instead of xadd64(), until arm can support xadd64 on all of its
architectures (not a trivial task for arm).
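The runtime's xadduintptr is internal, but the user-level analogue in sync/atomic shows the same idea: a pointer-sized atomic add that needs no lock. A sketch, with addStat, subStat, and concurrentAdd as hypothetical helpers.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addStat atomically adds n to a pointer-sized counter.
func addStat(stat *uintptr, n uintptr) uintptr {
	return atomic.AddUintptr(stat, n)
}

// subStat atomically subtracts n by adding its two's complement.
func subStat(stat *uintptr, n uintptr) uintptr {
	return atomic.AddUintptr(stat, ^(n - 1))
}

// concurrentAdd has each worker add n once and returns the final total.
func concurrentAdd(workers int, n uintptr) uintptr {
	var stat uintptr
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			addStat(&stat, n)
		}()
	}
	wg.Wait()
	return atomic.LoadUintptr(&stat)
}

func main() {
	fmt.Println(concurrentAdd(100, 4096))
	s := uintptr(10)
	fmt.Println(subStat(&s, 4))
}
```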
Hyang-Ah Hana Kim [Thu, 23 Apr 2015 21:27:38 +0000 (17:27 -0400)]
misc/cgo/testcshared: add a c-shared test for android/arm.
- main3.c tests main.main is exported when compiled for GOOS=android.
- wait longer for main2.c (it's slow on android/arm)
- rearranged test.bash
Fixes #10070.
Change-Id: I6e5a98d1c5fae776afa54ecb5da633b59b269316
Reviewed-on: https://go-review.googlesource.com/9296 Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
Michael Hudson-Doyle [Fri, 17 Apr 2015 20:14:08 +0000 (08:14 +1200)]
cmd/internal/gc, cmd/internal/ld, cmd/internal/obj: teach compiler about local symbols
This lets us avoid loading string constants via the GOT and (together with
http://golang.org/cl/9102) results in the fannkuch benchmark having very similar
register usage with -dynlink as without.
Change-Id: Ic3892b399074982b76773c3e547cfbba5dabb6f9
Reviewed-on: https://go-review.googlesource.com/9103 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
runtime: simplify process for starting GC goroutine
Currently, when allocation reaches the GC trigger, the runtime uses
readyExecute to start the GC goroutine immediately rather than wait
for the scheduler to get around to the GC goroutine while the mutator
continues to grow the heap.
Now that the scheduler runs the most recently readied goroutine when a
goroutine yields its time slice, this rigmarole is no longer
necessary. The runtime can simply ready the GC goroutine and yield
from the readying goroutine.
runtime: use park/ready to wake up GC at end of concurrent mark
Currently, the main GC goroutine sleeps on a note during concurrent
mark and the first background mark worker or assist to finish marking
wakes up that note to let the main goroutine proceed into mark
termination. Unfortunately, the latency of this wakeup can be quite
high, since the GC goroutine will typically have lost its P while in
the futex sleep, meaning it will be placed on the global run queue and
will wait there until some P is kind enough to pick it up. This delay
gives the mutator more time to allocate and create floating garbage,
growing the heap unnecessarily. Worse, it's likely that background
marking has stopped at this point (unless GOMAXPROCS>4), so anything
that's allocated and published to the heap during this window will
have to be scanned during mark termination while the world is stopped.
This change replaces the note sleep/wakeup with a gopark/ready
scheme. This keeps the wakeup inside the Go scheduler and lets the
garbage collector take advantage of the new scheduler semantics that
run the ready()d goroutine immediately when the ready()ing goroutine
sleeps.
For the json benchmark from x/benchmarks with GOMAXPROCS=4, this
reduces the delay in waking up the GC goroutine and entering mark
termination once concurrent marking is done from ~100ms to typically
<100µs.
runtime: use timer for GC control revise rather than timeout
Currently, we use a note sleep with a timeout in a loop in func gc to
periodically revise the GC control variables. Replace this with a
fully blocking note sleep and use a periodic timer to trigger the
revise instead. This is a step toward replacing the note sleep in func
gc.
runtime: yield time slice to most recently readied G
Currently, when the runtime ready()s a G, it adds it to the end of the
current P's run queue and continues running. If there are many other
things in the run queue, this can result in a significant delay before
the ready()d G actually runs and can hurt fairness when other Gs in
the run queue are CPU hogs. For example, if there are three Gs sharing
a P, one of which is a CPU hog that never voluntarily gives up the P
and the other two of which are doing small amounts of work and
communicating back and forth on an unbuffered channel, the two
communicating Gs will get very little CPU time.
Change this so that when G1 ready()s G2 and then blocks, the scheduler
immediately hands off the remainder of G1's time slice to G2. In the
above example, the two communicating Gs will now act as a unit and
together get half of the CPU time, while the CPU hog gets the other
half of the CPU time.
This fixes the problem demonstrated by the ping-pong benchmark added
in the previous commit:
benchmark                old ns/op     new ns/op     delta
BenchmarkPingPongHog     684287        825           -99.88%
On the x/benchmarks suite, this change improves the performance of
garbage by ~6% (for GOMAXPROCS=1 and 4), and json by 28% and 36% for
GOMAXPROCS=1 and 4. It has negligible effect on heap size.
This has no effect on the go1 benchmark suite since those benchmarks
are mostly single-threaded.
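The three-G scenario above can be reproduced with a small program. This
is only a sketch, not the runtime's BenchmarkPingPongHog: the function
name pingPong and the channel layout are assumptions made for
illustration. One goroutine spins without blocking while two others
bounce a value over unbuffered channels on a single P.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// pingPong bounces a value between the caller and an echo goroutine over
// unbuffered channels while a third goroutine hogs the CPU. It returns the
// number of round trips that completed with the expected reply.
func pingPong(rounds int) int {
	stop := make(chan struct{})
	hogDone := make(chan struct{})
	go func() {
		// CPU hog: never blocks, only polls for the stop signal.
		defer close(hogDone)
		for {
			select {
			case <-stop:
				return
			default:
			}
		}
	}()

	ping, pong := make(chan int), make(chan int)
	go func() {
		// Echo goroutine: reply to every ping with value+1.
		for v := range ping {
			pong <- v + 1
		}
		close(pong)
	}()

	completed := 0
	for i := 0; i < rounds; i++ {
		ping <- i
		if <-pong == i+1 {
			completed++
		}
	}

	close(ping)
	<-pong // closed by the echo goroutine once it has drained ping
	close(stop)
	<-hogDone
	return completed
}

func main() {
	// Pin everything to one P so the hog and the ping-pong pair compete.
	defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))

	const rounds = 1000
	start := time.Now()
	n := pingPong(rounds)
	fmt.Printf("%d round trips, %v per round trip\n",
		n, time.Since(start)/time.Duration(rounds))
}
```

The per-round-trip time printed by main depends on the Go version and
the machine, so no expected figure is given here.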
runtime: benchmark for ping-pong in the presence of a CPU hog
This benchmark demonstrates a current problem with the scheduler where
a set of frequently communicating goroutines get very little CPU time
in the presence of another goroutine that hogs that CPU, even if one
of those communicating goroutines is always runnable.
Currently it takes about 0.5 milliseconds to switch between
ping-ponging goroutines in the presence of a CPU hog:
There are a variety of places where we check if a P's run queue is
empty. This test is about to get slightly more complicated, so factor
it out into a new function, runqempty. This function is inlinable, so
this has no effect on performance.
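As an illustration of why such a check is worth factoring out, here is
a minimal sketch of a ring-buffer run queue with a single emptiness
test. The runQueue type, its fields, and its methods are hypothetical
and are not the runtime's P structure or its runqempty.

```go
package main

import "fmt"

// runQueue is a hypothetical stand-in for a P's local run queue: a fixed
// ring buffer indexed by free-running head and tail counters.
type runQueue struct {
	head, tail uint32
	buf        [8]int
}

// empty is the single place that defines what "empty" means. If the
// definition grows more complicated, only this function changes.
func (q *runQueue) empty() bool {
	return q.head == q.tail
}

// put appends g and reports whether there was room for it.
func (q *runQueue) put(g int) bool {
	if q.tail-q.head == uint32(len(q.buf)) {
		return false // queue is full
	}
	q.buf[q.tail%uint32(len(q.buf))] = g
	q.tail++
	return true
}

// get removes the oldest entry, reporting false if the queue is empty.
func (q *runQueue) get() (int, bool) {
	if q.empty() {
		return 0, false
	}
	g := q.buf[q.head%uint32(len(q.buf))]
	q.head++
	return g, true
}

func main() {
	var q runQueue
	fmt.Println(q.empty())
	q.put(7)
	fmt.Println(q.empty())
}
```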
Shenghou Ma [Thu, 23 Apr 2015 06:16:31 +0000 (02:16 -0400)]
cmd/dist: allow $GO_TEST_TIMEOUT_SCALE to override timeoutScale
Some machines are so slow that even with the default timeoutScale,
they still time out in some tests. For example, currently some linux/arm
builders and the openbsd/arm builder are timing out the runtime
test and CL 8397 was proposed to skip some tests on openbsd/arm
to fix the build.
Instead of increasing timeoutScale or skipping tests, this CL
introduces an environment variable $GO_TEST_TIMEOUT_SCALE that
can be set manually to select a larger timeoutScale on those
machines/builders.
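A rough sketch of how such an override might be read follows. The
helper name timeoutScaleFromEnv, the assumption that the value is a
positive integer, and the error text are all illustrative and are not
taken from cmd/dist.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// timeoutScaleFromEnv returns def when value is empty. Otherwise it parses
// value as a positive integer scale factor. On a malformed value it returns
// def together with an error so the caller can decide whether to abort.
func timeoutScaleFromEnv(value string, def int) (int, error) {
	if value == "" {
		return def, nil
	}
	n, err := strconv.Atoi(value)
	if err != nil {
		return def, fmt.Errorf("invalid GO_TEST_TIMEOUT_SCALE %q: %v", value, err)
	}
	if n <= 0 {
		return def, fmt.Errorf("GO_TEST_TIMEOUT_SCALE must be positive, got %d", n)
	}
	return n, nil
}

func main() {
	scale, err := timeoutScaleFromEnv(os.Getenv("GO_TEST_TIMEOUT_SCALE"), 1)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("timeout scale:", scale)
}
```

A slow builder would then run the tests with, for example,
GO_TEST_TIMEOUT_SCALE=4 set in its environment.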
Forward signals to signal handlers installed before Go installs its own,
under certain circumstances. In particular, as iant@ suggests, signals are
forwarded iff:
(1) a non-SIG_DFL signal handler existed before Go, and
(2) signal is synchronous (i.e., one of SIGSEGV, SIGBUS, SIGFPE), and
(3a) signal occurred on a non-Go thread, or
(3b) signal occurred on a Go thread but in CGo code.
Supported only on Linux, for now.
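The forwarding rule above can be written as a single predicate. This
is a sketch of the stated conditions only, not the runtime's signal
handling code: the sigContext type and the function names are
hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// sigContext is a hypothetical summary of what is known when a signal
// arrives.
type sigContext struct {
	hadPriorHandler bool // a non-SIG_DFL handler existed before Go's (1)
	onGoThread      bool // the signal occurred on a thread created by Go
	inCgo           bool // the Go thread was executing cgo code
}

// isSynchronous reports whether sig is one of the synchronous signals
// named in condition (2).
func isSynchronous(sig syscall.Signal) bool {
	switch sig {
	case syscall.SIGSEGV, syscall.SIGBUS, syscall.SIGFPE:
		return true
	}
	return false
}

// shouldForward combines the conditions: (1) and (2) and ((3a) or (3b)).
func shouldForward(sig syscall.Signal, c sigContext) bool {
	if !c.hadPriorHandler || !isSynchronous(sig) {
		return false
	}
	return !c.onGoThread || c.inCgo
}

func main() {
	c := sigContext{hadPriorHandler: true, onGoThread: true, inCgo: true}
	fmt.Println(shouldForward(syscall.SIGSEGV, c))
}
```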
Change-Id: I403219ee47b26cf65da819fb86cf1ec04d3e25f5
Reviewed-on: https://go-review.googlesource.com/8712 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Change-Id: I7b5b282f61b11aab587402c2d302697e76666376
Reviewed-on: https://go-review.googlesource.com/9222 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently, it's possible for the next_gc calculation to underflow.
Since next_gc is unsigned, this wraps around and effectively disables
GC for the rest of the program's execution. Besides being obviously
wrong, this is causing test failures on 32-bit because some tests are
running out of heap.
This underflow happens for two reasons, both having to do with how we
estimate the reachable heap size at the end of the GC cycle.
One reason is that this calculation depends on the value of heap_live
at the beginning of the GC cycle, but we currently only record that
value during a concurrent GC and not during a forced STW GC. Fix this
by moving the recorded value from gcController to work and recording
it on a common code path.
The other reason is that we use the amount of allocation during the GC
cycle as an approximation of the amount of floating garbage and
subtract it from the marked heap to estimate the reachable heap.
However, since this is only an approximation, it's possible for the
amount of allocation during the cycle to be *larger* than the marked
heap size (since the runtime allocates white and it's possible for
these allocations to never be made reachable from the heap). Currently
this causes wrap-around in our estimate of the reachable heap size,
which in turn causes wrap-around in next_gc. Fix this by bottoming out
the reachable heap estimate at 0, in which case we just fall back to
triggering GC at heapminimum (which is okay since this only happens on
small heaps).
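The wrap-around and the clamp described above can be shown with plain
unsigned arithmetic. This is a sketch under stated assumptions, not
the runtime's code: reachableEstimate, nextGC, and their parameters
are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// reachableEstimate subtracts the bytes allocated during the cycle (an
// approximation of floating garbage) from the marked heap, bottoming out
// at 0 instead of wrapping around.
func reachableEstimate(marked, allocatedDuringCycle uint64) uint64 {
	if allocatedDuringCycle > marked {
		return 0
	}
	return marked - allocatedDuringCycle
}

// nextGC grows the reachable estimate by gcPercent and falls back to
// heapMinimum when the resulting goal is smaller than that.
func nextGC(reachable, heapMinimum, gcPercent uint64) uint64 {
	goal := reachable + reachable*gcPercent/100
	if goal < heapMinimum {
		return heapMinimum
	}
	return goal
}

func main() {
	var marked, allocated uint64 = 1 << 20, 3 << 20

	// Unclamped unsigned subtraction wraps around to a huge value.
	fmt.Println(marked - allocated)

	// The clamped estimate bottoms out at 0, so the next trigger falls
	// back to the minimum heap size.
	reachable := reachableEstimate(marked, allocated)
	fmt.Println(reachable, nextGC(reachable, 4<<20, 100))
}
```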