Keith Randall [Fri, 22 Apr 2016 20:09:18 +0000 (13:09 -0700)]
cmd/compile: get rid of most byte and word insns for amd64
Now that we're using 32-bit ops for 8/16-bit logical operations
(to avoid partial register stalls), there's really no need to
keep track of the 8/16-bit ops at all. Convert everything we
can to 32-bit ops.
This CL is the obvious stuff. I might think a bit more about
whether we can get rid of weirder stuff like HMULWU.
The only downside to this CL is that we lose some information
about constants. If we had source like:
var a byte = ...
a += 128
a += 128
We will convert that to a += 256, when we could get rid of the
add altogether. This seems like a fairly unusual scenario and
I'm happy with forgoing that optimization.
Keith Randall [Wed, 20 Apr 2016 22:02:48 +0000 (15:02 -0700)]
cmd/compile: combine stores into larger widths
Combine stores into larger widths when it is safe to do so.
Add clobber() function so stray dead uses do not impede the
above rewrites.
Fix bug in loads where all intermediate values depending on
a small load (not just the load itself) must have no other uses.
We really need the small load to be dead after the rewrite..
runtime: use per-goroutine sequence numbers in tracer
Currently tracer uses global sequencer and it introduces
significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with per-goroutine sequencer.
If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), it is enough to
restore consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if goroutine
starts on the same P where it was unblocked, then start does
not need sequence number).
The burden of restoring the order is put on trace parser.
Details of the algorithm are described in the comments.
On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)
Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.
Besides running trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew and episodic random skew).
In all cases broken timestamps were detected and no test failures.
Matthew Dempsky [Fri, 22 Apr 2016 19:27:29 +0000 (12:27 -0700)]
cmd/compile: replace Ctype switches with type switches
Instead of switching on Ctype (which internally uses a type switch)
and then scattering lots of type assertions throughout the CTFOO case
clauses, just use type switches directly on the underlying constant
value.
Ian Lance Taylor [Fri, 22 Apr 2016 17:17:06 +0000 (10:17 -0700)]
cmd/dist: skip misc/cgo/test with internal linking on ppc64le
CL 22372 changed ppc64le to use normal cgo initialization on ppc64le.
Doing this uncovered a cmd/link error using internal linking.
Opened issue 15409 for the problem. This CL disables the test.
Update #15409.
Change-Id: Ia1bb6b874c1b5a4df1a0436c8841c145142c30f7
Reviewed-on: https://go-review.googlesource.com/22379
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
David Chase [Fri, 22 Apr 2016 16:15:08 +0000 (12:15 -0400)]
cmd/compile: in a Tarjan algorithm, DFS should really be DFS
Replaced incorrect recursion-free rendering of DFS with
something that was correct. Enhanced test with all
permutations of IF successors to ensure that all possible
DFS traversals are exercised.
Test is improved version of
https://go-review.googlesource.com/#/c/22334
Update 15084.
Change-Id: I6e944c41244e47fe5f568dfc2b360ff93b94079e
Reviewed-on: https://go-review.googlesource.com/22347 Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Chris Zou [Mon, 18 Apr 2016 23:30:17 +0000 (19:30 -0400)]
hash/crc32: use vector instructions on s390x
The input buffer is aligned to a doubleword boundary to
improve performance of the vector instructions. The pure
Go implementation is used to align the input data, and is
also used when the vector instructions are not available
or the data length is less than 64 bytes.
Change-Id: Ie259a5f2f1562bcc17961c99e5776c99091d6bed
Reviewed-on: https://go-review.googlesource.com/22201 Reviewed-by: Michael Munday <munday@ca.ibm.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bill O'Farrell <billotosyr@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Change-Id: I0adbb95685e28be92e8548741df0e11daa0a9b5f
Reviewed-on: https://go-review.googlesource.com/21777 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Rob Pike [Thu, 21 Apr 2016 19:43:22 +0000 (12:43 -0700)]
encoding/gob: document compatibility
Fixes #13808.
Change-Id: Ifbd5644da995a812438a405485c9e08b4503a313
Reviewed-on: https://go-review.googlesource.com/22352 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Rob Pike [Thu, 21 Apr 2016 21:53:19 +0000 (14:53 -0700)]
time: print zero duration as 0s, not 0
There should be a unit, and s is the SI unit name, so use that.
The other obvious possibility is ns (nanosecond), but the fact
that durations are measured in nanoseconds is an internal detail.
Fixes #14058.
Change-Id: Id1f8f3c77088224d9f7cd643778713d5cc3be5d9
Reviewed-on: https://go-review.googlesource.com/22357 Reviewed-by: Robert Griesemer <gri@golang.org>
Matthew Dempsky [Mon, 18 Apr 2016 21:02:08 +0000 (14:02 -0700)]
cmd/compile: split TSLICE into separate Type kind
Instead of using TARRAY for both arrays and slices, create a new
TSLICE kind to handle slices.
Also, get rid of the "DDDArray" distinction. While kinda ugly, it
seems likely we'll need to defer evaluating the constant bounds
expressions for golang.org/issue/13890.
Austin Clements [Wed, 30 Mar 2016 21:15:15 +0000 (17:15 -0400)]
runtime: eliminate floating garbage estimate
Currently when we compute the trigger for the next GC, we do it based
on an estimate of the reachable heap size at the start of the GC
cycle, which is itself based on an estimate of the floating garbage.
This was introduced by 4655aad to fix a bad feedback loop that allowed
the heap to grow to many times the true reachable size.
However, this estimate gets easily confused by rapidly allocating
applications, and, worse it's different than the heap size the trigger
controller uses to compute the trigger itself. This results in the
trigger controller often thinking that GC finished before it started.
Since this would be a pretty great outcome from it's perspective, it
sets the trigger for the next cycle as close to the next goal as
possible (which is limited to 95% of the goal).
Furthermore, the bad feedback loop this estimate originally fixed
seems not to happen any more, suggesting it was fixed more correctly
by some other change in the mean time. Finally, with the change to
allocate black, it shouldn't even be theoretically possible for this
bad feedback loop to occur.
Hence, eliminate the floating garbage estimate and simply consider the
reachable heap to be the marked heap. This harms overall throughput
slightly for allocation-heavy benchmarks, but significantly improves
mutator availability.
Fixes #12204. This brings the average trigger in this benchmark from
0.95 (the cap) to 0.7 and the active GC utilization from ~90% to ~45%.
Updates #14951. This makes the trigger controller much better behaved,
so it pulls the trigger lower if assists are consuming a lot of CPU
like it's supposed to, increasing mutator availability.
name old time/op new time/op delta
XBenchGarbage-12 2.21ms ± 1% 2.28ms ± 3% +3.29% (p=0.000 n=17+17)
Some of this slow down we paid for in earlier commits. Relative to the
start of the series to switch to allocate-black (the parent of "count
black allocations toward scan work"), the garbage benchmark is 2.62%
slower.
Austin Clements [Wed, 30 Mar 2016 21:02:23 +0000 (17:02 -0400)]
runtime: allocate black during GC
Currently we allocate white for most of concurrent marking. This is
based on the classical argument that it produces less floating
garbage, since allocations during GC may not get linked into the heap
and allocating white lets us reclaim these. However, it's not clear
how often this actually happens, especially since our write barrier
shades any pointer as soon as it's installed in the heap regardless of
the color of the slot.
On the other hand, allocating black has several advantages that seem
to significantly outweigh this downside.
1) It naturally bounds the total scan work to the live heap size at
the start of a GC cycle. Allocating white does not, and thus depends
entirely on assists to prevent the heap from growing faster than it
can be scanned.
2) It reduces the total amount of scan work per GC cycle by the size
of newly allocated objects that are linked into the heap graph, since
objects allocated black never need to be scanned.
3) It reduces total write barrier work since more objects will already
be black when they are linked into the heap graph.
This gives a slight overall improvement in benchmarks.
name old time/op new time/op delta
XBenchGarbage-12 2.24ms ± 0% 2.21ms ± 1% -1.32% (p=0.000 n=18+17)
Currently allocating black switches to the system stack (which is
probably a historical accident) and atomically updates the global
bytes marked stat. Since we're about to depend on this much more,
optimize it a bit by putting it back on the regular stack and updating
the per-P bytes marked stat, which gets lazily folded into the global
bytes marked stat.
Currently we count black allocations toward the scannable heap size,
but not toward the scan work we've done so far. This is clearly
inconsistent (we have, in effect, scanned these allocations and since
they're already black, we're not going to scan them again). Worse, it
means we don't count black allocations toward the scannable heap size
as of the *next* GC because this is based on the amount of scan work
we did in this cycle.
Fix this by counting black allocations as scan work. Currently the GC
spends very little time in allocate-black mode, so this probably
hasn't been a problem, but this will become important when we switch
to always allocating black.
Marcel van Lohuizen [Fri, 29 Jan 2016 15:16:03 +0000 (16:16 +0100)]
testing: add matching of subtest
Allows passing regexps per subtest to --test.run and --test.bench
Note that the documentation explicitly states that the split regular
expressions match the correpsonding parts (path components) of
the bench/test identifier. This is intended and slightly different
from the i'th RE matching the subtest/subbench at the respective
level. Picking this semantics allows guaranteeing that a test or
benchmark identifier as printed by go test can be passed verbatim
(possibly quoted) to, respectively, -run or -bench: subtests and
subbenches might have a '/' in their name, causing a misaligment if
their ID is passed to -run or -bench as is.
This semantics has other benefits, but this is the main motivation.
cmd/compile/internal/gc: fix return value offset for SSA backend on ARM
Progress on SSA backend for ARM. Still not complete. It compiles a
Fibonacci function, but the caller picked the return value from an
incorrect offset. This CL adjusts it to match the stack frame layout
for architectures with link register.
Alex Brainman [Thu, 21 Apr 2016 01:44:05 +0000 (11:44 +1000)]
debug/pe: introduce Section.Relocs
cmd/link reads PE object files when building programs with cgo.
cmd/link accesses object relocations. Add new Section.Relocs that
provides similar functionality in debug/pe.
Robert Griesemer [Wed, 20 Apr 2016 20:40:55 +0000 (13:40 -0700)]
math/big: more tests, documentation for Flot gob marshalling
Follow-up to https://golang.org/cl/21755.
This turned out to be a bit more than just a few nits
as originally expected in that CL.
1) The actual mantissa may be shorter than required for the
given precision (because of trailing 0's): no need to
allocate space for it (and transmit 0's). This can save
a lot of space when the precision is high: E.g., for
prec == 1000, 16 words or 128 bytes are required at the
most, but if the actual number is short, it may be much
less (for the test cases present, it's significantly less).
2) The actual mantissa may be longer than the number of
words required for the given precision: make sure to
not overflow when encoding in bytes.
3) Add more documentation.
4) Add more tests.
Change-Id: I9f40c408cfdd9183a8e81076d2f7d6c75e7a00e9
Reviewed-on: https://go-review.googlesource.com/22324 Reviewed-by: Alan Donovan <adonovan@google.com>
Keith Randall [Tue, 19 Apr 2016 15:31:04 +0000 (08:31 -0700)]
cmd/compile: change the way we handle large map values
mapaccess{1,2} returns a pointer to the value. When the key
is not in the map, it returns a pointer to zeroed memory.
Currently, for large map values we have a complicated scheme which
dynamically allocates zeroed memory for this purpose. It is ugly
code and requires an atomic.Load in a bunch of places we'd rather
not have it.
Switch to a scheme where callsites of mapaccess{1,2} which expect
large return values pass in a pointer to zeroed memory that
mapaccess can return if the key is not found. This avoids the
atomic.Load on all map accesses with a few extra instructions only
for the large value acccesses, plus a bit of bss space.
There was a time (1.4 & 1.5?) where we did something like this but
all the tricks to make the right size zero value were done by the
linker. That scheme broke in the presence of dyamic linking.
The scheme in this CL works even when dynamic linking.
Fixes #12337
Change-Id: Ic2d0319944af33bbb59785938d9ab80958d1b4b1
Reviewed-on: https://go-review.googlesource.com/22221
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Michael Munday [Fri, 15 Apr 2016 22:45:17 +0000 (18:45 -0400)]
crypto/aes: add s390x assembly implementation
Adds support for single block encryption using the cipher message
(KM) instruction. KM handles key expansion internally and
therefore it is not done up front when using the assembly
implementation on s390x.
Change-Id: I69954b8ae36d549e1dc40d7acd5a10bedfaaef9c
Reviewed-on: https://go-review.googlesource.com/22194
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bill O'Farrell <billotosyr@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Alex Brainman [Mon, 18 Apr 2016 06:12:48 +0000 (16:12 +1000)]
debug/pe: introduce StringTable type
PE specification requires that long section and symbol names
are stored in PE string table. Introduce StringTable that
implements this functionality. Only string table reading is
implemented.
Keith Randall [Tue, 19 Apr 2016 22:38:59 +0000 (15:38 -0700)]
cmd/compile,runtime: pass elem type to {make,grow}slice
No point in passing the slice type to these functions.
All they need is the element type. One less indirection,
maybe a few less []T type descriptors in the binary.
There's no need for Eiota, Eindir, Eaddr, or Eproc; the values are
threaded through to denote various typechecking contexts, but they
don't actually influence typechecking behavior at all.
Also, while here, switch the Efoo const declarations to use iota.
Julia Hansbrough [Mon, 18 Apr 2016 22:53:29 +0000 (15:53 -0700)]
runtime: updated SIGSYS to cause a panic + stacktrace
On GNU/Linux, SIGSYS is specified to cause the process to terminate
without a core dump. In https://codereview.appspot.com/3749041 , it
appears that Golang accidentally introduced incorrect behavior for
this signal, which caused Golang processes to keep running after
receiving SIGSYS. This change reverts it to the old/correct behavior.
Updates #15204
Change-Id: I3aa48a9499c1bc36fa5d3f40c088fdd7599e0db5
Reviewed-on: https://go-review.googlesource.com/22202 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Previously, isStaticCompositeLiteral would
return the wrong value for literals like:
[1]struct{ b []byte }{b: []byte{1}}
Note that the outermost component is an array,
but once we recurse into isStaticCompositeLiteral,
we never check again that arrays are actually arrays.
Instead of adding more logic to the guts of
isStaticCompositeLiteral, allow it to accept
any Node and return the correct answer.
Change-Id: I6af7814a9037bbc7043da9a96137fbee067bbe0e
Reviewed-on: https://go-review.googlesource.com/22247 Reviewed-by: Keith Randall <khr@golang.org>
Michael Munday [Fri, 15 Apr 2016 20:56:37 +0000 (16:56 -0400)]
crypto/aes: de-couple asm and go implementations
There is currently only one assembly implementation of AES
(amd64). While it is possible to fit other implementations to the
same pattern it complicates the code. For example s390x does not
use expanded keys, so having enc and dec in the aesCipher struct
is confusing.
By separating out the asm implementations we can more closely
match the data structures to the underlying implementation. This
also opens the door for AES implementations that support block
cipher modes other than GCM (e.g. CTR and CBC).
This commit changes BenchmarkExpandKey to test the go
implementation of key expansion. It might be better to have some
sort of 'initialisation' benchmark instead to cover the startup
costs of the assembly implementations (which might be doing
key expansion in a different way, or not at all).
CL 21891 was too clever in its attempts to avoid spills.
Storing newlen too early caused uses of append in the runtime
itself to receive an inconsistent view of a slice,
leading to corruption.
This CL makes the generate code much more similar to
the old backend. It spills more than before,
but those spills have been contained to the grow path.
It recalculates newlen unnecessarily on the fast path,
but that's measurably cheaper than spilling it.
CL 21891 caused runtime failures in 6 of 2000 runs
of net/http and crypto/x509 in my test setup.
This CL has gone 6000 runs without a failure.
Robert Griesemer [Fri, 26 Feb 2016 23:52:13 +0000 (15:52 -0800)]
spec: refine rules about terminating statements
Per a suggestion from mdempsky.
Both gc and gccgo consider a statement list as terminating if the
last _non_empty_ statement is terminating; i.e., trailing semis are
ok. Only gotype followed the current stricter rule in the spec.
This change adjusts the spec to match gc and gccgo behavior. In
support of this change, the spec has a matching rule for fallthrough,
which in valid positions may be followed by trailing semis as well.
For details and examples, see the issue below.
Fixes #14422.
Change-Id: Ie17c282e216fc40ecb54623445c17be111e17ade
Reviewed-on: https://go-review.googlesource.com/19981 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
Michael Munday [Fri, 15 Apr 2016 19:20:24 +0000 (15:20 -0400)]
crypto/aes: delete TestEncryptBlock and TestDecryptBlock
The encryptBlock and decryptBlock functions are already tested
(via the public API) by TestCipherEncrypt and TestCipherDecrypt
respectively. Both sets of tests check the output of the two
functions against the same set of FIPS 197 examples. I therefore
think it is safe to delete these two tests without losing any
coverage.
Deleting these two tests will make it easier to modify the
internal API, which I am hoping to do in future CLs.
Change-Id: I0dd568bc19f47b70ab09699b507833e527d39ba7
Reviewed-on: https://go-review.googlesource.com/22115 Reviewed-by: Adam Langley <agl@golang.org>
Run-TryBot: Adam Langley <agl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Mikio Hara [Sun, 28 Feb 2016 00:37:36 +0000 (09:37 +0900)]
net: add support for Zone of IPNet
This change adds Zone field to IPNet structure for making it possible to
determine which network interface is associated with IPv6 link-local
address. Also makes ParseCIDR and IPNet.String capable handling literal
IPv6 address prefixes with zone identifier.
Fixes #14518.
Change-Id: I8f8a40d3b4f500ffef25728d4995651379d8408a
Reviewed-on: https://go-review.googlesource.com/19946 Reviewed-by: Ian Lance Taylor <iant@golang.org>
David du Colombier [Tue, 19 Apr 2016 01:19:12 +0000 (03:19 +0200)]
net: enable TestDialParallel, TestDialerFallbackDelay and TestDialCancel on Plan 9
TestDialParallel, TestDialerFallbackDelay and TestDialCancel
require dialTCP to support cancellation, which has been
implemented for Plan 9 in CL 22144.
Updates #11225.
Updates #11932.
Change-Id: I3b30a645ef79227dfa519cde8d46c67b72f2485c
Reviewed-on: https://go-review.googlesource.com/22203
Run-TryBot: David du Colombier <0intro@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>