* Make budget an int32 to avoid needless conversions.
* Introduce some temporary variables to reduce repetition.
* If ... args are present, they will be the last argument
to the function. No need to scan all arguments.
Note that this is only safe because
the compiler generates multiple distinct
gc.Types. If we switch to having canonical
gc.Types, then this will need to be updated
to handle the case in which the user uses both
map[T]S and map[[8]T]S. In that case,
the runtime needs algs for [8]T, but this could
mark the sole [8]T type as Noalg. This is a general
problem with having a single bool to represent
whether alg generation is needed for a type.
Cuts 5k off cmd/go and 22k off golang.org/x/tools/cmd/godoc,
approx 0.04% and 0.12% respectively.
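A sketch of that hypothetical collision (T and S are illustrative placeholder types, not from the CL):

type T struct{ x int }
type S struct{}

var m1 map[T]S    // its buckets embed an [8]T key array internally; no algs needed for it
var m2 map[[8]T]S // [8]T used directly as a map key does need hash/eq algs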
Note that this is only safe because
the compiler generates multiple distinct
gc.Types. If we switch to having canonical
gc.Types, then this will need to be updated
to handle the case in which the user both uses
map[[n]T]S and calls a function f(...T) with n arguments.
In that case, the runtime needs algs for [n]T, but this could
mark the sole [n]T type as Noalg. This is a general
problem with having a single bool to represent
whether alg generation is needed for a type.
Cuts 17k off cmd/go and 13k off golang.org/x/tools/cmd/godoc,
approx 0.14% and 0.07% respectively.
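A sketch of the analogous collision for variadic calls (f, T, and S are illustrative):

type T int
type S int

func f(xs ...T) {}

var m map[[3]T]S // [3]T as a map key needs hash/eq algs

func g(a, b, c T) {
	f(a, b, c) // this call also materializes a [3]T backing array for the ... args
}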
Keith Randall [Sun, 24 Apr 2016 05:59:01 +0000 (22:59 -0700)]
cmd/compile: reorder how slicelit initializes a slice
func f(x, y, z *int) {
a := []*int{x,y,z}
...
}
We used to use:
var tmp [3]*int
a := tmp[:]
a[0] = x
a[1] = y
a[2] = z
Now we do:
var tmp [3]*int
tmp[0] = x
tmp[1] = y
tmp[2] = z
a := tmp[:]
Doesn't sound like a big deal, but the compiler has trouble
eliminating write barriers when using the former method because it
doesn't know that the slice points to the stack. In the latter
method, the compiler knows the array is on the stack and as a result
doesn't emit any write barriers.
This turns out to be extremely common when building ... args, as
for calls to fmt.Printf.
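For example, a call like

fmt.Printf("%d %d\n", x, y)

now builds its argument slice roughly as (a sketch; the temporary is illustrative):

var tmp [2]interface{}
tmp[0] = x
tmp[1] = y
fmt.Printf("%d %d\n", tmp[:]...)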
Makes go binaries ~1% smaller.
Doesn't have a measurable effect on the go1 fmt benchmarks,
unfortunately.
runtime/trace: test detection of broken timestamps
On some processors cputicks (used to generate trace timestamps)
produces non-monotonic timestamps. It is important that the parser
distinguishes logically inconsistent traces (e.g. missing, excessive
or misordered events) from broken timestamps. The former is a bug
in the tracer; the latter is a machine issue.
Test that (1) the parser does not return a logical error in case of
broken timestamps and (2) broken timestamps are eventually detected
and reported.
Alex Brainman [Thu, 21 Apr 2016 06:51:36 +0000 (16:51 +1000)]
debug/pe: introduce File.COFFSymbols and (*COFFSymbol).FullName
Reloc.SymbolTableIndex is an index into the symbol table. But
Reloc.SymbolTableIndex cannot be used as an index into File.Symbols,
because the File.Symbols slice has aux symbol records removed as it is built.
We cannot change the way File.Symbols works, so I propose we
introduce new File.COFFSymbols that does not have that limitation.
Also, unlike File.Symbols, File.COFFSymbols will consist of
COFFSymbol values. COFFSymbol matches the PE COFF specification exactly,
and it is simpler to use.
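A minimal usage sketch of the new API (the file name and the Reloc r are hypothetical):

f, err := pe.Open("x.obj")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
// COFFSymbols keeps aux records, so a Reloc's SymbolTableIndex
// indexes into it directly; FullName consults the string table
// for long names.
sym := &f.COFFSymbols[r.SymbolTableIndex]
name, err := sym.FullName(f.StringTable)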
Updates #15345
Change-Id: Icbc265853a472529cd6d64a76427b27e5459e373
Reviewed-on: https://go-review.googlesource.com/22336 Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Keith Randall [Fri, 22 Apr 2016 20:09:18 +0000 (13:09 -0700)]
cmd/compile: get rid of most byte and word insns for amd64
Now that we're using 32-bit ops for 8/16-bit logical operations
(to avoid partial register stalls), there's really no need to
keep track of the 8/16-bit ops at all. Convert everything we
can to 32-bit ops.
This CL is the obvious stuff. I might think a bit more about
whether we can get rid of weirder stuff like HMULWU.
The only downside to this CL is that we lose some information
about constants. If we had source like:
var a byte = ...
a += 128
a += 128
We will convert that to a += 256, when we could get rid of the
add altogether. This seems like a fairly unusual scenario and
I'm happy with forgoing that optimization.
Keith Randall [Wed, 20 Apr 2016 22:02:48 +0000 (15:02 -0700)]
cmd/compile: combine stores into larger widths
Combine stores into larger widths when it is safe to do so.
Add a clobber() function so stray dead uses do not impede the
above rewrites.
Fix a bug in load combining: all intermediate values depending on
a small load (not just the load itself) must have no other uses,
because the small load really needs to be dead after the rewrite.
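As an illustrative example (not from the CL), four adjacent byte stores like these can now be merged into a single 32-bit store on amd64:

func fill(b *[4]byte) {
	b[0] = 0x01 // previously four MOVB instructions;
	b[1] = 0x02 // with this change the compiler can emit
	b[2] = 0x03 // one MOVL of the combined constant
	b[3] = 0x04
}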
runtime: use per-goroutine sequence numbers in tracer
Currently the tracer uses a global sequencer, which introduces a
significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with per-goroutine sequencers.
If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), it is enough to
restore a consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if a goroutine
starts on the same P where it was unblocked, then the start does
not need a sequence number).
The burden of restoring the order is put on the trace parser.
Details of the algorithm are described in the comments.
On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)
Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.
Besides running the trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew, and episodic random skew).
In all cases the broken timestamps were detected and there were no test failures.
Matthew Dempsky [Fri, 22 Apr 2016 19:27:29 +0000 (12:27 -0700)]
cmd/compile: replace Ctype switches with type switches
Instead of switching on Ctype (which internally uses a type switch)
and then scattering lots of type assertions throughout the CTFOO case
clauses, just use type switches directly on the underlying constant
value.
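Schematically, the pattern changes like this (a hedged sketch, not the exact compiler code):

// before:
switch n.Val().Ctype() {
case CTINT:
	x := n.Val().U.(*Mpint)
	// ... use x ...
}

// after:
switch u := n.Val().U.(type) {
case *Mpint:
	// ... use u ...
}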
Ian Lance Taylor [Fri, 22 Apr 2016 17:17:06 +0000 (10:17 -0700)]
cmd/dist: skip misc/cgo/test with internal linking on ppc64le
CL 22372 changed ppc64le to use normal cgo initialization.
Doing this uncovered a cmd/link error using internal linking.
Opened issue 15409 for the problem. This CL disables the test.
Update #15409.
Change-Id: Ia1bb6b874c1b5a4df1a0436c8841c145142c30f7
Reviewed-on: https://go-review.googlesource.com/22379
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
David Chase [Fri, 22 Apr 2016 16:15:08 +0000 (12:15 -0400)]
cmd/compile: in a Tarjan algorithm, DFS should really be DFS
Replaced the incorrect recursion-free rendering of DFS with
one that is correct. Enhanced the test with all
permutations of IF successors to ensure that all possible
DFS traversals are exercised.
Test is improved version of
https://go-review.googlesource.com/#/c/22334
Updates #15084.
Change-Id: I6e944c41244e47fe5f568dfc2b360ff93b94079e
Reviewed-on: https://go-review.googlesource.com/22347 Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Chris Zou [Mon, 18 Apr 2016 23:30:17 +0000 (19:30 -0400)]
hash/crc32: use vector instructions on s390x
The input buffer is aligned to a doubleword boundary to
improve performance of the vector instructions. The pure
Go implementation is used to align the input data, and is
also used when the vector instructions are not available
or the data length is less than 64 bytes.
Change-Id: Ie259a5f2f1562bcc17961c99e5776c99091d6bed
Reviewed-on: https://go-review.googlesource.com/22201 Reviewed-by: Michael Munday <munday@ca.ibm.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bill O'Farrell <billotosyr@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Rob Pike [Thu, 21 Apr 2016 19:43:22 +0000 (12:43 -0700)]
encoding/gob: document compatibility
Fixes #13808.
Change-Id: Ifbd5644da995a812438a405485c9e08b4503a313
Reviewed-on: https://go-review.googlesource.com/22352 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Rob Pike [Thu, 21 Apr 2016 21:53:19 +0000 (14:53 -0700)]
time: print zero duration as 0s, not 0
There should be a unit, and s is the SI symbol for seconds, so use that.
The other obvious possibility is ns (nanosecond), but the fact
that durations are measured in nanoseconds is an internal detail.
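For example:

fmt.Println(time.Duration(0)) // prints "0s"; previously "0"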
Fixes #14058.
Change-Id: Id1f8f3c77088224d9f7cd643778713d5cc3be5d9
Reviewed-on: https://go-review.googlesource.com/22357 Reviewed-by: Robert Griesemer <gri@golang.org>
Matthew Dempsky [Mon, 18 Apr 2016 21:02:08 +0000 (14:02 -0700)]
cmd/compile: split TSLICE into separate Type kind
Instead of using TARRAY for both arrays and slices, create a new
TSLICE kind to handle slices.
Also, get rid of the "DDDArray" distinction. While kinda ugly, it
seems likely we'll need to defer evaluating the constant bounds
expressions for golang.org/issue/13890.
Austin Clements [Wed, 30 Mar 2016 21:15:15 +0000 (17:15 -0400)]
runtime: eliminate floating garbage estimate
Currently when we compute the trigger for the next GC, we do it based
on an estimate of the reachable heap size at the start of the GC
cycle, which is itself based on an estimate of the floating garbage.
This was introduced by 4655aad to fix a bad feedback loop that allowed
the heap to grow to many times the true reachable size.
However, this estimate gets easily confused by rapidly allocating
applications, and, worse, it's different from the heap size the trigger
controller uses to compute the trigger itself. This results in the
trigger controller often thinking that GC finished before it started.
Since this would be a pretty great outcome from its perspective, it
sets the trigger for the next cycle as close to the next goal as
possible (which is limited to 95% of the goal).
Furthermore, the bad feedback loop this estimate originally fixed
seems not to happen any more, suggesting it was fixed more correctly
by some other change in the meantime. Finally, with the change to
allocate black, it shouldn't even be theoretically possible for this
bad feedback loop to occur.
Hence, eliminate the floating garbage estimate and simply consider the
reachable heap to be the marked heap. This harms overall throughput
slightly for allocation-heavy benchmarks, but significantly improves
mutator availability.
Fixes #12204. This brings the average trigger in that issue's benchmark
from 0.95 (the cap) to 0.7, and the active GC utilization from ~90% to ~45%.
Updates #14951. This makes the trigger controller much better behaved,
so it pulls the trigger lower if assists are consuming a lot of CPU
like it's supposed to, increasing mutator availability.
name old time/op new time/op delta
XBenchGarbage-12 2.21ms ± 1% 2.28ms ± 3% +3.29% (p=0.000 n=17+17)
Some of this slowdown we paid for in earlier commits. Relative to the
start of the series to switch to allocate-black (the parent of "count
black allocations toward scan work"), the garbage benchmark is 2.62%
slower.
Austin Clements [Wed, 30 Mar 2016 21:02:23 +0000 (17:02 -0400)]
runtime: allocate black during GC
Currently we allocate white for most of concurrent marking. This is
based on the classical argument that it produces less floating
garbage, since allocations during GC may not get linked into the heap
and allocating white lets us reclaim these. However, it's not clear
how often this actually happens, especially since our write barrier
shades any pointer as soon as it's installed in the heap regardless of
the color of the slot.
On the other hand, allocating black has several advantages that seem
to significantly outweigh this downside.
1) It naturally bounds the total scan work to the live heap size at
the start of a GC cycle. Allocating white does not, and thus depends
entirely on assists to prevent the heap from growing faster than it
can be scanned.
2) It reduces the total amount of scan work per GC cycle by the size
of newly allocated objects that are linked into the heap graph, since
objects allocated black never need to be scanned.
3) It reduces total write barrier work since more objects will already
be black when they are linked into the heap graph.
This gives a slight overall improvement in benchmarks.
name old time/op new time/op delta
XBenchGarbage-12 2.24ms ± 0% 2.21ms ± 1% -1.32% (p=0.000 n=18+17)
Currently allocating black switches to the system stack (which is
probably a historical accident) and atomically updates the global
bytes marked stat. Since we're about to depend on this much more,
optimize it a bit by putting it back on the regular stack and updating
the per-P bytes marked stat, which gets lazily folded into the global
bytes marked stat.
Currently we count black allocations toward the scannable heap size,
but not toward the scan work we've done so far. This is clearly
inconsistent (we have, in effect, scanned these allocations and since
they're already black, we're not going to scan them again). Worse, it
means we don't count black allocations toward the scannable heap size
as of the *next* GC because this is based on the amount of scan work
we did in this cycle.
Fix this by counting black allocations as scan work. Currently the GC
spends very little time in allocate-black mode, so this probably
hasn't been a problem, but this will become important when we switch
to always allocating black.
Marcel van Lohuizen [Fri, 29 Jan 2016 15:16:03 +0000 (16:16 +0100)]
testing: add matching of subtests
Allows passing regexps per subtest to --test.run and --test.bench.
Note that the documentation explicitly states that the split regular
expressions match the corresponding parts (path components) of
the bench/test identifier. This is intended and slightly different
from the i'th RE matching the subtest/subbench at the respective
level. Picking this semantics allows guaranteeing that a test or
benchmark identifier as printed by go test can be passed verbatim
(possibly quoted) to, respectively, -run or -bench: subtests and
subbenches might have a '/' in their name, causing a misalignment if
their ID is passed to -run or -bench as is.
This semantics has other benefits, but this is the main motivation.
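For example (test names are illustrative):

go test -run 'TestFoo/subA'   # first RE matches TestFoo, second matches its subtests
go test -bench 'BenchmarkX/4' # each '/'-separated RE matches one path component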
cmd/compile/internal/gc: fix return value offset for SSA backend on ARM
Progress on the SSA backend for ARM. Still not complete. It compiles a
Fibonacci function, but the caller picked the return value from an
incorrect offset. This CL adjusts it to match the stack frame layout
for architectures with a link register.
Alex Brainman [Thu, 21 Apr 2016 01:44:05 +0000 (11:44 +1000)]
debug/pe: introduce Section.Relocs
cmd/link reads PE object files when building programs with cgo.
cmd/link accesses object relocations. Add a new Section.Relocs field
that provides similar functionality in debug/pe.
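A hedged sketch of walking the new field (the file name is hypothetical):

f, err := pe.Open("obj.o")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
for _, s := range f.Sections {
	for _, r := range s.Relocs {
		fmt.Println(s.Name, r.VirtualAddress, r.SymbolTableIndex, r.Type)
	}
}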
Robert Griesemer [Wed, 20 Apr 2016 20:40:55 +0000 (13:40 -0700)]
math/big: more tests, documentation for Float gob marshalling
Follow-up to https://golang.org/cl/21755.
This turned out to be a bit more than just a few nits
as originally expected in that CL.
1) The actual mantissa may be shorter than required for the
given precision (because of trailing 0's): no need to
allocate space for it (and transmit 0's). This can save
a lot of space when the precision is high; e.g., for
prec == 1000, 16 words or 128 bytes are required at the
most, but if the actual number is short, it may be much
less (for the test cases present, it's significantly less).
2) The actual mantissa may be longer than the number of
words required for the given precision: make sure to
not overflow when encoding in bytes.
3) Add more documentation.
4) Add more tests.
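A small round-trip sketch of the behavior under test (values are illustrative):

var buf bytes.Buffer
x := new(big.Float).SetPrec(1000).SetFloat64(1.5) // short mantissa at high precision
if err := gob.NewEncoder(&buf).Encode(x); err != nil {
	log.Fatal(err)
}
var y big.Float
if err := gob.NewDecoder(&buf).Decode(&y); err != nil {
	log.Fatal(err)
}
// y now carries x's value, precision, and rounding mode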
Change-Id: I9f40c408cfdd9183a8e81076d2f7d6c75e7a00e9
Reviewed-on: https://go-review.googlesource.com/22324 Reviewed-by: Alan Donovan <adonovan@google.com>
Keith Randall [Tue, 19 Apr 2016 15:31:04 +0000 (08:31 -0700)]
cmd/compile: change the way we handle large map values
mapaccess{1,2} return a pointer to the value. When the key
is not in the map, they return a pointer to zeroed memory.
Currently, for large map values we have a complicated scheme which
dynamically allocates zeroed memory for this purpose. It is ugly
code and requires an atomic.Load in a bunch of places where we'd
rather not have one.
Switch to a scheme where callsites of mapaccess{1,2} which expect
large return values pass in a pointer to zeroed memory that
mapaccess can return if the key is not found. This avoids the
atomic.Load on all map accesses, with a few extra instructions only
for the large value accesses, plus a bit of bss space.
There was a time (1.4 & 1.5?) where we did something like this but
all the tricks to make the right size zero value were done by the
linker. That scheme broke in the presence of dynamic linking.
The scheme in this CL works even with dynamic linking.
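Schematically (a sketch of the idea; names follow the runtime but are not verbatim):

// One shared zeroed buffer in the runtime:
var zeroVal [maxZero]byte // assumption: maxZero covers the largest "fat" value

// For a large value type, v := m[k] is compiled roughly as:
//   p := mapaccess1_fat(maptype, m, &k, &zeroVal[0])
//   v := *p
// mapaccess1_fat returns &zeroVal[0] when k is absent, so no
// dynamic allocation or atomic.Load is needed on the lookup path.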
Fixes #12337.
Change-Id: Ic2d0319944af33bbb59785938d9ab80958d1b4b1
Reviewed-on: https://go-review.googlesource.com/22221
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Michael Munday [Fri, 15 Apr 2016 22:45:17 +0000 (18:45 -0400)]
crypto/aes: add s390x assembly implementation
Adds support for single block encryption using the cipher message
(KM) instruction. KM handles key expansion internally and
therefore it is not done up front when using the assembly
implementation on s390x.
Change-Id: I69954b8ae36d549e1dc40d7acd5a10bedfaaef9c
Reviewed-on: https://go-review.googlesource.com/22194
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bill O'Farrell <billotosyr@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Alex Brainman [Mon, 18 Apr 2016 06:12:48 +0000 (16:12 +1000)]
debug/pe: introduce StringTable type
The PE specification requires that long section and symbol names
be stored in the PE string table. Introduce a StringTable type that
implements this functionality. Only string table reading is
implemented.
Keith Randall [Tue, 19 Apr 2016 22:38:59 +0000 (15:38 -0700)]
cmd/compile,runtime: pass elem type to {make,grow}slice
No point in passing the slice type to these functions.
All they need is the element type. One less indirection,
maybe a few less []T type descriptors in the binary.
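Roughly, the runtime signatures change like this (a sketch; not verbatim from the CL):

// before: func makeslice(t *slicetype, len, cap int64) slice
// after:  func makeslice(et *_type, len, cap int64) slice
// growslice changes the same way, taking the element type directly
// instead of loading it through the slice type descriptor.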
There's no need for Eiota, Eindir, Eaddr, or Eproc; the values are
threaded through to denote various typechecking contexts, but they
don't actually influence typechecking behavior at all.
Also, while here, switch the Efoo const declarations to use iota.
Julia Hansbrough [Mon, 18 Apr 2016 22:53:29 +0000 (15:53 -0700)]
runtime: updated SIGSYS to cause a panic + stacktrace
On GNU/Linux, SIGSYS is specified to cause the process to terminate
without a core dump. In https://codereview.appspot.com/3749041 , it
appears that Go accidentally introduced incorrect behavior for
this signal, which caused Go processes to keep running after
receiving SIGSYS. This change reverts it to the old/correct behavior.
Updates #15204
Change-Id: I3aa48a9499c1bc36fa5d3f40c088fdd7599e0db5
Reviewed-on: https://go-review.googlesource.com/22202 Reviewed-by: Ian Lance Taylor <iant@golang.org>