Currently we always zero objects when we allocate them. We used to
have an optimization that would not zero objects that had not been
allocated since the whole span was last zeroed (either by getting it
from the system or by getting it from the heap, which does a bulk
zero), but this depended on the sweeper clobbering the first two words
of each object. Hence, we lost this optimization when the bitmap
sweeper went away.
Re-introduce this optimization using a different mechanism. Each span
already keeps a flag indicating that it just came from the OS or was
just bulk zeroed by the mheap. We can simply use this flag to know
when we don't need to zero an object. This is slightly less efficient
than the old optimization: if a span gets allocated and partially
used, then GC happens and the span gets returned to the mcentral, then
the span gets re-acquired, the old optimization knew that it only had
to re-zero the objects that had been reclaimed, whereas this
optimization will re-zero everything. However, in this case, you're
already paying for the garbage collection, and you've only wasted one
zeroing of the span, so in practice there seems to be little
difference. (If we did want to revive the full optimization, each span
could keep track of a frontier beyond which all free slots are zeroed.
I prototyped this and it didn't obviously do any better than the much
simpler approach in this commit.)
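A minimal sketch of the flag-based approach (illustrative names only, not the actual runtime fields or code):

package sketch

// mspanSketch stands in for the runtime span; needzero is the flag
// described above: false right after the span comes from the OS or is
// bulk zeroed by the heap, true once it may hold stale data.
type mspanSketch struct {
	needzero bool
	mem      []byte
}

// allocSlot hands out the object at off..off+size, clearing it only
// when the span is not known to be zero already.
func allocSlot(s *mspanSketch, off, size int) []byte {
	obj := s.mem[off : off+size]
	if s.needzero {
		for i := range obj {
			obj[i] = 0
		}
	}
	return obj
}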
This significantly improves BinaryTree17, which is allocation-heavy
(and runs first, so most pages are already zeroed), and slightly
improves everything else.
name old time/op new time/op delta
XBenchGarbage-12 2.15ms ± 1% 2.14ms ± 1% -0.80% (p=0.000 n=17+17)
[dev.garbage] runtime: use s.base() everywhere it makes sense
Currently we have lots of (s.start << _PageShift) and variants. We now
have an s.base() function that returns this. It's faster and more
readable, so use it.
Rick Hudson [Thu, 31 Mar 2016 14:45:36 +0000 (10:45 -0400)]
[dev.garbage] runtime: use sys.Ctz64 intrinsic
Our compiler now provides intrinsics, including
sys.Ctz64, that support CTZ (count trailing zero)
instructions. This CL replaces the Go versions
of CTZ with the compiler intrinsic.
Count trailing zeros (CTZ) finds the least
significant 1 in a word and returns the number
of less significant 0s in the word.
Allocation uses the bitmap created by the garbage
collector to locate an unmarked object. The logic
takes a word of the bitmap, complements it, and then
caches it. It then uses CTZ to locate an available
unmarked object. It then shifts marked bits out of
the bitmap word, preparing it for the next search.
Once all the unmarked objects in the cached word are
used, the logic fetches another word of the bitmap
and repeats the process.
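A rough sketch of that scan (illustrative only; it borrows today's math/bits for the CTZ step, and the names do not match the runtime):

package sketch

import "math/bits"

// allocScanner caches one complemented word of the mark bitmap, so a
// 1 bit means "free slot here"; index is the object index that bit 0
// of cache corresponds to.
type allocScanner struct {
	cache uint64
	index int
}

// refill loads and complements the next bitmap word.
func (s *allocScanner) refill(markWord uint64, baseIndex int) {
	s.cache = ^markWord
	s.index = baseIndex
}

// next returns the index of the next free object, or -1 once the
// cached word is exhausted and refill must be called again.
func (s *allocScanner) next() int {
	if s.cache == 0 {
		return -1
	}
	tz := bits.TrailingZeros64(s.cache) // CTZ: offset of the least significant 1
	obj := s.index + tz
	s.cache >>= uint(tz + 1) // shift the consumed bits out for the next search
	s.index = obj + 1
	return obj
}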
Rick Hudson [Mon, 14 Mar 2016 16:17:48 +0000 (12:17 -0400)]
[dev.garbage] runtime: restructure alloc and mark bits
Two changes are included here that depend on each other.
The first is that allocBits and gcmarkBits are changed to
a *uint8 which points to the first byte of that span's
mark and alloc bits. Several places were altered to
perform pointer arithmetic to locate the byte corresponding
to an object in the span. The actual bit corresponding
to an object is indexed in the byte using the lower three
bits of the object's index.
The second change avoids the redundant calculation of an
object's index. The index is returned from heapBitsForObject
and then used by the functions indexing allocBits
and gcmarkBits.
Finally we no longer allocate the gc bits in the span
structures. Instead we use an arena based allocation scheme
that allows for a more compact bit map as well as recycling
and bulk clearing of the mark bits.
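A sketch of the byte/bit indexing described above (illustrative, using a slice rather than raw pointer arithmetic):

package sketch

// isMarked reports whether the object's bit is set: the byte is
// objIndex/8 and the bit within it is selected by the lower three
// bits of the index.
func isMarked(bitVec []uint8, objIndex uintptr) bool {
	return bitVec[objIndex/8]&(1<<(objIndex&7)) != 0
}

// setMarked sets the object's bit in the same way.
func setMarked(bitVec []uint8, objIndex uintptr) {
	bitVec[objIndex/8] |= 1 << (objIndex & 7)
}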
Rick Hudson [Mon, 14 Mar 2016 16:17:48 +0000 (12:17 -0400)]
[dev.garbage] runtime: add gc work buffer tryGet and put fast paths
The complexity of the GC work buffers put and tryGet
prevented them from being inlined. This CL simplifies
the fast path thus enabling inlining. If the fast
path does not succeed the previous put and tryGet
functions are called.
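A sketch of the fast-path/slow-path split (hypothetical types; the real gcWork code differs): keep the common case tiny so the inliner accepts it, and fall back to the original function otherwise.

package sketch

type workbuf struct {
	obj  []uintptr
	nobj int
}

type gcWorkSketch struct {
	wbuf *workbuf
}

// put is the fast path: small and branch-light so it can be inlined
// at call sites. The uncommon case is delegated to putSlow.
func (w *gcWorkSketch) put(p uintptr) {
	if wb := w.wbuf; wb != nil && wb.nobj < len(wb.obj) {
		wb.obj[wb.nobj] = p
		wb.nobj++
		return
	}
	w.putSlow(p)
}

// putSlow stands in for the original, non-inlined put: acquire a
// fresh buffer, then store the pointer.
func (w *gcWorkSketch) putSlow(p uintptr) {
	w.wbuf = &workbuf{obj: make([]uintptr, 512)}
	w.wbuf.obj[0] = p
	w.wbuf.nobj = 1
}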
Rick Hudson [Mon, 14 Mar 2016 16:02:02 +0000 (12:02 -0400)]
[dev.garbage] runtime: cleanup and optimize span.base()
Prior to this CL the base of a span was calculated in various
places using shifts or calls to base(). This CL now
always calls base() which has been optimized to calculate the
base of the span when the span is initialized and store that
value in the span structure.
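A sketch of the cached-base approach (illustrative; the page shift here is an assumption for the example):

package sketch

const pageShift = 13 // assumed 8KB pages, for the example only

type spanSketch struct {
	startPage uintptr
	baseAddr  uintptr // startPage << pageShift, cached when the span is initialized
}

func (s *spanSketch) init(startPage uintptr) {
	s.startPage = startPage
	s.baseAddr = startPage << pageShift
}

// base returns the cached value instead of recomputing the shift on
// every call.
func (s *spanSketch) base() uintptr { return s.baseAddr }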
Rick Hudson [Wed, 2 Mar 2016 17:15:02 +0000 (12:15 -0500)]
[dev.garbage] runtime: remove heapBitsSweepSpan
Prior to this CL the sweep phase was responsible for locating
all objects that were about to be freed and calling a function
to process the object. This was done by the function
heapBitsSweepSpan. Part of processing included calls to
tracefree and msanfree as well as counting how many objects
were freed.
The calls to tracefree and msanfree have been moved into the
gcmalloc routine and are called when the object is about to be
reallocated. The counting of free objects has been optimized
using an array-based popcnt algorithm, and if all the objects
in a span are free then the span is freed.
Similarly the code to locate the next free object has been
optimized to use an array based ctz (count trailing zero).
Various hot paths in the allocation logic have been optimized.
At this point the garbage benchmark is within 3% of the 1.6
release.
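The array-based popcount used for free-object counting might look roughly like this (a sketch with illustrative names, not the actual sweep code):

package sketch

// popcnt8 maps a byte of mark bits to the number of 1s it contains.
var popcnt8 [256]uint8

func init() {
	for i := 1; i < 256; i++ {
		popcnt8[i] = popcnt8[i>>1] + uint8(i&1)
	}
}

// countFree returns how many of the span's nobj objects are unmarked,
// assuming unused trailing bits in the last byte are zero.
func countFree(markBits []uint8, nobj int) int {
	marked := 0
	for _, b := range markBits {
		marked += int(popcnt8[b])
	}
	return nobj - marked
}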
Rick Hudson [Wed, 24 Feb 2016 19:36:30 +0000 (14:36 -0500)]
[dev.garbage] runtime: add bit and cache ctz64 (count trailing zero)
Add to each span a 64 bit cache (allocCache) of the allocBits
at freeindex. allocCache is shifted such that the lowest bit
corresponds to the bit at freeindex. allocBits uses a 0 to
indicate an object is free; allocCache, on the other hand,
uses a 1 to indicate an object is free. This facilitates
ctz64 (count trailing zero), which counts the number of 0s
trailing the least significant 1. This is also the index of
the least significant 1.
Each span maintains a freeindex indicating the boundary
between allocated objects and unallocated objects. allocCache
is shifted as freeindex is incremented such that the low bit
in allocCache corresponds to the bit at freeindex in the
allocBits array.
Currently ctz64 is written in Go using a for loop so it is
not very efficient. Use of the hardware instruction will
follow. With this in mind comparisons of the garbage
benchmark are as follows.
Rick Hudson [Wed, 17 Feb 2016 16:27:52 +0000 (11:27 -0500)]
[dev.garbage] runtime: logic that uses count trailing zero (ctz)
Most (all?) processors that Go supports supply a hardware
instruction that takes a byte and returns the number
of zeros trailing the first 1 encountered, or 8
if no ones are found. This is the index within the
byte of the first 1 encountered. CTZ should improve the
performance of the nextFreeIndex function.
Since nextFreeIndex wants the next unmarked (0) bit
a bit-wise complement is needed before calling ctz.
Furthermore, unmarked bits associated with previously
allocated objects need to be ignored. Instead of writing
a 1 as we allocate, the code masks off all bits below
freeindex after loading the byte.
While this CL does not actually execute a CTZ instruction,
it supplies a ctz function with the appropriate signature
along with the logic to execute it.
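A sketch of that byte-level search (illustrative; math/bits stands in for the eventual hardware CTZ):

package sketch

import "math/bits"

// nextFree returns the index within the byte of the first free
// (unmarked) object at or after freeidx, or 8 if there is none.
func nextFree(markByte uint8, freeidx uint) int {
	free := ^markByte            // complement: unmarked slots become 1 bits
	free &= ^uint8(0) << freeidx // mask off bits below freeindex
	return bits.TrailingZeros8(free)
}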
Rick Hudson [Tue, 16 Feb 2016 22:16:43 +0000 (17:16 -0500)]
[dev.garbage] runtime: replace ref with allocCount
This is a renaming of the field ref to the
more appropriate allocCount. The field
holds the number of objects in the span
that are currently allocated. Some throw
strings were adjusted to more accurately
convey the meaning of allocCount.
Rick Hudson [Thu, 11 Feb 2016 18:57:58 +0000 (13:57 -0500)]
[dev.garbage] runtime: allocate directly from GC mark bits
Instead of building a freelist from the mark bits generated
by the GC this CL allocates directly from the mark bits.
The approach moves the mark bits from the pointer/no pointer
heap structures into their own per span data structures. The
mark/allocation vectors consist of a single mark bit per
object. Two vectors are maintained, one for allocation and
one for the GC's mark phase. During the GC cycle's sweep
phase the interpretation of the vectors is swapped. The
mark vector becomes the allocation vector and the old
allocation vector is cleared and becomes the mark vector that
the next GC cycle will use.
Marked entries in the allocation vector indicate that the
object is not free. Each allocation vector maintains a boundary
between areas of the span already allocated from and areas
not yet allocated from. As objects are allocated this boundary
is moved until it reaches the end of the span. At this point
further allocations will be done from another span.
Since we no longer sweep a span inspecting each freed object,
maintaining the pointer/scalar bits in the heapBitMap is now
the responsibility of the routines doing the actual
allocation.
This CL is functionally complete and ready for performance
tuning.
The gcmarkBits is a bit vector used by the GC to mark
reachable objects. Once a GC cycle is complete the gcmarkBits
swap places with the allocBits. allocBits is then used directly
by malloc to locate free objects, thus avoiding the
construction of a linked free list. This CL introduces a set
of helper functions for manipulating gcmarkBits and allocBits
that will be used by later CLs to realize the actual
algorithm. Minimal attempts have been made to optimize these
helper routines.
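A sketch of the swap (illustrative; per the arena scheme above, the runtime recycles and bulk-clears mark bits rather than allocating fresh ones as done here):

package sketch

type spanBits struct {
	allocBits  []uint8 // 1 bit set = slot in use
	gcmarkBits []uint8 // filled in by the GC mark phase
}

// swapAtSweep makes the just-completed mark vector the new allocation
// vector and hands the next GC cycle a cleared mark vector.
func (s *spanBits) swapAtSweep() {
	s.allocBits = s.gcmarkBits
	s.gcmarkBits = make([]uint8, len(s.allocBits))
}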
Rick Hudson [Mon, 8 Feb 2016 14:53:14 +0000 (09:53 -0500)]
[dev.garbage] runtime: add stackfreelist
The freelist for normal objects and the freelist
for stacks share the same mspan field for holding
the list head but are operated on by different code
sequences. This overloading complicates the use of bit
vectors for allocation of normal objects. This change
refactors the use of the stackfreelist out from the
use of freelist.
Rick Hudson [Thu, 4 Feb 2016 16:41:48 +0000 (11:41 -0500)]
[dev.garbage] runtime: bitmap allocation data structs
The bitmap allocation data structure prototypes. Before
this is released these underlying data structures need
to be more performant but the signatures of helper
functions utilizing these structures will remain stable.
Austin Clements [Fri, 18 Mar 2016 15:27:59 +0000 (11:27 -0400)]
runtime: don't rescan globals
Currently the runtime rescans globals during mark 2 and mark
termination. This costs as much as 500µs/MB in STW time, which is
enough to surpass the 10ms STW limit with only 20MB of globals.
It's also basically unnecessary. The compiler already generates write
barriers for global -> heap pointer updates and the regular write
barrier doesn't check whether the slot is a global or in the heap.
Some less common write barriers do cause problems.
heapBitsBulkBarrier, which is used by typedmemmove and related
functions, currently depends on having access to the pointer bitmap
and as a result ignores writes to globals. Likewise, the
reflect-related write barriers reflect_typedmemmovepartial and
callwritebarrier ignore non-heap destinations; though it appears they
can never be called with global pointers anyway.
This commit makes heapBitsBulkBarrier issue write barriers for writes
to global pointers using the data and BSS pointer bitmaps, removes the
inheap checks from the reflection write barriers, and eliminates the
rescans during mark 2 and mark termination. It also adds a test that
writes to globals have write barriers.
Programs with large data+BSS segments (with pointers) aren't common,
but for programs that do have large data+BSS segments, this
significantly reduces pause time:
name \ 95%ile-time/markTerm old new delta
LargeBSS/bss:1GB/gomaxprocs:4 148200µs ± 6% 302µs ±52% -99.80% (p=0.008 n=5+5)
David Crawshaw [Wed, 27 Apr 2016 17:10:49 +0000 (13:10 -0400)]
reflect: fix strings of SliceOf-created types
The new type was inheriting the tflagExtraStar from its prototype.
Fixes #15467
Change-Id: Ic22c2a55cee7580cb59228d52b97e1c0a1e60220
Reviewed-on: https://go-review.googlesource.com/22501
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
David Crawshaw [Wed, 27 Apr 2016 16:49:27 +0000 (12:49 -0400)]
reflect: unnamed interface types have no name
Fixes #15468
Change-Id: I8723171f87774a98d5e80e7832ebb96dd1fbea74
Reviewed-on: https://go-review.googlesource.com/22524
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Robert Griesemer [Fri, 15 Apr 2016 21:14:04 +0000 (14:14 -0700)]
cmd/compile: switch to compact export format by default
builtin.go was auto-generated via go generate; all other
changes were manual.
The new format reduces the export data size by ~65% on average
for the std library packages (and there is still quite a bit of
room for improvement).
The average time to write export data is reduced by (at least)
62% as measured in one run over the std lib; it is likely more.
The average time to read import data is reduced by (at least)
37% as measured in one run over the std lib; it is likely more.
There is also room to improve this time.
The compiler transparently handles both packages using the old
and the new format.
Comparing the -S output of the go build for each package via
the cmp.bash script (added) shows identical assembly code for
all packages, but 6 files show file:line differences:
The following files have differences because they use cgo
and cgo uses different temp. directories for different builds.
Harmless.
The following files have file:line differences that are not yet
fully explained; however the differences exist w/ and w/o new export
format (pre-existing condition). See issue #15453.
In summary, switching to the new export format produces the same
package files as before for all practical purposes.
How can you tell which one you have (if you care): Open a package
(.a) file in an editor. Textual export data starts with a $$ after
the header and is more or less legible; binary export data starts
with a $$B after the header and is mostly unreadable. A stand-alone
decoder (for debugging) is in the works.
In case of a problem, please first try reverting back to the old
textual format to determine if the cause is the new export format:
For a stand-alone compiler invocation:
- go tool compile -newexport=0 <files>
For a single package:
- go build -gcflags="-newexport=0" <pkg>
For make/all.bash:
- (export GO_GCFLAGS="-newexport=0"; sh make.bash)
Fixes #13241.
Change-Id: I2588cb463be80af22446bf80c225e92ab79878b8
Reviewed-on: https://go-review.googlesource.com/22123
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
cmd/link: remove absolute address for c-archive on darwin/arm
Now it is possible to build a c-archive as PIC on darwin/arm (this is
now the default). Then the system linker can link the binary using
the archive as PIE.
Fixes #12896.
Change-Id: Iad84131572422190f5fa036e7d71910dc155f155
Reviewed-on: https://go-review.googlesource.com/22461
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Robert Griesemer [Wed, 27 Apr 2016 05:31:02 +0000 (22:31 -0700)]
cmd/compile: don't write pos info for builtin packages
TestBuiltin will fail if run on Windows and builtin.go was generated
on a non-Windows machine (or vice versa) because path names have
different separators. Avoid problem altogether by not writing pos
info for builtin packages. It's not needed.
Affects -newexport only.
Change-Id: I8944f343452faebaea9a08b5fb62829bed77c148
Reviewed-on: https://go-review.googlesource.com/22498
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Dmitry Vyukov [Fri, 26 Feb 2016 12:02:42 +0000 (13:02 +0100)]
runtime/race: improve TestNoRaceIOHttp test
TestNoRaceIOHttp does all kinds of bad things:
1. Binds to a fixed port, so concurrent tests fail.
2. Registers HTTP handler multiple times, so repeated tests fail.
3. Relies on sleep to wait for listen.
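For reference, the usual way to avoid all three problems in a test is an httptest server with its own mux (a generic sketch, not the actual race test):

package sketch

import (
	"net/http"
	"net/http/httptest"
)

func doOneRequest() error {
	mux := http.NewServeMux() // fresh mux each call: no duplicate handler registration
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})
	srv := httptest.NewServer(mux) // ephemeral port, already listening when NewServer returns
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}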
net/http: remove idle transport connections from Transport when server closes
Previously the Transport would cache idle connections
for later reuse, but if a peer server disconnected
(e.g. idle timeout), we would not proactively remove the *persistConn
from the Transport's idle list, leading to a waste of memory
(potentially forever).
Instead, when the persistConn's readLoop terminates, remove it from
the idle list, if present.
This also adds the beginning of accounting for the total number of
idle connections, which will be needed for Transport.MaxIdleConns
later.
Updates #15461
Change-Id: Iab091f180f8dd1ee0d78f34b9705d68743b5557b
Reviewed-on: https://go-review.googlesource.com/22492
Reviewed-by: Andrew Gerrand <adg@golang.org>
cmd/go: add Package.StaleReason for debugging with go list
It comes up every few months that we can't understand why
the go command is rebuilding some package.
Add diagnostics so that the go command can explain itself
if asked.
For #2775, #3506, #12074.
Change-Id: I1c73b492589b49886bf31a8f9d05514adbd6ed70
Reviewed-on: https://go-review.googlesource.com/22432
Reviewed-by: Rob Pike <r@golang.org>
Michael Munday [Mon, 25 Apr 2016 21:58:34 +0000 (17:58 -0400)]
crypto/sha256: add s390x assembly implementation
Renames block to blockGeneric so that it can be called when the
assembly feature check fails. This means making block a var on
platforms without an assembly implementation (similar to the sha1
package).
Also adds a test to check that the fallback path works correctly
when the feature check fails.
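The pattern being described is roughly the following (illustrative names; not the actual crypto/sha256 internals):

package sketch

type digest struct{} // stand-in for the real hash state

// block defaults to the portable Go code and is swapped for the
// assembly routine only when the feature check passes.
var block = blockGeneric

func init() {
	if hasAsmSupport() { // stand-in for the real CPU feature check
		block = blockAsm
	}
}

func blockGeneric(dig *digest, p []byte) {
	// portable Go implementation, elided in this sketch
}

func blockAsm(dig *digest, p []byte) {
	// assembly-backed implementation, elided in this sketch
}

func hasAsmSupport() bool { return false }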
Austin Clements [Fri, 4 Mar 2016 16:58:26 +0000 (11:58 -0500)]
runtime: make stack re-scan O(# dirty stacks)
Currently the stack re-scan during mark termination is O(# stacks)
because we enqueue a root marking job for every goroutine. It takes
~34ns to process this root marking job for a valid (clean) stack, so
at around 300k goroutines we exceed the 10ms pause goal. A non-trivial
portion of this time is spent simply taking the cache miss to check
the gcscanvalid flag, so simply optimizing the path that handles clean
stacks can only improve this so much.
Fix this by keeping an explicit list of goroutines with dirty stacks
that need to be rescanned. When a goroutine first transitions to
running after a stack scan and marks its stack dirty, it adds itself
to this list. We enqueue root marking jobs only for the goroutines in
this list, so this improves stack re-scanning asymptotically by
completely eliminating time spent on clean goroutines.
This reduces mark termination time for 500k idle goroutines from 15ms
to 238µs. Overall performance effect is negligible.
name \ 95%ile-time/markTerm old new delta
IdleGs/gs:500000/gomaxprocs:12 15000µs ± 0% 238µs ± 5% -98.41% (p=0.000 n=10+10)
name old time/op new time/op delta
XBenchGarbage-12 2.30ms ± 3% 2.29ms ± 1% -0.43% (p=0.049 n=17+18)
Austin Clements [Fri, 11 Mar 2016 19:08:10 +0000 (14:08 -0500)]
runtime: don't clear gcscanvalid in casfrom_Gscanstatus
Currently we clear gcscanvalid in both casgstatus and
casfrom_Gscanstatus if the new status is _Grunning. This is very
important to do in casgstatus. However, this is potentially wrong in
casfrom_Gscanstatus because in this case the caller doesn't own gp and
hence the write is racy. Unlike the other _Gscan statuses, during
_Gscanrunning, the G is still running. This does not indicate that
it's transitioning into a running state. The scan simply hasn't
happened yet, so it's neither valid nor invalid.
Conveniently, this also means clearing gcscanvalid is unnecessary in
this case because the G was already in _Grunning, so we can simply
remove this code. What will happen instead is that the G will be
preempted to scan itself, that scan will set gcscanvalid to true, and
then the G will return to _Grunning via casgstatus, clearing
gcscanvalid.
This fix will become necessary shortly when we start keeping track of
the set of G's with dirty stacks, since it will no longer be
idempotent to simply set gcscanvalid to false.
Austin Clements [Wed, 2 Mar 2016 22:55:45 +0000 (17:55 -0500)]
runtime: remove stack barriers during sweep
This adds a best-effort pass to remove stack barriers immediately
after the end of mark termination. This isn't necessary for the Go
runtime, but should help external tools that perform stack walks but
aren't aware of Go's stack barriers such as GDB, perf, and VTune.
(Though clearly they'll still have trouble unwinding stacks during
mark.)
Austin Clements [Wed, 2 Mar 2016 19:52:08 +0000 (14:52 -0500)]
runtime: remove stack barriers during concurrent mark
Currently we remove stack barriers during STW mark termination, which
has a non-trivial per-goroutine cost and means that we have to touch
even clean stacks during mark termination. However, there's no problem
with leaving them in during the sweep phase. They just have to be out
by the time we install new stack barriers immediately prior to
scanning the stack such as during the mark phase of the next GC cycle
or during mark termination in a STW GC.
Hence, move the gcRemoveStackBarriers from STW mark termination to
just before we install new stack barriers during concurrent mark. This
removes the cost from STW. Furthermore, this combined with concurrent
stack shrinking means that the mark termination scan of a clean stack
is a complete no-op, which will make it possible to skip clean stacks
entirely during mark termination.
This has the downside that it will mess up anything outside of Go that
tries to walk Go stacks all the time instead of just some of the time.
This includes tools like GDB, perf, and VTune. We'll improve the
situation shortly.
Austin Clements [Fri, 11 Mar 2016 22:00:46 +0000 (17:00 -0500)]
runtime: free dead G stacks concurrently
Currently we free cached stacks of dead Gs during STW stack root
marking. We do this during STW because there's no way to take
ownership of a particular dead G, so attempting to free a dead G's
stack during concurrent stack root marking could race with reusing
that G.
However, we can do this concurrently if we take a completely different
approach. One way to prevent reuse of a dead G is to remove it from
the free G list. Hence, this adds a new fixed root marking task that
simply removes all Gs from the list of dead Gs with cached stacks,
frees their stacks, and then adds them to the list of dead Gs without
cached stacks.
This is also a necessary step toward rescanning only dirty stacks,
since it eliminates another task from STW stack marking.
Matthew Dempsky [Tue, 26 Apr 2016 17:55:32 +0000 (10:55 -0700)]
cmd/compile: lazily initialize litbuf
Instead of eagerly creating strings like "literal 2.01" for every
lexed number in case we need to mention it in an error message, defer
this work to (*parser).syntax_error.
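The shape of the change is roughly this (hypothetical names): keep the raw literal around and only format the description at error-reporting time.

package sketch

import "fmt"

type parserSketch struct {
	lastLit string // raw text of the most recently lexed literal
}

// litDescription builds the "literal 2.01"-style string lazily, only
// when a syntax error actually needs to mention the token.
func (p *parserSketch) litDescription() string {
	return fmt.Sprintf("literal %s", p.lastLit)
}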
Alan Donovan [Mon, 25 Apr 2016 22:31:36 +0000 (18:31 -0400)]
gc: use AbsFileLine for deterministic binary export data
This version of the file name honors the -trimprefix flag,
which strips off variable parts like $WORK or $PWD.
The TestCgoConsistentResults test now passes.
Change-Id: If93980b054f9b13582dd314f9d082c26eaac4f41
Reviewed-on: https://go-review.googlesource.com/22444
Reviewed-by: Robert Griesemer <gri@golang.org>
Michael Munday [Wed, 13 Apr 2016 17:34:41 +0000 (13:34 -0400)]
cmd/link: fix gdb backtrace on architectures using a link register
Also adds TestGdbBacktrace to the runtime package.
Dwarf modifications written by Bryan Chan (@bryanpkc) who is also
at IBM and covered by the same CLA.
Fixes #14628
Change-Id: I106a1f704c3745a31f29cdadb0032e3905829850
Reviewed-on: https://go-review.googlesource.com/20193
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
cmd/compile/internal/gc: rewrite comment to avoid automated meaning
The comment says 'DΟ NΟT SUBMIT', and that text being in a file can cause
automated errors or warnings when trying to check the Go sources into other
source control systems.
(We reject that string in CL commit messages, which I've avoided here
by changing the O's to Ο's above.)
Michael Munday [Mon, 25 Apr 2016 20:17:42 +0000 (16:17 -0400)]
crypto/sha512: add s390x assembly implementation
Renames block to blockGeneric so that it can be called when the
assembly feature check fails. This means making block a var on
platforms without an assembly implementation (similar to the sha1
package).
Also adds a test to check that the fallback path works correctly
when the feature check fails.
David Crawshaw [Tue, 26 Apr 2016 14:53:25 +0000 (10:53 -0400)]
cmd/link: correctly decode name length
The linker was incorrectly decoding type name lengths, causing
typelinks to be sorted out of order and in cases where the name was
the exact right length, linker panics.
Added a test to the reflect package that causes TestTypelinksSorted
to fail before this CL. It's not the exact failure seen in #15448
but it has the same cause: decodetype_name calculating the wrong
length.
The equivalent decoders in reflect/type.go and runtime/type.go
have the parenthesis in the right place.
Robert Griesemer [Mon, 25 Apr 2016 22:59:42 +0000 (15:59 -0700)]
cmd/compile: sort import strings for canonical obj files
This is not necessary for reproducibility but it removes
differences due to imported package order between compiles
using textual vs binary export format. The packages list
tends to be very short, so it's ok doing it always for now.
Guarded with a documented (const) flag so it's trivial to
disable and remove eventually.
Also, use the same flag now to enforce parameter numbering.
Change-Id: Ie05d2490df770239696ecbecc07532ed62ccd5c0
Reviewed-on: https://go-review.googlesource.com/22445
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
The code sequence for large-offset floating-point stores
includes adding the base pointer to r11. Make sure we
can interpret that instruction correctly.
Fixes build.
Fixes #15440
Change-Id: I7fe5a4a57e08682967052bf77c54e0ec47fcb53e
Reviewed-on: https://go-review.googlesource.com/22440
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Robert Griesemer [Mon, 25 Apr 2016 21:39:51 +0000 (14:39 -0700)]
cmd/compile: for now, keep parameter numbering in binary export format
The numbering is only required for parameters of functions/methods
with exported inlineable bodies. For now, always export parameter names
with internal numbering to minimize the diffs between assembly code
dumps of code compiled with the textual vs the binary format.
To be disabled again once the new export format is default.
Change-Id: I6d14c564e734cc5596c7e995d8851e06d5a35013
Reviewed-on: https://go-review.googlesource.com/22441
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Robert Griesemer [Fri, 22 Apr 2016 23:35:29 +0000 (16:35 -0700)]
spec: be more explicit about equivalence of empty string and absent field tags
Note that the spec already makes that point with a comment in the very first
example for struct field tags. This change is simply stating this explicitly
in the actual spec prose.
- gccgo and go/types already follow this rule
- the current reflect package API doesn't distinguish between absent tags
and empty tags (i.e., there is no discoverable difference)
As a nice side-effect, this allows us to
unify several code paths.
The terminology (low, high, max, simple slice expr,
full slice expr) is taken from the spec and
the examples in the spec.
This is a trial run. The plan, probably for Go 1.8,
is to change slice expressions to use Node.List
instead of OKEY, and to do some similar
tree structure changes for other ops.
Passes toolstash -cmp. No performance change.
all.bash passes with GO_GCFLAGS=-newexport.
Updates #15350
Change-Id: Ic1efdc36e79cdb95ae1636e9817a3ac8f83ab1ac
Reviewed-on: https://go-review.googlesource.com/22425
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
* Make budget an int32 to avoid needless conversions.
* Introduce some temporary variables to reduce repetition.
* If ... args are present, they will be the last argument
to the function. No need to scan all arguments.
Note that this is only safe because
the compiler generates multiple distinct
gc.Types. If we switch to having canonical
gc.Types, then this will need to be updated
to handle the case in which the user uses both
map[T]S and also map[[8]T]S. In that case,
the runtime needs algs for [8]T, but this could
mark the sole [8]T type as Noalg. This is a general
problem with having a single bool to represent
whether alg generation is needed for a type.
Cuts 5k off cmd/go and 22k off golang.org/x/tools/cmd/godoc,
approx 0.04% and 0.12% respectively.
Note that this is only safe because
the compiler generates multiple distinct
gc.Types. If we switch to having canonical
gc.Types, then this will need to be updated
to handle the case in which the user uses both
map[[n]T]S and also calls a function f(...T) with n arguments.
In that case, the runtime needs algs for [n]T, but this could
mark the sole [n]T type as Noalg. This is a general
problem with having a single bool to represent
whether alg generation is needed for a type.
Cuts 17k off cmd/go and 13k off golang.org/x/tools/cmd/godoc,
approx 0.14% and 0.07% respectively.
Keith Randall [Sun, 24 Apr 2016 05:59:01 +0000 (22:59 -0700)]
cmd/compile: reorder how slicelit initializes a slice
func f(x, y, z *int) {
	a := []*int{x, y, z}
	...
}
We used to use:
var tmp [3]*int
a := tmp[:]
a[0] = x
a[1] = y
a[2] = z
Now we do:
var tmp [3]*int
tmp[0] = x
tmp[1] = y
tmp[2] = z
a := tmp[:]
Doesn't sound like a big deal, but the compiler has trouble
eliminating write barriers when using the former method because it
doesn't know that the slice points to the stack. In the latter
method, the compiler knows the array is on the stack and as a result
doesn't emit any write barriers.
This turns out to be extremely common when building ... args, like
for calls to fmt.Printf.
Makes go binaries ~1% smaller.
Doesn't have a measurable effect on the go1 fmt benchmarks,
unfortunately.
runtime/trace: test detection of broken timestamps
On some processors cputicks (used to generate trace timestamps)
produce non-monotonic timestamps. It is important that the parser
distinguishes logically inconsistent traces (e.g. missing, excessive
or misordered events) from broken timestamps. The former is a bug
in the tracer; the latter is a machine issue.
Test that (1) parser does not return a logical error in case of
broken timestamps and (2) broken timestamps are eventually detected
and reported.
Alex Brainman [Thu, 21 Apr 2016 06:51:36 +0000 (16:51 +1000)]
debug/pe: introduce File.COFFSymbols and (*COFFSymbol).FullName
Reloc.SymbolTableIndex is an index into symbol table. But
Reloc.SymbolTableIndex cannot be used as index into File.Symbols,
because File.Symbols slice has Aux lines removed as it is built.
We cannot change the way File.Symbols works, so I propose we
introduce new File.COFFSymbols that does not have that limitation.
Also unlike File.Symbols, File.COFFSymbols will consist of
COFFSymbol. COFFSymbol matches PE COFF specification exactly,
and it is simpler to use.
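A short usage sketch of the new API (the file name is hypothetical): walk the raw COFF symbol table, which, unlike File.Symbols, still includes auxiliary records, and resolve long names through the string table.

package main

import (
	"debug/pe"
	"fmt"
	"log"
)

func main() {
	f, err := pe.Open("example.exe") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	for i := range f.COFFSymbols {
		sym := &f.COFFSymbols[i]
		name, err := sym.FullName(f.StringTable) // long names live in the string table
		if err != nil {
			continue
		}
		fmt.Println(name, "aux records:", sym.NumberOfAuxSymbols)
	}
}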
Updates #15345
Change-Id: Icbc265853a472529cd6d64a76427b27e5459e373
Reviewed-on: https://go-review.googlesource.com/22336
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Keith Randall [Fri, 22 Apr 2016 20:09:18 +0000 (13:09 -0700)]
cmd/compile: get rid of most byte and word insns for amd64
Now that we're using 32-bit ops for 8/16-bit logical operations
(to avoid partial register stalls), there's really no need to
keep track of the 8/16-bit ops at all. Convert everything we
can to 32-bit ops.
This CL is the obvious stuff. I might think a bit more about
whether we can get rid of weirder stuff like HMULWU.
The only downside to this CL is that we lose some information
about constants. If we had source like:
var a byte = ...
a += 128
a += 128
We will convert that to a += 256, when we could get rid of the
add altogether. This seems like a fairly unusual scenario and
I'm happy with forgoing that optimization.
Keith Randall [Wed, 20 Apr 2016 22:02:48 +0000 (15:02 -0700)]
cmd/compile: combine stores into larger widths
Combine stores into larger widths when it is safe to do so.
Add clobber() function so stray dead uses do not impede the
above rewrites.
Fix bug in loads where all intermediate values depending on
a small load (not just the load itself) must have no other uses.
We really need the small load to be dead after the rewrite.
runtime: use per-goroutine sequence numbers in tracer
Currently the tracer uses a global sequencer and it introduces
a significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with a per-goroutine sequencer.
If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), it is enough to
restore consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if goroutine
starts on the same P where it was unblocked, then start does
not need sequence number).
The burden of restoring the order is put on trace parser.
Details of the algorithm are described in the comments.
On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)
Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.
Besides running trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew and episodic random skew).
In all cases broken timestamps were detected and no test failures.
Matthew Dempsky [Fri, 22 Apr 2016 19:27:29 +0000 (12:27 -0700)]
cmd/compile: replace Ctype switches with type switches
Instead of switching on Ctype (which internally uses a type switch)
and then scattering lots of type assertions throughout the CTFOO case
clauses, just use type switches directly on the underlying constant
value.