Ben Shi [Mon, 19 Feb 2018 13:13:13 +0000 (13:13 +0000)]
cmd/compile: optimize ARM64 code with MNEG
A pair of MUL/NEG instructions can be combined into a single MNEG on ARM64.
This CL implements this optimization.
1. A special test case gets a big improvement.
(https://github.com/benshi001/ugo1/blob/master/mneg_test.go)
name old time/op new time/op delta
MNEG-4 315µs ± 0% 260µs ± 0% -17.39% (p=0.000 n=24+25)
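For illustration, a function of this shape (hypothetical; not the CL's test case) now compiles to a single MNEG instead of a MUL followed by a NEG:

func mulNeg(a, b int64) int64 {
	return -(a * b) // a MUL/NEG pair, now emitted as one MNEG
}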
Richard Miller [Mon, 19 Feb 2018 12:34:53 +0000 (12:34 +0000)]
syscall: ensure Mkdir(path) on Plan 9 fails if path exists
On Plan 9, the underlying create() syscall with DMDIR flag, which is
used to implement Mkdir, will fail silently if the path exists and
is not a directory. Work around this by checking for existence
first and rejecting Mkdir with error EEXIST if the path is found.
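A minimal sketch of the workaround (illustrative; the portable os/syscall spellings stand in for the Plan 9 implementation):

func mkdir(path string, mode uint32) error {
	// create() with DMDIR silently succeeds on an existing non-directory,
	// so reject the path up front if it already exists.
	if _, err := os.Stat(path); err == nil {
		return syscall.EEXIST
	}
	return syscall.Mkdir(path, mode)
}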
Fixes #23918
Change-Id: I439115662307923c9f498d3e7b1f32c6d205e1ad
Reviewed-on: https://go-review.googlesource.com/94777
Reviewed-by: David du Colombier <0intro@gmail.com>
Mikio Hara [Tue, 20 Feb 2018 03:57:51 +0000 (12:57 +0900)]
net: adjust the test for IPv4 loopback address block
We live in the era of virtualization and isolation.
There is no reason to hesitate to use the IPv4 loopback address block
for umbrella-type customer-accommodating services.
philhofer [Sun, 13 Aug 2017 22:36:47 +0000 (22:36 +0000)]
cmd/compile/internal/ssa: emit csel on arm64
Introduce a new SSA pass to generate CondSelect instructions,
and add CondSelect lowering rules for arm64.
In order to make the CSEL instruction easier to optimize,
and to simplify the introduction of CSNEG, CSINC, and CSINV
in the future, modify the CSEL instruction to accept a condition
code in the aux field.
Notably, this change makes the go1 Gzip benchmark
more than 10% faster.
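For example, code of this shape (illustrative) can now be lowered to a branchless CSEL:

func pick(c bool, a, b int64) int64 {
	if c {
		return a // selected via CSEL on the condition flags, with no branch
	}
	return b
}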
David Url [Tue, 13 Feb 2018 21:03:05 +0000 (22:03 +0100)]
net/http: use RFC 723x as normative reference in docs
Replace references to the obsoleted RFC 2616 with references to RFC
7230 through 7235, to avoid unnecessary confusion.
Obvious inconsistencies are marked with TODO comments.
Keith Randall [Mon, 22 Jan 2018 17:43:27 +0000 (09:43 -0800)]
cmd/compile: reset branch prediction when deleting a branch
When we go from a branch block to a plain block, reset the
branch prediction bit. Downstream passes assume that if the
branch prediction is set, then the block has 2 successors.
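A sketch of the invariant being restored (field and constant names from the ssa package; illustrative):

// When rewriting a two-successor branch block into a plain block,
// clear the stale hint so later passes don't misread it.
b.Kind = BlockPlain
b.Likely = BranchUnknown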
Alberto Donizetti [Mon, 8 Jan 2018 12:03:57 +0000 (13:03 +0100)]
encoding/xml: simplify slice-growing logic in rawToken
It appears that old code (from 2009) in xml.(*Decoder).rawToken
replicates append's slice-growing functionality by allocating a new,
bigger backing array and then calling copy.
Simplifying the code by replacing it with a single append call does
not seem to hurt performance:
name old time/op new time/op delta
Marshal-4 11.2µs ± 1% 11.3µs ±10% ~ (p=0.069 n=19+17)
Unmarshal-4 28.6µs ± 1% 28.4µs ± 1% -0.60% (p=0.000 n=20+18)
name old alloc/op new alloc/op delta
Marshal-4 5.78kB ± 0% 5.78kB ± 0% ~ (all equal)
Unmarshal-4 8.61kB ± 0% 8.27kB ± 0% -3.90% (p=0.000 n=20+20)
name old allocs/op new allocs/op delta
Marshal-4 23.0 ± 0% 23.0 ± 0% ~ (all equal)
Unmarshal-4 189 ± 0% 190 ± 0% +0.53% (p=0.000 n=20+20)
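A sketch of the kind of simplification involved (a hypothetical reconstruction, not the exact diff):

// before: grow the backing array by hand
n := len(buf)
if n == cap(buf) {
	grown := make([]byte, n, 2*cap(buf)+16)
	copy(grown, buf)
	buf = grown
}
buf = buf[:n+1]
buf[n] = b

// after: let append handle growth
buf = append(buf, b)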
Keith Randall [Sun, 7 Jan 2018 21:23:59 +0000 (13:23 -0800)]
cmd/compile: add | operator to make rewrite rules more succinct
Instead of
(And64 x x) -> x
(And32 x x) -> x
(And16 x x) -> x
(And8 x x) -> x
we can now do:
(And(64|32|16|8) x x) -> x
Any part of an opcode can have a parenthesized, |-separated list of possibilities.
The rule is then expanded using each piece of the | combo.
If there are multiple | clauses, they get expanded in tandem.
(All the first positions, then all the second positions, etc.)
All places where | appears in a rule must have the same count of alternatives.
For example, a meta-rule over MOV(L|SS) generates 2 rules, a MOVL rule and a MOVSS rule.
This CL is carefully orchestrated to not change the generated rules file at all.
In some cases, this means we can't align the rules nicely because it changes
the whitespace in the generated code. I'll clean that up as a separate step.
There are many more opportunities to compactify rules using this new mechanism.
I've just done some examples, there's more to do.
Daniel Martí [Mon, 13 Nov 2017 09:43:17 +0000 (09:43 +0000)]
all: add more uses of stringer
By grepping for ]string{$, one can find many manual implementations of
stringer. The debug/dwarf ones needed the new -trimprefix flag, too.
html/template was fairly simple, just implementing the fallback as
stringer would. The changes there are trivial.
The ones in debug/dwarf needed a bit of extra logic since the GoString
method wants to use its own format, depending on whether or not the
value is one of the known constants.
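For instance, -trimprefix lets the generated String method drop a shared constant prefix (an illustrative directive, not the exact one added):

//go:generate stringer -type=Attr -trimprefix=Attr
type Attr int

const (
	AttrSibling Attr = iota + 1 // String() yields "Sibling", not "AttrSibling"
	AttrLocation
)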
Change-Id: I501ea7deaa538fa425c8e9c2bb895f480169273f
Reviewed-on: https://go-review.googlesource.com/77253
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Daniel Martí [Mon, 19 Feb 2018 20:48:58 +0000 (20:48 +0000)]
text/template: differentiate nil from missing arg
reflect.Value is a struct and has neither a kind nor any flag for
untyped nils. As a result, it is tricky to differentiate when we're
missing a value from when we have one but it is an untyped nil.
We could start using *reflect.Value instead, to add one level of
indirection, using nil for missing values and new(reflect.Value) for
untyped nils. However, that is a fairly invasive change, and would also
mean unnecessary allocations.
Instead, use a special reflect.Value that depicts when a value is
missing. This is the case for the "final" reflect.Value in multiple
scenarios, such as the start of a pipeline. Give it a specific,
unexported type too, to make sure it cannot be mistaken for any other
valid value.
Finally, replace "final.IsValid()" with "final != missingVal", since
final.IsValid() will be false when final is an untyped nil.
Also add a few test cases, all different variants of the untyped nil
versus missing value scenario.
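A minimal sketch of the device (names assumed to be along the lines of the CL):

// missingVal is a sentinel reflect.Value. Its unexported type cannot
// collide with any value a template pipeline could produce, so
// final != missingVal reliably means an argument was present.
type missingValType struct{}

var missingVal = reflect.ValueOf(missingValType{})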
Fixes #18716.
Change-Id: Ia9257a84660ead5a7007fd1cced7782760b62d9d
Reviewed-on: https://go-review.googlesource.com/95215
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Martin Möhrmann [Sun, 18 Feb 2018 13:12:52 +0000 (14:12 +0100)]
runtime: avoid clearing memory during byte slice allocation in gobytes
Avoid using make in gobytes, which unnecessarily clears the byte slice
backing array, since the content is immediately overwritten.
Check explicitly in gobytes that the user-provided length is positive
and below the maximum allowed allocation size; before this change, this
check was done in makeslice.
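A rough sketch of the resulting shape (simplified runtime internals; the details here are assumptions):

func gobytes(p *byte, n int) (b []byte) {
	if n == 0 {
		return make([]byte, 0)
	}
	if n < 0 || uintptr(n) > maxAlloc {
		panic(errorString("gobytes: length out of range"))
	}
	bp := mallocgc(uintptr(n), nil, false) // needzero=false: overwritten below
	memmove(bp, unsafe.Pointer(p), uintptr(n))
	*(*slice)(unsafe.Pointer(&b)) = slice{bp, n, n}
	return
}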
Fixes #23634
Change-Id: Id852619e932aabfc468871c42ad07d34da91f45c
Reviewed-on: https://go-review.googlesource.com/94760
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Yazen2017 [Sat, 17 Feb 2018 22:24:21 +0000 (17:24 -0500)]
doc: improve clarity of map index examples
The fourth example for map indexing states that you have a map of type
map[K]V and attempts to read a value into a variable of type T. Further,
the example is meant to showcase the boolean return variable reporting
whether the map contained the key, but it declares that variable with
type T as well. This will not compile.
Changed last updated date to February 18
Fixes: #23895
Change-Id: I63c52adbcd989afd4855e329e6c727f4c01f7881
Reviewed-on: https://go-review.googlesource.com/94906
Reviewed-by: Robert Griesemer <gri@golang.org>
Mansour Rahimi [Sun, 28 Jan 2018 21:19:13 +0000 (22:19 +0100)]
os: make MkdirAll support path in extended-length form
Calling MkdirAll on paths in extended-length form (\\?\-prefixed)
failed.
MkdirAll calls itself recursively with the parent directory of the given
path as its parameter. It finds the parent directory by looking for the
last delimiter in the path and taking the left part. When the path is in
extended-length form, it ends up with an empty path.
Here is a sample path in extended-length form:
\\?\c:\foo\bar
This change fixes that by passing the trailing path separator along to
MkdirAll (so it also works for paths like \\?\c:\).
Fixes #22230
Change-Id: I363660b262588c5382ea829773d3b6005ab8df3c
Reviewed-on: https://go-review.googlesource.com/86295
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Yury Smolsky [Fri, 16 Feb 2018 12:56:03 +0000 (14:56 +0200)]
cmd/go: document 'go run' exit codes
Updated the docs to note that go run does not return the exit code of
the compiled binary.
Fixes #23716
Change-Id: Ib85459974c4c6d2760ddba957ef711628098661f
Reviewed-on: https://go-review.googlesource.com/94795
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Martin Möhrmann [Sun, 21 Jan 2018 09:25:07 +0000 (10:25 +0100)]
cmd/compile: replace misleading variable name
One of the variables declared in cleantempnopop, named 'kill',
does not hold an OVARKILL node but an OVARLIVE node.
Rename that variable to 'live' to differentiate it from the other
variable named kill that does hold an OVARKILL node.
Passes toolstash -cmp.
Change-Id: I34c8729e5c303b8cdabe44c9af980d4f16000e4b
Reviewed-on: https://go-review.googlesource.com/88816
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Martin Möhrmann [Sat, 27 Jan 2018 11:48:15 +0000 (12:48 +0100)]
runtime: rename map implementation and test files to use a common prefix
Rename all map implementation and test files to use "map"
as a file name prefix instead of "hashmap" for the implementation
and "map" for the test file names.
Change-Id: I7b317c1f7a660b95c6d1f1a185866f2839e69446
Reviewed-on: https://go-review.googlesource.com/90336
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Arthur Khashaev [Mon, 12 Feb 2018 00:28:12 +0000 (03:28 +0300)]
cmd/go: fix command injection in VCS path
Fixes #23867, CVE-2018-7187
Change-Id: I5d0ba4923c9ed354ef76290e149c182447f9dfe2
Reviewed-on: https://go-review.googlesource.com/94656
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Ian Lance Taylor [Thu, 15 Feb 2018 23:57:13 +0000 (15:57 -0800)]
cmd/go: restrict meta imports to valid schemes
Before this change, when using -insecure, we permitted any meta import
repo root as long as it contained "://". When not using -insecure, we
restrict meta import repo roots to be valid URLs. People may depend on
that somehow, so permit meta import repo roots to be invalid URLs, but
require them to have valid schemes per RFC 3986.
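A sketch of what a valid RFC 3986 scheme check looks like (illustrative, not necessarily the exact code):

// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
func validScheme(scheme string) bool {
	if len(scheme) == 0 {
		return false
	}
	for i, c := range scheme {
		switch {
		case 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z':
			// letters are allowed anywhere
		case '0' <= c && c <= '9' || c == '+' || c == '-' || c == '.':
			if i == 0 {
				return false // the first character must be a letter
			}
		default:
			return false
		}
	}
	return true
}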
Richard Miller [Fri, 16 Feb 2018 17:01:52 +0000 (17:01 +0000)]
net/http: increase timeout length for TestOnlyWriteTimeout
This test was sometimes timing out on the plan9/arm builder
(raspberry pi) when run in parallel with other network intensive
tests. It appears that tcp on the loopback interface could do
with some tuning for better performance on Plan 9, but until
that's done, increasing the timeout from 5 to 10 seconds allows
this test to pass. This should have no effect on other platforms
where 5 seconds was already enough.
Richard Miller [Fri, 16 Feb 2018 15:20:04 +0000 (15:20 +0000)]
runtime: don't ignore address hint for sysReserve in Plan 9
On Plan 9, sysReserve was ignoring the address hint and allocating
memory wherever it is available. This causes the new
TestArenaCollision test to fail on 32-bit Plan 9. We now use the
address hint in the specific case where sysReserve is extending the
process address space at its end, and similarly we contract the
address space in the case where sysFree is releasing memory at
the end.
Fixes #23860
Change-Id: Ia5254779ba8f1698c999832720a88de400b5f91a
Reviewed-on: https://go-review.googlesource.com/94776
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David du Colombier <0intro@gmail.com>
It's used on Solaris to import symbols from shared libraries, e.g., in
golang.org/x/sys/unix and golang.org/x/net/internal/socket.
We could use a different directive but that would require build tags
in all the places that use it.
Austin Clements [Tue, 2 Jan 2018 02:51:47 +0000 (21:51 -0500)]
runtime: replace _MaxMem with maxAlloc
Now that we have memLimit, also having _MaxMem is a bit confusing.
Replace it with maxAlloc, which better conveys what it limits. We also
define maxAlloc slightly differently: since it's now clear that it
limits allocation size, we can account for a subtle difference between
32-bit and 64-bit.
Austin Clements [Mon, 1 Jan 2018 22:53:59 +0000 (17:53 -0500)]
runtime: move comment about address space sizes to malloc.go
Currently there's a detailed comment in lfstack_64bit.go about address
space limitations on various architectures. Since that's now relevant
to malloc, move it to a more prominent place in the documentation for
memLimitBits.
Austin Clements [Sun, 31 Dec 2017 00:35:46 +0000 (19:35 -0500)]
runtime: remove non-reserved heap logic
Currently large sysReserve calls on some OSes don't actually reserve
the memory, but just check that it can be reserved. This was important
when we called sysReserve to "reserve" many gigabytes for the heap up
front, but now that we map memory in small increments as we need it,
this complication is no longer necessary.
This has one curious side benefit: currently, on Linux, allocations
that are large enough to be rejected by mmap wind up freezing the
application for a long time before it panics. This happens because
sysReserve doesn't reserve the memory, so sysMap calls mmap_fixed,
which calls mmap, which fails because the mapping is too large.
However, mmap_fixed doesn't inspect *why* mmap fails, so it falls back
to probing every page in the desired region individually with mincore
before performing an (otherwise dangerous) MAP_FIXED mapping, which
will also fail. This takes a long time for a large region. Now this
logic is gone, so the mmap failure leads to an immediate panic.
Austin Clements [Wed, 20 Dec 2017 06:05:23 +0000 (22:05 -0800)]
runtime: use sparse mappings for the heap
This replaces the contiguous heap arena mapping with a potentially
sparse mapping that can support heap mappings anywhere in the address
space.
This has several advantages over the current approach:
* There is no longer any limit on the size of the Go heap. (Currently
it's limited to 512GB.) Hence, this fixes #10460.
* It eliminates many failure modes of heap initialization and
growing. In particular it eliminates any possibility of panicking
with an address space conflict. This can happen for many reasons and
even causes a low but steady rate of TSAN test failures because of
conflicts with the TSAN runtime. See #16936 and #11993.
* It eliminates the notion of "non-reserved" heap, which was added
because creating huge address space reservations (particularly on
64-bit) led to huge process VSIZE. This was at best confusing and at
worst conflicted badly with ulimit -v. However, the non-reserved
heap logic is complicated, can race with other mappings in non-pure
Go binaries (e.g., #18976), and requires that the entire heap be
either reserved or non-reserved. We currently maintain the latter
property, but it's quite difficult to convince yourself of that, and
hence difficult to keep correct. This logic is still present, but
will be removed in the next CL.
* It fixes problems on 32-bit where skipping over parts of the address
space leads to mapping huge (and never-to-be-used) metadata
structures. See #19831.
This also completely rewrites and significantly simplifies
mheap.sysAlloc, which has been a source of many bugs. E.g., #21044,
#20259, #18651, and #13143 (and maybe #23222).
This change also makes it possible to allocate individual objects
larger than 512GB. As a result, a few tests that expected huge
allocations to fail needed to be changed to make even larger
allocations. However, at the moment attempting to allocate a humongous
object may cause the program to freeze for several minutes on Linux as
we fall back to probing every page with addrspace_free. That logic
(and this failure mode) will be removed in the next CL.
Fixes #10460.
Fixes #22204 (since it rewrites the code involved).
This slightly slows down compilebench and the x/benchmarks garbage
benchmark.
Relative to the start of the sparse heap changes (starting at and
including "runtime: fix various contiguous bitmap assumptions"),
overall slowdown is roughly 1% on GC-intensive benchmarks:
Austin Clements [Tue, 19 Dec 2017 04:35:34 +0000 (20:35 -0800)]
runtime: eliminate most uses of mheap_.arena_*
This replaces all uses of the mheap_.arena_* fields outside of
mallocinit and sysAlloc. These fields fundamentally assume a
contiguous heap between two bounds, so eliminating these is necessary
for a sparse heap.
Many of these are replaced with checks for non-nil spans at the test
address (which in turn checks for a non-nil entry in the heap arena
array). Some of them are just for debugging and somewhat meaningless
with a sparse heap, so those we just delete.
Austin Clements [Sat, 9 Dec 2017 03:57:53 +0000 (22:57 -0500)]
runtime: make the heap bitmap sparse
This splits the heap bitmap into separate chunks for every 64MB of the
heap and introduces an index mapping from virtual address to metadata.
It modifies the heapBits abstraction to use this two-level structure.
Finally, it modifies heapBitsSetType to unroll the bitmap into the
object itself and then copy it out if the bitmap would span
discontiguous bitmap chunks.
This is a step toward supporting general sparse heaps, which will
eliminate address space conflict failures as well as the limit on the
heap size.
It's also advantageous for 32-bit. 32-bit already supports
discontiguous heaps by always starting the arena at address 0.
However, as a result, with a contiguous bitmap, if the kernel chooses
a high address (near 2GB) for a heap mapping, the runtime is forced to
map up to 128MB of heap bitmap. Now the runtime can map sections of
the bitmap for just the parts of the address space used by the heap.
Updates #10460.
This slightly slows down the x/garbage and compilebench benchmarks.
However, I think the slowdown is acceptably small.
Austin Clements [Sat, 9 Dec 2017 03:24:59 +0000 (22:24 -0500)]
runtime: fix various contiguous bitmap assumptions
There are various places that assume the heap bitmap is contiguous and
scan it sequentially. We're about to split up the heap bitmap. This
commit modifies all of these except heapBitsSetType to use the
heapBits abstractions so they can transparently switch to a
discontiguous bitmap.
Updates #10460. This is a step toward supporting sparse heaps.
Austin Clements [Thu, 23 Jun 2016 20:25:50 +0000 (14:25 -0600)]
runtime: lay out heap bitmap forward in memory
Currently the heap bitmap is laid out in reverse order in memory relative
to the heap itself. This was originally done out of "excessive
cleverness" so that computing a bitmap pointer could load only the
arena_start field and so that heaps could be more contiguous by
growing the arena and the bitmap out from a common center point.
However, this appears to have no actual performance benefit; it
complicates nearly every use of the bitmap and makes already
confusing code more confusing. Furthermore, it's still possible to use
a single field (the new bitmap_delta) for the bitmap pointer
computation by employing slightly different excessive cleverness.
Hence, this CL puts the bitmap into forward order.
Austin Clements [Mon, 4 Dec 2017 15:58:15 +0000 (10:58 -0500)]
runtime: consolidate mheap.lookup* and spanOf*
I think we'd forgotten about the mheap.lookup APIs when we introduced
spanOf*, but, at any rate, the spanOf* functions are used far more
widely at this point, so this CL eliminates the mheap.lookup*
functions in favor of spanOf*.
Austin Clements [Tue, 12 Dec 2017 00:40:12 +0000 (19:40 -0500)]
runtime: split object finding out of heapBitsForObject
heapBitsForObject does two things: it finds the base of the object and
it creates the heapBits for the base of the object. There are several
places where we just care about the base of the object. Furthermore,
greyobject only needs the heapBits in the checkmark path and can
easily compute them only when needed. Once we eliminate passing the
heap bits to greyobject, almost all uses of heapBitsForObject don't
need the heap bits.
Hence, this splits heapBitsForObject into findObject and
heapBitsForAddr (the latter already exists), removes the hbits
argument to greyobject, and replaces all heapBitsForObject calls with
calls to findObject.
In addition to making things cleaner overall, heapBitsForAddr is going
to get more expensive shortly, so it's important that we don't do it
needlessly.
Note that there's an interesting performance pitfall here. I had
originally moved findObject to mheap.go, since it made more sense
there. However, that leads to a ~2% slowdown and a whopping 11%
increase in L1 icache misses on both the x/garbage and compilebench
benchmarks. This suggests we may want to be more principled about
this, but, for now, let's just leave findObject in mbitmap.go.
(I tried to make findObject small enough to inline by splitting out
the error case, but, sadly, wasn't quite able to get it under the
inlining budget.)
Austin Clements [Mon, 4 Dec 2017 15:43:11 +0000 (10:43 -0500)]
runtime: replace mlookup and findObject with heapBitsForObject
These functions all serve essentially the same purpose. mlookup is
used in only one place and findObject in only three. Use
heapBitsForObject instead, which is the most optimized implementation.
(This may seem slightly silly because none of these uses care about
the heap bits, but we're about to split up the functionality of
heapBitsForObject anyway. At that point, findObject will rise from the
ashes.)
Austin Clements [Sun, 3 Dec 2017 23:08:57 +0000 (18:08 -0500)]
runtime: expand/update lfstack address space assumptions
I was spelunking Linux's address space code and found that some of the
information about maximum virtual addresses in lfstack's comments was
out of date. This expands and updates the comment.
Chad Rosier [Thu, 15 Feb 2018 19:49:03 +0000 (14:49 -0500)]
cmd/compile: improve absorb shifts optimization for arm64
Current absorb shifts optimization can generate dead Value nodes which increase
use count of other live nodes. It will impact other optimizations (such as
combined loads) which are enabled based on specific use count. This patch fixes
the issue by decreasing the use count of nodes referenced by dead Value nodes
generated by absorb shifts optimization.
Performance impacts on go1 benchmarks (data collected on A57@2GHzx8):
Than McIntosh [Tue, 6 Feb 2018 14:36:13 +0000 (09:36 -0500)]
compiler: honor //line directives in DWARF variable file/line attrs
During DWARF debug generation, the DW_AT_decl_line / DW_AT_decl_file
attributes for variable DIEs were being computed without taking into
account the possibility of "//line" directives. Fix things up to use
the correct src.Pos methods to pick up this info.
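For reference, a //line directive remaps the positions the compiler records, e.g. (illustrative):

//line /tmp/gen/query.go:1
func generated() {
	var x int // decl_file/decl_line should now point into /tmp/gen/query.go
	_ = x
}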
Hana Kim [Tue, 5 Dec 2017 23:02:10 +0000 (18:02 -0500)]
runtime/trace: implement annotation API
This implements the annotation API proposed in golang.org/cl/63274.
traceString is updated to protect the string map with trace.stringsLock,
because the assumption that traceString is called by a single goroutine
(either at the beginning of tracing or at the end of tracing, when
dumping all the symbols and function names) is no longer true.
After this change, traceString is used by the annotation APIs
(NewContext, StartSpan, Log) to register frequently appearing strings
(task and span names, and log keys).
NewContext -> one or two records (EvString, EvUserTaskCreate)
end function -> one record (EvUserTaskEnd)
StartSpan -> one or two records (EvString, EvUserSpan)
span end function -> one or two records (EvString, EvUserSpan)
Log -> one or two records (EvString, EvUserLog)
The EvUserLog record has the typical record format written by traceEvent,
except that it is followed by bytes that represent the value string.
In addition to runtime/trace change, this change includes
corresponding changes in internal/trace to parse the new record types.
Future work to improve efficiency:
* More efficient unique task id generation instead of an atomic
counter (e.g., a per-P counter).
* Instead of a centralized trace.stringsLock, consider using a per-P
string cache or something more efficient.
Hana Kim [Thu, 9 Nov 2017 16:39:10 +0000 (11:39 -0500)]
runtime/trace: user annotation API
This CL presents the proposed user annotation API skeleton.
This CL bumps up the trace version to 1.11.
Design doc https://goo.gl/iqJfJ3
Implementation CLs follow.
The API introduces three basic building blocks: Log, Span, and Task.
Log is for basic logging. When called, the message will be recorded
to the trace along with timestamp, goroutine id, and stack info.
trace.Log(ctx, messageType, message)
Span can be thought of as an extension of Log, recording an interesting
time interval during a goroutine's execution. A span is local to a
goroutine by definition.
trace.WithSpan(ctx, "doVeryExpensiveOp", func(ctx context.Context) {
/* do something very expensive */
})
Task is a higher-level concept that aids tracing of complex operations
that encompass multiple goroutines or are asynchronous.
For example, an RPC request, an HTTP request, a file write, or a
batch job can be traced with a Task.
Note we chose to design the API around context.Context so it allows
easier integration with other tracing tools, often designed around
context.Context as well. Log and WithSpan APIs recognize the task
information embedded in the context and record it in the trace as
well. That allows the Go execution tracer to associate and group
the spans and log messages based on the task information.
In order to create a Task,
ctx, end := trace.NewContext(ctx, "myTask")
defer end()
The Go execution tracer measures the time between the task's creation
and its end as the task latency.
Carlos Eduardo Seo [Wed, 3 Jan 2018 19:55:40 +0000 (17:55 -0200)]
cmd/asm, cmd/internal/obj/ppc64: add Immediate Shifted opcodes for ppc64x
This change adds ADD/AND/OR/XOR Immediate Shifted instructions for
ppc64x so they are usable in Go asm code. These instructions were
originally present in asm9.go, but they were only usable in that
file (as -AADD, -AANDCC, -AOR, -AXOR). These old mnemonics are now
removed.
Mikio Hara [Tue, 13 Feb 2018 20:33:15 +0000 (05:33 +0900)]
all: drop support for Windows Vista or below (Windows XP)
Per the notice in the Go 1.10 release notes, this change drops
support for Windows Vista or below (including Windows XP) and
simplifies the code for the sake of maintenance.
There is one exception to the above: the code related to DLLs and
system calls still remains in the runtime package. The remaining code
will be refined and used to support upcoming Windows versions in the
future.
Tobias Klauser [Thu, 15 Feb 2018 11:20:27 +0000 (12:20 +0100)]
net, internal/poll, net/internal/socktest: set SOCK_{CLOEXEC,NONBLOCK} atomically on NetBSD
NetBSD supports the SOCK_CLOEXEC and SOCK_NONBLOCK flags to the socket
syscall since version 6.0. The same version also introduced the paccept
syscall which can be used to implement syscall.Accept4.
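The effect, sketched in x/sys/unix terms (the actual change lives in the internal wrappers):

// One syscall creates the fd with close-on-exec and non-blocking set,
// leaving no window in which a forked child could inherit a blocking fd.
fd, err := unix.Socket(unix.AF_INET, unix.SOCK_STREAM|unix.SOCK_CLOEXEC|unix.SOCK_NONBLOCK, 0)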
Robert Griesemer [Thu, 15 Feb 2018 04:54:28 +0000 (20:54 -0800)]
cmd/compile/internal/syntax: don't assume (operator) ~ means operator ^
The scanner assumed that ~ really meant ^, which may be helpful when
coming from C. But ~ is not a valid Go token, and pretending that it
should be ^ can lead to confusing error messages. Better to be upfront
about it and complain about the invalid character in the first place.
This was code "inherited" from the original yacc parser which was
derived from a C compiler. It's 10 years later and we can probably
assume that people are less confused about C and Go.
Fixes #23587.
Change-Id: I8d8f9b55b0dff009b75c1530d729bf9092c5aea6
Reviewed-on: https://go-review.googlesource.com/94160
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Martin Möhrmann [Sat, 3 Feb 2018 15:29:54 +0000 (16:29 +0100)]
runtime: use new instead of newobject to create hmap in makemap
The runtime.hmap type is known at compile time.
Using new(hmap) avoids loading the hmap type from the maptype
supplied as an argument to makemap, which is only known at run time.
This change makes makemap consistent with makemap_small
by using new(hmap) instead of newobject in both functions.
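A sketch of the difference (simplified):

// before: the hmap type descriptor comes from the maptype at run time
// h = (*hmap)(newobject(t.hmap))
// after: the compiler knows hmap statically
if h == nil {
	h = new(hmap)
}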
Change-Id: Ia47acfda527e8a71d15a1a7a4c2b54fb923515eb
Reviewed-on: https://go-review.googlesource.com/91775
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Martin Möhrmann [Mon, 29 Jan 2018 21:57:54 +0000 (22:57 +0100)]
runtime: improve test file naming
The runtime builtin functions that are tested in append_test.go
are defined in slice.go. Renaming the test file to slice_test.go
makes this relation explicit with a common file name prefix.
Change-Id: I2f89ec23a6077fe6b80d2161efc760df828c8cd4
Reviewed-on: https://go-review.googlesource.com/90655
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Robert Griesemer [Thu, 15 Feb 2018 00:57:28 +0000 (16:57 -0800)]
cmd/compile/internal/syntax: more tolerant handling of missing function invocation in go/defer
Assume that an expression that is not a function call in a defer/go
statement is indeed a function that is just missing its invocation.
Report the error but continue with a sane syntax tree.
Fixes #23586.
Change-Id: Ib45ebac57c83b3e39ae4a1b137ffa291dec5b50d
Reviewed-on: https://go-review.googlesource.com/94156
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Matthew Dempsky [Wed, 14 Feb 2018 22:08:17 +0000 (14:08 -0800)]
cmd/compile: fix typechecking of untyped boolean expressions
Previously, if we typechecked a statement like
var x bool = p1.f == p2.f && p1.g == p2.g
we would correctly update the '&&' node's type from 'untyped bool' to
'bool', but the '==' nodes would stay 'untyped bool'. This is
inconsistent, and caused consistency checks during walk to fail.
This CL doesn't pass toolstash because it seems to slightly affect the
register allocator's heuristics. (Presumably 'untyped bool's were
previously making it all the way through SSA?)
Fixes #23414.
Change-Id: Ia85f8cfc69b5ba35dfeb157f4edf57612ecc3285
Reviewed-on: https://go-review.googlesource.com/94022
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Joe Tsai [Wed, 6 Dec 2017 06:53:48 +0000 (22:53 -0800)]
encoding/json: make error capture logic in recover more type safe
Rather than only ignoring runtime.Error panics, which are a very
narrow set of possible panic values, switch it such that the json
package only captures panic values that have been properly wrapped
in a jsonError struct. This ensures that only intentional panics
originating from the json package are captured.
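A minimal sketch of the pattern (jsonError stands for the package's internal wrapper; the surrounding function is hypothetical):

type jsonError struct{ error }

func encodeSafely(do func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			if je, ok := r.(jsonError); ok {
				err = je.error // an intentional panic from this package
			} else {
				panic(r) // a genuine bug: keep unwinding
			}
		}
	}()
	do()
	return nil
}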
Robert Griesemer [Wed, 14 Feb 2018 00:37:51 +0000 (16:37 -0800)]
go/types: make gotype continue after syntax errors
This avoids odd behavior where sometimes a lot of useful
errors are not reported simply because of a small syntax
error.
Tested manually with non-existing files. (We cannot easily
add an automatic test because this is a stand-alone binary
in this directory that must be built manually.)
Fixes #23593.
Change-Id: Iff90f95413bed7d1023fa0a5c9eb0414144428a9
Reviewed-on: https://go-review.googlesource.com/93815
Reviewed-by: Alan Donovan <adonovan@google.com>
Ilya Tocar [Mon, 4 Dec 2017 20:24:16 +0000 (14:24 -0600)]
cmd/compile/internal/ssa: don't spill register offsets on amd64
Transform (ADDQconst SP) into (LEA SP), because LEA is rematerializeable,
so this avoids a register spill. We can't mark ADDQconst as rematerializeable,
because it clobbers flags. This makes the go binary ~2kb smaller.
For reference, here is the generated code for the function from the bug report.
Before:
CALL "".g(SB)
MOVBLZX (SP), AX
LEAQ 8(SP), DI
TESTB AX, AX
JEQ 15
MOVQ "".p(SP), SI
DUFFCOPY $196
MOVQ $0, (SP)
PCDATA $0, $1
CALL "".h(SB)
RET
MOVQ DI, ""..autotmp_2-8(SP) // extra spill
PCDATA $0, $2
CALL "".g(SB)
MOVQ ""..autotmp_2-8(SP), DI // extra register fill
MOVQ "".p(SP), SI
DUFFCOPY $196
MOVQ $1, (SP)
PCDATA $0, $1
CALL "".h(SB)
JMP 14
END
After:
CALL "".g(SB)
MOVBLZX (SP), AX
TESTB AX, AX
JEQ 15
LEAQ 8(SP), DI
MOVQ "".p(SP), SI
DUFFCOPY $196
MOVQ $0, (SP)
PCDATA $0, $1
CALL "".h(SB)
RET
PCDATA $0, $0 // no spill
CALL "".g(SB)
LEAQ 8(SP), DI // rematerialized instead
MOVQ "".p(SP), SI
DUFFCOPY $196
MOVQ $1, (SP)
PCDATA $0, $1
CALL "".h(SB)
JMP 14
END
Popcnt has a false dependency on its output register and generates
MOVQ $0, reg to break it. But recently we switched the MOVQ $0, reg
encoding from XOR reg, reg to an actual MOV $0, reg. This CL updates
code generation for popcnt to use an actual XOR.
Heschi Kreinick [Mon, 29 Jan 2018 22:01:41 +0000 (17:01 -0500)]
cmd/compile/internal: reuse memory for valueToProgAfter
Not a big improvement, but does help edge cases like the SSA package.
Change-Id: I40e531110b97efd5f45955be477fd0f4faa8d545
Reviewed-on: https://go-review.googlesource.com/92396
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Heschi Kreinick [Thu, 26 Oct 2017 19:40:17 +0000 (15:40 -0400)]
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Heschi Kreinick [Tue, 23 Jan 2018 19:10:08 +0000 (14:10 -0500)]
cmd/compile/internal: decouple scope tracking from location lists
We're trying to enable location lists by default, and it's easier to do
that if we don't have to worry about scope tracking at the same time.
We can evaluate their performance impact separately.
However, that does mean that "err" is ambiguous in the test case, so
rename it to err2 for now.
Ian Lance Taylor [Mon, 27 Nov 2017 23:40:28 +0000 (15:40 -0800)]
runtime: use private futexes on Linux
By default futexes are permitted in shared memory regions, which
requires the kernel to translate the memory address. Since our futexes
are never in shared memory, set FUTEX_PRIVATE_FLAG, which makes futex
operations slightly more efficient.
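In concrete terms (constant values from the Linux futex ABI; the runtime-style names are a sketch):

const (
	_FUTEX_PRIVATE_FLAG = 128
	_FUTEX_WAIT_PRIVATE = 0 | _FUTEX_PRIVATE_FLAG // FUTEX_WAIT == 0
	_FUTEX_WAKE_PRIVATE = 1 | _FUTEX_PRIVATE_FLAG // FUTEX_WAKE == 1
)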
fanzha02 [Thu, 30 Nov 2017 08:30:53 +0000 (08:30 +0000)]
cmd/asm: add PRFM instruction on ARM64
The current assembler cannot handle the PRFM (immediate) instruction.
The fix creates a prfopfield struct that contains the eight
prefetch operations and the value to use in the instruction, and adds
test cases.