cmd/compile: correct comment for len check when make slice
CL 226737 optimizes len check when make slice. The comment that cap is
constrainted to [0, 2^31) is not quite true, it's 31 or 63 depends on
whether it's 32/64-bit systems.
Change-Id: I6f54e41827ffe4d0b67a44975da3ce07b2fabbad
Reviewed-on: https://go-review.googlesource.com/c/go/+/227803
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Keith Randall [Fri, 10 Apr 2020 21:07:13 +0000 (14:07 -0700)]
cmd/link: turn ASLR off for netbsd+race
The race detector can't handle ASLR (adddress space layout randomization).
On some platforms it can re-exec the binary with ASLR off. But not NetBSD.
For NetBSD we have to introduce a special ELF header note that tells
the kernel not to use ASLR.
This works fine for internal linking. For external linking it also works,
but "readelf -n" shows multiple notes in the resulting binary. Maybe the
last one wins? Not sure, but it appears to work.
David Chase [Mon, 11 Nov 2019 18:18:38 +0000 (13:18 -0500)]
cmd/compile: add explanations to escape-analysis JSON/LSP logging
For 1.15.
From the test:
{"range":{"start":{"line":7,"character":13},"end":{...},"severity":3,"code":"leaks","source":"go compiler","message":"parameter z leaks to ~r2 with derefs=0","relatedInformation":[
{"location":{"uri":"file://T/file.go","range":{"start":{"line":9,"character":13},"end":{...}},"message":"escflow: flow: y = z:"},
{"location":{"uri":"file://T/file.go","range":{"start":{"line":9,"character":13},"end":{...}},"message":"escflow: from y = \u003cN\u003e (assign-pair)"},
{"location":{"uri":"file://T/file.go","range":{"start":{"line":9,"character":13},"end":{...}},"message":"escflow: flow: ~r1 = y:"},
{"location":{"uri":"file://T/file.go","range":{"start":{"line":4,"character":11},"end":{...}},"message":"inlineLoc"},
{"location":{"uri":"file://T/file.go","range":{"start":{"line":9,"character":13},"end":{...}},"message":"escflow: from y.b (dot of pointer)"},
{"location":{"uri":"file://T/file.go","range":{"start":{"line":4,"character":11},"end":{...}},"message":"inlineLoc"},
{"location":{"uri":"file://T/file.go","range":{"start":{"line":9,"character":13},"end":{...}},"message":"escflow: from \u0026y.b (address-of)"},
{"location":{"uri":"file://T/file.go","range":{"start":{"line":4,"character":9},"end":...}},"message":"inlineLoc"},
{"location":{"uri":"file://T/file.go","range":{"start":{"line":9,"character":13},"end":{...}},"message":"escflow: from ~r1 = \u003cN\u003e (assign-pair)"},
{"location":{"uri":"file://T/file.go","range":{"start":{"line":9,"character":3},"end":...}},"message":"escflow: flow: ~r2 = ~r1:"},
{"location":{"uri":"file://T/file.go","range":{"start":{"line":9,"character":3},"end":...}},"message":"escflow: from return (*int)(~r1) (return)"}]}
Lynn Boger [Tue, 31 Mar 2020 14:08:29 +0000 (10:08 -0400)]
cmd/internal/obj/ppc64: add support for pcalign 32 on ppc64x
Previous PCALIGN support on ppc64x only accepted 8 and 16 byte
alignment since the default function alignment was 16. Now that
the function's alignment can be set to a larger value when needed,
PCALIGN can accept 32. When this happens then the function's
alignment will be changed to 32.
Test has been updated to recognized this new value.
Andy Pan [Fri, 10 Apr 2020 02:45:58 +0000 (10:45 +0800)]
runtime: replace the type of netpollWakeSig from a uintptr to a uint32
There's no need for netpollWakeSig to use a uintptr type, a uint32 is enough.
Relevant CL: CL 212737
Change-Id: Ide24478b217a02bad62f7e000a9680c26a8c5366
Reviewed-on: https://go-review.googlesource.com/c/go/+/227798
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Keith Randall [Thu, 19 Mar 2020 23:25:08 +0000 (16:25 -0700)]
cmd/compile: start implementing strongly typed aux and auxint fields
Right now the Aux and AuxInt fields of ssa.Values are typed as
interface{} and int64, respectively. Each rule that uses these values
must cast them to the type they actually are (*obj.LSym, or int32, or
ValAndOff, etc.), use them, and then cast them back to interface{} or
int64.
We know for each opcode what the types of the Aux and AuxInt fields
should be. So let's modify the rule generator to declare the types to
be what we know they should be, autoconverting to and from the generic
types for us. That way we can make the rules more type safe.
It's difficult to make a single CL for this, so I've coopted the "=>"
token to indicate a rule that is strongly typed. "->" rules are
processed as before. That will let us migrate a few rules at a time in
separate CLs. Hopefully we can reach a state where all rules are
strongly typed and we can drop the distinction.
This CL changes just a few rules to get a feel for what this
transition would look like.
I've decided not to put explicit types in the rules. I think it
makes the rules somewhat clearer, but definitely more verbose.
In particular, the passthrough rules that don't modify the fields
in question are verbose for no real reason.
Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d
Reviewed-on: https://go-review.googlesource.com/c/go/+/190197
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
Than McIntosh [Fri, 13 Mar 2020 15:09:05 +0000 (11:09 -0400)]
test: deflaking measures for runtime gdb test
Tweak the runtime's GDB python test to try to reduce flake failures.
Background: the intent of the testpoint in question is to make sure
that python-supported commands like "info goroutines" or "goroutine 1
backtrace" work properly. The Go code being run under the debugger as
part of the test is single-threaded, but the test is written assuming
that in addition to the primary goroutine there will be other
background goroutines available (owned by the runtime). The flakiness
seems to crop up the most when requesting a backtrace for one of these
background goroutines; the speculation is that if we catch a
runtime-owned goroutine in an odd state, this could interfere with the
test.
The change in this patch is to explicitly start an additional
goroutine from the main thread, so that when the debugger stops the
main thread we can be sure that there is some other non-main goroutine
in a known state.
This change authored by Josh Bleecher Snyder <josharian@gmail.com>.
net: convert many Close tests to use parallel subtests
Also set a deadline in TestCloseWrite so that we can more easily
determine which kind of connection is getting stuck on the
darwin-arm64-corellium builder (#34837).
Change-Id: I8ccacbf436e8e493fb2298a79b17e0af8fc6eb81
Reviewed-on: https://go-review.googlesource.com/c/go/+/227588
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
cmd/compile: do not allocate bucket for non-escaping map
For map with hint larger than BUCKETSIZE, makemap ignore allocated
bucket and allocate buckets itself. So do not allocate bucket in
this case, save us the cost of zeroing+assignment to the bucket.
The previous change moved code around to create slicesym.
This change simplifies slicesym and its callsites
by accepting an int64 for lencap instead of a node,
and by removing all the calls to gdata.
It also stops modifying n,
which avoids the need to make a copy of it.
cmd/compile,runtime: pass only ptr and len to some runtime calls
Some runtime calls accept a slice, but only use ptr and len.
This change modifies most such routines to accept only ptr and len.
After this change, the only runtime calls that accept an unnecessary
cap arg are concatstrings and slicerunetostring.
Neither is particularly common, and both are complicated to modify.
Negligible compiler performance impact. Shrinks binaries a little.
There are only a few regressions; the one I investigated was
due to register allocation fluctuation.
Passes 'go test -race std cmd', modulo #38265 and #38266.
Wow, does that take a long time to run.
cmd/compile: handle some additional phis in shortcircuit
Prior to this change, the shortcircuit pass could only
handle blocks containing only a single phi control value,
possibly wrapped in some OpNot and OpCopy values.
This change partially lifts this limitation.
It handles some cases in which the block contains other phi values.
This appears to happen most commonly in cases in which
the conditionals being checked involve the memory state,
in which case there is a phi memory value in the block.
The general idea here is to use the information we have about
the CFG to (1) move the other phi values into other blocks
and/or (2) rewrite uses of the other phi values in other blocks.
For example, consider this CFG:
p q
\ /
b
/ \
t u
And consider a phi value v in block b.
We'll write v = Phi(p: x, q: y) to say that v has value x corresponding
to inbound block p, and value y for block q.
We will rewrite this CFG to:
p q
| /
| b
|/ \
t u
What should we do with v?
Any uses of v in u can be replaced with y. Why?
If we are in block u, we came from b, and before that from q.
If prior to b we came from p, then we would have gone to t, not u.
Since we came from q, we know that v took the value y.
Uses of v in t are a bit more complicated.
It is going to end up being a phi value: Phi(p: ?, b: ?).
Suppose, after the rewrite, we came from block p.
Then, before the rewrite, we would have gone to b,
where v would have the value x.
So we have Phi(p: x, b: ?).
Suppose, after the rewrite, we came from block b.
Then we must have come from block q.
If we come from block q, v has value y.
So we have Phi(p: x, b: y).
Uses of v in t can thus be replaced with a new phi value,
with the same values as v, but with altered predecessors.
Similar reasoning can be employed to rewrite or replace
other uses of v elsewhere in the CFG, so that v itself can be eliminated,
and the CFG rewrite can proceed.
This change sets up the infrastructure for such optimizations
and adds a few cheap ones. All optimizations in this change depend
only on the shape of the CFG; future changes may also depend on where
v's uses are. That analysis is more powerful but more expensive,
and should be done incrementally.
The use of closures here is perhaps a bit unusual,
but during development it proved critical to having readable code.
We must decide early on whether we can safely do the CFG modifications,
and then later fix up the phis if so.
Safely storing state and decisions across these two phases is hard to do readably.
Closures solve the problem neatly.
I manually instrumented the code paths in shortcircuitPhiPlan.
During make.bash there are nearly 6000 invocations.
The least-visited code path gets run 85 times,
so all the code in this CL is reasonably well-exercised.
Here is a concrete example of code improved by this change:
func f(e interface{}) int {
if x, ok := e.(int); ok {
return x
}
return 0
}
Omitting PCDATA, FUNCDATA, and the like, it used to compile to:
I noticed a timeout in TestIgnore in
https://build.golang.org/log/52d83a72f3a5ea9a16eb5d670c729694144f9624,
which suggests that the settle time is currently set too low.
I've also added a check for the same GO_TEST_TIMEOUT_SCALE used in
TestTerminalSignal, so that if this builder remains too slow we can
increase the builder's scale factor rather than the test's baseline
running time.
Updates #33174
Change-Id: I18b10eaa3bb5ae2f604300aedaaf6f79ee7ad567
Reviewed-on: https://go-review.googlesource.com/c/go/+/227649
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
net/url: add URL.Redacted to return a password scrubbed string
Returning an URL.String() without the password is very useful for
situations where the URL is supposed to be logged and the password is
not useful to be shown.
This method re-uses URL.String() but with the password scrubbed and
substituted for a "xxxxx" in order to make it obvious that there was a
password. If the URL had no password then no "xxxxx" will be shown.
Ruixin(Peter) Bao [Thu, 21 Nov 2019 15:44:23 +0000 (10:44 -0500)]
cmd/compile: allow floating point Ops to produce flags on s390x
On s390x, some floating point arithmetic instructions (FSUB, FADD) generate flag.
This patch allows those related SSA ops to return a tuple, where the second argument of
the tuple is the generated flag. We can use the flag and remove the
subsequent comparison instruction (e.g: LTDBR).
This CL also reduces the .text section for math.test binary by 0.4KB.
Rémy Oudompheng [Thu, 5 Mar 2020 06:59:00 +0000 (07:59 +0100)]
math/big: correct off-by-one access in divBasic
The divBasic function computes the quotient of big nats u/v word by word.
It estimates each word qhat by performing a long division (top 2 words of u
divided by top word of v), looks at the next word to correct the estimate,
then perform a full multiplication (qhat*v) to catch any inaccuracy in the
estimate.
In the latter case, "negative" values appear temporarily and carries
must be carefully managed, and the recursive division refactoring
introduced a case where qhat*v has the same length as v, triggering an
out-of-bounds write in the case it happens when computing the top word
of the quotient.
Fixes #37499
Change-Id: I15089da4a4027beda43af497bf6de261eb792f94
Reviewed-on: https://go-review.googlesource.com/c/go/+/221980 Reviewed-by: Robert Griesemer <gri@golang.org>
The cgo build tag is not necessary for root_darwin_arm64.go. We can't
build for darwin/arm64 without cgo, and even if we did 1) this code
would work fine 2) the no-cgo code that shells out to
/usr/bin/security would not work.
Michael Matloob [Fri, 14 Feb 2020 22:44:00 +0000 (17:44 -0500)]
cmd/go: allow configuring module cache directory with GOMODCACHE
Adds a GOMODCACHE environment variable that's used by cmd/go to determine the
location of the module cache. The default value of GOMODCACHE will be
GOPATH[0]/pkg/mod, the default location of the module cache before this change.
Replace the cmd/go/internal/modfetch.PkgMod variable which previously held the
location of the module cache with the new cmd/go/internal/cfg.GOMODCACHE
variable, for consistency with many of the other environment variables that
affect the behavior of cmd/go. (Most of the changes in this CL are due to
moving/renaming the variable.)
The value of cfg.GOMODCACHE is now set using a variable initializer. It was
previously set in cmd/go/internal/modload.Init.
The location of GOPATH/pkg/sumdb is unchanged by this CL. While it was
previously determined using the value of PkgMod, it now is determined
independently dirctly from the value of GOPATH[0].
Fixes #34527
Change-Id: Id4d31d217b3507d6057c8ef7c52af1a0606603e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/219538
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com> Reviewed-by: Jay Conrod <jayconrod@google.com>
Michael Pratt [Tue, 7 Apr 2020 16:12:44 +0000 (12:12 -0400)]
runtime/pprof: try to use real stack in TestTryAdd
TestTryAdd is particularly brittle because it tests some real cases by
constructing fake sample stack frames. If those frames don't correctly
represent what the runtime would generate then they may fail to catch
regressions.
Instead, call runtime.Callers at the bottom of real function calls to
generate real frames as a base for truncation, etc in tests. Several of
these tests still have to fake parts of the frames to test the right
thing, but this is a bit less fragile.
Change-Id: I62522a9ded5544b06d1bf28550af5400f3af667b
Reviewed-on: https://go-review.googlesource.com/c/go/+/227484
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Xiangdong Ji [Fri, 27 Mar 2020 11:04:21 +0000 (11:04 +0000)]
runtime: fix infinite callstack of cgo on arm64
This change adds CFA information to the assembly function 'crosscall1'
and reorgnizes its code to establish well-formed prologue and epilogue.
It will fix an infinite callstack issue when debugging cgo program with
GDB on arm64.
Brief root cause analysis:
GDB's aarch64 unwinder parses prologue to determine current frame's size
and previous PC&SP if CFA information is not available.
The unwinder parses the prologue of 'crosscall1' to determine a frame size
of 0x10, then turns to its next frame trying to compute its previous PC&SP
as they are not saved on current frame's stack as per its 'traditional frame
unwind' rules, which ends up getting an endless frame chain like:
[callee] : pc:<pc0>, sp:<sp0>
crosscall1: pc:<pc1>, sp:<sp0>+0x10
[caller] : pc:<pc1>, sp:<sp0>+0x10+0x10
[caller] : pc:<pc1>, sp:<sp0>+0x10+0x10+0x10
...
GDB fails to detect the 'caller' frame is same as 'crosscall1' and terminate
unwinding since SP increases everytime.
Fixes #37238
Change-Id: Ia6bd8555828541a3a61f7dc9b94dfa00775ec52a
Reviewed-on: https://go-review.googlesource.com/c/go/+/226999
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Dan Scales [Thu, 14 Nov 2019 01:34:47 +0000 (17:34 -0800)]
runtime: static lock ranking for the runtime (enabled by GOEXPERIMENT)
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.
Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.
Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.
See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.
I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.
I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).
Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.
For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.
Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.
To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.
Fixes #38029
Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619 Reviewed-by: Dan Scales <danscales@google.com> Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Michael Munday [Fri, 6 Mar 2020 22:20:45 +0000 (22:20 +0000)]
cmd/compile: delete the floating point Greater and Geq ops
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.
Fixes #37316.
Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
cmd/compile: use MOVBQZX for OpAMD64LoweredHasCPUFeature
In the commit message of CL 212360, I wrote:
> This new intrinsic ... generates MOVB+TESTB+NE.
> (It is possible that MOVBQZX+TESTQ+NE would be better.)
I should have tested. MOVBQZX+TESTQ+NE does in fact appear to be better.
For the benchmark in #36196, on my machine:
name old time/op new time/op delta
FMA-8 0.86ns ± 6% 0.70ns ± 5% -18.79% (p=0.000 n=98+97)
NonFMA-8 0.61ns ± 5% 0.60ns ± 4% -0.74% (p=0.001 n=100+97)
Interestingly, these are both considerably faster than
the measurements I took a couple of months ago (1.4ns/2ns).
It appears that CL 219131 (clearing VZEROUPPER in asyncPreempt) helped a lot.
And FMA is now once again slower than NonFMA, although this change
helps it regain some ground.
Ruixin(Peter) Bao [Tue, 7 Apr 2020 13:20:08 +0000 (09:20 -0400)]
cmd/internal: add MVCIN instruction to s390x assembler
On s390x, we already have MVCIN opcode in asmz.go,
but we did not use it. This CL uses that opcode and adds MVCIN
instruction.
MVCIN instruction can be used to move data from one storage location
to another while reversing the order of bytes within the field. This
could be useful when transforming data from little-endian to big-endian.
Change-Id: Ifa1a911c0d3442f4a62f91f74ed25b196d01636b
Reviewed-on: https://go-review.googlesource.com/c/go/+/227478 Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The darwin/arm port is removed in Go 1.15. Setting GOOS=darwin
GOARCH=arm will fail, therefore "go test cmd/link" on macOS will
fail (in non -short mode). Remove this test point.
Updates #37611.
Change-Id: Ia9531c4b4a6692a0c49153517af9fdddd1f3e0bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/227341 Reviewed-by: Ian Lance Taylor <iant@golang.org>
cmd/compile: lay out exit post-dominated blocks at the end
Complete a long-standing TODO in the code.
Exit blocks are cold code, so we lay them out at the end of the function.
Blocks that are post-dominated by exit blocks are also ipso facto exit blocks.
Treat them as such.
Implement using a simple loop, because there are generally very few exit blocks.
In addition to improved instruction cache, this empirically yields
better register allocation.
This also appears to improve compiler performance
(-0.15% geomean time/op, -1.20% geomean user time/op),
but that could just be alignment effects.
Compiler benchmarking hasn't been super reliably recently,
and there's no particular reason to think this should
speed up the compiler that much.
Multiple instances of testDWARF run in parallel, with a shared
backing store of the env input slice. Do modification of the
environment locally, instead of on the shared slice.
'go get' will now check absolute paths without wildcards the same way
it checks relative paths. modload.DirImportPath may be used for both
without converting path separators.
Fixes #38038
Change-Id: I453299898ece58f3b5002a5e80021d6bfe815fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/226857
Run-TryBot: Jay Conrod <jayconrod@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com> Reviewed-by: Michael Matloob <matloob@golang.org>
Lynn Boger [Mon, 30 Mar 2020 19:23:19 +0000 (15:23 -0400)]
cmd/compile: improve lowered moves and zeros for ppc64le
This change includes the following:
- Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9.
These instructions do not require an index register, which
allows more loads and stores within a loop without initializing
multiple index registers. The LoweredQuadXXX generate LXV/STXV.
- Create LoweredMoveXXXShort and LoweredZeroXXXShort for short
moves that don't generate loops, and therefore don't clobber the
address registers or flags.
- Use registers other than R3 and R4 to avoid conflicting with
registers that have already been allocated to avoid unnecessary
register moves.
- Eliminate the use of R14 as scratch register and use R31
instead.
- Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a
loop with more than 3 iterations.
This performance opportunity was noticed in github.com/golang/snappy
benchmarks. Results on power9:
If the final pass(es) are identical during ssa.html generation,
they are persisted in-memory as "pendingPhases" but never get
written as a column in the html. This change flushes those
in-memory phases.
Change-Id: I05c73d848b8f40dc864a18c733ca3f47b1eab54d
Reviewed-on: https://go-review.googlesource.com/c/go/+/227004 Reviewed-by: Ian Lance Taylor <iant@golang.org>
cmd/compile: refactor around HTMLWriter removing logger in favor of Func
Replace HTMLWriter's Logger field with a *Func. Implement Fatalf method
for HTMLWriter which gets the Frontend() from the Func and calls down
into it's Fatalf method, passing the msg and args along. Replace
remaining calls to the old Logger with calls to logging methods on
the Func.
cmd/compile: add intrinsic HasCPUFeature for checking cpu features
Before using some CPU instructions, we must check for their presence.
We use global variables in the runtime package to record features.
Prior to this CL, we issued a regular memory load for these features.
The downside to this is that, because it is a regular memory load,
it cannot be hoisted out of loops or otherwise reordered with other loads.
This CL introduces a new intrinsic just for checking cpu features.
It still ends up resulting in a memory load, but that memory load can
now be floated to the entry block and rematerialized as needed.
One downside is that the regular load could be combined with the comparison
into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE.
(It is possible that MOVBQZX+TESTQ+NE would be better.)
This CL does only amd64. It is easy to extend to other architectures.
For the benchmark in #36196, on my machine, this offers a mild speedup.
name old time/op new time/op delta
FMA-8 1.39ns ± 6% 1.29ns ± 9% -7.19% (p=0.000 n=97+96)
NonFMA-8 2.03ns ±11% 2.04ns ±12% ~ (p=0.618 n=99+98)
Updates #15808
Updates #36196
Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>
Dan Scales [Wed, 1 Apr 2020 03:24:05 +0000 (20:24 -0700)]
cmd/compile: allow mid-stack inlining when there is a cycle of recursion
We still disallow inlining for an immediately-recursive function, but allow
inlining if a function is in a recursion chain.
If all functions in the recursion chain are simple, then we could inline
forever down the recursion chain (eventually running out of stack on the
compiler), so we add a map to keep track of the functions we have
already inlined at a call site. We stop inlining when we reach a
function that we have already inlined in the recursive chain. Of course,
normally the inlining will have stopped earlier, because of the cost
function.
We could also limit the depth of inlining by a simple count (say, limit
max inlining of 10 at any given site). Would that limit other
opportunities too much?
Added a test in test/inline.go. runtime.BenchmarkStackCopyNoCache() is
also already a good test that triggers the check to stop inlining
when we reach the start of the recursive chain again.
For the bent benchmark suite, the performance improvement was mostly not
statistically significant, but the geomean averaged out to: -0.68%. The text size
increase was less than .1% for all bent benchmarks. The cmd/go text size increase
was 0.02% and the cmd/compile text size increase was .1%.
Fixes #29737
Change-Id: I892fa84bb07a947b3125ec8f25ed0e508bf2bdf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/226818
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Xiangdong Ji [Fri, 13 Dec 2019 10:30:29 +0000 (10:30 +0000)]
crypto/sha512: optimize sha512 by removing function literal
The function 'block' called indirectly via function literal 'blockGeneric' prevents
'gc' performing an accurate escape analysis to its arguments, that will result in
unnecessary heap object allocation and GC cost.
Consistent performance improvement to sha512 and its dependency packages are
observed on various arm64 servers if eliminating the function literal, especially for
small-sized benchmarks.
Jay Conrod [Fri, 3 Apr 2020 14:42:55 +0000 (10:42 -0400)]
cmd/go: report original module path in error parsing replaced go.mod
MVS reports an error when a go.mod file declares a module path that
doesn't match the path it was required with. If the module is a
replacement, its declared path may be the original path (preferred) or
the replacement path.
This CL makes the reported error a little more clear: the "required as"
path should be the original required path, not the replacement path.
Fixes #38220
Change-Id: I08b50a100679a447c8803cca1d1b32bc115ec1b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227097
Run-TryBot: Jay Conrod <jayconrod@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com>
David Chase [Sun, 3 Nov 2019 03:57:11 +0000 (23:57 -0400)]
cmd/compile: add logging for large (>= 128 byte) copies
For 1.15, unless someone really wants it in 1.14.
A performance-sensitive user thought this would be useful,
though "large" was not well-defined. If 128 is large,
there are 139 static instances of "large" copies in the compiler
itself.
Generate a CALLIND relocation only for indirect calls, not for
indirect jumps. In particular, the RET instruction is lowered to
JMP (LR), an indirect jump, and occurs frequently. The large
amount of spurious relocations causes the linker to do a lot of
extra work.
Change-Id: Ie0edc04609788f5a687fd00c22558c3f83867697
Reviewed-on: https://go-review.googlesource.com/c/go/+/227079
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: David Chase <drchase@google.com>
Change-Id: I93784e9a9efdd1531e3c342aa0899bf059da0ae1
Reviewed-on: https://go-review.googlesource.com/c/go/+/226983 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Keith Randall [Thu, 2 Apr 2020 19:00:50 +0000 (12:00 -0700)]
runtime/race: update some .syso files
Update race detector syso files for some platforms.
There's still 2 more to do, but they might take a while so I'm
mailing the ones I have now.
Note: some arm64 tests did not complete successfully due to out
of memory errors, but I suspect the .syso is correct.
Update #14481
Update #37485 (I think?)
Update #37355
Change-Id: I7e7e707a1fd7574855a538ba89dc11acc999c760
Reviewed-on: https://go-review.googlesource.com/c/go/+/226981
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Dmitri Shuralyov [Sun, 29 Mar 2020 03:10:20 +0000 (23:10 -0400)]
net/http: release callbacks after fetch promise completes
When the request context was canceled, the Transport.RoundTrip method
could return before the fetch promise resolved. This would cause the
success and failure callback functions to get called after they've
been released, which in turn prints a "call to released function"
error to the console.
Avoid that problem by releasing the callbacks after the fetch promise
completes, by moving the release calls into the callbacks themselves.
This way we can still return from the Transport.RoundTrip method as
soon as the context is canceled, without waiting on the promise to
resolve. If the AbortController is unavailable and it's not possible to
abort the fetch operation, the promise may take a long time to resolve.
For #38003.
Change-Id: Ied1475e31dcba101b3326521b0cd653dbb345e1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/226204 Reviewed-by: Johan Brandhorst <johan.brandhorst@gmail.com> Reviewed-by: Richard Musiol <neelance@gmail.com>
net: update ParseIP doc to say IPv4-mapped-IPv6 is supported
Change-Id: I49a79c07081cd8f12a3ffef21fd02a9a622a7eb5
Reviewed-on: https://go-review.googlesource.com/c/go/+/226979 Reviewed-by: Ian Lance Taylor <iant@golang.org>
In the dev.link branch we continued developing the new object
file format support and the linker improvements described in
https://golang.org/s/better-linker .
The new object file is index-based and provides random access.
The linker maps the object files into read-only memory, and
accesses symbols on-demand using indices, as opposed to reading
all object files sequentially into the heap with the old format.
This work is not done yet. Currently we still convert back to the
old in-memory representation half way through the link process,
but only for symbols that are needed.
At this point, we think it is ready to enable the new object
files and new linker for early testing. Using the new object
files and the new linker, it reduces the linker's memory usage by
~10% and wall-clock run time by ~5%, and more to come.
Currently, both the old and new object file formats are supported.
The new format and new linker are used by default. For feature
gating, as a fallback, the old format and old linker can be used
by setting the compiler/assembler/linker's -go115newobj flag to
false. Note that the flag needs to be specified consistently to
all compilations, i.e.
[dev.link] all: merge branch 'master' into dev.link
The only conflict is a modify-deletion conflict in
cmd/link/internal/ld/link.go, where the old error reporter is
deleted in the new linker. Ported to
cmd/link/internal/ld/errors.go.
crypto/rsa: refactor RSA-PSS signing and verification
Cleaned up for readability and consistency.
There is one tiny behavioral change: when PSSSaltLengthEqualsHash is
used and both hash and opts.Hash were set, hash.Size() was used for the
salt length instead of opts.Hash.Size(). That's clearly wrong because
opts.Hash is documented to override hash.
This ports CL 226997 to the dev.link branch.
- The assembler part and old object file writing are unchanged.
- Changes to cmd/link are applied to cmd/oldlink.
- Add alignment field to new object files for the new linker.
Change-Id: Id00f323ae5bdd86b2709a702ee28bcaa9ba962f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/227025 Reviewed-by: Than McIntosh <thanm@google.com>
Joel Sing [Tue, 25 Feb 2020 15:48:24 +0000 (02:48 +1100)]
cmd/link: skip symbol references when looking for missing symbols
ErrorUnresolved attempts to find the missing symbol in another ABI,
in order to provide more friendly error messages. However, in doing so
it checks the same ABI and can find the symbol reference for the symbol
that it is currently reporting the unresolved error for. Avoid this by
ignoring SXREF symbols, which is the same behaviour used when linking
is performed.
fanzha02 [Thu, 19 Dec 2019 08:15:06 +0000 (08:15 +0000)]
cmd/asm: align an instruction or a function's address
Recently, the gVisor project needs an instruction's address
with 128 bytes alignment to fit the architecture requirement
for interrupt table.
This patch allows aligning an instruction's address to be
aligned to a specific value (2^n and in the range [8, 2048])
The main changes include:
1. Adds a new element in the FuncInfo structure defined in
cmd/internal/obj/link.go file to record the alignment
information.
2. Adds a new element in the Func structure defined in
cmd/internal/goobj/read.go file to read the alignment
information.
3. Adds the assembler support to align an intruction's offset
with a specific value (2^n and in the range [8, 2048]).
e.g. "PCALIGN $256" indicates that the next instruction should
be aligned to 256 bytes.
4. An instruction's alignment is relative to the start of the
function where this instruction is located, so the function's
address must be aligned to the same or coarser boundary.
Michael Munday [Wed, 1 Apr 2020 10:21:03 +0000 (03:21 -0700)]
cmd/compile: mark 'store multiple' as clobbering flags on s390x
Store multiple instructions can clobber flags on s390x when the
offset passed into the assembler is outside the range representable
with a signed 20 bit integer. This is because the assembler uses
the agfi instruction to implement the large offset. The assembler
could use a different sequence of instructions, but for now just
mark the instruction as 'clobberFlags' since this is risk free.
Noticed while investigating #38195.
No test yet since I'm not sure how to get this bug to trigger and
I haven't seen it affect real code.
Change-Id: I4a6ab96455a3ef8ffacb76ef0166b97eb40ff925
Reviewed-on: https://go-review.googlesource.com/c/go/+/226759
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Alex Brainman [Sat, 7 Mar 2020 00:08:06 +0000 (11:08 +1100)]
internal/syscall/windows: change WSAMsg.Name type
The problem was discovered while running
go test -a -short -gcflags=all=-d=checkptr -run=TestUDPConnSpecificMethods net
WSAMsg is type defined by Windows. And WSAMsg.Name could point to two
different structures for IPv4 and IPV6 sockets.
Currently WSAMsg.Name is declared as *syscall.RawSockaddrAny. But that
violates
(1) Conversion of a *T1 to Pointer to *T2.
rule of
https://golang.org/pkg/unsafe/#Pointer
When we convert *syscall.RawSockaddrInet4 into *syscall.RawSockaddrAny,
syscall.RawSockaddrInet4 and syscall.RawSockaddrAny do not share an
equivalent memory layout.
Same for *syscall.SockaddrInet6 into *syscall.RawSockaddrAny.
This CL changes WSAMsg.Name type to *syscall.Pointer. syscall.Pointer
length is 0, and that at least makes type checker happy.
After this change I was able to run
go test -a -short -gcflags=all=-d=checkptr std cmd
without type checker complaining.
Updates #34972
Change-Id: Ic5c2321c20abd805c687ee16ef6f643a2f8cd93f
Reviewed-on: https://go-review.googlesource.com/c/go/+/222457
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>