Michael Pratt [Tue, 7 Apr 2020 16:12:44 +0000 (12:12 -0400)]
runtime/pprof: try to use real stack in TestTryAdd
TestTryAdd is particularly brittle because it tests some real cases by
constructing fake sample stack frames. If those frames don't correctly
represent what the runtime would generate then they may fail to catch
regressions.
Instead, call runtime.Callers at the bottom of real function calls to
generate real frames as a base for truncation, etc., in tests. Several of
these tests still have to fake parts of the frames to test the right
thing, but this is a bit less fragile.
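For illustration, a minimal sketch of the approach (the helper names are hypothetical, not the test's actual code): a real call chain bottoms out in runtime.Callers, and the captured PCs become the base stack for the test samples.

    package main

    import (
        "fmt"
        "runtime"
    )

    // realFrames captures the program counters of the current, real call
    // stack, including the runtime.Callers frame itself (skip=0).
    func realFrames() []uintptr {
        pcs := make([]uintptr, 32)
        n := runtime.Callers(0, pcs)
        return pcs[:n]
    }

    func callee() []uintptr { return realFrames() }

    func main() {
        frames := callee() // frames now reflect what the runtime actually generates
        fmt.Println(len(frames))
    }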
Change-Id: I62522a9ded5544b06d1bf28550af5400f3af667b
Reviewed-on: https://go-review.googlesource.com/c/go/+/227484
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Xiangdong Ji [Fri, 27 Mar 2020 11:04:21 +0000 (11:04 +0000)]
runtime: fix infinite callstack of cgo on arm64
This change adds CFA information to the assembly function 'crosscall1'
and reorganizes its code to establish a well-formed prologue and epilogue.
It will fix an infinite callstack issue when debugging cgo program with
GDB on arm64.
Brief root cause analysis:
GDB's aarch64 unwinder parses prologue to determine current frame's size
and previous PC&SP if CFA information is not available.
The unwinder parses the prologue of 'crosscall1' to determine a frame size
of 0x10, then turns to the next frame and tries to compute its previous PC&SP.
Since they are not saved on the current frame's stack, it applies its
'traditional frame unwind' rules, which ends up producing an endless frame chain like:
[callee] : pc:<pc0>, sp:<sp0>
crosscall1: pc:<pc1>, sp:<sp0>+0x10
[caller] : pc:<pc1>, sp:<sp0>+0x10+0x10
[caller] : pc:<pc1>, sp:<sp0>+0x10+0x10+0x10
...
GDB fails to detect that the 'caller' frame is the same as 'crosscall1' and never
terminates unwinding, since SP increases every time.
Fixes #37238
Change-Id: Ia6bd8555828541a3a61f7dc9b94dfa00775ec52a
Reviewed-on: https://go-review.googlesource.com/c/go/+/226999
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Dan Scales [Thu, 14 Nov 2019 01:34:47 +0000 (17:34 -0800)]
runtime: static lock ranking for the runtime (enabled by GOEXPERIMENT)
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.
Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.
Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.
See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.
I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.
I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).
Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.
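As a toy illustration of the idea (simplified; not the runtime's actual mutex, lockRank, or per-M bookkeeping): each lock carries a rank, and acquiring a lock checks that its rank is higher than the rank of the most recently acquired lock still held.

    package main

    import "sync"

    // rankedMutex is a toy stand-in for the runtime's rank-carrying mutex.
    type rankedMutex struct {
        mu   sync.Mutex
        rank int // position in the documented total order
    }

    // held tracks ranks currently held; single-goroutine toy (the runtime
    // tracks this per M) that assumes LIFO lock/unlock.
    var held []int

    func (m *rankedMutex) lock() {
        if n := len(held); n > 0 && m.rank <= held[n-1] {
            panic("lock ordering violation")
        }
        m.mu.Lock()
        held = append(held, m.rank)
    }

    func (m *rankedMutex) unlock() {
        held = held[:len(held)-1]
        m.mu.Unlock()
    }

    func main() {
        sched := &rankedMutex{rank: 10}
        timers := &rankedMutex{rank: 20}
        sched.lock()
        timers.lock() // ok: 20 > 10; acquiring in the other order would panic
        timers.unlock()
        sched.unlock()
    }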
For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.
Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.
To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.
Fixes #38029
Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619 Reviewed-by: Dan Scales <danscales@google.com> Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Michael Munday [Fri, 6 Mar 2020 22:20:45 +0000 (22:20 +0000)]
cmd/compile: delete the floating point Greater and Geq ops
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.
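As a small illustration of the equivalence (ordinary Go, not the SSA rewrite rules themselves): a floating-point "greater" comparison is "less" with the operands swapped, and the same holds for NaN operands, where both forms are false.

    package cmp

    // Greater and Geq can be expressed with Less and Leq by swapping operands.
    func gt(x, y float64) bool { return x > y }  // lowered as y < x
    func ge(x, y float64) bool { return x >= y } // lowered as y <= x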
Fixes #37316.
Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
cmd/compile: use MOVBQZX for OpAMD64LoweredHasCPUFeature
In the commit message of CL 212360, I wrote:
> This new intrinsic ... generates MOVB+TESTB+NE.
> (It is possible that MOVBQZX+TESTQ+NE would be better.)
I should have tested. MOVBQZX+TESTQ+NE does in fact appear to be better.
For the benchmark in #36196, on my machine:
name old time/op new time/op delta
FMA-8 0.86ns ± 6% 0.70ns ± 5% -18.79% (p=0.000 n=98+97)
NonFMA-8 0.61ns ± 5% 0.60ns ± 4% -0.74% (p=0.001 n=100+97)
Interestingly, these are both considerably faster than
the measurements I took a couple of months ago (1.4ns/2ns).
It appears that CL 219131 (clearing VZEROUPPER in asyncPreempt) helped a lot.
And FMA is now once again slower than NonFMA, although this change
helps it regain some ground.
Ruixin(Peter) Bao [Tue, 7 Apr 2020 13:20:08 +0000 (09:20 -0400)]
cmd/internal: add MVCIN instruction to s390x assembler
On s390x, we already have the MVCIN opcode in asmz.go, but we did not use it.
This CL uses that opcode and adds the MVCIN instruction.
The MVCIN instruction moves data from one storage location to another while
reversing the order of bytes within the field. This can be useful when
transforming data from little-endian to big-endian.
Change-Id: Ifa1a911c0d3442f4a62f91f74ed25b196d01636b
Reviewed-on: https://go-review.googlesource.com/c/go/+/227478 Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The darwin/arm port is removed in Go 1.15. Setting GOOS=darwin
GOARCH=arm will fail; therefore "go test cmd/link" on macOS will
fail (in non -short mode). Remove this test point.
Updates #37611.
Change-Id: Ia9531c4b4a6692a0c49153517af9fdddd1f3e0bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/227341 Reviewed-by: Ian Lance Taylor <iant@golang.org>
cmd/compile: lay out exit post-dominated blocks at the end
Complete a long-standing TODO in the code.
Exit blocks are cold code, so we lay them out at the end of the function.
Blocks that are post-dominated by exit blocks are also ipso facto exit blocks.
Treat them as such.
Implement using a simple loop, because there are generally very few exit blocks.
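A rough sketch of that loop, using a toy block type rather than the compiler's ssa.Block: a block whose successors are all already-known exit blocks is marked as an exit block too, repeating until nothing changes.

    package layout

    type block struct {
        succs []*block
        exit  bool // true for exit blocks and blocks post-dominated by them
    }

    // markExitBlocks propagates the exit property backwards until a fixed
    // point: if every successor of a block is an exit block, the block is
    // post-dominated by exit blocks and is treated as one.
    func markExitBlocks(blocks []*block) {
        for changed := true; changed; {
            changed = false
            for _, b := range blocks {
                if b.exit || len(b.succs) == 0 {
                    continue
                }
                allExit := true
                for _, s := range b.succs {
                    if !s.exit {
                        allExit = false
                        break
                    }
                }
                if allExit {
                    b.exit = true
                    changed = true
                }
            }
        }
    }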
In addition to improved instruction cache behavior, this empirically yields
better register allocation.
This also appears to improve compiler performance
(-0.15% geomean time/op, -1.20% geomean user time/op),
but that could just be alignment effects.
Compiler benchmarking hasn't been very reliable recently,
and there's no particular reason to think this should
speed up the compiler that much.
Multiple instances of testDWARF run in parallel, sharing the backing store
of the env input slice. Modify the environment locally instead of on the
shared slice.
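A minimal sketch of the fix (names are illustrative, not the test's actual code): copy the shared slice into a fresh backing array before appending per-test variables, so append never writes into storage shared with other tests.

    package main

    import (
        "fmt"
        "os"
    )

    func testEnv(sharedEnv []string, extra string) []string {
        env := append([]string(nil), sharedEnv...) // fresh backing array
        return append(env, extra)                  // local modification only
    }

    func main() {
        shared := os.Environ()
        fmt.Println(len(testEnv(shared, "GO_TEST_VAR=1")))
    }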
'go get' will now check absolute paths without wildcards the same way
it checks relative paths. modload.DirImportPath may be used for both
without converting path separators.
Fixes #38038
Change-Id: I453299898ece58f3b5002a5e80021d6bfe815fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/226857
Run-TryBot: Jay Conrod <jayconrod@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com> Reviewed-by: Michael Matloob <matloob@golang.org>
Lynn Boger [Mon, 30 Mar 2020 19:23:19 +0000 (15:23 -0400)]
cmd/compile: improve lowered moves and zeros for ppc64le
This change includes the following:
- Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9.
These instructions do not require an index register, which
allows more loads and stores within a loop without initializing
multiple index registers. The LoweredQuadXXX generate LXV/STXV.
- Create LoweredMoveXXXShort and LoweredZeroXXXShort for short
moves that don't generate loops, and therefore don't clobber the
address registers or flags.
- Use registers other than R3 and R4 to avoid conflicting with
registers that have already been allocated, eliminating unnecessary
register moves.
- Eliminate the use of R14 as a scratch register and use R31
instead.
- Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a
loop with more than 3 iterations.
This performance opportunity was noticed in github.com/golang/snappy
benchmarks. Results on power9:
If the final pass(es) are identical during ssa.html generation,
they are held in memory as "pendingPhases" but never get
written as a column in the HTML. This change flushes those
in-memory phases.
Change-Id: I05c73d848b8f40dc864a18c733ca3f47b1eab54d
Reviewed-on: https://go-review.googlesource.com/c/go/+/227004 Reviewed-by: Ian Lance Taylor <iant@golang.org>
cmd/compile: refactor around HTMLWriter removing logger in favor of Func
Replace HTMLWriter's Logger field with a *Func. Implement Fatalf method
for HTMLWriter which gets the Frontend() from the Func and calls down
into its Fatalf method, passing the msg and args along. Replace
remaining calls to the old Logger with calls to logging methods on
the Func.
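The new method is essentially a small delegation; roughly (a sketch of how it can look, assuming the ssa package's usual src.NoXPos for a position-less error):

    func (w *HTMLWriter) Fatalf(msg string, args ...interface{}) {
        fe := w.Func.Frontend()
        fe.Fatalf(src.NoXPos, msg, args...)
    }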
cmd/compile: add intrinsic HasCPUFeature for checking cpu features
Before using some CPU instructions, we must check for their presence.
We use global variables in the runtime package to record features.
Prior to this CL, we issued a regular memory load for these features.
The downside to this is that, because it is a regular memory load,
it cannot be hoisted out of loops or otherwise reordered with other loads.
This CL introduces a new intrinsic just for checking cpu features.
It still ends up resulting in a memory load, but that memory load can
now be floated to the entry block and rematerialized as needed.
One downside is that the regular load could be combined with the comparison
into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE.
(It is possible that MOVBQZX+TESTQ+NE would be better.)
This CL does only amd64. It is easy to extend to other architectures.
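A hedged illustration of the pattern the intrinsic targets, using a stand-in package-level flag rather than the real internal/cpu variables: with a plain load the check is re-done on every iteration, whereas the intrinsic's load can be floated to the entry block and rematerialized.

    package main

    import (
        "fmt"
        "math"
    )

    var hasFMA = false // stand-in for a cpu-feature variable set at startup

    func fma(x, y, z float64) float64 {
        if hasFMA { // with the intrinsic, this check can be hoisted out of hot loops
            return math.FMA(x, y, z)
        }
        return x*y + z
    }

    func main() {
        sum := 0.0
        for i := 0; i < 1000; i++ {
            sum = fma(sum, 1.0001, float64(i))
        }
        fmt.Println(sum)
    }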
For the benchmark in #36196, on my machine, this offers a mild speedup.
name old time/op new time/op delta
FMA-8 1.39ns ± 6% 1.29ns ± 9% -7.19% (p=0.000 n=97+96)
NonFMA-8 2.03ns ±11% 2.04ns ±12% ~ (p=0.618 n=99+98)
Updates #15808
Updates #36196
Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>
Dan Scales [Wed, 1 Apr 2020 03:24:05 +0000 (20:24 -0700)]
cmd/compile: allow mid-stack inlining when there is a cycle of recursion
We still disallow inlining for an immediately-recursive function, but allow
inlining if a function is in a recursion chain.
If all functions in the recursion chain are simple, then we could inline
forever down the recursion chain (eventually running out of stack on the
compiler), so we add a map to keep track of the functions we have
already inlined at a call site. We stop inlining when we reach a
function that we have already inlined in the recursive chain. Of course,
normally the inlining will have stopped earlier, because of the cost
function.
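A toy sketch of the guard, using a simplified call graph instead of the compiler's node types: the set of functions already inlined on the current chain travels with the call site, and inlining stops when a function reappears.

    package main

    import "fmt"

    type fn struct {
        name  string
        calls []*fn
    }

    // inline walks calls as the inliner would, refusing to inline a function
    // that is already present on the current inlining chain.
    func inline(callee *fn, onChain map[*fn]bool, depth int) {
        if onChain[callee] {
            fmt.Printf("stop: %s already inlined on this chain\n", callee.name)
            return
        }
        onChain[callee] = true
        fmt.Printf("%*sinline %s\n", depth*2, "", callee.name)
        for _, c := range callee.calls {
            inline(c, onChain, depth+1)
        }
        delete(onChain, callee) // leaving this branch of the chain
    }

    func main() {
        a := &fn{name: "a"}
        b := &fn{name: "b", calls: []*fn{a}}
        a.calls = []*fn{b} // a and b form a recursion cycle
        inline(a, map[*fn]bool{}, 0)
    }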
We could also limit the depth of inlining by a simple count (say, limit
max inlining of 10 at any given site). Would that limit other
opportunities too much?
Added a test in test/inline.go. runtime.BenchmarkStackCopyNoCache() is
also already a good test that triggers the check to stop inlining
when we reach the start of the recursive chain again.
For the bent benchmark suite, the performance improvement was mostly not
statistically significant, but the geomean averaged out to -0.68%. The text size
increase was less than 0.1% for all bent benchmarks. The cmd/go text size increase
was 0.02% and the cmd/compile text size increase was 0.1%.
Fixes #29737
Change-Id: I892fa84bb07a947b3125ec8f25ed0e508bf2bdf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/226818
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Xiangdong Ji [Fri, 13 Dec 2019 10:30:29 +0000 (10:30 +0000)]
crypto/sha512: optimize sha512 by removing function literal
The function 'block', called indirectly via the function literal 'blockGeneric',
prevents 'gc' from performing accurate escape analysis on its arguments, which
results in unnecessary heap object allocation and GC cost.
Consistent performance improvements to sha512 and its dependent packages are
observed on various arm64 servers when the function literal is eliminated,
especially for small-sized benchmarks.
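A hedged illustration of the general effect (not the actual sha512 code): a call through a function value hides the callee from escape analysis, so a pointer argument is heap-allocated, while the equivalent direct call lets it stay on the stack.

    package main

    import "fmt"

    func processGeneric(state *[8]uint64, p []byte) {
        for i, b := range p {
            state[i%8] += uint64(b)
        }
    }

    var process = processGeneric // indirect: escape analysis cannot see the callee

    func sumIndirect(p []byte) uint64 {
        var state [8]uint64
        process(&state, p) // &state escapes to the heap
        return state[0]
    }

    func sumDirect(p []byte) uint64 {
        var state [8]uint64
        processGeneric(&state, p) // &state can stay on the stack
        return state[0]
    }

    func main() {
        data := []byte("hello")
        fmt.Println(sumIndirect(data), sumDirect(data))
    }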
Jay Conrod [Fri, 3 Apr 2020 14:42:55 +0000 (10:42 -0400)]
cmd/go: report original module path in error parsing replaced go.mod
MVS reports an error when a go.mod file declares a module path that
doesn't match the path it was required with. If the module is a
replacement, its declared path may be the original path (preferred) or
the replacement path.
This CL makes the reported error a little more clear: the "required as"
path should be the original required path, not the replacement path.
Fixes #38220
Change-Id: I08b50a100679a447c8803cca1d1b32bc115ec1b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227097
Run-TryBot: Jay Conrod <jayconrod@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com>
David Chase [Sun, 3 Nov 2019 03:57:11 +0000 (23:57 -0400)]
cmd/compile: add logging for large (>= 128 byte) copies
For 1.15, unless someone really wants it in 1.14.
A performance-sensitive user thought this would be useful,
though "large" was not well-defined. If 128 is large,
there are 139 static instances of "large" copies in the compiler
itself.
Generate a CALLIND relocation only for indirect calls, not for
indirect jumps. In particular, the RET instruction is lowered to
JMP (LR), an indirect jump, and occurs frequently. The large
number of spurious relocations causes the linker to do a lot of
extra work.
Change-Id: Ie0edc04609788f5a687fd00c22558c3f83867697
Reviewed-on: https://go-review.googlesource.com/c/go/+/227079
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: David Chase <drchase@google.com>
Change-Id: I93784e9a9efdd1531e3c342aa0899bf059da0ae1
Reviewed-on: https://go-review.googlesource.com/c/go/+/226983 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Keith Randall [Thu, 2 Apr 2020 19:00:50 +0000 (12:00 -0700)]
runtime/race: update some .syso files
Update race detector syso files for some platforms.
There are still 2 more to do, but they might take a while, so I'm
mailing the ones I have now.
Note: some arm64 tests did not complete successfully due to out
of memory errors, but I suspect the .syso is correct.
Update #14481
Update #37485 (I think?)
Update #37355
Change-Id: I7e7e707a1fd7574855a538ba89dc11acc999c760
Reviewed-on: https://go-review.googlesource.com/c/go/+/226981
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Dmitri Shuralyov [Sun, 29 Mar 2020 03:10:20 +0000 (23:10 -0400)]
net/http: release callbacks after fetch promise completes
When the request context was canceled, the Transport.RoundTrip method
could return before the fetch promise resolved. This would cause the
success and failure callback functions to get called after they've
been released, which in turn prints a "call to released function"
error to the console.
Avoid that problem by releasing the callbacks after the fetch promise
completes, by moving the release calls into the callbacks themselves.
This way we can still return from the Transport.RoundTrip method as
soon as the context is canceled, without waiting on the promise to
resolve. If the AbortController is unavailable and it's not possible to
abort the fetch operation, the promise may take a long time to resolve.
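A hedged sketch of the pattern (js/wasm only; simplified, not the actual Transport code): each callback releases both callbacks once the promise settles, so RoundTrip can return as soon as the context is canceled without the promise later calling a released function.

    // +build js,wasm

    package wasmfetch

    import "syscall/js"

    // addFetchCallbacks attaches success and failure handlers to a fetch
    // promise and releases them from within the handlers themselves.
    func addFetchCallbacks(promise js.Value, done chan struct{}) {
        var success, failure js.Func
        release := func() {
            success.Release()
            failure.Release()
        }
        success = js.FuncOf(func(this js.Value, args []js.Value) interface{} {
            release() // safe: the promise has completed
            close(done)
            return nil
        })
        failure = js.FuncOf(func(this js.Value, args []js.Value) interface{} {
            release()
            close(done)
            return nil
        })
        promise.Call("then", success, failure)
    }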
For #38003.
Change-Id: Ied1475e31dcba101b3326521b0cd653dbb345e1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/226204 Reviewed-by: Johan Brandhorst <johan.brandhorst@gmail.com> Reviewed-by: Richard Musiol <neelance@gmail.com>
net: update ParseIP doc to say IPv4-mapped-IPv6 is supported
Change-Id: I49a79c07081cd8f12a3ffef21fd02a9a622a7eb5
Reviewed-on: https://go-review.googlesource.com/c/go/+/226979 Reviewed-by: Ian Lance Taylor <iant@golang.org>
In the dev.link branch we continued developing the new object
file format support and the linker improvements described in
https://golang.org/s/better-linker .
The new object file is index-based and provides random access.
The linker maps the object files into read-only memory, and
accesses symbols on-demand using indices, as opposed to reading
all object files sequentially into the heap with the old format.
This work is not done yet. Currently we still convert back to the
old in-memory representation halfway through the link process,
but only for symbols that are needed.
At this point, we think it is ready to enable the new object
files and new linker for early testing. Using the new object
files and the new linker reduces the linker's memory usage by
~10% and wall-clock run time by ~5%, with more to come.
Currently, both the old and new object file formats are supported.
The new format and new linker are used by default. For feature
gating, as a fallback, the old format and old linker can be used
by setting the compiler/assembler/linker's -go115newobj flag to
false. Note that the flag needs to be specified consistently to
all compilations, i.e.
[dev.link] all: merge branch 'master' into dev.link
The only conflict is a modify-deletion conflict in
cmd/link/internal/ld/link.go, where the old error reporter is
deleted in the new linker. Ported to
cmd/link/internal/ld/errors.go.
crypto/rsa: refactor RSA-PSS signing and verification
Cleaned up for readability and consistency.
There is one tiny behavioral change: when PSSSaltLengthEqualsHash is
used and both hash and opts.Hash were set, hash.Size() was used for the
salt length instead of opts.Hash.Size(). That's clearly wrong because
opts.Hash is documented to override hash.
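A short usage sketch of the case in question (illustrative values, not code from this CL): with PSSSaltLengthEqualsHash the salt length follows opts.Hash, which takes precedence over the hash argument when both are set.

    package main

    import (
        "crypto"
        "crypto/rand"
        "crypto/rsa"
        "crypto/sha256"
        "fmt"
    )

    func main() {
        priv, err := rsa.GenerateKey(rand.Reader, 2048)
        if err != nil {
            panic(err)
        }
        digest := sha256.Sum256([]byte("hello"))
        opts := &rsa.PSSOptions{
            SaltLength: rsa.PSSSaltLengthEqualsHash, // salt length = opts.Hash.Size()
            Hash:       crypto.SHA256,               // overrides the hash argument below
        }
        sig, err := rsa.SignPSS(rand.Reader, priv, crypto.SHA256, digest[:], opts)
        if err != nil {
            panic(err)
        }
        fmt.Println(rsa.VerifyPSS(&priv.PublicKey, crypto.SHA256, digest[:], sig, opts))
    }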
This ports CL 226997 to the dev.link branch.
- The assembler part and old object file writing are unchanged.
- Changes to cmd/link are applied to cmd/oldlink.
- Add alignment field to new object files for the new linker.
Change-Id: Id00f323ae5bdd86b2709a702ee28bcaa9ba962f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/227025 Reviewed-by: Than McIntosh <thanm@google.com>
Joel Sing [Tue, 25 Feb 2020 15:48:24 +0000 (02:48 +1100)]
cmd/link: skip symbol references when looking for missing symbols
ErrorUnresolved attempts to find the missing symbol in another ABI,
in order to provide more friendly error messages. However, in doing so
it checks the same ABI and can find the symbol reference for the symbol
that it is currently reporting the unresolved error for. Avoid this by
ignoring SXREF symbols, which is the same behaviour used when linking
is performed.
fanzha02 [Thu, 19 Dec 2019 08:15:06 +0000 (08:15 +0000)]
cmd/asm: align an instruction or a function's address
Recently, the gVisor project needs an instruction's address aligned to
128 bytes to fit the architecture requirement for its interrupt table.
This patch allows an instruction's address to be aligned to a specific
value (2^n and in the range [8, 2048]).
The main changes include:
1. Adds a new element in the FuncInfo structure defined in
cmd/internal/obj/link.go file to record the alignment
information.
2. Adds a new element in the Func structure defined in
cmd/internal/goobj/read.go file to read the alignment
information.
3. Adds assembler support to align an instruction's offset
with a specific value (2^n and in the range [8, 2048]).
e.g. "PCALIGN $256" indicates that the next instruction should
be aligned to 256 bytes.
4. An instruction's alignment is relative to the start of the
function where this instruction is located, so the function's
address must be aligned to the same or coarser boundary.
Michael Munday [Wed, 1 Apr 2020 10:21:03 +0000 (03:21 -0700)]
cmd/compile: mark 'store multiple' as clobbering flags on s390x
Store multiple instructions can clobber flags on s390x when the
offset passed into the assembler is outside the range representable
with a signed 20 bit integer. This is because the assembler uses
the agfi instruction to implement the large offset. The assembler
could use a different sequence of instructions, but for now just
mark the instruction as 'clobberFlags' since this is risk free.
Noticed while investigating #38195.
No test yet since I'm not sure how to get this bug to trigger and
I haven't seen it affect real code.
Change-Id: I4a6ab96455a3ef8ffacb76ef0166b97eb40ff925
Reviewed-on: https://go-review.googlesource.com/c/go/+/226759
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Alex Brainman [Sat, 7 Mar 2020 00:08:06 +0000 (11:08 +1100)]
internal/syscall/windows: change WSAMsg.Name type
The problem was discovered while running
go test -a -short -gcflags=all=-d=checkptr -run=TestUDPConnSpecificMethods net
WSAMsg is a type defined by Windows, and WSAMsg.Name can point to two
different structures for IPv4 and IPv6 sockets.
Currently WSAMsg.Name is declared as *syscall.RawSockaddrAny. But that
violates
(1) Conversion of a *T1 to Pointer to *T2.
rule of
https://golang.org/pkg/unsafe/#Pointer
When we convert *syscall.RawSockaddrInet4 into *syscall.RawSockaddrAny,
syscall.RawSockaddrInet4 and syscall.RawSockaddrAny do not share an
equivalent memory layout.
Same for *syscall.SockaddrInet6 into *syscall.RawSockaddrAny.
This CL changes the WSAMsg.Name type to *syscall.Pointer. syscall.Pointer
has length 0, and that at least makes the type checker happy.
After this change I was able to run
go test -a -short -gcflags=all=-d=checkptr std cmd
without the type checker complaining.
Updates #34972
Change-Id: Ic5c2321c20abd805c687ee16ef6f643a2f8cd93f
Reviewed-on: https://go-review.googlesource.com/c/go/+/222457
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
We can initialize the runtime sigqueue package on first use, so an
explicit initialization step is not required. Remove it.
Change-Id: I484e02dc2c67395fd5584f35ecda2e28b37168df
Reviewed-on: https://go-review.googlesource.com/c/go/+/226540
Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com>
DWARF constant symbols were always marked and converted to
sym.Symbols when DWARF generation used sym.Symbols. Now that
DWARF generation uses the loader, there is no need to force-mark them.
Change-Id: Ia4032430697cfa901fb4b6d106a483973277ea0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/226803 Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Jeremy Faller <jeremy@golang.org>
[dev.link] cmd/oldlink: decouple from goobj2 package
The new object file support in the old linker should not be used.
This is a minimal change that removes stuff from the old linker's
loader package, so that it decouples from the goobj2 package,
allowing the latter to evolve.
Keep the change local in the loader package, so most of the old
linker doesn't need to change. At this point I don't think we
want to make significant changes to the old linker.
Change-Id: I078c4cbb35dc4627c4b82f512a4aceec9b594925
Reviewed-on: https://go-review.googlesource.com/c/go/+/226800 Reviewed-by: Jeremy Faller <jeremy@golang.org> Reviewed-by: Than McIntosh <thanm@google.com>
[dev.link] cmd/internal/goobj2: change StringRef encoding for better locality
Previously, StringRef was encoded as an offset pointing to
{ len, [len]byte }. This CL changes it to { len, offset }, where
offset points to the bytes.
With the new format, reading a string header is just reading two
adjacent uint32s, without accessing the string table. This should
improve locality of object file reading.
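A toy sketch of the new layout (not the goobj2 reader itself, and assuming little-endian encoding): a reference is two adjacent uint32s, and the string bytes are only fetched from the offset when needed.

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // stringRef mirrors the new encoding: { len, offset }, with offset
    // pointing into the object file's string data.
    type stringRef struct {
        Len uint32
        Off uint32
    }

    func readStringRef(b []byte, at int) stringRef {
        return stringRef{
            Len: binary.LittleEndian.Uint32(b[at:]),
            Off: binary.LittleEndian.Uint32(b[at+4:]),
        }
    }

    func (r stringRef) resolve(data []byte) string {
        return string(data[r.Off : r.Off+r.Len])
    }

    func main() {
        data := []byte("....gopher") // the bytes live at offset 4
        var ref [8]byte
        binary.LittleEndian.PutUint32(ref[0:], 6) // len
        binary.LittleEndian.PutUint32(ref[4:], 4) // offset
        fmt.Println(readStringRef(ref[:], 0).resolve(data))
    }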
Change-Id: Iec30708f9d9adb2f0242db6c4767c0f8e730f4df
Reviewed-on: https://go-review.googlesource.com/c/go/+/226797
Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Jeremy Faller <jeremy@golang.org> Reviewed-by: Than McIntosh <thanm@google.com>
Keith Randall [Tue, 3 Mar 2020 18:07:32 +0000 (18:07 +0000)]
reflect: when Converting between float32s, don't lose signal NaNs
Trying this CL again, with a test that skips 387.
When converting from float32->float64->float32, any signal NaNs
get converted to quiet NaNs. Avoid that, so that using reflect.Value.Convert
between two float32 types keeps the signaling bit of NaNs.
Skip the test on 387. I don't see any sane way of ensuring that a
float load + float store is faithful on that platform.
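A hedged way to observe the fix (not the CL's test code, and assuming the usual IEEE-754 layout where the quiet bit is the top mantissa bit):

    package main

    import (
        "fmt"
        "math"
        "reflect"
    )

    type MyFloat float32

    func main() {
        snan := math.Float32frombits(0x7f800001) // signaling NaN bit pattern
        v := reflect.ValueOf(snan).Convert(reflect.TypeOf(MyFloat(0)))
        got := math.Float32bits(float32(v.Interface().(MyFloat)))
        // With this fix the signaling pattern 0x7f800001 should survive
        // rather than being quieted (except on 387).
        fmt.Printf("%#08x\n", got)
    }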
Bradford Lamson-Scribner [Sun, 29 Mar 2020 19:17:46 +0000 (13:17 -0600)]
cmd/compile: combine ssa.html columns with identical contents
Combine columns in ssa.html output if they are identical. There
can now be multiple titles per column, all of which are clickable to
expand and collapse their column. Give collapsed columns some
padding for better readability. Some of the work in this CL was
started by Josh Bleecher Snyder and mailed to me to carry to
completion.
Updates #37766
Change-Id: I313b0917dc1bafe1eb99d91798ea915e5bcfaae9
Reviewed-on: https://go-review.googlesource.com/c/go/+/226209 Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>
cmd/compile, runtime: use more registers for amd64 write barrier calls
The compiler-inserted write barrier calls use a special ABI
for speed and to minimize the binary size impact.
runtime.gcWriteBarrier takes its args in DI and AX.
This change adds gcWriteBarrier wrapper functions,
varying only in the register used for the second argument.
(Allowing variation in the first argument doesn't offer improvements,
which is convenient, as it avoids quadratic API growth.)
This reduces the number of register copies.
The goal is reduced binary size, via lower register pressure and fewer copies.
One downside to this change is that when the write barrier is on,
we may bounce through several different write barrier wrappers,
which is bad for the instruction cache.
Package runtime write barrier benchmarks for this change:
name old time/op new time/op delta
WriteBarrier-8 16.6ns ± 6% 15.6ns ± 6% -5.73% (p=0.000 n=97+99)
BulkWriteBarrier-8 4.37ns ± 7% 4.22ns ± 8% -3.45% (p=0.000 n=96+99)
However, I don't particularly trust these numbers.
I ran runtime.BenchmarkWriteBarrier multiple times as I rebased
this change, and noticed that the results have high variance
depending on the parent change, perhaps due to alignment.
This change was stress tested with GOGC=1 GODEBUG=gccheckmark=1 go test std.
In CL 187657, I refactored constant conversion logic without realizing
that conversions between int/float and complex types are allowed for
constants (assuming the constant values are representable by the
destination type), but are never allowed for non-constant expressions.
This CL expands convertop to take an extra srcConstant parameter to
indicate whether the source expression is a constant; and if so, to
allow any numeric-to-numeric conversion. (Conversions of values that
cannot be represented in the destination type are rejected by
evconst.)
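A small example of the distinction (ordinary Go, for illustration):

    package p

    const c = 1.5

    var _ = complex128(c) // OK: constant conversion between numeric types

    var f = 1.5

    // var _ = complex128(f) // invalid: cannot convert the non-constant float64 f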
Fixes #38117.
Change-Id: Id7077d749a14c8fd910be38da170fa5254819f2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/226197
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
Jay Conrod [Thu, 12 Mar 2020 22:56:40 +0000 (18:56 -0400)]
cmd/go: add support for GOPROXY fallback on unexpected errors
URLs in GOPROXY may now be separated with commas (,) or pipes (|). If
a request to a proxy fails with any error (including connection errors
and timeouts) and the proxy URL is followed by a pipe, the go command
will try the request with the next proxy in the list. If the proxy is
followed by a comma, the go command will only try the next proxy if
the error is a 404 or 410 HTTP response.
The go command will determine how to connect to the checksum database
using the same logic. Before accessing the checksum database, the go
command sends a request to <proxyURL>/sumdb/<sumdb-name>/supported.
If a proxy responds with 404 or 410, or if any other error occurs and
the proxy URL in GOPROXY is followed by a pipe, the go command will
try the request with the next proxy. If all proxies respond with 404
or 410 or are configured to fall back on errors, the go command will
connect to the checksum database directly.
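A toy sketch of the fallback decision described above (not the go command's actual code):

    package p

    // fallBackToNext reports whether the go command should try the next
    // proxy, given the separator that followed the failing proxy's URL and
    // the HTTP status of the failed request (0 for network errors).
    func fallBackToNext(sep string, status int) bool {
        if sep == "|" {
            return true // pipe: fall back on any error, including network errors
        }
        return status == 404 || status == 410 // comma: only on 404/410
    }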
This CL does not change the default value or meaning of GOPROXY.
Fixes #37367
Change-Id: I35dd218823fe8cb9383e9ac7bbfec2cc8a358748
Reviewed-on: https://go-review.googlesource.com/c/go/+/226460
Run-TryBot: Jay Conrod <jayconrod@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com>
Bryan C. Mills [Mon, 30 Mar 2020 19:43:08 +0000 (15:43 -0400)]
os/signal: make TestStop resilient to initially-blocked signals
For reasons unknown, SIGUSR1 appears to be blocked at process start
for tests on the android-arm-corellium and android-arm64-corellium
builders. (This has been observed before, too: see CL 203957.)
Make the test resilient to blocked signals by always calling Notify
and waiting for potential signal delivery after sending any signal
that is not known to be unblocked.
Also remove the initial SIGWINCH signal from testCancel. The behavior
of an unhandled SIGWINCH is already tested in TestStop, so we don't
need to re-test that same case: waiting for an unhandled signal takes
a comparatively long time (because we necessarily don't know when it
has been delivered), so this redundancy makes the overall test binary
needlessly slow, especially since it is called from both TestReset and
TestIgnore.
Since each signal is always unblocked while we have a notification
channel registered for it, we don't need to modify any other tests:
TestStop and testCancel are the only functions that send signals
without a registered channel.
Fixes #38165
Updates #33174
Updates #15661
Change-Id: I215880894e954b62166024085050d34323431b63
Reviewed-on: https://go-review.googlesource.com/c/go/+/226461
Run-TryBot: Bryan C. Mills <bcmills@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Richard Miller [Tue, 31 Mar 2020 18:44:19 +0000 (19:44 +0100)]
runtime: skip gdb tests on Plan 9
There's no gdb on Plan 9.
Change-Id: Ibeb0fbd3c096a69181c19b1fb2bc6291612b6da3
Reviewed-on: https://go-review.googlesource.com/c/go/+/226657 Reviewed-by: David du Colombier <0intro@gmail.com>
Bryan C. Mills [Mon, 30 Mar 2020 18:04:08 +0000 (14:04 -0400)]
context: fix a flaky timeout in TestLayersTimeout
In CL 223019, I reduced the short timeout in the testLayers helper to
be even shorter than it was. That exposed a racy (time-dependent)
select later in the function, which failed in one of the slower
builders (android-386-emu).
Also streamline the test to make it easier to test with a very high -count flag:
- Run tests that sleep for shortDuration in parallel to reduce latency.
- Use shorter durations in examples to reduce test running time.
- Avoid mutating global state (in package math/rand) in testLayers.
After this change (but not before it),
'go test -run=TestLayersTimeout -count=100000 context' passes on my workstation.
Fixes #38161
Change-Id: Iaf4abe7ac308b2100d8828267cda9f4f8ae4be82
Reviewed-on: https://go-review.googlesource.com/c/go/+/226457 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Per RFC 8017, reject signatures that are not the same length as the RSA
modulus. This matches the behavior of SignPKCS1v15, which properly left-pads
the signatures it generates to the size of the modulus.
Cherry Zhang [Fri, 27 Mar 2020 20:32:22 +0000 (16:32 -0400)]
[dev.link] cmd/link: set attributes atomically
Concurrent relocsym may now access symbol attributes
concurrently, causing a data race when using the race detector. I
think it is still safe, as we read/write different bits and do
not write the same symbol's attributes from multiple goroutines,
so a read will always see the right value regardless of whether the
write happens before or after it, as long as the memory model is not
too insane.
Use atomic accesses to appease the race detector. It doesn't seem
to cost much, at least on x86.
Joel Sing [Mon, 30 Mar 2020 14:57:52 +0000 (01:57 +1100)]
cmd/asm,cmd/internal/obj/riscv: provide branch pseudo-instructions
Implement various branch pseudo-instructions for riscv64. These make it easier
to read/write assembly and will also make it easier for the compiler to generate
optimised code.
fanzha02 [Thu, 19 Dec 2019 08:15:06 +0000 (08:15 +0000)]
cmd/asm: align an instruction or a function's address
Recently, the gVisor project needs an instruction's address aligned
to 128 bytes and a function's start address aligned to 2K bytes to
fit the architecture requirement for its interrupt table.
This patch allows the address of an instruction to be aligned to a
specific value (2^n and not higher than 2048) and the address of a
function to be aligned to 2048 bytes.
The main changes include:
1. Adds ALIGN2048 flag to align a function's address with
2048 bytes.
e.g. "TEXT ·Add(SB),NOSPLIT|ALIGN2048" indicates that the
address of Add function should be aligned to 2048 bytes.
2. Adds a new element in the FuncInfo structure defined in
cmd/internal/obj/link.go file to record the alignment
information.
3. Adds a new element in the Func structure defined in
cmd/internal/goobj/read.go file to read the alignment
information.
4. Because Go introduces a new object file format, also add
a new element in the FuncInfo structure defined in
cmd/internal/goobj2/funcinfo.go to record the alignment
information.
5. Adds assembler support to align an instruction's offset
with a specific value (2^n and not higher than 2048).
e.g. "PCALIGN $256" indicates that the next instruction should
be aligned to 256 bytes.
This CL also adds a test.
Change-Id: I31cfa6fb5bc35dee2c44bf65913e90cddfcb492a
Reviewed-on: https://go-review.googlesource.com/c/go/+/212767 Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Bryan C. Mills [Mon, 30 Mar 2020 19:01:33 +0000 (15:01 -0400)]
os/signal: in TestStop, skip the final "unexpected signal" check for SIGUSR1 on Android
In CL 226138, I updated TestStop to have more uniform behavior for its signals.
However, that test seems to always fail for SIGUSR1 on the Android ARM builders.
I'm not sure what's special about Android for this particular case,
but let's skip the test to unbreak the builders while I investigate.
For #38165
Updates #33174
Change-Id: I35a70346cd9757a92acd505a020bf95e6871405c
Reviewed-on: https://go-review.googlesource.com/c/go/+/226458
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Jeremy Faller [Tue, 17 Mar 2020 14:24:40 +0000 (10:24 -0400)]
[dev.link] cmd/link: parallelize asmb on amd64
Introduces a parallel OutBuf implementation, with a proof of concept on amd64.
Due to some of the weird behaviors I saw on macOS (SIGBUS while calling msync),
I will wait for feedback before porting to other architectures.
On my mac, this sped up Asmb by ~78% for cmd/compile (below). It will likely
yield an appreciable speedup on the kubelet benchmark.
Giovanni Bajo [Sun, 29 Mar 2020 12:21:12 +0000 (14:21 +0200)]
cmd/compile: avoid zero extensions after 32-bit shifts
zeroUpper32Bits wasn't checking for shift-extension ops. This meant it would not
check shifts that were marked as bounded by prove (normally, shifts
are wrapped in a sequence that ends with an ANDL, and zeroUpper32Bits
would see the ANDL).
This produces no changes in generated output right now, but will be
important once CL 196679 lands, because many shifts will be marked
as bounded, and lower will stop generating the masking code sequence
around them.
Change-Id: Iaea94acc5b60bb9a5021c9fb7e4a1e2e5244435e
Reviewed-on: https://go-review.googlesource.com/c/go/+/226338 Reviewed-by: Keith Randall <khr@golang.org>
Than McIntosh [Mon, 30 Mar 2020 14:06:54 +0000 (10:06 -0400)]
[dev.link] cmd/link: move the wavefront past addexport()
Reorganize the linker phase ordering so that addexport() runs before
loadlibfull. In previous CLs addexport() was changed to use loader
APIs but then copy its work back into sym.Symbol, so this change
removes the copying/shim code in question.
Tobias Klauser [Sun, 29 Mar 2020 22:38:09 +0000 (00:38 +0200)]
cmd/cgo, misc/cgo: only cache anonymous struct typedefs with parent name
CL 181857 broke the translation of certain C types using cmd/cgo -godefs
because it stores each typedef, array, and qualified type with its
parent type name in the translation cache.
Fix this by only considering the parent type for typedefs of anonymous
structs, which is the only case where types might become ambiguous.
Updates #31891
Fixes #37479
Fixes #37621
Change-Id: I301a749ec89585789cb0d213593bb8b7341beb88
Reviewed-on: https://go-review.googlesource.com/c/go/+/226341
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Than McIntosh [Mon, 30 Mar 2020 14:05:23 +0000 (10:05 -0400)]
[dev.link] cmd/link: run setArchSyms earlier
Add some new loader.Sym equivalents to the archSyms struct so that we
can run setArchSyms earlier in the pipeline, and add a "mode" to
setArchSyms to control whether it should create loader.Sym symbols or
their *sym.Symbol equivalents.
These changes are needed for a subsequent patch in which addexport() is run
earlier as well.
Bryan C. Mills [Fri, 27 Mar 2020 19:04:09 +0000 (15:04 -0400)]
os/signal: rework test timeouts and concurrency
Use a uniform function (named “quiesce”) to wait for possible signals
in a way that gives the kernel many opportunities to deliver them.
Simplify channel usage and concurrency in stress tests.
Use (*testing.T).Deadline instead of parsing the deadline in TestMain.
In TestStop, sleep forever in a loop if we expect the test to die from
a signal. That should reduce the flakiness of TestNohup, since
TestStop will no longer spuriously pass when run as a subprocess of
TestNohup.
Since independent signals should not interfere, run the different
signals in TestStop in parallel when testing in short mode.
Since TestNohup runs TestStop as a subprocess, and TestStop needs to
wait many times for signals to quiesce, run its test subprocesses
concurrently and in short mode — reducing the latency of that test by
more than a factor of 2.
The above two changes reduce the running time of TestNohup on my
workstation to ~345ms, making it possible to run much larger counts of
the test in the same amount of wall time. If the test remains flaky
after this CL, we can spend all or part of that latency improvement on
a longer settle time.
Updates #33174
Change-Id: I09206f213d8c1888b50bf974f965221a5d482419
Reviewed-on: https://go-review.googlesource.com/c/go/+/226138
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>