Keith Randall [Tue, 18 Aug 2015 17:26:28 +0000 (10:26 -0700)]
[dev.ssa] cmd/compile: add decompose pass
Decompose breaks compound objects up into pieces that can be
operated on by the target architecture. The decompose pass itself
only does phi ops; the rest is done by the rewrite rules in generic.rules.
Compound objects include strings, slices, interfaces, structs, and arrays.
Arrays aren't decomposed because of indexing (we could support
constant indexes, but dynamic indexes can't be handled using SSA).
Structs will come in a subsequent CL.
TODO: after this pass we have lost the association between, e.g.,
a string's pointer and its size. It would be nice if we could keep
that information around for debugging info somehow.
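For example, a string-typed phi becomes a StringMake of two phis, one
for the pointer word and one for the length word, and rules along these
lines (paraphrasing generic.rules, not quoting it) let the components
flow separately:

    (StringPtr (StringMake ptr _)) -> ptr
    (StringLen (StringMake _ len)) -> len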
Keith Randall [Tue, 18 Aug 2015 21:17:30 +0000 (14:17 -0700)]
[dev.ssa] cmd/compile: use Bounded field to fix empty range loops
for i, v := range a {
}
Walk converts this to a regular for loop along these lines (compiler
pseudocode; the pointer increment is not legal Go):
for i := 0, p := &a[0]; i < len(a); i++, p++ {
v := *p
}
Unfortunately, &a[0] fails its bounds check when a is
the empty slice (or string). The old compiler gets around this
by marking &a[0] as Bounded, meaning "don't emit bounds checks
for this index op". This change makes SSA honor that same mark.
The SSA compiler hasn't implemented bounds check panics yet,
so the failed bounds check just causes the current routine
to return immediately.
Adds support for high multiply, which is used by the frontend when
rewriting division by a constant. The frontend currently does this only
for 8-, 16-, and 32-bit integer arithmetic.
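The standard construction (per Hacker's Delight; a sketch, not the
compiler's exact output) turns an unsigned 32-bit division by a constant
into a multiply whose high half is then shifted down, e.g. for x/5:

    // x/5 == uint32((uint64(x) * 0xCCCCCCCD) >> 34) for all uint32 x.
    // The >>34 is a high multiply (>>32) followed by a >>2.
    func div5(x uint32) uint32 {
        return uint32((uint64(x) * 0xCCCCCCCD) >> 34)
    }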
Change-Id: I9b6c6018f3be827a50ee6c185454ebc79b3094c8
Reviewed-on: https://go-review.googlesource.com/13696
Reviewed-by: Keith Randall <khr@golang.org>
David Chase [Wed, 12 Aug 2015 20:38:11 +0000 (16:38 -0400)]
[dev.ssa] cmd/compile: first unoptimized cut at adding FP support
Added F32 and F64 load, store, and addition.
Added F32 and F64 multiply.
Added F32 and F64 subtraction and division.
Added X15 to "clobber" for FP sub/div
Added FP constants
Added separate FP test in gc/testdata
Change-Id: Ifa60dbad948a40011b478d9605862c4b0cc9134c
Reviewed-on: https://go-review.googlesource.com/13612
Reviewed-by: Keith Randall <khr@golang.org>
Keith Randall [Sat, 15 Aug 2015 04:47:20 +0000 (21:47 -0700)]
[dev.ssa] cmd/compile/internal/ssa: Use explicit size for store ops
Using the type of the store's argument is not safe: it may change
during rewriting, giving us the wrong store width.
(Store ptr (Trunc32to16 val) mem)
This should be a 2-byte store. But we have the rule:
(Trunc32to16 x) -> x
So if the Trunc rewrite happens before the Store -> MOVW rewrite,
then the Store thinks that the value it is storing is 4 bytes
in size and uses a MOVL. Bad things ensue.
Fix this by encoding the store width explicitly in the auxint field.
In general, we can't rely on the type of arguments, as they may
change during rewrites. The type of the op itself (as used by
the Load rules) is still ok to use.
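With the width in auxint, lowering can match on the size alone, along
these lines (illustrative, not the exact rule text):

    (Store [2] ptr val mem) -> (MOVWstore ptr val mem)
    (Store [4] ptr val mem) -> (MOVLstore ptr val mem)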
[dev.ssa] cmd/compile: make failed nil checks panic
Introduce pseudo-ops PanicMem and LoweredPanicMem.
PanicMem could be rewritten directly into MOVL
during lowering, but then we couldn't log nil checks.
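For example, a load through a nil pointer now panics at run time
(hypothetical snippet, not from the CL):

    func f(p *int) int {
        return *p // SSA inserts a nil check here; it fails and panics when p is nil
    }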
With this change, runnable nil check tests pass:
GOSSAPKG=main go run run.go -- nil*.go
Compiler output nil check tests fail:
GOSSAPKG=p go run run.go -- nil*.go
This is due to several factors:
* SSA has improved elimination of unnecessary nil checks.
* SSA is missing elimination of implicit nil checks.
* SSA is missing extra logging about why nil checks were removed.
I'm not sure how best to resolve these failures,
particularly in a world in which the two backends
will live side by side for some time.
For now, punt on the problem.
Change-Id: Ib2ca6824551671f92e0e1800b036f5ca0905e2a3
Reviewed-on: https://go-review.googlesource.com/13474
Reviewed-by: Keith Randall <khr@golang.org>
[dev.ssa] cmd/compile: fix function call memory accounting
We were not recording function calls as
changing the state of memory.
As a result, the scheduler was not aware that
storing values to the stack in order to make a
function call must happen *after* retrieving
results from the stack from a just-completed
function call.
This fixes the container/ring tests.
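A sketch of the hazard (hypothetical code, not from container/ring):

    func caller(f func() int, g func(int) int) int {
        a := f()    // a is loaded from f's result slot on the stack
        return g(a) // the store to g's argument slot may reuse that
                    // memory, so it must be scheduled after the load
    }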
This was my first experience debugging an issue
using the HTML output. I'm feeling quite
pleased with it.
Change-Id: I9e8276846be9fd7a60422911b11816c5175e3d0a
Reviewed-on: https://go-review.googlesource.com/13560
Reviewed-by: Keith Randall <khr@golang.org>
Todd Neal [Fri, 7 Aug 2015 01:13:27 +0000 (20:13 -0500)]
[dev.ssa] cmd/compile/ssa: don't nil check phis with non-nil arguments
Move the known-non-nil scan outside the work loop to resolve an issue
with values that were declared outside the block being operated on.
Also treat phis whose arguments are all non-nil as non-nil.
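For example (hypothetical code):

    func pick(cond bool) int {
        var x, y int
        p := &x
        if cond {
            p = &y // phi of &x and &y: every argument is non-nil
        }
        return *p // so this dereference needs no nil check
    }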
Change-Id: I4d5b840042de9eb181f2cb918f36913fb5d517a2
Reviewed-on: https://go-review.googlesource.com/13441
Reviewed-by: Keith Randall <khr@golang.org>
[dev.ssa] cmd/compile: detect rewrite loops of length > 1
Use a version of Floyd's cycle-finding algorithm, but advance the two
pointers by 1 and 1/2 steps per iteration rather than by 1 and 2.
It is simpler and should be cheaper in the normal, acyclic case.
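A sketch of the check, assuming a rewrite step that returns its input
unchanged once no rule applies (hypothetical signature; the real code
works on *Value):

    func rewriteLoops(rewrite func(string) string, v string) bool {
        slow := v
        for i := 0; ; i++ {
            w := rewrite(v)
            if w == v {
                return false // fixed point: rewriting terminated
            }
            v = w
            if i%2 == 1 {
                slow = rewrite(slow) // slow advances half a step per iteration
            }
            if v == slow {
                return true // fast caught slow: the rules cycle
            }
        }
    }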
This should fix the 386 and arm builds,
which are currently hung.
Change-Id: If8bd443011b28a5ecb004a549239991d3dfc862b
Reviewed-on: https://go-review.googlesource.com/13473
Reviewed-by: Keith Randall <khr@golang.org>
Keith Randall [Mon, 10 Aug 2015 18:10:53 +0000 (11:10 -0700)]
[dev.ssa] cmd/compile/internal/ssa: enforce load-store ordering in scheduler
We must make sure that all loads that use a store are scheduled
before the next store. Add additional dependency edges to the
value graph to enforce this constraint.
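A sketch of the constraint (hypothetical code):

    func f(p, q *int) int {
        *p = 1   // store 1
        x := *q  // load observing store 1's memory state
        *p = 2   // store 2 must come after every load of the old state;
        return x // otherwise x could wrongly be 2 when p == q
    }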
Keith Randall [Mon, 3 Aug 2015 19:33:03 +0000 (12:33 -0700)]
[dev.ssa] cmd/compile/internal/ssa: Fix scheduler
The DFS scheduler doesn't do the right thing. If a Value x is used by
more than one other Value, then x is put into the DFS queue when
its first user (call it y) is visited. It is not removed and reinserted
when the second user of x (call it z) is visited, so the dependency
between x and z is not respected. There is no easy way to fix this with
the DFS queue because we'd have to rip values out of the middle of the
DFS queue.
The new scheduler works backwards from the end of the block, scheduling
an instruction only once all of its uses have been scheduled.
A simple priority scheme breaks ties between multiple instructions that
are ready to schedule simultaneously.
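A sketch of the approach, with a toy value type (not the ssa package's):

    type value struct {
        id   int
        args []*value
    }

    // schedule emits a value only once all of its uses in the block have
    // been emitted, then reverses the list to get a legal forward order.
    func schedule(vals []*value) []*value {
        uses := make(map[int]int) // id -> count of not-yet-scheduled uses
        for _, v := range vals {
            for _, a := range v.args {
                uses[a.id]++
            }
        }
        var ready, order []*value
        for _, v := range vals {
            if uses[v.id] == 0 { // nothing in the block uses v
                ready = append(ready, v)
            }
        }
        for len(ready) > 0 {
            v := ready[len(ready)-1] // a priority scheme breaks ties here
            ready = ready[:len(ready)-1]
            order = append(order, v)
            for _, a := range v.args {
                if uses[a.id]--; uses[a.id] == 0 {
                    ready = append(ready, a)
                }
            }
        }
        for i, j := 0, len(order)-1; i < j; i, j = i+1, j-1 {
            order[i], order[j] = order[j], order[i] // built back to front
        }
        return order
    }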
Keep track of whether we've scheduled or not, and make print() use
the scheduled order if we have.
Fix some shift tests that this change tickles. Add unsigned right shift tests.
Reworks nilcheck to be performed by a depth-first traversal of the
dominator tree, keeping an updated map of the values that have been
nil-checked during the traversal.
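A sketch of the traversal, with a toy block type (the real pass works
on ssa.Blocks):

    type block struct {
        checked  []int    // IDs of pointers nil-checked in this block
        children []*block // dominator-tree children
    }

    // elide drops checks already performed in a dominating block,
    // un-marking values as the traversal backs out of their subtree.
    func elide(b *block, nonNil map[int]bool) {
        var added []int
        kept := b.checked[:0]
        for _, id := range b.checked {
            if nonNil[id] {
                continue // dominated by an identical check: redundant
            }
            nonNil[id] = true
            added = append(added, id)
            kept = append(kept, id)
        }
        b.checked = kept
        for _, c := range b.children {
            elide(c, nonNil)
        }
        for _, id := range added {
            delete(nonNil, id) // restore state for sibling subtrees
        }
    }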
[dev.ssa] cmd/compile: use Copy instead of ConvNop
The existing backend simply elides OCONVNOP.
There's no reason for us to do any differently.
Rather than insert ConvNops and then rewrite them
away, stop creating them in the first place.
Change-Id: I4bcbe2229fcebd189ae18df24f2c612feb6e215e
Reviewed-on: https://go-review.googlesource.com/12810
Reviewed-by: Keith Randall <khr@golang.org>
runtime/cgo: fix darwin/arm64 signal handling setup
Was not allocating space for the frame above sigpanic,
nor was it pushing the LR into the right place.
Because a traceback past sigpanic only needs the
LR for faulting leaves, this went mostly unnoticed.
But it did break the sync/atomic nil deref tests.
Change-Id: Icba53fffa193423aab744c37f21ee893ce2ee3ac
Reviewed-on: https://go-review.googlesource.com/12926
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Convert shift ops to also encode the size of the shift amount.
Change signed right shift from using CMOV to using bit twiddles.
It is a little bit better (five instructions instead of four, but fewer
bytes and slightly faster code). It's also a bit faster than the
four-instruction branch version, even with a very predictable branch.
As tested on my machine; YMMV.
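A pure-Go sketch of the twiddle for a 64-bit arithmetic right shift with
Go semantics (illustrative; the generated code uses a compare and a
carry-mask op rather than this arithmetic, and s < 1<<63 is assumed):

    func rsh64x64(x int64, s uint64) int64 {
        mask := uint64(int64(s-64) >> 63) // all ones iff s < 64
        // For s >= 64, OR in 63 so the shift brings in only sign bits.
        return x >> (s | (^mask & 63))
    }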
David Chase [Thu, 30 Jul 2015 16:31:18 +0000 (12:31 -0400)]
cmd/compile: add case for ODOTTYPE to escwalk
ODOTTYPE should be treated a whole lot like ODOT,
but it was missing completely from the switch in
escwalk and thus escape status did not propagate
to fields.
Since interfaces are required to trigger this bug,
the test was added to escape_iface.go.
runtime: change arm software div/mod call sequence not to modify stack
Instead of pushing the denominator argument on the stack,
the denominator is now passed in m.
This fixes a variety of bugs related to trying to take stack traces
backwards from the middle of the software div/mod routines.
Some of those bugs have been kludged around in the past,
but others have not. Instead of trying to patch up after breaking
the stack, this CL stops breaking the stack.
This is an update of https://golang.org/cl/19810043,
which was rolled back in https://golang.org/cl/20350043.
The problem in the original CL was that there were divisions
at bad times, when m was not available. These were divisions
by constant denominators, either in C code or in assembly.
The Go compiler knows how to generate division by multiplication
for constant denominators, but the C compiler did not.
There is no longer any C code, so that's taken care of.
There was one problematic DIV in runtime.usleep (assembly)
but https://golang.org/cl/12898 took care of that one.
So now this approach is safe.
Reject DIV/MOD in NOSPLIT functions to keep them from
coming back.
Fixes #6681.
Fixes #6699.
Fixes #10486.
Change-Id: I09a13c76ad08ba75b3bd5d46a3eb78e66a84ab38
Reviewed-on: https://go-review.googlesource.com/12899
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Ian Lance Taylor [Thu, 30 Jul 2015 05:04:09 +0000 (22:04 -0700)]
cmd/cgo: discard trailing zero-sized fields in a non-empty C struct
In order to fix issue #9401 the compiler was changed to add a padding
byte to any non-empty Go struct that ends in a zero-sized field. That
causes the Go version of such a C struct to have a different size than
the C struct, which can cause considerable confusion. Change cgo so that it
discards any such zero-sized fields, so that the Go and C structs are
the same size.
This is a change from previous releases, in that it used to be
possible to refer to a zero-sized trailing field (by taking its
address), and with this change it no longer is. That is unfortunate,
but something has to change. It seems better to visibly break
programs that do this rather than to silently break programs that rely
on the struct sizes being the same.
runtime: replace divide with multiply in runtime.usleep on arm
We want to adjust the DIV calling convention to use m,
and usleep can be called without an m, so switch to a
multiplication by the reciprocal (and test).
Step toward a fix for #6699 and #10486.
Change-Id: Iccf76a18432d835e48ec64a2fa34a0e4d6d4b955
Reviewed-on: https://go-review.googlesource.com/12898
Reviewed-by: Ian Lance Taylor <iant@golang.org>
cmd/internal/obj/arm: fix line numbers after constant pool
If a function is large enough to need to flush the constant pool
mid-function, the line number assignment code was forcing the
line numbers not just for the constant pool but for all the instructions
that follow it. This made the line number information completely
wrong for all but the beginning of large functions on arm.
Same problem in code copied into arm64.
This broke runtime/trace's TestTraceSymbolize.
Fixes arm build.
Change-Id: I84d9fb2c798c4085f69b68dc766ab4800c7a6ca4
Reviewed-on: https://go-review.googlesource.com/12894
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
This allows running a cross-compile like
GOOS=darwin GOARCH=arm go build std
to check that everything builds.
Otherwise there is a redefinition error because both
root_nocgo_darwin.go and root_darwin_armx.go
supply initSystemRoots.
Change-Id: Ic95976b2b698d28c629bfc93d8dac0048b023578
Reviewed-on: https://go-review.googlesource.com/12897
Reviewed-by: Ian Lance Taylor <iant@golang.org>
net: allow longer timeout in dialClosedPort test on windows
The test expects the dial to take 1.0 seconds
on Windows and allows it to go to 1.095 seconds.
That's far too optimistic.
Recent failures are reporting roughly 1.2 seconds.
Let it have 1.5.
Change-Id: Id69811ccb65bf4b4c159301a2b4767deb6ee8d28
Reviewed-on: https://go-review.googlesource.com/12895
Reviewed-by: Ian Lance Taylor <iant@golang.org>
math/rand: warn against using package for security-sensitive work
Urge users of math/rand to consider using crypto/rand when doing
security-sensitive work.
Related to issue #11871. While we haven't reached consensus on how
to make the package inherently safer, everyone agrees that the docs
for math/rand can be improved.
Change-Id: I576a312e51b2a3445691da6b277c7b4717173197
Reviewed-on: https://go-review.googlesource.com/12900
Reviewed-by: Rob Pike <r@golang.org>
cmd/compile: fix uninitialized memory during type switch assertE2I2
Fixes arm64 builder crash.
The bug is possible on all architectures; you just have to get lucky
and hit a preemption or a stack growth on entry to assertE2I2.
The test stacks the deck.
Change-Id: I8419da909b06249b1ad15830cbb64e386b6aa5f6
Reviewed-on: https://go-review.googlesource.com/12890
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
The skips added in CL 12579, based on incorrect time stamps,
should be sufficient to identify and exclude all the time-related
flakiness on these systems.
runtime/trace: record event sequence numbers explicitly
Nearly all the flaky failures we've seen in trace tests have been
due to the use of time stamps to determine relative event ordering.
This is tricky for many reasons, including:
- different cores might not have exactly synchronized clocks
- VMs are worse than real hardware
- non-x86 chips have different timer resolution than x86 chips
- on fast systems two events can end up with the same time stamp
Stop trying to make time reliable. It's clearly not going to be for Go 1.5.
Instead, record an explicit event sequence number for ordering.
Using our own counter solves all of the above problems.
The trace still contains time stamps, of course. The sequence number
is just used for ordering.
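A sketch of the parser-side rule, with toy types (not the real
internal/trace ones; sort.Slice is from the sort package):

    type event struct {
        seq uint64 // explicit global sequence number: authoritative order
        ts  int64  // time stamp: informational only
    }

    func sortEvents(evs []event) {
        sort.Slice(evs, func(i, j int) bool { return evs[i].seq < evs[j].seq })
    }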
Should alleviate #10554 somewhat. Then tickDiv can be chosen to
be a useful time unit instead of having to be exact for ordering.
Separating ordering and time stamps lets the trace parser diagnose
systems where the time stamp order and actual order do not match
for one reason or another. This CL adds that check to the end of
trace.Parse, after all other sequence order-based checking.
If that error is found, we skip the test instead of failing it.
Putting the check in trace.Parse means that cmd/trace will pick
up the same check, refusing to display a trace where the time stamps
do not match actual ordering.
Using net/http's BenchmarkClientServerParallel4 on various CPU counts,
not tracing vs tracing:
Keith Randall [Tue, 28 Jul 2015 23:04:50 +0000 (16:04 -0700)]
[dev.ssa] cmd/compile/internal/ssa: implement lots of small (<8-byte) ops.
Lots and lots of ops!
Also XOR for good measure.
Add a pass to the compiler generator to check that all of the
architecture-specific opcodes are handled by genValue. We will
catch any missing ones if we come across them during compilation,
but probably better to catch them statically.
The layout code has to date insisted on stack frames that are 16-aligned
including the saved LR, and it ensured this by growing the frame itself.
This breaks code that refers to values near the top of the frame by positive
offset from SP, and in general it's too magical: if you see TEXT xxx, $N,
you expect that the frame size is actually N, not sometimes N and sometimes N+8.
This led to a serious bug in the compiler where ambiguously live values
were not being zeroed correctly, which in turn triggered an assertion
in the GC about finding only valid pointers. The compiler has been
fixed to always emit aligned frames, and the hand-written assembly
has also been fixed.
Now that everything is aligned, make unaligned an error instead of
something to "fix" silently.
The nosplit stack overflow checks were confused about morestack.
The comment about not having correct SP information at the call
to morestack was true, but that was a real bug, not something to
work around. I fixed that problem in CL 12144. With that fixed,
no need to special-case morestack in the way done here.
This cleanup and simplification of the code was the first step
to fixing a bug that happened when I started working on the
arm64 frame size adjustments, but the cleanup was sufficient
to make the bug go away.
runtime, reflect: use correctly aligned stack frame sizes on arm64
arm64 requires either no stack frame or a frame with a size that is 8 mod 16
(adding the saved LR will make it 16-aligned).
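For example (illustrative, not a file from this CL):

    TEXT ·add(SB), NOSPLIT, $24-16 // frame is 24 = 8 mod 16; with the saved LR it is 32, 16-aligned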
The cmd/internal/obj/arm64 has been silently aligning frames, but it led to
a terrible bug when the compiler and obj disagreed on the frame size,
and it's just generally confusing, so we're going to make misaligned frames
an error instead of something that is silently changed.
This CL prepares by updating assembly files.
Note that the changes in this CL are already being done silently by
cmd/internal/obj/arm64, so there is no semantic effect here,
just a clarity effect.
If the compiler doesn't do it, cmd/internal/obj/arm64 will,
and that will break the zeroing of ambiguously live values
done in zerorange, which in turn produces uninitialized
pointer cells that the GC trips over.
This adds a GCCPUFraction field to MemStats that reports the
cumulative fraction of the program's execution time spent in the
garbage collector. This is equivalent to the utilization percent shown
in the gctrace output and makes this available programmatically.
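Reading the new field (GCCPUFraction is a float64 in [0, 1]):

    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    // Cumulative fraction of this program's CPU time spent in the GC.
    fmt.Printf("GC CPU fraction: %.4f\n", ms.GCCPUFraction)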
This does have one small effect on the gctrace output: we now report
the duration of mark termination up to just before the final
start-the-world, rather than up to just after. However, unlike
stop-the-world, I don't believe there's any way that start-the-world
can block, so it should take negligible time.
While there are many statistics one might want to expose via MemStats,
this is one of the few that will undoubtedly remain meaningful
regardless of future changes to the memory system.
The diff for this change is larger than the actual change. Mostly it
lifts the code for computing the GC CPU utilization out of the
debug.gctrace path.
Currently we only capture GC phase transition times if debug.gctrace>0,
but we're about to compute GC CPU utilization regardless of whether
debug.gctrace is set, so we need these transition times unconditionally.
runtime: avoid race between SIGPROF traceback and stack barriers
The following sequence of events can lead to the runtime attempting an
out-of-bounds access on a stack barrier slice:
1. A SIGPROF comes in on a thread while the G on that thread is in
_Gsyscall. The sigprof handler calls gentraceback, which saves a
local copy of the G's stkbar slice. Currently the G has no stack
barriers, so this slice is empty.
2. On another thread, the GC concurrently scans the stack of the
goroutine being profiled (it considers it stopped because it's in
_Gsyscall) and installs stack barriers.
3. Back on the sigprof thread, gentraceback comes across a stack
barrier in the stack and attempts to look it up in its (zero
length) copy of G's old stkbar slice, which causes an out-of-bounds
access.
This commit fixes this by adding a simple cas spin to synchronize the
SIGPROF handler with stack barrier insertion.
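A sketch of the spin, assuming a hypothetical uint32 stkbarLock word on
the g struct (the real change uses the runtime's own atomics):

    func lockStkbar(gp *g) {
        for !atomic.CompareAndSwapUint32(&gp.stkbarLock, 0, 1) {
            // spin: the holder only reads or updates the stkbar slice
        }
    }

    func unlockStkbar(gp *g) {
        atomic.StoreUint32(&gp.stkbarLock, 0)
    }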
In general I would prefer that this synchronization be done through
the G status, since that's how stack scans are otherwise synchronized,
but adding a new lock is a much smaller change and G statuses are full
of subtlety.