Keith Randall [Mon, 24 Aug 2015 04:14:25 +0000 (21:14 -0700)]
[dev.ssa] cmd/compile: make sure to keep offset and sym of MOV opcodes.
MOVXload and MOVXstore opcodes have both an auxint offset
and an aux offset (a symbol name, like a local or arg or global).
Make sure we keep those values during rewrites.
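As a sketch of why both fields matter (the rule spelling below is an assumption in the style of the AMD64 rules of that era, not quoted from the CL), an offset-folding rewrite has to carry the auxint offset and the aux symbol through to the result:

```
// Fold an ADDQconst into the load's offset. [off] is the auxint
// offset and {sym} the aux symbol; both must survive the rewrite,
// or the final MOV instruction addresses the wrong location.
(MOVQload [off1] {sym} (ADDQconst [off2] ptr) mem) -> (MOVQload [off1+off2] {sym} ptr mem)
```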
This aids in making sense of the aggregate set of work outstanding.
Interest in the details of any particular implementation failure
is better handled locally anyway.
In my local tree, running make.bash after this CL yields:
14.85% 1811 SSA unimplemented: unhandled expr SLICEARR
13.84% 1687 SSA unimplemented: unhandled expr CALLINTER
11.84% 1444 SSA unimplemented: unhandled stmt RETJMP
10.24% 1249 SSA unimplemented: unhandled expr EFACE
8.52% 1039 SSA unimplemented: unhandled expr SLICE
4.92% 600 SSA unimplemented: local variable with class PAUTO,heap unimplemented
4.90% 598 SSA unimplemented: unhandled expr SLICESTR
3.91% 477 SSA unimplemented: local variable with class PFUNC unimplemented
3.45% 421 SSA unimplemented: not lowered: IMake INTER PTR64 PTR64
3.42% 417 SSA unimplemented: unhandled expr APPEND
3.21% 391 SSA unimplemented: unhandled expr CLOSUREVAR
3.06% 373 SSA unimplemented: unhandled stmt DEFER
3.04% 371 SSA unimplemented: unhandled stmt AS2DOTTYPE
1.61% 196 SSA unimplemented: unhandled expr DOTTYPE
1.56% 190 SSA unimplemented: not lowered: Load STRUCT PTR64 mem
0.79% 96 SSA unimplemented: not lowered: StringMake STRING PTR64 UINTPTR
0.69% 84 SSA unimplemented: unhandled binary op NE FLOAT64
0.53% 65 SSA unimplemented: unhandled expr STRUCTLIT
0.50% 61 SSA unimplemented: not lowered: SliceMake ARRAY PTR64 UINTPTR UINTPTR
0.45% 55 SSA unimplemented: zero for type float64 not implemented
0.44% 54 SSA unimplemented: unhandled addr CLOSUREVAR
0.38% 46 SSA unimplemented: unhandled binary op EQ FLOAT64
0.35% 43 SSA unimplemented: unhandled binary op LT FLOAT64
0.34% 42 SSA unimplemented: unhandled len(map)
0.33% 40 SSA unimplemented: unhandled stmt FALL
0.23% 28 SSA unimplemented: CONVNOP closure
0.21% 25 SSA unimplemented: local variable with class PPARAM,heap unimplemented
0.21% 25 SSA unimplemented: unhandled binary op GT FLOAT64
0.18% 22 SSA unimplemented: unhandled OCONV FLOAT32 -> FLOAT64
0.18% 22 SSA unimplemented: unhandled expr REAL
0.16% 20 SSA unimplemented: unhandled stmt PROC
0.16% 19 SSA unimplemented: unhandled closure arg
0.15% 18 SSA unimplemented: unhandled OCONV INT64 -> FLOAT64
0.12% 15 SSA unimplemented: unhandled expr CFUNC
0.10% 12 SSA unimplemented: unhandled OCONV UINT64 -> FLOAT64
0.09% 11 SSA unimplemented: unhandled OLITERAL 4
0.09% 11 SSA unimplemented: unhandled expr IMAG
0.07% 9 SSA unimplemented: unhandled binary op GE FLOAT64
0.07% 9 SSA unimplemented: unhandled binary op MINUS FLOAT64
0.06% 7 SSA unimplemented: unhandled OCONV FLOAT64 -> FLOAT32
0.06% 7 SSA unimplemented: unhandled binary op NE FLOAT32
0.06% 7 SSA unimplemented: variable address class 5 not implemented
0.05% 6 SSA unimplemented: not lowered: Load COMPLEX128 PTR64 mem
0.05% 6 SSA unimplemented: unhandled expr SLICE3ARR
0.04% 5 SSA unimplemented: unhandled binary op LE FLOAT64
0.03% 4 SSA unimplemented: unhandled OCONV UINTPTR -> FLOAT64
0.03% 4 SSA unimplemented: unhandled binary op EQ COMPLEX128
0.03% 4 SSA unimplemented: unhandled binary op EQ FLOAT32
0.03% 4 SSA unimplemented: unhandled expr COMPLEX
0.02% 3 SSA unimplemented: local variable with class PPARAMOUT,heap unimplemented
0.02% 3 SSA unimplemented: not lowered: Load ARRAY PTR64 mem
0.02% 3 SSA unimplemented: unhandled OCONV INT32 -> FLOAT64
0.02% 3 SSA unimplemented: unhandled OCONV INT64 -> FLOAT32
0.02% 3 SSA unimplemented: unhandled expr SLICE3
0.02% 2 SSA unimplemented: unhandled OCONV COMPLEX64 -> COMPLEX128
0.02% 2 SSA unimplemented: unhandled OCONV FLOAT64 -> INT64
0.02% 2 SSA unimplemented: unhandled OCONV FLOAT64 -> UINT64
0.02% 2 SSA unimplemented: unhandled OCONV INT -> FLOAT64
0.02% 2 SSA unimplemented: unhandled OCONV UINT64 -> FLOAT32
0.02% 2 SSA unimplemented: unhandled binary op EQ COMPLEX64
0.02% 2 SSA unimplemented: unhandled binary op MINUS FLOAT32
0.02% 2 SSA unimplemented: zero for type complex128 not implemented
0.02% 2 SSA unimplemented: zero for type complex64 not implemented
0.02% 2 SSA unimplemented: zero for type float32 not implemented
0.01% 1 SSA unimplemented: not lowered: EqFat BOOL INTER INTER
0.01% 1 SSA unimplemented: not lowered: Store mem UINTPTR COMPLEX128 mem
0.01% 1 SSA unimplemented: unhandled OCONV UINT32 -> FLOAT64
0.01% 1 SSA unimplemented: unhandled cap(chan)
0.01% 1 SSA unimplemented: unhandled expr ARRAYLIT
0.01% 1 SSA unimplemented: unhandled expr PLUS
0.01% 1 SSA unimplemented: unhandled stmt CHECKNIL
Change-Id: I43474fe6d6ec22a9f57239090136f6e97eebfdf2
Reviewed-on: https://go-review.googlesource.com/13848
Reviewed-by: Keith Randall <khr@golang.org>

[dev.ssa] cmd/compile: support spilling and loading flags
This CL takes a simple approach to spilling and loading flags.
We never spill. When a load is needed, we recalculate,
loading the arguments as needed.
This is simple and architecture-independent.
It is not very efficient, but as of this CL,
there are fewer than 200 flag spills during make.bash.
This was tested by manually reverting CLs 13813 and 13843,
causing SETcc, MOV, and LEA instructions to clobber flags,
which dramatically increases the number of flag spills.
With that done, all stdlib tests that used to pass
still pass.
For future reference, here are some other, more efficient
amd64-only schemes that we could adapt in the future if needed.
(1) Spill exactly the flags needed.
For example, if we know that the flags will be needed
by a SETcc or Jcc op later, we could use SETcc to
extract just the relevant flag. When needed,
we could use TESTB and change the op to JNE/SETNE.
(Alternatively, we could leave the op unaltered
and prepare an appropriate CMPB instruction
to produce the desired flag.)
However, this requires separate handling for every
instruction that uses the flags register,
including (say) SBBQcarrymask.
We could enable this on an ad hoc basis for common cases
and fall back to recalculation for other cases.
(2) Spill all flags with PUSHF and POPF
This modifies SP, which the runtime won't like.
It also requires coordination with stackalloc to
make sure that we have a stack slot ready for use.
(3) Spill almost all flags with LAHF, SETO, and SAHF
See http://blog.freearrow.com/archives/396
for details. This would handle all the flags we currently
use. However, LAHF and SAHF are not universally available
and it requires arranging for AX to be free.
Change-Id: Ie36600fd8e807ef2bee83e2e2ae3685112a7f276
Reviewed-on: https://go-review.googlesource.com/13844
Reviewed-by: Keith Randall <khr@golang.org>
Keith Randall [Tue, 18 Aug 2015 17:26:28 +0000 (10:26 -0700)]
[dev.ssa] cmd/compile: add decompose pass
Decompose breaks compound objects up into pieces that can be
operated on by the target architecture. The decompose pass only
does phi ops, the rest is done by the rewrite rules in generic.rules.
Compound objects include strings, slices, interfaces, structs, and arrays.
Arrays aren't decomposed because of indexing (we could support
constant indexes, but dynamic indexes can't be handled using SSA).
Structs will come in a subsequent CL.
TODO: after this pass we have lost the association between, e.g.,
a string's pointer and its size. It would be nice if we could keep
that information around for debugging info somehow.
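To illustrate the division of labor between the phi decomposition and generic.rules (these two rules are a sketch from memory of the generic rules, so treat the exact spellings as an assumption): once a string phi is split into a pointer phi and a length phi, the accessor ops cancel against StringMake and the compound value disappears:

```
// After decompose, each use of a string component reaches through
// a StringMake, which the generic rewrite rules then eliminate:
(StringPtr (StringMake ptr _)) -> ptr
(StringLen (StringMake _ len)) -> len
```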
Keith Randall [Tue, 18 Aug 2015 21:17:30 +0000 (14:17 -0700)]
[dev.ssa] cmd/compile: use Bounded field to fix empty range loops
for i, v := range a {
}
Walk converts this to a regular for loop, like this:
for i := 0, p := &a[0]; i < len(a); i++, p++ {
v := *p
}
Unfortunately, &a[0] fails its bounds check when a is
the empty slice (or string). The old compiler gets around this
by marking &a[0] as Bounded, meaning "don't emit bounds checks
for this index op". This change makes SSA honor that same mark.
The SSA compiler hasn't implemented bounds check panics yet,
so the failed bounds check just causes the current routine
to return immediately.
Adds support for high multiply, which is used by the frontend when
rewriting const division. The frontend currently only does this for 8,
16, and 32 bit integer arithmetic.
Change-Id: I9b6c6018f3be827a50ee6c185454ebc79b3094c8
Reviewed-on: https://go-review.googlesource.com/13696
Reviewed-by: Keith Randall <khr@golang.org>
David Chase [Wed, 12 Aug 2015 20:38:11 +0000 (16:38 -0400)]
[dev.ssa] cmd/compile: first unoptimized cut at adding FP support
Added F32 and F64 load, store, and addition.
Added F32 and F64 multiply.
Added F32 and F64 subtraction and division.
Added X15 to "clobber" for FP sub/div
Added FP constants
Added separate FP test in gc/testdata
Change-Id: Ifa60dbad948a40011b478d9605862c4b0cc9134c
Reviewed-on: https://go-review.googlesource.com/13612
Reviewed-by: Keith Randall <khr@golang.org>
Keith Randall [Sat, 15 Aug 2015 04:47:20 +0000 (21:47 -0700)]
[dev.ssa] cmd/compile/internal/ssa: Use explicit size for store ops
Using the type of the store argument is not safe, it may change
during rewriting, giving us the wrong store width.
(Store ptr (Trunc32to16 val) mem)
This should be a 2-byte store. But we have the rule:
(Trunc32to16 x) -> x
So if the Trunc rewrite happens before the Store -> MOVW rewrite,
then the Store thinks that the value it is storing is 4 bytes
in size and uses a MOVL. Bad things ensue.
Fix this by encoding the store width explicitly in the auxint field.
In general, we can't rely on the type of arguments, as they may
change during rewrites. The type of the op itself (as used by
the Load rules) is still ok to use.
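A sketch of what the lowering rules look like once the width lives in auxint (the rule spellings are assumed, not quoted from the CL):

```
// The store width comes from [auxint], so the rewrite no longer
// consults the (possibly already rewritten) type of the value:
(Store [2] ptr val mem) -> (MOVWstore ptr val mem)
(Store [4] ptr val mem) -> (MOVLstore ptr val mem)
(Store [8] ptr val mem) -> (MOVQstore ptr val mem)
```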
[dev.ssa] cmd/compile: make failed nil checks panic
Introduce pseudo-ops PanicMem and LoweredPanicMem.
PanicMem could be rewritten directly into MOVL
during lowering, but then we couldn't log nil checks.
With this change, runnable nil check tests pass:
GOSSAPKG=main go run run.go -- nil*.go
Compiler output nil check tests fail:
GOSSAPKG=p go run run.go -- nil*.go
This is due to several factors:
* SSA has improved elimination of unnecessary nil checks.
* SSA is missing elimination of implicit nil checks.
* SSA is missing extra logging about why nil checks were removed.
I'm not sure how best to resolve these failures,
particularly in a world in which the two backends
will live side by side for some time.
For now, punt on the problem.
Change-Id: Ib2ca6824551671f92e0e1800b036f5ca0905e2a3
Reviewed-on: https://go-review.googlesource.com/13474
Reviewed-by: Keith Randall <khr@golang.org>
[dev.ssa] cmd/compile: fix function call memory accounting
We were not recording function calls as
changing the state of memory.
As a result, the scheduler was not aware that
storing values to the stack in order to make a
function call must happen *after* retrieving
results from the stack from a just-completed
function call.
This fixes the container/ring tests.
This was my first experience debugging an issue
using the HTML output. I'm feeling quite
pleased with it.
Change-Id: I9e8276846be9fd7a60422911b11816c5175e3d0a
Reviewed-on: https://go-review.googlesource.com/13560
Reviewed-by: Keith Randall <khr@golang.org>
Todd Neal [Fri, 7 Aug 2015 01:13:27 +0000 (20:13 -0500)]
[dev.ssa] cmd/compile/ssa: don't nil check phis with non-nil arguments
Move the known-non-nil scan outside the work loop to resolve an issue
with values that were declared outside the block being operated on.
Also consider phis whose arguments are all non-nil, as non-nil.
Change-Id: I4d5b840042de9eb181f2cb918f36913fb5d517a2
Reviewed-on: https://go-review.googlesource.com/13441
Reviewed-by: Keith Randall <khr@golang.org>
[dev.ssa] cmd/compile: detect rewrite loops of length > 1
Use a version of Floyd's cycle finding algorithm,
but advance by 1 and 1/2 steps per cycle rather
than by 1 and 2. It is simpler and should be cheaper
in the normal, acyclic case.
This should fix the 386 and arm builds,
which are currently hung.
Change-Id: If8bd443011b28a5ecb004a549239991d3dfc862b
Reviewed-on: https://go-review.googlesource.com/13473
Reviewed-by: Keith Randall <khr@golang.org>
Keith Randall [Mon, 10 Aug 2015 18:10:53 +0000 (11:10 -0700)]
[dev.ssa] cmd/compile/internal/ssa: enforce load-store ordering in scheduler
We must make sure that all loads that use a store are scheduled
before the next store. Add additional dependency edges to the
value graph to enforce this constraint.
Keith Randall [Mon, 3 Aug 2015 19:33:03 +0000 (12:33 -0700)]
[dev.ssa] cmd/compile/internal/ssa: Fix scheduler
The DFS scheduler doesn't do the right thing. If a Value x is used by
more than one other Value, then x is put into the DFS queue when
its first user (call it y) is visited. It is not removed and reinserted
when the second user of x (call it z) is visited, so the dependency
between x and z is not respected. There is no easy way to fix this with
the DFS queue because we'd have to rip values out of the middle of the
DFS queue.
The new scheduler works from the end of the block backwards, scheduling
instructions which have had all of their uses already scheduled.
A simple priority scheme breaks ties between multiple instructions that
are ready to schedule simultaneously.
Keep track of whether we've scheduled or not, and make print() use
the scheduled order if we have.
Fix some shift tests that this change tickles. Add unsigned right shift tests.
Reworks nilcheck to be performed by a depth first traversal of the
dominator tree, keeping an updated map of the values that have been
nil-checked during the traversal.
[dev.ssa] cmd/compile: use Copy instead of ConvNop
The existing backend simply elides OCONVNOP.
There's no reason for us to do any differently.
Rather than insert ConvNops and then rewrite them
away, stop creating them in the first place.
Change-Id: I4bcbe2229fcebd189ae18df24f2c612feb6e215e
Reviewed-on: https://go-review.googlesource.com/12810
Reviewed-by: Keith Randall <khr@golang.org>
runtime/cgo: fix darwin/amd64 signal handling setup
Was not allocating space for the frame above sigpanic,
nor was it pushing the LR into the right place.
Because traceback past sigpanic only needs the
LR for faulting leaves, this was not noticed too much.
But it did break the sync/atomic nil deref tests.
Change-Id: Icba53fffa193423aab744c37f21ee893ce2ee3ac
Reviewed-on: https://go-review.googlesource.com/12926
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Convert shift ops to also encode the size of the shift amount.
Change signed right shift from using CMOV to using bit twiddles.
It is a little bit better (5 instructions instead of 4, but fewer
bytes and slightly faster code). It's also a bit faster than
the 4-instruction branch version, even with a very predictable
branch. As tested on my machine, YMMV.
David Chase [Thu, 30 Jul 2015 16:31:18 +0000 (12:31 -0400)]
cmd/compile: add case for ODOTTYPE to escwalk
ODOTTYPE should be treated a whole lot like ODOT,
but it was missing completely from the switch in
escwalk and thus escape status did not propagate
to fields.
Since interfaces are required to trigger this bug,
the test was added to escape_iface.go.
runtime: change arm software div/mod call sequence not to modify stack
Instead of pushing the denominator argument on the stack,
the denominator is now passed in m.
This fixes a variety of bugs related to trying to take stack traces
backwards from the middle of the software div/mod routines.
Some of those bugs have been kludged around in the past,
but others have not. Instead of trying to patch up after breaking
the stack, this CL stops breaking the stack.
This is an update of https://golang.org/cl/19810043,
which was rolled back in https://golang.org/cl/20350043.
The problem in the original CL was that there were divisions
at bad times, when m was not available. These were divisions
by constant denominators, either in C code or in assembly.
The Go compiler knows how to generate division by multiplication
for constant denominators, but the C compiler did not.
There is no longer any C code, so that's taken care of.
There was one problematic DIV in runtime.usleep (assembly)
but https://golang.org/cl/12898 took care of that one.
So now this approach is safe.
Reject DIV/MOD in NOSPLIT functions to keep them from
coming back.
Fixes #6681.
Fixes #6699.
Fixes #10486.
Change-Id: I09a13c76ad08ba75b3bd5d46a3eb78e66a84ab38
Reviewed-on: https://go-review.googlesource.com/12899
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Ian Lance Taylor [Thu, 30 Jul 2015 05:04:09 +0000 (22:04 -0700)]
cmd/cgo: discard trailing zero-sized fields in a non-empty C struct
In order to fix issue #9401 the compiler was changed to add a padding
byte to any non-empty Go struct that ends in a zero-sized field. That
causes the Go version of such a C struct to have a different size than
the C struct, which can cause considerable confusion. Change cgo so that it
discards any such zero-sized fields, so that the Go and C structs are
the same size.
This is a change from previous releases, in that it used to be
possible to refer to a zero-sized trailing field (by taking its
address), and with this change it no longer is. That is unfortunate,
but something has to change. It seems better to visibly break
programs that do this rather than to silently break programs that rely
on the struct sizes being the same.
runtime: replace divide with multiply in runtime.usleep on arm
We want to adjust the DIV calling convention to use m,
and usleep can be called without an m, so switch to a
multiplication by the reciprocal (and test).
Step toward a fix for #6699 and #10486.
Change-Id: Iccf76a18432d835e48ec64a2fa34a0e4d6d4b955
Reviewed-on: https://go-review.googlesource.com/12898
Reviewed-by: Ian Lance Taylor <iant@golang.org>
cmd/internal/obj/arm: fix line numbers after constant pool
If a function is large enough to need to flush the constant pool
mid-function, the line number assignment code was forcing the
line numbers not just for the constant pool but for all the instructions
that follow it. This made the line number information completely
wrong for all but the beginning of large functions on arm.
Same problem in code copied into arm64.
This broke runtime/trace's TestTraceSymbolize.
Fixes arm build.
Change-Id: I84d9fb2c798c4085f69b68dc766ab4800c7a6ca4
Reviewed-on: https://go-review.googlesource.com/12894
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
This allows running a cross-compile like
GOOS=darwin GOARCH=arm go build std
to check that everything builds.
Otherwise there is a redefinition error because both
root_nocgo_darwin.go and root_darwin_armx.go
supply initSystemRoots.
Change-Id: Ic95976b2b698d28c629bfc93d8dac0048b023578
Reviewed-on: https://go-review.googlesource.com/12897
Reviewed-by: Ian Lance Taylor <iant@golang.org>
net: allow longer timeout in dialClosedPort test on windows
The test expects the dial to take 1.0 seconds
on Windows and allows it to go to 1.095 seconds.
That's far too optimistic.
Recent failures are reporting roughly 1.2 seconds.
Let it have 1.5.
Change-Id: Id69811ccb65bf4b4c159301a2b4767deb6ee8d28
Reviewed-on: https://go-review.googlesource.com/12895
Reviewed-by: Ian Lance Taylor <iant@golang.org>
math/rand: warn against using package for security-sensitive work
Urge users of math/rand to consider using crypto/rand when doing
security-sensitive work.
Related to issue #11871. While we haven't reached consensus on how
to make the package inherently safer, everyone agrees that the docs
for math/rand can be improved.
Change-Id: I576a312e51b2a3445691da6b277c7b4717173197
Reviewed-on: https://go-review.googlesource.com/12900
Reviewed-by: Rob Pike <r@golang.org>
cmd/compile: fix uninitialized memory during type switch assertE2I2
Fixes arm64 builder crash.
The bug is possible on all architectures; you just have to get lucky
and hit a preemption or a stack growth on entry to assertE2I2.
The test stacks the deck.
Change-Id: I8419da909b06249b1ad15830cbb64e386b6aa5f6
Reviewed-on: https://go-review.googlesource.com/12890
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
The skips added in CL 12579, based on incorrect time stamps,
should be sufficient to identify and exclude all the time-related
flakiness on these systems.
runtime/trace: record event sequence numbers explicitly
Nearly all the flaky failures we've seen in trace tests have been
due to the use of time stamps to determine relative event ordering.
This is tricky for many reasons, including:
- different cores might not have exactly synchronized clocks
- VMs are worse than real hardware
- non-x86 chips have different timer resolution than x86 chips
- on fast systems two events can end up with the same time stamp
Stop trying to make time reliable. It's clearly not going to be for Go 1.5.
Instead, record an explicit event sequence number for ordering.
Using our own counter solves all of the above problems.
The trace still contains time stamps, of course. The sequence number
is just used for ordering.
Should alleviate #10554 somewhat. Then tickDiv can be chosen to
be a useful time unit instead of having to be exact for ordering.
Separating ordering and time stamps lets the trace parser diagnose
systems where the time stamp order and actual order do not match
for one reason or another. This CL adds that check to the end of
trace.Parse, after all other sequence order-based checking.
If that error is found, we skip the test instead of failing it.
Putting the check in trace.Parse means that cmd/trace will pick
up the same check, refusing to display a trace where the time stamps
do not match actual ordering.
Using net/http's BenchmarkClientServerParallel4 on various CPU counts,
not tracing vs tracing:
Keith Randall [Tue, 28 Jul 2015 23:04:50 +0000 (16:04 -0700)]
[dev.ssa] cmd/compile/internal/ssa: implement lots of small (<8-byte) ops.
Lots and lots of ops!
Also XOR for good measure.
Add a pass to the compiler generator to check that all of the
architecture-specific opcodes are handled by genValue. We will
catch any missing ones if we come across them during compilation,
but probably better to catch them statically.