Cypherpunks repositories - gostls13.git/log

runtime: add TestSizeof

Borrowed from cmd/compile, TestSizeof ensures
that the size of important types doesn't change unexpectedly.
It also helps reviewers see the impact of intended changes.

Change-Id: If57955f0c3e66054de3f40c6bba585b88694c7be
Reviewed-on: https://go-review.googlesource.com/99837
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

net: optimize IP.String for IPv4

This is optimization is only for IPv4. It allocates a result buffer and
writes the IPv4 octets as dotted decimal into it before converting
it to a string just once, reducing allocations.

Benchmark shows performance improvement:

name             old time/op    new time/op    delta
IPString/IPv4-8     284ns ± 4%     144ns ± 6%  -49.35%  (p=0.000 n=19+17)
IPString/IPv6-8    1.34µs ± 5%    1.14µs ± 5%  -14.37%  (p=0.000 n=19+20)

name             old alloc/op   new alloc/op   delta
IPString/IPv4-8     24.0B ± 0%     16.0B ± 0%  -33.33%  (p=0.000 n=20+20)
IPString/IPv6-8      232B ± 0%      224B ± 0%   -3.45%  (p=0.000 n=20+20)

name             old allocs/op  new allocs/op  delta
IPString/IPv4-8      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=20+20)
IPString/IPv6-8      12.0 ± 0%      11.0 ± 0%   -8.33%  (p=0.000 n=20+20)

Fixes #24306

Change-Id: I4e2d30d364e78183d55a42907d277744494b6df3
Reviewed-on: https://go-review.googlesource.com/99395
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

test/codegen: port MULs merging tests to codegen

And delete them from asm_go.

Change-Id: I0057cbd90ca55fa51c596e32406e190f3866f93e
Reviewed-on: https://go-review.googlesource.com/99815
Reviewed-by: Keith Randall <khr@golang.org>

runtime: fix comment for hwcap on linux/arm

hwcap is set in archauxv, setup_auxv no longer exists.

Change-Id: I0fc9393e0c1c45192e0eff4715e9bdd69fab2653
Reviewed-on: https://go-review.googlesource.com/99779
Reviewed-by: Ian Lance Taylor <iant@golang.org>

go/internal/gcimporter: simplify defer

Directly use rc.Close instead of wrapping it with a closure.

Change-Id: I3dc1c21ccbfe031c230b035126d5ea3bc62055c3
Reviewed-on: https://go-review.googlesource.com/99716
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

cmd/go: briefly document test caching in go test -h output

Fixes #23971

Change-Id: I073f278cc058aa15a23c0ea06292c02d50a3df21
Reviewed-on: https://go-review.googlesource.com/95582
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rob Pike <r@golang.org>

test/codegen: port math/bits.RotateLeft tests to codegen

Only RotateLeft{64,32} were tested, and just for ppc64. This CL adds
tests for RotateLeft{64,32,16,8} on arm64 and amd64/386, for the cases
where the calls are actually instrinsified.

RotateLeft tests (the last ones for math/bits functions) are deleted
from asm_test.

This CL also adds a space between the "//" and the arch name in the
comments, to uniform this file to the style used in all the other
files.

Change-Id: Ifc2a27261d70bcc294b4ec64490d8367f62d2b89
Reviewed-on: https://go-review.googlesource.com/99596
Reviewed-by: Giovanni Bajo <rasky@develer.com>

cmd/trace: set cname for span slices

Define a set of color names available in trace viewer

https://user-images.githubusercontent.com/4999471/37063995-5d0bad48-2169-11e8-92be-9cb363e21c38.png

Change-Id: I312fcbc5430d7512b4c39ddc79a769259bad8c22
Reviewed-on: https://go-review.googlesource.com/99055
Reviewed-by: Heschi Kreinick <heschi@google.com>

cmd/trace: remove unrelated arrows in task-oriented traceview

Also grey out instants that represent events occurred outside the
task's span. Furthermore, if the unrelated instants represent user
annotation events but not for the task of the interest, skip rendering
completely.

This helps users to focus on the task-related events better.

UI screen shot:
https://gist.github.com/hyangah/1df5d2c8f429fd933c481e9636b89b55#file-golang-org_cl_99035

Change-Id: I2b5aef41584c827f8c1e915d0d8e5c95fe2b4b65
Reviewed-on: https://go-review.googlesource.com/99035
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>

go/printer: simplify handling of line directives

Strangely enough, the existing implementation used adjusted (by line
directives) source positions to determine layout and thus required
position corrections when printing a line directive.

Instead, just use the unadjusted, absolute source positions and then
printing a line directive doesn't require any adjustments, only some
care to make sure it remains in column 1 as before.

The new code doesn't need to parse line directives anymore and simply
ensures that comments with the //line prefix and starting in column 1
remain in that position. That is a slight change from the old behavior
(which ignored incorrect line directives, e.g. because they had an
invalid line number) but unlikely to show up in real code.

This is prep work for handling of line directives that also specify
columns (which now won't require much special handling anymore).

For #24143.

Change-Id: I07eb2e1b35b37337e632e3dbf5b70c783c615f8a
Reviewed-on: https://go-review.googlesource.com/99621
Reviewed-by: Matthew Dempsky <mdempsky@google.com>

encoding/csv: disallow quote for use as Comma

'"' has special semantic meaning that conflicts with using it as Comma.

Change-Id: Ife25ba43ca25dba2ea184c1bb7579a230d376059
Reviewed-on: https://go-review.googlesource.com/99696
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

runtime: explain and enforce that _panic values live on the stack

It's a bit mysterious that _defer.sp is a uintptr that gets
stack-adjusted explicitly while _panic.argp is an unsafe.Pointer that
doesn't, but turns out to be critically important when a deferred
function grows the stack before doing a recover.

Add a comment explaining that this works because _panic values live on
the stack. Enforce this by marking _panic go:notinheap.

Change-Id: I9ca49e84ee1f86d881552c55dccd0662b530836b
Reviewed-on: https://go-review.googlesource.com/99735
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>

runtime: ensure abort actually crashes the process

On all non-x86 arches, runtime.abort simply reads from nil.
Unfortunately, if this happens on a user stack, the signal handler
will dutifully turn this into a panicmem, which lets user defers run
and which user code can even recover from.

To fix this, add an explicit check to the signal handler that turns
faults in abort into hard crashes directly in the signal handler. This
has the added benefit of giving a register dump at the abort point.

Change-Id: If26a7f13790745ee3867db7f53b72d8281176d70
Reviewed-on: https://go-review.googlesource.com/93661
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

runtime: call abort instead of raw INT $3 or bad MOV

Everything except for amd64, amd64p32, and 386 currently defines and
uses an abort function. This CL makes these match. The next CL will
recognize the abort function to make this more useful.

Change-Id: I7c155871ea48919a9220417df0630005b444f488
Reviewed-on: https://go-review.googlesource.com/93660
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

runtime: make throw safer to call

Currently, throw may grow the stack, which means whenever we call it
from a context where it's not safe to grow the stack, we first have to
switch to the system stack. This is pretty easy to get wrong.

Fix this by making throw switch to the system stack so it doesn't grow
the stack and is hence safe to call without a system stack switch at
the call site.

The only thing this complicates is badsystemstack itself, which would
now go into an infinite loop before printing anything (previously it
would also go into an infinite loop, but would at least print the
error first). Fix this by making badsystemstack do a direct write and
then crash hard.

Change-Id: Ic5b4a610df265e47962dcfa341cabac03c31c049
Reviewed-on: https://go-review.googlesource.com/93659
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

runtime: move unrecoverable panic handling to the system stack

Currently parts of unrecoverable panic handling (notably, printing
panic messages) can happen on the user stack. This may grow the stack,
which is generally fine, but if we're handling a runtime panic, it's
better to do as little as possible in case the runtime is in an
inconsistent state.

Hence, this commit rearranges the handling of unrecoverable panics so
that it's done entirely on the system stack.

This is mostly a matter of shuffling code a bit so everything can move
into a systemstack block. The one slight subtlety is in the "panic
during panic" case, where we now depend on startpanic_m's caller to
print the stack rather than startpanic_m itself. To make this work,
startpanic_m now returns a boolean indicating that the caller should
avoid trying to print any panic messages and get right to the stack
trace. Since the caller is already in a position to do this, this
actually simplifies things a little.

Change-Id: Id72febe8c0a9fb31d9369b600a1816d65a49bfed
Reviewed-on: https://go-review.googlesource.com/93658
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

cmd/compile: simplify OpSlicemask optimization

The previous CL introduced isConstDelta. Use it to simplify the
OpSlicemask optimization in the prove pass. This passes toolstash
-cmp.

Change-Id: If2aa762db4cdc0cd1c581a536340530a9831081b
Reviewed-on: https://go-review.googlesource.com/87481
Reviewed-by: Keith Randall <khr@golang.org>

cmd/compile: add fence-post implications to prove

This adds four new deductions to the prove pass, all related to adding
or subtracting one from a value. This is the first hint of actual
arithmetic relations in the prove pass.

The most effective of these is

   x-1 >= w && x > min  ⇒  x > w

This helps eliminate bounds checks in code like

  if x > 0 {
    // do something with s[x-1]
  }

Altogether, these deductions prove an additional 260 branches in std
and cmd. Furthermore, they will let us eliminate some tricky
compiler-inserted panics in the runtime that are interfering with
static analysis.

Fixes #23354.

Change-Id: I7088223e0e0cd6ff062a75c127eb4bb60e6dce02
Reviewed-on: https://go-review.googlesource.com/87480
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>

cmd/compile: derive unsigned limits from signed limits in prove

This adds a few simple deductions to the prove pass' fact table to
derive unsigned concrete limits from signed concrete limits where
possible.

This tweak lets the pass prove 70 additional branch conditions in std
and cmd.

This is based on a comment from the recently-deleted factsTable.get:
"// TODO: also use signed data if lim.min >= 0".

Change-Id: Ib4340249e7733070f004a0aa31254adf5df8a392
Reviewed-on: https://go-review.googlesource.com/87479
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
Reviewed-by: Keith Randall <khr@golang.org>

cmd/compile: make prove pass use unsatisfiability

Currently the prove pass uses implication queries. For each block, it
collects the set of branch conditions leading to that block, and
queries this fact table for whether any of these facts imply the
block's own branch condition (or its inverse). This works remarkably
well considering it doesn't do any deduction on these facts, but it
has various downsides:

1. It requires an implementation both of adding facts to the table and
   determining implications. These are very nearly duals of each
   other, but require separate implementations. Likewise, the process
   of asserting facts of dominating branch conditions is very nearly
   the dual of the process of querying implied branch conditions.

2. It leads to less effective use of derived facts. For example, the
   prove pass currently derives facts about the relations between len
   and cap, but can't make use of these unless a branch condition is
   in the exact form of a derived fact. If one of these derived facts
   contradicts another fact, it won't notice or make use of this.

This CL changes the approach of the prove pass to instead use
*contradiction* instead of implication. Rather than ever querying a
branch condition, it simply adds branch conditions to the fact table.
If this leads to a contradiction (specifically, it makes the fact set
unsatisfiable), that branch is impossible and can be cut. As a result,

1. We can eliminate the code for determining implications
   (factsTable.get disappears entirely). Also, there is now a single
   implementation of visiting and asserting branch conditions, since
   we don't have to flip them around to treat them as facts in one
   place and queries in another.

2. Derived facts can be used effectively. It doesn't matter *why* the
   fact table is unsatisfiable; a contradiction in any of the facts is
   enough.

3. As an added benefit, it's now quite easy to avoid traversing beyond
   provably-unreachable blocks. In contrast, the current
   implementation always visits all blocks.

The prove pass already has nearly all of the mechanism necessary to
compute unsatisfiability, which means this both simplifies the code
and makes it more powerful.

The only complication is that the current implication procedure has a
hack for dealing with the 0 <= Args[0] condition of OpIsInBounds and
OpIsSliceInBounds. We replace this with asserting the appropriate fact
when we process one of these conditions. This seems much cleaner
anyway, and works because we can now take advantage of derived facts.

This has no measurable effect on compiler performance.

Effectiveness:

There is exactly one condition in all of std and cmd that this fails
to prove that the old implementation could: (int64(^uint(0)>>1) < x)
in encoding/gob. This can never be true because x is an int, and it's
basically coincidence that the old code gets this. (For example, it
fails to prove the similar (x < ^int64(^uint(0)>>1)) condition that
immediately precedes it, and even though the conditions are logically
unrelated, it wouldn't get the second one if it hadn't first processed
the first!)

It does, however, prove a few dozen additional branches. These come
from facts that are added to the fact table about the relations
between len and cap. These were almost never queried directly before,
but could lead to contradictions, which the unsat-based approach is
able to use.

There are exactly two branches in std and cmd that this implementation
proves in the *other* direction. This sounds scary, but is okay
because both occur in already-unreachable blocks, so it doesn't matter
what we chose. Because the fact table logic is sound but incomplete,
it fails to prove that the block isn't reachable, even though it is
able to prove that both outgoing branches are impossible. We could
turn these blocks into BlockExit blocks, but it doesn't seem worth the
trouble of the extra proof effort for something that happens twice in
all of std and cmd.

Tests:

This CL updates test/prove.go to change the expected messages because
it can no longer give a "reason" why it proved or disproved a
condition. It also adds a new test of a branch it couldn't prove
before.

It mostly guts test/sliceopt.go, removing everything related to slice
bounds optimizations and moving a few relevant tests to test/prove.go.
Much of this test is actually unreachable. The new prove pass figures
this out and doesn't try to prove anything about the unreachable
parts. The output on the unreachable parts is already suspect because
anything can be proved at that point, so it's really just a regression
test for an algorithm the compiler no longer uses.

This is a step toward fixing #23354. That issue is quite easy to fix
once we can use derived facts effectively.

Change-Id: Ia48a1b9ee081310579fe474e4a61857424ff8ce8
Reviewed-on: https://go-review.googlesource.com/87478
Reviewed-by: Keith Randall <khr@golang.org>

cmd/compile: simplify limit logic in prove

This replaces the open-coded intersection of limits in the prove pass
with a general limit intersection operation. This should get identical
results except in one case where it's more precise: when handling an
equality relation, if the value is *outside* the existing range, this
will reduce the range to empty rather than resetting it. This will be
important to a follow-up CL where we can take advantage of empty
ranges.

For #23354.

Change-Id: I3d3d75924f61b1da1cb604b3a9d189b26fb3a14e
Reviewed-on: https://go-review.googlesource.com/87477
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>

cmd/compile: more String methods for prove types

These aid in debugging.

Change-Id: Ieb38c996765f780f6103f8c3292639d408c25123
Reviewed-on: https://go-review.googlesource.com/87476
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

cmd/compile: minor comment improvements/corrections

Change-Id: Ie0934f1528d58d4971cdef726d3e2d23cf3935d3
Reviewed-on: https://go-review.googlesource.com/87475
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>

Revert "cmd/compile: cleanup nodpc and nodfp"

This reverts commit dcac984b97470c4f047f0d3d87b0af40f5246ed2.

Reason for revert: broke LR architectures (arm64, ppc64, s390x)

Change-Id: I531d311c9053e81503c8c78d6cf044b318fc828b
Reviewed-on: https://go-review.googlesource.com/99695
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

math/big: allocate less in Float.Sqrt

The Newton sqrtInverse procedure we use to compute Float.Sqrt should
not allocate a number of times proportional to the number of Newton
iterations we need to reach the desired precision.

At the beginning the function the target precision is known, so even
if we do want to perform the early steps at low precisions (to save
time), it's still possible to pre-allocate larger backing arrays, both
for the temp variables in the loop and the variable that'll hold the
final result.

There's one complication. At the following line:

  u.Sub(three, u)

the Sub method will allocate, because the receiver aliases one of the
arguments, and the large backing array we initially allocated for u
will be replaced by a smaller one allocated by Sub. We can work around
this by introducing a second temp variable u2 that we use to hold the
Sub call result.

Overall, the sqrtInverse procedure still allocates a number of times
proportional to the number of Newton steps, because unfortunately a
few of the Mul calls in the Newton function allocate; but at least we
allocate less in the function itself.

FloatSqrt/256-4        1.97µs ± 1%    1.84µs ± 1%   -6.61%  (p=0.000 n=8+8)
FloatSqrt/1000-4       4.80µs ± 3%    4.28µs ± 1%  -10.78%  (p=0.000 n=8+8)
FloatSqrt/10000-4      40.0µs ± 1%    38.3µs ± 1%   -4.15%  (p=0.000 n=8+8)
FloatSqrt/100000-4      955µs ± 1%     932µs ± 0%   -2.49%  (p=0.000 n=8+7)
FloatSqrt/1000000-4    79.8ms ± 1%    79.4ms ± 1%     ~     (p=0.105 n=8+8)

name                 old alloc/op   new alloc/op   delta
FloatSqrt/256-4          816B ± 0%      512B ± 0%  -37.25%  (p=0.000 n=8+8)
FloatSqrt/1000-4       2.50kB ± 0%    1.47kB ± 0%  -41.03%  (p=0.000 n=8+8)
FloatSqrt/10000-4      23.5kB ± 0%    18.2kB ± 0%  -22.62%  (p=0.000 n=8+8)
FloatSqrt/100000-4      251kB ± 0%     173kB ± 0%  -31.26%  (p=0.000 n=8+8)
FloatSqrt/1000000-4    4.61MB ± 0%    2.86MB ± 0%  -37.90%  (p=0.000 n=8+8)

name                 old allocs/op  new allocs/op  delta
FloatSqrt/256-4          12.0 ± 0%       8.0 ± 0%  -33.33%  (p=0.000 n=8+8)
FloatSqrt/1000-4         19.0 ± 0%       9.0 ± 0%  -52.63%  (p=0.000 n=8+8)
FloatSqrt/10000-4        35.0 ± 0%      14.0 ± 0%  -60.00%  (p=0.000 n=8+8)
FloatSqrt/100000-4       55.0 ± 0%      23.0 ± 0%  -58.18%  (p=0.000 n=8+8)
FloatSqrt/1000000-4       122 ± 0%        75 ± 0%  -38.52%  (p=0.000 n=8+8)

Change-Id: I950dbf61a40267a6cca82ae72524c3024bcb149c
Reviewed-on: https://go-review.googlesource.com/87659
Reviewed-by: Robert Griesemer <gri@golang.org>

math/big: speedup nat.setBytes for bigger slices

Set up to _S (number of bytes in Uint) bytes at time
by using BigEndian.Uint32 and BigEndian.Uint64.

The performance improves for slices bigger than _S bytes.
This is the case for 128/256bit arith that initializes
it's objects from bytes.

name               old time/op  new time/op  delta
NatSetBytes/8-4    29.8ns ± 1%  11.4ns ± 0%  -61.63%  (p=0.000 n=9+8)
NatSetBytes/24-4    109ns ± 1%    56ns ± 0%  -48.75%  (p=0.000 n=9+8)
NatSetBytes/128-4   420ns ± 2%   110ns ± 1%  -73.83%  (p=0.000 n=10+10)
NatSetBytes/7-4    26.2ns ± 1%  21.3ns ± 2%  -18.63%  (p=0.000 n=8+9)
NatSetBytes/23-4    106ns ± 1%    67ns ± 1%  -36.93%  (p=0.000 n=9+10)
NatSetBytes/127-4   410ns ± 2%   121ns ± 0%  -70.46%  (p=0.000 n=9+8)

Found this optimization opportunity by looking at ethereum_corevm
community benchmark cpuprofile.

name        old time/op  new time/op  delta
OpDiv256-4   715ns ± 1%   596ns ± 1%  -16.57%  (p=0.008 n=5+5)
OpDiv128-4   373ns ± 1%   314ns ± 1%  -15.83%  (p=0.008 n=5+5)
OpDiv64-4    301ns ± 0%   285ns ± 1%   -5.12%  (p=0.008 n=5+5)

Change-Id: I8e5a680ae6284c8233d8d7431d51253a8a740b57
Reviewed-on: https://go-review.googlesource.com/98775
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

cmd/compile: cleanup nodpc and nodfp

Instead of creating a new &nodfp expression for every recover() call,
or a new nodpc variable for every function instrumented by the race
detector, this CL introduces two new uintptr-typed pseudo-variables
callerSP and callerPC. These pseudo-variables act just like calls to
the runtime's getcallersp() and getcallerpc() functions.

For consistency, change runtime.gorecover's builtin stub's parameter
type from "*int32" to "uintptr".

Passes toolstash-check, but toolstash-check -race fails because of
register allocator changes.

Change-Id: I985d644653de2dac8b7b03a28829ad04dfd4f358
Reviewed-on: https://go-review.googlesource.com/99416
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

cmd/compile: remove two out-of-phase calls to walk

All calls to walkstmt/walkexpr/etc should be rooted from funccompile,
whereas transformclosure and fninit are called by main.

Passes toolstash-check.

Change-Id: Ic880e2d2d83af09618ce4daa8e7716f6b389e53e
Reviewed-on: https://go-review.googlesource.com/99418
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

cmd/compile: remove state.exitCode

We're holding onto the function's complete AST anyway, so might as
well grab the exit code from there.

Passes toolstash-check.

Change-Id: I851b5dfdb53f991e9cd9488d25d0d2abc2a8379f
Reviewed-on: https://go-review.googlesource.com/99417
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

cmd/compile: fuse escape analysis parameter tagging loops

Simplifies the code somewhat and allows removing Param.Field.

Passes toolstash-check.

Change-Id: Id854416aea8afd27ce4830ff0f5ff940f7353792
Reviewed-on: https://go-review.googlesource.com/99336
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

net/http: panic when a nil handler is passed to (*ServeMux)HandleFunc

Fixes #24297

Change-Id: I759e88655632fda97dced240b3f13392b2785d0a
Reviewed-on: https://go-review.googlesource.com/99575
Reviewed-by: Andrew Bonventre <andybons@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

time: add support for parsing timezones denoted by sign and offset

IANA Zoneinfo does not provide names for all timezones. Some are denoted
by a sign and an offset only. E.g: Europe/Turkey is currently +03 or
America/La_Paz which is -04 (https://data.iana.org/time-zones/releases/tzdata2018c.tar.gz)

Fixes #24071

Change-Id: I9c230a719945e1263c5b52bab82084d22861be3e
Reviewed-on: https://go-review.googlesource.com/98157
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

runtime: use systemstack around throw in sysSigaction

Try to fix the build on ppc64-linux and ppc64le-linux, avoiding:

--- FAIL: TestInlinedRoutineRecords (2.12s)
dwarf_test.go:97: build: # command-line-arguments
runtime.systemstack: nosplit stack overflow
752 assumed on entry to runtime.sigtrampgo (nosplit)
480 after runtime.sigtrampgo (nosplit) uses 272
400 after runtime.sigfwdgo (nosplit) uses 80
264 after runtime.setsig (nosplit) uses 136
208 after runtime.sigaction (nosplit) uses 56
136 after runtime.sysSigaction (nosplit) uses 72
88 after runtime.throw (nosplit) uses 48
16 after runtime.dopanic (nosplit) uses 72
-16 after runtime.systemstack (nosplit) uses 32

dwarf_test.go:98: build error: exit status 2
--- FAIL: TestAbstractOriginSanity (10.22s)
dwarf_test.go:97: build: # command-line-arguments
runtime.systemstack: nosplit stack overflow
752 assumed on entry to runtime.sigtrampgo (nosplit)
480 after runtime.sigtrampgo (nosplit) uses 272
400 after runtime.sigfwdgo (nosplit) uses 80
264 after runtime.setsig (nosplit) uses 136
208 after runtime.sigaction (nosplit) uses 56
136 after runtime.sysSigaction (nosplit) uses 72
88 after runtime.throw (nosplit) uses 48
16 after runtime.dopanic (nosplit) uses 72
-16 after runtime.systemstack (nosplit) uses 32

dwarf_test.go:98: build error: exit status 2
FAIL
FAIL cmd/link/internal/ld 13.404s

Change-Id: I4840604adb0e9f68a8d8e24f2f2a1a17d1634a58
Reviewed-on: https://go-review.googlesource.com/99415
Reviewed-by: Austin Clements <austin@google.com>

test/codegen: port 2^n muls tests to codegen harness

And delete them from the asm_test.go file.

Change-Id: I124c8c352299646ec7db0968cdb0fe59a3b5d83d
Reviewed-on: https://go-review.googlesource.com/99475
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>

math/big: optimize addVW and subVW on arm64

The biggest hot spot of the existing implementation is "load" operations, which lead to poor performance.
By unrolling the cycle 4 times and 2 times, and using "LDP", "STP" instructions,
this CL can reduce the "load" cost and improve performance.

Benchmarks:

name                              old time/op    new time/op     delta
AddVV/1-8                           21.5ns ± 0%     21.5ns ± 0%      ~     (all equal)
AddVV/2-8                           13.5ns ± 0%     13.5ns ± 0%      ~     (all equal)
AddVV/3-8                           15.5ns ± 0%     15.5ns ± 0%      ~     (all equal)
AddVV/4-8                           17.5ns ± 0%     17.5ns ± 0%      ~     (all equal)
AddVV/5-8                           19.5ns ± 0%     19.5ns ± 0%      ~     (all equal)
AddVV/10-8                          29.5ns ± 0%     29.5ns ± 0%      ~     (all equal)
AddVV/100-8                          217ns ± 0%      217ns ± 0%      ~     (all equal)
AddVV/1000-8                        2.02µs ± 0%     2.02µs ± 0%      ~     (all equal)
AddVV/10000-8                       20.3µs ± 0%     20.3µs ± 0%      ~     (p=0.603 n=5+5)
AddVV/100000-8                       223µs ± 7%      228µs ± 8%      ~     (p=0.548 n=5+5)
AddVW/1-8                           9.32ns ± 0%     9.26ns ± 0%    -0.64%  (p=0.008 n=5+5)
AddVW/2-8                           19.8ns ± 3%     10.5ns ± 0%   -46.92%  (p=0.008 n=5+5)
AddVW/3-8                           11.5ns ± 0%     11.0ns ± 0%    -4.35%  (p=0.008 n=5+5)
AddVW/4-8                           13.0ns ± 0%     12.0ns ± 0%    -7.69%  (p=0.008 n=5+5)
AddVW/5-8                           14.5ns ± 0%     12.5ns ± 0%   -13.79%  (p=0.008 n=5+5)
AddVW/10-8                          22.0ns ± 0%     15.5ns ± 0%   -29.55%  (p=0.008 n=5+5)
AddVW/100-8                          167ns ± 0%       81ns ± 0%   -51.44%  (p=0.008 n=5+5)
AddVW/1000-8                        1.52µs ± 0%     0.64µs ± 0%   -57.58%  (p=0.008 n=5+5)
AddVW/10000-8                       15.1µs ± 0%      7.2µs ± 0%   -52.55%  (p=0.008 n=5+5)
AddVW/100000-8                       150µs ± 0%       71µs ± 0%   -52.95%  (p=0.008 n=5+5)
SubVW/1-8                           9.32ns ± 0%     9.26ns ± 0%    -0.64%  (p=0.008 n=5+5)
SubVW/2-8                           19.7ns ± 2%     10.5ns ± 0%   -46.70%  (p=0.008 n=5+5)
SubVW/3-8                           11.5ns ± 0%     11.0ns ± 0%    -4.35%  (p=0.008 n=5+5)
SubVW/4-8                           13.0ns ± 0%     12.0ns ± 0%    -7.69%  (p=0.008 n=5+5)
SubVW/5-8                           14.5ns ± 0%     12.5ns ± 0%   -13.79%  (p=0.008 n=5+5)
SubVW/10-8                          22.0ns ± 0%     15.5ns ± 0%   -29.55%  (p=0.008 n=5+5)
SubVW/100-8                          167ns ± 0%       81ns ± 0%   -51.44%  (p=0.008 n=5+5)
SubVW/1000-8                        1.52µs ± 0%     0.64µs ± 0%   -57.58%  (p=0.008 n=5+5)
SubVW/10000-8                       15.1µs ± 0%      7.2µs ± 0%   -52.49%  (p=0.008 n=5+5)
SubVW/100000-8                       150µs ± 0%       71µs ± 0%   -52.91%  (p=0.008 n=5+5)
AddMulVVW/1-8                       32.4ns ± 1%     32.6ns ± 1%      ~     (p=0.119 n=5+5)
AddMulVVW/2-8                       57.0ns ± 0%     57.0ns ± 0%      ~     (p=0.643 n=5+5)
AddMulVVW/3-8                       90.8ns ± 0%     90.7ns ± 0%      ~     (p=0.524 n=5+5)
AddMulVVW/4-8                        118ns ± 0%      118ns ± 1%      ~     (p=1.000 n=4+5)
AddMulVVW/5-8                        144ns ± 1%      144ns ± 0%      ~     (p=0.794 n=5+4)
AddMulVVW/10-8                       294ns ± 1%      296ns ± 0%    +0.48%  (p=0.040 n=5+5)
AddMulVVW/100-8                     2.73µs ± 0%     2.73µs ± 0%      ~     (p=0.278 n=5+5)
AddMulVVW/1000-8                    26.0µs ± 0%     26.5µs ± 0%    +2.14%  (p=0.008 n=5+5)
AddMulVVW/10000-8                    297µs ± 0%      297µs ± 0%    +0.24%  (p=0.008 n=5+5)
AddMulVVW/100000-8                  3.15ms ± 1%     3.13ms ± 0%      ~     (p=0.690 n=5+5)
DecimalConversion-8                  311µs ± 2%      309µs ± 2%      ~     (p=0.310 n=5+5)
FloatString/100-8                   2.55µs ± 2%     2.54µs ± 2%      ~     (p=1.000 n=5+5)
FloatString/1000-8                  58.1µs ± 0%     58.1µs ± 0%      ~     (p=0.151 n=5+5)
FloatString/10000-8                 4.59ms ± 0%     4.59ms ± 0%      ~     (p=0.151 n=5+5)
FloatString/100000-8                 446ms ± 0%      446ms ± 0%    +0.01%  (p=0.016 n=5+5)
FloatAdd/10-8                        183ns ± 0%      183ns ± 0%      ~     (p=0.333 n=4+5)
FloatAdd/100-8                       187ns ± 1%      192ns ± 2%      ~     (p=0.056 n=5+5)
FloatAdd/1000-8                      369ns ± 0%      371ns ± 0%    +0.54%  (p=0.016 n=4+5)
FloatAdd/10000-8                    1.88µs ± 0%     1.88µs ± 0%    -0.14%  (p=0.000 n=4+5)
FloatAdd/100000-8                   17.2µs ± 0%     17.1µs ± 0%    -0.37%  (p=0.008 n=5+5)
FloatSub/10-8                        147ns ± 0%      147ns ± 0%      ~     (all equal)
FloatSub/100-8                       145ns ± 0%      146ns ± 0%      ~     (p=0.238 n=5+4)
FloatSub/1000-8                      241ns ± 0%      241ns ± 0%      ~     (p=0.333 n=5+4)
FloatSub/10000-8                    1.06µs ± 0%     1.06µs ± 0%      ~     (p=0.444 n=5+5)
FloatSub/100000-8                   9.50µs ± 0%     9.48µs ± 0%    -0.14%  (p=0.008 n=5+5)
ParseFloatSmallExp-8                28.4µs ± 2%     28.5µs ± 1%      ~     (p=0.690 n=5+5)
ParseFloatLargeExp-8                 125µs ± 1%      124µs ± 1%      ~     (p=0.095 n=5+5)
GCD10x10/WithoutXY-8                 277ns ± 2%      278ns ± 3%      ~     (p=0.937 n=5+5)
GCD10x10/WithXY-8                   2.08µs ± 3%     2.15µs ± 3%      ~     (p=0.056 n=5+5)
GCD10x100/WithoutXY-8                592ns ± 3%      613ns ± 4%      ~     (p=0.056 n=5+5)
GCD10x100/WithXY-8                  3.40µs ± 2%     3.42µs ± 4%      ~     (p=0.841 n=5+5)
GCD10x1000/WithoutXY-8              1.37µs ± 2%     1.35µs ± 3%      ~     (p=0.460 n=5+5)
GCD10x1000/WithXY-8                 7.34µs ± 2%     7.33µs ± 4%      ~     (p=0.841 n=5+5)
GCD10x10000/WithoutXY-8             8.52µs ± 0%     8.51µs ± 1%      ~     (p=0.421 n=5+5)
GCD10x10000/WithXY-8                27.5µs ± 2%     27.2µs ± 1%      ~     (p=0.151 n=5+5)
GCD10x100000/WithoutXY-8            78.3µs ± 1%     78.5µs ± 1%      ~     (p=0.690 n=5+5)
GCD10x100000/WithXY-8                231µs ± 0%      229µs ± 1%    -1.11%  (p=0.016 n=5+5)
GCD100x100/WithoutXY-8              1.86µs ± 2%     1.86µs ± 2%      ~     (p=0.881 n=5+5)
GCD100x100/WithXY-8                 27.1µs ± 2%     27.2µs ± 1%      ~     (p=0.421 n=5+5)
GCD100x1000/WithoutXY-8             4.44µs ± 2%     4.41µs ± 1%      ~     (p=0.310 n=5+5)
GCD100x1000/WithXY-8                36.3µs ± 1%     36.2µs ± 1%      ~     (p=0.310 n=5+5)
GCD100x10000/WithoutXY-8            22.6µs ± 2%     22.5µs ± 1%      ~     (p=0.690 n=5+5)
GCD100x10000/WithXY-8                145µs ± 1%      145µs ± 1%      ~     (p=1.000 n=5+5)
GCD100x100000/WithoutXY-8            195µs ± 0%      196µs ± 1%      ~     (p=0.548 n=5+5)
GCD100x100000/WithXY-8              1.10ms ± 0%     1.10ms ± 0%    -0.30%  (p=0.016 n=5+5)
GCD1000x1000/WithoutXY-8            25.0µs ± 1%     25.2µs ± 2%      ~     (p=0.222 n=5+5)
GCD1000x1000/WithXY-8                520µs ± 0%      520µs ± 1%      ~     (p=0.151 n=5+5)
GCD1000x10000/WithoutXY-8           57.0µs ± 1%     56.9µs ± 1%      ~     (p=0.690 n=5+5)
GCD1000x10000/WithXY-8              1.21ms ± 0%     1.21ms ± 1%      ~     (p=0.881 n=5+5)
GCD1000x100000/WithoutXY-8           358µs ± 0%      359µs ± 1%      ~     (p=0.548 n=5+5)
GCD1000x100000/WithXY-8             8.73ms ± 0%     8.73ms ± 0%      ~     (p=0.548 n=5+5)
GCD10000x10000/WithoutXY-8           686µs ± 0%      687µs ± 0%      ~     (p=0.548 n=5+5)
GCD10000x10000/WithXY-8             15.9ms ± 0%     15.9ms ± 0%      ~     (p=0.841 n=5+5)
GCD10000x100000/WithoutXY-8         2.08ms ± 0%     2.08ms ± 0%      ~     (p=1.000 n=5+5)
GCD10000x100000/WithXY-8            86.7ms ± 0%     86.7ms ± 0%      ~     (p=1.000 n=5+5)
GCD100000x100000/WithoutXY-8        51.1ms ± 0%     51.0ms ± 0%      ~     (p=0.151 n=5+5)
GCD100000x100000/WithXY-8            1.23s ± 0%      1.23s ± 0%      ~     (p=0.841 n=5+5)
Hilbert-8                           2.41ms ± 1%     2.42ms ± 2%      ~     (p=0.690 n=5+5)
Binomial-8                          4.86µs ± 1%     4.86µs ± 1%      ~     (p=0.889 n=5+5)
QuoRem-8                            7.09µs ± 0%     7.08µs ± 0%    -0.09%  (p=0.024 n=5+5)
Exp-8                                161ms ± 0%      161ms ± 0%    -0.08%  (p=0.032 n=5+5)
Exp2-8                               161ms ± 0%      161ms ± 0%      ~     (p=1.000 n=5+5)
Bitset-8                            40.7ns ± 0%     40.6ns ± 0%      ~     (p=0.095 n=4+5)
BitsetNeg-8                          159ns ± 4%      148ns ± 0%    -6.92%  (p=0.016 n=5+4)
BitsetOrig-8                         378ns ± 1%      378ns ± 1%      ~     (p=0.937 n=5+5)
BitsetNegOrig-8                      647ns ± 5%      647ns ± 4%      ~     (p=1.000 n=5+5)
ModSqrt225_Tonelli-8                7.26ms ± 0%     7.27ms ± 0%      ~     (p=1.000 n=5+5)
ModSqrt224_3Mod4-8                  2.24ms ± 0%     2.24ms ± 0%      ~     (p=0.690 n=5+5)
ModSqrt5430_Tonelli-8                62.8s ± 1%      62.5s ± 0%      ~     (p=0.063 n=5+4)
ModSqrt5430_3Mod4-8                  20.8s ± 0%      20.8s ± 0%      ~     (p=0.310 n=5+5)
Sqrt-8                               101µs ± 1%      101µs ± 0%    -0.35%  (p=0.032 n=5+5)
IntSqr/1-8                          32.3ns ± 1%     32.5ns ± 1%      ~     (p=0.421 n=5+5)
IntSqr/2-8                           157ns ± 5%      156ns ± 5%      ~     (p=0.651 n=5+5)
IntSqr/3-8                           292ns ± 2%      291ns ± 3%      ~     (p=0.881 n=5+5)
IntSqr/5-8                           738ns ± 6%      740ns ± 5%      ~     (p=0.841 n=5+5)
IntSqr/8-8                          1.82µs ± 4%     1.83µs ± 4%      ~     (p=0.730 n=5+5)
IntSqr/10-8                         2.92µs ± 1%     2.93µs ± 1%      ~     (p=0.643 n=5+5)
IntSqr/20-8                         6.28µs ± 2%     6.28µs ± 2%      ~     (p=1.000 n=5+5)
IntSqr/30-8                         13.8µs ± 2%     13.9µs ± 3%      ~     (p=1.000 n=5+5)
IntSqr/50-8                         37.8µs ± 4%     37.9µs ± 4%      ~     (p=0.690 n=5+5)
IntSqr/80-8                         95.9µs ± 1%     95.8µs ± 1%      ~     (p=0.841 n=5+5)
IntSqr/100-8                         148µs ± 1%      148µs ± 1%      ~     (p=0.310 n=5+5)
IntSqr/200-8                         586µs ± 1%      586µs ± 1%      ~     (p=0.841 n=5+5)
IntSqr/300-8                        1.32ms ± 0%     1.31ms ± 0%      ~     (p=0.222 n=5+5)
IntSqr/500-8                        2.48ms ± 0%     2.48ms ± 0%      ~     (p=0.556 n=5+4)
IntSqr/800-8                        4.68ms ± 0%     4.68ms ± 0%      ~     (p=0.548 n=5+5)
IntSqr/1000-8                       7.57ms ± 0%     7.56ms ± 0%      ~     (p=0.421 n=5+5)
Mul-8                                311ms ± 0%      311ms ± 0%      ~     (p=0.548 n=5+5)
Exp3Power/0x10-8                     559ns ± 1%      560ns ± 1%      ~     (p=0.984 n=5+5)
Exp3Power/0x40-8                     641ns ± 1%      634ns ± 1%      ~     (p=0.063 n=5+5)
Exp3Power/0x100-8                   1.39µs ± 2%     1.40µs ± 2%      ~     (p=0.381 n=5+5)
Exp3Power/0x400-8                   8.27µs ± 1%     8.26µs ± 0%      ~     (p=0.571 n=5+5)
Exp3Power/0x1000-8                  59.9µs ± 0%     59.7µs ± 0%    -0.23%  (p=0.008 n=5+5)
Exp3Power/0x4000-8                   816µs ± 0%      816µs ± 0%      ~     (p=1.000 n=5+5)
Exp3Power/0x10000-8                 7.77ms ± 0%     7.77ms ± 0%      ~     (p=0.841 n=5+5)
Exp3Power/0x40000-8                 73.4ms ± 0%     73.4ms ± 0%      ~     (p=0.690 n=5+5)
Exp3Power/0x100000-8                 665ms ± 0%      664ms ± 0%    -0.14%  (p=0.008 n=5+5)
Exp3Power/0x400000-8                 5.98s ± 0%      5.98s ± 0%    -0.09%  (p=0.008 n=5+5)
Fibo-8                               116ms ± 0%      116ms ± 0%    -0.25%  (p=0.008 n=5+5)
NatSqr/1-8                           115ns ± 3%      116ns ± 2%      ~     (p=0.238 n=5+5)
NatSqr/2-8                           237ns ± 1%      237ns ± 1%      ~     (p=0.683 n=5+5)
NatSqr/3-8                           367ns ± 3%      368ns ± 3%      ~     (p=0.817 n=5+5)
NatSqr/5-8                           807ns ± 3%      812ns ± 3%      ~     (p=0.913 n=5+5)
NatSqr/8-8                          1.93µs ± 2%     1.93µs ± 3%      ~     (p=0.651 n=5+5)
NatSqr/10-8                         2.98µs ± 2%     2.99µs ± 2%      ~     (p=0.690 n=5+5)
NatSqr/20-8                         6.49µs ± 2%     6.46µs ± 2%      ~     (p=0.548 n=5+5)
NatSqr/30-8                         14.4µs ± 2%     14.3µs ± 2%      ~     (p=0.690 n=5+5)
NatSqr/50-8                         38.6µs ± 2%     38.7µs ± 2%      ~     (p=0.841 n=5+5)
NatSqr/80-8                         96.1µs ± 2%     95.8µs ± 2%      ~     (p=0.548 n=5+5)
NatSqr/100-8                         149µs ± 1%      149µs ± 1%      ~     (p=0.841 n=5+5)
NatSqr/200-8                         593µs ± 1%      590µs ± 1%      ~     (p=0.421 n=5+5)
NatSqr/300-8                        1.32ms ± 0%     1.32ms ± 1%      ~     (p=0.222 n=5+5)
NatSqr/500-8                        2.49ms ± 0%     2.49ms ± 0%      ~     (p=0.690 n=5+5)
NatSqr/800-8                        4.69ms ± 0%     4.69ms ± 0%      ~     (p=1.000 n=5+5)
NatSqr/1000-8                       7.59ms ± 0%     7.58ms ± 0%      ~     (p=0.841 n=5+5)
ScanPi-8                             322µs ± 0%      321µs ± 0%      ~     (p=0.095 n=5+5)
StringPiParallel-8                  71.4µs ± 5%     68.8µs ± 4%      ~     (p=0.151 n=5+5)
Scan/10/Base2-8                     1.10µs ± 0%     1.09µs ± 0%    -0.36%  (p=0.032 n=5+5)
Scan/100/Base2-8                    7.78µs ± 0%     7.79µs ± 0%    +0.14%  (p=0.008 n=5+5)
Scan/1000/Base2-8                   78.8µs ± 0%     79.0µs ± 0%    +0.24%  (p=0.008 n=5+5)
Scan/10000/Base2-8                  1.22ms ± 0%     1.22ms ± 0%      ~     (p=0.056 n=5+5)
Scan/100000/Base2-8                 55.1ms ± 0%     55.0ms ± 0%    -0.15%  (p=0.008 n=5+5)
Scan/10/Base8-8                      514ns ± 0%      515ns ± 0%      ~     (p=0.079 n=5+5)
Scan/100/Base8-8                    2.89µs ± 0%     2.89µs ± 0%    +0.15%  (p=0.008 n=5+5)
Scan/1000/Base8-8                   31.0µs ± 0%     31.1µs ± 0%    +0.12%  (p=0.008 n=5+5)
Scan/10000/Base8-8                   740µs ± 0%      740µs ± 0%      ~     (p=0.222 n=5+5)
Scan/100000/Base8-8                 50.6ms ± 0%     50.5ms ± 0%    -0.06%  (p=0.016 n=4+5)
Scan/10/Base10-8                     492ns ± 1%      490ns ± 1%      ~     (p=0.310 n=5+5)
Scan/100/Base10-8                   2.67µs ± 0%     2.67µs ± 0%      ~     (p=0.056 n=5+5)
Scan/1000/Base10-8                  28.7µs ± 0%     28.7µs ± 0%      ~     (p=1.000 n=5+5)
Scan/10000/Base10-8                  717µs ± 0%      716µs ± 0%      ~     (p=0.222 n=5+5)
Scan/100000/Base10-8                50.2ms ± 0%     50.3ms ± 0%    +0.05%  (p=0.008 n=5+5)
Scan/10/Base16-8                     442ns ± 1%      442ns ± 0%      ~     (p=0.468 n=5+5)
Scan/100/Base16-8                   2.46µs ± 0%     2.45µs ± 0%      ~     (p=0.159 n=5+5)
Scan/1000/Base16-8                  27.2µs ± 0%     27.2µs ± 0%      ~     (p=0.841 n=5+5)
Scan/10000/Base16-8                  721µs ± 0%      722µs ± 0%      ~     (p=0.548 n=5+5)
Scan/100000/Base16-8                52.6ms ± 0%     52.6ms ± 0%    +0.07%  (p=0.008 n=5+5)
String/10/Base2-8                    244ns ± 1%      242ns ± 1%      ~     (p=0.103 n=5+5)
String/100/Base2-8                  1.48µs ± 0%     1.48µs ± 1%      ~     (p=0.786 n=5+5)
String/1000/Base2-8                 13.3µs ± 1%     13.3µs ± 0%      ~     (p=0.222 n=5+5)
String/10000/Base2-8                 132µs ± 1%      132µs ± 1%      ~     (p=1.000 n=5+5)
String/100000/Base2-8               1.30ms ± 1%     1.30ms ± 1%      ~     (p=1.000 n=5+5)
String/10/Base8-8                    167ns ± 1%      168ns ± 1%      ~     (p=0.135 n=5+5)
String/100/Base8-8                   623ns ± 1%      626ns ± 1%      ~     (p=0.151 n=5+5)
String/1000/Base8-8                 5.24µs ± 1%     5.24µs ± 0%      ~     (p=1.000 n=5+5)
String/10000/Base8-8                50.0µs ± 1%     50.0µs ± 1%      ~     (p=1.000 n=5+5)
String/100000/Base8-8                492µs ± 1%      489µs ± 1%      ~     (p=0.056 n=5+5)
String/10/Base10-8                   503ns ± 1%      501ns ± 0%      ~     (p=0.183 n=5+5)
String/100/Base10-8                 1.96µs ± 0%     1.97µs ± 0%      ~     (p=0.389 n=5+5)
String/1000/Base10-8                12.4µs ± 1%     12.4µs ± 1%      ~     (p=0.841 n=5+5)
String/10000/Base10-8               56.7µs ± 1%     56.6µs ± 0%      ~     (p=1.000 n=5+5)
String/100000/Base10-8              25.6ms ± 0%     25.6ms ± 0%      ~     (p=0.222 n=5+5)
String/10/Base16-8                   147ns ± 0%      148ns ± 2%      ~     (p=1.000 n=4+5)
String/100/Base16-8                  505ns ± 0%      505ns ± 1%      ~     (p=0.778 n=5+5)
String/1000/Base16-8                3.94µs ± 0%     3.94µs ± 0%      ~     (p=0.841 n=5+5)
String/10000/Base16-8               37.4µs ± 1%     37.2µs ± 1%      ~     (p=0.095 n=5+5)
String/100000/Base16-8               367µs ± 1%      367µs ± 0%      ~     (p=1.000 n=5+5)
LeafSize/0-8                        6.64ms ± 0%     6.65ms ± 0%      ~     (p=0.690 n=5+5)
LeafSize/1-8                        72.5µs ± 1%     72.4µs ± 1%      ~     (p=0.841 n=5+5)
LeafSize/2-8                        72.6µs ± 1%     72.6µs ± 1%      ~     (p=1.000 n=5+5)
LeafSize/3-8                         377µs ± 0%      377µs ± 0%      ~     (p=0.421 n=5+5)
LeafSize/4-8                        71.2µs ± 1%     71.3µs ± 0%      ~     (p=0.278 n=5+5)
LeafSize/5-8                         469µs ± 0%      469µs ± 0%      ~     (p=0.310 n=5+5)
LeafSize/6-8                         376µs ± 0%      376µs ± 0%      ~     (p=0.841 n=5+5)
LeafSize/7-8                         244µs ± 0%      244µs ± 0%      ~     (p=0.841 n=5+5)
LeafSize/8-8                        71.9µs ± 1%     72.1µs ± 1%      ~     (p=0.548 n=5+5)
LeafSize/9-8                         536µs ± 0%      536µs ± 0%      ~     (p=0.151 n=5+5)
LeafSize/10-8                        470µs ± 0%      471µs ± 0%    +0.10%  (p=0.032 n=5+5)
LeafSize/11-8                        458µs ± 0%      458µs ± 0%      ~     (p=0.881 n=5+5)
LeafSize/12-8                        376µs ± 0%      376µs ± 0%      ~     (p=0.548 n=5+5)
LeafSize/13-8                        341µs ± 0%      342µs ± 0%      ~     (p=0.222 n=5+5)
LeafSize/14-8                        246µs ± 0%      245µs ± 0%      ~     (p=0.167 n=5+5)
LeafSize/15-8                        168µs ± 0%      168µs ± 0%      ~     (p=0.548 n=5+5)
LeafSize/16-8                       72.1µs ± 1%     72.2µs ± 1%      ~     (p=0.690 n=5+5)
LeafSize/32-8                       81.5µs ± 1%     81.4µs ± 1%      ~     (p=1.000 n=5+5)
LeafSize/64-8                        133µs ± 1%      134µs ± 1%      ~     (p=0.690 n=5+5)
ProbablyPrime/n=0-8                 44.3ms ± 0%     44.2ms ± 0%    -0.28%  (p=0.008 n=5+5)
ProbablyPrime/n=1-8                 64.8ms ± 0%     64.7ms ± 0%    -0.15%  (p=0.008 n=5+5)
ProbablyPrime/n=5-8                  147ms ± 0%      147ms ± 0%    -0.11%  (p=0.008 n=5+5)
ProbablyPrime/n=10-8                 250ms ± 0%      250ms ± 0%      ~     (p=0.056 n=5+5)
ProbablyPrime/n=20-8                 456ms ± 0%      455ms ± 0%    -0.05%  (p=0.008 n=5+5)
ProbablyPrime/Lucas-8               23.6ms ± 0%     23.5ms ± 0%    -0.29%  (p=0.008 n=5+5)
ProbablyPrime/MillerRabinBase2-8    20.6ms ± 0%     20.6ms ± 0%      ~     (p=0.690 n=5+5)
FloatSqrt/64-8                      2.01µs ± 1%     2.02µs ± 1%      ~     (p=0.421 n=5+5)
FloatSqrt/128-8                     4.43µs ± 2%     4.38µs ± 2%      ~     (p=0.222 n=5+5)
FloatSqrt/256-8                     6.64µs ± 1%     6.68µs ± 2%      ~     (p=0.516 n=5+5)
FloatSqrt/1000-8                    31.9µs ± 0%     31.8µs ± 0%      ~     (p=0.095 n=5+5)
FloatSqrt/10000-8                    595µs ± 0%      594µs ± 0%      ~     (p=0.056 n=5+5)
FloatSqrt/100000-8                  17.9ms ± 0%     17.9ms ± 0%      ~     (p=0.151 n=5+5)
FloatSqrt/1000000-8                  1.52s ± 0%      1.52s ± 0%      ~     (p=0.841 n=5+5)

name                              old speed      new speed       delta
AddVV/1-8                         2.97GB/s ± 0%   2.97GB/s ± 0%      ~     (p=0.971 n=4+4)
AddVV/2-8                         9.47GB/s ± 0%   9.47GB/s ± 0%    +0.01%  (p=0.016 n=5+5)
AddVV/3-8                         12.4GB/s ± 0%   12.4GB/s ± 0%      ~     (p=0.548 n=5+5)
AddVV/4-8                         14.6GB/s ± 0%   14.6GB/s ± 0%      ~     (p=1.000 n=5+5)
AddVV/5-8                         16.4GB/s ± 0%   16.4GB/s ± 0%      ~     (p=1.000 n=5+5)
AddVV/10-8                        21.7GB/s ± 0%   21.7GB/s ± 0%      ~     (p=0.548 n=5+5)
AddVV/100-8                       29.4GB/s ± 0%   29.4GB/s ± 0%      ~     (p=1.000 n=5+5)
AddVV/1000-8                      31.7GB/s ± 0%   31.7GB/s ± 0%      ~     (p=0.524 n=5+4)
AddVV/10000-8                     31.5GB/s ± 0%   31.5GB/s ± 0%      ~     (p=0.690 n=5+5)
AddVV/100000-8                    28.8GB/s ± 7%   28.1GB/s ± 8%      ~     (p=0.548 n=5+5)
AddVW/1-8                          859MB/s ± 0%    864MB/s ± 0%    +0.61%  (p=0.008 n=5+5)
AddVW/2-8                          809MB/s ± 2%   1520MB/s ± 0%   +87.78%  (p=0.008 n=5+5)
AddVW/3-8                         2.08GB/s ± 0%   2.18GB/s ± 0%    +4.54%  (p=0.008 n=5+5)
AddVW/4-8                         2.46GB/s ± 0%   2.66GB/s ± 0%    +8.33%  (p=0.016 n=4+5)
AddVW/5-8                         2.76GB/s ± 0%   3.20GB/s ± 0%   +16.03%  (p=0.008 n=5+5)
AddVW/10-8                        3.63GB/s ± 0%   5.15GB/s ± 0%   +41.83%  (p=0.008 n=5+5)
AddVW/100-8                       4.79GB/s ± 0%   9.87GB/s ± 0%  +106.12%  (p=0.008 n=5+5)
AddVW/1000-8                      5.27GB/s ± 0%  12.42GB/s ± 0%  +135.74%  (p=0.008 n=5+5)
AddVW/10000-8                     5.31GB/s ± 0%  11.19GB/s ± 0%  +110.71%  (p=0.008 n=5+5)
AddVW/100000-8                    5.32GB/s ± 0%  11.32GB/s ± 0%  +112.56%  (p=0.008 n=5+5)
SubVW/1-8                          859MB/s ± 0%    864MB/s ± 0%    +0.61%  (p=0.008 n=5+5)
SubVW/2-8                          812MB/s ± 2%   1520MB/s ± 0%   +87.09%  (p=0.008 n=5+5)
SubVW/3-8                         2.08GB/s ± 0%   2.18GB/s ± 0%    +4.55%  (p=0.008 n=5+5)
SubVW/4-8                         2.46GB/s ± 0%   2.66GB/s ± 0%    +8.33%  (p=0.008 n=5+5)
SubVW/5-8                         2.75GB/s ± 0%   3.20GB/s ± 0%   +16.03%  (p=0.008 n=5+5)
SubVW/10-8                        3.63GB/s ± 0%   5.15GB/s ± 0%   +41.82%  (p=0.008 n=5+5)
SubVW/100-8                       4.79GB/s ± 0%   9.87GB/s ± 0%  +106.13%  (p=0.008 n=5+5)
SubVW/1000-8                      5.27GB/s ± 0%  12.42GB/s ± 0%  +135.74%  (p=0.008 n=5+5)
SubVW/10000-8                     5.31GB/s ± 0%  11.17GB/s ± 0%  +110.44%  (p=0.008 n=5+5)
SubVW/100000-8                    5.32GB/s ± 0%  11.31GB/s ± 0%  +112.35%  (p=0.008 n=5+5)
AddMulVVW/1-8                     1.97GB/s ± 1%   1.96GB/s ± 1%      ~     (p=0.151 n=5+5)
AddMulVVW/2-8                     2.24GB/s ± 0%   2.25GB/s ± 0%      ~     (p=0.095 n=5+5)
AddMulVVW/3-8                     2.11GB/s ± 0%   2.12GB/s ± 0%      ~     (p=0.548 n=5+5)
AddMulVVW/4-8                     2.17GB/s ± 1%   2.17GB/s ± 1%      ~     (p=0.548 n=5+5)
AddMulVVW/5-8                     2.22GB/s ± 1%   2.21GB/s ± 1%      ~     (p=0.421 n=5+5)
AddMulVVW/10-8                    2.17GB/s ± 1%   2.16GB/s ± 0%      ~     (p=0.095 n=5+5)
AddMulVVW/100-8                   2.35GB/s ± 0%   2.35GB/s ± 0%      ~     (p=0.421 n=5+5)
AddMulVVW/1000-8                  2.47GB/s ± 0%   2.41GB/s ± 0%    -2.09%  (p=0.008 n=5+5)
AddMulVVW/10000-8                 2.16GB/s ± 0%   2.15GB/s ± 0%    -0.23%  (p=0.008 n=5+5)
AddMulVVW/100000-8                2.03GB/s ± 1%   2.04GB/s ± 0%      ~     (p=0.690 n=5+5)

name                              old alloc/op   new alloc/op    delta
FloatString/100-8                     400B ± 0%       400B ± 0%      ~     (all equal)
FloatString/1000-8                  3.22kB ± 0%     3.22kB ± 0%      ~     (all equal)
FloatString/10000-8                 55.6kB ± 0%     55.5kB ± 0%      ~     (p=0.206 n=5+5)
FloatString/100000-8                 627kB ± 0%      627kB ± 0%      ~     (all equal)
FloatAdd/10-8                        0.00B           0.00B           ~     (all equal)
FloatAdd/100-8                       0.00B           0.00B           ~     (all equal)
FloatAdd/1000-8                      0.00B           0.00B           ~     (all equal)
FloatAdd/10000-8                     0.00B           0.00B           ~     (all equal)
FloatAdd/100000-8                    0.00B           0.00B           ~     (all equal)
FloatSub/10-8                        0.00B           0.00B           ~     (all equal)
FloatSub/100-8                       0.00B           0.00B           ~     (all equal)
FloatSub/1000-8                      0.00B           0.00B           ~     (all equal)
FloatSub/10000-8                     0.00B           0.00B           ~     (all equal)
FloatSub/100000-8                    0.00B           0.00B           ~     (all equal)
FloatSqrt/64-8                        416B ± 0%       416B ± 0%      ~     (all equal)
FloatSqrt/128-8                       720B ± 0%       720B ± 0%      ~     (all equal)
FloatSqrt/256-8                       816B ± 0%       816B ± 0%      ~     (all equal)
FloatSqrt/1000-8                    2.50kB ± 0%     2.50kB ± 0%      ~     (all equal)
FloatSqrt/10000-8                   23.5kB ± 0%     23.5kB ± 0%      ~     (all equal)
FloatSqrt/100000-8                   251kB ± 0%      251kB ± 0%      ~     (all equal)
FloatSqrt/1000000-8                 4.61MB ± 0%     4.61MB ± 0%      ~     (all equal)

name                              old allocs/op  new allocs/op   delta
FloatString/100-8                     8.00 ± 0%       8.00 ± 0%      ~     (all equal)
FloatString/1000-8                    10.0 ± 0%       10.0 ± 0%      ~     (all equal)
FloatString/10000-8                   42.0 ± 0%       42.0 ± 0%      ~     (all equal)
FloatString/100000-8                   346 ± 0%        346 ± 0%      ~     (all equal)
FloatAdd/10-8                         0.00            0.00           ~     (all equal)
FloatAdd/100-8                        0.00            0.00           ~     (all equal)
FloatAdd/1000-8                       0.00            0.00           ~     (all equal)
FloatAdd/10000-8                      0.00            0.00           ~     (all equal)
FloatAdd/100000-8                     0.00            0.00           ~     (all equal)
FloatSub/10-8                         0.00            0.00           ~     (all equal)
FloatSub/100-8                        0.00            0.00           ~     (all equal)
FloatSub/1000-8                       0.00            0.00           ~     (all equal)
FloatSub/10000-8                      0.00            0.00           ~     (all equal)
FloatSub/100000-8                     0.00            0.00           ~     (all equal)
FloatSqrt/64-8                        9.00 ± 0%       9.00 ± 0%      ~     (all equal)
FloatSqrt/128-8                       13.0 ± 0%       13.0 ± 0%      ~     (all equal)
FloatSqrt/256-8                       12.0 ± 0%       12.0 ± 0%      ~     (all equal)
FloatSqrt/1000-8                      19.0 ± 0%       19.0 ± 0%      ~     (all equal)
FloatSqrt/10000-8                     35.0 ± 0%       35.0 ± 0%      ~     (all equal)
FloatSqrt/100000-8                    55.0 ± 0%       55.0 ± 0%      ~     (all equal)
FloatSqrt/1000000-8                    122 ± 0%        122 ± 0%      ~     (all equal)

Change-Id: I6888d84c037d91f9e2199f3492ea3f6a0ed77b24
Reviewed-on: https://go-review.googlesource.com/77832
Reviewed-by: Vlad Krasnov <vlad@cloudflare.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

cmd/asm, cmd/internal/obj/ppc64: avoid unnecessary load zeros

When instructions add, and, or, xor, and movd have
constant operands in some cases more instructions are
generated than necessary by the assembler.

This adds more opcode/operand combinations to the optab
and improves the code generation for the cases where the
size and sign of the constant allows the use of 1
instructions instead of 2.

Example of previous code:
oris r3, r0, 0
ori r3, r3, 65533

now:
ori r3, r0, 65533

This does not significantly reduce the overall binary size
because the improvement depends on the constant value.
Some procedures show a 1-2% reduction in size. This improvement
could also be significant in cases where the extra instructions
occur in a critical loop.

Testcase ppc64enc.s was added to cmd/asm/internal/asm/testdata
with the variations affected by this change.

Updates #23845

Change-Id: I7fdf2320c95815d99f2755ba77d0c6921cd7fad7
Reviewed-on: https://go-review.googlesource.com/95135
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>

encoding/csv: avoid mangling invalid UTF-8 in Writer

In the situation where a quoted field is necessary, avoid processing
each UTF-8 rune one-by-one, which causes mangling of invalid sequences
into utf8.RuneError, causing a loss of information.
Instead, search only for the escaped characters, handle those specially
and copy everything else in between verbatim.

This symmetrically matches the behavior of Reader.

Fixes #24298

Change-Id: I9276f64891084ce8487678f663fad711b4095dbb
Reviewed-on: https://go-review.googlesource.com/99297
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

cmd/compile: mark anonymous receiver parameters as non-escaping

This was already done for normal parameters, and the same logic
applies for receiver parameters too.

Updates #24305.

Change-Id: Ia2a46f68d14e8fb62004ff0da1db0f065a95a1b7
Reviewed-on: https://go-review.googlesource.com/99335
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

cmd/cover: don't crash on non-gofmt'ed input

Without the change to cover.go, the new test fails with

panic: overlapping edits: [4946,4950)->"", [4947,4947)->"thisNameMustBeVeryLongToCauseOverflowOfCounterIncrementStatementOntoNextLineForTest.Count[112]++;"

The original code inserts "else{", deletes "else", and then positions
a new block just after the "}" that must come before the "else".
That works on gofmt'ed code, but fails if the code looks like "}else".
When there is no space between the "{" and the "else", the new block
is inserted into a location that we are deleting, leading to the
"overlapping edits" mentioned above.

This CL fixes this case by not deleting the "else" but just using the
one that is already there. That requires adjust the block offset to
come after the "{" that we insert.

Fixes #23927

Change-Id: I40ef592490878765bbce6550ddb439e43ac525b2
Reviewed-on: https://go-review.googlesource.com/98935
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>

runtime: get traceback from VDSO code

Currently if a profiling signal arrives while executing within a VDSO
the profiler will report _ExternalCode, which is needlessly confusing
for a pure Go program. Change the VDSO calling code to record the
caller's PC/SP, so that we can do a traceback from that point. If that
fails for some reason, report _VDSO rather than _ExternalCode, which
should at least point in the right direction.

This adds some instructions to the code that calls the VDSO, but the
slowdown is reasonably negligible:

name                                  old time/op  new time/op  delta
ClockVDSOAndFallbackPaths/vDSO-8      40.5ns ± 2%  41.3ns ± 1%  +1.85%  (p=0.002 n=10+10)
ClockVDSOAndFallbackPaths/Fallback-8  41.9ns ± 1%  43.5ns ± 1%  +3.84%  (p=0.000 n=9+9)
TimeNow-8                             41.5ns ± 3%  41.5ns ± 2%    ~     (p=0.723 n=10+10)

Fixes #24142

Change-Id: Iacd935db3c4c782150b3809aaa675a71799b1c9c
Reviewed-on: https://go-review.googlesource.com/97315
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

runtime: change from rt_sigaction to sigaction

This normalizes the Linux code to act like other targets. The size
argument to the rt_sigaction system call is pushed to a single
function, sysSigaction.

This is intended as a simplification step for CL 93875 for #14327.

Change-Id: I594788e235f0da20e16e8a028e27ac8c883907c4
Reviewed-on: https://go-review.googlesource.com/99077
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

cmd/dist: skip rebuild before running tests when on the build systems

Updates #24300

Change-Id: I7752dab67e15a6dfe5fffe5b5ccbf3373bbc2c13
Reviewed-on: https://go-review.googlesource.com/99296
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

math/big: implement addMulVVW on arm64

The lack of proper addMulVVW implementation for arm64 hurts RSA performance.

This assembly implementation is optimized for arm64 based servers.

name                  old time/op    new time/op     delta
pkg:math/big goos:linux goarch:arm64
AddMulVVW/1             55.2ns ± 0%     11.9ns ± 1%    -78.37%  (p=0.000 n=8+10)
AddMulVVW/2             67.0ns ± 0%     11.2ns ± 0%    -83.28%  (p=0.000 n=7+10)
AddMulVVW/3             93.2ns ± 0%     13.2ns ± 0%    -85.84%  (p=0.000 n=10+10)
AddMulVVW/4              126ns ± 0%       13ns ± 1%    -89.82%  (p=0.000 n=10+10)
AddMulVVW/5              151ns ± 0%       17ns ± 0%    -88.87%  (p=0.000 n=10+9)
AddMulVVW/10             323ns ± 0%       25ns ± 0%    -92.20%  (p=0.000 n=10+10)
AddMulVVW/100           3.28µs ± 0%     0.14µs ± 0%    -95.82%  (p=0.000 n=10+10)
AddMulVVW/1000          31.7µs ± 0%      1.3µs ± 0%    -96.00%  (p=0.000 n=10+8)
AddMulVVW/10000          313µs ± 0%       13µs ± 0%    -95.98%  (p=0.000 n=10+10)
AddMulVVW/100000        3.24ms ± 0%     0.13ms ± 1%    -96.13%  (p=0.000 n=9+9)
pkg:crypto/rsa goos:linux goarch:arm64
RSA2048Decrypt          44.7ms ± 0%      4.0ms ± 6%    -91.08%  (p=0.000 n=8+10)
RSA2048Sign             46.3ms ± 0%      5.0ms ± 0%    -89.29%  (p=0.000 n=9+10)
3PrimeRSA2048Decrypt    22.3ms ± 0%      2.4ms ± 0%    -89.26%  (p=0.000 n=10+10)

Change-Id: I295f0bd5c51a4442d02c44ece1f6026d30dff0bc
Reviewed-on: https://go-review.googlesource.com/76270
Reviewed-by: Vlad Krasnov <vlad@cloudflare.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Vlad Krasnov <vlad@cloudflare.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

cmd/go: skip TestVetWithOnlyCgoFiles when cgo is disabled

CL 99175 added TestVetWithOnlyCgoFiles. However, this
test is failing on platforms where cgo is disabled,
because no file can be built.

This change fixes TestVetWithOnlyCgoFiles by skipping
this test when cgo is disabled.

Fixes #24304.

Change-Id: Ibb38fcd3e0ed1a791782145d3f2866f12117c6fe
Reviewed-on: https://go-review.googlesource.com/99275
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

runtime/cgo: make sure nil is undefined before defining it

While working on standalone builds of gomobile bindings, I ran into
errors on the form:

gcc_darwin_arm.c:30:31: error: ambiguous expansion of macro 'nil' [-Werror,-Wambiguous-macro]
/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS11.2.sdk/usr/include/MacTypes.h:94:15: note: expanding this definition of 'nil'

Fix it by undefining nil before defining it in libcgo.h.

Change-Id: I8e9660a68c6c351e592684d03d529f0d182c0493
Reviewed-on: https://go-review.googlesource.com/99215
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

cmd/go: run vet on packages with only cgo files

CgoFiles is not included in GoFiles, so we need to check both.

Fixes #24193

Change-Id: I6a67bd912e3d9a4be0eae8fa8db6fa8a07fb5df3
Reviewed-on: https://go-review.googlesource.com/99175
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

cmd/compile: prevent untyped types from reaching walk

We already require expressions to have already been typechecked before
reaching walk. Moreover, all untyped expressions should have been
converted to their default type by walk.

However, in practice, we've been somewhat sloppy and inconsistent
about ensuring this. In particular, a lot of AST rewrites ended up
leaving untyped bool expressions scattered around. These likely aren't
harmful in practice, but it seems worth cleaning up.

The two most common cases addressed by this CL are:

1) When generating OIF and OFOR nodes, we would often typecheck the
conditional expression, but not apply defaultlit to force it to the
expression's default type.

2) When rewriting string comparisons into more fundamental primitives,
we were simply overwriting r.Type with the desired type, which didn't
propagate the type to nested subexpressions. These are fixed by
utilizing finishcompare, which correctly handles this (and is already
used by other comparison lowering rewrites).

Lastly, walkexpr is extended to assert that it's not called on untyped
expressions.

Fixes #23834.

Change-Id: Icbd29648a293555e4015d3b06a95a24ccbd3f790
Reviewed-on: https://go-review.googlesource.com/98337
Reviewed-by: Robert Griesemer <gri@golang.org>

cmd/compile: go fmt

Change-Id: I2eae33928641c6ed74badfe44d079ae90e5cc8c8
Reviewed-on: https://go-review.googlesource.com/99195
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

test/codegen: fix issue with arm64 memmove codegen test

This recently added arm64 memmove codegen check:

  func movesmall() {
    // arm64:-"memmove"
    x := [...]byte{1, 2, 3, 4, 5, 6, 7}
    copy(x[1:], x[:])
  }

is not correct, for two reasons:

1. regexps are matched from the start of the disasm line (excluding
   line information). This mean that a negative -"memmove" check will
   pass against a 'CALL runtime.memmove' line because the line does
   not start with 'memmove' (its starts with CALL...).
   The way to specify no 'memmove' match whatsoever on the line is
   -".*memmove"

2. AFAIK comments on their own line are matched against the first
   subsequent non-comment line. So the code above only verifies that
   the x := ... line does not generate a memmove. The comment should
   be moved near the copy() line, if it's that one we want to not
   generate a memmove call.

The fact that the test above is not effective can be checked by
running `go run run.go -v codegen` in the toplevel test directory with
a go1.10 toolchain (that does not have the memmove-elision
optimization). The test will still pass (it shouldn't).

This change changes the regexp to -".*memmove" and moves it near the
line it needs to (not)match.

Change-Id: Ie01ef4d775e77d92dc8d8b7856b89b200f5e5ef2
Reviewed-on: https://go-review.googlesource.com/98977
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

debug/pe: use bytes.IndexByte instead of a loop

Follow CL 98759

Change-Id: I58c8b769741b395e5bf4e723505b149d063d492a
Reviewed-on: https://go-review.googlesource.com/99095
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

database/sql: fix typo in comment

Change-Id: Ie2966bae1dc2e542c42fb32d8059a4b2d4690014
Reviewed-on: https://go-review.googlesource.com/99115
Reviewed-by: Ian Lance Taylor <iant@golang.org>

cmd/trace: force GC occassionally

to return memory to the OS after completing potentially
large operations.

Update #21870

Sys went down to 3.7G

$ DEBUG_MEMORY_USAGE=1 go tool trace trace.out

2018/03/07 09:35:52 Parsing trace...
after parsing trace
Alloc: 3385754360 Bytes
Sys: 3662047864 Bytes
HeapReleased: 0 Bytes
HeapSys: 3488907264 Bytes
HeapInUse: 3426549760 Bytes
HeapAlloc: 3385754360 Bytes
Enter to continue...
2018/03/07 09:36:09 Splitting trace...
after spliting trace
Alloc: 3238309424 Bytes
Sys: 3684410168 Bytes
HeapReleased: 0 Bytes
HeapSys: 3488874496 Bytes
HeapInUse: 3266461696 Bytes
HeapAlloc: 3238309424 Bytes
Enter to continue...
2018/03/07 09:36:39 Opening browser. Trace viewer is listening on http://100.101.224.241:12345

after httpJsonTrace
Alloc: 3000633872 Bytes
Sys: 3693978424 Bytes
HeapReleased: 0 Bytes
HeapSys: 3488743424 Bytes
HeapInUse: 3030966272 Bytes
HeapAlloc: 3000633872 Bytes
Enter to continue...

Change-Id: I56f64cae66c809cbfbad03fba7bd0d35494c1d04
Reviewed-on: https://go-review.googlesource.com/92376
Reviewed-by: Peter Weinberger <pjw@google.com>

go/build: correct value of .Doc field

Build could use the package comment from test files to populate the .Doc
field on *Package.

As go list uses this data and several packages in the standard library
have tests with package comments, this lead to:

$ go list -f '{{.Doc}}' flag container/heap image
These examples demonstrate more intricate uses of the flag package.
This example demonstrates an integer heap built using the heap interface.
This example demonstrates decoding a JPEG image and examining its pixels.

This change now only examines non-test files when attempting to populate
.Doc, resulting in the expected behavior:

$ gotip list -f '{{.Doc}}' flag container/heap image
Package flag implements command-line flag parsing.
Package heap provides heap operations for any type that implements heap.Interface.
Package image implements a basic 2-D image library.

Fixes #23594

Change-Id: I37171c26ec5cc573efd273556a05223c6f675968
Reviewed-on: https://go-review.googlesource.com/96976
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

cmd/trace: generate jsontrace data in a streaming fashion

Update #21870

The Sys went down to 4.25G from 6.2G.

$ DEBUG_MEMORY_USAGE=1 go tool trace trace.out
2018/03/07 08:49:01 Parsing trace...
after parsing trace
Alloc: 3385757184 Bytes
Sys: 3661195896 Bytes
HeapReleased: 0 Bytes
HeapSys: 3488841728 Bytes
HeapInUse: 3426516992 Bytes
HeapAlloc: 3385757184 Bytes
Enter to continue...
2018/03/07 08:49:18 Splitting trace...
after spliting trace
Alloc: 2352071904 Bytes
Sys: 4243825464 Bytes
HeapReleased: 0 Bytes
HeapSys: 4025712640 Bytes
HeapInUse: 2377703424 Bytes
HeapAlloc: 2352071904 Bytes
Enter to continue...
after httpJsonTrace
Alloc: 3228697832 Bytes
Sys: 4250379064 Bytes
HeapReleased: 0 Bytes
HeapSys: 4025647104 Bytes
HeapInUse: 3260014592 Bytes
HeapAlloc: 3228697832 Bytes

Change-Id: I546f26bdbc68b1e58f1af1235a0e299dc0ff115e
Reviewed-on: https://go-review.googlesource.com/92375
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Peter Weinberger <pjw@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

runtime: add missing build constraints to os_linux_{be64,noauxv,novdso,ppc64x}.go files

They do not match the file name patterns of
  *_GOOS
  *_GOARCH
  *_GOOS_GOARCH
therefore the implicit linux constraint was not being added.

Change-Id: Ie506c51cee6818db445516f96fffaa351df62cf5
Reviewed-on: https://go-review.googlesource.com/99116
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

androidtest.bash: don't require GOARCH set

The host GOARCH is most likely supported (386, amd64, arm, arm64).

Change-Id: I86324b9c00f22c592ba54bda7d2ae97c86bda904
Reviewed-on: https://go-review.googlesource.com/99155
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>

os: use WIN32_FIND_DATA.Reserved0 to identify symlinks

os.Stat implementation uses instructions described at
https://blogs.msdn.microsoft.com/oldnewthing/20100212-00/?p=14963/
to distinguish symlinks. In particular, it calls
GetFileAttributesEx or FindFirstFile and checks
either WIN32_FILE_ATTRIBUTE_DATA.dwFileAttributes
or WIN32_FIND_DATA.dwFileAttributes to see if
FILE_ATTRIBUTES_REPARSE_POINT flag is set.
And that seems to worked fine so far.

But now we discovered that OneDrive root folder
is determined as directory:

c:\>dir C:\Users\Alex | grep OneDrive
30/11/2017 07:25 PM <DIR> OneDrive
c:\>

while Go identified it as symlink.

But we did not follow Microsoft's advice to the letter - we never
checked WIN32_FIND_DATA.Reserved0. And adding that extra check
makes Go treat OneDrive as symlink. So use FindFirstFile and
WIN32_FIND_DATA.Reserved0 to determine symlinks.

Fixes #22579

Change-Id: I0cb88929eb8b47b1d24efaf1907ad5a0e20de83f
Reviewed-on: https://go-review.googlesource.com/86556
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

cmd/compile: remove funcdepth variables

There were only two large classes of use for these variables:

1) Testing "funcdepth != 0" or "funcdepth > 0", which is equivalent to
checking "Curfn != nil".

2) In oldname, detecting whether a closure variable has been created
for the current function, which can be handled by instead testing
"n.Name.Curfn != Curfn".

Lastly, merge funcstart into funchdr, since it's only called once, and
it better matches up with funcbody now.

Passes toolstash-check.

Change-Id: I8fe159a9d37ef7debc4cd310354cea22a8b23394
Reviewed-on: https://go-review.googlesource.com/99076
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

cmd/compile: cleanup funccompile and compile

Bring these functions next to each other, and clean them up a little
bit. Also, change emitptrargsmap to take Curfn as a parameter instead
of a global.

Passes toolstash-check.

Change-Id: Ib9c94fda3b2cb6f0dcec1585622b33b4f311b5e9
Reviewed-on: https://go-review.googlesource.com/99075
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

cmd/compile: prevent detection of wrong duplicates

by including *types.Type in typeVal.

Updates #21866
Fixes #24159

Change-Id: I2f8cac252d88d43e723124f2867b1410b7abab7b
Reviewed-on: https://go-review.googlesource.com/98476
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>

cmd/compile: fix miscompilation of "defer delete(m, k)"

Previously, for slow map key types (i.e., any type other than a 32-bit
or 64-bit plain memory type), we would rewrite

    defer delete(m, k)

into

    ktmp := k
    defer delete(m, &ktmp)

However, if the defer statement was inside a loop, we would end up
reusing the same ktmp value for all of the deferred deletes.

We already rewrite

    defer print(x, y, z)

into

    defer func(a1, a2, a3) {
        print(a1, a2, a3)
    }(x, y, z)

This CL generalizes this rewrite to also apply for slow map deletes.

This could be extended to apply even more generally to other builtins,
but as discussed on #24259, there are cases where we must *not* do
this (e.g., "defer recover()"). However, if we elect to do this more
generally, this CL should still make that easier.

Lastly, while here, fix a few isues in wrapCall (nee walkprintfunc):

1) lookupN appends the generation number to the symbol anyway, so "%d"
was being literally included in the generated function names.

2) walkstmt will be called when the function is compiled later anyway,
so no need to do it now.

Fixes #24259.

Change-Id: I70286867c64c69c18e9552f69e3f4154a0fc8b04
Reviewed-on: https://go-review.googlesource.com/99017
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

internal/poll: if poller init fails, assume blocking mode

Fixes #23943

Change-Id: I16e604872f1615963925ec3c4710106bcce1330c
Reviewed-on: https://go-review.googlesource.com/99015
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

cmd/compile: improve compiler error on embedded structs

Fixes #23609

Change-Id: I751aae3d849de7fce1306324fcb1a4c3842d873e
Reviewed-on: https://go-review.googlesource.com/97076
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

test/codegen: port math/bits.ReverseBytes tests to codegen

And remove them from ssa_test.

Change-Id: If767af662801219774d1bdb787c77edfa6067770
Reviewed-on: https://go-review.googlesource.com/98976
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>

cmd/compile/internal/ssa: improve store combine optimization on arm64

Current implementation doesn't consider MOVDreg type operand and fail to combine
it into larger store. This patch fixes the issue.

Fixes #24242

Change-Id: I7d68697f80e76f48c3528ece01a602bf513248ec
Reviewed-on: https://go-review.googlesource.com/98397
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

encoding/binary: use an offset instead of slicing

While running make.bash, over 5% of all pointer writes
come from encoding/binary doing struct reads.

This change replaces slicing during such reads with an offset.
This avoids updating the slice pointer with every
struct field read or write.

This has no impact when the write barrier is off.
Running the benchmarks with GOGC=1, however,
shows significant improvement:

name          old time/op    new time/op    delta
ReadStruct-8    13.2µs ± 6%    10.1µs ± 5%  -23.24%  (p=0.000 n=10+10)

name          old speed      new speed      delta
ReadStruct-8  5.69MB/s ± 6%  7.40MB/s ± 5%  +30.18%  (p=0.000 n=10+10)

Change-Id: I22904263196bfeddc38abe8989428e263aee5253
Reviewed-on: https://go-review.googlesource.com/98757
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

runtime: skip pointless writes in freedefer

Change-Id: I501a0e5c87ec88616c7dcdf1b723758b6df6c088
Reviewed-on: https://go-review.googlesource.com/98758
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

debug/macho: use bytes.IndexByte instead of a loop

Simpler, and no doubt faster.

Change-Id: Idd401918da07a257de365087721e9ff061e6fd07
Reviewed-on: https://go-review.googlesource.com/98759
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

cmd/compile/internal/ssa: inline small memmove for arm64

This patch enables the optimization for arm64 target.

Performance results on Amberwing for strconv benchmark:
name             old time/op  new time/op  delta
Quote             721ns ± 0%   617ns ± 0%  -14.40%  (p=0.016 n=5+4)
QuoteRune         118ns ± 0%   117ns ± 0%   -0.85%  (p=0.008 n=5+5)
AppendQuote       436ns ± 2%   321ns ± 0%  -26.31%  (p=0.008 n=5+5)
AppendQuoteRune  34.7ns ± 0%  28.4ns ± 0%  -18.16%  (p=0.000 n=5+4)
[Geo mean]        189ns        160ns       -15.41%

Change-Id: I5714c474e7483d07ca338fbaf49beb4bbcc11c44
Reviewed-on: https://go-review.googlesource.com/98735
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

test/codegen: port math/bits.OnesCount tests to codegen

And remove them from ssa_test.

Change-Id: I3efac5fea529bb0efa2dae32124530482ba5058e
Reviewed-on: https://go-review.googlesource.com/98815
Reviewed-by: Keith Randall <khr@golang.org>

cmd/internal/obj/arm64: gofmt

Change-Id: Ica778fef2d0245fbb14f595597e45c7cf6adef84
Reviewed-on: https://go-review.googlesource.com/98895
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

iostest.bash: don't build std library twice

Instead, mirror androidtest.bash and build once, then run run.bash.

Change-Id: I174ae30b2a429a62b20bb290a70cb07ed712b1e4
Reviewed-on: https://go-review.googlesource.com/98915
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

cmd/dist: default to GOARM=7 on android

Auto-detecting GOARM on Android makes as little sense as for nacl/arm
and darwin/arm.

Also update androidtest.sh to not require GOARM set.

Change-Id: Id409ce1573d3c668d00fa4b7e3562ad7ece6fef5
Reviewed-on: https://go-review.googlesource.com/98875
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

math/big: don't use R18 in ARM64 assembly

R18 seems reserved on Apple platforms.

May fix darwin/arm64 build.

Change-Id: Ia2c1de550a64827c85a64affa53b94c62aacce8e
Reviewed-on: https://go-review.googlesource.com/98896
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>

runtime: fix stack switch check in walltime/nanotime on linux/arm

CL 98095 got the check wrong. We should be testing
'getg() == getg().m.curg', not 'getg().m == getg().m.curg'.

Change-Id: I32f6238b00409b67afa8efe732513d542aec5bc7
Reviewed-on: https://go-review.googlesource.com/98855
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

test/codegen: port math/bits.TrailingZeros tests to codegen

And remove them from ssa_test.

Change-Id: Ib5de5c0d908f23915e0847eca338cacf2fa5325b
Reviewed-on: https://go-review.googlesource.com/98795
Reviewed-by: Giovanni Bajo <rasky@develer.com>

net/http: correct subtle transposition of offset and whence in test

Change-Id: I788972bdf85c0225397c0e74901bf9c33c6d30c7
GitHub-Last-Rev: 57737fe782bf7ad2d765c2efd80d75b3baca2c7b
GitHub-Pull-Request: golang/go#24265
Reviewed-on: https://go-review.googlesource.com/98761
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

runtime, cmd/compile: use ldp for DUFFCOPY on ARM64

name         old time/op  new time/op  delta
CopyFat8     2.15ns ± 1%  2.19ns ± 6%     ~     (p=0.171 n=8+9)
CopyFat12    2.15ns ± 0%  2.17ns ± 2%     ~     (p=0.137 n=8+10)
CopyFat16    2.17ns ± 3%  2.15ns ± 0%     ~     (p=0.211 n=10+10)
CopyFat24    2.16ns ± 1%  2.15ns ± 0%     ~     (p=0.087 n=10+10)
CopyFat32    11.5ns ± 0%  12.8ns ± 2%  +10.87%  (p=0.000 n=8+10)
CopyFat64    20.2ns ± 2%  12.9ns ± 0%  -36.11%  (p=0.000 n=10+10)
CopyFat128   37.2ns ± 0%  21.5ns ± 0%  -42.20%  (p=0.000 n=10+10)
CopyFat256   71.6ns ± 0%  38.7ns ± 0%  -45.95%  (p=0.000 n=10+10)
CopyFat512    140ns ± 0%    73ns ± 0%  -47.86%  (p=0.000 n=10+9)
CopyFat520    142ns ± 0%    74ns ± 0%  -47.54%  (p=0.000 n=10+10)
CopyFat1024   277ns ± 0%   141ns ± 0%  -49.10%  (p=0.000 n=10+10)

Change-Id: If54bc571add5db674d5e081579c87e80153d0a5a
Reviewed-on: https://go-review.googlesource.com/97395
Reviewed-by: Cherry Zhang <cherryyz@google.com>

cmd/doc: make local dot-slash path names work

Before, an argument that started ./ or ../ was not treated as
a package relative to the current directory. Thus

$ cd $GOROOT/src/text
$ go doc ./template

could find html/template as $GOROOT/src/html/./template
is a valid Go source directory.

Fix this by catching such paths and making them absolute before
processing.

Fixes #23383.

Change-Id: Ic2a92eaa3a6328f728635657f9de72ac3ee82afb
Reviewed-on: https://go-review.googlesource.com/98396
Reviewed-by: Ian Lance Taylor <iant@golang.org>

crypto/aes: optimize arm64 AES implementation

This patch makes use of arm64 AES instructions to accelerate AES computation
and only supports optimization on Linux for arm64

name        old time/op    new time/op     delta
Encrypt-32     255ns ± 0%       26ns ± 0%   -89.73%
Decrypt-32     256ns ± 0%       26ns ± 0%   -89.77%
Expand-32      990ns ± 5%      901ns ± 0%    -9.05%

name        old speed      new speed       delta
Encrypt-32  62.5MB/s ± 0%  610.4MB/s ± 0%  +876.39%
Decrypt-32  62.3MB/s ± 0%  610.2MB/s ± 0%  +879.6%

Fixes #18498

Change-Id: If416e5a151785325527b32ff72f6da3812493ed0
Reviewed-on: https://go-review.googlesource.com/64490
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

math/big: optimize addVV and subVV on arm64

The biggest hot spot of the existing implementation is "load" operations, which lead to poor performance.
By unrolling the cycle 4x and 2x, and using "LDP", "STP" instructions, this CL can reduce the "load" cost and improve performance.

Benchmarks:

name                              old time/op    new time/op     delta
AddVV/1-8                           21.5ns ± 0%     11.5ns ± 0%   -46.51%  (p=0.008 n=5+5)
AddVV/2-8                           13.5ns ± 0%     12.0ns ± 0%   -11.11%  (p=0.008 n=5+5)
AddVV/3-8                           15.5ns ± 0%     13.0ns ± 0%   -16.13%  (p=0.008 n=5+5)
AddVV/4-8                           17.5ns ± 0%     13.5ns ± 0%   -22.86%  (p=0.008 n=5+5)
AddVV/5-8                           19.5ns ± 0%     14.5ns ± 0%   -25.64%  (p=0.008 n=5+5)
AddVV/10-8                          29.5ns ± 0%     18.0ns ± 0%   -38.98%  (p=0.008 n=5+5)
AddVV/100-8                          217ns ± 0%       94ns ± 0%   -56.64%  (p=0.008 n=5+5)
AddVV/1000-8                        2.02µs ± 0%     1.03µs ± 0%   -48.85%  (p=0.008 n=5+5)
AddVV/10000-8                       20.5µs ± 0%     11.3µs ± 0%   -44.70%  (p=0.008 n=5+5)
AddVV/100000-8                       247µs ± 3%      154µs ± 0%   -37.52%  (p=0.008 n=5+5)
SubVV/1-8                           21.5ns ± 0%     11.5ns ± 0%      ~     (p=0.079 n=4+5)
SubVV/2-8                           13.5ns ± 0%     12.0ns ± 0%   -11.11%  (p=0.008 n=5+5)
SubVV/3-8                           15.5ns ± 0%     13.0ns ± 0%   -16.13%  (p=0.008 n=5+5)
SubVV/4-8                           17.5ns ± 0%     13.5ns ± 0%   -22.86%  (p=0.008 n=5+5)
SubVV/5-8                           19.5ns ± 0%     14.5ns ± 0%   -25.64%  (p=0.008 n=5+5)
SubVV/10-8                          29.5ns ± 0%     18.0ns ± 0%   -38.98%  (p=0.008 n=5+5)
SubVV/100-8                          217ns ± 0%       94ns ± 0%   -56.64%  (p=0.008 n=5+5)
SubVV/1000-8                        2.02µs ± 0%     0.80µs ± 0%   -60.50%  (p=0.008 n=5+5)
SubVV/10000-8                       20.5µs ± 0%     11.3µs ± 0%   -44.99%  (p=0.008 n=5+5)
SubVV/100000-8                       221µs ±11%      223µs ±16%      ~     (p=0.690 n=5+5)
AddVW/1-8                           9.32ns ± 0%     9.32ns ± 0%      ~     (all equal)
AddVW/2-8                           19.7ns ± 1%     19.7ns ± 0%      ~     (p=0.381 n=5+4)
AddVW/3-8                           11.5ns ± 0%     11.5ns ± 0%      ~     (all equal)
AddVW/4-8                           13.0ns ± 0%     13.0ns ± 0%      ~     (all equal)
AddVW/5-8                           14.5ns ± 0%     14.5ns ± 0%      ~     (all equal)
AddVW/10-8                          22.0ns ± 0%     22.0ns ± 0%      ~     (all equal)
AddVW/100-8                          167ns ± 0%      167ns ± 0%      ~     (all equal)
AddVW/1000-8                        1.52µs ± 0%     1.52µs ± 0%    +0.40%  (p=0.008 n=5+5)
AddVW/10000-8                       15.1µs ± 0%     15.1µs ± 0%      ~     (p=0.556 n=5+4)
AddVW/100000-8                       152µs ± 1%      152µs ± 1%      ~     (p=0.690 n=5+5)
AddMulVVW/1-8                       33.3ns ± 0%     32.7ns ± 1%    -1.86%  (p=0.008 n=5+5)
AddMulVVW/2-8                       59.3ns ± 1%     56.9ns ± 1%    -4.15%  (p=0.008 n=5+5)
AddMulVVW/3-8                       80.5ns ± 1%     85.4ns ± 3%    +6.19%  (p=0.008 n=5+5)
AddMulVVW/4-8                        127ns ± 0%      111ns ± 1%   -13.19%  (p=0.008 n=5+5)
AddMulVVW/5-8                        144ns ± 0%      149ns ± 0%    +3.47%  (p=0.016 n=4+5)
AddMulVVW/10-8                       298ns ± 1%      283ns ± 0%    -4.77%  (p=0.008 n=5+5)
AddMulVVW/100-8                     3.06µs ± 0%     2.99µs ± 0%    -2.21%  (p=0.008 n=5+5)
AddMulVVW/1000-8                    31.3µs ± 0%     26.9µs ± 0%   -14.17%  (p=0.008 n=5+5)
AddMulVVW/10000-8                    316µs ± 0%      305µs ± 0%    -3.51%  (p=0.008 n=5+5)
AddMulVVW/100000-8                  3.17ms ± 0%     3.17ms ± 1%      ~     (p=0.690 n=5+5)
DecimalConversion-8                  316µs ± 1%      313µs ± 2%      ~     (p=0.095 n=5+5)
FloatString/100-8                   2.53µs ± 1%     2.56µs ± 2%      ~     (p=0.222 n=5+5)
FloatString/1000-8                  58.4µs ± 0%     58.5µs ± 0%      ~     (p=0.206 n=5+5)
FloatString/10000-8                 4.59ms ± 0%     4.58ms ± 0%    -0.31%  (p=0.008 n=5+5)
FloatString/100000-8                 446ms ± 0%      444ms ± 0%    -0.31%  (p=0.008 n=5+5)
FloatAdd/10-8                        184ns ± 0%      172ns ± 0%    -6.30%  (p=0.008 n=5+5)
FloatAdd/100-8                       189ns ± 2%      191ns ± 4%      ~     (p=0.381 n=5+5)
FloatAdd/1000-8                      371ns ± 0%      347ns ± 1%    -6.42%  (p=0.008 n=5+5)
FloatAdd/10000-8                    1.87µs ± 0%     1.68µs ± 0%   -10.16%  (p=0.008 n=5+5)
FloatAdd/100000-8                   17.1µs ± 0%     15.6µs ± 0%    -8.74%  (p=0.016 n=5+4)
FloatSub/10-8                        152ns ± 0%      138ns ± 0%    -9.47%  (p=0.000 n=4+5)
FloatSub/100-8                       148ns ± 0%      142ns ± 0%    -4.05%  (p=0.000 n=5+4)
FloatSub/1000-8                      245ns ± 1%      217ns ± 0%   -11.28%  (p=0.000 n=5+4)
FloatSub/10000-8                    1.07µs ± 0%     0.88µs ± 1%   -18.14%  (p=0.008 n=5+5)
FloatSub/100000-8                   9.58µs ± 0%     7.96µs ± 0%   -16.84%  (p=0.008 n=5+5)
ParseFloatSmallExp-8                28.8µs ± 1%     29.0µs ± 1%      ~     (p=0.095 n=5+5)
ParseFloatLargeExp-8                 126µs ± 1%      126µs ± 1%      ~     (p=0.841 n=5+5)
GCD10x10/WithoutXY-8                 277ns ± 2%      281ns ± 4%      ~     (p=0.746 n=5+5)
GCD10x10/WithXY-8                   2.10µs ± 1%     2.12µs ± 3%      ~     (p=0.548 n=5+5)
GCD10x100/WithoutXY-8                615ns ± 3%      607ns ± 2%      ~     (p=0.135 n=5+5)
GCD10x100/WithXY-8                  3.50µs ± 2%     3.62µs ± 5%      ~     (p=0.151 n=5+5)
GCD10x1000/WithoutXY-8              1.39µs ± 2%     1.39µs ± 3%      ~     (p=0.690 n=5+5)
GCD10x1000/WithXY-8                 7.39µs ± 1%     7.34µs ± 2%      ~     (p=0.135 n=5+5)
GCD10x10000/WithoutXY-8             8.66µs ± 1%     8.68µs ± 1%      ~     (p=0.421 n=5+5)
GCD10x10000/WithXY-8                28.1µs ± 2%     27.0µs ± 2%    -3.81%  (p=0.008 n=5+5)
GCD10x100000/WithoutXY-8            79.3µs ± 1%     79.3µs ± 1%      ~     (p=0.841 n=5+5)
GCD10x100000/WithXY-8                238µs ± 0%      227µs ± 1%    -4.74%  (p=0.008 n=5+5)
GCD100x100/WithoutXY-8              1.89µs ± 1%     1.88µs ± 2%      ~     (p=0.968 n=5+5)
GCD100x100/WithXY-8                 26.7µs ± 1%     27.0µs ± 1%    +1.44%  (p=0.032 n=5+5)
GCD100x1000/WithoutXY-8             4.48µs ± 1%     4.45µs ± 2%      ~     (p=0.341 n=5+5)
GCD100x1000/WithXY-8                36.3µs ± 1%     35.1µs ± 1%    -3.27%  (p=0.008 n=5+5)
GCD100x10000/WithoutXY-8            22.8µs ± 0%     22.7µs ± 1%      ~     (p=0.056 n=5+5)
GCD100x10000/WithXY-8                145µs ± 1%      133µs ± 1%    -8.33%  (p=0.008 n=5+5)
GCD100x100000/WithoutXY-8            198µs ± 0%      195µs ± 0%    -1.56%  (p=0.008 n=5+5)
GCD100x100000/WithXY-8              1.11ms ± 0%     1.00ms ± 0%   -10.04%  (p=0.008 n=5+5)
GCD1000x1000/WithoutXY-8            25.2µs ± 1%     24.8µs ± 1%    -1.63%  (p=0.016 n=5+5)
GCD1000x1000/WithXY-8                513µs ± 0%      517µs ± 2%      ~     (p=0.421 n=5+5)
GCD1000x10000/WithoutXY-8           57.0µs ± 0%     52.7µs ± 1%    -7.56%  (p=0.008 n=5+5)
GCD1000x10000/WithXY-8              1.20ms ± 0%     1.10ms ± 0%    -8.70%  (p=0.008 n=5+5)
GCD1000x100000/WithoutXY-8           358µs ± 0%      318µs ± 1%   -11.03%  (p=0.008 n=5+5)
GCD1000x100000/WithXY-8             8.71ms ± 0%     7.65ms ± 0%   -12.19%  (p=0.008 n=5+5)
GCD10000x10000/WithoutXY-8           690µs ± 0%      630µs ± 0%    -8.71%  (p=0.008 n=5+5)
GCD10000x10000/WithXY-8             16.0ms ± 1%     14.9ms ± 0%    -6.85%  (p=0.008 n=5+5)
GCD10000x100000/WithoutXY-8         2.09ms ± 0%     1.75ms ± 0%   -16.09%  (p=0.016 n=5+4)
GCD10000x100000/WithXY-8            86.8ms ± 0%     76.3ms ± 0%   -12.09%  (p=0.008 n=5+5)
GCD100000x100000/WithoutXY-8        51.1ms ± 0%     46.0ms ± 0%    -9.97%  (p=0.008 n=5+5)
GCD100000x100000/WithXY-8            1.25s ± 0%      1.15s ± 0%    -7.92%  (p=0.008 n=5+5)
Hilbert-8                           2.45ms ± 1%     2.49ms ± 1%    +1.99%  (p=0.008 n=5+5)
Binomial-8                          4.98µs ± 3%     4.90µs ± 2%      ~     (p=0.421 n=5+5)
QuoRem-8                            7.10µs ± 0%     6.21µs ± 0%   -12.55%  (p=0.016 n=5+4)
Exp-8                                161ms ± 0%      161ms ± 0%      ~     (p=0.421 n=5+5)
Exp2-8                               161ms ± 0%      161ms ± 0%      ~     (p=0.151 n=5+5)
Bitset-8                            40.4ns ± 0%     40.3ns ± 0%      ~     (p=0.190 n=5+5)
BitsetNeg-8                          163ns ± 3%      137ns ± 2%   -15.91%  (p=0.008 n=5+5)
BitsetOrig-8                         377ns ± 1%      372ns ± 1%    -1.22%  (p=0.024 n=5+5)
BitsetNegOrig-8                      631ns ± 1%      605ns ± 1%    -4.09%  (p=0.008 n=5+5)
ModSqrt225_Tonelli-8                7.26ms ± 0%     7.26ms ± 0%      ~     (p=0.548 n=5+5)
ModSqrt224_3Mod4-8                  2.24ms ± 0%     2.24ms ± 0%      ~     (p=1.000 n=5+5)
ModSqrt5430_Tonelli-8                62.4s ± 0%      62.4s ± 0%      ~     (p=0.841 n=5+5)
ModSqrt5430_3Mod4-8                  20.8s ± 0%      20.7s ± 0%      ~     (p=0.056 n=5+5)
Sqrt-8                               101µs ± 0%       89µs ± 0%   -12.17%  (p=0.008 n=5+5)
IntSqr/1-8                          32.5ns ± 1%     32.7ns ± 1%      ~     (p=0.056 n=5+5)
IntSqr/2-8                           160ns ± 5%      158ns ± 0%      ~     (p=0.397 n=5+4)
IntSqr/3-8                           298ns ± 4%      296ns ± 4%      ~     (p=0.667 n=5+5)
IntSqr/5-8                           737ns ± 5%      761ns ± 3%    +3.34%  (p=0.016 n=5+5)
IntSqr/8-8                          1.87µs ± 4%     1.90µs ± 3%      ~     (p=0.222 n=5+5)
IntSqr/10-8                         2.96µs ± 4%     2.92µs ± 6%      ~     (p=0.310 n=5+5)
IntSqr/20-8                         6.28µs ± 3%     6.21µs ± 2%      ~     (p=0.310 n=5+5)
IntSqr/30-8                         14.0µs ± 2%     13.9µs ± 2%      ~     (p=0.548 n=5+5)
IntSqr/50-8                         37.7µs ± 3%     38.3µs ± 2%      ~     (p=0.095 n=5+5)
IntSqr/80-8                         95.9µs ± 2%     95.1µs ± 1%      ~     (p=0.310 n=5+5)
IntSqr/100-8                         148µs ± 1%      148µs ± 1%      ~     (p=0.841 n=5+5)
IntSqr/200-8                         586µs ± 1%      587µs ± 1%      ~     (p=1.000 n=5+5)
IntSqr/300-8                        1.32ms ± 0%     1.31ms ± 1%    -0.73%  (p=0.032 n=5+5)
IntSqr/500-8                        2.48ms ± 0%     2.45ms ± 0%    -1.15%  (p=0.008 n=5+5)
IntSqr/800-8                        4.68ms ± 0%     4.62ms ± 0%    -1.23%  (p=0.008 n=5+5)
IntSqr/1000-8                       7.57ms ± 0%     7.50ms ± 0%    -0.84%  (p=0.008 n=5+5)
Mul-8                                311ms ± 0%      308ms ± 0%    -0.81%  (p=0.008 n=5+5)
Exp3Power/0x10-8                     574ns ± 1%      578ns ± 2%      ~     (p=0.500 n=5+5)
Exp3Power/0x40-8                     640ns ± 1%      646ns ± 0%      ~     (p=0.056 n=5+5)
Exp3Power/0x100-8                   1.42µs ± 1%     1.42µs ± 1%      ~     (p=0.246 n=5+5)
Exp3Power/0x400-8                   8.30µs ± 1%     8.29µs ± 1%      ~     (p=0.802 n=5+5)
Exp3Power/0x1000-8                  60.0µs ± 0%     59.9µs ± 0%    -0.24%  (p=0.016 n=5+5)
Exp3Power/0x4000-8                   817µs ± 0%      816µs ± 0%    -0.17%  (p=0.008 n=5+5)
Exp3Power/0x10000-8                 7.80ms ± 1%     7.70ms ± 0%    -1.23%  (p=0.008 n=5+5)
Exp3Power/0x40000-8                 73.4ms ± 0%     72.5ms ± 0%    -1.28%  (p=0.008 n=5+5)
Exp3Power/0x100000-8                 665ms ± 0%      656ms ± 0%    -1.34%  (p=0.008 n=5+5)
Exp3Power/0x400000-8                 5.99s ± 0%      5.90s ± 0%    -1.40%  (p=0.008 n=5+5)
Fibo-8                               116ms ± 0%       50ms ± 0%   -57.09%  (p=0.008 n=5+5)
NatSqr/1-8                           112ns ± 4%      112ns ± 2%      ~     (p=0.968 n=5+5)
NatSqr/2-8                           251ns ± 2%      250ns ± 1%      ~     (p=0.571 n=5+5)
NatSqr/3-8                           378ns ± 2%      379ns ± 2%      ~     (p=0.794 n=5+5)
NatSqr/5-8                           829ns ± 3%      827ns ± 2%      ~     (p=1.000 n=5+5)
NatSqr/8-8                          1.97µs ± 2%     1.95µs ± 2%      ~     (p=0.310 n=5+5)
NatSqr/10-8                         3.02µs ± 2%     2.99µs ± 2%      ~     (p=0.421 n=5+5)
NatSqr/20-8                         6.51µs ± 2%     6.49µs ± 1%      ~     (p=0.841 n=5+5)
NatSqr/30-8                         14.1µs ± 2%     14.0µs ± 2%      ~     (p=0.841 n=5+5)
NatSqr/50-8                         38.1µs ± 2%     38.3µs ± 3%      ~     (p=0.690 n=5+5)
NatSqr/80-8                         95.5µs ± 2%     96.0µs ± 1%      ~     (p=0.421 n=5+5)
NatSqr/100-8                         150µs ± 1%      148µs ± 2%      ~     (p=0.095 n=5+5)
NatSqr/200-8                         588µs ± 1%      590µs ± 1%      ~     (p=0.421 n=5+5)
NatSqr/300-8                        1.32ms ± 1%     1.31ms ± 1%      ~     (p=0.841 n=5+5)
NatSqr/500-8                        2.50ms ± 0%     2.47ms ± 0%    -1.03%  (p=0.008 n=5+5)
NatSqr/800-8                        4.70ms ± 0%     4.64ms ± 0%    -1.31%  (p=0.008 n=5+5)
NatSqr/1000-8                       7.60ms ± 0%     7.52ms ± 0%    -1.01%  (p=0.008 n=5+5)
ScanPi-8                             326µs ± 0%      326µs ± 0%      ~     (p=0.841 n=5+5)
StringPiParallel-8                  70.3µs ± 5%     63.8µs ±10%      ~     (p=0.056 n=5+5)
Scan/10/Base2-8                     1.09µs ± 0%     1.09µs ± 0%      ~     (p=0.317 n=5+5)
Scan/100/Base2-8                    7.79µs ± 0%     7.78µs ± 0%      ~     (p=0.063 n=5+5)
Scan/1000/Base2-8                   79.0µs ± 0%     78.9µs ± 0%    -0.18%  (p=0.008 n=5+5)
Scan/10000/Base2-8                  1.22ms ± 0%     1.22ms ± 0%    -0.15%  (p=0.008 n=5+5)
Scan/100000/Base2-8                 55.1ms ± 0%     55.2ms ± 0%    +0.20%  (p=0.008 n=5+5)
Scan/10/Base8-8                      512ns ± 0%      512ns ± 1%      ~     (p=0.810 n=5+5)
Scan/100/Base8-8                    2.89µs ± 0%     2.89µs ± 0%      ~     (p=0.810 n=5+5)
Scan/1000/Base8-8                   31.0µs ± 0%     31.0µs ± 0%      ~     (p=0.151 n=5+5)
Scan/10000/Base8-8                   740µs ± 0%      741µs ± 0%    +0.10%  (p=0.008 n=5+5)
Scan/100000/Base8-8                 50.6ms ± 0%     50.6ms ± 0%    +0.08%  (p=0.008 n=5+5)
Scan/10/Base10-8                     487ns ± 0%      487ns ± 0%      ~     (p=0.571 n=5+5)
Scan/100/Base10-8                   2.67µs ± 0%     2.67µs ± 0%      ~     (p=0.810 n=5+5)
Scan/1000/Base10-8                  28.7µs ± 0%     28.7µs ± 0%    +0.06%  (p=0.008 n=5+5)
Scan/10000/Base10-8                  716µs ± 0%      717µs ± 0%      ~     (p=0.222 n=5+5)
Scan/100000/Base10-8                50.3ms ± 0%     50.3ms ± 0%    +0.10%  (p=0.008 n=5+5)
Scan/10/Base16-8                     438ns ± 0%      437ns ± 1%      ~     (p=0.786 n=5+5)
Scan/100/Base16-8                   2.47µs ± 0%     2.47µs ± 0%    -0.19%  (p=0.048 n=5+5)
Scan/1000/Base16-8                  27.2µs ± 0%     27.3µs ± 0%      ~     (p=0.087 n=5+5)
Scan/10000/Base16-8                  722µs ± 0%      722µs ± 0%    +0.11%  (p=0.008 n=5+5)
Scan/100000/Base16-8                52.6ms ± 0%     52.7ms ± 0%    +0.15%  (p=0.008 n=5+5)
String/10/Base2-8                    247ns ± 2%      248ns ± 1%      ~     (p=0.437 n=5+5)
String/100/Base2-8                  1.51µs ± 0%     1.51µs ± 0%    -0.37%  (p=0.024 n=5+5)
String/1000/Base2-8                 13.6µs ± 1%     13.5µs ± 0%      ~     (p=0.095 n=5+5)
String/10000/Base2-8                 135µs ± 0%      135µs ± 1%      ~     (p=0.841 n=5+5)
String/100000/Base2-8               1.32ms ± 1%     1.32ms ± 1%      ~     (p=0.690 n=5+5)
String/10/Base8-8                    169ns ± 1%      169ns ± 1%      ~     (p=1.000 n=5+5)
String/100/Base8-8                   636ns ± 0%      634ns ± 1%      ~     (p=0.413 n=5+5)
String/1000/Base8-8                 5.33µs ± 1%     5.32µs ± 0%      ~     (p=0.222 n=5+5)
String/10000/Base8-8                50.9µs ± 1%     50.7µs ± 0%      ~     (p=0.151 n=5+5)
String/100000/Base8-8                500µs ± 1%      497µs ± 0%      ~     (p=0.421 n=5+5)
String/10/Base10-8                   516ns ± 1%      513ns ± 0%    -0.62%  (p=0.016 n=5+4)
String/100/Base10-8                 1.97µs ± 0%     1.96µs ± 0%      ~     (p=0.667 n=4+5)
String/1000/Base10-8                12.5µs ± 0%     11.5µs ± 0%    -7.92%  (p=0.008 n=5+5)
String/10000/Base10-8               57.7µs ± 0%     52.5µs ± 0%    -8.93%  (p=0.008 n=5+5)
String/100000/Base10-8              25.6ms ± 0%     21.6ms ± 0%   -15.94%  (p=0.008 n=5+5)
String/10/Base16-8                   150ns ± 1%      149ns ± 0%      ~     (p=0.413 n=5+4)
String/100/Base16-8                  514ns ± 1%      514ns ± 1%      ~     (p=0.849 n=5+5)
String/1000/Base16-8                4.01µs ± 0%     4.01µs ± 0%      ~     (p=0.421 n=5+5)
String/10000/Base16-8               37.8µs ± 1%     37.8µs ± 1%      ~     (p=0.841 n=5+5)
String/100000/Base16-8               373µs ± 2%      373µs ± 0%      ~     (p=0.421 n=5+5)
LeafSize/0-8                        6.63ms ± 0%     6.63ms ± 0%      ~     (p=0.730 n=4+5)
LeafSize/1-8                        74.0µs ± 0%     67.7µs ± 1%    -8.53%  (p=0.008 n=5+5)
LeafSize/2-8                        74.2µs ± 0%     68.3µs ± 1%    -7.99%  (p=0.008 n=5+5)
LeafSize/3-8                         379µs ± 0%      309µs ± 0%   -18.52%  (p=0.008 n=5+5)
LeafSize/4-8                        72.7µs ± 1%     66.7µs ± 0%    -8.37%  (p=0.008 n=5+5)
LeafSize/5-8                         471µs ± 0%      384µs ± 0%   -18.55%  (p=0.008 n=5+5)
LeafSize/6-8                         378µs ± 0%      308µs ± 0%   -18.59%  (p=0.008 n=5+5)
LeafSize/7-8                         245µs ± 0%      204µs ± 1%   -16.75%  (p=0.008 n=5+5)
LeafSize/8-8                        73.4µs ± 0%     66.9µs ± 1%    -8.79%  (p=0.008 n=5+5)
LeafSize/9-8                         538µs ± 0%      437µs ± 0%   -18.75%  (p=0.008 n=5+5)
LeafSize/10-8                        472µs ± 0%      396µs ± 1%   -16.01%  (p=0.008 n=5+5)
LeafSize/11-8                        460µs ± 0%      374µs ± 0%   -18.58%  (p=0.008 n=5+5)
LeafSize/12-8                        378µs ± 0%      308µs ± 0%   -18.38%  (p=0.008 n=5+5)
LeafSize/13-8                        343µs ± 0%      284µs ± 0%   -17.30%  (p=0.008 n=5+5)
LeafSize/14-8                        248µs ± 0%      206µs ± 0%   -16.94%  (p=0.008 n=5+5)
LeafSize/15-8                        169µs ± 0%      144µs ± 0%   -14.69%  (p=0.008 n=5+5)
LeafSize/16-8                       72.9µs ± 0%     66.8µs ± 1%    -8.27%  (p=0.008 n=5+5)
LeafSize/32-8                       82.5µs ± 0%     76.7µs ± 0%    -7.04%  (p=0.008 n=5+5)
LeafSize/64-8                        134µs ± 0%      129µs ± 0%    -3.80%  (p=0.008 n=5+5)
ProbablyPrime/n=0-8                 44.2ms ± 0%     43.4ms ± 0%    -1.95%  (p=0.008 n=5+5)
ProbablyPrime/n=1-8                 64.9ms ± 0%     64.0ms ± 0%    -1.27%  (p=0.008 n=5+5)
ProbablyPrime/n=5-8                  147ms ± 0%      146ms ± 0%    -0.58%  (p=0.008 n=5+5)
ProbablyPrime/n=10-8                 250ms ± 0%      249ms ± 0%    -0.35%  (p=0.008 n=5+5)
ProbablyPrime/n=20-8                 456ms ± 0%      455ms ± 0%    -0.18%  (p=0.008 n=5+5)
ProbablyPrime/Lucas-8               23.6ms ± 0%     22.7ms ± 0%    -3.74%  (p=0.008 n=5+5)
ProbablyPrime/MillerRabinBase2-8    20.7ms ± 0%     20.6ms ± 0%      ~     (p=0.421 n=5+5)
FloatSqrt/64-8                      2.25µs ± 1%     2.29µs ± 0%    +1.48%  (p=0.008 n=5+5)
FloatSqrt/128-8                     4.86µs ± 1%     4.92µs ± 1%    +1.21%  (p=0.032 n=5+5)
FloatSqrt/256-8                     13.6µs ± 0%     13.7µs ± 1%    +1.31%  (p=0.032 n=5+5)
FloatSqrt/1000-8                    70.0µs ± 1%     70.1µs ± 0%      ~     (p=0.690 n=5+5)
FloatSqrt/10000-8                   1.92ms ± 0%     1.90ms ± 0%    -0.59%  (p=0.008 n=5+5)
FloatSqrt/100000-8                  55.3ms ± 0%     54.8ms ± 0%    -1.01%  (p=0.008 n=5+5)
FloatSqrt/1000000-8                  4.56s ± 0%      4.50s ± 0%    -1.28%  (p=0.008 n=5+5)

name                              old speed      new speed       delta
AddVV/1-8                         2.97GB/s ± 0%   5.56GB/s ± 0%   +86.85%  (p=0.008 n=5+5)
AddVV/2-8                         9.47GB/s ± 0%  10.66GB/s ± 0%   +12.50%  (p=0.008 n=5+5)
AddVV/3-8                         12.4GB/s ± 0%   14.7GB/s ± 0%   +19.10%  (p=0.008 n=5+5)
AddVV/4-8                         14.6GB/s ± 0%   18.9GB/s ± 0%   +29.63%  (p=0.016 n=4+5)
AddVV/5-8                         16.4GB/s ± 0%   22.0GB/s ± 0%   +34.47%  (p=0.016 n=5+4)
AddVV/10-8                        21.7GB/s ± 0%   35.5GB/s ± 0%   +63.89%  (p=0.008 n=5+5)
AddVV/100-8                       29.4GB/s ± 0%   68.0GB/s ± 0%  +131.38%  (p=0.008 n=5+5)
AddVV/1000-8                      31.7GB/s ± 0%   61.9GB/s ± 0%   +95.43%  (p=0.008 n=5+5)
AddVV/10000-8                     31.2GB/s ± 0%   56.4GB/s ± 0%   +80.83%  (p=0.008 n=5+5)
AddVV/100000-8                    25.9GB/s ± 3%   41.4GB/s ± 0%   +59.98%  (p=0.008 n=5+5)
SubVV/1-8                         2.97GB/s ± 0%   5.56GB/s ± 0%   +86.97%  (p=0.016 n=4+5)
SubVV/2-8                         9.47GB/s ± 0%  10.66GB/s ± 0%   +12.51%  (p=0.008 n=5+5)
SubVV/3-8                         12.4GB/s ± 0%   14.8GB/s ± 0%   +19.23%  (p=0.016 n=4+5)
SubVV/4-8                         14.6GB/s ± 0%   18.9GB/s ± 0%   +29.56%  (p=0.008 n=5+5)
SubVV/5-8                         16.4GB/s ± 0%   22.0GB/s ± 0%   +34.47%  (p=0.016 n=4+5)
SubVV/10-8                        21.7GB/s ± 0%   35.5GB/s ± 0%   +63.89%  (p=0.008 n=5+5)
SubVV/100-8                       29.4GB/s ± 0%   68.0GB/s ± 0%  +131.38%  (p=0.008 n=5+5)
SubVV/1000-8                      31.6GB/s ± 0%   80.1GB/s ± 0%  +153.08%  (p=0.008 n=5+5)
SubVV/10000-8                     31.2GB/s ± 0%   56.7GB/s ± 0%   +81.79%  (p=0.008 n=5+5)
SubVV/100000-8                    29.1GB/s ±10%   29.0GB/s ±18%      ~     (p=0.690 n=5+5)
AddVW/1-8                          859MB/s ± 0%    859MB/s ± 0%    -0.01%  (p=0.008 n=5+5)
AddVW/2-8                          811MB/s ± 1%    814MB/s ± 0%      ~     (p=0.413 n=5+4)
AddVW/3-8                         2.08GB/s ± 0%   2.08GB/s ± 0%      ~     (p=0.206 n=5+5)
AddVW/4-8                         2.46GB/s ± 0%   2.46GB/s ± 0%      ~     (p=0.056 n=5+5)
AddVW/5-8                         2.75GB/s ± 0%   2.75GB/s ± 0%      ~     (p=0.508 n=5+5)
AddVW/10-8                        3.63GB/s ± 0%   3.63GB/s ± 0%      ~     (p=0.214 n=5+5)
AddVW/100-8                       4.79GB/s ± 0%   4.79GB/s ± 0%      ~     (p=0.500 n=5+5)
AddVW/1000-8                      5.27GB/s ± 0%   5.25GB/s ± 0%    -0.43%  (p=0.008 n=5+5)
AddVW/10000-8                     5.30GB/s ± 0%   5.30GB/s ± 0%      ~     (p=0.397 n=5+5)
AddVW/100000-8                    5.27GB/s ± 1%   5.25GB/s ± 1%      ~     (p=0.690 n=5+5)
AddMulVVW/1-8                     1.92GB/s ± 0%   1.96GB/s ± 1%    +1.95%  (p=0.008 n=5+5)
AddMulVVW/2-8                     2.16GB/s ± 1%   2.25GB/s ± 1%    +4.32%  (p=0.008 n=5+5)
AddMulVVW/3-8                     2.39GB/s ± 1%   2.25GB/s ± 3%    -5.79%  (p=0.008 n=5+5)
AddMulVVW/4-8                     2.00GB/s ± 0%   2.31GB/s ± 1%   +15.31%  (p=0.008 n=5+5)
AddMulVVW/5-8                     2.22GB/s ± 0%   2.14GB/s ± 0%    -3.86%  (p=0.008 n=5+5)
AddMulVVW/10-8                    2.15GB/s ± 1%   2.25GB/s ± 0%    +5.03%  (p=0.008 n=5+5)
AddMulVVW/100-8                   2.09GB/s ± 0%   2.14GB/s ± 0%    +2.25%  (p=0.008 n=5+5)
AddMulVVW/1000-8                  2.04GB/s ± 0%   2.38GB/s ± 0%   +16.52%  (p=0.008 n=5+5)
AddMulVVW/10000-8                 2.03GB/s ± 0%   2.10GB/s ± 0%    +3.64%  (p=0.008 n=5+5)
AddMulVVW/100000-8                2.02GB/s ± 0%   2.02GB/s ± 1%      ~     (p=0.690 n=5+5)

Change-Id: Ie482d67a7dbb5af6f5d81af2b3d9d14bd66336db
Reviewed-on: https://go-review.googlesource.com/77831
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

os/exec: document Process.Kill behaviour

It is not clear from documentation what the Process.Kill does. And it
leads to reccuring confusion about Cmd.Start/Wait methods.

Fixes #24220

Change-Id: I66609d21d2954e195d13648014681530eed8ea6c
Reviewed-on: https://go-review.googlesource.com/98715
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

path/filepath: use a temp dir in path_test.go

We should avoid writing temp files to GOROOT, since it might be readonly.

Fixes #23881

Change-Id: Iaa38ec404b303f0cf27fdfb7daf1ddd60fd5d1c9
GitHub-Last-Rev: de0211df8474cc3bbef40f792e2f85b3b6ee259c
GitHub-Pull-Request: golang/go#24238
Reviewed-on: https://go-review.googlesource.com/98517
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

cmd/compile: refactor order.go into methods

No functional changes, just changing all the orderfoo functions
into (*Order).foo methods.

Passes toolstash-check.

Change-Id: Ib9833daa98aff3c645ce56794a414f8472689152
Reviewed-on: https://go-review.googlesource.com/98617
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

os/signal: disable loading of history during test

This change modifies Go to disable loading of users' shell history for
TestTerminalSignal tests. TestTerminalSignal, as part of its workload,
will execute a new interactive bash shell. Bash will attempt to load the
user's history from the file pointed to by the HISTFILE environment
variable. For users with large histories that may take up to several
seconds, pushing the whole test past the 5 second timeout and causing
it to fail.

Change-Id: I11b2f83ee91f51fa1e9774a39181ab365f9a6b3a
GitHub-Last-Rev: 7efdf616a2fcecdf479420fc0004057cee2ea6b2
GitHub-Pull-Request: golang/go#24255
Reviewed-on: https://go-review.googlesource.com/98616
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

internal/trace: remove backlinks from span/task end to start

This is an updated version of golang.org/cl/96395, with the fix to
TestUserSpan.

This reverts commit 7b6f6267e90a8e4eab37a3f2164ba882e6222adb.

Change-Id: I31eec8ba0997f9178dffef8dac608e731ab70872
Reviewed-on: https://go-review.googlesource.com/98236
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>

test/codegen: port math/bits.Leadingzero tests to codegen

Change-Id: Ic21d25db5d56ce77516c53082dfbc010e5875b81
Reviewed-on: https://go-review.googlesource.com/98655
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

runtime: rename vdso symbols to use camel case

This was originally C code using names with underscores, which were
retained when the code was rewritten into Go. Change the code to use
Go-like camel case names.

The names that come from the ELF ABI are left unchanged.

Change-Id: I181bc5dd81284c07bc67b7df4635f4734b41d646
Reviewed-on: https://go-review.googlesource.com/98520
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

runtime: remove unused SYS_* definitions on Linux

Also fix the indentation of the SYS_* definitions in sys_linux_mipsx.s
and order them numerically.

Change-Id: I0c454301c329a163e7db09dcb25d4e825149858c
Reviewed-on: https://go-review.googlesource.com/98448
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

test: port bits.Len intrinsics tests to the new codegen harness

This change move bits.Len* intrinsification tests to the new codegen
test harness, removing them from the old ssa_test file. Five different
test functions (one for each bit.Len function tested) was used, to
avoid possible unwanted interactions between multiple calls inside one
function.

Change-Id: Iffd5be55b58e88597fa30a562a28dacb01236d8b
Reviewed-on: https://go-review.googlesource.com/98156
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>

internal/bytealg: fix arm64 Index function

Missed removing the argument loading from the indexbody function.

Change-Id: Ia1391231fc99771d00410a09fe80a09f08ceed02
Reviewed-on: https://go-review.googlesource.com/98575
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

cmd/compile: fold offsets into memory ops

Fold offsets for:

  {ADD,SUB,MUL}[SD]mem
  ADD[LQ]constmem
  {ADD,SUB,AND,OR,XOR}[LQ]mem

Cumulatively, the rules trigger ~900 times in all.bash.

Fixes #23325

Change-Id: If6c701f68fa0b57907a353a07a516b914127d0d8
Reviewed-on: https://go-review.googlesource.com/98035
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

internal/bytealg: move short string Index implementations into bytealg

Also move the arm64 CountByte implementation while we're here.

Fixes #19792

Change-Id: I1e0fdf1e03e3135af84150a2703b58dad1b0d57e
Reviewed-on: https://go-review.googlesource.com/98518
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

internal/bytealg: move compare functions to bytealg

Move bytes.Compare and runtime·cmpstring to bytealg.

Update #19792

Change-Id: I139e6d7c59686bef7a3017e3dec99eba5fd10447
Reviewed-on: https://go-review.googlesource.com/98515
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

internal/bytealg: move Count to bytealg

Move bytes.Count and strings.Count to bytealg.

Update #19792

Change-Id: I3e4e14b504a0b71758885bb131e5656e342cf8cb
Reviewed-on: https://go-review.googlesource.com/98495
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

test: convert all math-related tests from asm_test

Change-Id: If542f0b5c5754e6eb2f9b302fe5a148ba9a57338
Reviewed-on: https://go-review.googlesource.com/98443
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

test: move load/store combines into asmcheck

This CL moves the load/store combining tests into asmcheck.
In addition at being more compact, it's also now easier to
spot what it is missing in each architecture.

While doing so, I think I uncovered a bug in ppc64le and arm64
rules, because they fail to load/store combine in non-trivial
functions. Not sure why, I'll open an issue.

Change-Id: Ia1572d53c0553d9104f3e52b95e4d1768a8440a3
Reviewed-on: https://go-review.googlesource.com/98441
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

test: in asmcheck, dump only the functions which fail

Before this change, in case of any failure, asmcheck was
dumping to stderr the whole output of compile -S, which
can be very long if it contains multiple functions.

Make it so it filters the output to only display the
assembly output of functions for which at least one opcode
check failed. This greatly simplifies debugging.

Change-Id: I1bbf54473b8252a3384e2c1dade82d926afc119d
Reviewed-on: https://go-review.googlesource.com/98444
Run-TryBot: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>

test: port a nil-check interface test from asm_test

Change-Id: I69c1688506d1aeca655047acf35d1bff966fc01e
Reviewed-on: https://go-review.googlesource.com/98442
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

test: use the version of Go used to run run.go

Currently, the top-level testsuite always uses whatever version
of Go is found in the PATH to execute all the tests. This
forces the developers to tweak the PATH to run the testsuite.

Change it to use the same version of Go used to run run.go.
This allows developers to run the testsuite using the tip
compiler by simply saying "../bin/go run run.go".

I think this is a better solution compared to always forcing
"../bin/go", because it allows developers to run the testsuite
using different Go versions, for instance to check if a new
test is fixed in tip compared to the installed compiler.

Fixes #24217

Change-Id: I41b299c753b6e77c41e28be9091b2b630efea9d2
Reviewed-on: https://go-review.googlesource.com/98439
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>