Daniel Martí [Fri, 24 Apr 2020 09:53:17 +0000 (10:53 +0100)]
cmd/go: make 'mod verify' use multiple CPUs
'go mod verify' checksums one module zip at a time, which is
CPU-intensive on most modern machines with fast disks. As a result, one
can see a CPU bottleneck when running the command on, for example, a
module where 'go list -m all' lists ~440 modules:
$ /usr/bin/time go mod verify
all modules verified
11.47user 0.77system 0:09.41elapsed 130%CPU (0avgtext+0avgdata 24284maxresident)k
0inputs+0outputs (0major+4156minor)pagefaults 0swaps
Instead, verify up to GOMAXPROCS zips at once, which should line up
pretty well with the number of processors we can use on a machine. The
results below are obtained via 'benchcmd -n 5 GoModVerify go mod verify'
on the same large module.
name         old time/op        new time/op        delta
GoModVerify  9.35s ± 1%         3.03s ± 2%         -67.60%  (p=0.008 n=5+5)

name         old user-time/op   new user-time/op   delta
GoModVerify  11.2s ± 1%         16.3s ± 3%         +45.38%  (p=0.008 n=5+5)

name         old sys-time/op    new sys-time/op    delta
GoModVerify  841ms ± 9%         865ms ± 8%         ~        (p=0.548 n=5+5)

name         old peak-RSS-bytes  new peak-RSS-bytes  delta
GoModVerify  27.8MB ±13%         50.7MB ±27%         +82.01%  (p=0.008 n=5+5)
The peak memory usage nearly doubles, and there is some extra overhead,
but it seems clearly worth the tradeoff given that we see a ~3x speedup
on my laptop with 4 physical cores. The vast majority of developer
machines nowadays should have 2-4 cores at least.
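The bounded-concurrency pattern at play is roughly the following sketch (hypothetical names; the real code lives in cmd/go):

	package main

	import (
		"fmt"
		"os"
		"runtime"
		"sync"
	)

	// verifyAll checksums mods with at most GOMAXPROCS workers in flight.
	func verifyAll(mods []string, verifyMod func(string) error) bool {
		sem := make(chan struct{}, runtime.GOMAXPROCS(0))
		var wg sync.WaitGroup
		var mu sync.Mutex
		ok := true
		for _, mod := range mods {
			sem <- struct{}{} // acquire a worker slot
			wg.Add(1)
			go func(mod string) {
				defer func() { <-sem; wg.Done() }()
				if err := verifyMod(mod); err != nil {
					mu.Lock()
					ok = false
					fmt.Fprintln(os.Stderr, err)
					mu.Unlock()
				}
			}(mod)
		}
		wg.Wait()
		return ok
	}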
No test or benchmark is included; one can benchmark 'go mod verify'
directly, as I did above. The existing tests also cover correctness,
including any data races via -race.
Fixes #38623.
Change-Id: I45d8154687a6f3a6a9fb0e2b13da4190f321246c
Reviewed-on: https://go-review.googlesource.com/c/go/+/229817
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Rob Pike [Tue, 28 Apr 2020 15:45:59 +0000 (01:45 +1000)]
cmd/cover: include a package name in the HTML title
A recent change added a title to the HTML coverage report but
neglected to include the package name. Add the package name here.
It's a little trickier than you'd think because there may be multiple
packages and we don't want to parse the files, so we just extract
a directory name from the path of the first file. This will almost
always be right, and has the advantage that it gives a better result
for package main. There are rare cases it will get wrong, but that
will be no hardship.
If this turns out not to be good enough, we can refine it.
Fixes #38609
Change-Id: I2201f6caef906e0b0258b90d7de518879041fe72
Reviewed-on: https://go-review.googlesource.com/c/go/+/230517
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Robert Griesemer [Tue, 28 Apr 2020 23:54:44 +0000 (16:54 -0700)]
strconv: implement parseFloatPrefix returning number of bytes consumed
parseFloatPrefix will make it easier to implement ParseComplex.
Verified that there's no relevant performance impact:
Benchmarks run on a "quiet" MacBook Pro, 3.3GHz Dual-Core Intel Core i7,
with 16GB 2133MHz LPDDR3 RAM running macOS 10.15.4.
Change-Id: I8ff66b582ae8b468d89c9ffc35c569c735cf0341
Reviewed-on: https://go-review.googlesource.com/c/go/+/230737
Reviewed-by: Ian Lance Taylor <iant@golang.org>
cmd/compile: omit file:pos for non-existent, permission errors
Omits printing the file:line:column when trying to open either
* non-existent files
* files without permission
Given:
go tool compile x.go
If x.go doesn't exist, or can't be read due to permissions:
* Before:
x.go:0: open x.go: no such file or directory
x.go:0: open x.go: permission denied
* After:
open x.go: no such file or directory
open x.go: permission denied
While here, noticed an oddity with the Linux builders, which appear
to always run as root; hence the test for permission errors
with 0222 (-w--w--w-) can't pass on linux-amd64 builders.
The filed bug is #38608.
When I browsed the source code, I saw that there was no corresponding example for this function. I am not sure whether one needs to be added; this is my first time submitting a CL.
Change-Id: Idbf4e1e1ed2995176a76959d561e152263a2fd26
Reviewed-on: https://go-review.googlesource.com/c/go/+/230741
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Matthew Dempsky [Wed, 29 Apr 2020 23:45:05 +0000 (16:45 -0700)]
net/http/cgi: replace constant map with switch statement
The switch statement can be statically optimized by the compiler,
whereas similarly optimizing the map index expression would require
additional compiler analysis to detect the map is never mutated.
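For illustration, the shape of the change (a hedged sketch with illustrative keys, not the actual net/http/cgi code):

	// Before: a package-level map; the compiler must treat it as mutable,
	// so each use pays for a hash-map lookup.
	var skip = map[string]bool{
		"HTTP_HOST":    true,
		"HTTP_REFERER": true,
	}

	func skipHeaderOld(h string) bool { return skip[h] }

	// After: a switch over string constants, which the compiler can
	// optimize statically (e.g. into length checks and comparisons).
	func skipHeader(h string) bool {
		switch h {
		case "HTTP_HOST", "HTTP_REFERER":
			return true
		}
		return false
	}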
cmd/compile,runtime: stack maps only at calls, remove register maps
Currently, we emit stack maps and register maps at almost every
instruction. This was originally intended to support non-cooperative
preemption, but was only ever used for debug call injection. Now debug
call injection also uses conservative frame scanning. As a result,
stack maps are only needed at call sites and register maps aren't
needed at all except that we happen to also encode unsafe-point
information in the register map PCDATA stream.
This CL reduces stack maps to only appear at calls, and replaces full
register maps with just safe/unsafe-point information.
This is all protected by the go115ReduceLiveness feature flag, which
is defined in both runtime and cmd/compile.
This CL significantly reduces binary sizes and also speeds up compiles
and links:
name       old exe-bytes   new exe-bytes   delta
BinGoSize  15.0MB ± 0%     14.1MB ± 0%     -5.72%

name       old pcln-bytes  new pcln-bytes  delta
BinGoSize  3.14MB ± 0%     2.48MB ± 0%     -21.08%
The binary size improvement is even slightly better when you include
the CLs leading up to this. Relative to the parent of "cmd/compile:
mark PanicBounds/Extend as calls":
name       old exe-bytes   new exe-bytes   delta
BinGoSize  15.0MB ± 0%     14.1MB ± 0%     -6.18%

name       old pcln-bytes  new pcln-bytes  delta
BinGoSize  3.22MB ± 0%     2.48MB ± 0%     -22.92%
cmd/compile: don't emit stack maps for write barrier calls
These are necessarily deeply non-preemptible, so there's no point in
emitting stack maps for them. We already mark them as unsafe points,
so this only affects the runtime, since user code does not emit stack
maps at unsafe points. SSAGenState.PrepareCall also excludes them when
it's sanity checking call stack maps.
Right now this only drops a handful of unnecessary stack maps from the
runtime, but we're about to start emitting stack maps only at calls
for user code, too. At that point, this will matter much more.
The compiler currently conflates whether a Value has a stack map with
whether it's an unsafe point. For the most part, unsafe-points don't
have stack maps, so this is mostly fine, but call instructions can be
both an unsafe-point *and* have a stack map. For example, none of the
instructions in a nosplit function should be preemptible, but calls
must still have stack maps in case the called function grows the stack
or gets preempted.
Currently, the compiler can't distinguish this case, so calls in
nosplit functions are marked as safe-points just because they have
stack maps. This is particularly problematic if a nosplit function
calls another nosplit function, since this can introduce a preemption
point where there should be none.
We realized this was a problem for split-stack prologues a while back,
and CL 207349 changed the encoding of unsafe-points to use the
register map index instead of the stack map index so we could record
both a stack map and an unsafe-point at the same instruction. But this
was never extended into the compiler.
This CL fixes this problem in the compiler. We make LivenessIndex
slightly more abstract by separating unsafe-point marks from stack and
register map indexes. We map this to the PCDATA encoding later when
producing Progs. This isn't enough to fix the whole problem for
nosplit functions, because obj still adds prologues and marks those as
preemptible, but it's a step in the right direction.
I checked this CL by comparing maps before and after this change in
the runtime and net/http. In net/http, unsafe-points match exactly; at
anything that isn't an unsafe-point, both the stack and register maps
are unchanged by this CL. In the runtime, at every point that was a
safe-point before this change, the stack maps agree (and mostly the
runtime doesn't have register maps at all now). In both, all CALLs
(except write barrier calls) have stack maps.
Currently, this function conflates two (easily conflated!) concepts:
whether a Value is a safe-point and whether it has a stack map. In
particular, call Values may not be a safe-point, but may need a stack
map anyway in case the called function grows the stack.
Hence, rename this function to "hasStackMap", since that's really what
it represents.
PanicBounds and PanicExtend are lowered to runtime calls (with a
non-Go ABI), but are not currently marked as calls. Since liveness
analysis only emits stack maps at calls in the runtime, this means
these panic call sites in the runtime won't get a stack map. These
almost immediately turn into throws in the runtime, but there's still
a chance they'll try to grow the stack first, which would lead to a
different panic.
To fix this, mark these operations as calls.
Outside the runtime, we currently emit stack maps for everything that
isn't an unsafe-point, so these panic calls get stack maps by default.
However, we're about to move to emitting stack maps only at call
sites, at which point this will start to matter outside the runtime as
well.
I confirmed that this has no effect on anything but PCDATA/FUNCDATA in
runtime and net/http.
runtime: use conservative scanning for debug calls
A debugger can inject a call at almost any PC, which causes
significant complications with stack scanning and growth. Currently,
the runtime solves this using precise stack maps and register maps at
nearly all PCs, but these extra maps require roughly 5% of the binary.
These extra maps were originally considered worth this space because
they were intended to be used for non-cooperative preemption, but are
now used only for debug call injection.
This CL switches from using precise maps to instead using conservative
frame scanning, much like how non-cooperative preemption works. When a
call is injected, the runtime flushes all potential pointer registers
to the stack, and then treats that frame as well as the interrupted
frame conservatively.
The limitation of conservative frame scanning is that we cannot grow
the goroutine stack. That's acceptable because the previous CL switched to
performing debug calls on a new goroutine, where they are free to grow
the stack.
With this CL, there are no remaining uses of precise register maps
(though we still use the unsafe-point information that's encoded in
the register map PCDATA stream), and stack maps are only used at call
sites.
runtime: perform debug call injection on a new goroutine
Currently, when a debugger injects a call, that call happens on the
goroutine where the debugger injected it. However, this requires
significant runtime complexity that we're about to remove.
To prepare for this, this CL switches to a different approach that
leaves the interrupted goroutine parked and runs the debug call on a
new goroutine. When the debug call returns, it resumes the original
goroutine.
This should be essentially transparent to debuggers. It follows the
exact same call injection protocol and ensures the whole protocol
executes indivisibly on a single OS thread. The only difference is
that the current G and stack now change part way through the protocol.
Currently, newproc1 allocates, initializes, and schedules a new
goroutine. We're about to change debug call injection in a way that
will need to create a new goroutine without immediately scheduling it.
To prepare for that, make scheduling the responsibility of newproc1's
caller. Currently, there's exactly one caller (newproc), so this
simply shifts that responsibility.
Martin Möhrmann [Wed, 29 Apr 2020 10:23:32 +0000 (12:23 +0200)]
bytes, strings: align requirements for functions passed to FieldsFunc
golang.org/cl/229763 removed the documentation of the requirements of
the function passed to FieldsFunc. The current implementation does
not require functions to return consistent results, but previous
implementations did.
Add the requirement for consistent results back to the documentation
to allow for future implementations to be more allocation efficient
for an output with more than 32 fields. This is possible with a two
pass algorithm first determining the number of fields used to allocate
the output slice and then splitting the input into fields.
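A sketch of such a two-pass split (simplified, not the actual strings implementation; it assumes f behaves consistently across both passes):

	package main

	func fieldsFunc(s string, f func(rune) bool) []string {
		// First pass: count fields so the output is allocated exactly once.
		n := 0
		inField := false
		for _, r := range s {
			wasInField := inField
			inField = !f(r)
			if inField && !wasInField {
				n++
			}
		}

		// Second pass: split the input into the pre-sized slice.
		a := make([]string, 0, n)
		start := -1
		for i, r := range s {
			if f(r) {
				if start >= 0 {
					a = append(a, s[start:i])
					start = -1
				}
			} else if start < 0 {
				start = i
			}
		}
		if start >= 0 {
			a = append(a, s[start:])
		}
		return a
	}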
While at it, align the documentation of bytes.FieldsFunc with
strings.FieldsFunc.
Fixes #38630
Change-Id: Iabbf9ca3dff0daa41f4ec930a21a3dd98e19f122
Reviewed-on: https://go-review.googlesource.com/c/go/+/230797
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Jay Conrod [Thu, 19 Dec 2019 18:43:24 +0000 (13:43 -0500)]
cmd/go: trim source paths when compiling C with -trimpath
When the go command is run with -trimpath, it will now use
-fdebug-prefix-map when invoking the C compiler (if supported) to
replace the source root directory with a dummy root directory.
This should prevent source directories from appearing either literally
or in compressed DWARF in linked binaries.
Updates #36072
Change-Id: Iedd08d5e886f81e981f11248a1be4ed4f58bdd29
Reviewed-on: https://go-review.googlesource.com/c/go/+/212101
Run-TryBot: Jay Conrod <jayconrod@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Richard Miller [Wed, 29 Apr 2020 10:20:28 +0000 (11:20 +0100)]
cmd/go/internal/modload: use lockedfile to read path-replacement go.mod files
When parsing go.mod files found via file-path replacements, it's safer to
use lockedfile.Read instead of ioutil.ReadFile, in case of overwriting by
other concurrent go commands.
Change-Id: I7dcac3bb5ada84bee1eb634b39f813c461ef103a
Reviewed-on: https://go-review.googlesource.com/c/go/+/230838
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Previously, non-standard attributes in Name.Names were
omitted when printed using Name.String(). Now, any non-standard
attributes that would not already be printed by Name.String()
are temporarily added to Name.ExtraNames so that they are printed.
cmd/compile,cmd/internal/obj/ppc64: use mod instructions on power9
This updates the PPC64.rules file to use the MOD instructions
that are available in power9. Prior to power9 this is done
using a longer sequence with multiply and divide.
Included in this change is removal of the REM* opcode variations
that set the CC or OV bits since their settings are based
on the DIV and are not appropriate for the REM.
Nigel Tao [Mon, 27 Apr 2020 23:32:00 +0000 (09:32 +1000)]
image: guard against NewXxx integer overflow
Prior to this commit, NewXxx could panic when passed an image.Rectangle
with one of width or height being negative. But it might not panic if
both were negative, because (bpp * w * h) could still be positive. After
this commit, it will panic if both are negative.
With overflow, NewXxx might not have panicked if (bpp * w * h), the
length passed to "make([]uint8, length)", was still non-negative (after
truncation), but even if w and h were valid (non-negative), the overall
byte slice wasn't long enough. Iterating over the pixels would possibly
panic later with index out of bounds. This change moves the panic
earlier, closer to where the mistake is.
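A sketch of the guard in the spirit of this change (helper names may differ from the actual image package code):

	package image

	import "math/bits"

	// pixelBufferLength panics unless bytesPerPixel*r.Dx()*r.Dy() can be
	// computed without overflow and is non-negative.
	func pixelBufferLength(bytesPerPixel int, r Rectangle) int {
		n := mul3NonNeg(bytesPerPixel, r.Dx(), r.Dy())
		if n < 0 {
			panic("image: NewXxx Rectangle has huge or negative dimensions")
		}
		return n
	}

	// mul3NonNeg returns x*y*z, or a negative value if any argument is
	// negative or the multiplication overflows an int.
	func mul3NonNeg(x, y, z int) int {
		if x < 0 || y < 0 || z < 0 {
			return -1
		}
		hi, lo := bits.Mul64(uint64(x), uint64(y))
		if hi != 0 {
			return -1
		}
		hi, lo = bits.Mul64(lo, uint64(z))
		if hi != 0 {
			return -1
		}
		a := int(lo)
		if a < 0 || uint64(a) != lo {
			return -1
		}
		return a
	}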
Change-Id: I011feb2d53515fc3f0fe72bb6c23b3953772c577
Reviewed-on: https://go-review.googlesource.com/c/go/+/230220
Reviewed-by: Rob Pike <r@golang.org>
Ruixin Bao [Wed, 29 Apr 2020 01:19:17 +0000 (18:19 -0700)]
cmd/compile: adopt strong aux typing for some s390x rules
Convert some optimization rules to strongly-typed versions. Similar to
CL 230338, this CL only converts rules that need no additional changes
(i.e., they only need '->' changed to '=>').
This CL covers the rules from line 800 - 1219.
Passes toolstash-check
Change-Id: I94181a809fa38918b78301f1c0c680b7a8ab552f
Reviewed-on: https://go-review.googlesource.com/c/go/+/230738
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Ian Gudger [Fri, 17 Apr 2020 11:04:23 +0000 (04:04 -0700)]
net: add (*Resolver).LookupIP
Previously, looking up only IPv4 or IPv6 addresses was only possible
with DefaultResolver via ResolveIPAddr. Add this functionality to the
Resolver type with a new method, LookupIP. This largely brings Resolver
functionality to parity with the global functions. The name LookupIP is
used over ResolveIPAddr to be consistent with the other Resolver
methods.
There are two main benefits to (*Resolver).LookupIP over
(*Resolver).LookupHost. First is an ergonomic benefit. Wanting a
specific family of addresses is common enough to justify a method, as
evidenced by the existence of ResolveIPAddr. Second, this opens the
possibility of not performing unnecessary DNS requests when only a
specific family of addresses is needed. That optimization is left to
follow-up work.
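Typical usage of the new method (a small example; the signature matches the other context-aware Resolver methods):

	package main

	import (
		"context"
		"fmt"
		"net"
		"time"
	)

	func main() {
		r := &net.Resolver{PreferGo: true}
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		defer cancel()

		// Look up only IPv4 addresses; "ip" or "ip6" select other families.
		addrs, err := r.LookupIP(ctx, "ip4", "go.dev")
		if err != nil {
			fmt.Println("lookup failed:", err)
			return
		}
		for _, ip := range addrs {
			fmt.Println(ip)
		}
	}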
Daniel Theophanes [Tue, 28 Apr 2020 14:31:12 +0000 (07:31 -0700)]
database/sql: document Connect and Close may need a timeout
Opening a connection with Connect should still create a derived
context with a timeout because some clients will not use a timeout
and the connection pool may open a connection asynchronously.
Likewise, if a connection close makes a network operation it should
provide some type of sane timeout for the operation.
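A sketch of what that advice means for a driver author (hypothetical driver; the 30-second value is illustrative, not prescribed by the documentation):

	package mydriver

	import (
		"context"
		"database/sql/driver"
		"errors"
		"time"
	)

	type connector struct{ dsn string }

	func (c *connector) Driver() driver.Driver { return nil } // elided

	// Connect derives a timeout even when ctx carries none, because the
	// connection pool may open connections asynchronously using a
	// background context.
	func (c *connector) Connect(ctx context.Context) (driver.Conn, error) {
		ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
		defer cancel()
		return dial(ctx, c.dsn)
	}

	// dial is a hypothetical stand-in for the driver's network handshake.
	func dial(ctx context.Context, dsn string) (driver.Conn, error) {
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		default:
			return nil, errors.New("mydriver: not implemented")
		}
	}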
Michael Anthony Knyszek [Tue, 28 Apr 2020 15:14:56 +0000 (15:14 +0000)]
runtime: fix block leak due to race in span set
The span set data structure may leak blocks due to a race in the logic
to check whether it's safe to free a block. The simplest example of this
race is between two poppers:
1. Popper A claims slot spanSetBlockEntries-2.
2. Popper B claims slot spanSetBlockEntries-1.
3. Popper A gets descheduled before it subtracts from block.used.
4. Popper B subtracts from block.used, sees that it claimed
spanSetBlockEntries-1, but also that block.used != 0, so it returns.
5. Popper A comes back and subtracts from block.used, but it didn't
claim spanSetBlockEntries-1, so it also returns.
The spine is left with a stale block pointer and the block later gets
overwritten by pushes, never to be re-used again.
The problem here is that we designate the claimer of slot
spanSetBlockEntries-1 to be the one who frees the block, but that may not be
the thread that actually does the last subtraction from block.used.
Fixing this problem is tricky, and the fundamental problem there is that
block.used is not stable: it may be observed to be zero, but that
doesn't necessarily mean you're the last popper!
Do something simpler: keep a counter of how many pops have happened to a
given block instead of block.used. This counter monotonically increases
when a pop is _completely done_. Because this counter is monotonically
increasing, and only increases when a popper is done, then we know for
sure whichever popper is the last to increase it (i.e. its value is
spanSetBlockEntries) is also the last popper in the block. Because the
race described above still exists, the last popper may not be the one
which claimed the last slot in the block, but we know for certain nobody
else is popping from that block anymore so we can safely free it.
Finally, because pops serialize with pushes to the same slot, we need
not worry about concurrent pushers at all.
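A simplified sketch of the counter idea (illustrative, not the actual runtime code):

	package main

	import "sync/atomic"

	const spanSetBlockEntries = 512 // illustrative block size

	type spanSetBlock struct {
		popped uint32 // number of fully completed pops from this block
		// ... entries elided ...
	}

	// donePop is called after a pop has completely finished. Because
	// popped only ever increases, and only once per finished pop,
	// whichever popper raises it to spanSetBlockEntries is provably the
	// last user of the block and may free it, even if it wasn't the one
	// that claimed the final slot.
	func donePop(b *spanSetBlock, free func(*spanSetBlock)) {
		if atomic.AddUint32(&b.popped, 1) == spanSetBlockEntries {
			free(b)
		}
	}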
Updates #37487.
Change-Id: I6697219372774c8ca7d8ee6895eaa230a64ce9e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/230497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Michael Anthony Knyszek [Tue, 28 Apr 2020 16:40:38 +0000 (16:40 +0000)]
runtime: flush mcaches to mcentral before reading memstats
Currently mcaches are flushed to mcentral after a bunch of memstats have
already been read. This is not safe (in the sense that it doesn't ensure
consistent memstats) since memstats may in general change when mcentral
data structures are manipulated.
Note that prior to the new mcentral implementation this was not a
problem because mcentral operations happened to never modify certain
memstats. As of the new mcentral implementation, we might, for example,
call persistentalloc when uncaching a span, which would change memstats. This
can cause a skew between the value of sys (which currently is calculated
before mcaches are flushed) and the value of gc_sys and other_sys.
Fix this by moving mcache flushing to the very top of updatememstats.
Also leave a comment explaining that this must be done first, in
general, because mcentrals make no guarantee that they will not
influence memstats (and doing so would be unnecessarily restrictive).
This is a second attempt at CL 230024, with
cmd/go/testdata/script/mod_retention.txt updated to perform a
version-independent comparison on the 'go' version added to a go.mod
file that lacks one.
Fixes #38708
Change-Id: I15dcd83b51ed5ec57946b419bcbaec41e85a46f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/230382
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Ruixin Bao [Mon, 27 Apr 2020 23:20:45 +0000 (16:20 -0700)]
cmd/compile: adopt strong aux typing for some s390x rules
Convert some optimization rules to strongly-typed versions. So far, I
have only converted the rules that need no additional changes (i.e.,
they only need '->' changed to '=>').
This CL covers the rules from lines 478 - 800 in the S390X.rules file.
Some compare and branch rules also fall in this range, but they were
already done previously in another CL.
Passes toolstash-check.
Change-Id: I9167c5f1a32f4fd6c29bacc13fff95e83b0533e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/230338
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
go/types: improve errors for method calls with pointer receivers
The compiler has better error messages for methods called without a
pointer receiver when one is expected. This change is similar to
CL 229801, but for method calls.
Also, added better error messages for functions called with the wrong
capitalization. I left the third TODO in this switch statement almost
as-is because I'm not sure that the extra complexity is worth it -
I adjusted the error to look like the one the compiler reports.
os, internal/poll, internal/syscall/unix: use copy_file_range on Linux
Linux 4.5 introduced (and Linux 5.3 refined) the copy_file_range
system call, which allows file systems the opportunity to implement
copy acceleration techniques. This commit adds support for
copy_file_range(2) to the os package.
Introduce a new ReadFrom method on *os.File, which makes *os.File
implement the io.ReaderFrom interface. If dst and src are both files,
this enables io.Copy(dst, src) to call dst.ReadFrom(src), which, in
turn, will call copy_file_range(2) if possible. If copy_file_range(2)
is not supported by the host kernel, or if either of dst or src
refers to a non-regular file, ReadFrom falls back to the regular
io.Copy code path.
Add internal/poll.CopyFileRange, which acquires locks on the
appropriate poll.FDs and performs the actual work, as well as
internal/syscall/unix.CopyFileRange, which wraps the copy_file_range
system call itself at the lowest level.
Rework file layout in internal/syscall/unix to accommodate the
additional system call numbers needed for copy_file_range.
Merge these definitions with the ones used by getrandom(2) into
sysnum_linux_$GOARCH.go files.
A note on additional optimizations: if dst and src both refer to pipes
in the invocation dst.ReadFrom(src), we could, in theory, use the
existing splice(2) code in package internal/poll to splice directly
from src to dst. Attempting this runs into trouble with the poller,
however. If we call splice(src, dst) and see EAGAIN, we cannot know
if it came from src not being ready for reading or dst not being
ready for writing. The write end of src and the read end of dst are
not under our control, so we cannot reliably use the poller to wait
for readiness. Therefore, it seems infeasible to use the new ReadFrom
method to splice between pipes directly. In conclusion, for now, the
only optimization enabled by the new ReadFrom method on *os.File is
the copy_file_range optimization.
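From the caller's side nothing changes; an ordinary file-to-file copy now picks up the fast path automatically:

	package main

	import (
		"io"
		"log"
		"os"
	)

	func main() {
		src, err := os.Open("input.bin")
		if err != nil {
			log.Fatal(err)
		}
		defer src.Close()

		dst, err := os.Create("output.bin")
		if err != nil {
			log.Fatal(err)
		}
		defer dst.Close()

		// io.Copy detects that dst implements io.ReaderFrom and calls
		// dst.ReadFrom(src), which may use copy_file_range(2) on Linux.
		if _, err := io.Copy(dst, src); err != nil {
			log.Fatal(err)
		}
	}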
Fixes #36817.
Change-Id: I696372639fa0cdf704e3f65414f7321fc7d30adb
Reviewed-on: https://go-review.googlesource.com/c/go/+/229101
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
IntSize is an untyped constant that does not need explicit conversion.
Annotating IntSize as an int and running github.com/mdempsky/unconvert
reveals these two cases.
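For example (assuming this fragment refers to strconv.IntSize, which is an untyped constant):

	package main

	import (
		"fmt"
		"strconv"
	)

	func main() {
		// Untyped constants assign to int directly; int(strconv.IntSize)
		// would compile, but the conversion is redundant and is exactly
		// what unconvert flags.
		var bits int = strconv.IntSize
		n, _ := strconv.ParseInt("42", 10, bits)
		fmt.Println(bits, n)
	}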
Michael Anthony Knyszek [Wed, 22 Apr 2020 21:36:11 +0000 (21:36 +0000)]
runtime: ensure allocToCache updates searchAddr in a valid way
Currently allocToCache assumes it can move the search address past the
block it allocated the cache from, which violates the property that
searchAddr should always point to mapped memory (i.e. memory represented
by pageAlloc.inUse).
This bug was already fixed once for pageAlloc.alloc in the Go 1.14
release via CL 216697, but that change failed to take into account
allocToCache.
Fixes #38605.
Change-Id: Id08180aa10d19dc0f9f551a1d9e327a295560dff
Reviewed-on: https://go-review.googlesource.com/c/go/+/229577
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Ruixin(Peter) Bao [Mon, 27 Apr 2020 19:23:37 +0000 (15:23 -0400)]
hash/crc32: simplify hasVX checking on s390x
Originally, we used an assembly function that returned a boolean result
to tell whether the machine has the vector facility or not. It is no
longer needed now that we can directly use the cpu.S390X.HasVX variable.
This CL also removes the last occurrence of the hasVectorFacility
function on s390x.
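The replacement is a plain variable read; outside the standard library the exported golang.org/x/sys/cpu package offers the same check (a sketch, as internal/cpu itself is not importable):

	package main

	import (
		"fmt"

		"golang.org/x/sys/cpu"
	)

	func main() {
		// Inside the standard library this reads internal/cpu's
		// cpu.S390X.HasVX; x/sys/cpu is the exported equivalent.
		if cpu.S390X.HasVX {
			fmt.Println("vector facility available")
		}
	}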
Change-Id: Id20cb746c21eacac5e13344b362e2d87adfe4317
Reviewed-on: https://go-review.googlesource.com/c/go/+/230337
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Ruixin(Peter) Bao [Mon, 27 Apr 2020 13:51:01 +0000 (09:51 -0400)]
math/big: simplify hasVX checking on s390x
Originally, we used an assembly function that returned a boolean result
to tell whether the machine has the vector facility or not. It is no
longer needed now that we can directly use the cpu.S390X.HasVX variable.
Change-Id: Ic1dae851982532bcfd9a9453416c112347f21d87
Reviewed-on: https://go-review.googlesource.com/c/go/+/230318
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Ruixin Bao [Mon, 27 Apr 2020 19:02:52 +0000 (12:02 -0700)]
math: simplify hasVX checking on s390x
Originally, we used an assembly function that returned a boolean result
to tell whether the machine has the vector facility or not. It is no
longer needed now that we can directly use the cpu.S390X.HasVX variable.
Change-Id: Ic3ffeb9e63238ef41406d97cdc42502145ddb454
Reviewed-on: https://go-review.googlesource.com/c/go/+/230319
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Ruixin(Peter) Bao [Mon, 21 Oct 2019 17:18:31 +0000 (13:18 -0400)]
crypto/ed25519: implement ed25519 on s390x using KDSA instruction
This CL allows the usage of the KDSA instruction when it is available. The
instruction is designed to be resistant to side channel attacks and
offers performance improvement for ed25519.
Benchmarks:
name            old time/op    new time/op    delta
Signing-8       120µs ±20%     62µs ±12%      -48.40%  (p=0.000 n=10+10)
Verification-8  325µs ±17%     69µs ±10%      -78.80%  (p=0.000 n=10+10)

name            old alloc/op   new alloc/op   delta
Signing-8       448B ± 0%      0B             -100.00%  (p=0.000 n=10+10)
Verification-8  288B ± 0%      0B             -100.00%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
Signing-8       5.00 ± 0%      0.00           -100.00%  (p=0.000 n=10+10)
Verification-8  2.00 ± 0%      0.00           -100.00%  (p=0.000 n=10+10)
Change-Id: I0330ce83d807370b419ce638bc2cae4cb3c250dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/202578
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Dan Scales [Mon, 13 Apr 2020 15:37:03 +0000 (08:37 -0700)]
runtime: added several new lock-rank partial order edges
Several new ones came from my testing (long, repeated runs) and one (assistQueue ->
spine) came from the staticlockranking builder (filed as issue 38441).
Fixes #38441
Change-Id: I4268da0d8b8cc51251eba6bd936110c8ab4c4e61
Reviewed-on: https://go-review.googlesource.com/c/go/+/229480
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Michael Anthony Knyszek [Fri, 24 Apr 2020 20:06:57 +0000 (20:06 +0000)]
runtime: bound small object sweeping to 100 spans when allocating
Currently, the small object sweeper will sweep until it finds a free
slot or there are no more spans of that size class to sweep. In dense
heaps, this can cause sweeping for a given size class to take
unbounded time, and gets worse with larger heaps.
This CL limits the small object sweeper to try at most 100 spans
before giving up and allocating a fresh span. Since it's already shown
that 100 spans are completely full at that point, the space overhead
of this fresh span is at most 1%.
This CL is based on an experimental CL by Austin Clements (CL 187817)
and is updated to be part of the mcentral implementation, gated by
go115NewMCentralImpl.
Michael Anthony Knyszek [Thu, 20 Feb 2020 20:58:45 +0000 (20:58 +0000)]
runtime: add new mcentral implementation
Currently mcentral is implemented as a couple of linked lists of spans
protected by a lock. Unfortunately this design leads to significant lock
contention.
The span ownership model is also confusing and complicated. In-use spans
jump between being owned by multiple sources, generally some combination
of a gcSweepBuf, a concurrent sweeper, an mcentral or an mcache.
So first to address contention, this change replaces those linked lists
with gcSweepBufs which have an atomic fast path. Then, we change up the
ownership model: a span may be simultaneously owned only by an mcentral
and the page reclaimer. Otherwise, an mcentral (which now consists of
sweep bufs), a sweeper, or an mcache is the sole owner of a span at
any given time. This dramatically simplifies reasoning about span
ownership in the runtime.
As a result of this new ownership model, sweeping is now driven by
walking over the mcentrals rather than having its own global list of
spans. Because we no longer have a global list and we traditionally
haven't used the mcentrals for large object spans, we no longer have
anywhere to put large objects. So, this change also makes it so that we
keep large object spans in the appropriate mcentral lists.
In terms of the static lock ranking, we add the spanSet spine locks in
pretty much the same place as the mcentral locks, since they have the
potential to be manipulated both on the allocation and sweep paths, like
the mcentral locks.
This new implementation is turned on by default via a feature flag
called go115NewMCentralImpl.
Benchmark results for 1 KiB allocation throughput (5 runs each):
Michael Anthony Knyszek [Thu, 20 Feb 2020 21:28:02 +0000 (21:28 +0000)]
runtime: implement the spanSet data structure
This change implements the spanSet data structure which is based off of
the gcSweepBuf data structure. While the general idea is the same (one
has two of these which one switches between every GC cycle; one to push
to and one to pop from), there are some key differences.
Firstly, we never have a need to iterate over this data structure so
delete numBlocks and block. Secondly, we want to be able to pop from the
front of the structure concurrently with pushes to the back. As a result
we need to maintain both a head and a tail and this change introduces an
atomic headTail structure similar to the one used by sync.Pool. It also
implements popfirst in a similar way.
As a result of this headTail, we need to be able to explicitly reset the
length, head, and tail when it goes empty at the end of sweep
termination, so add a reset method.
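A sketch of the packed head/tail idea (simplified relative to the runtime's version):

	package main

	import "sync/atomic"

	// A 32-bit head and tail packed into one uint64 let a popper read
	// and update both with a single atomic op, as sync.Pool's
	// poolDequeue does.
	type headTail struct{ u uint64 }

	func pack(head, tail uint32) uint64 { return uint64(head)<<32 | uint64(tail) }

	func unpack(v uint64) (head, tail uint32) { return uint32(v >> 32), uint32(v) }

	// popFirst claims one slot from the front by advancing head, failing
	// if the set is empty (head == tail). A reset simply stores 0.
	func (ht *headTail) popFirst() (slot uint32, ok bool) {
		for {
			old := atomic.LoadUint64(&ht.u)
			head, tail := unpack(old)
			if head == tail {
				return 0, false // empty
			}
			if atomic.CompareAndSwapUint64(&ht.u, old, pack(head+1, tail)) {
				return head, true
			}
		}
	}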
Michael Anthony Knyszek [Thu, 13 Feb 2020 21:13:40 +0000 (21:13 +0000)]
runtime: manage a pool of spanSetBlocks and free them eagerly
This change adds a global pool of spanSetBlocks to the spanSet data
structure and adds support for eagerly freeing these blocks back to the
pool if the block goes empty.
This change prepares us to use this data structure in more places in the
runtime by allowing reuse of spanSetBlock.
Michael Anthony Knyszek [Thu, 20 Feb 2020 21:22:03 +0000 (21:22 +0000)]
runtime: add spanSet data structure
This change copies the gcSweepBuf data structure into a new file and
renames it spanSet. It will serve as the basis for a heavily modified
version of the gcSweepBuf data structure for the new mcentral
implementation.
We move it into a separate file now for two reasons:
1. We will need both implementations as they will coexist simultaneously
for a time.
2. By creating it now in a new change it'll make future changes which
modify it easier to review (rather than introducing the new file then).
Joel Sing [Mon, 30 Mar 2020 15:00:50 +0000 (02:00 +1100)]
cmd/compile: implement multi-control branches for riscv64
Implement multi-control branches for riscv64, switching to using the BNEZ
pseudo-instruction when rewriting conditionals. This will allow for further
branch optimisations to later be performed via rewrites.
Change-Id: I7f2c69f3c77494b403f26058c6bc8432d8070ad0
Reviewed-on: https://go-review.googlesource.com/c/go/+/226399
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
cmd/compile: eliminate some array equality alg loops
type T [3]string
Prior to this change, we generated this equality alg for T:
func eqT(p, q *T) (r bool) {
	for i := range *p {
		if len(p[i]) == len(q[i]) {
		} else {
			return
		}
	}
	for j := range *p {
		if runtime.memeq(p[j].ptr, q[j].ptr, len(p[j])) {
		} else {
			return
		}
	}
	return true
}
That first loop can be profitably eliminated;
it's cheaper to spell out 3 length equality checks.
We now generate:
func eqT(p, q *T) (r bool) {
	if len(p[0]) == len(q[0]) &&
		len(p[1]) == len(q[1]) &&
		len(p[2]) == len(q[2]) {
	} else {
		return
	}
	for i := 0; i < len(p); i++ {
		if runtime.memeq(p[i].ptr, q[i].ptr, len(p[i])) {
		} else {
			return
		}
	}
	return true
}
We now also eliminate loops for small float arrays,
and for any array of size 1.
These cutoffs were selected to minimize code size on amd64
at this moment, for lack of a more compelling methodology.
Any smallish number would do.
The switch from range loops to plain for loops allowed me
to use a temp instead of a named var, which eliminated
a pointless argument to checkAll.
The code to construct them is also a bit clearer, in my opinion.
cmd/compile: improve equality algs for arrays of strings
type T [8]string
Prior to this change, we generated this equality algorithm for T:
func eqT(p, q *T) (r bool) {
	for i := range *p {
		if p[i] == q[i] {
		} else {
			return
		}
	}
	return true
}
This change splits this into two loops, so that we can do the
cheap (length) half early and only then do the expensive (contents) half.
We now generate:
func eqT(p, q *T) (r bool) {
	for i := range *p {
		if len(p[i]) == len(q[i]) {
		} else {
			return
		}
	}
	for j := range *p {
		if runtime.memeq(p[j].ptr, q[j].ptr, len(p[j])) {
		} else {
			return
		}
	}
	return true
}
The generated code is typically ~17% larger because it contains
two loops instead of one. In the future, we might want to unroll
the first loop when the array is small.
cmd/compile: improve equality algs for arrays of interfaces
type T [8]interface{}
Prior to this change, we generated this equality algorithm for T:
func eqT(p, q *T) bool {
	for i := range *p {
		if p[i] != q[i] {
			return false
		}
	}
	return true
}
This change splits this into two loops, so that we can do the
cheap (type) half early and only then do the expensive (data) half.
We now generate:
func eqT(p, q *T) (r bool) {
	for i := range *p {
		if p[i].type == q[i].type {
		} else {
			return
		}
	}
	for j := range *p {
		if runtime.efaceeq(p[j].type, p[j].data, q[j].data) {
		} else {
			return
		}
	}
	return true
}
The use of a named return value and a bare return is to work
around some typechecking problems that stymied me.
The structure of using equals and else (instead of not equals and then)
was for implementation convenience and clarity. As a bonus,
it generates slightly shorter code on AMD64, because zeroing a register
to return is cheaper than writing $1 to it.
The generated code is typically ~17% larger because it contains
two loops instead of one. In the future, we might want to unroll
the first loop when the array is small.
This change splits the two halves of the interface equality,
so that we can do the cheap (type) half early and the expensive
(data) half late. We now generate:
This change splits the two halves of the string equality,
so that we can do the cheap (length) half early and the expensive
(contents) half late. We now generate:
Daniel Martí [Sun, 26 Apr 2020 20:44:36 +0000 (21:44 +0100)]
cmd/compile: remove If type in rulegen
We only generate if statements via CondBreak, which is nice as the
control flow is simple and easy to work with. It seems like the If type
was added but never used, so remove it to avoid confusion.
We had a TODO about replacing CondBreak with If instead. I gave that a
try, but it doesn't seem worth the effort. The code gets more complex
and we don't really win anything in return.
While at it, don't use op strings as format strings in exprf. This
doesn't cause any issue at the moment, but it's best to be explicit
about the operator not containing any formatting verbs.
Change-Id: Ib59ad72d3628bf91594efc609e222232ad1e8748
Reviewed-on: https://go-review.googlesource.com/c/go/+/230257
Reviewed-by: Keith Randall <khr@golang.org>
Michael Munday [Thu, 23 Apr 2020 06:33:14 +0000 (23:33 -0700)]
cmd/compile: adopt strong aux typing for some s390x rules
Apply strong aux typing to lowering rules that do not require
modification beyond substituting -> for =>. Other lowering rules
and all the optimization rules will follow. I'm breaking it up
to allow toolstash-check to pass on the big CLs.
Passes toolstash-check -all.
Change-Id: I6f1340058a8eb5a1390411e59fcbea9d7f777e58
Reviewed-on: https://go-review.googlesource.com/c/go/+/229400
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Daniel Martí [Sat, 18 Apr 2020 16:34:00 +0000 (17:34 +0100)]
cmd/compile: minor rulegen simplifications
The commuteDepth variable is no longer necessary; remove it.
Else branches after a log.Fatal call are unnecessary.
Also make the unbalanced return an integer, so we can differentiate
positive from negative cases. We only want to continue a rule with the
following lines if this balance is positive, for example.
While at it, make the balance loop stop when it goes negative, to not
let ")(" seem balanced.
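A sketch of that balance check (hypothetical helper; rulegen's actual code may differ):

	// unbalanced returns the parenthesis balance of s. It goes negative
	// as soon as a ')' closes more than was opened, so ")(" is not
	// mistaken for balanced; a positive result means the rule continues
	// on the following lines.
	func unbalanced(s string) int {
		balance := 0
		for _, c := range s {
			if c == '(' {
				balance++
			} else if c == ')' {
				balance--
				if balance < 0 {
					break
				}
			}
		}
		return balance
	}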
Tyson Andre [Sat, 25 Apr 2020 21:39:19 +0000 (21:39 +0000)]
net/http/httputil: fix typo in unit test name
Everywhere else uses "cancellation".
The reasoning is mentioned in CL 170060.
> Though there is variation in the spelling of canceled,
> cancellation is always spelled with a double l.
>
> Reference: https://www.grammarly.com/blog/canceled-vs-cancelled/
Tyson Andre [Sat, 25 Apr 2020 20:53:42 +0000 (20:53 +0000)]
math/cmplx: fix typo in code comment
Everywhere else uses "cancellation" as of 2019.
The reasoning is mentioned in CL 170060.
> Though there is variation in the spelling of canceled,
> cancellation is always spelled with a double l.
>
> Reference: https://www.grammarly.com/blog/canceled-vs-cancelled/