Matthew Dempsky [Fri, 3 Mar 2017 21:38:49 +0000 (13:38 -0800)]
cmd/compile/internal/gc: replace Node.Ullman with Node.HasCall
Since switching to SSA, the only remaining use for the Ullman field
was in tracking whether or not an expression contained a function
call. Give it a new name and encode it in our fancy new bitset field.
David Lazar [Mon, 20 Feb 2017 14:55:54 +0000 (09:55 -0500)]
cmd/compile: include position info in exported function bodies
This gives accurate line numbers to inlined functions from another
package. Previously AST nodes from another package would get the line
number of the import statement for that package.
The following benchmark results show how the size of package export data
is impacted by this change. These benchmarks were created by compiling
the go1 benchmark and running `go tool pack x` to extract the export
data from the resulting .a files.
David Lazar [Fri, 17 Feb 2017 21:20:52 +0000 (16:20 -0500)]
cmd/internal/obj: avoid duplicate file name symbols
The meaning of Version=1 was overloaded: it was reserved for file name
symbols (to avoid conflicts with non-file name symbols), but was also
used to mean "give me a fresh version number for this symbol."
With the new inlining tree, the same file name symbol can appear in
multiple entries, but each one would become a distinct symbol with its
own version number.
Now, we avoid duplicating symbols by using Version=0 for file name
symbols and we avoid conflicts with other symbols by prefixing the
symbol name with "gofile..".
David Lazar [Fri, 17 Feb 2017 21:07:47 +0000 (16:07 -0500)]
cmd/compile: copy literals when inlining
Without this, literals keep their original source positions through
inlining, which results in strange jumps in line numbers of inlined
function bodies. By copying literals, inlining can update their source
position like other nodes.
Fixes #15453.
Change-Id: Iad5d9bbfe183883794213266dc30e31bab89ee69
Reviewed-on: https://go-review.googlesource.com/37232
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Russ Cox <rsc@golang.org>
David Lazar [Fri, 17 Feb 2017 17:28:05 +0000 (12:28 -0500)]
cmd/compile,link: generate PC-value tables with inlining information
In order to generate accurate tracebacks, the runtime needs to know the
inlined call stack for a given PC. This creates two tables per function
for this purpose. The first table is the inlining tree (stored in the
function's funcdata), which has a node containing the file, line, and
function name for every inlined call. The second table is a PC-value
table that maps each PC to a node in the inlining tree (or -1 if the PC
is not the result of inlining).
To give the appearance that inlining hasn't happened, the runtime also
needs the original source position information of inlined AST nodes.
Previously the compiler plastered over the line numbers of inlined AST
nodes with the line number of the call. This meant that the PC-line
table mapped each PC to line number of the outermost call in its inlined
call stack, with no way to access the innermost line number.
Now the compiler retains line numbers of inlined AST nodes and writes
the innermost source position information to the PC-line and PC-file
tables. Some tools and tests expect to see outermost line numbers, so we
provide the OutermostLine function for displaying line info.
To keep track of the inlined call stack for an AST node, we extend the
src.PosBase type with an index into a global inlining tree. Every time
the compiler inlines a call, it creates a node in the global inlining
tree for the call, and writes its index to the PosBase of every inlined
AST node. The parent of this node is the inlining tree index of the
call. -1 signifies no parent.
For each function, the compiler creates a local inlining tree and a
PC-value table mapping each PC to an index in the local tree. These are
written to an object file, which is read by the linker. The linker
re-encodes these tables compactly by deduplicating function names and
file names.
This change increases the size of binaries by 4-5%. For example, this is
how the go1 benchmark binary is impacted by this change:
Johan Brandhorst [Wed, 21 Dec 2016 13:49:04 +0000 (13:49 +0000)]
net/http/httptest: add Client and Certificate methods to Server
Adds a function for easily accessing the x509.Certificate
of a Server, if there is one. Also adds a helper function
for getting a http.Client suitable for use with the server.
This makes the steps required to test a httptest
TLS server simpler.
Cherry Zhang [Fri, 3 Feb 2017 21:18:01 +0000 (16:18 -0500)]
cmd/compile: remove zeroing after newobject
The Zero op right after newobject has been removed. But this rule
does not cover Store of constant zero (for SSA-able types). Add
rules to cover Store op as well.
Cherry Zhang [Fri, 3 Mar 2017 18:53:13 +0000 (13:53 -0500)]
cmd/compile: fix optimization of Zero newobject on amd64p32
On amd64p32, PtrSize and RegSize don't agree, and function return
value is aligned with RegSize. Fix this rule. Other architectures
are not affected, where PtrSize and RegSize are the same.
Matthew Dempsky [Fri, 3 Mar 2017 10:43:44 +0000 (02:43 -0800)]
cmd/compile/internal/gc: remove OHMUL Op
Previously the compiler rewrote constant division into OHMUL
operations, but that rewriting was moved to SSA in CL 37015. Now OHMUL
is unused, so we can get rid of it.
Change-Id: Ib6fc7c2b6435510bafb5735b3b4f42cfd8ed8cdb
Reviewed-on: https://go-review.googlesource.com/37750
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>
Austin Clements [Mon, 6 Feb 2017 00:34:16 +0000 (19:34 -0500)]
runtime: clarify access to mheap_.busy
There are two accesses to mheap_.busy that are guarded by checks
against len(mheap_.free). This works because both lists are (and must
be) the same length, but it makes the code less clear. Change these to
use len(mheap_.busy) so the access more clearly parallels the check.
Austin Clements [Fri, 23 Dec 2016 01:00:18 +0000 (18:00 -0700)]
runtime: simplify sweep allocation counting
Currently sweep counts the number of allocated objects, computes the
number of free objects from that, then re-computes the number of
allocated objects from that. Simplify and clean this up by skipping
these intermediate steps.
Austin Clements [Tue, 31 Jan 2017 16:46:36 +0000 (11:46 -0500)]
runtime: don't rescan finalizers queue during mark termination
Currently we scan the finalizers queue both during concurrent mark and
during mark termination. This costs roughly 20ns per queued finalizer
and about 1ns per unused finalizer queue slot (allocated queue length
never decreases), which can drive up STW time if there are many
finalizers.
However, we only add finalizers to this queue during sweeping, which
means that the second scan will never find anything new. Hence, we can
fix this by simply not scanning the finalizers queue during mark
termination. This brings the STW time under the 100µs goal even with
1,000,000 queued finalizers.
Austin Clements [Wed, 22 Feb 2017 21:13:06 +0000 (16:13 -0500)]
cmd/compile: accept string debug flags
The compiler's -d flag accepts string-valued flags, but currently only
for SSA debug flags. Extend it to support string values for other
flags. This also makes the syntax somewhat more sane so flag=value and
flag:value now both accept integers and strings.
Cherry Zhang [Fri, 3 Feb 2017 00:47:59 +0000 (19:47 -0500)]
cmd/compile: get rid of "volatile" in SSA
A value is "volatile" if it is a pointer to the argument region
on stack which will be clobbered by function call. This is used
to make sure the value is safe when inserting write barrier calls.
The writebarrier pass can tell whether a value is such a pointer.
Therefore no need to mark it when building SSA and thread this
information through.
Will Storey [Mon, 20 Feb 2017 05:24:17 +0000 (21:24 -0800)]
image/gif: handle an extra data sub-block byte.
This changes the decoder's behaviour when there is stray/extra data
found after an image is decompressed (e.g., data sub-blocks after an LZW
End of Information Code). Instead of raising an error, we silently skip
over such data until we find the end of the image data marked by a Block
Terminator. We skip at most one byte as sample problem GIFs exhibit this
property.
GIFs should not have and do not need such stray data (though the
specification is arguably ambiguous). However GIFs with such properties
have been seen in the wild.
Fixes #16146
Change-Id: Ie7e69052bab5256b4834992304e6ca58e93c1879
Reviewed-on: https://go-review.googlesource.com/37258 Reviewed-by: Nigel Tao <nigeltao@golang.org>
Run-TryBot: Nigel Tao <nigeltao@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
runtime/pprof: fix data race between Profile.Add and Profile.WriteTo
p.m is accessed in WriteTo without holding p.mu.
Move the access inside the critical section.
The race detector catches this bug using this program:
package main
import (
"os"
"runtime/pprof"
"time"
)
func main() {
p := pprof.NewProfile("ABC")
go func() {
p.WriteTo(os.Stdout, 1)
time.Sleep(time.Second)
}()
p.Add("abc", 0)
time.Sleep(time.Second)
}
$ go run -race x.go
==================
WARNING: DATA RACE
Write at 0x00c42007c240 by main goroutine:
runtime.mapassign()
/Users/josh/go/tip/src/runtime/hashmap.go:485 +0x0
runtime/pprof.(*Profile).Add()
/Users/josh/go/tip/src/runtime/pprof/pprof.go:281 +0x255
main.main()
/Users/josh/go/tip/src/p.go:15 +0x9d
Previous read at 0x00c42007c240 by goroutine 6:
runtime/pprof.(*Profile).WriteTo()
/Users/josh/go/tip/src/runtime/pprof/pprof.go:314 +0xc5
main.main.func1()
/Users/josh/go/tip/src/x.go:12 +0x69
Goroutine 6 (running) created at:
main.main()
/Users/josh/go/tip/src/x.go:11 +0x6e
==================
ABC profile: total 1
1 @ 0x110ccb4 0x111aeee 0x1055053 0x107f031
Robert Griesemer [Thu, 2 Mar 2017 19:13:06 +0000 (11:13 -0800)]
go/types: don't exclude package unsafe from a Package's Imports list
There's no good reason to exclude it and it only makes the code more
complicated and less consistent. Having it in the list provides an
easy way to detect if a package uses operations from package unsafe.
Change-Id: I2f9b0485db0a680bd82f3b93a350b048db3f7701
Reviewed-on: https://go-review.googlesource.com/37694 Reviewed-by: Alan Donovan <adonovan@google.com>
Robert Griesemer [Wed, 1 Mar 2017 23:35:24 +0000 (15:35 -0800)]
go/types: gotype to always report the same first error now
The old code may have reported different errors given an
erroneous package depending on the order in which files
were parsed concurrently. The new code always reports
errors in "file order", independent of processing order.
Also:
- simplified parsing code and internal concurrency control
- removed -seq flag which didn't really add useful functionality
Change-Id: I18e24e630f458f2bc107a7b83926ae761d63c334
Reviewed-on: https://go-review.googlesource.com/37655 Reviewed-by: Alan Donovan <adonovan@google.com>
Brad Fitzpatrick [Mon, 27 Feb 2017 05:41:50 +0000 (05:41 +0000)]
net/http: clean up Transport.RoundTrip error handling
If I put a 10 millisecond sleep at testHookWaitResLoop, before the big
select in (*persistConn).roundTrip, two flakes immediately started
happening, TestTransportBodyReadError (#19231) and
TestTransportPersistConnReadLoopEOF.
The problem was that there are many ways for a RoundTrip call to fail
(errors reading from Request.Body while writing the response, errors
writing the response, errors reading the response due to server
closes, errors due to servers sending malformed responses,
cancelations, timeouts, etc.), and many of those failures then tear
down the TCP connection, causing more failures, since there are always
at least three goroutines involved (reading, writing, RoundTripping).
Because the errors were communicated over buffered channels to a giant
select, the error returned to the caller was a function of which
random select case was called, which was why a 10ms delay before the
select brought out so many bugs. (several fixed in my previous CLs the past
few days).
Instead, track the error explicitly in the transportRequest, guarded
by a mutex.
In addition, this CL now:
* differentiates between the two ways writing a request can fail: the
io.Copy reading from the Request.Body or the io.Copy writing to the
network. A new io.Reader type notes read errors from the
Request.Body. The read-from-body vs write-to-network errors are now
prioritized differently.
* unifies the two mapRoundTripErrorFromXXX methods into one
mapRoundTripError method since their logic is now the same.
* adds a (*Request).WithT(*testing.T) method in export_test.go, usable
by tests, to call t.Logf at points during RoundTrip. This is disabled
behind a constant except when debugging.
* documents and deflakes TestClientRedirectContext
I've tested this CL with high -count values, with/without -race,
with/without delays before the select, etc. So far it seems robust.
Fixes #19231 (TestTransportBodyReadError flake)
Updates #14203 (source of errors unclear; they're now tracked more)
Updates #15935 (document Transport errors more; at least understood more now)
Mike Danese [Wed, 1 Mar 2017 18:43:57 +0000 (10:43 -0800)]
crypto/tls: make Config.Clone also clone the GetClientCertificate field
Using GetClientCertificate with the http client is currently completely
broken because inside the transport we clone the tls.Config and pass it
off to the tls.Client. Since tls.Config.Clone() does not pass forward
the GetClientCertificate field, GetClientCertificate is ignored in this
context.
Fixes #19264
Change-Id: Ie214f9f0039ac7c3a2dab8ffd14d30668bdb4c71 Signed-off-by: Mike Danese <mikedanese@google.com>
Reviewed-on: https://go-review.googlesource.com/37541 Reviewed-by: Filippo Valsorda <hi@filippo.io> Reviewed-by: Adam Langley <agl@golang.org>
Run-TryBot: Adam Langley <agl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
cmd/vet: refactor to support alternative importers
Instead of constructing the importer in init, do it lazily as needed.
This lets us select the importer using a command line flag.
The addition of the command line flag will come in a follow-up CL.
The current StdSizes most closely matches
the gc compiler, and the uses I know of that care
which compiler the sizes are for are all for
the gc compiler, so call the existing
implementation "gc".
Russ Cox [Thu, 2 Mar 2017 02:09:52 +0000 (21:09 -0500)]
time: strip monotonic time in t.Round, t.Truncate
The original analysis of the Go corpus assumed that these
stripped monotonic time. During the design discussion we
decided to try not stripping monotonic time here, but existing
code works better if we do.
See the discussion on golang.org/issue/18991 for more details.
For #18991.
Change-Id: I04d355ffe56ca0317acdd2ca76cb3033c277f6d1
Reviewed-on: https://go-review.googlesource.com/37542 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Philip Hofer [Tue, 28 Feb 2017 22:35:37 +0000 (14:35 -0800)]
cmd/internal/obj/arm: improve static branch prediction for wrapper prologue
This is a follow-up to CL 36893.
Move the unlikely branch in the wrapper prologue to the end
of the function, where it has minimal impact on the instruction
cache. Static branch prediction is also less likely to choose
a forward branch.
Ian Lance Taylor [Thu, 2 Mar 2017 01:31:06 +0000 (17:31 -0800)]
cmd/go: fix TestFFLAGS for Fortran compilers that accept unknown options
The test assumed that passing an unknown option to the Fortran
compiler would cause the compiler to fail. Unfortunately it appears
that some succeed. It's irrelevant to the actual test, which is
verifying that the flag was indeed passed.
Fixes #19080.
Change-Id: Ib9e89447a2104e4742f4b98938373fc2522772aa
Reviewed-on: https://go-review.googlesource.com/37658
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Keith Randall [Sun, 19 Feb 2017 17:37:22 +0000 (09:37 -0800)]
cmd/compile: fix disassembly of invalid instructions
Make sure that if we encode an explicit base register, we print it.
That will ensure that if we make an Addr with an auto variable but
a base that isn't SP, then it will be obvious from the disassembly.
Brad Fitzpatrick [Wed, 1 Mar 2017 17:44:11 +0000 (17:44 +0000)]
net/http: deflake TestClientRedirect308NoGetBody
In an unrelated CL I found a way to increase the likelihood of latent
flaky tests and found this one.
This is just like yesterday's https://golang.org/cl/37624 and dozens
before it (all remnants from the great net/http test parallelization
of Nov 2016 in https://golang.org/cl/32684).
Lynn Boger [Wed, 22 Feb 2017 18:32:54 +0000 (13:32 -0500)]
runtime, cmd/go: roll back stale message, test detail
Some debugging code was recently added to:
1) provide more detail for the stale reason when it is
determined that a package is stale
2) provide file and package time and date information when
it is determined that runtime.a is stale
Lynn Boger [Fri, 10 Feb 2017 20:40:44 +0000 (15:40 -0500)]
cmd/compile: use reg moves for int <-> float conversions on ppc64x
This makes a change in the SSA code generated for OpPPC64Xf2i64
and OpPPC64Xi2f64 to use register based instructions to convert
between float and integer. This will require at least power8.
Currently the conversion is done by storing to and loading
from memory, which is more expensive.
Change-Id: I8e1cf14d3880d7cd720dc5188dd174cba1f7fef7
Reviewed-on: https://go-review.googlesource.com/36725 Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Keith Randall <khr@golang.org>
Kevin Burke [Wed, 1 Mar 2017 00:28:34 +0000 (16:28 -0800)]
os: add OpenFile example for appending data
Fixes #19329.
Change-Id: I6d8bb112a56d751a6d3ea9bd6021803cb9f59234
Reviewed-on: https://go-review.googlesource.com/37619 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Heschi Kreinick [Mon, 27 Feb 2017 23:26:33 +0000 (18:26 -0500)]
testing: fix Benchmark() to start at 1 iteration, not 100
The run1 call removed in golang.org/cl/36990 was necessary to
initialize the duration of the benchmark. With it gone, the math in
launch() starts from 100. This doesn't work out well for second-long
benchmark methods. Put it back.
Alex Brainman [Wed, 8 Feb 2017 01:47:43 +0000 (12:47 +1100)]
cmd/link: write dwarf sections
Also stop skipping TestExternalLinkerDWARF and
TestDefaultLinkerDWARF.
Fixes #10776.
Change-Id: Ia596a684132e3cdee59ce5539293eedc1752fe5a
Reviewed-on: https://go-review.googlesource.com/36983 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Alex Brainman [Wed, 8 Feb 2017 01:30:30 +0000 (12:30 +1100)]
cmd/link: write dwarf relocations
For #10776.
Change-Id: I11dd441d8e5d6316889ffa8418df8b58c57c677d
Reviewed-on: https://go-review.googlesource.com/36982 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Matthew Dempsky [Tue, 28 Feb 2017 23:51:29 +0000 (15:51 -0800)]
cmd/compile/internal/gc: separate builtin and real runtime packages
The builtin runtime package definitions intentionally diverge from the
actual runtime package's, but this only works as long as they never
overlap.
To make it easier to expand the builtin runtime package, this CL now
loads their definitions into a logically separate "go.runtime"
package. By resetting the package's Prefix field to "runtime", any
references to builtin definitions will still resolve against the real
package runtime.
Fixes #14482.
Passes toolstash -cmp.
Change-Id: I539c0994deaed4506a331f38c5b4d6bc8c95433f
Reviewed-on: https://go-review.googlesource.com/37538
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
Heschi Kreinick [Fri, 17 Feb 2017 21:52:16 +0000 (16:52 -0500)]
cmd/compile, cmd/asm: remove Link.Plists
Link.Plists never contained more than one Plist, and sometimes none.
Passing around the Plist being worked on is straightforward and makes
the data flow easier to follow.
Robert Griesemer [Sat, 25 Feb 2017 02:13:29 +0000 (18:13 -0800)]
compress/flate: use math/bits.Reverse8/16 instead of local implementation
No measurable impact on performance (specifically, no degradation).
Reverse is used in Huffman en/de-coding. For completeness, here are
all the speed-related benchmark results:
Martin Möhrmann [Tue, 28 Feb 2017 20:21:45 +0000 (21:21 +0100)]
strings: fix handling of invalid UTF-8 sequences in Map
The new Map implementation introduced in golang.org/cl/33201
did not differentiate if an invalid UTF-8 sequence was decoded
or the RuneError rune. It would therefore always advance by
3 bytes (which is the length of the RuneError rune) instead
of 1 for an invalid sequences. This cl adds a check to correctly
determine the length of bytes needed to advance to the next rune.
Fixes #19330.
Change-Id: I1e7f9333f3ef6068ffc64015bb0a9f32b0b7111d
Reviewed-on: https://go-review.googlesource.com/37597
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Keith Randall [Tue, 28 Feb 2017 22:01:59 +0000 (14:01 -0800)]
cmd/compile: simplify load+op rules
There's no need to use @block rules, as canMergeLoad makes sure that
the load and op are already in the same block.
With no @block needed, we also don't need to set the type explicitly.
It can just be inherited from the op being rewritten.
runtime: evacuate old map buckets more consistently
During map growth, buckets are evacuated in two ways.
When a value is altered, its containing bucket is evacuated.
Also, an evacuation mark is maintained and advanced every time.
Prior to this CL, the evacuation mark was always incremented,
even if the next bucket to be evacuated had already been evacuated.
This CL changes evacuation mark advancement to skip previously
evacuated buckets. This has the effect of making map evacuation both
more aggressive and more consistent.
Aggressive map evacuation is good. While the map is growing,
map accesses must check two buckets, which may be far apart in memory.
Map growth also delays garbage collection.
And if map evacuation is not aggressive enough, there is a risk that
a populate-once read-many map may be stuck permanently in map growth.
This CL does not eliminate that possibility, but it shrinks the window.
philhofer [Mon, 20 Feb 2017 16:43:54 +0000 (08:43 -0800)]
cmd/compile/ssa: more aggressive constant folding
Add rewrite rules that canonicalize the location
of constants in expressions, and fold conststants
that appear in operations that can be trivially
reassociated.
After this change, the compiler constant-folds
expressions like "4 + x - 1" and "4 & x & 1"
cmd/compile, runtime: specialize convT2x, don't alloc for zero vals
Prior to this CL, all runtime conversions
from a concrete value to an interface went
through one of two runtime calls: convT2E or convT2I.
However, in practice, basic types are very common.
Specializing convT2x for those basic types allows
for a more efficient implementation for those types.
For basic scalars and strings, allocation and copying
can use the same methods as normal code.
For pointer-free types, allocation can occur without
zeroing, and copying can take place without GC calls.
For slices, copying is cheaper and simpler.
While compiling make.bash, 93% of all convT2x calls
are now to one of these specialized convT2x call.
Within specialized convT2x routines, it is cheap to check
for a zero value, in a way that it is not in general.
When we detect a zero value there, we return a pointer
to zeroVal, rather than allocating.
Cherry Zhang [Mon, 20 Feb 2017 04:40:24 +0000 (23:40 -0500)]
cmd/compile: update signature of runtime.memclr*
runtime.memclr* functions have signatures
func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
func memclrHasPointers(ptr unsafe.Pointer, n uintptr)
Update compiler's copy. Also teach gc/mkbuiltin.go to handle
unsafe.Pointer. The import statement and its support is not
really necessary, but just to make it look like real Go code.
cmd/vet/all: move suspicious shift whitelists to 64 bit
This is an inconsequential consequence of updating
math/big to use math/bits.
Better would be to teach the vet shift test
to size int/uint/uintptr to the platform in use,
eliminating the whole category of "might be too small".
Filed #19321 for that.
Michael Munday [Mon, 13 Feb 2017 03:12:12 +0000 (22:12 -0500)]
cmd/compile: emit fused multiply-{add,subtract} instructions on s390x
Explcitly block fused multiply-add pattern matching when a cast is used
after the multiplication, for example:
- (a * b) + c // can emit fused multiply-add
- float64(a * b) + c // cannot emit fused multiply-add
float{32,64} and complex{64,128} casts of matching types are now kept
as OCONV operations rather than being replaced with OCONVNOP operations
because they now imply a rounding operation (and therefore aren't a
no-op anymore).
Operations (for example, multiplication) on complex types may utilize
fused multiply-add and -subtract instructions internally. There is no
way to disable this behavior at the moment.
Improves the performance of the floating point implementation of
poly1305:
Carlo Alberto Ferraris [Fri, 24 Feb 2017 23:48:00 +0000 (08:48 +0900)]
bytes: make bytes.Buffer cache-friendly
During benchmark of an internal tool we found out that (*Buffer).Reset() was
surprisingly showing up in CPU profiles.
This CL contains two related changes aimed at speeding up Reset():
1. Create a fast path for Truncate(0) by moving the logic to Reset()
(this makes Reset() a simple leaf func that gets inlined since it
gets compiled to 3 MOVx instructions). Accordingly change calls in
the rest of the Buffer methods to call Reset() instead of Truncate(0).
2. Reorder the fields in the Buffer struct so that frequently accessed
fields are packed together (buf, off, lastRead). This also make them
likely to be in the same cacheline.
Ideally it would be advisable to have Buffer{} cacheline-aligned, but I
couldn't find a way to do this without changing the size of the bootstrap
array (but this will cause some regressions, because it will make duffcopy
show up in CPU profiles where it wasn't showing up before).
go1 benchmarks are not really affected, but some other benchmarks that
exercise Buffer more show improvements:
Change-Id: I86d7d9d2cac65335baf62214fbb35ba0fd8f9528
Reviewed-on: https://go-review.googlesource.com/37416
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>