Brad Fitzpatrick [Tue, 13 Aug 2013 21:48:08 +0000 (14:48 -0700)]
archive/zip: remove an allocation, speed up a test
Update #6138
TestOver65kFiles spends all its time garbage collecting.
Removing the 1.4 MB of allocations per each of the 65k
files brings this from 34 seconds to 0.23 seconds.
Rob Pike [Tue, 13 Aug 2013 21:03:18 +0000 (07:03 +1000)]
cmd/go: nicer error diagnosis in go test
Before,
go test -bench .
would just dump the long generic "go help" message. Confusing and
unhelpful. Now the message is short and on point and also reminds the
user about the oft-forgotten "go help testflag".
% go test -bench
go test: missing argument for flag bench
run "go help test" or "go help testflag" for more information
%
Dmitriy Vyukov [Tue, 13 Aug 2013 18:14:04 +0000 (22:14 +0400)]
runtime: more reliable preemption
Currently it's possible that a goroutine
that periodically executes non-blocking
cgo/syscalls is never preempted.
This change splits scheduler and syscall
ticks to prevent such situation.
Dmitriy Vyukov [Tue, 13 Aug 2013 18:12:02 +0000 (22:12 +0400)]
runtime: do no lose CPU profiling signals
Currently we lose lots of profiling signals.
Most notably, GC is not accounted at all.
But stack splits, scheduler, syscalls, etc are lost as well.
This creates seriously misleading profile.
With this change all profiling signals are accounted.
Now I see these additional entries that were previously absent:
161 29.7% 29.7% 164 30.3% syscall.Syscall
12 2.2% 50.9% 12 2.2% scanblock
11 2.0% 55.0% 11 2.0% markonly
10 1.8% 58.9% 10 1.8% sweepspan
2 0.4% 85.8% 2 0.4% runtime.newstack
It is still impossible to understand what causes stack splits,
but at least it's clear how many time is spent on them.
Update #2197.
Update #5659.
Alberto García Hierro [Tue, 13 Aug 2013 16:42:21 +0000 (12:42 -0400)]
cmd/cgo: Add support for C function pointers
* Add a new kind of Name, "fpvar" which stands for function pointer variable
* When walking the AST, find functions used as expressions and create a new Name object for them
* Track functions which are only used in expr contexts, and avoid generating bridge code for them
Dmitriy Vyukov [Tue, 13 Aug 2013 10:14:24 +0000 (14:14 +0400)]
runtime: eliminate excessive notewakeup calls in timers
If the timer goroutine is wakeup by timeout,
other goroutines will still notewakeup because sleeping is still set.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/12763043
Russ Cox [Tue, 13 Aug 2013 04:09:31 +0000 (00:09 -0400)]
cmd/gc: add temporary-merging optimization pass
The compilers assume they can generate temporary variables
as needed to preserve the right semantics or simplify code
generation and the back end will still generate good code.
This turns out not to be true. The back ends will only
track the first 128 variables per function and give up
on the remainder. That needs to be fixed too, in a later CL.
This CL merges temporary variables with equal types and
non-overlapping lifetimes using the greedy algorithm in
Poletto and Sarkar, "Linear Scan Register Allocation",
ACM TOPLAS 1999.
The result can be striking in the right functions.
Top 20 frame size changes in a 6g godoc binary by bytes saved:
Russ Cox [Tue, 13 Aug 2013 02:02:10 +0000 (22:02 -0400)]
cmd/gc: move flow graph into portable opt
Now there's only one copy of the flow graph construction
and dominator computation, and different optimizations
can attach different annotations to the instructions.
Rob Pike [Tue, 13 Aug 2013 01:32:32 +0000 (11:32 +1000)]
go/build: change the wording of NoGoError and comment it better
Out of context, it can be very confusing because there can be lots of Go
files in the directory, but the error message says there aren't.
Elias Naur [Tue, 13 Aug 2013 01:11:05 +0000 (11:11 +1000)]
text/template: Make function call builtin handle nil errors correctly
The call builtin unconditionally tries to convert a second return value from a function to the error type. This fails in case nil is returned, effectively making call useless for functions returning two values.
This CL adds a nil check for the second return value, and adds a test.
Note that for regular function and method calls the nil error case is handled correctly and is verified by a test.
Volker Dobler [Mon, 12 Aug 2013 22:14:34 +0000 (15:14 -0700)]
net/http: do not send malformed cookie domain attribute
Malformed domain attributes are not sent in a Set-Cookie header.
Instead the domain attribute is dropped which turns the cookie
into a host-only cookie. This is much safer than dropping characters
from domain attribute.
Domain attributes with a leading dot '.' are still allowed, even
if discouraged by RFC 6265 section 4.1.1.
Dmitriy Vyukov [Mon, 12 Aug 2013 17:48:19 +0000 (21:48 +0400)]
runtime: remove unused m->racepc
The original plan was to collect allocation stacks
for all memory blocks. But it was never implemented
and it's not in near plans and it's unclear how to do it at all.
Russ Cox [Mon, 12 Aug 2013 01:46:38 +0000 (21:46 -0400)]
cmd/6g: move opt instruction decode into common function
Add new proginfo function that returns information about a
Prog*. The information includes various instruction
description bits as well as a list of required registers set
and used and indexing registers used.
Convert the large instruction switches to use proginfo.
This information was formerly duplicated in multiple
optimization passes, inconsistently. For example, the
information about which registers an instruction requires
appeared three times for most instructions.
Most of the switches were incomplete or incorrect in some way.
For example, the switch in copyu did not list cases for INCB,
JPS, MOVAPD, MOVBWSX, MOVBWZX, PCDATA, POPQ, PUSHQ, STD,
TESTB, TESTQ, and XCHGL. Those were all falling into the
"unknown instruction" default case and stopping the rewrite,
perhaps unnecessarily. Similarly, the switch in needc only
listed a handful of the instructions that use or set the carry bit.
We still need to decide whether to use proginfo to generalize
a few of the remaining smaller switches in peep.c.
If this goes well, we'll make similar changes in 8g and 5g.
Russ Cox [Sat, 10 Aug 2013 03:10:58 +0000 (23:10 -0400)]
cmd/gc: zero pointers on entry to function
On entry to a function, zero the results and zero the pointer
section of the local variables.
This is an intermediate step on the way to precise collection
of Go frames.
This can incur a significant (up to 30%) slowdown, but it also ensures
that the garbage collector never looks at a word in a Go frame
and sees a stale pointer value that could cause a space leak.
(C frames and assembly frames are still possibly problematic.)
This CL is required to start making collection of interface values
as precise as collection of pointer values are today.
Since we have to dereference the interface type to understand
whether the value is a pointer, it is critical that the type field be
initialized.
A future CL by Carl will make the garbage collection pointer
bitmaps context-sensitive. At that point it will be possible to
remove most of the zeroing. The only values that will still need
zeroing are values whose addresses escape the block scoping
of the function but do not escape to the heap.
Mikio Hara [Sat, 10 Aug 2013 00:46:22 +0000 (09:46 +0900)]
net: move InvalidAddrError type into net.go
Probably we should remove this type before Go 1 contract has settled,
but too late. Instead, keep InvalidAddrError close to package generic
error types.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/12670044
Carl Shapiro [Fri, 9 Aug 2013 23:48:12 +0000 (16:48 -0700)]
cmd/cc, cmd/gc, runtime: Uniquely encode iface and eface pointers in the pointer map.
Prior to this change, pointer maps encoded the disposition of
a word using a single bit. A zero signaled a non-pointer
value and a one signaled a pointer value. Interface values,
which are a effectively a union type, were conservatively
labeled as a pointer.
This change widens the logical element size of the pointer map
to two bits per word. As before, zero signals a non-pointer
value and one signals a pointer value. Additionally, a two
signals an iface pointer and a three signals an eface pointer.
Following other changes to the runtime, values two and three
will allow a type information to drive interpretation of the
subsequent word so only those interface values containing a
pointer value will be scanned.
Russ Cox [Fri, 9 Aug 2013 22:34:08 +0000 (18:34 -0400)]
go/build: add AllTags to Package
AllTags lists all the tags that can affect the decision
about which files to include. Tools scanning packages
can use this to decide how many variants there are
and what they are.
Russ Cox [Fri, 9 Aug 2013 22:33:57 +0000 (18:33 -0400)]
encoding/json: escape & always
There are a few different places in the code that escape
possibly-problematic characters like < > and &.
This one was the only one missing &, so add it.
This means that if you Marshal a string, you get the
same answer you do if you Marshal a string and
pass it through the compactor. (Ironically, the
compaction makes the string longer.)
Because html/template invokes json.Marshal to
prepare escaped strings for JavaScript, this changes
the form of some of the escaped strings, but not
their meaning.
Carl Shapiro [Fri, 9 Aug 2013 20:02:33 +0000 (13:02 -0700)]
cmd/cc: use a temporary bitmap when constructing pointer maps
This change makes the way cc constructs pointer maps closer to
what gc does and is being done in preparation for changes to
the internal content of the pointer map such as a change to
distinguish interface pointers from ordinary pointers.
Dmitriy Vyukov [Fri, 9 Aug 2013 17:43:00 +0000 (21:43 +0400)]
net: add special netFD mutex
The mutex, fdMutex, handles locking and lifetime of sysfd,
and serializes Read and Write methods.
This allows to strip 2 sync.Mutex.Lock calls,
2 sync.Mutex.Unlock calls, 1 defer and some amount
of misc overhead from every network operation.
On linux/amd64, Intel E5-2690:
benchmark old ns/op new ns/op delta
BenchmarkTCP4Persistent 9595 9454 -1.47%
BenchmarkTCP4Persistent-2 8978 8772 -2.29%
BenchmarkTCP4ConcurrentReadWrite 4900 4625 -5.61%
BenchmarkTCP4ConcurrentReadWrite-2 2603 2500 -3.96%
In general it strips 70-500 ns from every network operation depending
on processor model. On my relatively new E5-2690 it accounts to ~5%
of network op cost.
Brad Fitzpatrick [Fri, 9 Aug 2013 16:46:47 +0000 (09:46 -0700)]
encoding/json: faster encoding
The old code was caching per-type struct field info. Instead,
cache type-specific encoding funcs, tailored for that
particular type to avoid unnecessary reflection at runtime.
Once the machine is built once, future encodings of that type
just run the func.
benchmark old ns/op new ns/op delta
BenchmarkCodeEncoder 4842493936975320 -23.64%
benchmark old MB/s new MB/s speedup
BenchmarkCodeEncoder 40.07 52.48 1.31x
Additionally, the numbers seem stable now at ~52 MB/s, whereas
the numbers for the old code were all over the place: 11 MB/s,
40 MB/s, 13 MB/s, 39 MB/s, etc. In the benchmark above I compared
against the best I saw the old code do.
R=rsc, adg
CC=gobot, golang-dev, r
https://golang.org/cl/9129044