Dmitriy Vyukov [Tue, 21 Jan 2014 09:06:57 +0000 (13:06 +0400)]
runtime: do not collect GC roots explicitly
Currently we collect (add) all roots into a global array in a single-threaded GC phase.
This hinders parallelism.
With this change we just kick off a parallel for over number_of_goroutines+5 iterations.
The parallel for callback then decides whether it needs to scan the stack of a goroutine,
scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.
This requires storing all goroutines in an array instead of a linked list
(to allow direct indexing).
This CL also removes the DebugScan functionality. It is broken because it uses
an unbounded stack, so it can not run on g0. When it was working, I found
it unhelpful for debugging issues because the two algorithms are too different now.
This change would also require updating DebugScan, so it's simpler to just delete it.
With 8 threads this change reduces GC pause by ~6%, while keeping CPU time roughly the same.
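For illustration, a minimal sketch of the partitioning described above. All names here (markRoot, allgs, the root constants) are assumptions invented for the sketch; the runtime's actual code is C and differs in detail.

package main

import "fmt"

type g struct{ id int }

// Goroutines stored in a flat array (not a linked list) so the
// parallel-for body can index them directly.
var allgs = []*g{{1}, {2}, {3}}

const (
        rootData       = iota // data segment
        rootBSS               // bss segment
        rootFinalizers        // finalizer queue
        rootSpans             // span special records
        rootFlushCaches       // per-M cache flush
        fixedRootCount        // the "+5" in number_of_goroutines+5
)

// markRoot is the parallel-for callback: each iteration index selects
// either one fixed root kind or one goroutine stack, so no global root
// array needs to be built up front.
func markRoot(i int) {
        if i < fixedRootCount {
                fmt.Println("scan fixed root", i)
                return
        }
        fmt.Println("scan stack of goroutine", allgs[i-fixedRootCount].id)
}

func main() {
        // In the runtime these iterations run in parallel across GC workers.
        for i := 0; i < fixedRootCount+len(allgs); i++ {
                markRoot(i)
        }
}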
Dmitriy Vyukov [Tue, 21 Jan 2014 07:20:23 +0000 (11:20 +0400)]
runtime: per-P defer pool
Instead of a per-goroutine stack of defers for all sizes,
introduce per-P defer pool for argument sizes 8, 24, 40, 56, 72 bytes.
For a program that starts 1e6 goroutines and then joins them:
old: rss=6.6g virtmem=10.2g time=4.85s
new: rss=4.5g virtmem= 8.2g time=3.48s
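A minimal sketch of what such a pool could look like, assuming a simplified P structure and defer record; names and layout are illustrative, not the runtime's actual (C) implementation.

package main

import "fmt"

// The five argument-size classes named in the commit message.
var deferClasses = [5]int{8, 24, 40, 56, 72}

type deferRec struct {
        siz  int
        link *deferRec // free-list link
}

// Hypothetical per-P structure: one free list per size class.
type p struct {
        deferpool [5]*deferRec
}

func classFor(siz int) int {
        for i, c := range deferClasses {
                if siz <= c {
                        return i
                }
        }
        return -1 // too large: fall back to the heap
}

func (pp *p) newdefer(siz int) *deferRec {
        if c := classFor(siz); c >= 0 && pp.deferpool[c] != nil {
                d := pp.deferpool[c]
                pp.deferpool[c] = d.link
                return d // reused without allocating
        }
        return &deferRec{siz: siz}
}

func (pp *p) freedefer(d *deferRec) {
        if c := classFor(d.siz); c >= 0 {
                d.link = pp.deferpool[c]
                pp.deferpool[c] = d
        }
}

func main() {
        var pp p
        d := pp.newdefer(24)
        pp.freedefer(d)
        fmt.Println(pp.newdefer(24) == d) // true: the record was pooled
}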
Dmitriy Vyukov [Tue, 21 Jan 2014 06:53:51 +0000 (10:53 +0400)]
runtime: zero 2-word memory blocks in-place
Currently for 2-word blocks we set the flag only to clear the flag again, which makes no sense.
In particular, on 32-bit we always call memclr anyway.
Dmitriy Vyukov [Tue, 21 Jan 2014 06:24:42 +0000 (10:24 +0400)]
runtime: ensure fair scheduling during frequent GCs
What was happening is as follows:
Each writer goroutine always triggers GC during its scheduling quantum.
After GC, goroutines are shuffled so that the timer goroutine is always second in the queue.
This repeats indefinitely, causing starvation of the timer goroutine.
Fixes #7126.
Keith Randall [Fri, 17 Jan 2014 22:48:45 +0000 (14:48 -0800)]
runtime, cmd/gc: Get rid of vararg channel calls.
Vararg C calls present a problem for the GC because the
argument types are not derivable from the signature. Remove
them by passing pointers to channel elements instead of the
channel elements directly.
The compiler change is an ugly hack.
We can do better.
Rob Pike [Fri, 17 Jan 2014 21:19:00 +0000 (13:19 -0800)]
syscall: allocate 64 bits of "basep" for Getdirentries
Recent crashes on 386 Darwin appear to be caused by this system call
smashing the stack. Phenomenology shows that allocating more data
here addresses the problem.
The guess is that since the actual system call is getdirentries64, 64 is
what we should allocate.
Dmitriy Vyukov [Fri, 17 Jan 2014 16:18:37 +0000 (20:18 +0400)]
syscall: mark arguments to Syscall as noescape
Heap arguments to "async" syscalls will break when/if we have moving GC anyway.
With this change it must not break until we have a moving GC, because a user must
keep a reference to the object in Go to preserve liveness. Otherwise the code is already broken.
Reduces number of leaked params from 125 to 36 on linux.
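For context, a declaration-level sketch of the directive this change applies in package syscall. //go:noescape must precede a function declared without a body (one implemented in assembly) and promises that no pointer passed as an argument escapes to the heap; the snippet is not runnable on its own, since Syscall's body lives in assembly.

// //go:noescape tells escape analysis that no pointer argument of the
// following body-less (assembly-implemented) function escapes, so
// pointed-to objects can stay on the caller's stack.

//go:noescape
func Syscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)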
Luke Curley [Fri, 17 Jan 2014 16:07:04 +0000 (11:07 -0500)]
crypto/cipher: improved cbc performance
decrypt: reduced the number of copy calls from 2n to 1.
encrypt: reduced the number of copy calls from n to 1.
Encryption is straightforward: use dst instead of tmp when
XORing the block with the IV.
Decryption now loops backwards through the blocks, exploiting the
fact that the previous block's ciphertext (src) is the IV. This
means we don't need to copy the IV every time, in addition to
using dst instead of tmp as in encryption.
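A hedged sketch of the backwards decryption loop, assuming AES and a cbcDecrypt helper invented for illustration (the real code is crypto/cipher's unexported CBC implementation):

package main

import (
        "crypto/aes"
        "crypto/cipher"
        "fmt"
)

// cbcDecrypt walks from the last block to the first, so each block's "IV"
// is simply the preceding ciphertext block, still intact in src, and no
// per-block copy is needed. len(src) must be a multiple of the block
// size; dst may alias src.
func cbcDecrypt(b cipher.Block, iv, dst, src []byte) {
        bs := b.BlockSize()
        for start := len(src) - bs; start >= 0; start -= bs {
                b.Decrypt(dst[start:start+bs], src[start:start+bs])
                prev := iv
                if start > 0 {
                        prev = src[start-bs : start] // not yet overwritten: we go backwards
                }
                for i := 0; i < bs; i++ {
                        dst[start+i] ^= prev[i]
                }
        }
        // (The real implementation also saves the last ciphertext block so
        // a following CryptBlocks call chains correctly; omitted here.)
}

func main() {
        key := make([]byte, 16)
        iv := make([]byte, 16)
        b, _ := aes.NewCipher(key)

        plain := []byte("16-byte block ok32-byte total ok")
        ct := make([]byte, len(plain))
        cipher.NewCBCEncrypter(b, iv).CryptBlocks(ct, plain)

        cbcDecrypt(b, iv, ct, ct) // decrypt in place
        fmt.Println(string(ct))
}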
Rob Pike [Thu, 16 Jan 2014 23:57:32 +0000 (15:57 -0800)]
net/rpc: fix inconsistency in documentation of Service.Register
The documentation falsely claimed an old, no-longer-true condition:
that the first argument must be a pointer.
Fixes #6697
Keith Randall [Thu, 16 Jan 2014 21:35:29 +0000 (13:35 -0800)]
reflect: Remove imprecise techniques from channel/select operations.
Reflect used to communicate to the runtime using interface words,
which is bad for precise GC because sometimes iwords hold a pointer
and sometimes they don't. This change rewrites channel and select
operations to always pass pointers to the runtime.
reflect.Select gets somewhat more expensive, as we now do an allocation
per receive case instead of one allocation whose size is the max of
all the received types. This seems unavoidable to get preciseness
(unless we move the allocation into selectgo, which is a much bigger
change).
Brad Fitzpatrick [Thu, 16 Jan 2014 19:43:52 +0000 (11:43 -0800)]
net/http: don't allow Content-Type or body on 204 and 1xx
Status codes 204, 304, and 1xx don't allow bodies. We already
had a function for this, but we were hard-coding just 304
(StatusNotModified) in a few places. Use the function
instead, and flesh out tests for all codes.
Fixes #6685
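A minimal sketch of the kind of check involved; net/http's helper is unexported, so this bodyAllowedForStatus is illustrative:

package main

import "fmt"

func bodyAllowedForStatus(status int) bool {
        switch {
        case status >= 100 && status <= 199: // 1xx informational
                return false
        case status == 204: // No Content
                return false
        case status == 304: // Not Modified
                return false
        }
        return true
}

func main() {
        for _, code := range []int{100, 200, 204, 304} {
                fmt.Println(code, bodyAllowedForStatus(code))
        }
}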
Rob Pike [Thu, 16 Jan 2014 17:48:23 +0000 (09:48 -0800)]
fmt: fix bug printing large zero-padded hexadecimal
We forgot to include the width of "0x" when computing the crossover
from the internal buffer to an allocated buffer.
Also add a helper function to the test for formatting large zero-padded
test strings.
This CL makes the bitmaps a little more precise about variables
that have their address taken but for which the address does not
escape to the heap, so that the variables are kept in the stack frame
rather than allocated on the heap.
The code before this CL handled these variables by treating every
return statement as using every such variable and depending on
liveness analysis to essentially treat the variable as live during the
entire function. That approach has false positives and (worse) false
negatives. That is, it's both sloppy and buggy:
func f(b1, b2 bool) {            // x live here! (sloppy)
        if b2 {
                print(0)         // x live here! (sloppy)
                return
        }
        var z **int
        x := new(int)
        *x = 42
        z = &x
        print(**z)               // x live here (conservative)
        if b2 {
                print(1)         // x live here (conservative)
                return
        }
        for {
                print(**z)       // x not live here (buggy)
        }
}
The first two liveness annotations (marked sloppy) are clearly
wrong: x cannot be live if it has not yet been declared.
The last liveness annotation (marked buggy) is also wrong:
x is live here as *z, but because there is no return statement
reachable from this point in the code, the analysis treats x as dead.
This CL changes the liveness calculation to mark such variables
live exactly at points in the code reachable from the variable
declaration. This keeps the conservative decisions but fixes
the sloppy and buggy ones.
The CL also detects ambiguously live variables, those that are
being marked live but may not actually have been initialized,
such as in this example:
func f(b1 bool) {
        var z **int
        if b1 {
                x := new(int)
                *x = 42
                z = &x
        } else {
                y := new(int)
                *y = 54
                z = &y
        }
        print(**z) // x, y live here (conservative)
}
Since the print statement is reachable from the declaration of x,
x must conservatively be marked live. The same goes for y.
Although both x and y are marked live at the print statement,
clearly only one of them has been initialized. They are both
"ambiguously live".
These ambiguously live variables cause problems for garbage
collection: the collector cannot ignore them but also cannot
depend on them to be initialized to valid pointer values.
Ambiguously live variables do not come up too often in real code,
but recent changes to the way map and interface runtime functions
are invoked have created a large number of ambiguously live
compiler-generated temporary variables. The next CL will adjust
the analysis to understand these temporaries better, to make
ambiguously live variables fairly rare.
Once ambiguously live variables are rare enough, another CL will
introduce code at the beginning of a function to zero those
slots on the stack. At that point the garbage collector and the
stack copying routines will be able to depend on the guarantee that
if a slot is marked as live in a liveness bitmap, it is initialized.
Dmitriy Vyukov [Thu, 16 Jan 2014 08:17:00 +0000 (12:17 +0400)]
runtime: use lock-free ring for work queues
Use a lock-free, fixed-size ring for work queues
instead of an unbounded mutex-protected array.
The ring has a single producer and multiple consumers.
If the ring overflows, work is put onto a global queue.
benchmark              old ns/op    new ns/op    delta
BenchmarkMatmult               7            5   -18.12%
BenchmarkMatmult-4             2            2   -18.98%
BenchmarkMatmult-16            1            0   -12.84%
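A hedged sketch of the data structure described above, under simplifying assumptions (work items as func(), a fixed 256-slot ring, Go's sequentially consistent atomics); the runtime's actual scheduler code is C and more careful:

package main

import (
        "sync"
        "sync/atomic"
)

const ringSize = 256 // power of two

type workRing struct {
        head uint32 // next slot to consume; advanced by consumers via CAS
        tail uint32 // next slot to fill; written only by the one producer
        buf  [ringSize]func()

        globalMu sync.Mutex // overflow path: mutex-protected global queue
        global   []func()
}

// put is called only by the single producer.
func (r *workRing) put(w func()) {
        h := atomic.LoadUint32(&r.head)
        t := r.tail
        if t-h < ringSize { // room in the ring
                r.buf[t%ringSize] = w
                atomic.StoreUint32(&r.tail, t+1) // publish the filled slot
                return
        }
        r.globalMu.Lock() // ring full: spill to the global queue
        r.global = append(r.global, w)
        r.globalMu.Unlock()
}

// get may be called concurrently by many consumers.
func (r *workRing) get() func() {
        for {
                h := atomic.LoadUint32(&r.head)
                t := atomic.LoadUint32(&r.tail)
                if h == t {
                        return nil // ring empty
                }
                w := r.buf[h%ringSize]
                // If another consumer won the race, the CAS fails and the
                // speculatively read value is simply discarded.
                if atomic.CompareAndSwapUint32(&r.head, h, h+1) {
                        return w
                }
        }
}

func main() {
        r := new(workRing)
        r.put(func() { println("work") })
        if w := r.get(); w != nil {
                w()
        }
}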
Keith Randall [Wed, 15 Jan 2014 21:56:59 +0000 (13:56 -0800)]
reflect: add precise GC info for Call argument frame.
Give proper types to the argument/return areas
allocated for reflect calls. Avoid use of iword to
manipulate receivers, which may or may not be pointers.
Rob Pike [Wed, 15 Jan 2014 17:13:52 +0000 (09:13 -0800)]
cmd/pack: rewrite in Go
Replace the pack command, a C program, with a clean reimplementation in Go.
It does not need to reproduce the full feature set: it is no longer used by
the build chain, but it still has a role in looking directly inside archives
created by the build chain.
Since it's not in C, it is no longer built by dist, so remove it from cmd/dist and
make it a "tool" in cmd/go terminology.
Michael Kelly [Tue, 14 Jan 2014 20:55:12 +0000 (12:55 -0800)]
net/http: escape contents of the directory indexes generated by FileServer
Previously, filenames containing special characters could:
1) Escape the <a> tag, with a file called something like: ">foo
2) Break the links in the index by prematurely ending the path portion
of the URL, with a file called: foo?bar
In order to avoid a forbidden dependency on the html package, I'm
using htmlReplacer from net/http/server.go, which is equivalent to
html.EscapeString.
This change also expands fakeFile.Readdir to better emulate
os.File.Readdir.
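A small illustrative sketch of the two escapes such a directory index needs (indexLine is invented for the example; it is not the actual net/http code):

package main

import (
        "fmt"
        "html"
        "net/url"
)

// indexLine builds one line of a directory index: the link target gets
// URL path escaping while the link text gets HTML escaping, closing
// both holes described above.
func indexLine(name string) string {
        u := url.URL{Path: name} // escapes ?, #, quotes, etc. in the path
        return fmt.Sprintf("<a href=\"%s\">%s</a>\n", u.String(), html.EscapeString(name))
}

func main() {
        fmt.Print(indexLine("\">foo"))  // name can no longer break out of the <a> tag
        fmt.Print(indexLine("foo?bar")) // ? no longer ends the path portion of the link
}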
runtime: change map iteration randomization to use intra-bucket offset
Map iteration previously started from a random bucket, but walked each
bucket from the beginning. Now, iteration always starts from the first
bucket and walks each bucket starting at a random offset. For
performance, the random offset is selected at the start of iteration
and reused for each bucket.
Iteration over a map with 8 or fewer elements--a single bucket--will
now be non-deterministic, though there are only 8 different possible
iteration orders.
Significant benchmark changes, on my OS X laptop (rough but consistent):
benchmark                old ns/op    new ns/op    delta
BenchmarkMapIter               128          121    -5.47%
BenchmarkMapIterEmpty         4.26         4.45    +4.46%
BenchmarkNewEmptyMap           114          111    -2.63%
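A minimal sketch of the scheme, with an invented bucket layout (the runtime's actual hashmap differs): the offset is chosen once per iteration and reused for every bucket.

package main

import (
        "fmt"
        "math/rand"
)

const bucketCnt = 8 // entries per bucket, as in the runtime

type bucket struct {
        used [bucketCnt]bool
        keys [bucketCnt]string
}

func iterate(buckets []bucket, visit func(string)) {
        offset := uint(rand.Intn(bucketCnt)) // chosen once per iteration
        for bi := range buckets {            // always start at bucket 0
                for i := uint(0); i < bucketCnt; i++ {
                        slot := (i + offset) % bucketCnt
                        if buckets[bi].used[slot] {
                                visit(buckets[bi].keys[slot])
                        }
                }
        }
}

func main() {
        b := bucket{}
        for i, k := range []string{"a", "b", "c"} {
                b.used[i], b.keys[i] = true, k
        }
        iterate([]bucket{b}, func(k string) { fmt.Print(k, " ") })
        fmt.Println() // e.g. "b c a": one of at most 8 orders per bucket
}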
Brad Fitzpatrick [Tue, 14 Jan 2014 19:10:56 +0000 (11:10 -0800)]
cmd/gofmt: remove -tabwidth and -tabs flags
Having these flags misleads people into thinking they're acceptable
for code that "must be gofmt'd".
If an organization wishes to use gofmt internally with
different settings, they can fork gofmt trivially. But "gofmt"
as used by the community with open source Go code should not
support these old knobs.
Brad Fitzpatrick [Tue, 14 Jan 2014 17:46:40 +0000 (09:46 -0800)]
net/http: fix another data race when sharing Request.Body
Fix another issue (similar to Issue 6995) where there was a
data race when sharing a server handler's Request.Body with
another goroutine that outlived the Handler's goroutine.
In some cases we were not closing the incoming Request.Body
(which would've required reading it until the end) if we
thought we were going to forcibly close the underlying
net.Conn later anyway. But that optimization largely moved
to the transfer.go *body type later, and locking was added
to *body, which then detected read-after-close, so now
always calling (*body).Close is both cheap and correct.
No new test because TestTransportAndServerSharedBodyRace caught it,
albeit only sometimes. Running:
while ./http.test -test.cpu=8 -test.run=TestTransportAndServerSharedBodyRace; do true; done
... would reliably cause a race before, but not now.
Russ Cox [Tue, 14 Jan 2014 15:43:13 +0000 (10:43 -0500)]
cmd/gc: return canonical Node* from temp
For historical reasons, temp was returning a copy
of the created Node*, not the original Node*.
This meant that if analysis recorded information in the
returned node (for example, n->addrtaken = 1), the
analysis would not show up on the original Node*, the
one kept in fn->dcl and consulted during liveness
bitmap creation.
Correct this, and watch for it when setting addrtaken.
Russ Cox [Tue, 14 Jan 2014 04:07:57 +0000 (23:07 -0500)]
cmd/link: implement and test automatic symbols
Related changes included in this CL:
- Add explicit start symbol to Prog.
- Add omitRuntime bool to Prog.
- Introduce p.Packages[""] to hold automatic symbols.
- Add SymOrder to Prog to preserve symbol order.
- Add layout test (and fix bug that was putting everything in text section).
Russ Cox [Tue, 14 Jan 2014 04:07:40 +0000 (23:07 -0500)]
cmd/link: replace golden binary files with hex dumps
The hex dumps will diff better, and I hope they will avoid
a repeat of http://bugs.debian.org/716853.
The CL will probably show the testdata diffs as "binary",
but in fact the binary versions are being replaced by
textual hex dumps (output of hexdump -C).
Robert Griesemer [Mon, 13 Jan 2014 19:10:45 +0000 (11:10 -0800)]
go/scanner: minimal non-terminated literals
Consume as little input as possible when encountering
non-terminated rune, string, and raw string literals.
The old code consumed at least one extra character
which could lead to worse error recovery when parsing
erroneous sources.
Also made error messages in those cases more consistent.
Joel Sing [Mon, 13 Jan 2014 00:24:56 +0000 (11:24 +1100)]
syscall: remove getsockname workaround for openbsd
Remove the getsockname workaround for unix domain sockets on OpenBSD.
This was fixed in OpenBSD 5.2 and we now have a minimum requirement
for OpenBSD 5.4-current.
Brad Fitzpatrick [Fri, 10 Jan 2014 20:19:36 +0000 (12:19 -0800)]
database/sql: avoid fmt.Sprintf while scanning, avoid allocs with RawBytes
A user reported heavy contention on fmt's printer cache. Avoid
fmt.Sprint. We have to do reflection anyway, and there was
already an asString function to use strconv, so use it.
This CL also eliminates a redundant allocation + copy when
scanning into *[]byte (avoiding the intermediate string)
and avoids an extra alloc when assigning to a caller's RawBytes
(trying to reuse the caller's memory).
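A hedged sketch of the strconv-based conversion; database/sql's asString is unexported and more complete, so this version is purely illustrative:

package main

import (
        "fmt"
        "strconv"
)

// asString converts common driver values to a string via strconv,
// which formats directly into a small buffer and avoids contending
// on fmt's shared printer cache.
func asString(src interface{}) string {
        switch v := src.(type) {
        case string:
                return v
        case []byte:
                return string(v)
        case int64:
                return strconv.FormatInt(v, 10)
        case float64:
                return strconv.FormatFloat(v, 'g', -1, 64)
        case bool:
                return strconv.FormatBool(v)
        }
        return fmt.Sprintf("%v", src) // fallback for rarer driver types
}

func main() {
        fmt.Println(asString(int64(42)), asString(3.14), asString([]byte("raw")))
}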
Brad Fitzpatrick [Thu, 9 Jan 2014 23:05:09 +0000 (15:05 -0800)]
net/http: use TCP keep-alives for ListenAndServe and ListenAndServeTLS
Our default behavior for the common cases shouldn't lead to
leaked TCP connections (e.g. from people closing laptops) when
their Go servers are exposed to the open Internet without a
proxy in front.
Too many users on golang-nuts have learned this the hard way.
No API change. Only ListenAndServe and ListenAndServeTLS are
updated.
Ian Lance Taylor [Thu, 9 Jan 2014 23:00:00 +0000 (15:00 -0800)]
runtime: fix 32-bit malloc for pointers >= 0x80000000
The spans array is allocated in runtime·mallocinit. On a
32-bit system the number of entries in the spans array is
MaxArena32 / PageSize, which is (2U << 30) / (1 << 12) == (1 << 19).
So we are allocating an array sized for a 19-bit index, while the
index itself can require 20 bits. According to the comment in the
function, this is intentional: we only allocate enough spans
(and bitmaps) for a 2G arena, because allocating more would
probably be wasteful.
But since the span index is simply the upper 20 bits of the
memory address, this scheme only works if memory addresses are
limited to the low 2G of memory. That would be OK if we were
careful to enforce it, but we're not. What we are careful to
enforce, in functions like runtime·MHeap_SysAlloc, is that we
always return addresses between the heap's arena_start and
arena_start + MaxArena32.
We generally get away with it because we start allocating just
after the program end, so we only run into trouble with
programs that allocate a lot of memory, enough to get past
address 0x80000000.
This changes the code that computes a span index to subtract
arena_start on 32-bit systems just as we currently do on
64-bit systems.
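An illustrative sketch of the index computation before and after the fix (names follow the commit message; the actual runtime code is C):

package main

import "fmt"

const pageShift = 12 // 4K pages

// Old 32-bit scheme: index spans by the raw address, which is only
// valid if every heap address is below 2G.
func spanIndexOld(addr uintptr) uintptr { return addr >> pageShift }

// New scheme: subtract arena_start first, as the 64-bit code already
// did, so the 2G arena can sit anywhere in the address space.
func spanIndexNew(addr, arenaStart uintptr) uintptr {
        return (addr - arenaStart) >> pageShift
}

func main() {
        const arenaStart = 0x80000000 // hypothetical arena starting at 2G
        addr := uintptr(arenaStart + 5<<pageShift)
        fmt.Printf("old: %#x (overflows the 1<<19-entry array)\n", spanIndexOld(addr))
        fmt.Printf("new: %#x\n", spanIndexNew(addr, arenaStart))
}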
Adam Langley [Thu, 9 Jan 2014 18:38:11 +0000 (13:38 -0500)]
crypto/tls: support renegotiation extension.
The renegotiation extension was introduced[1] due to an attack by Ray in
which a client's handshake was spliced into a connection that was
renegotiating, thus giving an attacker the ability to inject an
arbitrary prefix into the connection.
Go has never supported renegotiation as a server and so this attack
doesn't apply. As a client, it's possible that at some point in the
future the population of servers will be sufficiently updated that
it'll be possible to reject connections where the server hasn't
demonstrated that it has been updated to address this problem.
We're not at that point yet, but it's good for Go servers to support
the extension so that it might be possible to do in the future.