There are currently several schemes for acquiring a TLS
slot to save the g register. None of them appear to work
for android. The closest are linux and darwin.
Linux uses a linker TLS relocation. This is not supported
by the android linker.
Darwin uses a fixed offset, and calls pthread_key_create
until it gets the slot it wants. As the runtime loads
late in the android process lifecycle, after an
arbitrary number of other libraries, we cannot rely on
any particular slot being available.
So we call pthread_key_create, take the first slot we are
given, and put it in runtime.tlsg, which we turn into a
regular variable in cmd/ld.
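Outside the runtime, the same first-come-first-served pthreads call looks roughly like this (a cgo sketch for illustration only; the real code is runtime assembly and cmd/ld, not cgo):

    package main

    /*
    #include <pthread.h>
    */
    import "C"

    import "fmt"

    func main() {
        // Take whatever key pthreads hands back first; the runtime stores
        // the analogous slot in runtime.tlsg and saves g there.
        var key C.pthread_key_t
        if rc := C.pthread_key_create(&key, nil); rc != 0 {
            panic(fmt.Sprintf("pthread_key_create: %d", int(rc)))
        }
        fmt.Println("first available TLS key:", key)
    }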
runtime: delete unnecessary confusing code
The code in GC that handles gp->gobuf.ctxt is wrong:
it does not mark the ctxt object itself,
it just queues the ctxt object for scanning.
So the ctxt object can be collected as garbage.
However, Gobuf.ctxt is void*, so it's always marked and
scanned through G.
crypto/x509: fix format strings in test
Currently it says:
--- PASS: TestDecrypt-2 (0.11s)
pem_decrypt_test.go:17: test 0. %!s(x509.PEMCipher=1)
--- PASS: TestEncrypt-2 (0.00s)
pem_decrypt_test.go:42: test 0. %!s(x509.PEMCipher=1)
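The %!s noise comes from formatting a non-string type with %s; switching the verb to %d (or %v) prints the cipher constant cleanly. A minimal illustration (not the test's actual code):

    package main

    import "fmt"

    type PEMCipher int

    const PEMCipherDES PEMCipher = 1

    func main() {
        // %s has no meaning for a plain integer type...
        fmt.Printf("test %s.\n", PEMCipherDES) // test %!s(main.PEMCipher=1).
        // ...while %d formats the constant as intended.
        fmt.Printf("test %d.\n", PEMCipherDES) // test 1.
    }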
runtime: make runtime·usleep and runtime·osyield callable from cgo callback
runtime·usleep and runtime·osyield fall back to calling an
assembly wrapper for the libc functions in the absence of an m,
so they can be called in cgo callback context.
A temporary 512-byte buffer is allocated for every call to
readHeader. This buffer isn't returned to the caller, so it can
be reused to lower the number of memory allocations.
This CL does so by using a pool and zeroing out the buffer before
putting it back into the pool.
benchmark                 old ns/op     new ns/op     delta
BenchmarkListFiles100k    545249903     538832687     -1.18%

benchmark                 old allocs    new allocs    delta
BenchmarkListFiles100k    2105167       2005692       -4.73%

benchmark                 old bytes     new bytes     delta
BenchmarkListFiles100k    105903472     54831527      -48.22%
This improvement matters when your code has to process many
tarballs that each contain many files.
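The pattern, roughly (a minimal sketch with hypothetical names; the real change lives inside archive/tar's readHeader):

    package main

    import "sync"

    // headerPool recycles 512-byte header blocks rather than
    // allocating a fresh one on every readHeader call.
    var headerPool = sync.Pool{
        New: func() interface{} { return make([]byte, 512) },
    }

    func withHeaderBlock(read func(buf []byte) error) error {
        buf := headerPool.Get().([]byte)
        defer func() {
            // Zero the block before reuse so stale header bytes
            // cannot leak into the next read.
            for i := range buf {
                buf[i] = 0
            }
            headerPool.Put(buf)
        }()
        return read(buf)
    }

    func main() {
        _ = withHeaderBlock(func(buf []byte) error {
            _ = buf[:512] // parse header fields out of buf here
            return nil
        })
    }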
Adam Langley [Wed, 2 Jul 2014 22:28:57 +0000 (15:28 -0700)]
crypto/rsa: fix out-of-bounds access with short session keys.
Thanks to Cedric Staub for noting that a short session key would lead
to an out-of-bounds access when conditionally copying the too short
buffer over the random session key.
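Roughly, the guarded copy looks like this (a hedged sketch; the actual fix is inside rsa.DecryptPKCS1v15SessionKey):

    package main

    import (
        "crypto/subtle"
        "fmt"
    )

    // copySessionKey overwrites key with decrypted only when the
    // lengths match; checking the length first is what prevents the
    // out-of-bounds access, and the length is not secret, so the
    // early return does not leak anything useful.
    func copySessionKey(key, decrypted []byte, valid int) {
        if len(decrypted) != len(key) {
            return
        }
        subtle.ConstantTimeCopy(valid, key, decrypted)
    }

    func main() {
        key := []byte{0xaa, 0xbb} // caller's random session key
        copySessionKey(key, []byte{1, 2}, 1)
        fmt.Printf("% x\n", key) // 01 02
    }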
2. Add /*c2go */ comment giving effect of #define.
This is necessary for function-like #defines and
non-enum-able #defined constants.
(Not all compilers handle negative or large enums.)
3. Add extra braces in struct initializer.
(c2go does not implement the full rules.)
This is enough to let c2go typecheck the source tree.
There may be more changes once it is doing
other semantic analyses.
liblink, runtime: preliminary support for plan9/amd64
A TLS slot is reserved by _rt0_.*_plan9 as an automatic and
its address (which is static on Plan 9) is saved in the
global _privates symbol. The startup linkage now is exactly
like that from Plan 9 libc, and the way we access g is
exactly as if we had used privalloc(2).
Aside from making the code more standard, this change
drastically simplifies it, both for 386 and for amd64, and
makes the Plan 9 code in liblink common for both 386 and
amd64.
The amd64 runtime code was cleared of nxm assumptions, and
now runs on the standard Plan 9 kernel.
runtime: properly restore registers in Solaris runtime·sigtramp
We restored registers correctly in the usual case where the thread
is a Go-managed thread and called runtime·sighandler, but we
failed to do so when runtime·sigtramp was called on a cgo-created
thread. In that case, runtime·sigtramp called runtime·badsignal,
a Go function, and did not restore registers after it returned.
LGTM=rsc, dave
R=rsc, dave
CC=golang-codereviews, minux.ma
https://golang.org/cl/105280050
Shenghou Ma [Tue, 1 Jul 2014 22:24:43 +0000 (18:24 -0400)]
misc/nacl, syscall: lazily initialize fs on nacl.
On amd64, the real time is reduced from 176.76s to 140.26s.
On ARM, the real time is reduced from 921.61s to 726.30s.
David Crawshaw [Tue, 1 Jul 2014 21:21:50 +0000 (17:21 -0400)]
all: add GOOS=android
As android and linux have significant overlap, and
because build tags are a poor way to represent an
OS target, this CL introduces an exception into
go/build: linux is treated as a synonym for android
when matching files.
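Concretely, a file constrained to linux now also builds for android (an illustrative sketch; the package name is hypothetical):

    // +build linux

    // With the go/build exception above, this file is compiled both
    // for GOOS=linux and for GOOS=android.
    package mypkg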
Move decAlloc calls a bit higher in the call tree.
Cleans code marginally, improves speed marginally.
The benchmarks are noisy, but the median time from
20 consecutive 1-second runs improves by about 2%.
Rob Pike [Tue, 1 Jul 2014 16:21:25 +0000 (09:21 -0700)]
misc: delete editor and shell support
We are not the right people to support editor plugins, and the profusion
of editors in this CL demonstrates the unreality of pretending to do so.
People are free to create and advertise their own repos with support.
For discussion: https://groups.google.com/forum/#!topic/golang-dev/SA7fD470FxU
Keith Randall [Tue, 1 Jul 2014 01:59:24 +0000 (18:59 -0700)]
runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Rob Pike [Mon, 30 Jun 2014 22:47:11 +0000 (15:47 -0700)]
encoding/gob: simplify allocation in decode.
The old code's structure needed to track indirections because of the
use of unsafe. That is no longer necessary, so we can remove all
that tracking. The code cleans up considerably but is a little slower.
We may be able to recover that performance drop. I believe the
code quality improvement is worthwhile regardless.
Rob Pike [Mon, 30 Jun 2014 18:06:47 +0000 (11:06 -0700)]
encoding/gob: remove unsafe, use reflection.
This removes a major unsafe thorn in our side, a perennial obstacle
to clean garbage collection.
Not coincidentally: In cleaning this up, several bugs were found,
including code that reached inside by-value interfaces to create
pointers for pointer-receiver methods. Unsafe code is just as
advertised.
Performance of course suffers, but not too badly. The Pipe number
is more indicative, since it's doing I/O that simulates a network
connection. Plus these are end-to-end, so each end suffers
only half of this pain.
The edit is pretty much a line-by-line conversion, with a few
simplifications and a couple of new tests. There may be more
performance to gain.
David Symonds [Sat, 28 Jun 2014 10:47:06 +0000 (20:47 +1000)]
flag: add a little more doc comment to Duration.
The only text that describes the accepted format is in the package doc,
which is far away from these functions. The other flag types don't need
this explicitness because they are more obvious.
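For reference, a Duration flag parses time.ParseDuration syntax; a small usage sketch:

    package main

    import (
        "flag"
        "fmt"
        "time"
    )

    func main() {
        // Accepts values in time.ParseDuration form: "300ms", "2h45m", "1.5h".
        timeout := flag.Duration("timeout", 10*time.Second, "request timeout")
        flag.Parse()
        fmt.Println("timeout:", *timeout)
    }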
Dmitriy Vyukov [Sat, 28 Jun 2014 01:19:02 +0000 (18:19 -0700)]
runtime: make garbage collector faster by deleting code again
Remove GC bitmap backward scanning.
This was already done once in https://golang.org/cl/5530074/
Still makes GC a bit faster.
On the garbage benchmark, before:
gc-pause-one=237345195
gc-pause-total=4746903
cputime=32427775
time=32458208
after:
gc-pause-one=235484019
gc-pause-total=4709680
cputime=31861965
time=31877772
Also prepares mgc0.c for future changes.
Dmitriy Vyukov [Thu, 26 Jun 2014 18:40:48 +0000 (11:40 -0700)]
runtime: say when a goroutine is locked to OS thread
Say when a goroutine is locked to an OS thread in crash reports
and goroutine profiles.
It can be useful to understand what goroutines consume OS threads
(syscall and locked), e.g. if you forget to call UnlockOSThread
or leak locked goroutines.
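A minimal way to produce such a goroutine (the exact annotation wording in the dump may differ):

    package main

    import (
        "runtime"
        "time"
    )

    func main() {
        go func() {
            // This goroutine now consumes a dedicated OS thread; with
            // this CL the crash report below marks it as locked.
            runtime.LockOSThread()
            select {} // park forever while holding the thread
        }()
        time.Sleep(100 * time.Millisecond)
        panic("inspect the goroutine dump")
    }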
Russ Cox [Thu, 26 Jun 2014 15:54:39 +0000 (11:54 -0400)]
all: remove 'extern register M *m' from runtime
The runtime has historically held two dedicated values g (current goroutine)
and m (current thread) in 'extern register' slots (TLS on x86, real registers
backed by TLS on ARM).
This CL removes the extern register m; code now uses g->m.
On ARM, this frees up the register that formerly held m (R9).
This is important for NaCl, because NaCl ARM code cannot use R9 at all.
The Go 1 macrobenchmarks (those with per-op times >= 10 µs) are unaffected:
Dmitriy Vyukov [Wed, 25 Jun 2014 03:37:28 +0000 (20:37 -0700)]
index/suffixarray: reduce size of a benchmark
A single iteration of BenchmarkSaveRestore runs for 5 seconds
on my freebsd machine. 5 seconds seems too long for a single
iteration.
This is the only benchmark that times out on freebsd-amd64-race builder.
R=golang-codereviews, dave
CC=golang-codereviews
https://golang.org/cl/107340044
Robert Griesemer [Tue, 24 Jun 2014 23:25:09 +0000 (16:25 -0700)]
spec: receiver declaration is just a parameter declaration
This CL removes the special syntax for method receivers and
makes it just like other parameters. Instead, the crucial
receiver-specific rules (exactly one receiver, receiver type
must be of the form T or *T) are specified verbally instead
of syntactically.
This is a fully backward-compatible (and minor) syntax
relaxation. As a result, the following syntactic restrictions,
which are completely irrelevant and were only in place
for receivers, are removed:
a) receiver types cannot be parenthesized
b) receiver parameter lists cannot have a trailing comma
The result of this CL is a simplification of the spec and the
implementation, with no impact on existing (or future) code.
Noteworthy:
- gc already permits a trailing comma at the end of a receiver
declaration:
func (recv T,) m() {}
This is technically a bug with the current spec; this CL will
legalize this notation.
- gccgo produces a misleading error when a trailing comma is used:
error: method has multiple receivers
(even though there's only one receiver)
- Compilers and type-checkers won't need to report errors anymore
if receiver types are parenthesized.
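Concretely, both previously questionable forms are accepted under the relaxed grammar (sketch):

    type T struct{}

    func (recv T,) m1()   {} // trailing comma after the receiver
    func (recv (*T)) m2() {} // parenthesized receiver type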
Fixes #4496.
LGTM=iant, rsc
R=r, rsc, iant, ken
CC=golang-codereviews
https://golang.org/cl/101500044
The number of estimated iterations required to reach the benchtime is multiplied by a safety margin (to avoid falling just short) and then rounded up to a readable number. With an accurate estimate, in the worst case, the resulting number of iterations could be 3.75x more than necessary: 1.5x for safety * 2.5x to round up (e.g. from 2eX+1 to 5eX).
This CL reduces the safety margin to 1.2x. Experimentation showed a diminishing margin of return past 1.2x, although the average case continued to show improvements down to 1.05x.
This CL also reduces the maximum round-up multiplier from 2.5x (from 2eX+1 to 5eX) to 2x, by allowing the number of iterations to be of the form 3eX.
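The rounding rule, sketched in Go (a simplified stand-in for testing's round-up helper, not the actual code):

    package main

    import "fmt"

    // roundDown10 rounds n down to the nearest power of 10.
    func roundDown10(n int) int {
        result := 1
        for n >= 10 {
            n /= 10
            result *= 10
        }
        return result
    }

    // roundUp rounds n up to one of 1eX, 2eX, 3eX, 5eX, or 10eX;
    // the 3eX step is the one this CL adds.
    func roundUp(n int) int {
        base := roundDown10(n)
        switch {
        case n <= base:
            return base
        case n <= 2*base:
            return 2 * base
        case n <= 3*base:
            return 3 * base
        case n <= 5*base:
            return 5 * base
        default:
            return 10 * base
        }
    }

    func main() {
        fmt.Println(roundUp(2100)) // 3000 rather than 5000
    }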
Both changes improve benchmark wall clock times, and the effects are cumulative.
From 1.5x to 1.2x safety margin:
package old s new s delta
bytes 163 125 -23%
encoding/json 27 21 -22%
net/http 42 36 -14%
runtime 463 418 -10%
strings 82 65 -21%
Allowing 3eX iterations:
package old s new s delta
bytes 163 134 -18%
encoding/json 27 23 -15%
net/http 42 36 -14%
runtime 463 422 -9%
strings 82 72 -12%
Combined:
package old s new s delta
bytes 163 112 -31%
encoding/json 27 20 -26%
net/http 42 30 -29%
runtime 463 346 -25%
strings 82 60 -27%
Rui Ueyama [Mon, 23 Jun 2014 19:06:26 +0000 (12:06 -0700)]
runtime: speed up amd64 memmove
MOV with SSE registers seems faster than REP MOVSQ if the
size being copied is less than about 2K. Previously we
didn't use MOV if the memory region was larger than 256
bytes. This patch improves the performance of 257-2048
byte non-overlapping copies by using MOV.
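A benchmark sketch targeting the affected size range (illustrative, not the CL's own benchmark):

    package memmove_test

    import "testing"

    // 512 bytes falls inside the newly MOV-copied 257-2048 range.
    func BenchmarkMemmove512(b *testing.B) {
        src := make([]byte, 512)
        dst := make([]byte, 512)
        b.SetBytes(512)
        for i := 0; i < b.N; i++ {
            copy(dst, src) // compiles down to runtime·memmove
        }
    }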
Here is the benchmark result on Intel Xeon 3.5GHz (Nehalem).
Dmitriy Vyukov [Fri, 20 Jun 2014 23:36:21 +0000 (16:36 -0700)]
runtime/race: update runtime to tip
This requires minimal changes to the runtime hooks. In particular,
synchronization events must be done only on valid addresses now,
so I've added the additional checks to race.c.
Dmitriy Vyukov [Fri, 20 Jun 2014 05:04:10 +0000 (22:04 -0700)]
runtime: remove obsolete afterprologue check
The afterprologue check was required when we did not know
about return arguments of functions and/or they were not zeroed.
Now 100% precision is required for stacks due to stack copying,
so it must work w/o afterprologue one way or another.
I can limit this change for 1.3 to merely adding a TODO,
but this check is super confusing so I don't want this knowledge to get lost.
Nigel Tao [Thu, 19 Jun 2014 01:39:03 +0000 (11:39 +1000)]
image/jpeg: use a look-up table to speed up Huffman decoding. This
requires a decoder to do its own byte buffering instead of using
bufio.Reader, due to byte stuffing.
benchmark old MB/s new MB/s speedup
BenchmarkDecodeBaseline 33.40 50.65 1.52x
BenchmarkDecodeProgressive 24.34 31.92 1.31x
On 6g, unsafe.Sizeof(huffman{}) falls from 4872 to 964 bytes, and
the decoder struct contains 8 of those.
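The look-up idea, sketched with hypothetical names (not the image/jpeg internals): peek the next 8 bits and index a 256-entry table holding the symbol and true code length for every code of at most 8 bits; longer codes fall back to bit-by-bit decoding.

    package main

    import "fmt"

    // lutEntry packs a decoded symbol and its true code length; a zero
    // size marks windows whose code is longer than 8 bits, which fall
    // back to bit-by-bit decoding.
    type lutEntry struct {
        value uint8
        size  uint8
    }

    type huffCode struct {
        code   uint16 // code bits, MSB-first
        length uint8  // number of bits in code
        value  uint8  // decoded symbol
    }

    // buildLUT fills a 256-entry table: every 8-bit window whose prefix
    // matches a short code maps to that code's symbol and length.
    func buildLUT(codes []huffCode) [256]lutEntry {
        var lut [256]lutEntry
        for _, c := range codes {
            if c.length > 8 {
                continue // long codes take the slow path
            }
            base := c.code << (8 - c.length)
            for i := uint16(0); i < 1<<(8-c.length); i++ {
                lut[base|i] = lutEntry{c.value, c.length}
            }
        }
        return lut
    }

    func main() {
        lut := buildLUT([]huffCode{{code: 0x2, length: 2, value: 7}})
        e := lut[0xbf]               // window 10111111 starts with code "10"
        fmt.Println(e.value, e.size) // 7 2
    }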