Shenghou Ma [Sat, 17 Nov 2012 18:23:34 +0000 (02:23 +0800)]
crypto/md5: speed up aligned writes and test/bench unaligned writes
Write() can safely use uint32 loads when input is aligned.
Also add test and benchmarks for unaligned writes.
Benchmark result obtained by Dave Cheney on ARMv5TE @ 1.2GHz:
benchmark                      old ns/op    new ns/op    delta
BenchmarkHash8Bytes                 4104         3417    -16.74%
BenchmarkHash1K                    22061        11208    -49.20%
BenchmarkHash8K                   146630        65148    -55.57%
BenchmarkHash8BytesUnaligned        4128         3436    -16.76%
BenchmarkHash1KUnaligned           22054        21473     -2.63%
BenchmarkHash8KUnaligned          146658       146909     +0.17%
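As a rough sketch of the idea behind the fast path (not the actual md5block code; the helper names here are made up, and the direct load additionally assumes a little-endian machine):

    package main

    import (
        "encoding/binary"
        "fmt"
        "unsafe"
    )

    // aligned reports whether the slice's data pointer is 4-byte aligned.
    func aligned(p []byte) bool {
        return uintptr(unsafe.Pointer(&p[0]))&3 == 0
    }

    // load32 reads a little-endian uint32, using a single direct load only
    // on the aligned fast path and byte-wise assembly otherwise.
    func load32(p []byte) uint32 {
        if aligned(p) {
            return *(*uint32)(unsafe.Pointer(&p[0])) // assumes little-endian
        }
        return binary.LittleEndian.Uint32(p) // byte-wise fallback
    }

    func main() {
        buf := []byte{1, 2, 3, 4, 5}
        fmt.Println(load32(buf), load32(buf[1:])) // aligned vs. (likely) unaligned view
    }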
Robert Griesemer [Fri, 16 Nov 2012 21:17:12 +0000 (13:17 -0800)]
go/printer: leave indentation alone when printing nodes from different files
ASTs may be created by various tools and built from nodes of
different files. An incorrectly constructed AST will likely
not print at all, but a (structurally) correct AST with bad
position information should still print in a structurally correct way.
One heuristic used was to reset indentation when the filename
in the position information of nodes changed. However, this
can lead to wrong indentation for structurally correct ASTs.
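A small sketch of the situation (file names and contents made up): an AST spliced together from nodes parsed from two different files, then printed with go/printer:

    package main

    import (
        "go/parser"
        "go/printer"
        "go/token"
        "log"
        "os"
    )

    func main() {
        fset := token.NewFileSet()
        a, err := parser.ParseFile(fset, "a.go", "package p\n\nfunc A() {\n\tprintln(1)\n}\n", 0)
        if err != nil {
            log.Fatal(err)
        }
        b, err := parser.ParseFile(fset, "b.go", "package p\n\nfunc B() {\n\tprintln(2)\n}\n", 0)
        if err != nil {
            log.Fatal(err)
        }

        // Splice b.go's declarations into a.go's declaration list; the result
        // is structurally correct but carries positions from two files.
        a.Decls = append(a.Decls, b.Decls...)

        printer.Fprint(os.Stdout, fset, a)
    }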
Dmitriy Vyukov [Fri, 16 Nov 2012 16:06:11 +0000 (20:06 +0400)]
runtime: hide mheap from race detector
This significantly decreases the amount of shadow memory
mapped by the race detector.
I haven't tested on Windows, but on Linux it reduces
virtual memory size from 1351m to 330m for fmt.test.
Fixes #4379.
Dmitriy Vyukov [Fri, 16 Nov 2012 08:06:48 +0000 (12:06 +0400)]
syscall: fix data races in LazyDLL/LazyProc
Reincarnation of https://golang.org/cl/6817086 (sent from another account).
It is ugly because sync.Once will cause allocation of a closure.
Fixes #4343.
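A rough sketch of the lazy-initialization pattern involved (not the actual syscall code; the types here are made up): the racy nil check becomes an atomic load plus a mutex-guarded double check:

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
        "unsafe"
    )

    type handle struct{ name string }

    type lazyHandle struct {
        mu sync.Mutex
        h  unsafe.Pointer // *handle, loaded and stored atomically
    }

    func (l *lazyHandle) get(name string) *handle {
        if p := atomic.LoadPointer(&l.h); p != nil {
            return (*handle)(p) // fast path: already initialized
        }
        l.mu.Lock()
        defer l.mu.Unlock()
        if p := atomic.LoadPointer(&l.h); p != nil {
            return (*handle)(p) // another goroutine won the race
        }
        h := &handle{name: name} // the expensive load would happen here
        atomic.StorePointer(&l.h, unsafe.Pointer(h))
        return h
    }

    func main() {
        var l lazyHandle
        fmt.Println(l.get("kernel32.dll").name)
    }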
Marcel van Lohuizen [Thu, 15 Nov 2012 21:23:56 +0000 (22:23 +0100)]
exp/locale/collate: changed implementation of Compare and CompareString to
compare incrementally. Also modified collation API to be more high-level
by removing the need for an explicit buffer to be passed as an argument.
This considerably speeds up Compare and CompareString. The change also
eliminates the need to reinitialize the normalization buffer for each use
of an iter, which also significantly improves performance for Key and KeyString.
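For reference, a minimal use of the buffer-free comparison API as it survives in the descendant package golang.org/x/text/collate (the exp/locale/collate signatures at the time may have differed):

    package main

    import (
        "fmt"

        "golang.org/x/text/collate"
        "golang.org/x/text/language"
    )

    func main() {
        c := collate.New(language.Danish)
        // Compare and CompareString need no explicit buffer argument.
        fmt.Println(c.CompareString("æble", "zebra")) // -1, 0, or +1
    }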
Dave Cheney [Thu, 15 Nov 2012 00:40:10 +0000 (11:40 +1100)]
run.{bash,bat,rc}: unset GOMAXPROCS before ../test
test/run.go already executes tests in parallel where
possible. An unknown GOMAXPROCS value during the tests
is known to cause failures with tests that measure
allocations.
Joel Sing [Wed, 14 Nov 2012 16:36:19 +0000 (03:36 +1100)]
debug/elf: fix offset for GNU version symbols
Since we no longer skip the first entry when reading a symbol table,
we no longer need to allow for the offset difference when processing
the GNU version symbols.
Joel Sing [Wed, 14 Nov 2012 15:24:14 +0000 (02:24 +1100)]
debug/elf: do not skip first symbol in the symbol table
Do not skip the first symbol in the symbol table. If the first symbol
is skipped, any other indexes into the symbol table (for example,
indexes in relocation entries) refer to the symbol following the one
that was intended.
Add an object that contains debug relocations, which debug/dwarf
failed to decode correctly. Extend the relocation tests to cover
this case.
Note that the existing tests passed since the symbol following the
symbol that required relocation is also of type STT_SECTION.
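A minimal sketch of walking an ELF symbol table with debug/elf (the path is a placeholder); relocation entries refer to symbols by their index into this table, which is why an off-by-one in the reader corrupts relocation processing:

    package main

    import (
        "debug/elf"
        "fmt"
        "log"
    )

    func main() {
        f, err := elf.Open("/bin/ls") // placeholder path
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        syms, err := f.Symbols()
        if err != nil {
            log.Fatal(err)
        }
        for i, s := range syms {
            fmt.Printf("%4d %-30s %#x\n", i, s.Name, s.Value)
        }
    }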
Dmitriy Vyukov [Wed, 14 Nov 2012 12:58:10 +0000 (16:58 +0400)]
runtime/race: more precise handling of finalizers
Currently the race detector runtime just disables race detection in the finalizer goroutine.
This leads to false positives when a finalizer writes to shared memory: the race with the finalizer is reported in a normal goroutine that accesses the same memory.
After this change, the finalizer goroutine is synchronized with the rest of the world in racefingo(). This is closer to what happens in reality and so
does not produce false positives.
Also add a README file with instructions on how to build the runtime.
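A rough illustration of the pattern described above (a finalizer touching memory that an ordinary goroutine also uses; whether and when the finalizer runs is up to the garbage collector):

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    type resource struct{ id int }

    var (
        mu       sync.Mutex
        finished int
    )

    func main() {
        r := &resource{id: 1}
        runtime.SetFinalizer(r, func(*resource) {
            mu.Lock()
            finished++ // the finalizer goroutine writes shared state
            mu.Unlock()
        })
        r = nil
        runtime.GC()

        mu.Lock()
        fmt.Println("finalized so far:", finished) // an ordinary goroutine reads the same state
        mu.Unlock()
    }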
Dmitriy Vyukov [Wed, 14 Nov 2012 12:51:23 +0000 (16:51 +0400)]
runtime: add RaceRead/RaceWrite functions
This allows catching, e.g., a data race between an atomic write and a
non-atomic write, or between Mutex.Lock() and an overwrite of the mutex
(e.g. mu = Mutex{}).
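A deliberately racy example of the first pattern, an atomic write conflicting with a plain write to the same word; running it with go run -race is expected to produce a report:

    package main

    import "sync/atomic"

    func main() {
        var x int32
        done := make(chan bool)
        go func() {
            atomic.AddInt32(&x, 1) // atomic write
            done <- true
        }()
        x = 42 // plain, non-atomic write to the same word
        <-done
    }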
This is a simplified version of earlier versions of this CL
and now only fixes obviously incorrect things, without
changing the locking on bodyEOFReader.
I'd like to see if this is sufficient before changing the
locking.
Update #4191
R=golang-dev, rsc, dave
CC=golang-dev
https://golang.org/cl/6739055
Russ Cox [Tue, 13 Nov 2012 18:06:29 +0000 (13:06 -0500)]
reflect: add ArrayOf, ChanOf, MapOf, SliceOf
In order to add these, we need to be able to find references
to such types that already exist in the binary. To do that, introduce
a new linker section holding a list of the types corresponding to
arrays, chans, maps, and slices.
To offset the storage cost of this list, and to simplify the code,
remove the interface{} header from the representation of a
runtime type. It was used in early versions of the code but was
made obsolete by the kind field: a switch on kind is more efficient
than a type switch.
In the godoc binary, removing the interface{} header cuts two
words from each of about 10,000 types. Adding back the list of pointers
to array, chan, map, and slice types reintroduces one word for
each of about 500 types. On a 64-bit machine, then, this CL *removes*
a net 156 kB of read-only data from the binary.
This CL does not include the needed support for precise garbage
collection. I have created issue 4375 to track that.
This CL also does not set the 'algorithm' - specifically the equality
and copy functions - for a new array correctly, so I have unexported
ArrayOf for now. That is also part of issue 4375.
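A short example of the exported constructors, using today's reflect API (ArrayOf is left out since it was unexported by this CL):

    package main

    import (
        "fmt"
        "reflect"
    )

    func main() {
        elem := reflect.TypeOf(0)

        fmt.Println(reflect.SliceOf(elem))                   // []int
        fmt.Println(reflect.MapOf(reflect.TypeOf(""), elem)) // map[string]int
        fmt.Println(reflect.ChanOf(reflect.BothDir, elem))   // chan int

        // Values of the derived types can be constructed and used as usual.
        m := reflect.MakeMap(reflect.MapOf(reflect.TypeOf(""), elem))
        m.SetMapIndex(reflect.ValueOf("answer"), reflect.ValueOf(42))
        fmt.Println(m.Interface()) // map[answer:42]
    }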
Rémy Oudompheng [Tue, 13 Nov 2012 06:39:18 +0000 (07:39 +0100)]
cmd/8g: eliminate obviously useless temps before regopt.
This patch introduces a sort of pre-regopt peephole optimization.
When a temporary is introduced that just holds a value for the
duration of the next instruction and is otherwise unused, we
elide it to make the job of regopt easier.
Since x86 has very few registers, this situation happens very
often. The result is large savings in stack variables for
arithmetic-heavy functions.
crypto/aes
benchmark             old ns/op    new ns/op    delta
BenchmarkEncrypt           1301          392    -69.87%
BenchmarkDecrypt           1309          368    -71.89%
BenchmarkExpand            2913         1036    -64.44%

benchmark              old MB/s     new MB/s    speedup
BenchmarkEncrypt          12.29        40.74      3.31x
BenchmarkDecrypt          12.21        43.37      3.55x

crypto/md5
benchmark             old ns/op    new ns/op    delta
BenchmarkHash8Bytes        1761          914    -48.10%
BenchmarkHash1K           16912         5570    -67.06%
BenchmarkHash8K          123895        38286    -69.10%

benchmark              old MB/s     new MB/s    speedup
BenchmarkHash8Bytes        4.54         8.75      1.93x
BenchmarkHash1K           60.55       183.83      3.04x
BenchmarkHash8K           66.12       213.97      3.24x
Mikio Hara [Tue, 13 Nov 2012 03:56:28 +0000 (12:56 +0900)]
net: protocol specific listen functions return a proper local socket address
When a nil listener address is passed to some protocol-specific
listen functions, they create an unnamed, unbound socket, while other
listen functions may return an invalid address error.
This CL allows a nil listener address to be passed to all protocol-specific
listen functions, fixing the inconsistency above, and makes them return a
proper local socket address when a nil listener address is given.
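For example, with a nil listener address each of these should now succeed and report a proper, system-chosen local address:

    package main

    import (
        "fmt"
        "log"
        "net"
    )

    func main() {
        ln, err := net.ListenTCP("tcp", nil) // nil *TCPAddr
        if err != nil {
            log.Fatal(err)
        }
        defer ln.Close()
        fmt.Println(ln.Addr()) // e.g. [::]:49152, not an unnamed socket

        c, err := net.ListenUDP("udp", nil) // nil *UDPAddr
        if err != nil {
            log.Fatal(err)
        }
        defer c.Close()
        fmt.Println(c.LocalAddr())
    }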
Mikio Hara [Tue, 13 Nov 2012 03:26:20 +0000 (12:26 +0900)]
net: make LocalAddr on multicast UDPConn return a listening address
The go.net/ipv4 package allows a single UDP listener to join multiple
different group addresses. That makes it undesirable for LocalAddr on a
multicast UDPConn to return the first joined group address.
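A small sketch (the group address here is the mDNS address 224.0.0.251:5353, chosen only for illustration):

    package main

    import (
        "fmt"
        "log"
        "net"
    )

    func main() {
        group := &net.UDPAddr{IP: net.IPv4(224, 0, 0, 251), Port: 5353}
        c, err := net.ListenMulticastUDP("udp4", nil, group)
        if err != nil {
            log.Fatal(err)
        }
        defer c.Close()
        // LocalAddr now reports the listening address rather than the
        // first joined group address.
        fmt.Println(c.LocalAddr())
    }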
Brad Fitzpatrick [Mon, 12 Nov 2012 23:20:18 +0000 (15:20 -0800)]
net/http: handle 413 responses more robustly
When HTTP bodies were too large and we didn't want to finish
reading them for DoS reasons, we previously found it necessary
to send a FIN and then pause before closing the connection
(which might send a RST) if we wanted the client to have a
better chance at receiving our error response. That was Issue 3595.
This change adds the same fix for request headers that are too large,
which might fix the Windows flakiness we observed in TestRequestLimit at:
http://build.golang.org/log/146a2a7d9b24441dc14602a1293918191d4e75f1
Rémy Oudompheng [Mon, 12 Nov 2012 22:56:11 +0000 (23:56 +0100)]
cmd/5g, cmd/6g: pass the full torture test.
The patch adds more cases to agenr to allocate registers later,
and makes 6g generate addresses for sgen in registers other than
SI and DI. It avoids a complex save/restore sequence that effectively
amounted to allocating a register before descending into subtrees.
David Symonds [Mon, 12 Nov 2012 22:08:33 +0000 (09:08 +1100)]
exp/types: avoid init race in check_test.go.
There was an init race between

    check_test.go:init
        universe.go:def
            use of Universe

and

    universe.go:init
        creation of Universe

The order in which init funcs are executed in a package is unspecified.
The test is not currently broken in the golang.org environment
because the go tool compiles the test with non-test sources before test sources,
but other environments may, say, sort the source files before compiling,
and thus trigger this race, causing a nil pointer panic.
Roger Peppe [Mon, 12 Nov 2012 15:31:23 +0000 (15:31 +0000)]
crypto/x509: implement EncryptPEMBlock
Arbitrary decisions: the order of the arguments and the fact that it
takes a block-type argument (rather than leaving it to the user to fill
in later); I'm happy with whatever colour we want to paint it.
We also change DecryptPEMBlock so that it won't
panic when the IV has the wrong size.
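A round-trip example with the new function and its counterpart (block type and cipher chosen arbitrarily):

    package main

    import (
        "crypto/rand"
        "crypto/x509"
        "encoding/pem"
        "fmt"
        "log"
    )

    func main() {
        block, err := x509.EncryptPEMBlock(rand.Reader, "MESSAGE",
            []byte("hello, world"), []byte("password"), x509.PEMCipherAES256)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%s", pem.EncodeToMemory(block))

        plain, err := x509.DecryptPEMBlock(block, []byte("password"))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%s\n", plain)
    }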
Dmitriy Vyukov [Fri, 9 Nov 2012 10:00:41 +0000 (14:00 +0400)]
cmd/go: fix selection of packages for testing
Currently it works incorrectly if the user specifies their own build tags
and when race detection is enabled (e.g. runtime/race is not selected,
because it contains only test files with +build race).
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/6814107
Shenghou Ma [Fri, 9 Nov 2012 06:19:07 +0000 (14:19 +0800)]
runtime: use vDSO clock_gettime for time.now & runtime.nanotime on Linux/amd64
Performance improvement aside, time.Now() now gets real nanosecond resolution
on supported systems.
Benchmark done on Core i7-2600 @ 3.40GHz with kernel 3.5.2-gentoo.
original vDSO gettimeofday:
BenchmarkNow 100000000 27.4 ns/op
new vDSO gettimeofday fallback:
BenchmarkNow 100000000 27.6 ns/op
new vDSO clock_gettime:
BenchmarkNow 100000000 24.4 ns/op
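A quick, informal way to observe the clock's granularity from Go (not the benchmark above):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        min := time.Duration(1 << 62)
        prev := time.Now()
        for i := 0; i < 1000000; i++ {
            now := time.Now()
            if d := now.Sub(prev); d > 0 && d < min {
                min = d
            }
            prev = now
        }
        fmt.Println("smallest observed step:", min)
    }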
Russ Cox [Wed, 7 Nov 2012 20:15:21 +0000 (15:15 -0500)]
cmd/gc: fix escape analysis bug
The code assumed that the only choices were EscNone, EscScope, and EscHeap,
so that it makes sense to set EscScope only if the current setting is EscNone.
Now that we have the many variants of EscReturn, this logic is false, and it was
causing important EscScopes to be ignored in favor of EscReturn.
Dmitriy Vyukov [Wed, 7 Nov 2012 19:59:09 +0000 (23:59 +0400)]
runtime/race: add Windows support
This is a copy of https://golang.org/cl/6810080
but sent from another account (dvyukov@gmail.com is not in CONTRIBUTORS).
Roger Peppe [Wed, 7 Nov 2012 15:16:34 +0000 (15:16 +0000)]
crypto/x509: fix DecryptPEMBlock
The current implementation can fail when the
block size is not a multiple of 8 bytes.
This CL makes it work, and also checks that the
data is in fact a multiple of the block size.
Russ Cox [Wed, 7 Nov 2012 14:59:19 +0000 (09:59 -0500)]
cmd/gc: annotate local variables with unique ids for inlining
Avoids problems with local declarations shadowing other names.
We write a more explicit form than the incoming program, so there
may be additional type annotations. For example:
    int := "hello"
    j := 2

would normally turn into

    var int string = "hello"
    var j int = 2

but the int variable shadows the int type in the second line.
This CL marks all local variables with a per-function sequence number,
so that this would instead be:
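(Illustratively, and with the exact internal notation aside, something like:)

    var int·1 string = "hello"
    var j·2 int = 2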
Dmitriy Vyukov [Wed, 7 Nov 2012 08:48:58 +0000 (12:48 +0400)]
runtime/race: lazily allocate shadow memory
Currently the race detector runtime maps shadow memory eagerly at process startup.
This works poorly on Windows, because Windows requires reservation in the swap file
(especially problematic if several Go programs run at the same time, each consuming GBs
of memory).
With this change the race detector maps shadow memory lazily, so the Go runtime must
notify it about all new heap memory.
This will help with the Windows port, but also eliminates the scary 16TB virtual memory
consumption in top output (which sometimes confuses monitoring scripts).
Dmitriy Vyukov [Wed, 7 Nov 2012 08:06:27 +0000 (12:06 +0400)]
cmd/gc: racewalk: do not double function calls
Current racewalk transformation looks like:

    x := <-makeChan().c
    \/\/\/\/\/\/\/\/\/
    runtime.raceread(&makeChan().c)
    x := <-makeChan().c

and so makeChan() is called twice.
With this CL the transformation looks like:

    x := <-makeChan().c
    \/\/\/\/\/\/\/\/\/
    chan *tmp = &(makeChan().c)
    raceread(&*tmp)
    x := <-(*tmp)

Fixes #4245.
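A self-contained illustration of why the single evaluation matters, using a channel-returning call with a visible side effect:

    package main

    import "fmt"

    type box struct{ c chan int }

    var calls int

    func makeChan() *box {
        calls++ // visible side effect: evaluating makeChan twice would double it
        b := &box{c: make(chan int, 1)}
        b.c <- calls
        return b
    }

    func main() {
        x := <-makeChan().c
        fmt.Println(x, calls) // prints "1 1": makeChan is evaluated exactly once
    }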