Dmitriy Vyukov [Tue, 19 Aug 2014 07:46:05 +0000 (11:46 +0400)]
runtime: fix memstats
Newly allocated memory is subtracted from inuse, while it was never added to inuse.
Span leftovers are subtracted from both inuse and idle,
while they were never added.
Fixes #8544.
Fixes #8430.
Russ Cox [Tue, 19 Aug 2014 01:13:11 +0000 (21:13 -0400)]
cmd/gc, runtime: refactor interface inlining decision into compiler
We need to change the interface value representation for
concurrent garbage collection, so that there is no ambiguity
about whether the data word holds a pointer or scalar.
This CL does NOT make any representation changes.
Instead, it removes representation assumptions from
various pieces of code throughout the tree.
The isdirectiface function in cmd/gc/subr.c is now
the only place that decides that policy.
The policy propagates out from there in the reflect
metadata, as a new flag in the internal kind value.
A follow-up CL will change the representation by
changing the isdirectiface function. If that CL causes
problems, it will be easy to roll back.
Update #8405.
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, r
https://golang.org/cl/129090043
Jeff R. Allen [Mon, 18 Aug 2014 21:41:28 +0000 (14:41 -0700)]
bzip2: improve performance
Improve performance of move-to-front by using cache-friendly
copies instead of doubly-linked list. Simplify so that the
underlying slice is the object. Remove the n=0 special case,
which was actually slower with the copy approach.
Dmitriy Vyukov [Mon, 18 Aug 2014 12:52:31 +0000 (16:52 +0400)]
runtime: implement transfer cache
Currently we do the following dance after sweeping a span:
1. lock mcentral
2. remove the span from a list
3. unlock mcentral
4. unmark span
5. lock mheap
6. insert the span into heap
7. unlock mheap
8. lock mcentral
9. observe empty list
10. unlock mcentral
11. lock mheap
12. grab the span
13. unlock mheap
14. mark span
15. lock mcentral
16. insert the span into empty list
17. unlock mcentral
This change short-circuits this sequence to nothing,
that is, we just cache and use the span after sweeping.
This gives us functionality similar (even better) to tcmalloc's transfer cache.
benchmark old ns/op new ns/op delta
BenchmarkMalloc8 22.2 19.5 -12.16%
BenchmarkMalloc16 31.0 26.6 -14.19%
Dmitriy Vyukov [Mon, 18 Aug 2014 12:33:39 +0000 (16:33 +0400)]
runtime: don't acquirem on malloc fast path
Mallocgc must be atomic wrt GC, but for performance reasons
don't acquirem/releasem on fast path. The code does not have
split stack checks, so it can't be preempted by GC.
Functions like roundup/add are inlined. And onM/racemalloc are nosplit.
Also add debug code that checks these assumptions.
Dmitriy Vyukov [Sat, 16 Aug 2014 05:07:55 +0000 (09:07 +0400)]
runtime: mark with non-atomic operations when GOMAXPROCS=1
Perf builders show 3-5% GC pause increase with GOMAXPROCS=1 when marking with atomic ops:
http://goperfd.appspot.com/perfdetail?commit=a8a6e765d6a87f7ccb71fd85a60eb5a821151f85&commit0=3b864e02b987171e05e2e9d0840b85b5b6476386&kind=builder&builder=linux-amd64&benchmark=http
Henning Schmiedehausen [Fri, 15 Aug 2014 22:19:02 +0000 (15:19 -0700)]
cmd/dist: goc2c ignores GOROOT_FINAL
When building golang, the environment variable GOROOT_FINAL can be set
to indicate a different installation location from the build
location. This works fine, except that the goc2c build step embeds
line numbers in the resulting c source files that refer to the build
location, no the install location.
This would not be a big deal, except that in turn the linker uses the
location of runtime/string.goc to embed the gdb script in the
resulting binary and as a net result, the debugger now complains that
the script is outside its load path (it has the install location
configured).
See https://code.google.com/p/go/issues/detail?id=8524 for the full
description.
Dmitriy Vyukov [Fri, 15 Aug 2014 18:36:12 +0000 (22:36 +0400)]
runtime: fix getgcmask
bv.data is an array of uint32s but the code was using
offsets computed for an array of bytes.
Add a test for stack GC info.
Fixes #8531.
Dmitriy Vyukov [Thu, 14 Aug 2014 17:38:24 +0000 (21:38 +0400)]
runtime: mark objects with non-atomic operations
On the go.benchmarks/garbage benchmark with GOMAXPROCS=16:
old ns/op new ns/op delta
time 13922541353170 -2.81%
cputime 2199575121373999 -2.83%
gc-pause-one 1504481213050524 -13.26%
gc-pause-total 213636 185317 -13.26%
Matthew Dempsky [Wed, 13 Aug 2014 18:16:30 +0000 (11:16 -0700)]
cmd/cgo, debug/dwarf: fix translation of zero-size arrays
In cgo, now that recursive calls to typeConv.Type() always work,
we can more robustly calculate the array sizes based on the size
of our element type.
Also, in debug/dwarf, the decision to call zeroType is made
based on a type's usage within a particular struct, but dwarf.Type
values are cached in typeCache, so the modification might affect
uses of the type in other structs. Current compilers don't appear
to share DWARF type entries for "[]foo" and "[0]foo", but they also
don't consistently share type entries in other cases. Arguably
modifying the types is an improvement in some cases, but varying
translated types according to compiler whims seems like a bad idea.
Lastly, also in debug/dwarf, zeroType only needs to rewrite the
top-level dimension, and only if the rest of the array size is
non-zero.
Dmitriy Vyukov [Wed, 13 Aug 2014 16:42:55 +0000 (20:42 +0400)]
runtime: keep objects in free lists marked as allocated.
Restore https://golang.org/cl/41040043 after GC rewrite.
Original description:
On the plus side, we don't need to change the bits on malloc and free.
On the downside, we need to mark objects in the free lists during GC.
But the free lists are small at GC time, so it should be a net win.
benchmark old ns/op new ns/op delta
BenchmarkMalloc8 21.9 20.4 -6.85%
BenchmarkMalloc16 31.1 29.6 -4.82%
Rob Pike [Wed, 13 Aug 2014 00:04:45 +0000 (17:04 -0700)]
all: copy cmd/ld/textflag.h into pkg/GOOS_GOARCH
The file is used by assembly code to define symbols like NOSPLIT.
Having it hidden inside the cmd directory makes it hard to access
outside the standard repository.
Solution: As with a couple of other files used by cgo, copy the
file into the pkg directory and add a -I argument to the assembler
to access it. Thus one can write just
#include "textflag.h"
in .s files.
The names in runtime are not updated because in the boot sequence the
file has not been copied yet when runtime is built. All other .s files
in the repository are updated.
Changes to doc/asm.html, src/cmd/dist/build.c, and src/cmd/go/build.go
are hand-made. The rest are just the renaming done by a global
substitution. (Yay sam).
Rob Pike [Tue, 12 Aug 2014 22:45:35 +0000 (15:45 -0700)]
doc/compat1.html: link to go.sys
You talked me into it. This and other links should be updated
once the new import paths for the subrepos are established.
Dmitriy Vyukov [Tue, 12 Aug 2014 21:02:01 +0000 (01:02 +0400)]
runtime/pprof: fix data race
It's unclear why we do this broken double-checked locking.
The mutex is not held for the whole duration of CPU profiling.
Fixes #8365.
This function does not have a declaration/prototype in a.h, and it is used only
in buf.c, so it is local to it and thus can be marked as private by adding
'static' to it.
Joel Stemmer [Fri, 8 Aug 2014 19:42:20 +0000 (12:42 -0700)]
time: Fix missing colon when formatting time zone offsets with seconds
When formatting time zone offsets with seconds using the stdISO8601Colon
and stdNumColon layouts, the colon was missing between the hour and minute
parts.
Fixes #8497.
LGTM=r
R=golang-codereviews, iant, gobot, r
CC=golang-codereviews
https://golang.org/cl/126840043
Dmitriy Vyukov [Fri, 8 Aug 2014 16:52:11 +0000 (20:52 +0400)]
runtime: bump MaxGcprocs to 32
There was a number of improvements related to GC parallelization:
1. Parallel roots/stacks scanning.
2. Parallel stack shrinking.
3. Per-thread workbuf caches.
4. Workset reduction.
Currently 32 threads work well.
go.benchmarks:garbage benchmark on 2 x Intel Xeon E5-2690 (16 HT cores)
Dmitriy Vyukov [Fri, 8 Aug 2014 16:13:57 +0000 (20:13 +0400)]
runtime: fix data race in stackalloc
Stack shrinking happens during mark phase,
and it assumes that it owns stackcache in mcache.
Stack cache flushing also happens during mark phase,
and it accesses stackcache's w/o any synchronization.
This leads to stackcache corruption:
http://goperfd.appspot.com/log/309af5571dfd7e1817259b9c9cf9bcf9b2c27610
Russ Cox [Fri, 8 Aug 2014 15:20:45 +0000 (11:20 -0400)]
syscall: ignore EINVAL/ENOENT from readdirent on OS X 10.10
On OS X 10.10 Yosemite, if you have a directory that can be returned
in a single getdirentries64 call (for example, a directory with one file),
and you read from the directory at EOF twice, you get EOF both times:
fd = open("dir")
getdirentries64(fd) returns data
getdirentries64(fd) returns 0 (EOF)
getdirentries64(fd) returns 0 (EOF)
But if you remove the file in the middle between the two calls, the
second call returns an error instead.
fd = open("dir")
getdirentries64(fd) returns data
getdirentries64(fd) returns 0 (EOF)
remove("dir/file")
getdirentries64(fd) returns ENOENT/EINVAL
Whether you get ENOENT or EINVAL depends on exactly what was
in the directory. It is deterministic, just data-dependent.
This only happens in small directories. A directory containing more data
than fits in a 4k getdirentries64 call will return EOF correctly.
(It's not clear if the criteria is that the directory be split across multiple
getdirentries64 calls or that it be split across multiple file system blocks.)
We could change package os to avoid the second read at EOF,
and maybe we should, but that's a bit involved.
For now, treat the EINVAL/ENOENT as EOF.
With this CL, all.bash passes on my MacBook Air running
OS X 10.10 (14A299l) and Xcode 6 beta 5 (6A279r).
I tried filing an issue with Apple using "Feedback Assistant", but it was
unable to send the report and lost it.
C program reproducing the issue, also at http://swtch.com/~rsc/readdirbug.c:
Ian Lance Taylor [Thu, 7 Aug 2014 19:38:39 +0000 (12:38 -0700)]
cmd/go: pass --build-id=none when generating a cgo .o
Some systems, like Ubuntu, pass --build-id when linking. The
effect is to put a note in the output file. This is not
useful when generating an object file with the -r option, as
it eventually causes multiple build ID notes in the final
executable, all but one of which are for tiny portions of the
file and are therefore useless.
Disable that by passing an explicit --build-id=none when
linking with -r on systems that might do this.
Dmitriy Vyukov [Thu, 7 Aug 2014 17:39:32 +0000 (21:39 +0400)]
encoding/gob: make benchmarks parallel
There are lots of internal synchronization in gob,
so it makes sense to have parallel benchmarks.
Also add a benchmark with slices and interfaces.
LGTM=r
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/115960043
Russ Cox [Thu, 7 Aug 2014 16:33:19 +0000 (12:33 -0400)]
go/build: look in $GOROOT/src/cmd/foo/bar for import cmd/foo/bar
This lets us have non-main packages like cmd/internal or cmd/nm/internal/whatever.
The src/pkg migration (see golang.org/s/go14mainrepo) will allow this
as a natural side effect. The explicit change here just allows use of the
effect a little sooner.
Russ Cox [Thu, 7 Aug 2014 16:33:06 +0000 (12:33 -0400)]
cmd/addr2line, cmd/nm: factor object reading into cmd/internal/objfile
To do in another CL: make cmd/objdump use cmd/internal/objfile too.
There is a package placement decision in this CL:
cmd/internal/objfile instead of internal/objfile.
I chose to put internal under cmd to make clear (and enforce)
that no standard library packages should use this
(it's a bit dependency-heavy).