encoding/json: fix decoding of types with '[]byte' as underlying type
All slice types which have elements of kind reflect.Uint8 are marshalled
into base64 for compactness. When decoding such data into a custom type
based on []byte the decoder checked the slice kind instead of the slice
element kind, so no appropriate decoder was found.
Fixed by letting the decoder check slice element kind like the encoder.
This guarantees that already encoded data can still be successfully
decoded.
Name will be converted from an anonymous to a
named field in a subsequent, automated CL.
No functional changes. Passes toolstash -cmp.
This reduces the size of gc.Node from 424 to 400 bytes.
This in turn reduces the permanent (pprof -inuse_space)
memory usage while compiling the test/rotate?.go tests:
This CL was generated by updating Val in go.go
and then running:
sed -i "" 's/\.U\.[SBXFC]val = /.U = /' *.go
sed -i "" 's/\.U\.Sval/.U.\(string\)/g' *.go *.y
sed -i "" 's/\.U\.Bval/.U.\(bool\)/g' *.go *.y
sed -i "" 's/\.U\.Xval/.U.\(\*Mpint\)/g' *.go *.y
sed -i "" 's/\.U\.Fval/.U.\(\*Mpflt\)/g' *.go *.y
sed -i "" 's/\.U\.Cval/.U.\(\*Mpcplx\)/g' *.go *.y
No functional changes. Passes toolstash -cmp.
This reduces the size of gc.Node from 424 to 392 bytes.
This in turn reduces the permanent (pprof -inuse_space)
memory usage while compiling the test/rotate?.go tests:
CL 8445 was similar to this; gri asked that Val's implementation
be hidden first. CLs 8912, 9263, and 9267 have at least
isolated the changes to the cmd/internal/gc package.
Russ Cox [Thu, 14 May 2015 21:27:04 +0000 (17:27 -0400)]
runtime: allocate map element zero values for reflect-created types on demand
Preallocating them in reflect means that
(1) if you say _ = PtrTo(ArrayOf(1000000000, reflect.TypeOf(byte(0)))), you just allocated 1GB of data
(2) if you say it again, that's *another* GB of data.
The only use of t.zero in the runtime is for map elements.
Delay the allocation until the creation of a map with that element type,
and share the zeros.
The one downside of the shared zero is that it's not garbage collected,
but it's also never written, so the OS should be able to handle it fairly
efficiently.
Change-Id: I56b098a091abf3ac0945de28ebef9a6c08e76614
Reviewed-on: https://go-review.googlesource.com/10111 Reviewed-by: Keith Randall <khr@golang.org>
Patrick Mezard [Sat, 9 May 2015 13:51:45 +0000 (15:51 +0200)]
internal/syscall/windows/registry: fix read overrun in GetStringsValue
According to MSDN RegQueryValueEx page:
If the data has the REG_SZ, REG_MULTI_SZ or REG_EXPAND_SZ type, the
string may not have been stored with the proper terminating null
characters. Therefore, even if the function returns ERROR_SUCCESS, the
application should ensure that the string is properly terminated before
using it; otherwise, it may overwrite a buffer. (Note that REG_MULTI_SZ
strings should have two terminating null characters.)
Test written by Alex Brainman <alex.brainman@gmail.com>
Change-Id: I8c0852e0527e27ceed949134ed5e6de944189986
Reviewed-on: https://go-review.googlesource.com/9806 Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
Shenghou Ma [Tue, 12 May 2015 05:13:09 +0000 (01:13 -0400)]
syscall: add test for Flock_t roundtrip
See CL 9962 for the rationale.
Change-Id: I73c714fce258430eea1e61d3835f5c8e9014ca1f Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9925 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Shenghou Ma [Fri, 15 May 2015 01:01:52 +0000 (21:01 -0400)]
syscall: add explicit build tags
Auto-generated using the following bash script:
for i in z*_*_*.go; do
goosgoarch=`basename ${i/${i/_*/}_/} .go`
goos=${goosgoarch/_*/}
goarch=${goosgoarch/*_/}
echo $i $goos $goarch
[ "$goos" = "windows" ] && continue
sed -i -e "/^package /i\/\/ +build $goarch,$goos\n" "$i"
done
Change-Id: I756fee551d1698080e4591fed8f058ae0450aaa5 Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/10113 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Shenghou Ma [Thu, 14 May 2015 07:26:31 +0000 (03:26 -0400)]
syscall: fix F_SETLK{,W} on linux/ppc64
Change-Id: Ia81675b0f01ceafada32bdd2bc59088016a7421e
Reviewed-on: https://go-review.googlesource.com/10043 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Russ Cox [Sun, 10 May 2015 17:43:51 +0000 (13:43 -0400)]
runtime: keep pointer bits set always in 1-word spans
It's dumb to clear them in initSpan, set them in heapBitsSetType,
clear them in heapBitsSweepSpan, set them again in heapBitsSetType,
clear them again in heapBitsSweepSpan, and so on.
Set them in initSpan and be done with it (until the span is reused
for objects of a different size).
This avoids an atomic operation in a common case (one-word allocation).
Suggested by rlh.
Russ Cox [Mon, 11 May 2015 00:22:32 +0000 (20:22 -0400)]
runtime: rewrite addb/subtractb to be simpler to compile; introduce add1, subtract1
This reduces the depth of the inlining at a particular call site.
The inliner introduces many temporary variables, and the compiler can do
a better job with fewer. Being verbose in the bodies of these helper functions
seems like a reasonable tradeoff: the uses are still just as readable, and
they run faster in some important cases.
Brad Fitzpatrick [Wed, 13 May 2015 19:41:56 +0000 (12:41 -0700)]
net/http: flush request body chunks in Transport
The Transport's writer to the remote server is wrapped in a
bufio.Writer to suppress many small writes while writing headers and
trailers. However, when writing the request body, the buffering may get
in the way if the request body is arriving slowly.
Because the io.Copy from the Request.Body to the writer is already
buffered, the outer bufio.Writer is unnecessary and prevents small
Request.Body.Reads from going to the server right away. (and the
io.Reader contract does say to return when you've got something,
instead of blocking waiting for more). After the body is finished, the
Transport's bufio.Writer is still used for any trailers following.
A previous attempted fix for this made the chunk writer always flush
if the underlying type was a bufio.Writer, but that is not quite
correct. This CL instead makes it opt-in by using a private sentinel
type (wrapping a *bufio.Writer) to the chunk writer that requests
Flushes after each chunk body (the chunk header & chunk body are still
buffered together into one write).
Brad Fitzpatrick [Wed, 13 May 2015 22:28:11 +0000 (15:28 -0700)]
cmd/internal/obj: validate GOARM environment variable's value before use
I was previously setting GOARM=arm5 (due to confusion with previously
seeing buildall.sh's temporary of "arm5" as a GOARCH and
misremembernig), but GOARM=arm5 was acting like GOARM=5 only on
accident. See https://go-review.googlesource.com/#/c/10023/
Instead, fail if GOARM is not a known value.
Change-Id: I9ba4fd7268df233d40b09f0431f37cd85a049847
Reviewed-on: https://go-review.googlesource.com/10024 Reviewed-by: Minux Ma <minux@golang.org>
Mikio Hara [Wed, 13 May 2015 01:06:20 +0000 (10:06 +0900)]
doc: mention net.SocketConn, net.SocketPacketConn in go1.5.txt
Change-Id: I6bda19877ae5148ad349cfb8929f1103740422bb
Reviewed-on: https://go-review.googlesource.com/10005 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Russ Cox [Wed, 6 May 2015 16:35:53 +0000 (12:35 -0400)]
cmd/internal/gc: optimize slice + write barrier
The code generated for a slice x[i:j] or x[i:j:k] computes the entire
new slice (base, len, cap) and then uses it as the evaluation of the
slice expression.
If the slice is part of an update x = x[i:j] or x = x[i:j:k], there are
opportunities to avoid computing some of these fields.
For x = x[0:i], we know that only the len is changing;
base can be ignored completely, and cap can be left unmodified.
For x = x[0:i:j], we know that only len and cap are changing;
base can be ignored completely.
For x = x[i:i], we know that the resulting cap is zero, and we don't
adjust the base during a slice producing a zero-cap result,
so again base can be ignored completely.
No write to base, no write barrier.
The old slice code was trying to work at a Go syntax level, mainly
because that was how you wrote code just once instead of once
per architecture. Now the compiler is factored a bit better and we
can implement slice during code generation but still have one copy
of the code. So the new code is working at that lower level.
(It must, to update only parts of the result.)
Matthew Dempsky [Tue, 21 Apr 2015 00:17:24 +0000 (17:17 -0700)]
spec: fix binary expression grammar rule
The spec explains later in the "Operator precedence" section that *
has a higher precedence than +, but the current production rule
requires that "1 + 2 * 3" be parsed as "(1 + 2) * 3", instead of the
intended "1 + (2 * 3)".
The new production rule better matches cmd/internal/gc/go.y's grammar:
Rick Hudson [Thu, 7 May 2015 21:19:30 +0000 (17:19 -0400)]
runtime: reduce thrashing of gs between ps
One important use case is a pipeline computation that pass values
from one Goroutine to the next and then exits or is placed in a
wait state. If GOMAXPROCS > 1 a Goroutine running on P1 will enable
another Goroutine and then immediately make P1 available to execute
it. We need to prevent other Ps from stealing the G that P1 is about
to execute. Otherwise the Gs can thrash between Ps causing unneeded
synchronization and slowing down throughput.
Fix this by changing the stealing logic so that when a P attempts to
steal the only G on some other P's run queue, it will pause
momentarily to allow the victim P to schedule the G.
As part of optimizing stealing we also use a per P victim queue
move stolen gs. This eliminates the zeroing of a stack local victim
queue which turned out to be expensive.
This CL is a necessary but not sufficient prerequisite to changing
the default value of GOMAXPROCS to something > 1 which is another
CL/discussion.
For highly serialized programs, such as GoroutineRing below this can
make a large difference. For larger and more parallel programs such
as the x/benchmarks there is no noticeable detriment.
FileConn and FilePacketConn APIs accept user-configured socket
descriptors to make them work together with runtime-integrated network
poller, but there's a limitation. The APIs reject protocol sockets that
are not supported by standard library. It's very hard for the net,
syscall packages to look after all platform, feature-specific sockets.
This change allows various platform, feature-specific socket descriptors
to use runtime-integrated network poller by using SocketConn,
SocketPacketConn APIs that bridge between the net, syscall packages and
platforms.
New exposed APIs:
pkg net, func SocketConn(*os.File, SocketAddr) (Conn, error)
pkg net, func SocketPacketConn(*os.File, SocketAddr) (PacketConn, error)
pkg net, type SocketAddr interface { Addr, Raw }
pkg net, type SocketAddr interface, Addr([]uint8) Addr
pkg net, type SocketAddr interface, Raw(Addr) []uint8
Fixes #10565.
Change-Id: Iec57499b3d84bb5cb0bcf3f664330c535eec11e3
Reviewed-on: https://go-review.googlesource.com/9275 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Ian Lance Taylor [Tue, 12 May 2015 03:49:32 +0000 (20:49 -0700)]
syscall: mkerrors.sh: don't define _FILE_OFFSET_BITS if __LP64__
If __LP64__ is defined then the type "long" is 64-bits, and there is
no need to explicitly request _FILE_OFFSET_BITS == 64. This changes
the definitions of F_GETLK, F_SETLK, and F_SETLKW on PPC to the values
that the kernel requires. The values used in C when _FILE_OFFSET_BITS
== 64 are corrected by the glibc fcntl function before making the
system call.
With this change, regenerate ppc64le files on Ubuntu trusty.
Change-Id: I8dddbd8a6bae877efff818f5c5dd06291ade3238
Reviewed-on: https://go-review.googlesource.com/9962 Reviewed-by: Minux Ma <minux@golang.org>
Hyang-Ah (Hana) Kim [Tue, 12 May 2015 20:47:40 +0000 (16:47 -0400)]
misc/cgo/testcshared: fix test for android.
On android the generated header files are located in
pkg/$(go env GOOS)_$(go env GOARCH)_testcshared.
The test was broken since https://go-review.googlesource.com/9798.
The installation path differs based on codegenArgs
(around src/cmd/go/build.go line 389), and the codegenArgs
is platform dependent.
Change-Id: I01ae9cb957fb7676e399f3b8c067f24c5bd20b9d
Reviewed-on: https://go-review.googlesource.com/9980 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Austin Clements [Mon, 11 May 2015 22:53:49 +0000 (18:53 -0400)]
runtime: don't run runq tests on the system stack
Running these tests on the system stack is problematic because they
allocate Ps, which are large enough to overflow the system stack if
they are stack-allocated. It used to be necessary to run these tests
on the system stack because they were written in C, but since this is
no longer the case, we can fix this problem by simply not running the
tests on the system stack.
This also means we no longer need the hack in one of these tests that
forces the allocated Ps to escape to the heap, so eliminate that as
well.
Russ Cox [Tue, 12 May 2015 19:51:22 +0000 (15:51 -0400)]
cmd/5g: fix build
The line in cgen.go was lost during the ginscmp CL.
The ggen.go change is not strictly necessary, but
it makes the 5g -S output for x[0] match what it said
before the ginscmp CL.
Andrew Williams [Sun, 25 Jan 2015 18:53:34 +0000 (12:53 -0600)]
syscall: relocate linux death signal code
Fix bug on Linux SysProcAttr handling: setting both Pdeathsig and
Credential caused Pdeathsig to be ignored. This is because the kernel
clears the deathsignal field when performing a setuid/setgid
system call.
Avoid this by moving Pdeathsig handling after Credential handling.
Fixes #9686
Change-Id: Id01896ad4e979b8c448e0061f00aa8762ca0ac94
Reviewed-on: https://go-review.googlesource.com/3290 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Russ Cox [Wed, 6 May 2015 16:34:30 +0000 (12:34 -0400)]
cmd/internal/gc: optimize append + write barrier
The code generated for x = append(x, v) is roughly:
t := x
if len(t)+1 > cap(t) {
t = grow(t)
}
t[len(t)] = v
len(t)++
x = t
We used to generate this code as Go pseudocode during walk.
Generate it instead as actual instructions during gen.
Doing so lets us apply a few optimizations. The most important
is that when, as in the above example, the source slice and the
destination slice are the same, the code can instead do:
t := x
if len(t)+1 > cap(t) {
t = grow(t)
x = {base(t), len(t)+1, cap(t)}
} else {
len(x)++
}
t[len(t)] = v
That is, in the fast path that does not reallocate the array,
only the updated length needs to be written back to x,
not the array pointer and not the capacity. This is more like
what you'd write by hand in C. It's faster in general, since
the fast path elides two of the three stores, but it's especially
faster when the form of x is such that the base pointer write
would turn into a write barrier. No write, no barrier.
See next CL for combination with a similar optimization for slice.
The benchmarks that are slower in this CL are still faster overall
with the combination of the two.
Change-Id: I2a7421658091b2488c64741b4db15ab6c3b4cb7e
Reviewed-on: https://go-review.googlesource.com/9812 Reviewed-by: David Chase <drchase@google.com>
David du Colombier [Tue, 12 May 2015 16:20:04 +0000 (18:20 +0200)]
runtime: fix signal handling on Plan 9
Once added to the signal queue, the pointer passed to the
signal handler could no longer be valid. Instead of passing
the pointer to the note string, we recopy the value of the
note string to a static array in the signal queue.
Russ Cox [Fri, 1 May 2015 00:35:47 +0000 (20:35 -0400)]
cmd/internal/gc: avoid turning 'x = f()' into 'tmp = f(); x = tmp' for simple x
This slows down more things than I expected, but it also speeds things up,
and it reduces stack frame sizes and the load on the optimizer, so it's still
likely a net win.
Russ Cox [Mon, 11 May 2015 20:12:01 +0000 (16:12 -0400)]
cmd/internal/gc: detect bad append(f()) during type check
Today's earlier fix can stay, but it's a band-aid over the real problem,
which is that bad code was slipping through the type checker
into the back end (and luckily causing a type error there).
I discovered this because my new append does not use the same
temporaries and failed the test as written.
Fixes #9521.
Change-Id: I7e33e2ea15743406e15c6f3fdf73e1edecda69bd
Reviewed-on: https://go-review.googlesource.com/9921 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Jens Frederich [Thu, 19 Feb 2015 20:37:38 +0000 (21:37 +0100)]
cmd/go: "go get" don't ignore git default branch
Any Git branch can be the default branch not only master. Removing
hardwired 'checkout master', and using 'checkout {tag}' is the best
choice. It works with and without a master branch. Furthermore it
resolves the Github default branch issue. Changing Github default
branch is effectively changing HEAD.
Patrick Mezard [Tue, 12 May 2015 06:19:00 +0000 (08:19 +0200)]
time: fix registry zone info lookup on Windows
registry.ReadSubKeyNames requires QUERY access right in addition to
ENUMERATE_SUB_KEYS.
This was making TestLocalZoneAbbr fail on Windows 7 in Paris/Madrid
timezone. It succeeded on Windows 8 because timezone name changed from
"Paris/Madrid" to "Romance Standard Time", the latter being matched by
an abbrs entry.
Alex Brainman [Tue, 12 May 2015 01:34:23 +0000 (11:34 +1000)]
net: relax error checking in TestAcceptIgnoreSomeErrors
TestAcceptIgnoreSomeErrors was created to test that network
accept function ignores some errors. But conditions created
by the test also affects network reads. Change the test to
ignore these read errors when acceptable.
Shenghou Ma [Tue, 12 May 2015 00:59:59 +0000 (20:59 -0400)]
syscall: fix running mkall.sh on linux/{ppc64,ppc64le}
Change-Id: I58c6e914d0e977d5748c87d277e30c933ed86f99
Reviewed-on: https://go-review.googlesource.com/9924 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Michael Hudson-Doyle [Sat, 11 Apr 2015 04:05:21 +0000 (12:05 +0800)]
cmd/internal/ld, runtime: abort on shared library ABI mismatch
This:
1) Defines the ABI hash of a package (as the SHA1 of the __.PKGDEF)
2) Defines the ABI hash of a shared library (sort the packages by import
path, concatenate the hashes of the packages and SHA1 that)
3) When building a shared library, compute the above value and define a
global symbol that points to a go string that has the hash as its value.
4) When linking against a shared library, read the abi hash from the
library and put both the value seen at link time and a reference
to the global symbol into the moduledata.
5) During runtime initialization, check that the hash seen at link time
still matches the hash the global symbol points to.
Change-Id: Iaa54c783790e6dde3057a2feadc35473d49614a5
Reviewed-on: https://go-review.googlesource.com/8773 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
Michael Hudson-Doyle [Mon, 11 May 2015 23:59:14 +0000 (11:59 +1200)]
runtime: fix addmoduledata to follow the platform ABI
addmoduledata is called from a .init_array function and need to follow the
platform ABI. It contains accesses to global data which are rewritten to use
R15 by the assembler, and as R15 is callee-save we need to save it.
Change-Id: I03893efb1576aed4f102f2465421f256f3bb0f30
Reviewed-on: https://go-review.googlesource.com/9941 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Rahul Chaudhry [Fri, 8 May 2015 19:33:30 +0000 (12:33 -0700)]
cmd/dist: de-dup iOS detection
Change-Id: I89778988baec1cf4a35d9342c7dbe8c4c08ff3cd
Reviewed-on: https://go-review.googlesource.com/9893
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Michael Hudson-Doyle [Sun, 10 May 2015 23:24:10 +0000 (11:24 +1200)]
cmd/internal/ld: change Cpos to not flush the output buffer
DWARF generation appears to assume Cpos is cheap and this makes linking godoc
about 8% faster and linking the standard library into a single shared library
about 22% faster on my machine.
Updates #10571
Change-Id: I3f81efd0174e356716e7971c4f59810b72378177
Reviewed-on: https://go-review.googlesource.com/9913 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Daniel Morsing [Mon, 11 May 2015 09:31:24 +0000 (10:31 +0100)]
net/http: silence race detector on client header timeout test
When running the client header timeout test, there is a race between
us timing out and waiting on the remaining requests to be serviced. If
the client times out before the server blocks on the channel in the
handler, we will be simultaneously adding to a waitgroup with the
value 0 and waiting on it when we call TestServer.Close().
This is largely a theoretical race. We have to time out before we
enter the handler and the only reason we would time out if we're
blocked on the channel. Nevertheless, make the race detector happy
by turning the close into a channel send. This turns the defer call
into a synchronization point and we can be sure that we've entered
the handler before we close the server.
Russ Cox [Fri, 8 May 2015 02:40:54 +0000 (22:40 -0400)]
runtime: use heap bitmap for typedmemmove
The current implementation of typedmemmove walks the ptrmask
in the type to find out where pointers are. This led to turning off
GC programs for the Go 1.5 dev cycle, so that there would always
be a ptrmask. Instead of also interpreting the GC programs,
interpret the heap bitmap, which we know must be available and
up to date. (There is no point to write barriers when writing outside
the heap.)
This CL is only about correctness. The next CL will optimize the code.
Russ Cox [Fri, 8 May 2015 02:39:57 +0000 (22:39 -0400)]
runtime: zero entire bitmap for object, even past dead marker
We want typedmemmove to use the heap bitmap to determine
where pointers are, instead of reinterpreting the type information.
The heap bitmap is simpler to access.
In general, typedmemmove will need to be able to look up the bits
for any word and find valid pointer information, so fill even after the
dead marker. Not filling after the dead marker was an optimization
I introduced only a few days ago, when reintroducing the dead marker
code. At the time I said it probably wouldn't last, and it didn't.
Russ Cox [Thu, 7 May 2015 15:03:17 +0000 (11:03 -0400)]
runtime: reorder bits in heap bitmap bytes
The runtime deals with 1-bit pointer bitmaps and 2-bit heap bitmaps
that have entries for both pointers and mark bits.
Each byte in a 1-bit pointer bitmap looks like pppppppp (all pointer bits).
Each byte in a 2-bit heap bitmap looks like mpmpmpmp (mark, pointer, ...).
This means that when converting from 1-bit to 2-bit, as we do
during malloc, we have to pick up 4 bits in pppp form and use
shifts to create the mpmpmpmp form.
This CL changes the 2-bit heap bitmap form to mmmmpppp,
so that 4 bits picked up in 1-bit form can be used directly in
the low bits of the heap bitmap byte, without expansion.
This simplifies the code, and it also happens to be faster.
Russ Cox [Wed, 6 May 2015 16:25:56 +0000 (12:25 -0400)]
cmd/internal/gc: drop unused Reslice field from Node
Dead code.
This field is left over from Go 1.4, when we elided the fake write
barrier in this case. Today, it's unused (always false).
The upcoming append/slice changes handle this case again,
but without needing this field.
Change-Id: Ic6f160b64efdc1bbed02097ee03050f8cd0ab1b8
Reviewed-on: https://go-review.googlesource.com/9789 Reviewed-by: David Chase <drchase@google.com>
Russ Cox [Wed, 6 May 2015 16:22:05 +0000 (12:22 -0400)]
cmd/internal/gc: show register dump before crashing on register left allocated
If you are using -h to get a stack trace at the site of the failure,
Yyerror will never return. Dump the register allocation sites
before calling Yyerror.
Russ Cox [Tue, 5 May 2015 04:26:53 +0000 (00:26 -0400)]
runtime: remove wbshadow mode
The write barrier shadow heap was very useful for
developing the write barriers initially, but it's no longer used,
clunky, and dragging the rest of the implementation down.
The gccheckmark mode will find bugs due to missed barriers
when they result in missed marks; wbshadow mode found the
missed barriers more aggressively, but it required an entire
separate copy of the heap. The gccheckmark mode requires
no extra memory, making it more useful in practice.
It's nice that the microbenchmarks are the ones helped the most,
because those were the ones hurt the most by the conversion from
4-bit to 2-bit heap bitmaps. This CL brings the overall effect of that
process to (compared to CL 9706 patch set 1):
Russ Cox [Tue, 5 May 2015 02:53:54 +0000 (22:53 -0400)]
runtime: reintroduce ``dead'' space during GC scan
Reintroduce an optimization discarded during the initial conversion
from 4-bit heap bitmaps to 2-bit heap bitmaps: when we reach the
place in the bitmap where there are no more pointers, mark that position
for the GC so that it can avoid scanning past that place.
During heapBitsSetType we can also avoid initializing heap bitmap
beyond that location, which gives a bit of a win compared to Go 1.4.
This particular optimization (not initializing the heap bitmap) may not last:
we might change typedmemmove to use the heap bitmap, in which
case it would all need to be initialized. The early stop in the GC scan
will stay no matter what.
Compared to Go 1.4 (github.com/rsc/go, branch go14bench):
name old mean new mean delta
SetTypeNode64 80.7ns × (1.00,1.01) 57.4ns × (1.00,1.01) -28.83% (p=0.000)
SetTypeNode64Dead 80.5ns × (1.00,1.01) 13.1ns × (0.99,1.02) -83.77% (p=0.000)
SetTypeNode64Slice 2.16µs × (1.00,1.01) 1.54µs × (1.00,1.01) -28.75% (p=0.000)
SetTypeNode64DeadSlice 2.16µs × (1.00,1.01) 1.52µs × (1.00,1.00) -29.74% (p=0.000)
Compared to previous CL:
name old mean new mean delta
SetTypeNode64 56.7ns × (1.00,1.00) 57.4ns × (1.00,1.01) +1.19% (p=0.000)
SetTypeNode64Dead 57.2ns × (1.00,1.00) 13.1ns × (0.99,1.02) -77.15% (p=0.000)
SetTypeNode64Slice 1.56µs × (1.00,1.01) 1.54µs × (1.00,1.01) -0.89% (p=0.000)
SetTypeNode64DeadSlice 1.55µs × (1.00,1.01) 1.52µs × (1.00,1.00) -2.23% (p=0.000)
This is the last CL in the sequence converting from the 4-bit heap
to the 2-bit heap, with all the same optimizations reenabled.
Compared to before that process began (compared to CL 9701 patch set 1):
The slowdown of the first few benchmarks seems to be due to the new
atomic operations for certain small size allocations. But the larger
benchmarks mostly improve, probably due to the decreased memory
pressure from having half as much heap bitmap.
CL 9706, which removes the (never used anymore) wbshadow mode,
gets back what is lost in the early microbenchmarks.
Change-Id: I37423a209e8ec2a2e92538b45cac5422a6acd32d
Reviewed-on: https://go-review.googlesource.com/9705 Reviewed-by: Rick Hudson <rlh@golang.org>
Russ Cox [Mon, 4 May 2015 15:30:10 +0000 (11:30 -0400)]
runtime: optimize heapBitsSetType
For the conversion of the heap bitmap from 4-bit to 2-bit fields,
I replaced heapBitsSetType with the dumbest thing that could possibly work:
two atomic operations (atomicand8+atomicor8) per 2-bit field.
This CL replaces that code with a proper implementation that
avoids the atomics whenever possible. Benchmarks vs base CL
(before the conversion to 2-bit heap bitmap) and vs Go 1.4 below.
Compared to Go 1.4, SetTypePtr (a 1-pointer allocation)
is 10ns slower because a race against the concurrent GC requires the
use of an atomicor8 that used to be an ordinary write. This slowdown
was present even in the base CL.
Compared to both Go 1.4 and base, SetTypeNode8 (a 10-word allocation)
is 10ns slower because it too needs a new atomic, because with the
denser representation, the byte on the end of the allocation is now shared
with the object next to it; this was not true with the 4-bit representation.
Excluding these two (fundamental) slowdowns due to the use of atomics,
the new code is noticeably faster than both Go 1.4 and the base CL.
The next CL will reintroduce the ``typeDead'' optimization.
Stats are from 5 runs on a MacBookPro10,2 (late 2012 Core i5).
There are fewer benchmarks vs Go 1.4 because unrolling directly
into the heap bitmap is not yet implemented, so those would not
be meaningful comparisons.
These benchmarks were not present in Go 1.4 as distributed.
The backport to Go 1.4 is in github.com/rsc/go's go14bench branch,
commit 71d5ee5.
Change-Id: I95ed05a22bf484b0fc9efad549279e766c98d2b6
Reviewed-on: https://go-review.googlesource.com/9704 Reviewed-by: Rick Hudson <rlh@golang.org>
Russ Cox [Mon, 4 May 2015 14:19:24 +0000 (10:19 -0400)]
runtime: use 2-bit heap bitmap (in place of 4-bit)
Previous CLs changed the representation of the non-heap type bitmaps
to be 1-bit bitmaps (pointer or not). Before this CL, the heap bitmap
stored a 2-bit type for each word and a mark bit and checkmark bit
for the first word of the object. (There used to be additional per-word bits.)
Reduce heap bitmap to 2-bit, with 1 dedicated to pointer or not,
and the other used for mark, checkmark, and "keep scanning forward
to find pointers in this object." See comments for details.
This CL replaces heapBitsSetType with very slow but obviously correct code.
A followup CL will optimize it. (Spoiler: the new code is faster than Go 1.4 was.)
runtime: use 1-bit pointer bitmaps in type representation
The type information in reflect.Type and the GC programs is now
1 bit per word, down from 2 bits.
The in-memory unrolled type bitmap representation are now
1 bit per word, down from 4 bits.
The conversion from the unrolled (now 1-bit) bitmap to the
heap bitmap (still 4-bit) is not optimized. A followup CL will
work on that, after the heap bitmap has been converted to 2-bit.
The typeDead optimization, in which a special value denotes
that there are no more pointers anywhere in the object, is lost
in this CL. A followup CL will bring it back in the final form of
heapBitsSetType.
Russ Cox [Sun, 3 May 2015 02:59:35 +0000 (22:59 -0400)]
runtime: add benchmark of heapBitsSetType
There was an old benchmark that measured this indirectly
via allocation, but I don't understand how to factor out the
allocation cost when interpreting the numbers.
Replace with a benchmark that only calls heapBitsSetType,
that does not allocate. This was not possible when the
benchmark was first written, because heapBitsSetType had
not been factored out of mallocgc.
Daniel Morsing [Thu, 30 Apr 2015 14:32:54 +0000 (15:32 +0100)]
runtime: enable profiling on g0
Since we now have stack information for code running on the
systemstack, we can traceback over it. To make cpu profiles useful,
add a case in gentraceback to jump over systemstack switches.
Mikio Hara [Sun, 10 May 2015 14:11:04 +0000 (23:11 +0900)]
net: increase timeout in TestWriteTimeoutFluctuation on darwin/arm
On darwin/arm, the test sometimes fails with:
Process 557 resuming
--- FAIL: TestWriteTimeoutFluctuation (1.64s)
timeout_test.go:706: Write took over 1s; expected 0.1s
FAIL
Process 557 exited with status = 1 (0x00000001)
go_darwin_arm_exec: timeout running tests
This change increaes timeout on iOS builders from 1s to 3s as a
temporarily fix.
Updates #10775.
Change-Id: Ifdaf99cf5b8582c1a636a0f7d5cc66bb276efd72
Reviewed-on: https://go-review.googlesource.com/9915 Reviewed-by: Minux Ma <minux@golang.org>
ExpandString correctly loops on the syscall until it reaches the
required buffer size but truncates it before converting it back to
string. The truncation limit is increased to 2^15 bytes which is the
documented maximum ExpandEnvironmentStrings output size.
This fixes TestExpandString on systems where len($PATH) > 1024.
Change-Id: I2a6f184eeca939121b458bcffe1a436a50f3298e
Reviewed-on: https://go-review.googlesource.com/9805 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>