Emmanuel Odeke [Wed, 4 May 2016 07:42:13 +0000 (01:42 -0600)]
runtime: print signal name in panic, if name is known
Adds a small function, signame, that looks up a signal's name in
the signal table and otherwise falls back to printing hex(sig),
as before. No signal table is present for Windows, so it will
always print the hex value there.
```shell
$ go run main.go &
$ kill -11 <pid>
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb01dfacedebac1e
pc=0xc71db]
...
```
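The lookup-with-fallback behaviour described above can be sketched roughly as follows. This is an illustration only, not the runtime's code: the `sigNames` table, its Linux-style signal numbers, and the `describe` helper are assumptions made for the example.

```go
package main

import "fmt"

// sigNames is a hypothetical stand-in for the runtime's per-OS signal table
// (Linux numbering assumed).
var sigNames = map[uint32]string{
	7:  "SIGBUS: bus error",
	11: "SIGSEGV: segmentation violation",
}

// signame returns the table entry for sig, or "" when the signal is unknown.
func signame(sig uint32) string {
	return sigNames[sig]
}

// describe prints the name when it is known and the hex value otherwise.
func describe(sig uint32) string {
	if name := signame(sig); name != "" {
		return name
	}
	return fmt.Sprintf("0x%x", sig)
}

func main() {
	fmt.Println(describe(11))  // known signal: prints its name
	fmt.Println(describe(200)) // unknown signal: prints 0xc8
}
```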
Fixes #13969
Change-Id: Ie6be312eb766661f1cea9afec352b73270f27f9d
Reviewed-on: https://go-review.googlesource.com/22753
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The following performance improvements have been made to the
low-level atomic functions for ppc64le & ppc64:
- For those cases containing a lwarx and stwcx (or other sizes):
sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
The sync is moved before (outside) the lwarx/stwcx loop, and the
sync after is removed, so it becomes:
sync, lwarx, maybe something, stwcx, loop to lwarx, isync
- For the Or8 and And8, the shifting and manipulation of the
address to the word aligned version were removed and the
instructions were changed to use lbarx, stbcx instead of
register shifting, xor, then lwarx, stwcx.
- New instructions LWSYNC, LBAR, STBCC were tested and added.
runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
instead of the WORD encoding.
Fixes #15469
Ran some of the benchmarks in the runtime and sync directories.
Some results varied from run to run but the trend was improvement
based on best times for base and new:
If b has exactly one predecessor, as happens
frequently with static calls, we can make
lookupVarOutgoing generate less garbage.
Instead of generating a value that is just
going to be an OpCopy and then get eliminated,
loop. This can lead to lots of looping.
However, this loop is way cheaper than generating
lots of ssa.Values and then eliminating them.
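A rough sketch of the idea, under heavy simplification: the `block` type, its `defs` map, and `lookupOutgoing` are hypothetical and do not mirror the compiler's data structures. The point is only that walking up a chain of single-predecessor blocks in a loop allocates nothing.

```go
package main

import "fmt"

// block is a hypothetical, much-simplified basic block.
type block struct {
	preds []*block
	defs  map[string]int // variable name -> ID of the value defined here
}

// lookupOutgoing walks up chains of single-predecessor blocks in a loop
// instead of creating a placeholder copy at every step.
func lookupOutgoing(b *block, name string) (id int, found bool) {
	for {
		if v, ok := b.defs[name]; ok {
			return v, true
		}
		if len(b.preds) != 1 {
			// Zero or several predecessors: a general path would take over here.
			return 0, false
		}
		b = b.preds[0]
	}
}

func main() {
	entry := &block{defs: map[string]int{"x": 1}}
	mid := &block{preds: []*block{entry}}
	exit := &block{preds: []*block{mid}}
	fmt.Println(lookupOutgoing(exit, "x")) // 1 true
}
```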
For a subset of the code in #15537:
Before:
28.31 real 36.17 user 1.68 sys 2282450944 maximum resident set size
After:
9.63 real 11.66 user 0.51 sys 638144512 maximum resident set size
Updates #15537.
Excitingly, it appears that this also helps
regular code:
Ian Lance Taylor [Wed, 4 May 2016 20:27:27 +0000 (13:27 -0700)]
runtime: put tracebackctxt C functions in .c file
Since tracebackctxt.go uses //export functions, the C functions can't be
externally visible in the C comment. The code was using attributes to
work around that, but that failed on Windows.
Change-Id: If4449fd8209a8998b4f6855ea89e5db1471b2981
Reviewed-on: https://go-review.googlesource.com/22786
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Alex Brainman [Tue, 3 May 2016 06:37:33 +0000 (16:37 +1000)]
debug/pe: unexport newly introduced identifiers
CLs 22181, 22332 and 22336 introduced new functionality to be used
in cmd/link (see issue #15345 for details). But we haven't had a chance
to use the new functionality yet. Unexport the newly introduced identifiers,
so we don't have to commit to the API until we have actually tried it.
Rename File.COFFSymbols into File._COFFSymbols,
COFFSymbol.FullName into COFFSymbol._FullName,
Section.Relocs into Section._Relocs,
Reloc into _Relocs,
File.StringTable into File._StringTable and
StringTable into _StringTable.
Updates #15345
Change-Id: I770eeb61f855de85e0c175225d5d1c006869b9ec
Reviewed-on: https://go-review.googlesource.com/22720
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Michael Munday [Wed, 4 May 2016 18:26:46 +0000 (14:26 -0400)]
cmd/compile: fix uint64 to float casts on ppc64
Adds the FCFIDU instruction and uses it instead of the FCFID
instruction for unsigned integer to float casts. This change means
that unsigned integers do not have to be cast to signed integers
before being cast to a floating point value. Therefore it is no
longer necessary to insert instructions to detect and fix
values that overflow int64.
The previous code generating the uint64 to int64 cast handled
overflow by truncating the uint64 value. This truncation can
change the result of the rounding performed by the integer to
float cast.
The FCFIDU instruction was added in Power ISA 2.06B.
Cherry Zhang [Tue, 3 May 2016 20:49:54 +0000 (13:49 -0700)]
runtime/cgo: add context argument to crosscall2 on mips64
Change-Id: Id018516075842afd8af12fbf207763a851d5a851
Reviewed-on: https://go-review.googlesource.com/22754
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Martin Möhrmann [Tue, 3 May 2016 09:10:26 +0000 (11:10 +0200)]
misc/cgo/fortran: fix gfortran compile test
Fixes #14544
Change-Id: I58b0b164ebbfeafe4ab32039a063df53e3018a6d
Reviewed-on: https://go-review.googlesource.com/22730
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Sean Lake <odysseus9672@gmail.com>
Michael Hudson-Doyle [Tue, 3 May 2016 23:23:24 +0000 (11:23 +1200)]
cmd/link: always read type data for dynimport symbols
Consider three shared libraries:
libBase.so -- defines a type T
lib2.so -- references type T
lib3.so -- also references type T, and something from lib2
lib2.so will contain a type symbol for T in its symbol table, but no
definition. If, when linking lib3.so, the linker read the symbols from lib2.so
before libBase.so, it didn't read the type data and later crashed.
The fix is trivial but the test change is a bit messy because the order the
linker reads the shared libraries in ends up depending on the order of the
import statements in the file so I had to rename one of the test packages so
that gofmt doesn't fix the test by accident...
Fixes #15516
Change-Id: I124b058f782c900a3a54c15ed66a0d91d0cde5ce
Reviewed-on: https://go-review.googlesource.com/22744
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Vishvananda Ishaya [Wed, 17 Feb 2016 01:58:11 +0000 (17:58 -0800)]
net: allow netgo to use lookup from nsswitch.conf
Change https://golang.org/cl/8945 allowed Go to use its own DNS resolver
instead of libc in a number of cases. The code parses nsswitch.conf and
attempts to resolve things in the same order. Unfortunately, builds with
netgo completely ignore this parsing and always search via
hostLookupFilesDNS.
This commit modifies the logic to allow binaries built with netgo to
parse nsswitch.conf and attempt to resolve using the order specified
there. If the parsing results in hostLookupCGo, it falls back to the
original hostLookupFilesDNS. Tests are also added to ensure that both
the parsing and the fallback work properly.
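The parse-then-fall-back flow might look something like the sketch below. The function names, the set of accepted sources, and the string results are all assumptions for illustration; the real net package uses its own types and handles many more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// hostSources extracts the lookup sources from the "hosts:" line of an
// nsswitch.conf-style file. It ignores [criteria] items for brevity.
func hostSources(conf string) []string {
	for _, line := range strings.Split(conf, "\n") {
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // strip trailing comment
		}
		fields := strings.Fields(line)
		if len(fields) > 0 && fields[0] == "hosts:" {
			return fields[1:]
		}
	}
	return nil
}

// lookupOrder keeps the configured order when the pure Go resolver can serve
// it, and otherwise falls back to files-then-DNS.
func lookupOrder(sources []string) string {
	switch strings.Join(sources, " ") {
	case "files dns", "dns files", "files", "dns":
		return strings.Join(sources, ",")
	}
	return "files,dns" // fallback for anything that would need libc
}

func main() {
	fmt.Println(lookupOrder(hostSources("hosts: dns files # prefer DNS\n")))
	fmt.Println(lookupOrder(hostSources("hosts: files mdns4 dns\n")))
}
```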
Fixes #14354
Change-Id: Ib079ad03d7036a4ec57f18352a15ba55d933f261
Reviewed-on: https://go-review.googlesource.com/19523
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Robert Griesemer [Tue, 3 May 2016 00:03:36 +0000 (17:03 -0700)]
cmd/compile: use correct packages when exporting/importing _ (blank) names
1) Blank parameters cannot be accessed so the package doesn't matter.
Do not export it, and consistently use localpkg when importing a
blank parameter.
2) More accurately replicate fmt.go and parser.go logic when importing
a blank struct field. Blank struct fields get exported without
package qualification.
(This is actually incorrect, even with the old textual export format,
but we will fix that in a separate change. See also issue 15514.)
Fixes #15491.
Change-Id: I7978e8de163eb9965964942aee27f13bf94a7c3c
Reviewed-on: https://go-review.googlesource.com/22714
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
1.7 traces embed symbol info and we now generate symbolized pprof profiles,
so we don't need the binary. Make binary argument optional as 1.5 traces
still need it.
Change-Id: I65eb13e3d20ec765acf85c42d42a8d7aae09854c
Reviewed-on: https://go-review.googlesource.com/22410
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Austin Clements <austin@google.com>
Dmitry Vyukov [Fri, 26 Feb 2016 20:57:16 +0000 (21:57 +0100)]
runtime: per-P contexts for race detector
Race runtime also needs local malloc caches and currently uses
a mix of per-OS-thread and per-goroutine caches. This leads to
increased memory consumption. But more importantly, the cache of
synchronization objects is per-goroutine and we don't always
have goroutine context when freeing memory in GC. As a result,
synchronization object descriptors leak (more precisely, they
can be reused if another synchronization object is recreated
at the same address, but it does not always help). For example,
the added BenchmarkSyncLeak has effectively runaway memory
consumption (based on a real long running server).
This change updates race runtime with support for per-P contexts.
BenchmarkSyncLeak now stabilizes at ~1GB memory consumption.
Long term, this will allow us to remove race runtime dependency
on glibc (as malloc is the main cornerstone).
I've also implemented a different scheme to pass P context to
race runtime: scheduler notified race runtime about association
between G and P by calling procwire(g, p)/procunwire(g, p).
But it turned out to be very messy as we have lots of places
where the association changes (e.g. syscalls). So I dropped it
in favor of the current scheme: race runtime asks scheduler
about the current P.
Fixes #14533
Change-Id: Iad10d2f816a44affae1b9fed446b3580eafd8c69
Reviewed-on: https://go-review.googlesource.com/19970
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Dmitry Vyukov [Fri, 18 Mar 2016 15:34:11 +0000 (16:34 +0100)]
runtime: fix CPU underutilization
Runqempty is a critical predicate for the scheduler. If runqempty spuriously
returns true, the scheduler can fail to schedule an arbitrary number of
runnable goroutines on idle Ps for an arbitrarily long time. With the addition
of runnext, the runqempty predicate became broken (it can spuriously return true).
Consider that runnext is not nil and the main array is empty. Runqempty
observes that the array is empty, then it is descheduled for some time.
Then the queue owner pushes another element to the queue, evicting runnext
into the array. Then the queue owner pops runnext. Then runqempty resumes,
observes that runnext is nil, and returns true. But there was no point
in time when the queue was empty.
Fix runqempty predicate to not return true spuriously.
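One way to avoid the spurious result is to re-read the tail after loading runnext and retry if it moved. The sketch below uses a hypothetical `runq` type and the public sync/atomic package; it is an illustration of the technique rather than the runtime's own code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// runq is a hypothetical, simplified per-P run queue.
type runq struct {
	head, tail uint32
	runnext    uintptr // 0 means the runnext slot is empty
}

// empty re-reads tail after loading runnext and retries if tail changed in
// between, so the push-then-pop interleaving described above is detected
// instead of being reported as an empty queue.
func (q *runq) empty() bool {
	for {
		head := atomic.LoadUint32(&q.head)
		tail := atomic.LoadUint32(&q.tail)
		next := atomic.LoadUintptr(&q.runnext)
		if tail == atomic.LoadUint32(&q.tail) {
			return head == tail && next == 0
		}
	}
}

func main() {
	q := &runq{runnext: 1}
	fmt.Println(q.empty()) // false: runnext is occupied
	q.runnext = 0
	fmt.Println(q.empty()) // true: array and runnext are both empty
}
```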
Michael Hudson-Doyle [Mon, 2 May 2016 02:46:40 +0000 (14:46 +1200)]
cmd/cgo: an approach to tsan that works with gcc
GCC, unlike clang, does not provide any way for code being compiled to tell if
-fsanitize=thread was passed. But cgo can look to see if that flag is being
passed and generate different code in that case.
Fixes #14602
Change-Id: I86cb5318c2e35501ae399618c05af461d1252d2d
Reviewed-on: https://go-review.googlesource.com/22688
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f
Reviewed-on: https://go-review.googlesource.com/21819
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Ian Lance Taylor [Mon, 2 May 2016 00:03:46 +0000 (17:03 -0700)]
cmd/cgo, misc/cgo/test: make -Wdeclaration-after-statement clean
I got a complaint that cgo output triggers warnings with
-Wdeclaration-after-statement. I don't think it's worth testing for
this--C has permitted declarations after statements since C99--but it is
easy enough to fix. It may break again; so it goes.
This CL also fixes errno handling to avoid getting confused if the tsan
functions happen to change the global errno variable.
Change-Id: I0ec7c63a6be5653ef44799d134c8d27cb5efa441
Reviewed-on: https://go-review.googlesource.com/22686
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Mikio Hara [Sun, 1 May 2016 19:44:46 +0000 (04:44 +0900)]
net/http: gofmt -w -s
Change-Id: I7e07888e90c7449f119e74b97995efcd7feef76e
Reviewed-on: https://go-review.googlesource.com/22682
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The Transport's automatic gzip uncompression lost information in the
process (the compressed Content-Length, if known). Normally that's
okay, but it's not okay for reverse proxies which have to be able to
generate a valid HTTP response from the Transport's provided
*Response.
Reverse proxies should normally be disabling compression anyway and
just piping the compressed data through and not wasting CPU cycles
decompressing them. So also document that on the new Uncompressed
field.
Then, using the new field, fix Response.Write to not inject a bogus
"Connection: close" header when it doesn't see a transfer encoding or
content-length.
Updates #15366 (the http2 side remains, once this is submitted)
Brad Fitzpatrick [Sun, 1 May 2016 04:11:26 +0000 (21:11 -0700)]
net/http: provide access to the listener address an HTTP request arrived on
This adds a context key named LocalAddrContextKey (for now, see #15229) to
let users access the net.Addr of the net.Listener that accepted the connection
that sent an HTTP request. This is similar to ServerContextKey which provides
access to the *Server. (A Server may have multiple Listeners)
Brad Fitzpatrick [Sun, 1 May 2016 02:11:42 +0000 (21:11 -0500)]
net/http: add Transport.IdleConnTimeout
Don't keep idle HTTP client connections open forever. Add a new knob,
Transport.IdleConnTimeout, and make the default be 90 seconds. I
figure 90 seconds is more than a minute, and less than infinite, and I
figure enough code has things waking up once a minute polling APIs.
This also removes the Transport's idleCount field which was unused and
redundant with the size of the idleLRU map (which was actually used).
The B/op number is effectively meaningless. There
is a surprisingly large one-time cost that gets
divided by the number of iterations that your
machine can get through in a second.
This CL discards the first run, which helps.
It is not a panacea. Running with -benchtime=10s
will allow the sync.Pool to be emptied,
which brings the problem back.
However, since there are more iterations to divide
the cost through, it’s not quite as bad,
and running with a high benchtime is rare.
This CL changes the meaning of the B/op number,
which is unfortunate, since it won’t have the
same order of magnitude as previous Go versions.
But it wasn’t really comparable before anyway,
since it didn’t have any reliable meaning at all.
Austin Clements [Sun, 1 May 2016 01:47:30 +0000 (21:47 -0400)]
runtime: update some comments
This updates some comments that became out of date when we moved the
mark bit out of the heap bitmap and started using the high bit for the
first word as a scan/dead bit.
Change-Id: I4a572d16db6114cadff006825466c1f18359f2db
Reviewed-on: https://go-review.googlesource.com/22662
Reviewed-by: Rick Hudson <rlh@golang.org>
MIPS N64 ABI passes arguments in registers R4-R11, return value in R2.
R16-R23, R28, R30 and F24-F31 are callee-save. gcc PIC code expects
to be called with indirect call through R25.
Change-Id: I24f582b4b58e1891ba9fd606509990f95cca8051
Reviewed-on: https://go-review.googlesource.com/19805
Reviewed-by: Minux Ma <minux@golang.org>
cmd/compile: Improve readability of HTML produced by GOSSAFUNC
Factor out the Aux/AuxInt handling in (*Value).LongString() and
use it in (*Value).LongHTML() as well.
This especially improves readability of auxFloat32, auxFloat64,
and auxSymValAndOff values which would otherwise be printed as
opaque integers.
This change also makes LongString() slightly less verbose by
eliding offsets that are zero (as is very often the case).
Additionally, ensure the HTML is interpreted as UTF-8 so that
non-ASCII characters (especially the "middle dots" in some symbols)
show up correctly.
cmd/internal/obj/mips et al.: introduce SB register on mips64x
SB register (R28) is introduced for accessing external addresses with shorter
instruction sequences. It is loaded at entry points. External data within
2G of SB can be accessed this way.
cmd/internal/obj: relocation R_ADDRMIPS is split into two relocations,
R_ADDRMIPS and R_ADDRMIPSU, handling the low 16 bits and the "upper" 16
bits of external addresses, respectively, since the instructions may not
be adjacent. It might be better if relocation Variant could be used.
cmd/link/internal/mips64: support new relocations.
cmd/compile/internal/mips64: reserve SB register.
runtime: initialize SB register at entry points.
Change-Id: I5f34868f88c5a9698c042a8a1f12f76806c187b9
Reviewed-on: https://go-review.googlesource.com/19802
Reviewed-by: Minux Ma <minux@golang.org>
Kevin Burke [Sat, 23 Apr 2016 18:00:05 +0000 (11:00 -0700)]
database/sql: clone data for named []byte types
Previously, named byte types like json.RawMessage could get dirty
database memory from a call to Scan. These types would activate a
code path that didn't clone the byte data coming from the database
before assigning it. Another thread could then overwrite the byte
array in src, which has unexpected consequences.
Originally reported by Jason Moiron; the patch and test are his
suggestions. Fixes #13905.
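The difference between aliasing and cloning can be shown in a few lines. The `rawMessage` type and both assign helpers are hypothetical; they stand in for the destination type and the two code paths, not for the actual database/sql conversion code.

```go
package main

import "fmt"

// rawMessage is a hypothetical named []byte type, in the spirit of
// json.RawMessage.
type rawMessage []byte

// assignAliased stores src without copying, so dst shares src's memory.
func assignAliased(dst *rawMessage, src []byte) {
	*dst = rawMessage(src)
}

// assignCloned copies src first, so later writes to src do not affect dst.
func assignCloned(dst *rawMessage, src []byte) {
	*dst = append(rawMessage(nil), src...)
}

func main() {
	driverBuf := []byte("hello")
	var aliased, cloned rawMessage
	assignAliased(&aliased, driverBuf)
	assignCloned(&cloned, driverBuf)

	copy(driverBuf, "XXXXX") // the driver reuses its buffer

	fmt.Println(string(aliased), string(cloned)) // XXXXX hello
}
```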
With the switch to separate mark bitmaps, the scan/dead bit for the
first word of each object is now unused. Reclaim this bit and use it
as a scan/dead bit, just like words three and on. The second word is
still used for checkmark.
This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
since they no longer need different cases for 1, 2, and 3+ word
objects. They can instead just manipulate the heap bitmap for the
first word and be done with it.
In order to enable this, we change heapBitsSetType and runGCProg to
always set the scan/dead bit to scan for the first word on every code
path. Since these functions only apply to types that have pointers,
there's no need to do this conditionally: it's *always* necessary to
set the scan bit in the first word.
We also change every place that scans an object and checks if there
are more pointers. Rather than only checking morePointers if the word
is >= 2, we now check morePointers if word != 1 (since that's the
checkmark word).
Looking forward, we should probably reclaim the checkmark bit, too,
but that's going to be quite a bit more work.
Tested by setting doubleCheck in heapBitsSetType and running all.bash
on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.
This particularly improves the FmtFprintf* go1 benchmarks, since they
do a large amount of noscan allocation.
runtime: avoid conditional execution in morePointers and isPointer
heapBits.bits is carefully written to produce good machine code. Use
it in heapBits.morePointers and heapBits.isPointer to get good machine
code there, too.
Change-Id: Ia75945ef563956613bf88bbe57800a96455c265d
Reviewed-on: https://go-review.googlesource.com/22661
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Matthew Dempsky [Fri, 29 Apr 2016 22:56:57 +0000 (15:56 -0700)]
root: remove dev.garbage file
Change-Id: I99b2ca52824341d986090f5c78ab4f396594bcdf
Reviewed-on: https://go-review.googlesource.com/22660
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Ian Lance Taylor [Wed, 27 Apr 2016 21:18:29 +0000 (14:18 -0700)]
cmd/cgo, runtime, runtime/cgo: use cgo context function
Add support for the context function set by runtime.SetCgoTraceback.
The context function was added in CL 17761, without support.
This CL is the support.
This CL has not been tested for real C code, as a working context
function for C code requires unwind support that does not seem to exist.
I wanted to get the CL out before the freeze.
I apologize for the length of this CL. It's mostly plumbing, but
unfortunately the plumbing is processor-specific.
Change-Id: I8ce11a0de9b3dafcc29efd2649d776e93bff0e90
Reviewed-on: https://go-review.googlesource.com/22508
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Michael Munday [Mon, 18 Apr 2016 01:26:23 +0000 (21:26 -0400)]
crypto/cipher, crypto/aes: add s390x implementation of AES-CTR
This commit adds the new 'ctrAble' interface to the crypto/cipher
package. The role of ctrAble is the same as gcmAble but for CTR
instead of GCM. It allows block ciphers to provide optimized CTR
implementations.
The primary benefit of adding CTR support to the s390x AES
implementation is that it allows us to encrypt the counter values
in bulk, giving the cipher message instruction a larger chunk of
data to work on per invocation.
The xorBytes assembly is necessary because xorBytes becomes a
bottleneck when CTR is done in this way. Hopefully it will be
possible to remove this once s390x has migrated to the ssa
backend.
name old speed new speed delta
AESCTR1K 160MB/s ± 6% 867MB/s ± 0% +442.42% (p=0.000 n=9+10)
Change-Id: I1ae16b0ce0e2641d2bdc7d7eabc94dd35f6e9318
Reviewed-on: https://go-review.googlesource.com/22195
Reviewed-by: Adam Langley <agl@golang.org>
Michael Munday [Tue, 26 Apr 2016 01:46:02 +0000 (21:46 -0400)]
crypto/cipher, crypto/aes: add s390x implementation of AES-CBC
This commit adds the cbcEncAble and cbcDecAble interfaces that
can be implemented by block ciphers that support an optimized
implementation of CBC. This is similar to what is done for GCM
with the gcmAble interface.
The cbcEncAble, cbcDecAble and gcmAble interfaces all now have
tests to ensure they are detected correctly in the cipher
package.
name old speed new speed delta
AESCBCEncrypt1K 152MB/s ± 1% 1362MB/s ± 0% +795.59% (p=0.000 n=10+9)
AESCBCDecrypt1K 143MB/s ± 1% 1362MB/s ± 0% +853.00% (p=0.000 n=10+9)
Change-Id: I715f686ab3686b189a3dac02f86001178fa60580
Reviewed-on: https://go-review.googlesource.com/22523
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Adam Langley <agl@golang.org>
Rick Hudson [Fri, 29 Apr 2016 17:49:18 +0000 (13:49 -0400)]
Merge remote-tracking branch 'origin/dev.garbage'
This commit moves the GC from free list allocation to
bit mark allocation. Instead of using the bitmaps
generated during the mark phases to generate free
list and then using the free lists for allocation we
allocate directly from the bitmaps.
The change in the garbage benchmark
name old time/op new time/op delta
XBenchGarbage-12 2.22ms ± 1% 2.13ms ± 1% -3.90% (p=0.000 n=18+18)
Rick Hudson [Fri, 29 Apr 2016 16:09:36 +0000 (12:09 -0400)]
[dev.garbage] runtime: simplify nextFreeFast so it is inlined
nextFreeFast is currently not inlined by the compiler due
to its size and complexity. This CL simplifies
nextFreeFast by letting the slow path (nextFree)
handle a corner case.
sweep used to skip mcentral.freeSpan (and its locking) if it didn't
find any new free objects. We lost that optimization when the
freed-object counting changed in dad83f7 to count total free objects
instead of newly freed objects.
The previous commit brings back counting of newly freed objects, so we
can easily revive this optimization by checking that count (like we
used to) instead of the total free objects count.
Commit 8dda1c4 changed the meaning of "nfree" in sweep from the number
of newly freed objects to the total number of free objects in the
span, but didn't update where sweep added nfree to c.local_nsmallfree.
Hence, we're over-accounting the number of frees. This is causing
TestArrayHash to fail with "too many allocs NNN - hash not balanced".
Fix this by computing the number of newly freed objects and adding
that to c.local_nsmallfree, so it behaves like it used to. Computing
this requires a small tweak to mallocgc: apparently we've never set
s.allocCount when allocating a large object; fix this by setting it to
1 so sweep doesn't get confused.
We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
when we removed the sweep over the mark bitmap. Fix it by
re-introducing the sweep over the bitmap specifically if we're in
allocfreetrace mode. This doesn't have to be even remotely efficient,
since the overhead of allocfreetrace is huge anyway, so we can keep
the code for this down to just a few lines.
Currently we always zero objects when we allocate them. We used to
have an optimization that would not zero objects that had not been
allocated since the whole span was last zeroed (either by getting it
from the system or by getting it from the heap, which does a bulk
zero), but this depended on the sweeper clobbering the first two words
of each object. Hence, we lost this optimization when the bitmap
sweeper went away.
Re-introduce this optimization using a different mechanism. Each span
already keeps a flag indicating that it just came from the OS or was
just bulk zeroed by the mheap. We can simply use this flag to know
when we don't need to zero an object. This is slightly less efficient
than the old optimization: if a span gets allocated and partially
used, then GC happens and the span gets returned to the mcentral, then
the span gets re-acquired, the old optimization knew that it only had
to re-zero the objects that had been reclaimed, whereas this
optimization will re-zero everything. However, in this case, you're
already paying for the garbage collection, and you've only wasted one
zeroing of the span, so in practice there seems to be little
difference. (If we did want to revive the full optimization, each span
could keep track of a frontier beyond which all free slots are zeroed.
I prototyped this and it didn't obviously do any better than the much
simpler approach in this commit.)
This significantly improves BinaryTree17, which is allocation-heavy
(and runs first, so most pages are already zeroed), and slightly
improves everything else.
name old time/op new time/op delta
XBenchGarbage-12 2.15ms ± 1% 2.14ms ± 1% -0.80% (p=0.000 n=17+17)
Nigel Tao [Fri, 29 Apr 2016 07:17:44 +0000 (17:17 +1000)]
compress/flate: use a constant hash table size for Best Speed.
This makes compress/flate's version of Snappy diverge from the upstream
golang/snappy version, but the latter has a goal of matching C++ snappy
output byte-for-byte. Both C++ and the asm version of golang/snappy can
use a smaller N for the O(N) zero-initialization of the hash table when
the input is small, even if the pure Go golang/snappy algorithm cannot:
"var table [tableSize]uint16" zeroes all tableSize elements.
For this package, we don't have the match-C++-snappy goal, so we can use
a different (constant) hash table size.
This is a small win, in terms of throughput and output size, but it also
enables us to re-use the (constant size) hash table between
encodeBestSpeed calls, avoiding the cost of zero-initializing the hash
table altogether. This will be implemented in follow-up commits.
Dave Cheney [Fri, 29 Apr 2016 04:17:04 +0000 (14:17 +1000)]
cmd/compile/internal/gc: bv.go cleanup
Drive by gardening of bv.go.
- Unexport the Bvec type, it is not used outside internal/gc.
(machine translated with gofmt -r)
- Removed unused constants and functions.
(driven by cmd/unused)
Nigel Tao [Tue, 26 Apr 2016 09:16:30 +0000 (19:16 +1000)]
compress/flate: replace "Best Speed" with specialized version
This encoding algorithm, which prioritizes speed over output size, is
based on Snappy's LZ77-style encoder: github.com/golang/snappy
This commit keeps the diff between this package's encodeBestSpeed
function and Snappy's encodeBlock function as small as possible (see
the diff below). Follow-up commits will improve this package's
performance and output size.
Output size is larger. In the table below, the first column is the input
size, the second column is the output size prior to this commit, the
third column is the output size after this commit.
The diff between github.com/golang/snappy's encodeBlock function and
this commit's encodeBestSpeed function:
1c1,7
< func encodeBlock(dst, src []byte) (d int) {
---
> func encodeBestSpeed(dst []token, src []byte) []token {
> // This check isn't in the Snappy implementation, but there, the caller
> // instead of the callee handles this case.
> if len(src) < minNonLiteralBlockSize {
> return emitLiteral(dst, src)
> }
>
4c10
< // and len(src) <= maxBlockSize and maxBlockSize == 65536.
---
> // and len(src) <= maxStoreBlockSize and maxStoreBlockSize == 65535.
65c71
< if load32(src, s) == load32(src, candidate) {
---
> if s-candidate < maxOffset && load32(src, s) == load32(src, candidate) {
73c79
< d += emitLiteral(dst[d:], src[nextEmit:s])
---
> dst = emitLiteral(dst, src[nextEmit:s])
90c96
< // This is an inlined version of:
---
> // This is an inlined version of Snappy's:
93c99,103
< for i := candidate + 4; s < len(src) && src[i] == src[s]; i, s = i+1, s+1 {
---
> s1 := base + maxMatchLength
> if s1 > len(src) {
> s1 = len(src)
> }
> for i := candidate + 4; s < s1 && src[i] == src[s]; i, s = i+1, s+1 {
96c106,107
< d += emitCopy(dst[d:], base-candidate, s-base)
---
> // matchToken is flate's equivalent of Snappy's emitCopy.
> dst = append(dst, matchToken(uint32(s-base-3), uint32(base-candidate-minOffsetSize)))
114c125
< if uint32(x>>8) != load32(src, candidate) {
---
> if s-candidate >= maxOffset || uint32(x>>8) != load32(src, candidate) {
124c135
< d += emitLiteral(dst[d:], src[nextEmit:])
---
> dst = emitLiteral(dst, src[nextEmit:])
126c137
< return d
---
> return dst
This change is based on https://go-review.googlesource.com/#/c/21021/ by
Klaus Post, but it is a separate changelist as cl/21021 seems to have
stalled in code review, and the Go 1.7 feature freeze approaches.
Golang-dev discussion:
https://groups.google.com/d/topic/golang-dev/XYgHX9p8IOk/discussion and
of course cl/21021.
[dev.garbage] runtime: use s.base() everywhere it makes sense
Currently we have lots of (s.start << _PageShift) and variants. We now
have an s.base() function that returns this. It's faster and more
readable, so use it.