runtime: faster parallel GC
Use per-thread work buffers instead of a global mutex-protected pool. This eliminates lock contention during the parallel scan phase.
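A minimal Go sketch of the idea, assuming a simplified model: each worker pushes the objects it scans into a private buffer and takes the shared lock only to hand off a full buffer. The names worker, globalPool, bufSize and scanParallel are hypothetical and do not come from the runtime; the actual change may organize its buffers differently.

```go
package main

import (
	"fmt"
	"sync"
)

// bufSize is a hypothetical capacity for each worker's private buffer.
const bufSize = 256

// globalPool is a hypothetical shared list of full work buffers. Its mutex
// is taken once per full buffer rather than once per work item.
type globalPool struct {
	mu    sync.Mutex
	full  [][]uintptr
	locks int // how many times mu was acquired (for illustration only)
}

func (p *globalPool) putFull(buf []uintptr) {
	p.mu.Lock()
	p.full = append(p.full, buf)
	p.locks++
	p.mu.Unlock()
}

// worker owns a private buffer, so push needs no synchronization.
type worker struct {
	buf  []uintptr
	pool *globalPool
}

func (w *worker) push(obj uintptr) {
	w.buf = append(w.buf, obj)
	if len(w.buf) == bufSize {
		// Hand the whole buffer to the pool and start a fresh one.
		w.pool.putFull(w.buf)
		w.buf = make([]uintptr, 0, bufSize)
	}
}

// flush publishes a partially filled buffer when the worker finishes.
func (w *worker) flush() {
	if len(w.buf) > 0 {
		w.pool.putFull(w.buf)
		w.buf = nil
	}
}

// scanParallel splits objs among nworkers goroutines; each one pushes the
// objects it "scans" into its own buffer.
func scanParallel(objs []uintptr, nworkers int) *globalPool {
	pool := &globalPool{}
	chunk := (len(objs) + nworkers - 1) / nworkers
	var wg sync.WaitGroup
	for lo := 0; lo < len(objs); lo += chunk {
		hi := lo + chunk
		if hi > len(objs) {
			hi = len(objs)
		}
		wg.Add(1)
		go func(part []uintptr) {
			defer wg.Done()
			w := &worker{buf: make([]uintptr, 0, bufSize), pool: pool}
			for _, obj := range part {
				w.push(obj)
			}
			w.flush()
		}(objs[lo:hi])
	}
	wg.Wait()
	return pool
}

func main() {
	objs := make([]uintptr, 10000)
	for i := range objs {
		objs[i] = uintptr(i)
	}
	pool := scanParallel(objs, 4)
	total := 0
	for _, b := range pool.full {
		total += len(b)
	}
	// Per-item locking would have taken the mutex 10000 times.
	fmt.Println(total, pool.locks) // prints "10000 40"
}
```

With 10000 objects split across 4 workers and a 256-entry buffer, each worker fills 9 buffers and flushes one partial buffer, so the mutex is acquired 40 times instead of 10000. That is the kind of contention reduction the change targets.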
benchmark                            old ns/op     new ns/op    delta
garbage.BenchmarkTree2-8              97100768      71417553  -26.45%
garbage.BenchmarkTree2LastPause-8    970931485     714103692  -26.45%
garbage.BenchmarkTree2Pause-8        469127802     345029253  -26.45%
garbage.BenchmarkParser-8           2880950854    2715456901   -5.74%
garbage.BenchmarkParserLastPause-8   137047399     103336476  -24.60%
garbage.BenchmarkParserPause-8        80686028      58922680  -26.97%
R=golang-dev, 0xe2.0x9a.0x9b, dave, adg, rsc, iant
CC=golang-dev
https://golang.org/cl/7816044