runtime: faster entersyscall, exitsyscall
Uses atomic memory accesses to avoid the need to acquire
and release schedlock on fast paths.
benchmark old ns/op new ns/op delta
runtime_test.BenchmarkSyscall 73 31 -56.63%
runtime_test.BenchmarkSyscall-2 538 74 -86.23%
runtime_test.BenchmarkSyscall-3 508 103 -79.72%
runtime_test.BenchmarkSyscall-4 721 97 -86.52%
runtime_test.BenchmarkSyscallWork 920 873 -5.11%
runtime_test.BenchmarkSyscallWork-2 516 481 -6.78%
runtime_test.BenchmarkSyscallWork-3 550 343 -37.64%
runtime_test.BenchmarkSyscallWork-4 632 263 -58.39%
(Intel Core i7 L640 2.13 GHz-based Lenovo X201s)
Reduced a less artificial server benchmark
from 11.5r 12.0u 8.0s to 8.3r 9.1u 1.0s.
R=dvyukov, r, bradfitz, r, iant, iant
CC=golang-dev
https://golang.org/cl/
4723042