There was a number of improvements related to GC parallelization:
1. Parallel roots/stacks scanning.
2. Parallel stack shrinking.
3. Per-thread workbuf caches.
4. Workset reduction.
Currently 32 threads work well.
go.benchmarks:garbage benchmark on 2 x Intel Xeon E5-2690 (16 HT cores)
1 thread/1 processor:
time=
16405255
cputime=
16386223
gc-pause-one=
546793975
gc-pause-total=
3280763
2 threads/1 processor:
time=
9043497
cputime=
18075822
gc-pause-one=
331116489
gc-pause-total=
2152257
4 threads/1 processor:
time=
4882030
cputime=
19421337
gc-pause-one=
174543105
gc-pause-total=
1134530
8 threads/1 processor:
time=
4134757
cputime=
20097075
gc-pause-one=
158680588
gc-pause-total=
1015555
16 threads/1 processor + HT:
time=
2006706
cputime=
31960509
gc-pause-one=
75425744
gc-pause-total=460097
16 threads/2 processors:
time=
1513373
cputime=
23805571
gc-pause-one=
56630946
gc-pause-total=345448
32 threads/2 processors + HT:
time=
1199312
cputime=
37592764
gc-pause-one=
48945064
gc-pause-total=278986
LGTM=rlh
R=golang-codereviews, tracey.brendan, rlh
CC=golang-codereviews, khr, rsc
https://golang.org/cl/
123920043
// Max number of threads to run garbage collection.
// 2, 3, and 4 are all plausible maximums depending
// on the hardware details of the machine. The garbage
- // collector scales well to 8 cpus.
- MaxGcproc = 8,
+ // collector scales well to 32 cpus.
+ MaxGcproc = 32,
};
// Maximum memory allocation size, a hint for callers.