From: Michael Anthony Knyszek Date: Mon, 13 Feb 2023 21:45:38 +0000 (+0000) Subject: runtime: bias the pacer's cons/mark smoothing against noise X-Git-Tag: go1.21rc1~1207 X-Git-Url: http://www.git.cypherpunks.su/?a=commitdiff_plain;h=b513bd808f9018da0609cffefaef451ea5c19a74;p=gostls13.git runtime: bias the pacer's cons/mark smoothing against noise Currently the pacer is designed to pace against the edge. Specifically, it tries to find the sweet spot at which there are zero assists, but simultaneously finishes each GC perfectly on time. This pretty much works, despite the noisiness of the measurement of the cons/mark ratio, which is central to the pacer's function. (And this noise is basically a given; the cons/mark ratio is used as a prediction under a steady-state assumption.) Typically, this means that the GC might assist a little bit more because it started the GC late, or it might execute more GC cycles because it started early. In many cases the magnitude of this variation is small. However, we can't possibly control for all sources of noise, especially since some noise can come from the underlying system. Furthermore, there are inputs to the measurement that have effectively no restrictions on how they vary, and the pacer needs to assume that they're essentially static when they might not be in some applications (i.e. goroutine stacks). The result of high noise is that the variation in when a GC starts is much higher, leading to a significant amount of assists in some GC cycles. While the GC cycle frequency basically averages out in the steady-state in the face of this variation, starting a GC late has the significant drawback of reducing application latencies. This CL thus biases the pacer toward avoiding assists by picking a cons/mark smoothing function that takes the maximum measured cons/mark over 5 cycles total. I picked 5 cycles because empirically this was the best trade-off between window size and smoothness for a uniformly distributed jitter in the cons/mark signal. The cost here is that if there's a significant phase change in the application that makes it less active with the GC, then we'll be using a stale cons/mark measurement for 5 cycles. I suspect this is fine precisely because this only happens when the application becomes less active, i.e. when latency matters less. Another good reason for this particular bias is that even though the GC might start earlier and end earlier on average, resulting in more frequent GC cycles and potentially worse throughput, it also means that it uses less memory used on average. As a result, there's a reasonable workaround in just turning GOGC up slightly to reduce GC cycle frequency and bringing memory (and hopefully throughput) levels back to the same baseline. Meanwhile, there should still be fewer assists than before which is just a clear improvement to latency. Lastly, this CL updates the GC pacer tests to capture this bias against assists and toward GC cycles starting earlier in the face of noise. Sweet benchmarks didn't show any meaningful difference, but real production applications showed a reduction in tail latencies of up to 45%. Updates #56966. Change-Id: I8f03d793f9a1c6e7ef3524d18294dbc0d7de6122 Reviewed-on: https://go-review.googlesource.com/c/go/+/467875 TryBot-Result: Gopher Robot Reviewed-by: Michael Pratt Run-TryBot: Michael Knyszek --- diff --git a/src/runtime/mgcpacer.go b/src/runtime/mgcpacer.go index d2921f4ed3..8b6ad4d66f 100644 --- a/src/runtime/mgcpacer.go +++ b/src/runtime/mgcpacer.go @@ -130,10 +130,10 @@ type gcControllerState struct { // Updated at the end of each GC cycle, in endCycle. consMark float64 - // lastConsMark is the computed cons/mark value for the previous GC - // cycle. Note that this is *not* the last value of cons/mark, but the - // actual computed value. See endCycle for details. - lastConsMark float64 + // lastConsMark is the computed cons/mark value for the previous 4 GC + // cycles. Note that this is *not* the last value of consMark, but the + // measured cons/mark value in endCycle. + lastConsMark [4]float64 // gcPercentHeapGoal is the goal heapLive for when next GC ends derived // from gcPercent. @@ -652,12 +652,19 @@ func (c *gcControllerState) endCycle(now int64, procs int, userForced bool) { currentConsMark := (float64(c.heapLive.Load()-c.triggered) * (utilization + idleUtilization)) / (float64(scanWork) * (1 - utilization)) - // Update our cons/mark estimate. This is the raw value above, but averaged over 2 GC cycles - // because it tends to be jittery, even in the steady-state. The smoothing helps the GC to - // maintain much more stable cycle-by-cycle behavior. + // Update our cons/mark estimate. This is the maximum of the value we just computed and the last + // 4 cons/mark values we measured. The reason we take the maximum here is to bias a noisy + // cons/mark measurement toward fewer assists at the expense of additional GC cycles (starting + // earlier). oldConsMark := c.consMark - c.consMark = (currentConsMark + c.lastConsMark) / 2 - c.lastConsMark = currentConsMark + c.consMark = currentConsMark + for i := range c.lastConsMark { + if c.lastConsMark[i] > c.consMark { + c.consMark = c.lastConsMark[i] + } + } + copy(c.lastConsMark[:], c.lastConsMark[1:]) + c.lastConsMark[len(c.lastConsMark)-1] = currentConsMark if debug.gcpacertrace > 0 { printlock() diff --git a/src/runtime/mgcpacer_test.go b/src/runtime/mgcpacer_test.go index e373e324a4..ac2a3fa56c 100644 --- a/src/runtime/mgcpacer_test.go +++ b/src/runtime/mgcpacer_test.go @@ -255,8 +255,8 @@ func TestGcPacer(t *testing.T) { // After the 12th GC, the heap will stop growing. Now, just make sure that: // 1. Utilization isn't varying _too_ much, and // 2. The pacer is mostly keeping up with the goal. - assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05) - assertInRange(t, "GC utilization", c[n-1].gcUtilization, 0.25, 0.3) + assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.025) + assertInRange(t, "GC utilization", c[n-1].gcUtilization, 0.25, 0.275) } }, },