From: Michael Anthony Knyszek <mknyszek@google.com>
Date: Mon, 13 Feb 2023 21:45:38 +0000 (+0000)
Subject: runtime: bias the pacer's cons/mark smoothing against noise
X-Git-Tag: go1.21rc1~1207
X-Git-Url: http://www.git.cypherpunks.su/?a=commitdiff_plain;h=b513bd808f9018da0609cffefaef451ea5c19a74;p=gostls13.git

runtime: bias the pacer's cons/mark smoothing against noise

Currently the pacer is designed to pace against the edge. Specifically,
it tries to find the sweet spot at which there are zero assists, but
simultaneously finishes each GC perfectly on time.

This pretty much works, despite the noisiness of the measurement of the
cons/mark ratio, which is central to the pacer's function. (And this
noise is basically a given; the cons/mark ratio is used as a prediction
under a steady-state assumption.) Typically, this means that the GC
might assist a little bit more because it started the GC late, or it
might execute more GC cycles because it started early. In many cases the
magnitude of this variation is small.

However, we can't possibly control for all sources of noise, especially
since some noise can come from the underlying system. Furthermore, there
are inputs to the measurement that have effectively no restrictions on
how they vary, and the pacer needs to assume that they're essentially
static when they might not be in some applications (i.e. goroutine
stacks).

The result of high noise is that the variation in when a GC starts is
much higher, leading to a significant amount of assists in some GC
cycles. While the GC cycle frequency basically averages out in the
steady-state in the face of this variation, starting a GC late has the
significant drawback of reducing application latencies.

This CL thus biases the pacer toward avoiding assists by picking a
cons/mark smoothing function that takes the maximum measured cons/mark
over 5 cycles total. I picked 5 cycles because empirically this was the
best trade-off between window size and smoothness for a uniformly
distributed jitter in the cons/mark signal. The cost here is that if
there's a significant phase change in the application that makes it less
active with the GC, then we'll be using a stale cons/mark measurement
for 5 cycles. I suspect this is fine precisely because this only happens
when the application becomes less active, i.e. when latency matters
less.

Another good reason for this particular bias is that even though the GC
might start earlier and end earlier on average, resulting in more
frequent GC cycles and potentially worse throughput, it also means that
it uses less memory used on average. As a result, there's a reasonable
workaround in just turning GOGC up slightly to reduce GC cycle
frequency and bringing memory (and hopefully throughput) levels back to
the same baseline. Meanwhile, there should still be fewer assists than
before which is just a clear improvement to latency.

Lastly, this CL updates the GC pacer tests to capture this bias against
assists and toward GC cycles starting earlier in the face of noise.

Sweet benchmarks didn't show any meaningful difference, but real
production applications showed a reduction in tail latencies of up
to 45%.

Updates #56966.

Change-Id: I8f03d793f9a1c6e7ef3524d18294dbc0d7de6122
Reviewed-on: https://go-review.googlesource.com/c/go/+/467875
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
---

diff --git a/src/runtime/mgcpacer.go b/src/runtime/mgcpacer.go
index d2921f4ed3..8b6ad4d66f 100644
--- a/src/runtime/mgcpacer.go
+++ b/src/runtime/mgcpacer.go
@@ -130,10 +130,10 @@ type gcControllerState struct {
 	// Updated at the end of each GC cycle, in endCycle.
 	consMark float64
 
-	// lastConsMark is the computed cons/mark value for the previous GC
-	// cycle. Note that this is *not* the last value of cons/mark, but the
-	// actual computed value. See endCycle for details.
-	lastConsMark float64
+	// lastConsMark is the computed cons/mark value for the previous 4 GC
+	// cycles. Note that this is *not* the last value of consMark, but the
+	// measured cons/mark value in endCycle.
+	lastConsMark [4]float64
 
 	// gcPercentHeapGoal is the goal heapLive for when next GC ends derived
 	// from gcPercent.
@@ -652,12 +652,19 @@ func (c *gcControllerState) endCycle(now int64, procs int, userForced bool) {
 	currentConsMark := (float64(c.heapLive.Load()-c.triggered) * (utilization + idleUtilization)) /
 		(float64(scanWork) * (1 - utilization))
 
-	// Update our cons/mark estimate. This is the raw value above, but averaged over 2 GC cycles
-	// because it tends to be jittery, even in the steady-state. The smoothing helps the GC to
-	// maintain much more stable cycle-by-cycle behavior.
+	// Update our cons/mark estimate. This is the maximum of the value we just computed and the last
+	// 4 cons/mark values we measured. The reason we take the maximum here is to bias a noisy
+	// cons/mark measurement toward fewer assists at the expense of additional GC cycles (starting
+	// earlier).
 	oldConsMark := c.consMark
-	c.consMark = (currentConsMark + c.lastConsMark) / 2
-	c.lastConsMark = currentConsMark
+	c.consMark = currentConsMark
+	for i := range c.lastConsMark {
+		if c.lastConsMark[i] > c.consMark {
+			c.consMark = c.lastConsMark[i]
+		}
+	}
+	copy(c.lastConsMark[:], c.lastConsMark[1:])
+	c.lastConsMark[len(c.lastConsMark)-1] = currentConsMark
 
 	if debug.gcpacertrace > 0 {
 		printlock()
diff --git a/src/runtime/mgcpacer_test.go b/src/runtime/mgcpacer_test.go
index e373e324a4..ac2a3fa56c 100644
--- a/src/runtime/mgcpacer_test.go
+++ b/src/runtime/mgcpacer_test.go
@@ -255,8 +255,8 @@ func TestGcPacer(t *testing.T) {
 					// After the 12th GC, the heap will stop growing. Now, just make sure that:
 					// 1. Utilization isn't varying _too_ much, and
 					// 2. The pacer is mostly keeping up with the goal.
-					assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
-					assertInRange(t, "GC utilization", c[n-1].gcUtilization, 0.25, 0.3)
+					assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.025)
+					assertInRange(t, "GC utilization", c[n-1].gcUtilization, 0.25, 0.275)
 				}
 			},
 		},