From: Michael Anthony Knyszek <mknyszek@google.com>
Date: Mon, 13 Jan 2020 17:18:51 +0000 (+0000)
Subject: runtime: better approximate total cost of scavenging
X-Git-Tag: go1.14rc1~115
X-Git-Url: http://www.git.cypherpunks.su/?a=commitdiff_plain;h=71154e061f067a668e7b619d7b3701470b8561be;p=gostls13.git

runtime: better approximate total cost of scavenging

Currently, the scavenger is paced according to how long it takes to
scavenge one runtime page's worth of memory. However, this pacing
doesn't take into account the additional cost of actually using a
scavenged page. This operation, "sysUsed," is a counterpart to the
scavenging operation "sysUnused." On most systems this operation is a
no-op, but on some systems like Darwin and Windows we actually make a
syscall. Even on systems where it's a no-op, the cost is implicit: a
more expensive page fault when re-using the page.

On Darwin in particular the cost of "sysUnused" is fairly close to the
cost of "sysUsed", which skews the pacing to be too fast. A lot of
soon-to-be-allocated memory ends up scavenged, resulting in many more
expensive "sysUsed" operations, ultimately slowing down the application.

The way to fix this problem is to include the future cost of "sysUsed"
on a page in the scavenging cost. However, measuring the "sysUsed" cost
directly (like we do with "sysUnused") on most systems is infeasible
because we would have to measure the cost of the first access.

Instead, this change applies a multiplicative constant to the measured
scavenging time which is based on a per-system ratio of "sysUnused" to
"sysUsed" costs in the worst case (on systems where it's a no-op, we
measure the cost of the first access). This ultimately slows down the
scavenger to a more reasonable pace, limiting its impact on performance
but still retaining the memory footprint improvements from the previous
release.

Fixes #36507.

Change-Id: I050659cd8cdfa5a32f5cc0b56622716ea0fa5407
Reviewed-on: https://go-review.googlesource.com/c/go/+/214517
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
---

diff --git a/src/runtime/mgcscavenge.go b/src/runtime/mgcscavenge.go
index 24c5554b0b..c2625095f6 100644
--- a/src/runtime/mgcscavenge.go
+++ b/src/runtime/mgcscavenge.go
@@ -80,6 +80,17 @@ const (
 	// maxPagesPerPhysPage is the maximum number of supported runtime pages per
 	// physical page, based on maxPhysPageSize.
 	maxPagesPerPhysPage = maxPhysPageSize / pageSize
+
+	// scavengeCostRatio is the approximate ratio between the costs of using previously
+	// scavenged memory and scavenging memory.
+	//
+	// For most systems the cost of scavenging greatly outweighs the costs
+	// associated with using scavenged memory, making this constant 0. On other systems
+	// (especially ones where "sysUsed" is not just a no-op) this cost is non-trivial.
+	//
+	// This ratio is used as part of multiplicative factor to help the scavenger account
+	// for the additional costs of using scavenged memory in its pacing.
+	scavengeCostRatio = 0.7 * sys.GoosDarwin
 )
 
 // heapRetained returns an estimate of the current heap RSS.
@@ -246,7 +257,7 @@ func bgscavenge(c chan int) {
 		released := uintptr(0)
 
 		// Time in scavenging critical section.
-		crit := int64(0)
+		crit := float64(0)
 
 		// Run on the system stack since we grab the heap lock,
 		// and a stack growth with the heap lock means a deadlock.
@@ -265,7 +276,7 @@ func bgscavenge(c chan int) {
 			start := nanotime()
 			released = mheap_.pages.scavengeOne(physPageSize, false)
 			atomic.Xadduintptr(&mheap_.pages.scavReleased, released)
-			crit = nanotime() - start
+			crit = float64(nanotime() - start)
 		})
 
 		if released == 0 {
@@ -275,6 +286,14 @@ func bgscavenge(c chan int) {
 			continue
 		}
 
+		// Multiply the critical time by 1 + the ratio of the costs of using
+		// scavenged memory vs. scavenging memory. This forces us to pay down
+		// the cost of reusing this memory eagerly by sleeping for a longer period
+		// of time and scavenging less frequently. More concretely, we avoid situations
+		// where we end up scavenging so often that we hurt allocation performance
+		// because of the additional overheads of using scavenged memory.
+		crit *= 1 + scavengeCostRatio
+
 		// If we spent more than 10 ms (for example, if the OS scheduled us away, or someone
 		// put their machine to sleep) in the critical section, bound the time we use to
 		// calculate at 10 ms to avoid letting the sleep time get arbitrarily high.
@@ -290,13 +309,13 @@ func bgscavenge(c chan int) {
 		// much, then scavengeEMWA < idealFraction, so we'll adjust the sleep time
 		// down.
 		adjust := scavengeEWMA / idealFraction
-		sleepTime := int64(adjust * float64(crit) / (scavengePercent / 100.0))
+		sleepTime := int64(adjust * crit / (scavengePercent / 100.0))
 
 		// Go to sleep.
 		slept := scavengeSleep(sleepTime)
 
 		// Compute the new ratio.
-		fraction := float64(crit) / float64(crit+slept)
+		fraction := crit / (crit + float64(slept))
 
 		// Set a lower bound on the fraction.
 		// Due to OS-related anomalies we may "sleep" for an inordinate amount