The current logic causes much more tracking than necessary, when really
_Grunning and _Gsyscall are both sort of "running" from the perspective
of tracking scheduling latency.
This makes cgo calls and syscalls a little faster in the single-threaded
case, and shows much larger improvement in the multi-threaded case
by removing updates of shared variables (though this parallel
microbenchmark is a little unrealistic, so don't ascribe too much weight
to it).
goos: linux
goarch: amd64
pkg: internal/runtime/cgobench
cpu: AMD EPYC 7B13
│ after.out │ after-2.out │
│ sec/op │ sec/op vs base │
CgoCall-64 35.83n ± 1% 34.69n ± 1% -3.20% (p=0.002 n=6)
CgoCallParallel-64 5.338n ± 1% 1.352n ± 4% -74.67% (p=0.002 n=6)
Change-Id: I2ea494dd5ebbbfb457373549986fbe2fbe318d45
Reviewed-on: https://go-review.googlesource.com/c/go/+/646275
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
})
}
- if oldval == _Grunning {
- // Track every gTrackingPeriod time a goroutine transitions out of running.
+ if (oldval == _Grunning || oldval == _Gsyscall) && (newval != _Grunning && newval != _Gsyscall) {
+ // Track every gTrackingPeriod time a goroutine transitions out of _Grunning or _Gsyscall.
+ // Do not track _Grunning <-> _Gsyscall transitions, since they're two very similar states.
if casgstatusAlwaysTrack || gp.trackingSeq%gTrackingPeriod == 0 {
gp.tracking = true
}