net/http/cgi.TestCopyError calls runtime.Stack to take a stack trace of
all goroutines, and searches for a specific line in that stack trace.
It currently sometimes fails because it encounters the goroutine it's
looking for in the small window where a goroutine might be in _Grunning
while in a syscall, introduced in CL 646198. In that case, the traceback
will give up, failing to print the stack TestCopyError is expecting.
This represents a general regression, since previously runtime.Stack
could never fail to take a goroutine's stack; giving up was only
possible in fatal panic cases.

Fix this the same way we fixed goroutine profiles: allow the stack trace
to proceed if the g's syscallsp != 0. This is safe in any
stop-the-world-related context, because syscallsp won't be mutated while
the goroutine fails to acquire a P, and thus fails to fully exit the
syscall context. This also means the stack below syscallsp won't be
mutated, and thus taking a traceback is also safe.

Fixes #66639.

Change-Id: Ie6f4b0661d9f8df02c9b8434e99bc95f26fe5f0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/716680
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
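
For illustration only, the pattern the message describes (take a traceback
of all goroutines with runtime.Stack, then search it for a specific line)
can be sketched roughly as follows. This is not the code from
net/http/cgi.TestCopyError; the names allStacks, stackMentions, and
blockedWorker are hypothetical and exist only for this sketch.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// allStacks returns the traceback of every goroutine. It doubles the
// buffer until runtime.Stack no longer fills it completely, so the
// result is not truncated.
func allStacks() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // true: include all goroutines
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

// stackMentions reports whether any goroutine's traceback contains substr.
func stackMentions(substr string) bool {
	return strings.Contains(allStacks(), substr)
}

// blockedWorker is a hypothetical goroutine body: it signals that it has
// started and then waits until release is closed.
func blockedWorker(started chan<- struct{}, release <-chan struct{}) {
	close(started)
	<-release
}

func main() {
	started := make(chan struct{})
	release := make(chan struct{})
	go blockedWorker(started, release)
	<-started

	// A check of this shape depends on runtime.Stack printing the frames
	// of the goroutine being looked for, rather than giving up on it.
	fmt.Println(stackMentions("blockedWorker"))
	close(release)
}
```

A test written this way fails whenever the traceback for the goroutine of
interest is replaced by the "stack unavailable" message, which is the
failure mode described above.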
// from a signal handler initiated during a systemstack call.
// The original G is still in the running state, and we want to
// print its stack.
- if gp.m != getg().m && readgstatus(gp)&^_Gscan == _Grunning {
+ //
+ // There's a small window of time in exitsyscall where a goroutine could be
+ // in _Grunning as it's exiting a syscall. This could be the case even if the
+ // world is stopped or frozen.
+ //
+ // This is OK because the goroutine will not exit the syscall while the world
+ // is stopped or frozen. This is also why it's safe to check syscallsp here,
+ // and safe to take the goroutine's stack trace. The syscall path mutates
+ // syscallsp only just before exiting the syscall.
+ if gp.m != getg().m && readgstatus(gp)&^_Gscan == _Grunning && gp.syscallsp == 0 {
print("\tgoroutine running on other thread; stack unavailable\n")
printcreatedby(gp)
} else {