From: Michael Pratt <mpratt@google.com>
Date: Wed, 31 Jan 2024 19:10:56 +0000 (-0500)
Subject: runtime: be careful about crash FD changes during panic
X-Git-Tag: go1.23rc1~1157
X-Git-Url: http://www.git.cypherpunks.su/?a=commitdiff_plain;h=638b9023e31b2e1d0d8ef24086722388afd76e41;p=gostls13.git

runtime: be careful about crash FD changes during panic

There are two separate cases here:

The base case is simple: a concurrent call to SetCrashOutput while
panicking will switch the crash FD, which could cause the first half of
writes to go to the old FD, and the second half to the new FD. This
isn't a correctness problem, but would be annoying to see in practice.
Since it is easy to check for, I simply drop any changes if panicking is
already in progress.

The second case is more important: SetCrashOutput will close the old FD
after the new FD is swapped, but writeErrData has no locking around use
of the fd, so SetCrashOutput could close the FD out from under
writeErrData, causing lost writes. We handle this similarly, by not
allowing SetCrashOutput to close the old FD if a panic is in progress,
but we have to be more careful about synchronization between
writeErrData and setCrashFD to ensure that writeErrData can't observe
the old FD while setCrashFD allows close.

For #42888.

Change-Id: I7270b2cc5ea58a15ba40145b7a96d557acdfe842
Reviewed-on: https://go-review.googlesource.com/c/go/+/559801
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
---

diff --git a/src/runtime/debug/stack.go b/src/runtime/debug/stack.go
index 7072d29c96..8dfea52d34 100644
--- a/src/runtime/debug/stack.go
+++ b/src/runtime/debug/stack.go
@@ -38,6 +38,8 @@ func Stack() []byte {
 // SetCrashOutput duplicates f's file descriptor, so the caller may safely
 // close f as soon as SetCrashOutput returns.
 // To disable this additional crash output, call SetCrashOutput(nil).
+// If called concurrently with a crash, some in-progress output may be written
+// to the old file even after an overriding SetCrashOutput returns.
 func SetCrashOutput(f *os.File) error {
 	fd := ^uintptr(0)
 	if f != nil {
diff --git a/src/runtime/runtime.go b/src/runtime/runtime.go
index c7a511b2a4..05a2098fcd 100644
--- a/src/runtime/runtime.go
+++ b/src/runtime/runtime.go
@@ -250,7 +250,42 @@ var crashFD atomic.Uintptr
 
 //go:linkname setCrashFD
 func setCrashFD(fd uintptr) uintptr {
-	return crashFD.Swap(fd)
+	// Don't change the crash FD if a crash is already in progress.
+	//
+	// Unlike the case below, this is not required for correctness, but it
+	// is generally nicer to have all of the crash output go to the same
+	// place rather than getting split across two different FDs.
+	if panicking.Load() > 0 {
+		return ^uintptr(0)
+	}
+
+	old := crashFD.Swap(fd)
+
+	// If we are panicking, don't return the old FD to runtime/debug for
+	// closing. writeErrData may have already read the old FD from crashFD
+	// before the swap and closing it would cause the write to be lost [1].
+	// The old FD will never be closed, but we are about to crash anyway.
+	//
+	// On the writeErrData thread, panicking.Add(1) happens-before
+	// crashFD.Load() [2].
+	//
+	// On this thread, swapping old FD for new in crashFD happens-before
+	// panicking.Load() > 0.
+	//
+	// Therefore, if panicking.Load() == 0 here (old FD will be closed), it
+	// is impossible for the writeErrData thread to observe
+	// crashFD.Load() == old FD.
+	//
+	// [1] Or, if really unlucky, another concurrent open could reuse the
+	// FD, sending the write into an unrelated file.
+	//
+	// [2] If gp != nil, it occurs when incrementing gp.m.dying in
+	// startpanic_m. If gp == nil, we read panicking.Load() > 0, so an Add
+	// must have happened-before.
+	if panicking.Load() > 0 {
+		return ^uintptr(0)
+	}
+	return old
 }
 
 // auxv is populated on relevant platforms but defined here for all platforms