From 123f7dd3e1eb90825ece57b8dde39438ca34f150 Mon Sep 17 00:00:00 2001 From: Cherry Zhang Date: Tue, 11 Feb 2020 18:54:30 -0500 Subject: [PATCH] runtime: zero upper bit of Y registers in asyncPreempt on darwin/amd64 MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Apparently, the signal handling code path in darwin kernel leaves the upper bits of Y registers in a dirty state, which causes many SSE operations (128-bit and narrower) become much slower. Clear the upper bits to get to a clean state. We do it at the entry of asyncPreempt, which is immediately following exiting from the kernel's signal handling code, if we actually injected a call. It does not cover other exits where we don't inject a call, e.g. failed preemption, profiling signal, or other async signals. But it does cover an important use case of async signals, preempting a tight numerical loop, which we introduced in this cycle. Running the benchmark in issue #37174: name old time/op new time/op delta Fast-8 90.0ns ± 1% 46.8ns ± 3% -47.97% (p=0.000 n=10+10) Slow-8 188ns ± 5% 49ns ± 1% -73.82% (p=0.000 n=10+9) There is no more slowdown due to preemption signals. For #37174. Change-Id: I8b83d083fade1cabbda09b4bc25ccbadafaf7605 Reviewed-on: https://go-review.googlesource.com/c/go/+/219131 Run-TryBot: Cherry Zhang TryBot-Result: Gobot Gobot Reviewed-by: Keith Randall --- src/runtime/mkpreempt.go | 9 +++++++++ src/runtime/preempt_amd64.s | 3 +++ 2 files changed, 12 insertions(+) diff --git a/src/runtime/mkpreempt.go b/src/runtime/mkpreempt.go index 64e220772e..31b6f5cbac 100644 --- a/src/runtime/mkpreempt.go +++ b/src/runtime/mkpreempt.go @@ -244,6 +244,15 @@ func genAMD64() { // TODO: MXCSR register? + // Apparently, the signal handling code path in darwin kernel leaves + // the upper bits of Y registers in a dirty state, which causes + // many SSE operations (128-bit and narrower) become much slower. + // Clear the upper bits to get to a clean state. See issue #37174. + // It is safe here as Go code don't use the upper bits of Y registers. + p("#ifdef GOOS_darwin") + p("VZEROUPPER") + p("#endif") + p("PUSHQ BP") p("MOVQ SP, BP") p("// Save flags before clobbering them") diff --git a/src/runtime/preempt_amd64.s b/src/runtime/preempt_amd64.s index d50c2f3a51..0f2fd7d8dd 100644 --- a/src/runtime/preempt_amd64.s +++ b/src/runtime/preempt_amd64.s @@ -4,6 +4,9 @@ #include "textflag.h" TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0 + #ifdef GOOS_darwin + VZEROUPPER + #endif PUSHQ BP MOVQ SP, BP // Save flags before clobbering them -- 2.48.1