runtime: don't call lockOSThread for every cgo call
For a trivial benchmark with a do-nothing cgo call:
name old time/op new time/op delta
Call-4 64.5ns ± 7% 63.0ns ± 6% -2.25% (p=0.027 n=20+16)
Because Windows uses the cgocall mechanism to make system calls,
and passes arguments in a struct held in the m,
we need to do the lockOSThread/unlockOSThread in that code.
Because deferreturn was getting a nosplit stack overflow error,
change it to avoid calling typedmemmove.
Updates #21827.
Change-Id: I9b1d61434c44faeb29805b46b409c812c9acadc2
Reviewed-on: https://go-review.googlesource.com/64070
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: David Crawshaw <crawshaw@golang.org>