runtime: fix race atomic operations on external memory

The assembly is broken: it executes `MOVQ g(R12), R14`, expecting R12
to hold the TLS address, but never calls get_tls(R12) beforehand.
This happens to work on Linux, where `MOVQ g(R12), R14` is assembled to
`mov %fs:0xfffffffffffffff8,%r14`, which does not use R12.
But it crashes on Windows, where the load does depend on R12.

Add an explicit `get_tls(R12)`.
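
For illustration, the corrected sequence looks roughly like this. It is
a sketch of the pattern only, not the exact diff, and the surrounding
instructions are omitted:

	get_tls(R12)		// load the TLS base into R12 first
	MOVQ	g(R12), R14	// R14 = current g, read relative to R12
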
Fixes #16206
Change-Id: Ic1f21a6fef2473bcf9147de6646929781c9c1e98
Reviewed-on: https://go-review.googlesource.com/24590
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>