runtime: add a barrier after a new span is allocated
When copying a stack, we
1. allocate a new stack,
2. adjust pointers pointing to the old stack to pointing to the
new stack.
If the GC is running on another thread concurrently, on a machine
with weak memory model, the GC could observe the adjusted pointer
(e.g. through gp._defer which could be a special heap-to-stack
pointer), but not observe the publish of the new stack span. In
this case, the GC will see the adjusted pointer pointing to an
unallocated span, and throw. Fixing this by adding a publication
barrier between the allocation of the span and adjusting pointers.
One testcase for this is TestDeferHeapAndStack in long mode. It
fails reliably on linux-mips64le-mengzhuo builder without the fix,
and passes reliably after the fix.