runtime: decentralize sweep termination and mark transition
This moves all of GC initialization, sweep termination, and the
transition to concurrent marking in to the off->mark transition
function. This means it's now handled on the goroutine that detected
the state exit condition.
As a result, malloc no longer needs to Gosched() at the beginning of
the GC cycle to prevent over-allocation while the GC is starting up
because it will now *help* the GC to start up. The Gosched hack is
still necessary during GC shutdown (this is easy to test by enabling
gctrace and hitting Ctrl-S to block the gctrace output).
At this point, the GC coordinator still handles later phases. This
requires a small tweak to how we start the GC coordinator. Currently,
starting the GC coordinator is best-effort and may fail if the
coordinator is about to park from the previous cycle but hasn't yet.
We fix this by replacing the park/ready to wake up the coordinator
with a semaphore. This is temporary since the coordinator will be
going away in a few commits.