Moving the "don't really preempt" check up earlier in the function
introduced a race where gp.stackguard0 might change between
the early check and the later one. Since the later one is missing the
"don't really preempt" logic, it could decide to preempt incorrectly.
Pull the result of the check into a local variable and use an atomic
to access stackguard0, to eliminate the race.
I believe this will fix the broken OS X and Solaris builders.