runtime: optimistically CAS atomicstatus directly in enter/exitsyscall

This change borrows a performance trick from the coro implementation:
attempt the CAS on atomicstatus directly first, and call into
casgstatus, a much more heavyweight function, only if that fails. We
have to be careful about synctest bubbling, but overall it's a good bit
faster, and it's low-hanging fruit.
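
For illustration only, the fast-path/slow-path shape described above can be
sketched outside the runtime. Everything in this sketch is hypothetical: the
task type, the tracked flag (standing in for a goroutine that belongs to a
synctest bubble), and slowTransition (standing in for casgstatus) are
assumptions made for the example, not the runtime's actual code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Hypothetical status values; the runtime's real constants differ.
const (
	statusRunning uint32 = iota
	statusSyscall
)

// task is a stand-in for a goroutine descriptor.
type task struct {
	status    atomic.Uint32
	tracked   bool // assumption: models "needs extra bookkeeping", e.g. a synctest bubble
	slowCalls int  // counts slow-path entries, for demonstration only
}

// slowTransition models the heavyweight path: it retries until the
// CAS succeeds and is where any extra bookkeeping would live.
func (t *task) slowTransition(from, to uint32) {
	t.slowCalls++
	for !t.status.CompareAndSwap(from, to) {
		runtime.Gosched()
	}
}

// transition tries a single direct CAS first and falls back to the
// slow path if the task is tracked or the CAS fails.
func (t *task) transition(from, to uint32) {
	if t.tracked || !t.status.CompareAndSwap(from, to) {
		t.slowTransition(from, to)
	}
}

func main() {
	plain := &task{}
	plain.transition(statusRunning, statusSyscall)
	fmt.Println("plain slow calls:", plain.slowCalls)

	bubbled := &task{tracked: true}
	bubbled.transition(statusRunning, statusSyscall)
	fmt.Println("tracked slow calls:", bubbled.slowCalls)
}
```

In this sketch the untracked task never enters the slow path, while the
tracked one always does, which mirrors why bubbled goroutines need care.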