On amd64 and 386, we have a very roundabout way of remembering that we
need to dropm on return that currently involves saving a zero to
needm's argument slot and later bringing it back. Just store the zero.
This also makes amd64 and 386 more consistent with cgocallback on all
other platforms: rather than saving the old M to the G stack, they now
save it to a named slot on the G0 stack.
The needm function no longer needs a dummy argument to get the SP, so
we drop that.