Adds the runtime initialization flow for arm akin to amd64.
In particular,we use the library initialization entry point to:
- create a new OS thread and run the "regular" runtime init stack on
that thread
- return immediately from the main (i.e., loader) thread
- at the first CGO invocation, we wait for the runtime initialization
to complete.
Verified to work on a Raspberry Pi and an Android phone.