runtime: fix uint64 division on 386
The uint64 divide function calls _mul64x32 to do a 64x32-bit multiply
and then compares the result against the 64-bit numerator.
If the result is bigger than the numerator, must use the slow path.
Unfortunately, the 64x32 produces a 96-bit product, and only the
low 64 bits were being used in the comparison. Return all 96 bits,
the bottom 64 via the original uint64* pointer, and the top 32
as the function's return value.
Fixes 386 build (broken by ARM division tests).
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/
13722044