crypto/rsa: use R*R multiplication to get into the Montgomery domain
Computing RR involves one more shiftIn, and using it involves an extra
multiplication. It is still faster than the current code, because each
exponentiation was doing montgomeryRepresentation twice, once for x and
once for 1, and now both conversions share the RR precomputation.
More importantly, it allows precomputing the value and attaching it to
the private key in a future CL.