The assumptions of some of the assembly functions were still scarcely
documented, and in one case disregarded: p256ScalarMult relied on
p256PointAddAsm returning infinity for infinity inputs, even though
the behavior of p256PointAddAsm in that case is "undefined".
Besides expanding the comments, moving the bit window massaging into a
more easily understood p256OrdRsh function, and fixing the above, this
change folds the last iteration of p256ScalarMult into the loop to
reduce special cases, and inverts the iteration order of p256BaseMult
so that it matches p256ScalarMult for ease of comparison.
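
As a rough illustration of the two ideas above (handling infinity in
the caller instead of relying on an addition routine's unspecified
result, and a windowed loop with no special-cased last iteration),
here is a minimal sketch over a toy group, the integers mod 97 under
addition. It is not the P-256 code: point, addIncomplete, add and
scalarMult are hypothetical names, and the branches are not constant
time, unlike what the real assembly has to be.

```go
package main

import "fmt"

// modulus defines the toy group: integers mod 97 under addition.
const modulus = 97

// point is a toy group element. isInf marks the identity explicitly,
// standing in for the point at infinity.
type point struct {
	v     uint64
	isInf bool
}

// addIncomplete stands in for an addition routine whose result is
// unspecified when either input is infinity. It returns a deliberately
// meaningless value in that case so that callers cannot rely on it.
func addIncomplete(a, b point) point {
	if a.isInf || b.isInf {
		return point{v: 12345}
	}
	v := (a.v + b.v) % modulus
	return point{v: v, isInf: v == 0}
}

// add handles infinity inputs itself and only calls addIncomplete
// when both inputs are finite. (Illustrative only: these branches
// are not constant time.)
func add(a, b point) point {
	switch {
	case a.isInf:
		return b
	case b.isInf:
		return a
	}
	return addIncomplete(a, b)
}

// scalarMult computes k*p with 4-bit windows. The accumulator starts
// at infinity, so every window, including the last one, goes through
// the same loop body.
func scalarMult(p point, k uint64) point {
	// table[i] holds (i+1)*p.
	var table [15]point
	table[0] = p
	for i := 1; i < len(table); i++ {
		table[i] = add(table[i-1], p)
	}

	acc := point{isInf: true}
	for shift := 60; shift >= 0; shift -= 4 {
		// Multiply the accumulator by 16.
		for j := 0; j < 4; j++ {
			acc = add(acc, acc)
		}
		// Add the table entry selected by the current window.
		if w := (k >> uint(shift)) & 0xf; w != 0 {
			acc = add(acc, table[w-1])
		}
	}
	return acc
}

func main() {
	fmt.Printf("%+v\n", scalarMult(point{v: 5}, 1000))
	fmt.Printf("%+v\n", scalarMult(point{v: 5}, 97))
}
```

In this sketch the leading zero windows double and skip an infinity
accumulator without ever reaching addIncomplete, which is the kind of
dependency on unspecified results that the fix above removes.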