crypto/elliptic: move P-256 amd64/arm64 assembly to nistec

The goal of this CL is to move the implementation to the new interface
with the least amount of changes possible. A follow-up CL will add
documentation and clean up the assembly API.
* SetBytes does the element and point validity checks now, which were
previously implemented with big.Int.
* p256BaseMult would return (0:0:1) if the scalar was zero, which is
not a valid encoding of the point at infinity, but would get
flattened into (0,0) by p256PointToAffine. The rest of the code can
cope with any encoding with Z = 0, not just (t²:t³:0) with t != 0.
* CombinedMult was only avoiding the big.Int and affine conversion
overhead, which is now gone when operating entirely on nistec types,
so it can be implemented entirely in the crypto/elliptic wrapper,
and will automatically benefit all NIST curves.
* Scalar multiplication can't operate on arbitrarily sized scalars (it
was using big.Int to reduce them), which is fair enough. Changed the
nistec point interface to let ScalarMult and ScalarBaseMult reject
scalars. The crypto/elliptic wrapper still does the big.Int
reduction as needed.
The ppc64le/s390x assembly is disabled but retained to make review of
the change that will re-enable it easier.
Very small performance changes, which we will more than recoup when
crypto/ecdsa moves to invoking nistec directly.