crypto/ecdh,crypto/internal/nistec: enable pruning of unused curves
If a program only uses ecdh.P256(), the implementation of the other
curves shouldn't end up in the binary. This mostly required moving some
operations from init() time. Small performance hit in uncompressed
Bytes/SetBytes, but not big enough to show up in higher-level
benchmarks. If it becomes a problem, we can fix it by pregenerating the
p-1 bytes representation in generate.go.