math/big: add fast path for pure Go addVW for large z
In the normal case, only a few words have to be updated when adding a word to a vector.
When that happens, we can simply copy the rest of the words, which is much faster.
However, the overhead of that makes it prohibitive for small vectors,
so we check the size at the beginning.
The implementation is a bit weird to allow addVW to continued to be inlined; see #30548.
The AddVW benchmarks are surprising, but fully repeatable.
The SubVW benchmarks are more or less as expected.
I expect that removing the indirect function call will
help both and make them a bit more normal.