This minimizes addi usage inside vector heavy loops. This
results in a small performance uptick on P9 ppc64le/linux.
Likewise, cleanup some minor whitespace issues around comments.
The implementation from crypto/sha256 is also shared with notsha256.
It is copied, but preserves notsha256's go:build directives. They are
otherwise identical now. Previously, bootstrap restrictions required
workarounds to support XXLOR on older toolchains. This is not needed
anymore as the minimum bootstrap (1.17) compiler will support XXLOR.