crypto/cipher: speed up xor operations in CBC, CFB, OFB, CTR and GCM on 386 and amd64

Intel(R) Core(TM) i5-2540M CPU @ 2.60GHz:

benchmark                    old MB/s     new MB/s     speedup
BenchmarkAESGCMSeal1K           82.39        92.05       1.12x
BenchmarkAESGCMOpen1K           82.28        91.88       1.12x
BenchmarkAESCFBEncrypt1K       141.54       277.59       1.96x
BenchmarkAESCFBDecrypt1K       133.06       278.07       2.09x
BenchmarkAESOFB1K              160.51       380.24       2.37x
BenchmarkAESCTR1K              164.07       429.25       2.62x
BenchmarkAESCBCEncrypt1K       170.99       263.74       1.54x
BenchmarkAESCBCDecrypt1K       124.96       249.14       1.99x

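The speedups above come from XORing more than one byte per step. As a rough illustration of that general idea (not the code in this CL), the sketch below XORs eight bytes at a time as uint64 words and finishes the tail byte by byte. The function names xorBytesWords and xorBytesSimple are hypothetical, and the sketch uses encoding/binary rather than the architecture-specific unaligned word access that a 386/amd64 fast path might rely on.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorBytesWords (hypothetical) sets dst[i] = a[i] ^ b[i] for i < n, where
// n = min(len(a), len(b)), and returns n. It handles eight bytes per
// iteration as uint64 words, then finishes the remaining tail bytes one at a
// time. Byte order does not affect the result because each word is loaded
// and stored with the same order and XOR works bitwise.
func xorBytesWords(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	if n == 0 {
		return 0
	}
	_ = dst[n-1] // panics early if dst is shorter than n
	i := 0
	for ; i+8 <= n; i += 8 {
		w := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], w)
	}
	for ; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

// xorBytesSimple (hypothetical) is the byte-at-a-time reference version.
func xorBytesSimple(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

func main() {
	// 37 bytes: four full 8-byte words plus a 5-byte tail.
	a := make([]byte, 37)
	b := make([]byte, 37)
	for i := range a {
		a[i] = byte(i * 7)
		b[i] = byte(255 - i)
	}
	got := make([]byte, 37)
	want := make([]byte, 37)
	n1 := xorBytesWords(got, a, b)
	n2 := xorBytesSimple(want, a, b)
	fmt.Println(n1 == n2 && string(got) == string(want))
}
```

Checking the word-wise result against the byte-wise reference on a length that is not a multiple of eight exercises both the word loop and the tail loop.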
Fixes #6741.
R=agl, dave, agl
CC=golang-dev
https://golang.org/cl/24250044