crypto/sha1: add sha-ni AMD64 implementation
Based on the Intel docs. Reduces time per operation by ~44% (geomean)
compared to the AVX implementation and by ~57% compared to the generic
AMD64 assembly implementation.
                    │ /usr/local/google/home/bracewell/sha1-avx.bench │ /usr/local/google/home/bracewell/sha1-ni-stack.bench │
                    │ sec/op                                          │ sec/op         vs base                               │
Hash8Bytes/New-24     157.60n ± 0%                                      92.51n ± 0%    -41.30% (p=0.000 n=20)
Hash8Bytes/Sum-24     147.00n ± 0%                                      85.06n ± 0%    -42.14% (p=0.000 n=20)
Hash320Bytes/New-24    625.3n ± 0%                                      276.7n ± 0%    -55.75% (p=0.000 n=20)
Hash320Bytes/Sum-24    626.2n ± 0%                                      272.4n ± 0%    -56.51% (p=0.000 n=20)
Hash1K/New-24         1206.5n ± 0%                                      692.2n ± 0%    -42.63% (p=0.000 n=20)
Hash1K/Sum-24         1210.0n ± 0%                                      688.2n ± 0%    -43.13% (p=0.000 n=20)
Hash8K/New-24          7.744µ ± 0%                                      4.920µ ± 0%    -36.46% (p=0.000 n=20)
Hash8K/Sum-24          7.737µ ± 0%                                      4.913µ ± 0%    -36.50% (p=0.000 n=20)
geomean                971.5n                                           536.1n         -44.81%
                    │ /usr/local/google/home/bracewell/sha1-avx.bench │ /usr/local/google/home/bracewell/sha1-ni-stack.bench │
                    │ B/s                                             │ B/s            vs base                               │
Hash8Bytes/New-24      48.41Mi ± 0%                                      82.47Mi ± 0%   +70.37% (p=0.000 n=20)
Hash8Bytes/Sum-24      51.90Mi ± 0%                                      89.70Mi ± 0%   +72.82% (p=0.000 n=20)
Hash320Bytes/New-24    488.0Mi ± 0%                                     1103.0Mi ± 0%  +126.01% (p=0.000 n=20)
Hash320Bytes/Sum-24    487.4Mi ± 0%                                     1120.5Mi ± 0%  +129.91% (p=0.000 n=20)
Hash1K/New-24          809.6Mi ± 0%                                     1410.8Mi ± 0%   +74.26% (p=0.000 n=20)
Hash1K/Sum-24          806.9Mi ± 0%                                     1419.1Mi ± 0%   +75.86% (p=0.000 n=20)
Hash8K/New-24         1008.9Mi ± 0%                                     1588.0Mi ± 0%   +57.40% (p=0.000 n=20)
Hash8K/Sum-24         1009.8Mi ± 0%                                     1590.1Mi ± 0%   +57.47% (p=0.000 n=20)
geomean                375.8Mi                                           680.9Mi        +81.20%
                    │ /usr/local/google/home/bracewell/sha1-amd64.bench │ /usr/local/google/home/bracewell/sha1-ni-stack.bench │
                    │ sec/op                                            │ sec/op         vs base                               │
Hash8Bytes/New-24     153.90n ± 0%                                        92.51n ± 0%    -39.89% (p=0.000 n=20)
Hash8Bytes/Sum-24     145.90n ± 0%                                        85.06n ± 0%    -41.70% (p=0.000 n=20)
Hash320Bytes/New-24    666.8n ± 0%                                        276.7n ± 0%    -58.50% (p=0.000 n=20)
Hash320Bytes/Sum-24    660.3n ± 0%                                        272.4n ± 0%    -58.75% (p=0.000 n=20)
Hash1K/New-24         1810.5n ± 0%                                        692.2n ± 0%    -61.77% (p=0.000 n=20)
Hash1K/Sum-24         1806.0n ± 0%                                        688.2n ± 0%    -61.90% (p=0.000 n=20)
Hash8K/New-24         13.509µ ± 0%                                        4.920µ ± 0%    -63.58% (p=0.000 n=20)
Hash8K/Sum-24         13.515µ ± 0%                                        4.913µ ± 0%    -63.65% (p=0.000 n=20)
geomean                1.248µ                                             536.1n         -57.05%
                    │ /usr/local/google/home/bracewell/sha1-amd64.bench │ /usr/local/google/home/bracewell/sha1-ni-stack.bench │
                    │ B/s                                               │ B/s            vs base                               │
Hash8Bytes/New-24     49.57Mi ± 0%                                         82.47Mi ± 0%   +66.37% (p=0.000 n=20)
Hash8Bytes/Sum-24     52.29Mi ± 0%                                         89.70Mi ± 0%   +71.52% (p=0.000 n=20)
Hash320Bytes/New-24   457.7Mi ± 0%                                        1103.0Mi ± 0%  +140.97% (p=0.000 n=20)
Hash320Bytes/Sum-24   462.2Mi ± 0%                                        1120.5Mi ± 0%  +142.45% (p=0.000 n=20)
Hash1K/New-24         539.4Mi ± 0%                                        1410.8Mi ± 0%  +161.57% (p=0.000 n=20)
Hash1K/Sum-24         540.7Mi ± 0%                                        1419.1Mi ± 0%  +162.44% (p=0.000 n=20)
Hash8K/New-24         578.4Mi ± 0%                                        1588.0Mi ± 0%  +174.57% (p=0.000 n=20)
Hash8K/Sum-24         578.1Mi ± 0%                                        1590.1Mi ± 0%  +175.07% (p=0.000 n=20)
geomean               292.4Mi                                              680.9Mi       +132.86%
Change-Id: Ife90386ba410a80c2e6222c1fe4df2368c4e12b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/642157
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Neal Patel <nealpatel@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>