]>
Cypherpunks repositories - gostls13.git/commit
hash/crc32: optimize the loong64 crc32 implementation
Make use of the newly added LA64 CRC32 instructions to accelerate
computation of CRC32 with IEEE and Castagnoli polynomials.
Benchmarks:
goos: linux
goarch: loong64
pkg: hash/crc32
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
CRC32/poly=IEEE/size=15/align=0 63.35n ± 0% 15.80n ± 0% -75.06% (p=0.000 n=20)
CRC32/poly=IEEE/size=15/align=1 63.35n ± 0% 16.42n ± 0% -74.08% (p=0.000 n=20)
CRC32/poly=IEEE/size=40/align=0 65.40n ± 0% 19.22n ± 0% -70.61% (p=0.000 n=20)
CRC32/poly=IEEE/size=40/align=1 65.40n ± 0% 19.23n ± 0% -70.60% (p=0.000 n=20)
CRC32/poly=IEEE/size=512/align=0 407.30n ± 0% 66.86n ± 0% -83.58% (p=0.000 n=20)
CRC32/poly=IEEE/size=512/align=1 407.30n ± 0% 66.86n ± 0% -83.58% (p=0.000 n=20)
CRC32/poly=IEEE/size=1kB/align=0 778.2n ± 0% 118.1n ± 0% -84.82% (p=0.000 n=20)
CRC32/poly=IEEE/size=1kB/align=1 778.2n ± 0% 118.1n ± 0% -84.82% (p=0.000 n=20)
CRC32/poly=IEEE/size=4kB/align=0 3004.0n ± 0% 425.6n ± 0% -85.83% (p=0.000 n=20)
CRC32/poly=IEEE/size=4kB/align=1 3004.0n ± 0% 425.6n ± 0% -85.83% (p=0.000 n=20)
CRC32/poly=IEEE/size=32kB/align=0 23.775µ ± 0% 3.305µ ± 0% -86.10% (p=0.000 n=20)
CRC32/poly=IEEE/size=32kB/align=1 23.774µ ± 0% 3.305µ ± 0% -86.10% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=15/align=0 63.58n ± 0% 15.28n ± 0% -75.97% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=15/align=1 63.58n ± 0% 16.95n ± 0% -73.34% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=40/align=0 65.29n ± 0% 17.04n ± 0% -73.90% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=40/align=1 65.29n ± 0% 19.05n ± 0% -70.83% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=512/align=0 407.20n ± 0% 55.06n ± 0% -86.48% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=512/align=1 407.20n ± 0% 56.44n ± 0% -86.14% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=1kB/align=0 778.10n ± 0% 95.08n ± 0% -87.78% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=1kB/align=1 778.10n ± 0% 97.72n ± 0% -87.44% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=4kB/align=0 3004.0n ± 0% 338.5n ± 0% -88.73% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=4kB/align=1 3004.0n ± 0% 341.1n ± 0% -88.64% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=32kB/align=0 23.775µ ± 0% 2.623µ ± 0% -88.97% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=32kB/align=1 23.775µ ± 0% 2.896µ ± 0% -87.82% (p=0.000 n=20)
CRC32/poly=Koopman/size=15/align=0 63.11n ± 0% 63.11n ± 0% ~ (p=0.737 n=20)
CRC32/poly=Koopman/size=15/align=1 63.11n ± 0% 63.11n ± 0% ~ (p=1.000 n=20)
CRC32/poly=Koopman/size=40/align=0 153.2n ± 0% 153.2n ± 0% ~ (p=1.000 n=20)
CRC32/poly=Koopman/size=40/align=1 153.2n ± 0% 153.2n ± 0% ~ (p=0.737 n=20)
CRC32/poly=Koopman/size=512/align=0 1.854µ ± 0% 1.854µ ± 0% ~ (p=1.000 n=20)
CRC32/poly=Koopman/size=512/align=1 1.854µ ± 0% 1.854µ ± 0% ~ (p=0.737 n=20)
CRC32/poly=Koopman/size=1kB/align=0 3.699µ ± 0% 3.699µ ± 0% ~ (p=1.000 n=20)
CRC32/poly=Koopman/size=1kB/align=1 3.699µ ± 0% 3.699µ ± 0% ~ (p=1.000 n=20)
CRC32/poly=Koopman/size=4kB/align=0 14.77µ ± 0% 14.77µ ± 0% ~ (p=0.495 n=20)
CRC32/poly=Koopman/size=4kB/align=1 14.77µ ± 0% 14.77µ ± 0% ~ (p=0.704 n=20)
CRC32/poly=Koopman/size=32kB/align=0 118.1µ ± 0% 118.1µ ± 0% ~ (p=0.057 n=20)
CRC32/poly=Koopman/size=32kB/align=1 118.1µ ± 0% 118.1µ ± 0% ~ (p=0.493 n=20)
geomean 1.001µ 306.8n -69.35%
goos: linux
goarch: loong64
pkg: hash/crc32
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
CRC32/poly=IEEE/size=15/align=0 75.70n ± 1% 47.04n ± 1% -37.86% (p=0.000 n=20)
CRC32/poly=IEEE/size=15/align=1 75.70n ± 1% 46.64n ± 1% -38.39% (p=0.000 n=20)
CRC32/poly=IEEE/size=40/align=0 89.26n ± 0% 65.49n ± 0% -26.63% (p=0.000 n=20)
CRC32/poly=IEEE/size=40/align=1 89.09n ± 0% 72.55n ± 1% -18.56% (p=0.000 n=20)
CRC32/poly=IEEE/size=512/align=0 621.0n ± 0% 513.5n ± 0% -17.31% (p=0.000 n=20)
CRC32/poly=IEEE/size=512/align=1 621.0n ± 0% 521.9n ± 0% -15.96% (p=0.000 n=20)
CRC32/poly=IEEE/size=1kB/align=0 1.204µ ± 0% 1.001µ ± 0% -16.86% (p=0.000 n=20)
CRC32/poly=IEEE/size=1kB/align=1 1.205µ ± 0% 1.009µ ± 0% -16.27% (p=0.000 n=20)
CRC32/poly=IEEE/size=4kB/align=0 4.665µ ± 0% 3.923µ ± 0% -15.91% (p=0.000 n=20)
CRC32/poly=IEEE/size=4kB/align=1 4.665µ ± 0% 3.931µ ± 0% -15.73% (p=0.000 n=20)
CRC32/poly=IEEE/size=32kB/align=0 36.97µ ± 0% 31.20µ ± 0% -15.60% (p=0.000 n=20)
CRC32/poly=IEEE/size=32kB/align=1 36.96µ ± 0% 31.21µ ± 0% -15.57% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=15/align=0 75.72n ± 1% 48.07n ± 1% -36.52% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=15/align=1 75.70n ± 1% 46.99n ± 2% -37.93% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=40/align=0 87.91n ± 0% 64.89n ± 0% -26.19% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=40/align=1 87.91n ± 0% 72.12n ± 1% -17.97% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=512/align=0 619.8n ± 0% 514.3n ± 0% -17.02% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=512/align=1 619.8n ± 0% 521.7n ± 0% -15.83% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=1kB/align=0 1.202µ ± 0% 1.001µ ± 0% -16.72% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=1kB/align=1 1.202µ ± 0% 1.009µ ± 0% -16.06% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=4kB/align=0 4.663µ ± 0% 3.924µ ± 0% -15.85% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=4kB/align=1 4.663µ ± 0% 3.931µ ± 0% -15.70% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=32kB/align=0 36.96µ ± 0% 31.20µ ± 0% -15.60% (p=0.000 n=20)
CRC32/poly=Castagnoli/size=32kB/align=1 36.96µ ± 0% 31.21µ ± 0% -15.57% (p=0.000 n=20)
CRC32/poly=Koopman/size=15/align=0 74.91n ± 1% 74.95n ± 1% ~ (p=0.963 n=20)
CRC32/poly=Koopman/size=15/align=1 74.91n ± 1% 75.02n ± 1% ~ (p=0.909 n=20)
CRC32/poly=Koopman/size=40/align=0 165.0n ± 0% 165.0n ± 0% ~ (p=0.865 n=20)
CRC32/poly=Koopman/size=40/align=1 165.1n ± 0% 165.0n ± 0% ~ (p=0.342 n=20)
CRC32/poly=Koopman/size=512/align=0 1.867µ ± 0% 1.867µ ± 0% ~ (p=0.320 n=20)
CRC32/poly=Koopman/size=512/align=1 1.867µ ± 0% 1.867µ ± 0% ~ (p=0.782 n=20)
CRC32/poly=Koopman/size=1kB/align=0 3.712µ ± 0% 3.712µ ± 0% ~ (p=0.859 n=20)
CRC32/poly=Koopman/size=1kB/align=1 3.712µ ± 0% 3.713µ ± 0% ~ (p=0.175 n=20)
CRC32/poly=Koopman/size=4kB/align=0 14.79µ ± 0% 14.79µ ± 0% ~ (p=0.826 n=20)
CRC32/poly=Koopman/size=4kB/align=1 14.79µ ± 0% 14.79µ ± 0% ~ (p=0.169 n=20)
CRC32/poly=Koopman/size=32kB/align=0 118.1µ ± 0% 118.1µ ± 0% ~ (p=0.941 n=20)
CRC32/poly=Koopman/size=32kB/align=1 118.1µ ± 0% 118.1µ ± 0% ~ (p=0.473 n=20)
geomean 1.299µ 1.109µ -14.68%
Performance of poly=Koopman is not affected.
This patch is a copy of CL 478596.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: I345192cdf693f21fe1015a8b8361ca68ac780c9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/624355
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>