hash/crc32: add AVX512 IEEE CRC32 calculation
Benchmark:
goos: windows
goarch: amd64
pkg: hash/crc32
cpu: AMD Ryzen 9 9950X 16-Core Processor
benchmark old MB/s new MB/s speedup
BenchmarkCRC32/poly=IEEE/size=15/align=0-32 1081.48 1089.42 1.01x
BenchmarkCRC32/poly=IEEE/size=15/align=1-32 1085.87 1082.61 1.00x
BenchmarkCRC32/poly=IEEE/size=40/align=0-32 2756.33 2752.37 1.00x
BenchmarkCRC32/poly=IEEE/size=40/align=1-32 2758.27 2756.99 1.00x
BenchmarkCRC32/poly=IEEE/size=512/align=0-32 18133.44 18076.52 1.00x
BenchmarkCRC32/poly=IEEE/size=512/align=1-32 18151.05 18055.41 0.99x
BenchmarkCRC32/poly=IEEE/size=1kB/align=0-32 19902.93 48581.07 2.44x
BenchmarkCRC32/poly=IEEE/size=1kB/align=1-32 19966.99 48393.25 2.42x
BenchmarkCRC32/poly=IEEE/size=4kB/align=0-32 21690.33 51679.25 2.38x
BenchmarkCRC32/poly=IEEE/size=4kB/align=1-32 21655.30 51731.22 2.39x
BenchmarkCRC32/poly=IEEE/size=32kB/align=0-32 22046.57 46406.90 2.10x
BenchmarkCRC32/poly=IEEE/size=32kB/align=1-32 21986.22 46250.66 2.10x
AVX512 are enabled above 1KB input size.
This rather high limit is due to AVX512 may be slower to ramp up
than the regular SSE4 implementation for smaller inputs.
This is not reflected in the benchmarks,
since consecutive calls means the CPU is "hot".
The 'HasAVX512VPCLMULQDQ' name mirrors the one in golang.org/x/sys/cpu
Change-Id: Id23685d8e3cc412b6d397a7d70056844bdb79271
Change-Id: Id23685d8e3cc412b6d397a7d70056844bdb79271
GitHub-Last-Rev:
6639f07b9febc7c96a7f3b402a2fd60f7be5e154
GitHub-Pull-Request: golang/go#74701
Reviewed-on: https://go-review.googlesource.com/c/go/+/689435
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>