hash/crc32: speed up IEEE crc32 using slicing-by-8
The Slicing-By-8 [1] algorithm performs much better than the
current approach. This patch uses it only for IEEE, which is the most
common case in practice.
Benchmark results on Mac OS X 10.9:
benchmark                     old MB/s     new MB/s     speedup
BenchmarkIEEECrc1KB           349.40       353.03       1.01x
BenchmarkIEEECrc4KB           351.55       934.35       2.66x
BenchmarkCastagnoliCrc1KB     7037.58      7392.63      1.05x
This algorithm needs an 8K lookup table, so it is enabled only for
blocks larger than 4K.