]> Cypherpunks repositories - gostls13.git/commit
hash/crc32: improve the AMD64 implementation using SSE4.2
authorRadu Berinde <radu@cockroachlabs.com>
Sat, 27 Aug 2016 17:17:30 +0000 (13:17 -0400)
committerBrad Fitzpatrick <bradfitz@golang.org>
Sun, 28 Aug 2016 01:39:03 +0000 (01:39 +0000)
commit90c3cf4b52cc9373a96009da0d013019c1f5bcd8
treee2202155e9b9b55b4d3086840941f5290d99645b
parent320bd562cbb24a01beb02706c42d06a290160645
hash/crc32: improve the AMD64 implementation using SSE4.2

The algorithm is explained in the comments. The improvement in
throughput is about 1.4x for buffers between 500b-4Kb and 2.5x-2.6x
for larger buffers.

Additionally, we no longer initialize the software tables if SSE4.2 is
available.

Adding a test for the SSE implementation (restricted to amd64 and
amd64p32).

Benchmarks on a Haswell i5-4670 @ 3.4 GHz:

name                           old time/op    new time/op     delta
CastagnoliCrc15B-4               21.9ns ± 1%     22.9ns ± 0%    +4.45%
CastagnoliCrc15BMisaligned-4     22.6ns ± 0%     23.4ns ± 0%    +3.43%
CastagnoliCrc40B-4               23.3ns ± 0%     23.9ns ± 0%    +2.58%
CastagnoliCrc40BMisaligned-4     25.4ns ± 0%     26.1ns ± 0%    +2.86%
CastagnoliCrc512-4               72.6ns ± 0%     52.8ns ± 0%   -27.33%
CastagnoliCrc512Misaligned-4     76.3ns ± 1%     56.3ns ± 0%   -26.18%
CastagnoliCrc1KB-4                128ns ± 1%       89ns ± 0%   -30.04%
CastagnoliCrc1KBMisaligned-4      130ns ± 0%       88ns ± 0%   -32.65%
CastagnoliCrc4KB-4                461ns ± 0%      187ns ± 0%   -59.40%
CastagnoliCrc4KBMisaligned-4      463ns ± 0%      191ns ± 0%   -58.77%
CastagnoliCrc32KB-4              3.58µs ± 0%     1.35µs ± 0%   -62.22%
CastagnoliCrc32KBMisaligned-4    3.58µs ± 0%     1.36µs ± 0%   -61.84%

name                           old speed      new speed       delta
CastagnoliCrc15B-4              684MB/s ± 1%    655MB/s ± 0%    -4.32%
CastagnoliCrc15BMisaligned-4    663MB/s ± 0%    641MB/s ± 0%    -3.32%
CastagnoliCrc40B-4             1.72GB/s ± 0%   1.67GB/s ± 0%    -2.69%
CastagnoliCrc40BMisaligned-4   1.58GB/s ± 0%   1.53GB/s ± 0%    -2.82%
CastagnoliCrc512-4             7.05GB/s ± 0%   9.70GB/s ± 0%   +37.59%
CastagnoliCrc512Misaligned-4   6.71GB/s ± 1%   9.09GB/s ± 0%   +35.43%
CastagnoliCrc1KB-4             7.98GB/s ± 1%  11.46GB/s ± 0%   +43.55%
CastagnoliCrc1KBMisaligned-4   7.86GB/s ± 0%  11.70GB/s ± 0%   +48.75%
CastagnoliCrc4KB-4             8.87GB/s ± 0%  21.80GB/s ± 0%  +145.69%
CastagnoliCrc4KBMisaligned-4   8.83GB/s ± 0%  21.39GB/s ± 0%  +142.25%
CastagnoliCrc32KB-4            9.15GB/s ± 0%  24.22GB/s ± 0%  +164.62%
CastagnoliCrc32KBMisaligned-4  9.16GB/s ± 0%  24.00GB/s ± 0%  +161.94%

Fixes #16107.

Change-Id: Ibe50ea76574674ce0571ef31c31015e0ed66b907
Reviewed-on: https://go-review.googlesource.com/27931
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
src/hash/crc32/crc32.go
src/hash/crc32/crc32_amd64.go
src/hash/crc32/crc32_amd64.s
src/hash/crc32/crc32_amd64_test.go [new file with mode: 0644]
src/hash/crc32/crc32_amd64p32.go
src/hash/crc32/crc32_generic.go
src/hash/crc32/crc32_s390x.go
src/hash/crc32/crc32_test.go