]> Cypherpunks repositories - gostls13.git/commit
hash/crc32: improve asm for ppc64SlicingUpdateBy8
authorJayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Mon, 29 Apr 2024 17:37:27 +0000 (12:37 -0500)
committerLynn Boger <laboger@linux.vnet.ibm.com>
Mon, 6 May 2024 12:09:50 +0000 (12:09 +0000)
commitac174400f460e9b577079e8606439e0bae62adb0
tree4eb1ceb69e41760f9cbd6d3c9f51efc4a30b3c92
parentff0bc4669e00b590df4f185e417ed6dc1818e566
 hash/crc32: improve asm for ppc64SlicingUpdateBy8

Improvements are made in the assembler code which improves  time and
space by 9-10%.
1. ANDCC, followed by SLD is combined and replaced by CLRLSLDI.
2. MOVWZ can use an indexed load and eliminate an ADD instruction in some cases.
Example: ADD R7,R10,R7 followed by MOVWZ 0(R7),R5 can be replaced with just MOVWZ (R7)(R10),R5.
3. Optimizations for the block after the "short" label includes the same MOVWZ use of indexed load, as well as other improvements.

The gain from code  changes can be seen as follows, generated by
benchstat:

goos: linux
goarch: ppc64le
pkg: hash/crc32
cpu: POWER10
                                     |  oldCrc.out   |  newCrc.out                       |
                                     |    sec/op     |   sec/op     vs base              |
CRC32/poly=IEEE/size=15/align=0-12      50.19n ±  1%   39.85n ± 0%  -20.59% (p=0.002 n=6)
CRC32/poly=IEEE/size=15/align=1-12      50.18n ±  1%   39.87n ± 0%  -20.54% (p=0.002 n=6)
CRC32/poly=IEEE/size=40/align=0-12      40.25n ±  0%   36.95n ± 0%   -8.19% (p=0.002 n=6)
CRC32/poly=IEEE/size=40/align=1-12      40.31n ±  0%   36.95n ± 0%   -8.36% (p=0.002 n=6)
CRC32/poly=IEEE/size=512/align=0-12     38.03n ±  0%   38.17n ± 0%   +0.37% (p=0.002 n=6)
CRC32/poly=IEEE/size=512/align=1-12     89.19n ±  1%   73.65n ± 0%  -17.43% (p=0.002 n=6)
CRC32/poly=IEEE/size=1kB/align=0-12     50.73n ±  7%   50.14n ± 0%   -1.18% (p=0.002 n=6)
CRC32/poly=IEEE/size=1kB/align=1-12    101.00n ± 37%   81.58n ± 0%  -19.23% (p=0.002 n=6)
CRC32/poly=IEEE/size=4kB/align=0-12     98.30n ± 45%   93.05n ± 0%   -5.34% (p=0.043 n=6)
CRC32/poly=IEEE/size=4kB/align=1-12     140.8n ±  0%   125.8n ± 0%  -10.65% (p=0.002 n=6)
CRC32/poly=IEEE/size=32kB/align=0-12    525.8n ±  0%   528.5n ± 0%   +0.52% (p=0.011 n=6)
CRC32/poly=IEEE/size=32kB/align=1-12    584.4n ±  1%   576.3n ± 0%   -1.39% (p=0.002 n=6)
geomean                                 90.51n         81.74n        -9.69%

                             |    oldCrc.out |    newCrc.out              |
                                     |      B/s      |     B/s       vs base       |
CRC32/poly=IEEE/size=15/align=0-12     285.0Mi ±  1%    359.0Mi ± 0%  +25.94% (p=0.002 n=6)
CRC32/poly=IEEE/size=15/align=1-12     285.1Mi ±  1%    358.8Mi ± 0%  +25.86% (p=0.002 n=6)
CRC32/poly=IEEE/size=40/align=0-12     947.8Mi ±  0%   1032.3Mi ± 0%   +8.91% (p=0.002 n=6)
CRC32/poly=IEEE/size=40/align=1-12     946.2Mi ±  0%   1032.5Mi ± 0%   +9.12% (p=0.002 n=6)
CRC32/poly=IEEE/size=512/align=0-12    12.54Gi ±  0%    12.49Gi ± 0%   -0.37% (p=0.002 n=6)
CRC32/poly=IEEE/size=512/align=1-12    5.346Gi ±  1%    6.475Gi ± 0%  +21.12% (p=0.002 n=6)
CRC32/poly=IEEE/size=1kB/align=0-12    18.80Gi ±  7%    19.02Gi ± 0%   +1.20% (p=0.002 n=6)
CRC32/poly=IEEE/size=1kB/align=1-12    9.454Gi ± 27%   11.690Gi ± 0%  +23.66% (p=0.002 n=6)
CRC32/poly=IEEE/size=4kB/align=0-12    38.86Gi ± 31%    41.00Gi ± 0%   +5.49% (p=0.041 n=6)
CRC32/poly=IEEE/size=4kB/align=1-12    27.10Gi ±  0%    30.32Gi ± 0%  +11.89% (p=0.002 n=6)
CRC32/poly=IEEE/size=32kB/align=0-12   58.05Gi ±  0%    57.74Gi ± 0%   -0.53% (p=0.009 n=6)
CRC32/poly=IEEE/size=32kB/align=1-12   52.22Gi ±  1%    52.95Gi ± 0%   +1.41% (p=0.002 n=6)
geomean                                6.074Gi          6.724Gi       +10.70%

Change-Id: I378c0e84e798656384a8009f4ac48b51614489b2
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/582395
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
src/hash/crc32/crc32_ppc64le.s