Cypherpunks repositories - gostls13.git/commit

crypto/md5: improve ppc64x performance

This is mostly cleanup and simplification.  This removes
many unneeded register moves, loads, and bit twiddlings
which were holdovers from porting this from the amd64
version.

The updated code loads each block once per iteration
instead of once per round. Similarly, the logical
operations now match the original md5 specification.

Likewise, add extra sizes to the benchtest to give more
data points on how the implementation scales with input
size.

All in all, this is roughly a 20% improvement on ppc64le
code running on POWER9 (POWER8 is similar, but around
16%):

name                 old time/op    new time/op    delta
Hash8Bytes              297ns ± 0%     255ns ± 0%  -14.14%
Hash64                  527ns ± 0%     444ns ± 0%  -15.76%
Hash128                 771ns ± 0%     645ns ± 0%  -16.35%
Hash256                1.26µs ± 0%    1.05µs ± 0%  -16.68%
Hash512                2.23µs ± 0%    1.85µs ± 0%  -16.82%
Hash1K                 4.16µs ± 0%    3.46µs ± 0%  -16.83%
Hash8K                 31.2µs ± 0%    26.0µs ± 0%  -16.74%
Hash1M                 3.58ms ± 0%    2.98ms ± 0%  -16.74%
Hash8M                 26.1ms ± 0%    21.7ms ± 0%  -16.81%
Hash8BytesUnaligned     297ns ± 0%     255ns ± 0%  -14.08%
Hash1KUnaligned        4.16µs ± 0%    3.46µs ± 0%  -16.79%
Hash8KUnaligned        31.2µs ± 0%    26.0µs ± 0%  -16.78%

name                 old speed      new speed      delta
Hash8Bytes           26.9MB/s ± 0%  31.4MB/s ± 0%  +16.45%
Hash64                122MB/s ± 0%   144MB/s ± 0%  +18.69%
Hash128               166MB/s ± 0%   199MB/s ± 0%  +19.54%
Hash256               203MB/s ± 0%   244MB/s ± 0%  +20.01%
Hash512               230MB/s ± 0%   276MB/s ± 0%  +20.18%
Hash1K                246MB/s ± 0%   296MB/s ± 0%  +20.26%
Hash8K                263MB/s ± 0%   315MB/s ± 0%  +20.11%
Hash1M                293MB/s ± 0%   352MB/s ± 0%  +20.10%
Hash8M                321MB/s ± 0%   386MB/s ± 0%  +20.21%
Hash8BytesUnaligned  26.9MB/s ± 0%  31.4MB/s ± 0%  +16.41%
Hash1KUnaligned       246MB/s ± 0%   296MB/s ± 0%  +20.19%
Hash8KUnaligned       263MB/s ± 0%   315MB/s ± 0%  +20.15%

Change-Id: I269bfa6878966bb4f6a64dc349100f5dc453ab7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/300613
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>

author	Paul E. Murphy <murp@ibm.com>
	Wed, 10 Mar 2021 23:06:54 +0000 (17:06 -0600)
committer	Lynn Boger <laboger@linux.vnet.ibm.com>
	Mon, 15 Mar 2021 12:30:38 +0000 (12:30 +0000)
commit	4350e4961a6ea3d36a33271423735b37c96dd5bf
tree	6315d542cb50cdb176f4b73e098689fe9004d424	tree
parent	6ccb5c49cca52766f6d288d128db67be6392c579	commit \| diff

src/crypto/md5/md5_test.go		diff \| blob \| history
src/crypto/md5/md5block_ppc64x.s		diff \| blob \| history