Replace the 128-bit multiplication in 4 parts with bits.Mul64
and two single-width multiplications. This simplifies the code
and increases throughput by ~50% on amd64.
name old time/op new time/op delta
Fnv128KB-4 9.64µs ± 0% 6.09µs ± 0% -36.89% (p=0.016 n=4+5)
Fnv128aKB-4 9.11µs ± 0% 6.17µs ± 5% -32.32% (p=0.008 n=5+5)
name old speed new speed delta
Fnv128KB-4 106MB/s ± 0% 168MB/s ± 0% +58.44% (p=0.016 n=4+5)
Fnv128aKB-4 112MB/s ± 0% 166MB/s ± 5% +47.85% (p=0.008 n=5+5)
Change-Id: Id752f2a20ea3de23a41e08db89eecf2bb60b7e6d
Reviewed-on: https://go-review.googlesource.com/c/133936
Run-TryBot: Matt Layher <mdlayher@gmail.com>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matt Layher <mdlayher@gmail.com> Reviewed-by: Robert Griesemer <gri@golang.org>