Adding PCALIGN in indexbodyp9 function shows
improvements in some SimonWaldherr benchmarks and one of the index
benchmarks on both Power9 and Power10
name old time/op new time/op delta
Contains 19.8ns ± 0% 15.6ns ± 0% -21.24%
ContainsNot 21.3ns ± 0% 18.9ns ± 0% -11.03%
ContainsBytes 19.1ns ± 0% 16.0ns ± 0% -16.54%
Index/10 17.3ns ± 0% 16.1ns ± 0% -7.30%
Index/32 59.6ns ± 0% 59.6ns ± 0% +0.12%
Index/4K 3.68µs ± 0% 3.68µs ± 0% ~
Index/4M 3.74ms ± 0% 3.74ms ± 0% -0.00%
Index/64M 59.8ms ± 0% 59.8ms ± 0% ~
Change-Id: I784e57e0b0f5bac143f57f3a32845219e43d47fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/447595
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
BGT index2to16tail
MOVD $3, R17 // Number of bytes beyond 16
-
+ PCALIGN $32
index2to16loop:
LXVB16X (R7)(R0), V1 // Load next 16 bytes of string into V1 from R7
LXVB16X (R7)(R17), V5 // Load next 16 bytes of string into V5 from R7+3
MTVSRD R10, V8 // Set up shift
VSLDOI $8, V8, V8, V8
VSLO V1, V8, V1 // Shift by start byte
-
+ PCALIGN $32
index2to16next:
VAND V1, SEPMASK, V2 // Just compare size of sep
VCMPEQUBCC V0, V2, V3 // Compare sep and partial string