]> Cypherpunks repositories - gostls13.git/commit
runtime: improve memclr on ppc64x/power10
authorArchana R <aravind5@in.ibm.com>
Mon, 13 Feb 2023 12:18:59 +0000 (06:18 -0600)
committerLynn Boger <laboger@linux.vnet.ibm.com>
Wed, 22 Feb 2023 18:53:31 +0000 (18:53 +0000)
commite6faa375b4804588cdeb67eff9d01add8093f6e1
treeb413cbae4de0d92ccd5618e881be7ac723cbf862
parentcad872f75a1946ea49397a33e269dabe56febff6
runtime: improve memclr on ppc64x/power10

Rewrite memclr asm function to use the new power10 instruction stxvl
or the store vector with length which can specify the number of bytes
to be stored in a register, thereby avoiding loops to store the tail
end bytes.
On power9 and power8 the code remains unchanged.
The performance for all sizes<16 improve on power10 with this change.

name          old time/op    new time/op     delta
Memclr/1        2.59ns ± 1%     2.80ns ± 9%      ~
Memclr/2        2.77ns ± 0%     2.90ns ± 8%      ~
Memclr/3        3.70ns ± 1%     3.00ns ± 1%   -19.02%
Memclr/4        4.56ns ± 5%     2.98ns ± 1%   -34.56%
Memclr/5        10.1ns ± 8%      2.9ns ± 4%   -70.82%
Memclr/6        6.99ns ± 4%     2.84ns ± 9%   -59.40%
Memclr/7        8.66ns ± 5%     2.92ns ± 4%   -66.27%
Memclr/8        2.75ns ± 1%     2.75ns ± 0%      ~
Memclr/9        3.43ns ± 1%     2.75ns ± 0%   -19.78%
Memclr/10       3.30ns ± 0%     2.77ns ± 0%   -15.81%
Memclr/12       5.09ns ± 0%     2.78ns ± 0%   -45.49%
Memclr/15       9.79ns ± 0%     2.78ns ± 0%   -71.64%
Memclr/16       2.65ns ± 1%     2.77ns ± 0%    +4.77%
GoMemclr/1      2.46ns ± 3%     2.46ns ± 3%      ~
GoMemclr/2      2.91ns ± 0%     2.46ns ± 1%   -15.38%
GoMemclr/3      4.85ns ± 6%     2.45ns ± 1%   -49.50%
GoMemclr/4      5.44ns ± 5%     2.47ns ± 1%   -54.63%
GoMemclr/5      6.48ns ± 0%     2.45ns ± 1%   -62.20%
GoMemclr/6      9.33ns ±20%     2.45ns ± 1%   -73.76%
GoMemclr/7      8.79ns ± 2%     2.46ns ± 1%   -72.03%
GoMemclr/8      2.91ns ± 0%     2.84ns ± 0%    -2.35%
GoMemclr/9      5.04ns ± 0%     2.84ns ± 0%   -43.67%
GoMemclr/10     5.17ns ± 1%     2.84ns ± 0%   -44.99%
GoMemclr/12     6.22ns ± 0%     2.84ns ± 0%   -54.41%
GoMemclr/15     10.1ns ± 0%      2.8ns ± 0%   -72.04%
GoMemclr/16     2.90ns ± 0%     2.92ns ± 0%    +0.83%

Change-Id: I2b7e1a9ceafba11c57c04a2e8cbd12b5ac87ec9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/467635
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
src/runtime/memclr_ppc64x.s