]> Cypherpunks repositories - gostls13.git/commit
internal/bytealg: vector implementation of equal for riscv64
authorJoel Sing <joel@sing.id.au>
Wed, 12 Feb 2025 12:41:22 +0000 (23:41 +1100)
committerJoel Sing <joel@sing.id.au>
Wed, 6 Aug 2025 13:18:46 +0000 (06:18 -0700)
commit75ea2d05c01903a69dbdcd15e64b934da73c84ea
tree0a1f80244aeab91708a700376490faea102986dd
parent17a8be71178dab07438cfffb84d2588f2a66bea1
internal/bytealg: vector implementation of equal for riscv64

Provide a vector implementation of equal for riscv64, which is used
when compiled with the rva23u64 profile, or when vector is detected
to be available. Inputs that are 8 byte aligned will still be handled
via a the non-vector code if the length is less than or equal to 64
bytes.

On a Banana Pi F3, with GORISCV64=rva23u64:

                                │   equal.1    │               equal.2                │
                                │    sec/op    │    sec/op     vs base                │
Equal/0-8                         1.254n ±  0%   1.254n ±  0%        ~ (p=1.000 n=10)
Equal/same/1-8                    21.32n ±  0%   21.32n ±  0%        ~ (p=0.466 n=10)
Equal/same/6-8                    21.32n ±  0%   21.32n ±  0%        ~ (p=0.689 n=10)
Equal/same/9-8                    21.32n ±  0%   21.32n ±  0%        ~ (p=0.861 n=10)
Equal/same/15-8                   21.32n ±  0%   21.32n ±  0%        ~ (p=0.657 n=10)
Equal/same/16-8                   21.32n ±  0%   21.33n ±  0%        ~ (p=0.075 n=10)
Equal/same/20-8                   21.32n ±  0%   21.32n ±  0%        ~ (p=0.249 n=10)
Equal/same/32-8                   21.32n ±  0%   21.32n ±  0%        ~ (p=0.303 n=10)
Equal/same/4K-8                   21.32n ±  0%   21.32n ±  0%        ~ (p=1.000 n=10)
Equal/same/4M-8                   21.32n ±  0%   21.32n ±  0%        ~ (p=0.582 n=10)
Equal/same/64M-8                  21.32n ±  0%   21.32n ±  0%        ~ (p=0.930 n=10)
Equal/1-8                         39.16n ±  1%   38.71n ±  0%   -1.15% (p=0.000 n=10)
Equal/6-8                         51.49n ±  1%   50.40n ±  1%   -2.12% (p=0.000 n=10)
Equal/9-8                         54.46n ±  1%   53.89n ±  0%   -1.04% (p=0.000 n=10)
Equal/15-8                        71.81n ±  1%   70.59n ±  0%   -1.71% (p=0.000 n=10)
Equal/16-8                        69.14n ±  0%   68.21n ±  0%   -1.34% (p=0.000 n=10)
Equal/20-8                        78.59n ±  0%   77.59n ±  0%   -1.26% (p=0.000 n=10)
Equal/32-8                        41.55n ±  0%   41.16n ±  0%   -0.96% (p=0.000 n=10)
Equal/4K-8                        925.5n ±  0%   561.4n ±  1%  -39.34% (p=0.000 n=10)
Equal/4M-8                        3.110m ± 32%   2.463m ± 16%  -20.80% (p=0.000 n=10)
Equal/64M-8                       47.34m ± 30%   39.89m ± 16%  -15.75% (p=0.004 n=10)
EqualBothUnaligned/64_0-8         32.17n ±  1%   32.11n ±  1%        ~ (p=0.184 n=10)
EqualBothUnaligned/64_1-8         79.48n ±  0%   48.24n ±  1%  -39.31% (p=0.000 n=10)
EqualBothUnaligned/64_4-8         72.71n ±  0%   48.37n ±  1%  -33.48% (p=0.000 n=10)
EqualBothUnaligned/64_7-8         77.12n ±  0%   48.16n ±  1%  -37.56% (p=0.000 n=10)
EqualBothUnaligned/4096_0-8       908.4n ±  0%   562.4n ±  2%  -38.09% (p=0.000 n=10)
EqualBothUnaligned/4096_1-8       956.6n ±  0%   571.4n ±  3%  -40.26% (p=0.000 n=10)
EqualBothUnaligned/4096_4-8       949.6n ±  0%   571.6n ±  3%  -39.81% (p=0.000 n=10)
EqualBothUnaligned/4096_7-8       954.2n ±  0%   571.7n ±  3%  -40.09% (p=0.000 n=10)
EqualBothUnaligned/4194304_0-8    2.935m ± 29%   2.664m ± 19%        ~ (p=0.089 n=10)
EqualBothUnaligned/4194304_1-8    3.341m ± 13%   2.896m ± 34%        ~ (p=0.075 n=10)
EqualBothUnaligned/4194304_4-8    3.204m ± 39%   3.352m ± 33%        ~ (p=0.796 n=10)
EqualBothUnaligned/4194304_7-8    3.226m ± 30%   2.737m ± 34%  -15.16% (p=0.043 n=10)
EqualBothUnaligned/67108864_0-8   49.04m ± 17%   39.94m ± 12%  -18.57% (p=0.005 n=10)
EqualBothUnaligned/67108864_1-8   51.96m ± 15%   42.48m ± 15%  -18.23% (p=0.015 n=10)
EqualBothUnaligned/67108864_4-8   47.67m ± 17%   37.85m ± 41%  -20.61% (p=0.035 n=10)
EqualBothUnaligned/67108864_7-8   53.00m ± 22%   38.76m ± 21%  -26.87% (p=0.000 n=10)
CompareBytesEqual-8               51.71n ±  1%   52.00n ±  0%   +0.57% (p=0.002 n=10)
geomean                           1.469µ         1.265µ        -13.93%

                                │    equal.1     │                equal.2                 │
                                │      B/s       │      B/s        vs base                │
Equal/same/1-8                     44.73Mi ±  0%    44.72Mi ±  0%        ~ (p=0.426 n=10)
Equal/same/6-8                     268.3Mi ±  0%    268.4Mi ±  0%        ~ (p=0.753 n=10)
Equal/same/9-8                     402.6Mi ±  0%    402.5Mi ±  0%        ~ (p=0.209 n=10)
Equal/same/15-8                    670.9Mi ±  0%    670.9Mi ±  0%        ~ (p=0.724 n=10)
Equal/same/16-8                    715.6Mi ±  0%    715.4Mi ±  0%   -0.04% (p=0.022 n=10)
Equal/same/20-8                    894.6Mi ±  0%    894.5Mi ±  0%        ~ (p=0.060 n=10)
Equal/same/32-8                    1.398Gi ±  0%    1.398Gi ±  0%        ~ (p=0.986 n=10)
Equal/same/4K-8                    178.9Gi ±  0%    178.9Gi ±  0%        ~ (p=0.853 n=10)
Equal/same/4M-8                    178.9Ti ±  0%    178.9Ti ±  0%        ~ (p=0.971 n=10)
Equal/same/64M-8                  2862.8Ti ±  0%   2862.6Ti ±  0%        ~ (p=0.971 n=10)
Equal/1-8                          24.35Mi ±  1%    24.63Mi ±  0%   +1.16% (p=0.000 n=10)
Equal/6-8                          111.1Mi ±  1%    113.5Mi ±  1%   +2.17% (p=0.000 n=10)
Equal/9-8                          157.6Mi ±  1%    159.3Mi ±  0%   +1.05% (p=0.000 n=10)
Equal/15-8                         199.2Mi ±  1%    202.7Mi ±  0%   +1.74% (p=0.000 n=10)
Equal/16-8                         220.7Mi ±  0%    223.7Mi ±  0%   +1.36% (p=0.000 n=10)
Equal/20-8                         242.7Mi ±  0%    245.8Mi ±  0%   +1.27% (p=0.000 n=10)
Equal/32-8                         734.3Mi ±  0%    741.6Mi ±  0%   +0.98% (p=0.000 n=10)
Equal/4K-8                         4.122Gi ±  0%    6.795Gi ±  1%  +64.84% (p=0.000 n=10)
Equal/4M-8                         1.258Gi ± 24%    1.586Gi ± 14%  +26.12% (p=0.000 n=10)
Equal/64M-8                        1.320Gi ± 23%    1.567Gi ± 14%  +18.69% (p=0.004 n=10)
EqualBothUnaligned/64_0-8          1.853Gi ±  1%    1.856Gi ±  1%        ~ (p=0.190 n=10)
EqualBothUnaligned/64_1-8          767.9Mi ±  0%   1265.2Mi ±  1%  +64.76% (p=0.000 n=10)
EqualBothUnaligned/64_4-8          839.4Mi ±  0%   1261.9Mi ±  1%  +50.33% (p=0.000 n=10)
EqualBothUnaligned/64_7-8          791.4Mi ±  0%   1267.5Mi ±  1%  +60.16% (p=0.000 n=10)
EqualBothUnaligned/4096_0-8        4.199Gi ±  0%    6.784Gi ±  2%  +61.54% (p=0.000 n=10)
EqualBothUnaligned/4096_1-8        3.988Gi ±  0%    6.676Gi ±  3%  +67.40% (p=0.000 n=10)
EqualBothUnaligned/4096_4-8        4.017Gi ±  0%    6.674Gi ±  3%  +66.14% (p=0.000 n=10)
EqualBothUnaligned/4096_7-8        3.998Gi ±  0%    6.673Gi ±  3%  +66.92% (p=0.000 n=10)
EqualBothUnaligned/4194304_0-8     1.332Gi ± 22%    1.468Gi ± 16%        ~ (p=0.089 n=10)
EqualBothUnaligned/4194304_1-8     1.169Gi ± 12%    1.350Gi ± 25%        ~ (p=0.075 n=10)
EqualBothUnaligned/4194304_4-8     1.222Gi ± 28%    1.165Gi ± 48%        ~ (p=0.796 n=10)
EqualBothUnaligned/4194304_7-8     1.211Gi ± 23%    1.427Gi ± 26%  +17.88% (p=0.043 n=10)
EqualBothUnaligned/67108864_0-8    1.274Gi ± 14%    1.567Gi ± 14%  +22.97% (p=0.005 n=10)
EqualBothUnaligned/67108864_1-8    1.204Gi ± 14%    1.471Gi ± 13%  +22.18% (p=0.015 n=10)
EqualBothUnaligned/67108864_4-8    1.311Gi ± 14%    1.651Gi ± 29%  +25.92% (p=0.035 n=10)
EqualBothUnaligned/67108864_7-8    1.179Gi ± 18%    1.612Gi ± 17%  +36.73% (p=0.000 n=10)
geomean                            1.870Gi          2.190Gi        +17.16%

Change-Id: I9c5270bcc6997d020a96d1e97c7e7cfc7ca7fd34
Reviewed-on: https://go-review.googlesource.com/c/go/+/646736
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
src/internal/bytealg/bytealg.go
src/internal/bytealg/equal_riscv64.s