]> Cypherpunks repositories - gostls13.git/commit
crypto/internal/fips140/subtle: provide riscv64 assembly implementation for xorBytes
authorJoel Sing <joel@sing.id.au>
Fri, 3 Jan 2025 08:06:18 +0000 (19:06 +1100)
committerJoel Sing <joel@sing.id.au>
Sat, 15 Feb 2025 04:58:15 +0000 (20:58 -0800)
commitc62c69dd5c1af0e25c76071f8987480680f09222
tree1b183582d0e231da2889951f9ee6e000fa2de990
parent63ae4167208259fea30769e7baf8ef5b4c73ef4e
crypto/internal/fips140/subtle: provide riscv64 assembly implementation for xorBytes

Provide a riscv64 assembly implementation of xorBytes, which
can process up to 64 bytes per loop and has better handling
for unaligned inputs. This provides a considerable performance
gain compared to the generic code.

On a StarFive VisionFive 2:

                                     │   subtle.1   │              subtle.2               │
                                     │    sec/op    │   sec/op     vs base                │
XORBytes/8Bytes-4                       59.54n ± 0%   58.15n ± 0%   -2.33% (p=0.000 n=10)
XORBytes/128Bytes-4                    125.60n ± 0%   74.93n ± 0%  -40.35% (p=0.000 n=10)
XORBytes/2048Bytes-4                   1088.5n ± 0%   602.4n ± 0%  -44.66% (p=0.000 n=10)
XORBytes/8192Bytes-4                    4.163µ ± 0%   2.271µ ± 0%  -45.45% (p=0.000 n=10)
XORBytes/32768Bytes-4                   35.47µ ± 0%   28.12µ ± 0%  -20.74% (p=0.000 n=10)
XORBytesUnaligned/8Bytes0Offset-4       59.80n ± 0%   57.48n ± 0%   -3.86% (p=0.000 n=10)
XORBytesUnaligned/8Bytes1Offset-4       72.97n ± 0%   57.48n ± 0%  -21.23% (p=0.000 n=10)
XORBytesUnaligned/8Bytes2Offset-4       72.97n ± 0%   57.50n ± 0%  -21.21% (p=0.000 n=10)
XORBytesUnaligned/8Bytes3Offset-4       72.99n ± 0%   57.48n ± 0%  -21.26% (p=0.000 n=10)
XORBytesUnaligned/8Bytes4Offset-4       72.96n ± 0%   57.44n ± 0%  -21.28% (p=0.000 n=10)
XORBytesUnaligned/8Bytes5Offset-4       72.93n ± 0%   57.48n ± 0%  -21.18% (p=0.000 n=10)
XORBytesUnaligned/8Bytes6Offset-4       72.97n ± 0%   57.47n ± 0%  -21.25% (p=0.000 n=10)
XORBytesUnaligned/8Bytes7Offset-4       72.96n ± 0%   57.47n ± 0%  -21.24% (p=0.000 n=10)
XORBytesUnaligned/128Bytes0Offset-4    125.30n ± 0%   74.18n ± 0%  -40.80% (p=0.000 n=10)
XORBytesUnaligned/128Bytes1Offset-4     557.4n ± 0%   131.1n ± 0%  -76.48% (p=0.000 n=10)
XORBytesUnaligned/128Bytes2Offset-4     557.3n ± 0%   132.5n ± 0%  -76.22% (p=0.000 n=10)
XORBytesUnaligned/128Bytes3Offset-4     557.6n ± 0%   133.7n ± 0%  -76.02% (p=0.000 n=10)
XORBytesUnaligned/128Bytes4Offset-4     557.4n ± 0%   125.0n ± 0%  -77.57% (p=0.000 n=10)
XORBytesUnaligned/128Bytes5Offset-4     557.7n ± 0%   125.7n ± 0%  -77.46% (p=0.000 n=10)
XORBytesUnaligned/128Bytes6Offset-4     557.5n ± 0%   127.0n ± 0%  -77.22% (p=0.000 n=10)
XORBytesUnaligned/128Bytes7Offset-4     557.7n ± 0%   128.3n ± 0%  -76.99% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes0Offset-4   1088.5n ± 0%   601.9n ± 0%  -44.71% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes1Offset-4   8243.0n ± 0%   655.7n ± 0%  -92.05% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes2Offset-4   8244.0n ± 0%   657.1n ± 0%  -92.03% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes3Offset-4   8247.5n ± 0%   658.7n ± 0%  -92.01% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes4Offset-4   8243.0n ± 0%   649.8n ± 0%  -92.12% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes5Offset-4   8247.0n ± 0%   650.2n ± 0%  -92.12% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes6Offset-4   8243.0n ± 0%   651.6n ± 0%  -92.09% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes7Offset-4   8244.0n ± 0%   652.8n ± 0%  -92.08% (p=0.000 n=10)
geomean                                 410.1n        147.2n       -64.10%

                                     │   subtle.1   │                subtle.2                 │
                                     │     B/s      │      B/s       vs base                  │
XORBytes/8Bytes-4                      128.1Mi ± 0%    131.2Mi ± 0%     +2.40% (p=0.000 n=10)
XORBytes/128Bytes-4                    971.6Mi ± 0%   1629.2Mi ± 0%    +67.69% (p=0.000 n=10)
XORBytes/2048Bytes-4                   1.752Gi ± 0%    3.166Gi ± 0%    +80.68% (p=0.000 n=10)
XORBytes/8192Bytes-4                   1.833Gi ± 0%    3.360Gi ± 0%    +83.35% (p=0.000 n=10)
XORBytes/32768Bytes-4                  881.0Mi ± 0%   1111.5Mi ± 0%    +26.16% (p=0.000 n=10)
XORBytesUnaligned/8Bytes0Offset-4      127.6Mi ± 0%    132.7Mi ± 0%     +4.02% (p=0.000 n=10)
XORBytesUnaligned/8Bytes1Offset-4      104.5Mi ± 0%    132.7Mi ± 0%    +26.95% (p=0.000 n=10)
XORBytesUnaligned/8Bytes2Offset-4      104.6Mi ± 0%    132.7Mi ± 0%    +26.92% (p=0.000 n=10)
XORBytesUnaligned/8Bytes3Offset-4      104.5Mi ± 0%    132.8Mi ± 0%    +27.01% (p=0.000 n=10)
XORBytesUnaligned/8Bytes4Offset-4      104.6Mi ± 0%    132.8Mi ± 0%    +27.02% (p=0.000 n=10)
XORBytesUnaligned/8Bytes5Offset-4      104.6Mi ± 0%    132.7Mi ± 0%    +26.89% (p=0.000 n=10)
XORBytesUnaligned/8Bytes6Offset-4      104.5Mi ± 0%    132.8Mi ± 0%    +26.99% (p=0.000 n=10)
XORBytesUnaligned/8Bytes7Offset-4      104.6Mi ± 0%    132.8Mi ± 0%    +26.97% (p=0.000 n=10)
XORBytesUnaligned/128Bytes0Offset-4    974.4Mi ± 0%   1645.7Mi ± 0%    +68.90% (p=0.000 n=10)
XORBytesUnaligned/128Bytes1Offset-4    219.0Mi ± 0%    931.3Mi ± 0%   +325.23% (p=0.000 n=10)
XORBytesUnaligned/128Bytes2Offset-4    219.0Mi ± 0%    921.2Mi ± 0%   +320.57% (p=0.000 n=10)
XORBytesUnaligned/128Bytes3Offset-4    218.9Mi ± 0%    912.9Mi ± 0%   +316.97% (p=0.000 n=10)
XORBytesUnaligned/128Bytes4Offset-4    219.0Mi ± 0%    976.4Mi ± 0%   +345.85% (p=0.000 n=10)
XORBytesUnaligned/128Bytes5Offset-4    218.9Mi ± 0%    971.2Mi ± 0%   +343.70% (p=0.000 n=10)
XORBytesUnaligned/128Bytes6Offset-4    219.0Mi ± 0%    961.1Mi ± 0%   +338.86% (p=0.000 n=10)
XORBytesUnaligned/128Bytes7Offset-4    218.9Mi ± 0%    951.1Mi ± 0%   +334.52% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes0Offset-4   1.752Gi ± 0%    3.169Gi ± 0%    +80.83% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes1Offset-4   236.9Mi ± 0%   2978.6Mi ± 0%  +1157.10% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes2Offset-4   236.9Mi ± 0%   2972.1Mi ± 0%  +1154.48% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes3Offset-4   236.8Mi ± 0%   2965.1Mi ± 0%  +1152.05% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes4Offset-4   236.9Mi ± 0%   3005.9Mi ± 0%  +1168.65% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes5Offset-4   236.8Mi ± 0%   3004.0Mi ± 0%  +1168.42% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes6Offset-4   236.9Mi ± 0%   2997.2Mi ± 0%  +1164.96% (p=0.000 n=10)
XORBytesUnaligned/2048Bytes7Offset-4   236.9Mi ± 0%   2991.9Mi ± 0%  +1162.93% (p=0.000 n=10)
geomean                                260.4Mi         806.7Mi        +209.73%

Change-Id: I9bec9c8f48df7284f8414ac745615c2a093e9ae9
Reviewed-on: https://go-review.googlesource.com/c/go/+/639858
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
TryBot-Bypass: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
src/crypto/internal/fips140/subtle/xor_asm.go
src/crypto/internal/fips140/subtle/xor_generic.go
src/crypto/internal/fips140/subtle/xor_riscv64.s [new file with mode: 0644]