cmd/compile: improve FP performance on ARM64
FMADD/FMSUB/FNMADD/FNMSUB are efficient fused multiply-add FP
instructions that the compiler can use to improve FP performance on
ARM64. This CL implements this optimization.
1. The compilecmp benchmark shows little change.
name        old time/op  new time/op  delta
Template     2.35s ± 4%   2.38s ± 4%     ~     (p=0.161 n=15+15)
Unicode      1.36s ± 5%   1.36s ± 4%     ~     (p=0.685 n=14+13)
GoTypes      8.11s ± 3%   8.13s ± 2%     ~     (p=0.624 n=15+15)
Compiler     40.5s ± 2%   40.7s ± 2%     ~     (p=0.137 n=15+15)
SSA           115s ± 3%    116s ± 1%     ~     (p=0.270 n=15+14)
Flate        1.46s ± 4%   1.45s ± 5%     ~     (p=0.870 n=15+15)
GoParser     1.85s ± 2%   1.87s ± 3%     ~     (p=0.477 n=14+15)
Reflect      5.11s ± 4%   5.10s ± 2%     ~     (p=0.624 n=15+15)
Tar          2.23s ± 3%   2.23s ± 5%     ~     (p=0.624 n=15+15)
XML          2.72s ± 5%   2.74s ± 3%     ~     (p=0.290 n=15+14)
[Geo mean]   5.02s        5.03s        +0.29%
name        old user-time/op  new user-time/op  delta
Template          2.90s ± 2%        2.90s ± 3%     ~     (p=0.780 n=14+15)
Unicode           1.71s ± 5%        1.70s ± 3%     ~     (p=0.458 n=14+13)
GoTypes           9.77s ± 2%        9.76s ± 2%     ~     (p=0.838 n=15+15)
Compiler          49.1s ± 2%        49.1s ± 2%     ~     (p=0.902 n=15+15)
SSA                144s ± 1%         144s ± 2%     ~     (p=0.567 n=15+15)
Flate             1.75s ± 5%        1.74s ± 3%     ~     (p=0.461 n=15+15)
GoParser          2.22s ± 2%        2.21s ± 3%     ~     (p=0.233 n=15+15)
Reflect           5.99s ± 2%        5.95s ± 1%     ~     (p=0.093 n=14+15)
Tar               2.68s ± 2%        2.67s ± 3%     ~     (p=0.310 n=14+15)
XML               3.22s ± 2%        3.24s ± 3%     ~     (p=0.512 n=15+15)
[Geo mean]        6.08s             6.07s        -0.19%
name       old text-bytes  new text-bytes  delta
HelloSize      641kB ± 0%      641kB ± 0%     ~     (all equal)
name       old data-bytes  new data-bytes  delta
HelloSize     9.46kB ± 0%     9.46kB ± 0%     ~     (all equal)
name       old bss-bytes  new bss-bytes  delta
HelloSize     125kB ± 0%     125kB ± 0%     ~     (all equal)
name       old exe-bytes  new exe-bytes  delta
HelloSize    1.24MB ± 0%    1.24MB ± 0%     ~     (all equal)
2. The go1 benchmark shows little overall change (geomean -0.63%, mostly
within noise), but clear improvements in the Mandelbrot200 (-9.09%) and
FmtFprintfFloat (-5.01%) test cases.
name                     old time/op  new time/op  delta
BinaryTree17-4            42.1s ± 2%   42.0s ± 2%     ~     (p=0.453 n=30+28)
Fannkuch11-4              33.5s ± 3%   33.3s ± 3%   -0.38%  (p=0.045 n=30+30)
FmtFprintfEmpty-4         534ns ± 0%   534ns ± 0%     ~     (all equal)
FmtFprintfString-4       1.09µs ± 0%  1.09µs ± 0%   -0.27%  (p=0.000 n=23+17)
FmtFprintfInt-4          1.16µs ± 3%  1.16µs ± 3%     ~     (p=0.714 n=30+30)
FmtFprintfIntInt-4       1.76µs ± 1%  1.77µs ± 0%   +0.15%  (p=0.002 n=23+23)
FmtFprintfPrefixedInt-4  2.21µs ± 3%  2.20µs ± 3%     ~     (p=0.390 n=30+30)
FmtFprintfFloat-4        3.28µs ± 0%  3.11µs ± 0%   -5.01%  (p=0.000 n=25+26)
FmtManyArgs-4            7.18µs ± 0%  7.19µs ± 0%   +0.13%  (p=0.000 n=24+25)
GobDecode-4              94.9ms ± 0%  95.6ms ± 5%   +0.83%  (p=0.002 n=23+29)
GobEncode-4              80.7ms ± 4%  79.8ms ± 0%   -1.11%  (p=0.003 n=30+24)
Gzip-4                    4.58s ± 4%   4.59s ± 3%   +0.26%  (p=0.002 n=30+26)
Gunzip-4                  449ms ± 4%   443ms ± 0%     ~     (p=0.096 n=30+26)
HTTPClientServer-4        553µs ± 1%   548µs ± 1%   -0.96%  (p=0.000 n=30+30)
JSONEncode-4              215ms ± 4%   214ms ± 4%   -0.29%  (p=0.000 n=30+30)
JSONDecode-4              868ms ± 4%   875ms ± 5%   +0.79%  (p=0.008 n=30+30)
Mandelbrot200-4          51.4ms ± 0%  46.7ms ± 3%   -9.09%  (p=0.000 n=25+26)
GoParse-4                42.1ms ± 0%  41.8ms ± 0%   -0.61%  (p=0.000 n=25+24)
RegexpMatchEasy0_32-4    1.02µs ± 4%  1.02µs ± 4%   -0.17%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4    3.90µs ± 0%  3.95µs ± 4%     ~     (p=0.516 n=23+30)
RegexpMatchEasy1_32-4     970ns ± 3%   973ns ± 3%     ~     (p=0.951 n=30+30)
RegexpMatchEasy1_1K-4    6.43µs ± 3%  6.33µs ± 0%   -1.62%  (p=0.000 n=30+25)
RegexpMatchMedium_32-4   1.75µs ± 0%  1.75µs ± 0%     ~     (p=0.422 n=25+24)
RegexpMatchMedium_1K-4    568µs ± 3%   562µs ± 0%     ~     (p=0.079 n=30+24)
RegexpMatchHard_32-4     30.8µs ± 0%  31.2µs ± 4%   +1.46%  (p=0.018 n=23+30)
RegexpMatchHard_1K-4      932µs ± 0%   946µs ± 3%   +1.49%  (p=0.000 n=24+30)
Revcomp-4                 7.69s ± 3%   7.69s ± 2%   +0.04%  (p=0.032 n=24+25)
Template-4                893ms ± 5%   880ms ± 6%   -1.53%  (p=0.000 n=30+30)
TimeParse-4              4.90µs ± 3%  4.84µs ± 0%     ~     (p=0.080 n=30+25)
TimeFormat-4             4.70µs ± 1%  4.76µs ± 0%   +1.21%  (p=0.000 n=23+26)
[Geo mean]                710µs        706µs        -0.63%
name                     old speed      new speed      delta
GobDecode-4              8.09MB/s ± 0%  8.03MB/s ± 5%   -0.77%  (p=0.002 n=23+29)
GobEncode-4              9.52MB/s ± 4%  9.62MB/s ± 0%   +1.07%  (p=0.003 n=30+24)
Gzip-4                   4.24MB/s ± 4%  4.23MB/s ± 3%   -0.35%  (p=0.002 n=30+26)
Gunzip-4                 43.2MB/s ± 4%  43.8MB/s ± 0%     ~     (p=0.123 n=30+26)
JSONEncode-4             9.03MB/s ± 4%  9.06MB/s ± 4%   +0.28%  (p=0.000 n=30+30)
JSONDecode-4             2.24MB/s ± 4%  2.22MB/s ± 5%   -0.79%  (p=0.008 n=30+30)
GoParse-4                1.38MB/s ± 1%  1.38MB/s ± 0%     ~     (p=0.401 n=25+17)
RegexpMatchEasy0_32-4    31.4MB/s ± 4%  31.5MB/s ± 3%   +0.16%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4     262MB/s ± 0%   259MB/s ± 4%     ~     (p=0.693 n=23+30)
RegexpMatchEasy1_32-4    33.0MB/s ± 3%  32.9MB/s ± 3%     ~     (p=0.139 n=30+30)
RegexpMatchEasy1_1K-4     159MB/s ± 3%   162MB/s ± 0%   +1.60%  (p=0.000 n=30+25)
RegexpMatchMedium_32-4    570kB/s ± 0%   570kB/s ± 0%     ~     (all equal)
RegexpMatchMedium_1K-4   1.80MB/s ± 3%  1.82MB/s ± 0%   +1.09%  (p=0.007 n=30+24)
RegexpMatchHard_32-4     1.04MB/s ± 0%  1.03MB/s ± 3%   -1.38%  (p=0.003 n=23+30)
RegexpMatchHard_1K-4     1.10MB/s ± 0%  1.08MB/s ± 3%   -1.52%  (p=0.000 n=24+30)
Revcomp-4                33.0MB/s ± 3%  33.0MB/s ± 2%     ~     (p=0.128 n=24+25)
Template-4               2.17MB/s ± 5%  2.21MB/s ± 6%   +1.61%  (p=0.000 n=30+30)
[Geo mean]               7.79MB/s       7.79MB/s        +0.05%
Change-Id: Ied3dbdb5ba8e386168629cba06fcd4263bbb83e1
Reviewed-on: https://go-review.googlesource.com/94901
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>