Cypherpunks repositories - gostls13.git/commit
cmd/compile: optimize ARM code with NMULF/NMULD
NMULF and NMULD are efficient FP instructions, and the Go compiler can
use them to generate better code.
The benchmark tests of my patch showed no general change, but a big
improvement in special cases.
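As an illustration, the kind of source pattern that a negated-multiply instruction can serve is a product whose result is negated. The sketch below is not taken from the patch; the function names are hypothetical, and whether a particular toolchain and GOARM setting actually selects NMULF/NMULD for these expressions is an assumption that should be checked in the generated assembly.

```go
package main

import "fmt"

// negMul32 returns -(x*y) in single precision.
// Assumption: on ARM with VFP, a compiler may lower this to one
// negated-multiply instruction instead of a multiply plus a negate.
func negMul32(x, y float32) float32 {
	return -(x * y)
}

// negMul64 is the double-precision counterpart of negMul32.
func negMul64(x, y float64) float64 {
	return -(x * y)
}

func main() {
	fmt.Println(negMul32(1.5, 2), negMul64(-3, 0.5))
}
```

One way to inspect the result is to build for ARM with assembly output enabled (for example `GOARCH=arm go build -gcflags=-S`) and look at the instructions emitted for these two functions.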
1. A special test case improved 12.6%.
https://github.com/benshi001/ugo1/blob/master/fpmul_test.go
name old time/op new time/op delta
FPMul-4 398µs ± 1% 348µs ± 1% -12.64% (p=0.000 n=40+40)
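The linked test file is not reproduced here. A minimal sketch of a micro-benchmark with the same flavor, assuming a loop dominated by negated floating-point products, could look like the following. The kernel name negMulInto and the slice size are hypothetical, and the timings it prints are unrelated to the figures above.

```go
package main

import (
	"fmt"
	"testing"
)

// negMulInto stores -(xs[i]*ys[i]) into dst[i] for every index of dst.
// It assumes xs and ys are at least as long as dst.
func negMulInto(dst, xs, ys []float64) {
	for i := range dst {
		dst[i] = -(xs[i] * ys[i])
	}
}

func main() {
	const n = 1024 // hypothetical working-set size
	xs := make([]float64, n)
	ys := make([]float64, n)
	dst := make([]float64, n)
	for i := range xs {
		xs[i] = float64(i) + 0.5
		ys[i] = float64(n - i)
	}

	// testing.Benchmark runs the function outside of "go test"
	// and picks b.N automatically.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			negMulInto(dst, xs, ys)
		}
	})
	fmt.Println(res)
}
```

Running such a program on builds from the old and new compilers, and comparing the results with a tool such as benchstat, would give a before/after table in the format shown above.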
2. The compilecmp test showed little change.
name old time/op new time/op delta
Template 2.30s ± 1% 2.31s ± 1% ~ (p=0.754 n=17+19)
Unicode 1.31s ± 3% 1.32s ± 5% ~ (p=0.265 n=20+20)
GoTypes 7.73s ± 2% 7.73s ± 1% ~ (p=0.925 n=20+20)
Compiler 37.0s ± 1% 37.3s ± 2% +0.79% (p=0.002 n=19+20)
SSA 83.8s ± 4% 83.5s ± 2% ~ (p=0.964 n=20+17)
Flate 1.43s ± 2% 1.44s ± 1% ~ (p=0.602 n=20+20)
GoParser 1.82s ± 2% 1.81s ± 2% ~ (p=0.141 n=19+20)
Reflect 5.08s ± 2% 5.08s ± 3% ~ (p=0.835 n=20+19)
Tar 2.36s ± 1% 2.35s ± 1% ~ (p=0.195 n=18+17)
XML 2.57s ± 2% 2.56s ± 1% ~ (p=0.283 n=20+17)
[Geo mean] 4.74s 4.75s +0.05%
name old user-time/op new user-time/op delta
Template 2.75s ± 2% 2.75s ± 0% ~ (p=0.620 n=20+15)
Unicode 1.59s ± 4% 1.60s ± 4% ~ (p=0.479 n=20+19)
GoTypes 9.48s ± 1% 9.47s ± 1% ~ (p=0.743 n=20+20)
Compiler 45.7s ± 1% 45.7s ± 1% ~ (p=0.482 n=19+20)
SSA 109s ± 1% 109s ± 2% ~ (p=0.800 n=18+20)
Flate 1.67s ± 3% 1.67s ± 3% ~ (p=0.598 n=19+18)
GoParser 2.15s ± 4% 2.13s ± 3% ~ (p=0.153 n=20+20)
Reflect 5.95s ± 2% 5.95s ± 2% ~ (p=0.961 n=19+20)
Tar 2.93s ± 2% 2.92s ± 3% ~ (p=0.242 n=20+19)
XML 3.02s ± 3% 3.04s ± 3% ~ (p=0.233 n=19+18)
[Geo mean] 5.74s 5.74s -0.04%
name old text-bytes new text-bytes delta
HelloSize 588kB ± 0% 588kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
3. The go1 benchmark showed little change in total.
name old time/op new time/op delta
BinaryTree17-4 41.8s ± 1% 41.8s ± 1% ~ (p=0.388 n=40+39)
Fannkuch11-4 24.1s ± 1% 24.1s ± 1% ~ (p=0.077 n=40+40)
FmtFprintfEmpty-4 834ns ± 1% 831ns ± 1% -0.31% (p=0.002 n=40+37)
FmtFprintfString-4 1.34µs ± 1% 1.34µs ± 0% ~ (p=0.387 n=40+40)
FmtFprintfInt-4 1.44µs ± 1% 1.44µs ± 1% ~ (p=0.421 n=40+40)
FmtFprintfIntInt-4 2.09µs ± 0% 2.09µs ± 1% ~ (p=0.589 n=40+39)
FmtFprintfPrefixedInt-4 2.32µs ± 1% 2.33µs ± 1% +0.15% (p=0.001 n=40+40)
FmtFprintfFloat-4 4.51µs ± 0% 4.44µs ± 1% -1.50% (p=0.000 n=40+40)
FmtManyArgs-4 7.94µs ± 0% 7.97µs ± 0% +0.36% (p=0.001 n=32+40)
GobDecode-4 104ms ± 1% 102ms ± 2% -1.27% (p=0.000 n=39+37)
GobEncode-4 90.5ms ± 1% 90.9ms ± 2% +0.40% (p=0.006 n=37+40)
Gzip-4 4.10s ± 2% 4.08s ± 1% -0.30% (p=0.004 n=40+40)
Gunzip-4 603ms ± 0% 602ms ± 1% ~ (p=0.303 n=37+40)
HTTPClientServer-4 672µs ± 3% 658µs ± 2% -2.08% (p=0.000 n=39+37)
JSONEncode-4 238ms ± 1% 239ms ± 0% +0.26% (p=0.001 n=40+25)
JSONDecode-4 884ms ± 1% 885ms ± 1% +0.16% (p=0.012 n=40+40)
Mandelbrot200-4 49.3ms ± 0% 49.3ms ± 0% ~ (p=0.588 n=40+38)
GoParse-4 46.3ms ± 1% 46.4ms ± 2% ~ (p=0.487 n=40+40)
RegexpMatchEasy0_32-4 1.28µs ± 1% 1.28µs ± 0% +0.12% (p=0.003 n=40+40)
RegexpMatchEasy0_1K-4 7.78µs ± 5% 7.78µs ± 4% ~ (p=0.825 n=40+40)
RegexpMatchEasy1_32-4 1.29µs ± 1% 1.29µs ± 0% ~ (p=0.659 n=40+40)
RegexpMatchEasy1_1K-4 10.3µs ± 3% 10.4µs ± 2% ~ (p=0.266 n=40+40)
RegexpMatchMedium_32-4 2.05µs ± 1% 2.05µs ± 0% -0.18% (p=0.002 n=40+28)
RegexpMatchMedium_1K-4 533µs ± 1% 534µs ± 1% ~ (p=0.397 n=37+40)
RegexpMatchHard_32-4 28.9µs ± 1% 28.9µs ± 1% -0.22% (p=0.002 n=40+40)
RegexpMatchHard_1K-4 868µs ± 1% 870µs ± 1% +0.21% (p=0.015 n=40+40)
Revcomp-4 67.3ms ± 1% 67.2ms ± 2% ~ (p=0.262 n=38+39)
Template-4 1.07s ± 1% 1.07s ± 1% ~ (p=0.276 n=40+40)
TimeParse-4 7.16µs ± 1% 7.16µs ± 1% ~ (p=0.610 n=39+40)
TimeFormat-4 13.3µs ± 1% 13.3µs ± 1% ~ (p=0.617 n=38+40)
[Geo mean] 720µs 719µs -0.13%
name old speed new speed delta
GobDecode-4 7.39MB/s ± 1% 7.49MB/s ± 2% +1.25% (p=0.000 n=39+38)
GobEncode-4 8.48MB/s ± 1% 8.45MB/s ± 2% -0.40% (p=0.005 n=37+40)
Gzip-4 4.74MB/s ± 2% 4.75MB/s ± 1% +0.30% (p=0.018 n=40+40)
Gunzip-4 32.2MB/s ± 0% 32.2MB/s ± 1% ~ (p=0.272 n=36+40)
JSONEncode-4 8.15MB/s ± 1% 8.13MB/s ± 0% -0.26% (p=0.003 n=40+25)
JSONDecode-4 2.19MB/s ± 1% 2.19MB/s ± 1% ~ (p=0.676 n=40+40)
GoParse-4 1.25MB/s ± 2% 1.25MB/s ± 2% ~ (p=0.823 n=40+40)
RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.1MB/s ± 0% -0.12% (p=0.006 n=40+40)
RegexpMatchEasy0_1K-4 132MB/s ± 5% 132MB/s ± 5% ~ (p=0.821 n=40+40)
RegexpMatchEasy1_32-4 24.7MB/s ± 1% 24.7MB/s ± 0% ~ (p=0.630 n=40+40)
RegexpMatchEasy1_1K-4 99.1MB/s ± 3% 98.8MB/s ± 2% ~ (p=0.268 n=40+40)
RegexpMatchMedium_32-4 487kB/s ± 2% 490kB/s ± 0% +0.51% (p=0.001 n=40+40)
RegexpMatchMedium_1K-4 1.92MB/s ± 1% 1.92MB/s ± 1% ~ (p=0.208 n=39+40)
RegexpMatchHard_32-4 1.11MB/s ± 1% 1.11MB/s ± 0% +0.36% (p=0.000 n=40+33)
RegexpMatchHard_1K-4 1.18MB/s ± 1% 1.18MB/s ± 1% ~ (p=0.207 n=40+37)
Revcomp-4 37.8MB/s ± 1% 37.8MB/s ± 2% ~ (p=0.276 n=38+39)
Template-4 1.82MB/s ± 1% 1.81MB/s ± 1% ~ (p=0.122 n=38+40)
[Geo mean] 6.81MB/s 6.81MB/s +0.06%
Fixes #19843
Change-Id: Ief3a0c2b15f59d40c7b40f2784eeb71196685b59
Reviewed-on: https://go-review.googlesource.com/61150
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>