]> Cypherpunks repositories - gostls13.git/commit
runtime: port performance-critical functions to regabi
authorAustin Clements <austin@google.com>
Thu, 8 Apr 2021 21:43:51 +0000 (17:43 -0400)
committerAustin Clements <austin@google.com>
Mon, 12 Apr 2021 18:08:47 +0000 (18:08 +0000)
commit849dba07a5392d2f137deeaa9e797f907c00d0bd
tree24e986e8cdf28b6a444ca57c7b6e5fc55ea8ef07
parent865d2bc78e5b6170c8b773880dc5fa3405791dc2
runtime: port performance-critical functions to regabi

This CL ports a few performance-critical runtime assembly functions to
use register arguments directly. While using the faster ABI is nice,
the real win here is that we avoid ABI wrappers: since these are
"builtin" functions in the compiler, it can generate calls to them
without knowing that their native implementation is ABI0. Hence, it
generates ABIInternal calls that go through ABI wrappers. By porting
them to use ABIInternal natively, we avoid the overhead of the ABI
wrapper.

This significantly improves performance on several benchmarks,
comparing regabiwrappers before and after this change:

name                                old time/op  new time/op  delta
BiogoIgor                            15.7s ± 2%   15.7s ± 2%    ~     (p=0.617 n=25+25)
BiogoKrishna                         18.5s ± 5%   17.7s ± 2%  -4.61%  (p=0.000 n=25+25)
BleveIndexBatch100                   5.91s ± 3%   5.82s ± 3%  -1.60%  (p=0.000 n=25+25)
BleveQuery                           6.76s ± 0%   6.60s ± 1%  -2.31%  (p=0.000 n=22+25)
CompileTemplate                      248ms ± 5%   245ms ± 1%    ~     (p=0.643 n=25+20)
CompileUnicode                      94.4ms ± 3%  93.9ms ± 2%    ~     (p=0.152 n=24+23)
CompileGoTypes                       1.60s ± 2%   1.59s ± 2%    ~     (p=0.059 n=24+24)
CompileCompiler                      104ms ± 3%   103ms ± 1%    ~     (p=0.056 n=25+22)
CompileSSA                           10.9s ± 1%   10.9s ± 1%    ~     (p=0.052 n=25+25)
CompileFlate                         156ms ± 8%   152ms ± 1%  -2.49%  (p=0.008 n=25+21)
CompileGoParser                      248ms ± 1%   249ms ± 2%    ~     (p=0.058 n=21+20)
CompileReflect                       595ms ± 3%   601ms ± 4%    ~     (p=0.182 n=25+25)
CompileTar                           211ms ± 2%   211ms ± 1%    ~     (p=0.663 n=23+23)
CompileXML                           282ms ± 2%   284ms ± 5%    ~     (p=0.456 n=21+23)
CompileStdCmd                        13.6s ± 2%   13.5s ± 2%    ~     (p=0.112 n=25+24)
FoglemanFauxGLRenderRotateBoat       8.69s ± 2%   8.67s ± 0%    ~     (p=0.094 n=22+25)
FoglemanPathTraceRenderGopherIter1   20.2s ± 2%   20.7s ± 3%  +2.53%  (p=0.000 n=24+24)
GopherLuaKNucleotide                 31.4s ± 1%   31.0s ± 1%  -1.28%  (p=0.000 n=25+24)
MarkdownRenderXHTML                  246ms ± 1%   244ms ± 1%  -0.79%  (p=0.000 n=20+21)
Tile38WithinCircle100kmRequest       843µs ± 4%   818µs ± 4%  -2.93%  (p=0.000 n=25+25)
Tile38IntersectsCircle100kmRequest  1.06ms ± 5%  1.05ms ± 3%  -1.19%  (p=0.021 n=24+25)
Tile38KNearestLimit100Request       1.01ms ± 1%  1.01ms ± 2%    ~     (p=0.335 n=22+25)
[Geo mean]                           596ms        592ms       -0.71%

(https://perf.golang.org/search?q=upload:20210411.5)

It also significantly reduces the performance penalty of enabling
regabiwrappers, though it doesn't yet fully close the gap on all
benchmarks:

name                                old time/op  new time/op  delta
BiogoIgor                            15.7s ± 1%   15.7s ± 2%    ~     (p=0.366 n=24+25)
BiogoKrishna                         17.7s ± 2%   17.7s ± 2%    ~     (p=0.315 n=23+25)
BleveIndexBatch100                   5.86s ± 4%   5.82s ± 3%    ~     (p=0.137 n=24+25)
BleveQuery                           6.55s ± 0%   6.60s ± 1%  +0.83%  (p=0.000 n=24+25)
CompileTemplate                      244ms ± 1%   245ms ± 1%    ~     (p=0.208 n=21+20)
CompileUnicode                      94.0ms ± 4%  93.9ms ± 2%    ~     (p=0.666 n=24+23)
CompileGoTypes                       1.60s ± 2%   1.59s ± 2%    ~     (p=0.154 n=25+24)
CompileCompiler                      103ms ± 1%   103ms ± 1%    ~     (p=0.905 n=24+22)
CompileSSA                           10.9s ± 2%   10.9s ± 1%    ~     (p=0.803 n=25+25)
CompileFlate                         153ms ± 1%   152ms ± 1%    ~     (p=0.182 n=23+21)
CompileGoParser                      250ms ± 2%   249ms ± 2%    ~     (p=0.843 n=24+20)
CompileReflect                       595ms ± 4%   601ms ± 4%    ~     (p=0.141 n=25+25)
CompileTar                           212ms ± 3%   211ms ± 1%    ~     (p=0.499 n=23+23)
CompileXML                           282ms ± 1%   284ms ± 5%    ~     (p=0.129 n=20+23)
CompileStdCmd                        13.5s ± 2%   13.5s ± 2%    ~     (p=0.480 n=24+24)
FoglemanFauxGLRenderRotateBoat       8.66s ± 1%   8.67s ± 0%    ~     (p=0.325 n=25+25)
FoglemanPathTraceRenderGopherIter1   20.6s ± 3%   20.7s ± 3%    ~     (p=0.137 n=25+24)
GopherLuaKNucleotide                 30.5s ± 2%   31.0s ± 1%  +1.68%  (p=0.000 n=23+24)
MarkdownRenderXHTML                  243ms ± 1%   244ms ± 1%  +0.51%  (p=0.000 n=23+21)
Tile38WithinCircle100kmRequest       801µs ± 2%   818µs ± 4%  +2.11%  (p=0.000 n=25+25)
Tile38IntersectsCircle100kmRequest  1.01ms ± 2%  1.05ms ± 3%  +4.34%  (p=0.000 n=24+25)
Tile38KNearestLimit100Request       1.00ms ± 1%  1.01ms ± 2%  +0.81%  (p=0.008 n=21+25)
[Geo mean]                           589ms        592ms       +0.50%

(https://perf.golang.org/search?q=upload:20210411.6)

Change-Id: I8f77f010b0abc658064df569a27a9c7a7b1c7bf9
Reviewed-on: https://go-review.googlesource.com/c/go/+/308931
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
src/runtime/asm_amd64.s
src/runtime/memclr_amd64.s
src/runtime/memmove_amd64.s
src/runtime/stubs.go