cmd/8g: eliminate obviously useless temps before regopt.

This patch introduces a peephole optimization that runs before
regopt. When a temporary only holds a value for the duration of
the next instruction and is otherwise unused, we elide it to make
regopt's job easier.

Since x86 has very few registers, this situation arises very
often. The result is large savings in stack variables for
arithmetic-heavy functions.
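
The idea can be illustrated with a minimal sketch in Go. This is not the code from the patch: the Instr type, the "tmp" naming convention, and the elideTemps function are hypothetical names chosen for illustration, and the sketch ignores x86 operand constraints (for example, it does not check whether the rewritten instruction would have two memory operands).

```go
package main

import (
	"fmt"
	"strings"
)

// Instr is a hypothetical two-operand instruction of the form "Op Src, Dst".
type Instr struct {
	Op  string
	Src string
	Dst string
}

// isTemp reports whether name looks like a compiler temporary.
// The "tmp" prefix is an assumption made for this sketch.
func isTemp(name string) bool {
	return strings.HasPrefix(name, "tmp")
}

// elideTemps rewrites "MOVL x, tmp; OP tmp, y" into "OP x, y" when tmp
// is referenced nowhere else in the program.
func elideTemps(prog []Instr) []Instr {
	// Count every reference to each operand name.
	uses := map[string]int{}
	for _, in := range prog {
		uses[in.Src]++
		uses[in.Dst]++
	}

	var out []Instr
	for i := 0; i < len(prog); i++ {
		cur := prog[i]
		// Exactly two references means: defined here, used once.
		if i+1 < len(prog) && cur.Op == "MOVL" && isTemp(cur.Dst) && uses[cur.Dst] == 2 {
			next := prog[i+1]
			if next.Src == cur.Dst {
				// The single use is the next instruction's source:
				// forward the original value and drop the copy.
				out = append(out, Instr{next.Op, cur.Src, next.Dst})
				i++ // skip the instruction we just merged
				continue
			}
		}
		out = append(out, cur)
	}
	return out
}

func main() {
	prog := []Instr{
		{"MOVL", "x", "tmp1"},
		{"ADDL", "tmp1", "AX"}, // tmp1 lives only across this instruction
		{"MOVL", "y", "tmp2"},
		{"ADDL", "tmp2", "BX"},
		{"MOVL", "tmp2", "z"}, // tmp2 is used again, so it must stay
	}
	for _, in := range elideTemps(prog) {
		fmt.Printf("%s %s, %s\n", in.Op, in.Src, in.Dst)
	}
}
```

In this sketch tmp1 disappears because its only use is the instruction right after its definition, while tmp2 survives because it is read a second time. Each temporary removed this way is one fewer variable for the register allocator to track or spill to the stack.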
crypto/aes
benchmark old ns/op new ns/op delta
BenchmarkEncrypt 1301 392 -69.87%
BenchmarkDecrypt 1309 368 -71.89%
BenchmarkExpand 2913 1036 -64.44%
benchmark old MB/s new MB/s speedup
BenchmarkEncrypt 12.29 40.74 3.31x
BenchmarkDecrypt 12.21 43.37 3.55x
crypto/md5
benchmark old ns/op new ns/op delta
BenchmarkHash8Bytes 1761 914 -48.10%
BenchmarkHash1K 16912 5570 -67.06%
BenchmarkHash8K 123895 38286 -69.10%
benchmark old MB/s new MB/s speedup
BenchmarkHash8Bytes 4.54 8.75 1.93x
BenchmarkHash1K 60.55 183.83 3.04x
BenchmarkHash8K 66.12 213.97 3.24x
bench/go1
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 8364835000 8303154000 -0.74%
BenchmarkFannkuch11 7511723000 6381729000 -15.04%
BenchmarkGobDecode 27764090 27103270 -2.38%
BenchmarkGobEncode 11240880 11184370 -0.50%
BenchmarkGzip 1470224000 856668400 -41.73%
BenchmarkGunzip 240660800 201697300 -16.19%
BenchmarkJSONEncode 155225800 185571900 +19.55%
BenchmarkJSONDecode 243347900 282123000 +15.93%
BenchmarkMandelbrot200 12240970 12201880 -0.32%
BenchmarkParse 8837445 8765210 -0.82%
BenchmarkRevcomp 2556310000 1868566000 -26.90%
BenchmarkTemplate 389298000 379792000 -2.44%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 27.64 28.32 1.02x
BenchmarkGobEncode 68.28 68.63 1.01x
BenchmarkGzip 13.20 22.65 1.72x
BenchmarkGunzip 80.63 96.21 1.19x
BenchmarkJSONEncode 12.50 10.46 0.84x
BenchmarkJSONDecode 7.97 6.88 0.86x
BenchmarkParse 6.55 6.61 1.01x
BenchmarkRevcomp 99.43 136.02 1.37x
BenchmarkTemplate 4.98 5.11 1.03x
Fixes #4035.
R=golang-dev, minux.ma, rsc
CC=golang-dev
https://golang.org/cl/6828056