compress/flate: Performance improvement for inflate
Decode as much as possible of a Huffman symbol in a single table
lookup (much like the zlib implementation), filling more bits
(conservatively, so we don't consume past the end of the stream)
when the code prefix indicates more bits are needed. This
results in about a 50% performance gain in speed benchmarks.
The following set is benchcmp done on a retina MacBook Pro:
benchmark old MB/s new MB/s speedup
BenchmarkDecodeDigitsSpeed1e4 28.41 42.79 1.51x
BenchmarkDecodeDigitsSpeed1e5 30.18 47.62 1.58x
BenchmarkDecodeDigitsSpeed1e6 30.81 48.14 1.56x
BenchmarkDecodeDigitsDefault1e4 30.28 44.61 1.47x
BenchmarkDecodeDigitsDefault1e5 32.18 51.94 1.61x
BenchmarkDecodeDigitsDefault1e6 35.57 53.28 1.50x
BenchmarkDecodeDigitsCompress1e4 30.39 44.83 1.48x
BenchmarkDecodeDigitsCompress1e5 33.05 51.64 1.56x
BenchmarkDecodeDigitsCompress1e6 35.69 53.04 1.49x
BenchmarkDecodeTwainSpeed1e4 25.90 43.04 1.66x
BenchmarkDecodeTwainSpeed1e5 29.97 48.19 1.61x
BenchmarkDecodeTwainSpeed1e6 31.36 49.43 1.58x
BenchmarkDecodeTwainDefault1e4 28.79 45.02 1.56x
BenchmarkDecodeTwainDefault1e5 37.12 55.65 1.50x
BenchmarkDecodeTwainDefault1e6 39.28 58.16 1.48x
BenchmarkDecodeTwainCompress1e4 28.64 44.90 1.57x
BenchmarkDecodeTwainCompress1e5 37.40 55.98 1.50x
BenchmarkDecodeTwainCompress1e6 39.35 58.06 1.48x
R=rsc, dave, minux.ma, bradfitz, nigeltao
CC=golang-dev
https://golang.org/cl/
6872063