When compiling the stdlib, most calls
to sgen are for exactly 2 or 3 words:
85% for 6g and 70% for 8g.
Special case them for performance.
This optimization is not relevant to 5g and 9g.

6g
benchmark              old ns/op     new ns/op     delta
BenchmarkCopyFat16     3.25          0.82          -74.77%
BenchmarkCopyFat24     5.47          0.95          -82.63%

8g
benchmark              old ns/op     new ns/op     delta
BenchmarkCopyFat8      3.84          2.42          -36.98%
BenchmarkCopyFat12     4.94          2.15          -56.48%
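
For context, 6g targets amd64 (8-byte words) and 8g targets 386 (4-byte words), so CopyFat16/CopyFat24 are the 2- and 3-word cases for 6g, and CopyFat8/CopyFat12 are the same cases for 8g. The benchmark source is not shown here; the following is only a minimal sketch of what a CopyFat-style benchmark might look like. The names fat16, sink, copyFat16 and benchmarkCopyFat16 are hypothetical, and a modern compiler may optimize the copy differently than 6g or 8g did.

```go
package main

import (
	"fmt"
	"testing"
)

// fat16 is a hypothetical 16-byte value: 2 words on amd64, 4 words on 386.
type fat16 [16 / 4]uint32

// sink keeps the copied value live so the copy is less likely to be
// eliminated as dead code.
var sink fat16

// copyFat16 returns a copy of its argument. Assigning a multi-word array
// like this is the kind of "fat" copy the compiler must generate code for.
func copyFat16(src fat16) fat16 {
	dst := src
	return dst
}

// benchmarkCopyFat16 times repeated 16-byte value copies.
func benchmarkCopyFat16(b *testing.B) {
	var x fat16
	for i := 0; i < b.N; i++ {
		sink = copyFat16(x)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	r := testing.Benchmark(benchmarkCopyFat16)
	fmt.Println("CopyFat16:", r)
}
```

Changing the array length to 24/4, 8/4, or 12/4 elements would give the other sizes in the tables above.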