Rationale: small buffer optimization does not work and it has
made things slower since 2014. Until we can make it work,
we should prefer simpler code that also turns out to be more
efficient.
With this change, it's possible to use
NewBuffer(make([]byte, 0, bootstrapSize)) to get the desired
stack-allocated initial buffer since escape analysis can
prove the created slice to be non-escaping.
New implementation key points:
- Zero value bytes.Buffer performs better than before
- You can have a truly stack-allocated buffer, and it's not even limited to 64 bytes
- The unsafe.Sizeof(bytes.Buffer{}) is reduced significantly
- Empty writes don't cause allocations