runtime: pad work.full and pad.empty to avoid false sharing
With the Garbage collector (GC), we observed a false-sharing between
work.full and work.empty. (referenced most from runtime.gcDrain and
runtime.getempty)
This false-sharing becomes worse and impact performance on multi core
system. On Intel Xeon 8480+ and default GC setting(GC=100), we can
observed top HITM>4% (by perf c2c)caused by it.
After resolveed this false-sharing issue, we can get performance 8%~9.7%
improved. Verify workloads:
DeathStarBench/hotelReservation: 9.7% of RPS improved
https://github.com/delimitrou/DeathStarBench/tree/master/hotelReservation
gRPC-go/benchmark: 8% of RPS improved
https://github.com/grpc/grpc-go/tree/master/benchmark