cmd/compile,cmd/preprofile: move logic to shared common package
The processing performed in cmd/preprofile is a simple version of the
same initial processing performed by cmd/compile/internal/pgo. Refactor
this processing into the new IR-independent cmd/internal/pgo package.
Now cmd/preprofile and cmd/compile run the same code for initial
processing of a pprof profile, guaranteeing that they always stay in
sync.
Since it is now trivial, this CL makes one change to the serialization
format: the entries are ordered by weight. This allows us to avoid
sorting ByWeight on deserialization.
Impact on PGO parsing when compiling cmd/compile with PGO:
* Without preprocessing: PGO parsing ~13.7% of CPU time
* With preprocessing (unsorted): ~2.9% of CPU time (sorting ~1.7%)
* With preprocessing (sorted): ~1.3% of CPU time
The remaining 1.3% of CPU time approximately breaks down as:
* ~0.5% parsing the preprocessed profile
* ~0.7% building weighted IR call graph
* ~0.5% walking function IR to find direct calls
* ~0.2% performing lookups for indirect calls targets
For #58102.
Change-Id: Iaba425ea30b063ca195fb2f7b29342961c8a64c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/569337
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>