cmd/compile/internal/pgo: count only the last two frames as a call edge
Currently for every CPU profile sample, we apply its weight to all
call edges of the entire call stack. Frames higher up the stack
are unlikely to be repeated calls (e.g. runtime.main calling
main.main). So adding weights to call edges higher up the stack
may be not reflecting the actual call edge weights in the program.
This CL changes it to add weights to only the edge between the
last two frames.
Without a branch profile (e.g. LBR records) this is not perfect,
but seems more reasonable.