runtime: reduce linear search through pcvalue cache
This change introduces two optimizations together,
one for recursive and one for non-recursive stacks.
For recursive stacks, we introduce the new entry
at the beginning of the cache, so it can be found first.
This adds an extra read and write.
While we're here, switch from fastrandn, which does a multiply,
to fastrand % n, which does a shift.
For non-recursive stacks, split the cache from [16]pcvalueCacheEnt
into [2][8]pcvalueCacheEnt, and add a very cheap associative lookup.