internal/runtime/maps: use match to skip non-full slots in iteration
Iteration over swissmaps with low load (think map with large hint but
only one entry) is signicantly regressed vs old maps. See noswiss vs
swiss-tip below (+60%).
Currently we visit every single slot and individually check if the slot
is full or not.
We can do much better by using the control word to find all full slots
in a group in a single operation. This lets us skip completely empty
groups for instance.
Always using the control match approach is great for maps with low load,
but is a regression for mostly full maps. Mostly full maps have the
majority of slots full, so most calls to mapiternext will return the
next slot. In that case, doing the full group match on every call is
more expensive than checking the individual slot.
Thus we take a hybrid approach: on each call, we first check an
individual slot. If that slot is full, we're done. If that slot is
non-full, then we fall back to doing full group matches.
This trade-off works well. Both mostly empty and mostly full maps
perform nearly as well as doing all matching and all individual,
respectively.
The fast path is placed above the slow path loop rather than combined
(with some sort of `useMatch` variable) into a single loop to help the
compiler's code generation. The compiler really struggles with code
generation on a combined loop for some reason, yielding ~15% additional
instructions/op.