runtime: add fast check for self-loop pointer in scanobject
Addresses a problem reported on the mailing list.
This will come up mainly in programs custom allocators that batch allocations,
but it still helps in our programs, which mainly do not have such allocations.
In contrast to the one in test/bench/go1 (above), the binarytree program on the
shootout site uses more goroutines, batches allocations, and sets GOMAXPROCS
to runtime.NumCPU()*2.
Using that version, before vs after:
name old mean new mean delta
BinaryTree20 18.6s × (0.96,1.05) 11.3s × (0.98,1.02) -39.46% (p=0.000)
And Go 1.4 vs after:
name old mean new mean delta
BinaryTree20 13.0s × (0.97,1.02) 11.3s × (0.98,1.02) -13.21% (p=0.000)
There is still a scheduling problem - the raw run times are hiding the fact that
this chews up 2x the CPU - but we'll take care of that separately.