cmd/compile: use a different register for updated value in AtomicAnd8/Or8 on ARM64
ARM64 manual says it is "constrained unpredictable" if the src
and dst registers of STLXRB are same, although it doesn't seem
to cause any problem on real hardwares so far. Fix by allocating
a different register to hold the updated value for
AtomicAnd8/Or8. We do this by making the ops returns <val,mem>
like AtomicAdd, although val will not be used elsewhere.