cmd/internal/obj/arm64: fix the bug of incorrect handling negative offset of LDP/STP/LDPW/STPW
The current assembler will report error when the negative offset is in
the range of [-256, 0) and is not the multiples of 4/8.
The fix introduces C_NSAUTO_8, C_NSAUTO_4 and C_NAUTO4K. C_NPAUTO
includes C_NSAUTO_8 instead of C_NSAUTO, C_NAUTO4K includes C_NSAUTO_8,
C_NSAUTO_4 and C_NSAUTO. So that assembler will encode the negative offset
that is greater than -4095 and is not the multiples of 4/8 as two instructions.