Currently, semrelease1 readies the next waiter before recording a
mutex event. However, if the next waiter is expecting to look at the
mutex profile, as is the case in TestMutexProfile, this may delay
recording the event too much.
Swap the order of these operations so semrelease1 records the mutex
event before readying the next waiter. This also means readying the
next waiter is the very last thing semrelease1 does, which seems
appropriate.