A comment in waitgroup.go describes the following scenario
as the reason to have dynamically created semaphores:
// G1: Add(1)
// G1: go G2()
// G1: Wait() // Context switch after Unlock() and before Semacquire().
// G2: Done() // Release semaphore: sema == 1, waiters == 0. G1 doesn't run yet.
// G3: Wait() // Finds counter == 0, waiters == 0, doesn't block.
// G3: Add(1) // Makes counter == 1, waiters == 0.
// G3: go G4()
// G3: Wait() // G1 still hasn't run, G3 finds sema == 1, unblocked! Bug.
However, the scenario is incorrect:
G3: Add(1) happens concurrently with G1: Wait(),
and so there is no reasonable behavior of the program
(G1: Wait() may or may not wait for G3: Add(1) which
can't be the intended behavior).
With this conclusion we can:
1. Remove dynamic allocation of semaphores.
2. Remove the mutex entirely and instead pack counter and waiters
into single uint64.
This makes the logic significantly simpler, both Add and Wait
do only a single atomic RMW to update the state.