Cypherpunks repositories - gostls13.git/commit
author fanzha02 <fannie.zhang@arm.com>
Tue, 14 Jan 2025 09:32:56 +0000 (09:32 +0000)
committer Michael Knyszek <mknyszek@google.com>
Thu, 23 Oct 2025 00:02:28 +0000 (17:02 -0700)
commit 50586182abd82ec724b9beb6b806610d08846d8e
tree 95d75aa8a575d1aea4a4abe11f0a6dfbc375ab8b
parent 1ff59f3dd3569e1225c9273fc205cb54df674bf5
runtime: use backoff and ISB instruction to reduce contention in (*lfstack).pop and (*spanSet).pop on arm64

When profiling the CPU usage of LiveKit on AArch64/x86 (AWS), the graphs
show CPU spikes that repeat in a semi-periodic manner, and the spikes
occur when the GC (garbage collector) is active.

Our analysis found that the getempty function accounted for 10.54% of the
overhead, mainly caused by the work.empty.pop() function. A listing of
pop shows that most of that time, a 10.29% overhead, is spent on
atomic.Cas64((*uint64)(head), old, next).

This patch adds a backoff approach to reduce the high overhead of the
atomic operation, which primarily occurs when contention over a specific
memory address increases, typically as the number of threads rises.

Note that on platforms other than arm64, the initial value of backoff is zero.

This patch also rewrites the implementation of procyield() on arm64 as an
Armv8.0-A compatible delay function that uses the counter-timer.

The garbage collector benchmark:

                           │    master       │               opt                        │
                           │   sec/op        │        sec/op     vs base                │
Garbage/benchmem-MB=64-160   3.782m ± 4%        2.264m ± 2%      -40.12% (p=0.000 n=10)
                           │ user+sys-sec/op │ user+sys-sec/op   vs base                │
Garbage/benchmem-MB=64-160   433.5m ± 4%        255.4m ± 2%      -41.08% (p=0.000 n=10)

Reference for backoff mechanism:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/multi-threaded-applications-arm

Change-Id: Ie8128a2243ceacbb82ab2a88941acbb8428bad94
Reviewed-on: https://go-review.googlesource.com/c/go/+/654895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
src/runtime/asm_arm64.s
src/runtime/lfstack.go
src/runtime/mspanset.go