From 788464724cb061198bab1fb4f7885eb46d38f847 Mon Sep 17 00:00:00 2001 From: Austin Clements Date: Fri, 23 Feb 2018 12:03:00 -0500 Subject: [PATCH] runtime: reduce arena size to 4MB on 64-bit Windows MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Currently, we use 64MB heap arenas on 64-bit platforms. This works well on UNIX-like OSes because they treat untouched pages as essentially free. However, on Windows, committed memory is charged against a process whether or not it has demand-faulted physical pages in. Hence, on Windows, even a process with a tiny heap will commit 64MB for one heap arena, plus another 32MB for the arena map. Things are much worse under the race detector, which increases the heap commitment by a factor of 5.5X, leading to 384MB of committed memory at runtime init. Fix this by reducing the heap arena size to 4MB on Windows. To counterbalance the effect of increasing the arena map size by a factor of 16, and to further reduce the impact of the commitment for the arena map, we switch from a single entry L1 arena map to a 64 entry L1 arena map. Compared to the original arena design, this slows down the x/benchmarks garbage benchmark by 0.49% (the slow down of this commit alone is 1.59%, but the previous commit bought us a 1% speed-up): name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.28ms ± 1% 2.29ms ± 1% +0.49% (p=0.000 n=17+18) (https://perf.golang.org/search?q=upload:20180223.1) (This was measured on linux/amd64 by modifying its arena configuration as above.) Fixes #23900. Change-Id: I6b7fa5ecebee2947bf20cfeb78c248809469c6b1 Reviewed-on: https://go-review.googlesource.com/96780 Run-TryBot: Austin Clements TryBot-Result: Gobot Gobot Reviewed-by: Rick Hudson --- src/runtime/malloc.go | 31 ++++++++++++++++++++++++++++--- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/src/runtime/malloc.go b/src/runtime/malloc.go index bad35116b0..6e04f50e1d 100644 --- a/src/runtime/malloc.go +++ b/src/runtime/malloc.go @@ -213,17 +213,38 @@ const ( // in a uintptr. maxAlloc = (1 << heapAddrBits) - (1-_64bit)*1 + // The number of bits in a heap address, the size of heap + // arenas, and the L1 and L2 arena map sizes are related by + // + // (1 << addrBits) = arenaBytes * L1entries * L2entries + // + // Currently, we balance these as follows: + // + // Platform Addr bits Arena size L1 entries L2 size + // -------------- --------- ---------- ---------- ------- + // */64-bit 48 64MB 1 32MB + // windows/64-bit 48 4MB 64 8MB + // */32-bit 32 4MB 1 4KB + // */mips(le) 31 4MB 1 2KB + // heapArenaBytes is the size of a heap arena. The heap // consists of mappings of size heapArenaBytes, aligned to // heapArenaBytes. The initial heap mapping is one arena. // - // This is currently 64MB on 64-bit and 4MB on 32-bit. + // This is currently 64MB on 64-bit non-Windows and 4MB on + // 32-bit and on Windows. We use smaller arenas on Windows + // because all committed memory is charged to the process, + // even if it's not touched. Hence, for processes with small + // heaps, the mapped arena space needs to be commensurate. + // This is particularly important with the race detector, + // since it significantly amplifies the cost of committed + // memory. heapArenaBytes = 1 << logHeapArenaBytes // logHeapArenaBytes is log_2 of heapArenaBytes. For clarity, // prefer using heapArenaBytes where possible (we need the // constant to compute some other constants). - logHeapArenaBytes = (6+20)*_64bit + (2+20)*(1-_64bit) + logHeapArenaBytes = (6+20)*(_64bit*(1-sys.GoosWindows)) + (2+20)*(_64bit*sys.GoosWindows) + (2+20)*(1-_64bit) // heapArenaBitmapBytes is the size of each heap arena's bitmap. heapArenaBitmapBytes = heapArenaBytes / (sys.PtrSize * 8 / 2) @@ -239,7 +260,11 @@ const ( // index is effectively unused. There is a performance benefit // to this, since the generated code can be more efficient, // but comes at the cost of having a large L2 mapping. - arenaL1Bits = 0 + // + // We use the L1 map on 64-bit Windows because the arena size + // is small, but the address space is still 48 bits, and + // there's a high cost to having a large L2. + arenaL1Bits = 6 * (_64bit * sys.GoosWindows) // arenaL2Bits is the number of bits of the arena number // covered by the second level arena index. -- 2.48.1