src/builtins/base.tq · f884e2faab281624e99a83d6af779069eb9b1280 · Linshizhi / V8

[compiler] improve inlining heuristics: call frequency per executed bytecodes · 352a154e
Tobias Tebbi authored Aug 29, 2019
TLDR: Inline less, but more where it matters. ~10% decrease in Turbofan
compile time including off-thread, while improving Octane scores by ~2%.

How things used to work:

There is a flag FLAG_min_inlining_frequency that limits inlining by
the callsite being sufficiently frequently executed. This call frequency
was measured relative to invocations of the parent (= the function we
originally optimize). At the same time, the limit was very low (0.15),
meaning we mostly relied on the total amount of inlined code
(FLAG_max_inlined_bytecode_size_cumulative) to limit inlining.

How things work now:

Instead of measuring call frequency relative to parent invocations, we
should have a measure that predicts how often the callsite in question
will be executed in the future. An obvious attempt at that would be to
measure how often the callsite was executed in absolute numbers in the
past. But depending on how fast feedback stabilizes, it can take more
or less time until we optimize a function. If we just take the absolute
call frequency up to the point in time when we optimize, we would
inline more for functions that stabilize slowly, which doesn't make
sense. So instead, we measure absolute call count per KB of executed
bytecodes of the parent function.
Since inlining big functions is more expensive, this threshold is
additionally scaled linearly with the bytecode-size of the inlinee.
The resulting formula is:
call_frequency >
FLAG_min_inlining_frequency *
  (bytecode.length() - FLAG_max_inlined_bytecode_size_small) /
  (FLAG_max_inlined_bytecode_size - FLAG_max_inlined_bytecode_size_small)

The new threshold is chosen in a way that it effectively limits
inlining, which allows us to increase
FLAG_max_inlined_bytecode_size_cumulative without increasing inlining
in general.

The reduction in compile time (x64 build) of ~10% was observed in Octane,
ARES-6, web-tooling-benchmark, and the standalone TypeScript benchmark.
The hope is that this will reduce CPU-time in real-world situations
too.
The Octane improvements come from inlining more in places where it
matters.

Bug: v8:6682

Change-Id: I99baa17dec85b71616a3ab3414d7e055beca39a0
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1768366
Commit-Queue: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Reviewed-by: Maya Lekova <mslekova@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63449}
352a154e
base.tq 110 KB
Replace base.tq