-
Tobias Tebbi authored
This reverts commit 352a154e. Reason for revert: https://crbug.com/999972 Original change's description: > [compiler] improve inlining heuristics: call frequency per executed bytecodes > > TLDR: Inline less, but more where it matters. ~10% decrease in Turbofan > compile time including off-thread, while improving Octane scores by ~2%. > > How things used to work: > > There is a flag FLAG_min_inlining_frequency that limits inlining by > the callsite being sufficiently frequently executed. This call frequency > was measured relative to invocations of the parent (= the function we > originally optimize). At the same time, the limit was very low (0.15), > meaning we mostly relied on the total amount of inlined code > (FLAG_max_inlined_bytecode_size_cumulative) to limit inlining. > > How things work now: > > Instead of measuring call frequency relative to parent invocations, we > should have a measure that predicts how often the callsite in question > will be executed in the future. An obvious attempt at that would be to > measure how often the callsite was executed in absolute numbers in the > past. But depending on how fast feedback stabilizes, it can take more > or less time until we optimize a function. If we just take the absolute > call frequency up to the point in time when we optimize, we would > inline more for functions that stabilize slowly, which doesn't make > sense. So instead, we measure absolute call count per KB of executed > bytecodes of the parent function. > Since inlining big functions is more expensive, this threshold is > additionally scaled linearly with the bytecode-size of the inlinee. > The resulting formula is: > call_frequency > > FLAG_min_inlining_frequency * > (bytecode.length() - FLAG_max_inlined_bytecode_size_small) / > (FLAG_max_inlined_bytecode_size - FLAG_max_inlined_bytecode_size_small) > > The new threshold is chosen in a way that it effectively limits > inlining, which allows us to increase > FLAG_max_inlined_bytecode_size_cumulative without increasing inlining > in general. > > The reduction in compile time (x64 build) of ~10% was observed in Octane, > ARES-6, web-tooling-benchmark, and the standalone TypeScript benchmark. > The hope is that this will reduce CPU-time in real-world situations > too. > The Octane improvements come from inlining more in places where it > matters. > > Bug: v8:6682 > > Change-Id: I99baa17dec85b71616a3ab3414d7e055beca39a0 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1768366 > Commit-Queue: Tobias Tebbi <tebbi@chromium.org> > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > Reviewed-by: Georg Neis <neis@chromium.org> > Reviewed-by: Maya Lekova <mslekova@chromium.org> > Cr-Commit-Position: refs/heads/master@{#63449} TBR=rmcilroy@chromium.org,neis@chromium.org,jgruber@chromium.org,tebbi@chromium.org,mslekova@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: v8:6682 chromium:999972 Change-Id: Iffca63d4bef81afa0f66e34d35fb72f3b5baf517 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1784281Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Commit-Queue: Tobias Tebbi <tebbi@chromium.org> Cr-Commit-Position: refs/heads/master@{#63554}
eb443e1f