    [compiler] improve inlining heuristics: call frequency per executed bytecodes · 352a154e
    Tobias Tebbi authored
    TL;DR: Inline less overall, but more where it matters. This gives a ~10%
    decrease in TurboFan compile time (including off-thread work), while
    improving Octane scores by ~2%.
    
    How things used to work:
    
    The flag FLAG_min_inlining_frequency restricts inlining to callsites
    that are executed sufficiently often. This call frequency was measured
    relative to invocations of the parent (= the function we originally
    optimize). However, the limit was very low (0.15), meaning we mostly
    relied on the total amount of inlined code
    (FLAG_max_inlined_bytecode_size_cumulative) to limit inlining.
    
    How things work now:
    
    Instead of measuring call frequency relative to parent invocations, we
    want a measure that predicts how often the callsite in question
    will be executed in the future. An obvious attempt at that would be to
    measure how often the callsite was executed in absolute numbers in the
    past. But depending on how fast feedback stabilizes, it can take more
    or less time until we optimize a function. If we just take the absolute
    call frequency up to the point in time when we optimize, we would
    inline more for functions that stabilize slowly, which doesn't make
    sense. So instead, we measure absolute call count per KB of executed
    bytecodes of the parent function.
    Since inlining big functions is more expensive, this threshold is
    additionally scaled linearly with the bytecode-size of the inlinee.
    The resulting formula is:
    call_frequency >
    FLAG_min_inlining_frequency *
      (bytecode.length() - FLAG_max_inlined_bytecode_size_small) /
      (FLAG_max_inlined_bytecode_size - FLAG_max_inlined_bytecode_size_small)
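The threshold formula above can be sketched in standalone C++ as follows. This is a minimal illustration, not the code in js-inlining-heuristic.cc: the `InliningFlags` struct, the function names, the assumption that executed bytecodes are counted in bytes with 1 KB = 1024 bytes, and all numeric values in the usage note are hypothetical stand-ins for the real flags and feedback data.

```cpp
#include <cassert>

// Hypothetical stand-ins for the V8 flags named in the formula.
// Any values plugged in here are illustrative, not V8's defaults.
struct InliningFlags {
  double min_inlining_frequency;
  int max_inlined_bytecode_size;
  int max_inlined_bytecode_size_small;
};

// Call count normalized per KB of bytecode executed in the parent function.
// Assumption: executed bytecodes are counted in bytes and 1 KB = 1024 bytes.
double CallsPerKilobyte(double call_count, double executed_bytecode_bytes) {
  if (executed_bytecode_bytes <= 0) return 0.0;  // no execution data yet
  return call_count * 1024.0 / executed_bytecode_bytes;
}

// Minimum call frequency required for an inlinee of the given bytecode size.
// Grows linearly with the size of the inlinee.
double RequiredFrequency(const InliningFlags& flags, int bytecode_length) {
  assert(flags.max_inlined_bytecode_size >
         flags.max_inlined_bytecode_size_small);
  return flags.min_inlining_frequency *
         (bytecode_length - flags.max_inlined_bytecode_size_small) /
         static_cast<double>(flags.max_inlined_bytecode_size -
                             flags.max_inlined_bytecode_size_small);
}

// True if the callsite is executed often enough to justify inlining.
bool IsFrequentEnough(const InliningFlags& flags, double call_frequency,
                      int bytecode_length) {
  return call_frequency > RequiredFrequency(flags, bytecode_length);
}
```

With hypothetical values min_inlining_frequency = 0.5, max_inlined_bytecode_size = 500 and max_inlined_bytecode_size_small = 100, a 300-byte inlinee needs a call frequency above 0.25. An inlinee at or below the small size gets a non-positive threshold, so any callsite with a positive frequency passes the check.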
    
    The new threshold is chosen in a way that it effectively limits
    inlining, which allows us to increase
    FLAG_max_inlined_bytecode_size_cumulative without increasing inlining
    in general.
    
    The reduction in compile time (x64 build) of ~10% was observed in Octane,
    ARES-6, web-tooling-benchmark, and the standalone TypeScript benchmark.
    The hope is that this will reduce CPU-time in real-world situations
    too.
    The Octane improvements come from inlining more in places where it
    matters.
    
    Bug: v8:6682
    
    Change-Id: I99baa17dec85b71616a3ab3414d7e055beca39a0
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1768366
    Commit-Queue: Tobias Tebbi <tebbi@chromium.org>
    Reviewed-by: Jakob Gruber <jgruber@chromium.org>
    Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
    Reviewed-by: Georg Neis <neis@chromium.org>
    Reviewed-by: Maya Lekova <mslekova@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#63449}
js-inlining-heuristic.cc 30.7 KB