1. 20 Jan, 2022 1 commit
  2. 09 Nov, 2021 1 commit
  3. 07 Sep, 2021 1 commit
  4. 24 Aug, 2021 1 commit
  5. 08 Jul, 2021 1 commit
  6. 15 Jun, 2021 1 commit
  7. 15 Apr, 2021 1 commit
  8. 22 Feb, 2021 1 commit
    • [turboprop] Reduce BytecodeBudgetInterrupt overhead from Turboprop · 5b783479
      Mythri A authored
      Earlier we always used the same interrupt budget and waited for a higher
      number of ticks when tiering up from Turboprop to TurboFan. On some
      real-world pages this adds noticeable overhead for processing
      these interrupts. This cl sets the interrupt budget to a higher value so
      there are fewer interrupts. This cl:
      1. Sets the interrupt budget on feedback cell to
      FLAG_interrupt_budget * scale factor when we install optimized code.
      2. Resets the budget to FLAG_interrupt_budget when there is a
      deoptimization.
      3. Updates the runtime profiler to remove the scaling of number of ticks
      needed for optimization when tiering up from TP to TF.
      
      On sheets benchmark, we spend 40-50ms when servicing interrupts from
      Turboprop code. This change brings it down to ~7ms. We also see
      improvements on other pages.
      
      
      Bug: v8:9684
      Change-Id: Ia3e5e998d1fff44f2e08a240a8769b7ebe794da2
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2696661
      Commit-Queue: Mythri Alle <mythria@chromium.org>
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#72906}
      5b783479
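The budget handling in the commit above can be sketched as follows. This is a minimal illustration, not V8's implementation: the `FeedbackCell` class, the helper names, and both numeric constants are assumptions chosen for readability.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; not V8's actual defaults.
INTERRUPT_BUDGET = 1000       # stands in for FLAG_interrupt_budget
TURBOPROP_SCALE_FACTOR = 10   # stands in for the scale factor


@dataclass
class FeedbackCell:
    interrupt_budget: int = INTERRUPT_BUDGET


def on_optimized_code_installed(cell):
    # Step 1: a larger budget means fewer interrupts from optimized code.
    cell.interrupt_budget = INTERRUPT_BUDGET * TURBOPROP_SCALE_FACTOR


def on_deoptimization(cell):
    # Step 2: back to the unscaled budget once we leave optimized code.
    cell.interrupt_budget = INTERRUPT_BUDGET


def count_interrupts(budget, work_units):
    # Number of times the budget is exhausted while executing `work_units`.
    return work_units // budget
```

With these assumed numbers, the same amount of work triggers ten times fewer interrupts after optimized code is installed, which is the effect the commit measures on the sheets benchmark.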
  9. 18 Feb, 2021 1 commit
  10. 17 Feb, 2021 1 commit
    • Reland "[interpreter] Short Star bytecode" · 7be64db4
      Seth Brenith authored
      This is a reland of cf93071c
      
      Original change's description:
      > [interpreter] Short Star bytecode
      >
      > Design doc:
      > https://docs.google.com/document/d/1g_NExMT78II_KnIYNa9MvyPYIj23qAiFUEsyemY5KRk/edit
      >
      > This change adds 16 new interpreter opcodes, kStar0 through kStar15, so
      > that we can use a single byte to represent the common operation of
      > storing to a low-numbered register. This generally reduces the quantity
      > of bytecode generated on web sites by 8-9%.
      >
      > In order to not degrade speed, a couple of other changes are required:
      >
      > The existing lookahead logic to check for Star after certain other
      > bytecode handlers is updated to check for these new short Star codes
      > instead. Furthermore, that lookahead logic is updated to contain its own
      > copy of the dispatch jump rather than merging control flow with the
      > lookahead-failed case, to improve branch prediction.
      >
      > A bunch of constants use bytecode size in bytes as a proxy for the size
      > or complexity of a function, and are adjusted downward proportionally to
      > the decrease in generated bytecode size.
      >
      > Other small drive-by fix: update generate-bytecode-expectations to emit
      > \n instead of \r\n on Windows.
      >
      > Change-Id: I6307c2b0f5794a3a1088bb0fb94f6e1615441ed5
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2641180
      > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
      > Cr-Commit-Position: refs/heads/master@{#72773}
      
      Change-Id: I1afb670c25694498b3989de615858f984a8c7f6f
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2698057
      Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Reviewed-by: Mythri Alle <mythria@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#72821}
      7be64db4
  11. 16 Feb, 2021 2 commits
    • Revert "[interpreter] Short Star bytecode" · 08a49bbe
      Leszek Swirski authored
      This reverts commit cf93071c.
      
      Reason for revert: Speculative revert because of Mac64 GC stress failure: https://ci.chromium.org/ui/p/v8/builders/ci/V8%20Mac64%20GC%20Stress/16697/overview
      
      Original change's description:
      > [interpreter] Short Star bytecode
      >
      > Design doc:
      > https://docs.google.com/document/d/1g_NExMT78II_KnIYNa9MvyPYIj23qAiFUEsyemY5KRk/edit
      >
      > This change adds 16 new interpreter opcodes, kStar0 through kStar15, so
      > that we can use a single byte to represent the common operation of
      > storing to a low-numbered register. This generally reduces the quantity
      > of bytecode generated on web sites by 8-9%.
      >
      > In order to not degrade speed, a couple of other changes are required:
      >
      > The existing lookahead logic to check for Star after certain other
      > bytecode handlers is updated to check for these new short Star codes
      > instead. Furthermore, that lookahead logic is updated to contain its own
      > copy of the dispatch jump rather than merging control flow with the
      > lookahead-failed case, to improve branch prediction.
      >
      > A bunch of constants use bytecode size in bytes as a proxy for the size
      > or complexity of a function, and are adjusted downward proportionally to
      > the decrease in generated bytecode size.
      >
      > Other small drive-by fix: update generate-bytecode-expectations to emit
      > \n instead of \r\n on Windows.
      >
      > Change-Id: I6307c2b0f5794a3a1088bb0fb94f6e1615441ed5
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2641180
      > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
      > Cr-Commit-Position: refs/heads/master@{#72773}
      
      TBR=rmcilroy@chromium.org,mythria@chromium.org,seth.brenith@microsoft.com
      
      Change-Id: I0162b9400861b90bacef27cca9aebc8ab9d74c10
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2697350
      Reviewed-by: Leszek Swirski <leszeks@chromium.org>
      Commit-Queue: Leszek Swirski <leszeks@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#72777}
      08a49bbe
    • [interpreter] Short Star bytecode · cf93071c
      Seth Brenith authored
      Design doc:
      https://docs.google.com/document/d/1g_NExMT78II_KnIYNa9MvyPYIj23qAiFUEsyemY5KRk/edit
      
      This change adds 16 new interpreter opcodes, kStar0 through kStar15, so
      that we can use a single byte to represent the common operation of
      storing to a low-numbered register. This generally reduces the quantity
      of bytecode generated on web sites by 8-9%.
      
      In order to not degrade speed, a couple of other changes are required:
      
      The existing lookahead logic to check for Star after certain other
      bytecode handlers is updated to check for these new short Star codes
      instead. Furthermore, that lookahead logic is updated to contain its own
      copy of the dispatch jump rather than merging control flow with the
      lookahead-failed case, to improve branch prediction.
      
      A bunch of constants use bytecode size in bytes as a proxy for the size
      or complexity of a function, and are adjusted downward proportionally to
      the decrease in generated bytecode size.
      
      Other small drive-by fix: update generate-bytecode-expectations to emit
      \n instead of \r\n on Windows.
      
      Change-Id: I6307c2b0f5794a3a1088bb0fb94f6e1615441ed5
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2641180
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
      Cr-Commit-Position: refs/heads/master@{#72773}
      cf93071c
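The size saving from the short Star opcodes can be illustrated with a toy encoder. This is only a sketch: the opcode numbers below are made up, and wide register operands are ignored. The only facts taken from the commit are that there are 16 short forms (kStar0 through kStar15) and that each fits in a single byte.

```python
# Hypothetical opcode numbering, for illustration only.
STAR = 0x10            # long form: opcode byte followed by one register operand byte
STAR0 = 0x20           # short forms occupy STAR0 .. STAR0 + 15
SHORT_STAR_COUNT = 16


def encode_star(register):
    """Encode 'store accumulator to register' in as few bytes as possible."""
    if 0 <= register < SHORT_STAR_COUNT:
        return bytes([STAR0 + register])      # 1 byte, register implied by opcode
    if not 0 <= register <= 0xFF:
        raise ValueError("register out of range for this sketch")
    return bytes([STAR, register])            # 2 bytes


def decode_star(code, offset=0):
    """Return (register, size_in_bytes) for the Star at `offset`."""
    opcode = code[offset]
    if STAR0 <= opcode < STAR0 + SHORT_STAR_COUNT:
        return opcode - STAR0, 1
    if opcode == STAR:
        return code[offset + 1], 2
    raise ValueError("not a Star bytecode")
```

In this model every store to registers 0 through 15 shrinks from two bytes to one, which is where a saving in generated bytecode would come from when low-numbered registers dominate.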
  12. 15 Feb, 2021 1 commit
  13. 12 Feb, 2021 1 commit
  14. 29 Jan, 2021 1 commit
  15. 25 Jan, 2021 1 commit
    • [turboprop] Delay optimizing functions that get hot slower · 502419a8
      Mythri A authored
      Functions that get hot quickly are more likely to stay hot and stable,
      so optimize these functions earlier than functions that become
      hot more slowly. To measure how "soon" a function gets hot, this cl
      introduces a global tick that is incremented whenever a function
      registers a tick. We use the difference in the global tick between the
      current tick and the last tick on that function to measure how soon
      the function is becoming hot. We use the last tick to account for
      functions that aren't used so much at the start but become hot
      in a later phase. Currently we use this heuristic only for Turboprop
      tierups. It is possible to extend this to Turbofan in
      future.
      
      Bug: v8:9684
      Change-Id: I8ef265c03520274c68d56a9d35429531a3ba3d1d
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2627850
      Commit-Queue: Mythri Alle <mythria@chromium.org>
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#72281}
      502419a8
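The global-tick measurement in the commit above might look roughly like this. The `TickTracker` class and the `ticks_needed` policy, including its threshold and penalty parameters, are hypothetical; the commit only states that the gap between a function's consecutive ticks, measured in global ticks, is used to delay slower functions.

```python
class TickTracker:
    def __init__(self):
        self.global_ticks = 0
        self.last_tick = {}  # function name -> global tick at its previous tick

    def register_tick(self, fn):
        """Record a tick for `fn`.

        Returns the number of global ticks since fn's previous tick,
        or None if this is fn's first tick.
        """
        self.global_ticks += 1
        previous = self.last_tick.get(fn)
        self.last_tick[fn] = self.global_ticks
        return None if previous is None else self.global_ticks - previous


def ticks_needed(base_ticks, delta, slow_threshold, extra_ticks):
    # Hypothetical policy: a large gap between ticks means the function is
    # heating up slowly, so it must collect more ticks before tiering up.
    if delta is not None and delta > slow_threshold:
        return base_ticks + extra_ticks
    return base_ticks
```

Because the gap is measured from the function's last tick rather than from startup, a function that is idle early on but becomes hot later still shows small gaps once it starts ticking frequently.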
  16. 18 Jan, 2021 1 commit
  17. 15 Jan, 2021 1 commit
    • Reland "[turboprop] Enable tierup to TurboFan with FLAG_turboprop" · 3a6920d2
      Mythri A authored
      This is a reland of e38cb757. This
      was reverted as a potential culprit for a wasm failure. The
      actual revert that fixed the bots is here:
      https://chromium-review.googlesource.com/c/v8/v8/+/2630736.
      This should be safe to reland. I verified locally that the test is
      failing with or without this change.
      
      Original change's description:
      > [turboprop] Enable tierup to TurboFan with FLAG_turboprop
      >
      > FLAG_turboprop was used to test the turboprop compiler without any
      > further tierup to TurboFan. This cl changes:
      > - FLAG_turboprop to also tier up to TurboFan.
      > - Introduces FLAG_turboprop_as_toptier to continue running the
      >   configuration without tierup.
      > - Removes FLAG_turboprop_as_midtier which is same as FLAG_turboprop.
      >
      > Bug: v8:9684
      > Change-Id: I487bda13d226434837770ecc43b3ced7c31ccf19
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2622214
      > Commit-Queue: Mythri Alle <mythria@chromium.org>
      > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#72101}
      
      Bug: v8:9684
      Change-Id: I8b61fd8e562190c3c7bf5a003273f2a058542dad
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2632588
      Commit-Queue: Ross McIlroy <rmcilroy@chromium.org>
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#72110}
      3a6920d2
  18. 14 Jan, 2021 2 commits
  19. 17 Dec, 2020 1 commit
  20. 02 Dec, 2020 1 commit
  21. 27 Nov, 2020 1 commit
  22. 26 Nov, 2020 1 commit
  23. 24 Nov, 2020 1 commit
  24. 11 Nov, 2020 1 commit
    • [turboprop] Adjust OSR heuristics for Turboprop · 301b354e
      Mythri A authored
      Turboprop should tierup to OSR roughly at the same time as TurboFan,
      so we wait for kProfilerTicksForTurboPropOSR ticks before OSRing. This
      value was incorrect because we OSR after 4 ticks (we increment the ticks
      after the tiering up decision). Also, we wait for additional ticks based
      on function size. That should also adjust for the lower interrupt budget
      on Turboprop.
      
      Bug: v8:9684
      Change-Id: I84c0afadd0562e598bbbe1c0cf904d7488c70261
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2532295
      Commit-Queue: Mythri Alle <mythria@chromium.org>
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71135}
      301b354e
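One way to read the off-by-one described in the commit above: if the tier-up decision compares the tick count before the current tick is added, a threshold of N only fires on profiler call N + 1. The sketch below shows that effect together with a size-dependent allowance. The function names, the example threshold, and the bytes-per-extra-tick parameter are all hypothetical and do not reflect V8's actual constants.

```python
def profiler_calls_until_osr(threshold):
    """Count profiler calls until OSR fires, when the tick counter is
    incremented only after the tiering decision has been made."""
    ticks = 0
    calls = 0
    while True:
        calls += 1
        if ticks >= threshold:   # decision sees the count from earlier calls
            return calls
        ticks += 1               # this call's tick is added afterwards


def osr_ticks_required(base_ticks, bytecode_size, bytes_per_extra_tick):
    # Larger functions wait for additional ticks before OSR.
    return base_ticks + bytecode_size // bytes_per_extra_tick
```

Under this reading, a constant of 3 would lead to OSR on the fourth profiler call, so the constant has to account for the increment happening after the decision.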
  25. 05 Nov, 2020 1 commit
  26. 02 Nov, 2020 1 commit
  27. 30 Sep, 2020 1 commit
    • [turboprop] Add TURBOPROP code kind · 75b8c238
      Jakob Gruber authored
      Turboprop-generated Code objects will now have the dedicated
      TURBOPROP code kind instead of OPTIMIZED_FUNCTION. When possible,
      the code kind is used as the source of truth instead of
      FLAG_turboprop. This is the initial step towards implementing
      tier-up from Turboprop to Turbofan.
      
      Future work: Rename OPTIMIZED_FUNCTION to TURBOFAN, rename STUB to
      DEOPT_ENTRIES_OR_FOR_TESTING, implement TP tier-up.
      
      No-Try: true
      Bug: v8:9684
      Cq-Include-Trybots: luci.v8.try:v8_linux64_fyi_rel_ng
      Change-Id: I3c9308718d7e9a2b7e6796e7ea94f17e5ff84c0a
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2424140
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Mythri Alle <mythria@chromium.org>
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70213}
      75b8c238
  28. 02 Sep, 2020 1 commit
  29. 20 Aug, 2020 1 commit
  30. 19 Aug, 2020 1 commit
    • [nci] Implement tier-up, part 2 (marking) · 1096e031
      Jakob Gruber authored
      This is part two of the implementation (part 1: heuristics in NCI code
      to call the runtime profiler, part 2: heuristics in the runtime
      profiler to mark the function for optimization, part 3: the final
      part, recognizing and acting upon the marked function).
      
      The runtime profiler heuristics added here remain very similar to what
      we have for ignition, except that we now inspect optimized frames with
      NCI code, and that we (currently) do not OSR from NCI to TF.
      
      Bug: v8:8888
      Change-Id: Ie88b0a0dcee16334cea585c771a4b505035f2291
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2358748
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Mythri Alle <mythria@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#69484}
      1096e031
  31. 11 Aug, 2020 1 commit
  32. 05 Aug, 2020 1 commit
    • [turboprop] Change heuristics for OSRing in TurboProp · bd9609a0
      Mythri A authored
      Change the heuristics for OSRing in TurboProp. Currently we OSR if
      a function is already optimized / marked for optimization but is still
      running optimized code. Since TurboProp optimizes much earlier than
      TurboFan, using the same heuristics would cause us to OSR more often
      than required. This cl adds an additional check on the number of ticks
      to make sure the function is hot enough for OSRing.
      
      Bug: v8:9684
      Change-Id: I7a1c8229182a928fd85efb23e2d385413c5209ef
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2339098
      Commit-Queue: Mythri Alle <mythria@chromium.org>
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#69252}
      bd9609a0
  33. 29 Jul, 2020 1 commit
    • [nci] Update interrupt budget from NCI code · 980e224a
      Jakob Gruber authored
      This is the first step towards implementing a tier-up mechanism from
      NCI code to TF. We will follow the existing Ignition-to-Turbofan
      mechanics, which are, roughly:
      
      1. Track a bytecode interrupt budget.
      2. When exhausted, call the runtime profiler, which increments
         profiler ticks for the top frame's function.
      3. When a function should tier up, it is marked as such using the
         FeedbackVector::optimized_code_weak_or_smi slot / the
         OptimizationMarker mechanism.
      4. The InterpreterEntryTrampoline checks this slot and calls into
         runtime to compile if needed.
      5. The finished code is also placed into this slot, as well as
         installed on the JSFunction.
      6. Again, the IET checks the slot and tail-calls the code object if it
         exists.
      
      This CL implements step 1 for NCI code by inserting the new simplified
      UpdateInterruptBudget operator at the same spots (and using the same
      offsets) as Ignition. When the budget is exhausted, we call a runtime
      function that currently does nothing and will be implemented in the
      next CL.
      
      Bug: v8:8888
      Change-Id: I98c0f8d96f32d515218dc2a76f961d44fe281c86
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2312778
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Reviewed-by: Mythri Alle <mythria@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#69124}
      980e224a
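The six-step Ignition-to-Turbofan flow listed in the commit above can be modelled in a few lines. This is a simplified sketch under stated assumptions: compilation is treated as synchronous, the feedback-vector slot is a plain string field, and the budget and tick thresholds are invented numbers. None of the names below are V8 APIs.

```python
from dataclasses import dataclass
from typing import Optional

INTERRUPT_BUDGET = 100   # hypothetical budget
TICKS_TO_OPTIMIZE = 2    # hypothetical tick threshold


@dataclass
class Function:
    name: str
    budget: int = INTERRUPT_BUDGET
    profiler_ticks: int = 0
    slot: Optional[str] = None   # None, "marked", or "optimized-code"


def runtime_profiler(fn):
    # Steps 2-3: count a tick and mark the function once it is hot enough.
    fn.profiler_ticks += 1
    if fn.profiler_ticks >= TICKS_TO_OPTIMIZE and fn.slot is None:
        fn.slot = "marked"


def update_interrupt_budget(fn, weight):
    # Step 1: spend budget; on exhaustion, reset it and call the profiler.
    fn.budget -= weight
    if fn.budget <= 0:
        fn.budget = INTERRUPT_BUDGET
        runtime_profiler(fn)


def entry_trampoline(fn):
    # Steps 4-6: compile if marked, then run optimized code if present.
    if fn.slot == "marked":
        fn.slot = "optimized-code"   # stands in for compile + install
    return "optimized" if fn.slot == "optimized-code" else "unoptimized"
```

In this model the CL described above corresponds to `update_interrupt_budget` alone, with the profiler call left as a stub until the follow-up changes.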
  34. 16 Mar, 2020 1 commit
  35. 10 Jan, 2020 1 commit
  36. 02 Dec, 2019 1 commit
  37. 04 Sep, 2019 1 commit
    • Revert "[compiler] improve inlining heuristics: call frequency per executed bytecodes" · eb443e1f
      Tobias Tebbi authored
      This reverts commit 352a154e.
      
      Reason for revert: https://crbug.com/999972
      
      Original change's description:
      > [compiler] improve inlining heuristics: call frequency per executed bytecodes
      > 
      > TLDR: Inline less, but more where it matters. ~10% decrease in Turbofan
      > compile time including off-thread, while improving Octane scores by ~2%.
      > 
      > How things used to work:
      > 
      > There is a flag FLAG_min_inlining_frequency that limits inlining by
      > the callsite being sufficiently frequently executed. This call frequency
      > was measured relative to invocations of the parent (= the function we
      > originally optimize). At the same time, the limit was very low (0.15),
      > meaning we mostly relied on the total amount of inlined code
      > (FLAG_max_inlined_bytecode_size_cumulative) to limit inlining.
      > 
      > How things work now:
      > 
      > Instead of measuring call frequency relative to parent invocations, we
      > should have a measure that predicts how often the callsite in question
      > will be executed in the future. An obvious attempt at that would be to
      > measure how often the callsite was executed in absolute numbers in the
      > past. But depending on how fast feedback stabilizes, it can take more
      > or less time until we optimize a function. If we just take the absolute
      > call frequency up to the point in time when we optimize, we would
      > inline more for functions that stabilize slowly, which doesn't make
      > sense. So instead, we measure absolute call count per KB of executed
      > bytecodes of the parent function.
      > Since inlining big functions is more expensive, this threshold is
      > additionally scaled linearly with the bytecode-size of the inlinee.
      > The resulting formula is:
      > call_frequency >
      > FLAG_min_inlining_frequency *
      >   (bytecode.length() - FLAG_max_inlined_bytecode_size_small) /
      >   (FLAG_max_inlined_bytecode_size - FLAG_max_inlined_bytecode_size_small)
      > 
      > The new threshold is chosen in a way that it effectively limits
      > inlining, which allows us to increase
      > FLAG_max_inlined_bytecode_size_cumulative without increasing inlining
      > in general.
      > 
      > The reduction in compile time (x64 build) of ~10% was observed in Octane,
      > ARES-6, web-tooling-benchmark, and the standalone TypeScript benchmark.
      > The hope is that this will reduce CPU-time in real-world situations
      > too.
      > The Octane improvements come from inlining more in places where it
      > matters.
      > 
      > Bug: v8:6682
      > 
      > Change-Id: I99baa17dec85b71616a3ab3414d7e055beca39a0
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1768366
      > Commit-Queue: Tobias Tebbi <tebbi@chromium.org>
      > Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > Reviewed-by: Georg Neis <neis@chromium.org>
      > Reviewed-by: Maya Lekova <mslekova@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#63449}
      
      TBR=rmcilroy@chromium.org,neis@chromium.org,jgruber@chromium.org,tebbi@chromium.org,mslekova@chromium.org
      
      # Not skipping CQ checks because original CL landed > 1 day ago.
      
      Bug: v8:6682 chromium:999972
      Change-Id: Iffca63d4bef81afa0f66e34d35fb72f3b5baf517
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1784281
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Commit-Queue: Tobias Tebbi <tebbi@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#63554}
      eb443e1f
  38. 29 Aug, 2019 1 commit
    • [compiler] improve inlining heuristics: call frequency per executed bytecodes · 352a154e
      Tobias Tebbi authored
      TLDR: Inline less, but more where it matters. ~10% decrease in Turbofan
      compile time including off-thread, while improving Octane scores by ~2%.
      
      How things used to work:
      
      There is a flag FLAG_min_inlining_frequency that limits inlining by
      the callsite being sufficiently frequently executed. This call frequency
      was measured relative to invocations of the parent (= the function we
      originally optimize). At the same time, the limit was very low (0.15),
      meaning we mostly relied on the total amount of inlined code
      (FLAG_max_inlined_bytecode_size_cumulative) to limit inlining.
      
      How things work now:
      
      Instead of measuring call frequency relative to parent invocations, we
      should have a measure that predicts how often the callsite in question
      will be executed in the future. An obvious attempt at that would be to
      measure how often the callsite was executed in absolute numbers in the
      past. But depending on how fast feedback stabilizes, it can take more
      or less time until we optimize a function. If we just take the absolute
      call frequency up to the point in time when we optimize, we would
      inline more for functions that stabilize slowly, which doesn't make
      sense. So instead, we measure absolute call count per KB of executed
      bytecodes of the parent function.
      Since inlining big functions is more expensive, this threshold is
      additionally scaled linearly with the bytecode-size of the inlinee.
      The resulting formula is:
      call_frequency >
      FLAG_min_inlining_frequency *
        (bytecode.length() - FLAG_max_inlined_bytecode_size_small) /
        (FLAG_max_inlined_bytecode_size - FLAG_max_inlined_bytecode_size_small)
      
      The new threshold is chosen in a way that it effectively limits
      inlining, which allows us to increase
      FLAG_max_inlined_bytecode_size_cumulative without increasing inlining
      in general.
      
      The reduction in compile time (x64 build) of ~10% was observed in Octane,
      ARES-6, web-tooling-benchmark, and the standalone TypeScript benchmark.
      The hope is that this will reduce CPU-time in real-world situations
      too.
      The Octane improvements come from inlining more in places where it
      matters.
      
      Bug: v8:6682
      
      Change-Id: I99baa17dec85b71616a3ab3414d7e055beca39a0
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1768366
      Commit-Queue: Tobias Tebbi <tebbi@chromium.org>
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Reviewed-by: Maya Lekova <mslekova@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#63449}
      352a154e
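The threshold formula from the commit above can be evaluated directly. The sketch below transcribes it as written; the parameter names mirror the flags, but the numeric values used in the usage note are illustrative assumptions, not V8's defaults.

```python
def inlining_threshold(bytecode_length, min_inlining_frequency,
                       max_inlined_bytecode_size_small,
                       max_inlined_bytecode_size):
    """Minimum call frequency (calls per KB of executed parent bytecode)
    required to inline a callee of the given bytecode length."""
    span = max_inlined_bytecode_size - max_inlined_bytecode_size_small
    return (min_inlining_frequency
            * (bytecode_length - max_inlined_bytecode_size_small) / span)


def should_inline(call_frequency, bytecode_length, min_inlining_frequency,
                  max_inlined_bytecode_size_small, max_inlined_bytecode_size):
    return call_frequency > inlining_threshold(
        bytecode_length, min_inlining_frequency,
        max_inlined_bytecode_size_small, max_inlined_bytecode_size)
```

With an assumed minimum frequency of 1.0, a small-size limit of 30, and a maximum size of 500, a 265-byte callee needs a call frequency above 0.5, while a callee at the small-size limit has a threshold of 0. The threshold grows linearly with callee size and reaches the full minimum frequency at the maximum inlinable size.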