1. 04 Mar, 2019 1 commit
  2. 08 Oct, 2018 1 commit
      [turbofan] Escape analysis support for LoadElement with variable index. · 3e43ded9
      Benedikt Meurer authored
      This adds support to the escape analysis to allow scalar replacement
      of (small) FixedArrays with element accesses where the index is not a
      compile time constant. This happens quite often when inlining functions
      that operate on variable number of arguments. For example consider this
      little piece of code:
      
      ```js
      function sum(...args) {
        let s = 0;
        for (let i = 0; i < args.length; ++i) s += args[i];
        return s;
      }
      
      function sum2(x, y) {
        return sum(x, y);
      }
      ```
      
      This example is made up, of course, but it shows the problem. Let's
      assume that TurboFan inlines the function `sum` into its call site
      at `sum2`. Now it has to materialize the `args` array with the two
      values `x` and `y`, and iterate through these `args` to sum them up.
      The escape analysis pass figures out that `args` doesn't escape (aka
      doesn't outlive) the optimized code for `sum2` now, but TurboFan still
      needs to materialize the elements backing store for `args` since there's
      a `LoadElement(args.elements,i)` in the graph now, and `i` is not a
      compile time constant.
      
      However the escape analysis has more information than just that. In
      particular the escape analysis knows exactly how many elements a
      non-escaping object has, based on the fact that the allocation must be
      local to the function and that we only track objects with known size.
      So in the case above when we get to `args[i]` in the escape analysis
      the relevant part of the graph looks something like this:
      
      ```
      elements = LoadField[elements](args)
      length = LoadField[length](args)
      index = CheckBounds(i, length)
      value = LoadElement(elements, index)
      ```
      
      In particular the contract here is that `LoadElement(elements,index)`
      is guaranteed to have an `index` that is within the valid bounds for
      the `elements` (there must be a preceding `CheckBounds` or some other
      guard in optimized code before it). And since `elements` is allocated
      inside of the optimized code object, the escape analysis also knows
      that `elements` has exactly two elements inside (namely the values of
      `x` and `y`). So we can use that information and replace the access
      with a `Select(index===0,x,y)` operation instead, which allows us to
      scalar replace the `elements`, since there's no escaping use anymore
      in the graph.
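
      As a rough illustration, the effect of this replacement on `sum2` can
      be modelled in plain JavaScript. This is a hand-written sketch of what
      the optimized graph computes, not actual TurboFan output; `select` and
      `sum2ScalarReplaced` are hypothetical names used only here.

      ```js
      // Hypothetical stand-in for a Select(condition, a, b) operation.
      function select(condition, ifTrue, ifFalse) {
        return condition ? ifTrue : ifFalse;
      }

      // The original functions from the example above.
      function sum(...args) {
        let s = 0;
        for (let i = 0; i < args.length; ++i) s += args[i];
        return s;
      }

      function sum2(x, y) {
        return sum(x, y);
      }

      // Model of sum2 after inlining and scalar replacement: no `args`
      // array and no elements backing store are allocated.
      function sum2ScalarReplaced(x, y) {
        const length = 2; // statically known, since the allocation was local
        let s = 0;
        for (let i = 0; i < length; ++i) {
          // `i < length` plays the role of CheckBounds, so i is 0 or 1 here.
          s += select(i === 0, x, y);
        }
        return s;
      }

      console.log(sum2(3, 4) === sum2ScalarReplaced(3, 4));
      ```

      Both versions compute the same result; the second one simply never
      touches an array.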
      
      We do this when the number of elements is two, as described above, and
      also when it is one. The one-element case is even easier, since the
      `LoadElement` has to yield exactly that element. With zero elements we
      know that the `LoadElement` must be in dead code, but we can't just
      mark it for deletion from the graph (which we would want so that it
      doesn't block scalar replacement of non-dead code), so we don't handle
      this case for now.
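
      The case distinction can be summarized with a small toy model. The
      function name `lowerLoadElement` and the idea of returning a closure
      are assumptions made for this sketch; the real reduction operates on
      graph nodes, not on JavaScript values.

      ```js
      // Toy model: `elements` lists the statically known element values of a
      // non-escaping backing store. Returns a function from an in-bounds
      // index to the loaded value, or null when the case is not handled.
      function lowerLoadElement(elements) {
        switch (elements.length) {
          case 1: {
            // The index must be 0, so the load yields the only element.
            const [only] = elements;
            return (_index) => only;
          }
          case 2: {
            // Corresponds to Select(index === 0, first, second).
            const [first, second] = elements;
            return (index) => (index === 0 ? first : second);
          }
          default:
            // Length 0 (dead code) and lengths above 2 are left alone.
            return null;
        }
      }

      const load = lowerLoadElement(["x", "y"]);
      console.log(load(0), load(1));
      ```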
      
      We could generalize this to handle arbitrary lengths, but since there's
      a cost to arbitrary decision trees here, it's unclear when this is still
      beneficial. Another possible solution for length > 2 would be to have
      special stack allocation for these backing stores and do variable index
      accesses to these stack areas. But that's way beyond the scope of this
      isolated change.
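
      To make the cost argument concrete, here is one way such a decision
      tree could look for a longer backing store. This is purely an
      illustrative assumption about a possible generalization (the change
      does not implement it), and `selectTree` is a hypothetical name.

      ```js
      // Picks values[index] using only comparisons, as a balanced tree of
      // selects would. Assumes values is non-empty and index is in bounds,
      // which the preceding CheckBounds would guarantee.
      function selectTree(index, values, lo = 0, hi = values.length - 1) {
        if (lo === hi) return values[lo];
        const mid = (lo + hi) >> 1;
        return index <= mid
          ? selectTree(index, values, lo, mid)
          : selectTree(index, values, mid + 1, hi);
      }

      console.log(selectTree(3, [10, 20, 30, 40, 50]));
      ```

      Every access pays for a chain of comparisons whose depth grows with
      the number of elements, which is the cost that makes the benefit
      unclear for larger lengths.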
      
      This change shows a ~2% improvement on the EarleyBoyer benchmark in
      JetStream, since it benefits a lot from not having to materialize these
      small arguments backing stores.
      
      Drive-by-fix: Fix JSCreateLowering to properly initialize "elements"
      with StoreElement instead of StoreField (which violates the invariant
      in TurboFan that fields and elements never alias).
      
      Bug: v8:5267, v8:6200
      Change-Id: Idd464a15a81e7c9653c48c814b406eb859841428
      Reviewed-on: https://chromium-review.googlesource.com/c/1267935
      Commit-Queue: Benedikt Meurer <bmeurer@chromium.org>
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#56442}
  3. 13 Jul, 2017 3 commits
  4. 12 Jul, 2017 1 commit
  5. 30 Jun, 2017 1 commit
  6. 23 Jun, 2017 1 commit
  7. 16 Jun, 2017 1 commit
  8. 30 Jan, 2014 1 commit
      The current · 99ce5a24
      jarin@chromium.org authored
      version is passing all the existing tests + a bunch of new tests
      (packaged in the change list, too).
      
      The patch extends the SlotRef object to describe captured and duplicated
      objects. Since the SlotRefs are not independent of each other anymore,
      there is a new SlotRefValueBuilder class that stores the SlotRefs and
      later materializes the objects from the SlotRefs.
      
      Note that unlike the previous implementation of SlotRefs, we now build
      the SlotRef entries for the entire frame, not just the particular
      function.  This is because duplicate objects might refer to previously
      captured objects (that might live inside another inlined function's part
      of the frame).
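
      A toy model may help show why the entries must cover the entire frame.
      The entry kinds (`value`, `captured`, `duplicate`), the `fields` and
      `objectIndex` properties, and the function `materializeAll` are all
      assumptions invented for this sketch; they are not the real SlotRef
      encoding.

      ```js
      // Walks a flat list of entries for a whole frame and materializes
      // values. A "duplicate" entry refers by index to a captured object
      // seen earlier in the same list, so both share one identity.
      function materializeAll(slotRefs) {
        const materialized = []; // captured objects in order of appearance
        let pos = 0;

        function next() {
          const ref = slotRefs[pos++];
          switch (ref.kind) {
            case "value":
              return ref.value;
            case "captured": {
              const obj = {};
              // Register first so later duplicates can find the object.
              materialized.push(obj);
              for (const name of ref.fields) obj[name] = next();
              return obj;
            }
            case "duplicate":
              return materialized[ref.objectIndex];
            default:
              throw new Error(`unknown entry kind: ${ref.kind}`);
          }
        }

        const result = [];
        while (pos < slotRefs.length) result.push(next());
        return result;
      }

      const frame = [
        { kind: "captured", fields: ["a", "b"] }, // e.g. in an outer function
        { kind: "value", value: 1 },
        { kind: "value", value: 2 },
        { kind: "duplicate", objectIndex: 0 },    // e.g. in an inlined function
      ];
      const [first, second] = materializeAll(frame);
      console.log(first === second); // prints true
      ```

      If only the inlined function's entries were built, the `duplicate`
      entry would have nothing to refer to.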
      
      We also need to store the materialized objects between other potential
      invocations of the same arguments object so that we materialize each
      captured object at most once.  The materialized objects of frames live
      in the new MaterializedObjectStore object (contained in Isolate),
      indexed by the frame's FP address.  Each argument materialization (and
      deoptimization) tries to look up its captured objects in the store before
      building new ones.  Deoptimization also removes the materialized objects
      from the store. We also schedule a lazy deopt to be sure that we always
      get rid of the materialized objects and that the optimized function
      adopts the materialized objects (instead of happily computing with its
      captured representations).
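
      The lookup-before-build behaviour described above can be sketched as
      follows. This is a simplified model, not the actual class: the name
      `MaterializedObjectStoreModel`, its methods, and the helper
      `materializeArguments` are hypothetical, and a plain number stands in
      for the frame's FP address.

      ```js
      // Simplified model of a store of materialized objects, keyed by the
      // frame pointer of the frame they belong to.
      class MaterializedObjectStoreModel {
        constructor() {
          this.byFramePointer = new Map();
        }
        get(fp) {
          return this.byFramePointer.has(fp) ? this.byFramePointer.get(fp) : null;
        }
        set(fp, objects) {
          this.byFramePointer.set(fp, objects);
        }
        // Called on deoptimization so stale objects do not linger.
        remove(fp) {
          return this.byFramePointer.delete(fp);
        }
      }

      // Materializes the captured objects of a frame at most once: reuse
      // what the store already holds, otherwise build and remember it.
      function materializeArguments(store, fp, build) {
        let objects = store.get(fp);
        if (objects === null) {
          objects = build();
          store.set(fp, objects);
        }
        return objects;
      }

      const store = new MaterializedObjectStoreModel();
      const fp = 0x7ffc1000; // made-up frame pointer value
      const a = materializeArguments(store, fp, () => [{ length: 2 }]);
      const b = materializeArguments(store, fp, () => [{ length: 2 }]);
      console.log(a === b); // prints true
      ```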
      
      Concerns:
      
      - Is the FP address the right key for a frame? (Note that deoptimizer's
      representation of frame is different from the argument object
      materializer's one - it is not easy to find common ground.)
      
      - Performance is suboptimal in several places, but a quick local run of
      benchmarks does not seem to show a perf hit. Examples of possible
      improvements: smarter generation of SlotRefs (build other functions'
      SlotRefs only for captured objects and only if necessary), smarter
      lookup of stored materialized objects.
      
      - Ideally, we would like to share the code for argument materialization
      with deoptimizer's materializer.  However, the supporting data structures
      (mainly the frame descriptor) are quite different in each case, so it
      looks more like a separate project.
      
      Thanks for any feedback.
      
      R=danno@chromium.org, mstarzinger@chromium.org
      LOG=N
      BUG=
      
      Committed: https://code.google.com/p/v8/source/detail?r=18918
      
      Review URL: https://codereview.chromium.org/103243005
      
      git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@18936 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
  9. 29 Jan, 2014 2 commits
      Revert "Captured arguments object materialization" · ec51f26b
      jarin@chromium.org authored
      R=jarin@chromium.org
      
      Review URL: https://codereview.chromium.org/130803009
      
      git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@18923 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
      This is a preview of the captured arguments object materialization, · 868ad01e
      jarin@chromium.org authored
      mostly to make sure that it is going in the right direction. The current
      version is passing all the existing tests + a bunch of new tests
      (packaged in the change list, too).
      
      The patch extends the SlotRef object to describe captured and duplicated
      objects. Since the SlotRefs are not independent of each other anymore,
      there is a new SlotRefValueBuilder class that stores the SlotRefs and
      later materializes the objects from the SlotRefs.
      
      Note that unlike the previous implementation of SlotRefs, we now build
      the SlotRef entries for the entire frame, not just the particular
      function.  This is because duplicate objects might refer to previously
      captured objects (that might live inside another inlined function's part
      of the frame).
      
      We also need to store the materialized objects between other potential
      invocations of the same arguments object so that we materialize each
      captured object at most once.  The materialized objects of frames live
      in the new MaterializedObjectStore object (contained in Isolate),
      indexed by the frame's FP address.  Each argument materialization (and
      deoptimization) tries to look up its captured objects in the store before
      building new ones.  Deoptimization also removes the materialized objects
      from the store. We also schedule a lazy deopt to be sure that we always
      get rid of the materialized objects and that the optimized function
      adopts the materialized objects (instead of happily computing with its
      captured representations).
      
      Concerns:
      
      - Is there a simpler/more correct way to store the already-materialized
      objects? (At the moment there is a custom root reference to JSArray
      containing frames' FixedArrays with their captured objects.)
      
      - Is the FP address the right key for a frame? (Note that deoptimizer's
      representation of frame is different from the argument object
      materializer's one - it is not easy to find common ground.)
      
      - Performance is suboptimal in several places, but a quick local run of
      benchmarks does not seem to show a perf hit. Examples of possible
      improvements: smarter generation of SlotRefs (build other functions'
      SlotRefs only for captured objects and only if necessary), smarter
      lookup of stored materialized objects.
      
      - Ideally, we would like to share the code for argument materialization
      with deoptimizer's materializer.  However, the supporting data structures
      (mainly the frame descriptor) are quite different in each case, so it
      looks more like a separate project.
      
      Thanks for any feedback.
      
      R=mstarzinger@chromium.org, danno@chromium.org
      LOG=N
      BUG=
      
      Review URL: https://codereview.chromium.org/103243005
      
      git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@18918 ce2b1a6d-e550-0410-aec6-3dcde31c8c00