1. 17 Dec, 2020 3 commits
    • [wasm-simd][x64] Optimize arch shuffle if AVX supported · 46ce9b05
      Zhi An Ng authored
      AVX has 3-operand shuffle/unpack instructions. We currently always
      require dst == src0, which is unnecessary when AVX is available. For
      the arch shuffles that map to a single native instruction, check for
      AVX support in the instruction selector so that dst is not required
      to be the same as the first input, and generate the AVX form in the
      code generator.
      
      The other arch shuffles are slightly more complicated, and can be
      optimized in a future change.
      
      Bug: v8:11270
      Change-Id: I25b271aeff71fbe860d5bcc8abb17c36bcdab32c
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591858
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71820}
    • [x64][wasm-simd] Only require unique registers for shuffles that use temps · 12e37399
      Zhi An Ng authored
      An improvement to the generic shuffle
      (https://crrev.com/c/2152853) required a temporary SIMD register to hold
      the mask, rather than pushing it onto the stack. The temporary register
      requires that we UseUniqueRegister on the inputs to prevent aliasing,
      since we write to the temp. However, we only need this for the generic
      shuffle. We accidentally over-constrained all the other pattern-matched
      shuffles, even though they don't use any temps.
      
      On a ~2000-line function containing ~150 shuffles (not all of which are
      generic shuffles), we get 16 fewer instructions in the native code, and
      actually see a very small improvement in the overall benchmarks.
      
      Bug: v8:11270
      Change-Id: I09974f7615e4b8f5e2416ed17ca47cc7613fd6b1
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591857
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71818}
    • [wasm-simd][ia32][x64] More optimization for f32x4.extract_lane · 741e5a66
      Zhi An Ng authored
      This instruction admits more optimizations. They leave some junk in
      the upper lanes of dst, but that doesn't matter, because the f32
      result is read only from lane 0:
      
      - when lane is 1: use movshdup (4 bytes)
      - when lane is 2: use movhlps (3 bytes)
      - otherwise: use shufps (4 bytes) or pshufd (5 bytes)
      
      All of these are better than insertps (6 bytes).
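      
      A minimal sketch of why the leftover junk is harmless. The portable C++
      below emulates, on a plain 4-float array, the lane movements of each
      instruction listed above. These are our own models, not V8 code, and
      ExtractLane is a hypothetical name. The sketch also assumes shufps is
      picked when dst == src and pshufd otherwise. In every case the requested
      lane lands in lane 0, which is where the scalar f32 result is read from.
      
      ```cpp
      #include <array>
      #include <cassert>
      #include <cstdint>
      
      using F32x4 = std::array<float, 4>;  // index 0 is lane 0
      
      // movshdup dst, src: duplicate the odd lanes -> {s1, s1, s3, s3}.
      F32x4 Movshdup(const F32x4& src) { return {src[1], src[1], src[3], src[3]}; }
      
      // movhlps dst, src: dst lanes 0-1 receive src lanes 2-3; dst lanes 2-3
      // keep whatever dst held before (the "junk").
      F32x4 Movhlps(F32x4 dst, const F32x4& src) {
        dst[0] = src[2];
        dst[1] = src[3];
        return dst;
      }
      
      // shufps dst, src, imm: result lanes 0-1 pick from dst, lanes 2-3 from src.
      F32x4 Shufps(const F32x4& dst, const F32x4& src, uint8_t imm) {
        return {dst[imm & 3], dst[(imm >> 2) & 3], src[(imm >> 4) & 3],
                src[(imm >> 6) & 3]};
      }
      
      // pshufd dst, src, imm: every result lane picks from src.
      F32x4 Pshufd(const F32x4& src, uint8_t imm) {
        return {src[imm & 3], src[(imm >> 2) & 3], src[(imm >> 4) & 3],
                src[(imm >> 6) & 3]};
      }
      
      // Hypothetical lowering of f32x4.extract_lane: returns the dst register
      // after the chosen instruction. Only lane 0 (the f32 result) matters.
      // old_dst models dst's previous contents when dst and src differ.
      F32x4 ExtractLane(const F32x4& src, int lane, const F32x4& old_dst,
                        bool dst_is_src) {
        const F32x4 dst = dst_is_src ? src : old_dst;
        switch (lane) {
          case 0:
            return src;  // the value is already in lane 0
          case 1:
            return Movshdup(src);
          case 2:
            return Movhlps(dst, src);
          default:  // lane 3: the immediate's low 2 bits select the source lane
            return dst_is_src ? Shufps(src, src, static_cast<uint8_t>(lane))
                              : Pshufd(src, static_cast<uint8_t>(lane));
        }
      }
      ```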
      
      Change-Id: I0e524431d1832e297e8c8bb418d42382d93fa691
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591850
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71813}
  2. 16 Dec, 2020 4 commits
  3. 15 Dec, 2020 1 commit
    • [x64][wasm-simd] Pattern match 32x4 rotate · 7c98abdb
      Zhi An Ng authored
      Code like:
      
        x = wasm_v32x4_shuffle(x, x, 1, 2, 3, 0);
      
      is currently matched by S8x16Concat, which lowers to two instructions:
      
        movapd xmm_dst, xmm_src
        palignr xmm_dst, xmm_src, 0x4
      
      There is a special case after an S8x16Concat is matched:
      
      - is_swizzle (the inputs are the same)
      - it is a 32x4 shuffle (offset % 4 == 0)
      
      This case allows better codegen:
      
      - (dst == src) shufps dst, src, 0b00111001
      - (dst != src) pshufd dst, src, 0b00111001
      
      Add a new simd shuffle matcher which will match 32x4 rotate, and
      construct the appropriate indices referring to the 32x4 elements.
      
      Note: we also pattern match on 32x4Swizzle, which correctly generates
      pshufd for the given example. However, this matching happens after
      S8x16Concat, so we get the palignr first. We could move the pattern
      matching cases around, but that would lead to some cases where a
      shuffle that would have matched S8x16Concat now matches S32x4Shuffle
      instead, leading to worse codegen.
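      
      The matcher described above can be sketched as follows. This is a
      hypothetical helper: the name and signature are ours, not V8's. It
      assumes the swizzle indices have already been reduced to 0..15. It
      checks three things: each 4-byte group copies one aligned 32-bit lane,
      and the lanes form a rotation. It then packs the 2-bit lane selectors
      into the shufps/pshufd immediate.
      
      ```cpp
      #include <array>
      #include <cassert>
      #include <cstdint>
      
      // Hypothetical 32x4-rotate matcher. On success, writes the 8-bit
      // immediate for shufps/pshufd to *imm and returns true.
      bool TryMatch32x4RotateImm(const std::array<uint8_t, 16>& shuffle,
                                 uint8_t* imm) {
        std::array<int, 4> lanes{};
        // Step 1: each group of 4 bytes must copy one whole, aligned 32-bit lane.
        for (int i = 0; i < 4; ++i) {
          const uint8_t first = shuffle[4 * i];
          if (first % 4 != 0) return false;
          for (int j = 1; j < 4; ++j) {
            if (shuffle[4 * i + j] != first + j) return false;
          }
          lanes[i] = first / 4;
        }
        // Step 2: the lanes must be {k, k+1, k+2, k+3} mod 4 with k != 0.
        const int offset = lanes[0];
        if (offset == 0) return false;  // identity, not a rotate
        for (int i = 1; i < 4; ++i) {
          if (lanes[i] != (offset + i) % 4) return false;
        }
        // Step 3: pack the 2-bit lane selectors, result lane 0 in the low bits.
        *imm = static_cast<uint8_t>(lanes[0] | (lanes[1] << 2) |
                                    (lanes[2] << 4) | (lanes[3] << 6));
        return true;
      }
      ```
      
      For the example shuffle (1, 2, 3, 0), the byte indices are 4..15
      followed by 0..3. The packed immediate is 0b00111001, the same value
      used in the shufps/pshufd lines above.
      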
      Change-Id: Ie3aca53bbc06826be2cf49632de4c24ec73d0a9a
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589062
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71754}
  4. 14 Dec, 2020 2 commits
  5. 10 Dec, 2020 3 commits
  6. 07 Dec, 2020 1 commit
  7. 03 Dec, 2020 1 commit
  8. 01 Dec, 2020 1 commit
  9. 17 Nov, 2020 1 commit
  10. 12 Nov, 2020 1 commit
    • [heap] Do not use V8_LIKELY on FLAG_disable_write_barriers. · 4a89c018
      Pierre Langlois authored
      FLAG_disable_write_barriers is a constexpr so the V8_LIKELY macro isn't
      necessary. Interestingly, it can also cause clang to warn that the code
      is unreachable, whereas without `__builtin_expect()` the compiler
      doesn't mind. See for example:
      
      ```
      constexpr bool kNo = false;
      
      void warns() {
        if (__builtin_expect(kNo, 0)) {
          int a = 42;
        }
      }
      
      void does_not_warn() {
        if (kNo) {
          int a = 42;
        }
      }
      ```
      
      Compiling V8 for arm64 with both `v8_disable_write_barriers = true` and
      `v8_enable_pointer_compression = false` would trigger this warning.
      
      Bug: v8:9533
      Change-Id: Id2ae156d60217007bb9ebf50628e8908e0193d05
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2534811
      Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Commit-Queue: Pierre Langlois <pierre.langlois@arm.com>
      Cr-Commit-Position: refs/heads/master@{#71157}
  11. 05 Nov, 2020 1 commit
    • [wasm-simd][x64] Optimize integer splats of constant 0 · 7d7b25d9
      Zhi An Ng authored
      Integer splats (especially for sizes < 32 bits) do not directly
      translate to a single instruction on x64. We can do better for special
      values, like 0, which can be lowered to `xor dst, dst`. We do this check
      in the instruction selector, and emit a special opcode kX64S128Zero.
      
      Also change the xor operation for kX64S128Zero from xorps to pxor. This
      can help reduce any potential data bypass delay (see Agner Fog's
      microarchitecture manual for details). Since integer splats are likely
      to be followed by integer ops, we should remain in the integer domain,
      and thus use pxor.
      
      For i64x2.splat the codegen goes from:
      
        xorl rdi,rdi
        vmovq xmm0,rdi
        vmovddup xmm0,xmm0
      
      to:
      
        vpxor xmm0,xmm0,xmm0
      
      Also add a unittest to verify this optimization, and necessary
      raw-assembler methods for the test.
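      
      A minimal sketch of the instruction-selector check, assuming the splat
      input reaches the selector as an optional known constant. Only
      kX64S128Zero comes from the commit. kGenericIntegerSplat and both
      functions are hypothetical. SplatBytes is a reference splat showing
      that a zero splat is all-zero bits at every lane width. That is why a
      single self-xor can replace the whole sequence.
      
      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <cstring>
      #include <optional>
      
      // Illustrative opcodes: kX64S128Zero is named in the commit message;
      // kGenericIntegerSplat is a placeholder for the usual per-width lowering.
      enum class Opcode { kGenericIntegerSplat, kX64S128Zero };
      
      // Hypothetical selector check: a splat of the constant 0 becomes
      // kX64S128Zero (a single `pxor dst, dst`) instead of the generic splat.
      Opcode SelectIntegerSplat(std::optional<int64_t> constant_input) {
        if (constant_input.has_value() && *constant_input == 0) {
          return Opcode::kX64S128Zero;
        }
        return Opcode::kGenericIntegerSplat;
      }
      
      // Reference splat: copy the first lane_bytes bytes of value's object
      // representation (its low bytes on little-endian hosts such as x64)
      // into every lane of a 16-byte vector.
      void SplatBytes(uint64_t value, int lane_bytes, uint8_t out[16]) {
        for (int i = 0; i < 16; i += lane_bytes) {
          std::memcpy(out + i, &value, static_cast<std::size_t>(lane_bytes));
        }
      }
      ```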
      
      Bug: v8:11093
      Change-Id: I26b092032b6e672f1d5d26e35d79578ebe591cfe
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2516299
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70977}
  12. 04 Nov, 2020 2 commits
  13. 02 Nov, 2020 1 commit
    • [wasm-simd] Enhance Shufps to copy src to dst · 14570fe0
      Zhi An Ng authored
      Extract Shufps to handle both the AVX and SSE cases; in the SSE case it
      copies src to dst if they are not the same register. This allows us to
      use it in Liftoff as well, without the extra copy when AVX is
      supported.
      
      In other places, the Shufps macro is unnecessary, since those uses are
      within a clause that already checks for non-AVX support, so we can
      simply use shufps (the plain assembler instruction, not the
      macro-assembler one).
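      
      The two cases handled by Shufps can be modelled with a toy assembler
      that records instruction text. This is a sketch under assumptions: the
      four-operand Shufps(dst, src1, src2, imm) shape and the use of movaps
      for the copy are ours. The SSE path also requires that dst not alias
      src2, because the copy would otherwise clobber it.
      
      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>
      
      // Toy model of a Shufps(dst, src1, src2, imm) helper that records the
      // instructions it would emit as text. Not V8's API: the operand order
      // and the movaps copy are assumptions made for this sketch.
      std::vector<std::string> Shufps(bool has_avx, const std::string& dst,
                                      const std::string& src1,
                                      const std::string& src2, int imm) {
        std::vector<std::string> code;
        const std::string rest = src2 + ", " + std::to_string(imm);
        if (has_avx) {
          // AVX (VEX) form has a separate destination, so no copy is needed.
          code.push_back("vshufps " + dst + ", " + src1 + ", " + rest);
          return code;
        }
        // SSE form is destructive: dst doubles as the first input. Copy src1
        // into dst first unless they are already the same register. The copy
        // would clobber src2 if dst aliased it, so this sketch forbids that.
        assert(dst == src1 || dst != src2);
        if (dst != src1) code.push_back("movaps " + dst + ", " + src1);
        code.push_back("shufps " + dst + ", " + rest);
        return code;
      }
      ```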
      
      Bug: v8:9561
      Change-Id: Icb043d7a43397c1b0810ece2666be567f0f5986c
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2513866
      Reviewed-by: Clemens Backes <clemensb@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70911}
  14. 30 Oct, 2020 1 commit
  15. 29 Oct, 2020 2 commits
    • [wasm-simd][x64] Don't fix dst to src on AVX · d4f7ea80
      Zhi An Ng authored
      On AVX, many instructions can have 3 operands, unlike SSE, which only
      has 2. So on SSE we use DefineSameAsFirst for the dst, but on AVX doing
      so causes unnecessary moves.
      
      This change moves a number of instructions that have single-instruction
      codegen into a macro list that supports this unrestricted AVX codegen.
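      
      Here is a simplified model of the trade-off, under stated assumptions.
      The enum names are hypothetical stand-ins for the register allocator's
      output constraints. The move count assumes the first input is still
      live after the instruction. In that case a same-as-first output forces
      the allocator to copy it.
      
      ```cpp
      #include <cassert>
      
      // Hypothetical stand-ins for two register-allocator output constraints.
      enum class OutputConstraint { kSameAsFirst, kAnyRegister };
      
      // SSE encodings are 2-operand and destructive (dst is also the first
      // input); AVX (VEX) encodings take a separate destination operand.
      OutputConstraint ChooseOutputConstraint(bool has_avx) {
        return has_avx ? OutputConstraint::kAnyRegister
                       : OutputConstraint::kSameAsFirst;
      }
      
      // Simplified cost model: a same-as-first output overwrites the first
      // input, so if that input is still needed afterwards it must be copied.
      int ExtraMoves(OutputConstraint constraint, bool first_input_live_after) {
        return constraint == OutputConstraint::kSameAsFirst && first_input_live_after
                   ? 1
                   : 0;
      }
      ```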
      
      Bug: v8:9561
      Change-Id: I348a8396e8a1129daf2e1ed08ae8526e1bc3a73b
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505254
      Reviewed-by: Clemens Backes <clemensb@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70888}
    • Reland "[wasm-simd][ia32][x64] Only use registers for shuffles" · 45cb1ce0
      Zhi An Ng authored
      This is a reland of 3fb07882
      
      Original change's description:
      > [wasm-simd][ia32][x64] Only use registers for shuffles
      >
      > Shuffles have pattern matching clauses which, depending on the
      > instruction used, can require src0 or src1 to be register or not.
      > However we do not have 16-byte alignment for SIMD operands yet, so it
      > will segfault when we use an SSE SIMD instruction with unaligned
      > operands.
      >
      > This patch fixes all the shuffle cases to always use a register for the
      > input nodes, and it does so by ignoring the values of src0_needs_reg and
      > src1_needs_reg. When we eventually have memory alignment, we can
      > re-enable this check, without mucking around too much in the logic in
      > each shuffle match clause.
      >
      > Bug: v8:9198
      > Change-Id: I264e136f017353019f19954c62c88206f7b90656
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849
      > Reviewed-by: Andreas Haas <ahaas@chromium.org>
      > Reviewed-by: Adam Klein <adamk@chromium.org>
      > Commit-Queue: Adam Klein <adamk@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70848}
      
      Bug: v8:9198
      Change-Id: I40c6c8f0cd8908a2d6ab7016d8ed4d4fb2ab4114
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505250
      Reviewed-by: Adam Klein <adamk@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70862}
  16. 28 Oct, 2020 4 commits
  17. 27 Oct, 2020 1 commit
  18. 22 Oct, 2020 1 commit
  19. 21 Oct, 2020 1 commit
    • Reland "Reland "[deoptimizer] Change deopt entries into builtins"" · c7cb9bec
      Jakob Gruber authored
      This is a reland of fbfa9bf4
      
      The arm64 port was missing proper codegen for CFI, so the deopt exit sizes were off.
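      
      Fixed-size exits are what make the deopt id recoverable from the return
      address, as the quoted description below explains for arm and arm64.
      Below is a minimal sketch of that arithmetic. It assumes the exits of
      one Code object are emitted contiguously, each exactly exit_size bytes
      long and ending in a call. DeoptIdFromReturnAddress is a hypothetical
      name. The sketch also shows that if the assumed size differs from what
      was actually emitted, the recovered id is wrong.
      
      ```cpp
      #include <cassert>
      #include <cstdint>
      
      // Hypothetical sketch: the deopt exits of one Code object are laid out
      // back to back starting at exits_start, each exactly exit_size bytes and
      // ending in a call. The call's return address (pushed on ia32/x64, placed
      // in the link register on arm/arm64) points just past the exit taken.
      int DeoptIdFromReturnAddress(uintptr_t exits_start, uintptr_t return_address,
                                   uintptr_t exit_size) {
        const uintptr_t exit_start = return_address - exit_size;
        return static_cast<int>((exit_start - exits_start) / exit_size);
      }
      ```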
      
      Original change's description:
      > Reland "[deoptimizer] Change deopt entries into builtins"
      >
      > This is a reland of 7f58ced7
      >
      > It fixes the different exit size emitted on x64/Atom CPUs due to
      > performance tuning in TurboAssembler::Call. Additionally, add
      > cctests to verify the fixed size exits.
      >
      > Original change's description:
      > > [deoptimizer] Change deopt entries into builtins
      > >
      > > While the overall goal of this commit is to change deoptimization
      > > entries into builtins, there are multiple related things happening:
      > >
      > > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      > >   at runtime, guaranteed to be immovable), have been converted into
      > >   builtins. The major restriction is that we now need to preserve the
      > >   kRootRegister, which was formerly used on most architectures to pass
      > >   the deoptimization id. The solution differs based on platform.
      > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > > - Removed heap/ support for immovable Code generation.
      > > - Removed the DeserializerData class (no longer needed).
      > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      > >   in which the final jump to the deoptimization entry is generated
      > >   once per Code object, and deopt exits can continue to emit a
      > >   near-call.
      > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      > >   sizes by 4/8, 5, and 5 bytes, respectively.
      > >
      > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > > by using the same strategy as on arm64 (recalc deopt id from return
      > > address). Before:
      > >
      > >  e300a002       movw r10, <id>
      > >  e59fc024       ldr ip, [pc, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > After:
      > >
      > >  e59acb35       ldr ip, [r10, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > > object (max 32 bytes added overhead per Code object). Before:
      > >
      > >  9401cdae       bl <entry offset>
      > >
      > > After:
      > >
      > >  # eager deoptimization entry jump.
      > >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      > >  d61f0200       br x16
      > >  # lazy deoptimization entry jump.
      > >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      > >  d61f0200       br x16
      > >  # the deopt exit.
      > >  97fffffc       bl <eager deoptimization entry jump offset>
      > >
      > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      > >
      > >  bb00000000     mov ebx,<id>
      > >  e825f5372b     call <entry>
      > >
      > > After:
      > >
      > >  e8ea2256ba     call <entry>
      > >
      > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      > >
      > >  49c7c511000000 REX.W movq r13,<id>
      > >  e8ea2f0700     call <entry>
      > >
      > > After:
      > >
      > >  41ff9560360000 call [r13+<entry offset>]
      > >
      > > Bug: v8:8661,v8:8768
      > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > > Cr-Commit-Position: refs/heads/master@{#70597}
      >
      > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      > Bug: v8:8661,v8:8768,chromium:1140165
      > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
      > Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70655}
      
      Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      Bug: v8:8661
      Bug: v8:8768
      Bug: chromium:1140165
      Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70672}
  20. 20 Oct, 2020 4 commits
    • [ia32,x64] Make more use of the 'leave' instruction · 8f0ab471
      Georg Neis authored
      It is a little shorter and cheaper[1] than the equivalent
      "mov sp,bp; pop bp".
      
      Also remove support for the 'enter' instruction, since
      - it is unused,
      - it is neither shorter nor cheaper than the corresponding
        push and mov (in fact more expensive[1]), and
      - our disassembler doesn't support it.
      
      [1] See https://www.agner.org/optimize/instruction_tables.pdf
      
      Change-Id: I6c99c2f3e53081aea55445a54e18eaf45baa79c2
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2482822
      Commit-Queue: Georg Neis <neis@chromium.org>
      Reviewed-by: Victor Gomes <victorgomes@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70660}
    • Revert "Reland "[deoptimizer] Change deopt entries into builtins"" · 7c7aa4fa
      Maya Lekova authored
      This reverts commit fbfa9bf4.
      
      Reason for revert: Seems to break arm64 sim CFI build (please see DeoptExitSizeIfFixed) - https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20CFI/2808
      
      Original change's description:
      > Reland "[deoptimizer] Change deopt entries into builtins"
      >
      > This is a reland of 7f58ced7
      >
      > It fixes the different exit size emitted on x64/Atom CPUs due to
      > performance tuning in TurboAssembler::Call. Additionally, add
      > cctests to verify the fixed size exits.
      >
      > Original change's description:
      > > [deoptimizer] Change deopt entries into builtins
      > >
      > > While the overall goal of this commit is to change deoptimization
      > > entries into builtins, there are multiple related things happening:
      > >
      > > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      > >   at runtime, guaranteed to be immovable), have been converted into
      > >   builtins. The major restriction is that we now need to preserve the
      > >   kRootRegister, which was formerly used on most architectures to pass
      > >   the deoptimization id. The solution differs based on platform.
      > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > > - Removed heap/ support for immovable Code generation.
      > > - Removed the DeserializerData class (no longer needed).
      > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      > >   in which the final jump to the deoptimization entry is generated
      > >   once per Code object, and deopt exits can continue to emit a
      > >   near-call.
      > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      > >   sizes by 4/8, 5, and 5 bytes, respectively.
      > >
      > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > > by using the same strategy as on arm64 (recalc deopt id from return
      > > address). Before:
      > >
      > >  e300a002       movw r10, <id>
      > >  e59fc024       ldr ip, [pc, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > After:
      > >
      > >  e59acb35       ldr ip, [r10, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > > object (max 32 bytes added overhead per Code object). Before:
      > >
      > >  9401cdae       bl <entry offset>
      > >
      > > After:
      > >
      > >  # eager deoptimization entry jump.
      > >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      > >  d61f0200       br x16
      > >  # lazy deoptimization entry jump.
      > >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      > >  d61f0200       br x16
      > >  # the deopt exit.
      > >  97fffffc       bl <eager deoptimization entry jump offset>
      > >
      > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      > >
      > >  bb00000000     mov ebx,<id>
      > >  e825f5372b     call <entry>
      > >
      > > After:
      > >
      > >  e8ea2256ba     call <entry>
      > >
      > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      > >
      > >  49c7c511000000 REX.W movq r13,<id>
      > >  e8ea2f0700     call <entry>
      > >
      > > After:
      > >
      > >  41ff9560360000 call [r13+<entry offset>]
      > >
      > > Bug: v8:8661,v8:8768
      > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > > Cr-Commit-Position: refs/heads/master@{#70597}
      >
      > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      > Bug: v8:8661,v8:8768,chromium:1140165
      > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
      > Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70655}
      
      TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org
      
      Change-Id: I4739a3475bfd8ee0cfbe4b9a20382f91a6ef1bf0
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true
      Bug: v8:8661
      Bug: v8:8768
      Bug: chromium:1140165
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485223
      Reviewed-by: Maya Lekova <mslekova@chromium.org>
      Commit-Queue: Maya Lekova <mslekova@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70658}
    • Reland "[deoptimizer] Change deopt entries into builtins" · fbfa9bf4
      Jakob Gruber authored
      This is a reland of 7f58ced7
      
      It fixes the different exit size emitted on x64/Atom CPUs due to
      performance tuning in TurboAssembler::Call. Additionally, add
      cctests to verify the fixed size exits.
      
      Original change's description:
      > [deoptimizer] Change deopt entries into builtins
      >
      > While the overall goal of this commit is to change deoptimization
      > entries into builtins, there are multiple related things happening:
      >
      > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      >   at runtime, guaranteed to be immovable), have been converted into
      >   builtins. The major restriction is that we now need to preserve the
      >   kRootRegister, which was formerly used on most architectures to pass
      >   the deoptimization id. The solution differs based on platform.
      > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > - Removed heap/ support for immovable Code generation.
      > - Removed the DeserializerData class (no longer needed).
      > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      >   in which the final jump to the deoptimization entry is generated
      >   once per Code object, and deopt exits can continue to emit a
      >   near-call.
      > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      >   sizes by 4/8, 5, and 5 bytes, respectively.
      >
      > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > by using the same strategy as on arm64 (recalc deopt id from return
      > address). Before:
      >
      >  e300a002       movw r10, <id>
      >  e59fc024       ldr ip, [pc, <entry offset>]
      >  e12fff3c       blx ip
      >
      > After:
      >
      >  e59acb35       ldr ip, [r10, <entry offset>]
      >  e12fff3c       blx ip
      >
      > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > object (max 32 bytes added overhead per Code object). Before:
      >
      >  9401cdae       bl <entry offset>
      >
      > After:
      >
      >  # eager deoptimization entry jump.
      >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      >  d61f0200       br x16
      >  # lazy deoptimization entry jump.
      >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      >  d61f0200       br x16
      >  # the deopt exit.
      >  97fffffc       bl <eager deoptimization entry jump offset>
      >
      > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      >
      >  bb00000000     mov ebx,<id>
      >  e825f5372b     call <entry>
      >
      > After:
      >
      >  e8ea2256ba     call <entry>
      >
      > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      >
      >  49c7c511000000 REX.W movq r13,<id>
      >  e8ea2f0700     call <entry>
      >
      > After:
      >
      >  41ff9560360000 call [r13+<entry offset>]
      >
      > Bug: v8:8661,v8:8768
      > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70597}
      
      Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      Bug: v8:8661,v8:8768,chromium:1140165
      Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70655}
    • Revert "[deoptimizer] Change deopt entries into builtins" · 8bc9a794
      Jakob Gruber authored
      This reverts commit 7f58ced7.
      
      Reason for revert: Segfaults on Atom_x64 https://ci.chromium.org/p/v8-internal/builders/ci/v8_linux64_atom_perf/5686?
      
      Original change's description:
      > [deoptimizer] Change deopt entries into builtins
      >
      > While the overall goal of this commit is to change deoptimization
      > entries into builtins, there are multiple related things happening:
      >
      > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      >   at runtime, guaranteed to be immovable), have been converted into
      >   builtins. The major restriction is that we now need to preserve the
      >   kRootRegister, which was formerly used on most architectures to pass
      >   the deoptimization id. The solution differs based on platform.
      > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > - Removed heap/ support for immovable Code generation.
      > - Removed the DeserializerData class (no longer needed).
      > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      >   in which the final jump to the deoptimization entry is generated
      >   once per Code object, and deopt exits can continue to emit a
      >   near-call.
      > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      >   sizes by 4/8, 5, and 5 bytes, respectively.
      >
      > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > by using the same strategy as on arm64 (recalc deopt id from return
      > address). Before:
      >
      >  e300a002       movw r10, <id>
      >  e59fc024       ldr ip, [pc, <entry offset>]
      >  e12fff3c       blx ip
      >
      > After:
      >
      >  e59acb35       ldr ip, [r10, <entry offset>]
      >  e12fff3c       blx ip
      >
      > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > object (max 32 bytes added overhead per Code object). Before:
      >
      >  9401cdae       bl <entry offset>
      >
      > After:
      >
      >  # eager deoptimization entry jump.
      >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      >  d61f0200       br x16
      >  # lazy deoptimization entry jump.
      >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      >  d61f0200       br x16
      >  # the deopt exit.
      >  97fffffc       bl <eager deoptimization entry jump offset>
      >
      > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      >
      >  bb00000000     mov ebx,<id>
      >  e825f5372b     call <entry>
      >
      > After:
      >
      >  e8ea2256ba     call <entry>
      >
      > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      >
      >  49c7c511000000 REX.W movq r13,<id>
      >  e8ea2f0700     call <entry>
      >
      > After:
      >
      >  41ff9560360000 call [r13+<entry offset>]
      >
      > Bug: v8:8661,v8:8768
      > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70597}
      
      TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org
      
      # Not skipping CQ checks because original CL landed > 1 day ago.
      
      Bug: v8:8661,v8:8768,chromium:1140165
      Change-Id: I3df02ab42f6e02233d9f6fb80e8bb18f76870d91
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485504
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70649}
  21. 19 Oct, 2020 4 commits