1. 17 Nov, 2020 1 commit
  2. 12 Nov, 2020 1 commit
    • [heap] Do not use V8_LIKELY on FLAG_disable_write_barriers. · 4a89c018
      Pierre Langlois authored
      FLAG_disable_write_barriers is constexpr, so the V8_LIKELY macro
      isn't necessary. Interestingly, it can also cause clang to warn that
      the code is unreachable, whereas without `__builtin_expect()` the
      compiler doesn't mind. See for example:
      
      ```
      constexpr bool kNo = false;
      
      void warns() {
        if (__builtin_expect(kNo, 0)) {
          int a = 42;
        }
      }
      
      void does_not_warn() {
        if (kNo) {
          int a = 42;
        }
      }
      ```
      
      Compiling V8 for arm64 with both `v8_disable_write_barriers = true` and
      `v8_enable_pointer_compression = false` would trigger this warning.
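      For context, a minimal sketch of how a V8_LIKELY-style macro is
      commonly defined (the `LIKELY` name and the guard below are
      illustrative, not V8's actual definition), showing that the hint
      never changes semantics — which is exactly why it is redundant on a
      constexpr condition:

      ```cpp
      #include <cassert>

      // Illustrative sketch, not V8's actual definition: LIKELY-style
      // macros commonly wrap __builtin_expect on GCC/Clang and fall back
      // to the bare condition elsewhere.
      #if defined(__GNUC__) || defined(__clang__)
      #define LIKELY(condition) (__builtin_expect(!!(condition), 1))
      #else
      #define LIKELY(condition) (condition)
      #endif

      constexpr bool kDisabled = false;

      int Guarded() {
        // On a constexpr condition the hint is pointless: the compiler
        // already knows the branch statically, and clang may warn about
        // unreachable code under __builtin_expect where a plain `if`
        // would not.
        if (LIKELY(kDisabled)) {
          return 42;
        }
        return 0;
      }

      int main() {
        assert(Guarded() == 0);  // behavior is identical with or without the hint
        return 0;
      }
      ```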
      
      Bug: v8:9533
      Change-Id: Id2ae156d60217007bb9ebf50628e8908e0193d05
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2534811
      Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Commit-Queue: Pierre Langlois <pierre.langlois@arm.com>
      Cr-Commit-Position: refs/heads/master@{#71157}
  3. 05 Nov, 2020 1 commit
    • [wasm-simd][x64] Optimize integer splats of constant 0 · 7d7b25d9
      Zhi An Ng authored
      Integer splats (especially for sizes < 32 bits) do not directly
      translate to a single instruction on x64. We can do better for
      special values, like 0, which can be lowered to `xor dst, dst`. We
      do this check in the instruction selector and emit a special opcode,
      kX64S128Zero.
      
      Also change the xor operation for kX64S128Zero from xorps to pxor.
      This can help reduce any potential data bypass delay (see Agner
      Fog's microarchitecture manual for details). Since integer splats
      are likely to be followed by integer ops, we should remain in the
      integer domain, and thus use pxor.
      
      For i64x2.splat the codegen goes from:
      
        xorl rdi,rdi
        vmovq xmm0,rdi
        vmovddup xmm0,xmm0
      
      to:
      
        vpxor xmm0,xmm0,xmm0
      
      Also add a unittest to verify this optimization, and necessary
      raw-assembler methods for the test.
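      The selection logic described above can be sketched abstractly as
      follows. The opcode names come from the commit, but the function
      shape and types are illustrative stand-ins, not V8's actual
      instruction-selector API:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>

      // Illustrative stand-in for the instruction-selector check: a
      // splat of constant 0 is lowered to the single-instruction
      // kX64S128Zero opcode; everything else takes the generic path.
      std::string SelectI64x2Splat(bool is_constant, int64_t value) {
        if (is_constant && value == 0) {
          return "kX64S128Zero";   // emitted as a single pxor dst,dst
        }
        return "kX64I64x2Splat";   // generic movq + movddup splat sequence
      }

      int main() {
        assert(SelectI64x2Splat(true, 0) == "kX64S128Zero");
        assert(SelectI64x2Splat(true, 5) == "kX64I64x2Splat");
        assert(SelectI64x2Splat(false, 0) == "kX64I64x2Splat");
        return 0;
      }
      ```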
      
      Bug: v8:11093
      Change-Id: I26b092032b6e672f1d5d26e35d79578ebe591cfe
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2516299
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70977}
  4. 04 Nov, 2020 2 commits
  5. 02 Nov, 2020 1 commit
    • [wasm-simd] Enhance Shufps to copy src to dst · 14570fe0
      Zhi An Ng authored
      Extract Shufps to handle both AVX and SSE cases; in the SSE case it
      will copy src to dst if they are not the same. This allows us to use
      it in Liftoff as well, without the extra copy when AVX is supported.

      In other places, the usage of Shufps is unnecessary, since those
      call sites are within a clause checking for non-AVX support, so we
      can simply use shufps (the non-macro-assembler version).
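      The dispatch described above can be modeled like this. The register
      operands and emitted strings are illustrative stand-ins; V8's real
      macro-assembler helper takes XMMRegister operands and an immediate:

      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>

      // Illustrative model of the Shufps helper: with AVX, the
      // three-operand vshufps writes straight to dst; without AVX, the
      // two-operand shufps requires dst == src, so we copy first when
      // they differ.
      std::vector<std::string> Shufps(bool has_avx, int dst, int src) {
        std::vector<std::string> emitted;
        if (has_avx) {
          emitted.push_back("vshufps dst, src, src, imm");
        } else {
          if (dst != src) emitted.push_back("movaps dst, src");
          emitted.push_back("shufps dst, imm");
        }
        return emitted;
      }

      int main() {
        assert(Shufps(true, 0, 1).size() == 1);   // AVX: no copy needed
        assert(Shufps(false, 0, 1).size() == 2);  // SSE, dst != src: copy + shuffle
        assert(Shufps(false, 0, 0).size() == 1);  // SSE, dst == src: shuffle only
        return 0;
      }
      ```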
      
      Bug: v8:9561
      Change-Id: Icb043d7a43397c1b0810ece2666be567f0f5986c
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2513866
      Reviewed-by: Clemens Backes <clemensb@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70911}
  6. 30 Oct, 2020 1 commit
  7. 29 Oct, 2020 2 commits
    • [wasm-simd][x64] Don't fix dst to src on AVX · d4f7ea80
      Zhi An Ng authored
      On AVX, many instructions can have 3 operands, unlike SSE which only has
      2. So on SSE we use DefineSameAsFirst on the dst. But on AVX, using that
      will cause some unnecessary moves.
      
      This change moves a number of instructions that have
      single-instruction codegen into a macro list that supports this
      non-restricted AVX codegen.
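      The operand-constraint choice behind this can be sketched as
      follows. DefineSameAsFirst is a real V8 instruction-selector
      constraint name; the surrounding function and the alternative name
      used here are illustrative:

      ```cpp
      #include <cassert>
      #include <string>

      // Illustrative: two-operand SSE instructions force dst == first
      // input (DefineSameAsFirst), which can make the register allocator
      // insert moves; three-operand AVX forms let it pick any dst.
      std::string DstConstraint(bool has_avx) {
        return has_avx ? "DefineAsRegister" : "DefineSameAsFirst";
      }

      int main() {
        assert(DstConstraint(false) == "DefineSameAsFirst");
        assert(DstConstraint(true) == "DefineAsRegister");
        return 0;
      }
      ```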
      
      Bug: v8:9561
      Change-Id: I348a8396e8a1129daf2e1ed08ae8526e1bc3a73b
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505254
      Reviewed-by: Clemens Backes <clemensb@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70888}
    • Reland "[wasm-simd][ia32][x64] Only use registers for shuffles" · 45cb1ce0
      Zhi An Ng authored
      This is a reland of 3fb07882
      
      Original change's description:
      > [wasm-simd][ia32][x64] Only use registers for shuffles
      >
      > Shuffles have pattern matching clauses which, depending on the
      > instruction used, can require src0 or src1 to be register or not.
      > However we do not have 16-byte alignment for SIMD operands yet, so it
      > will segfault when we use an SSE SIMD instruction with unaligned
      > operands.
      >
      > This patch fixes all the shuffle cases to always use a register for the
      > input nodes, and it does so by ignoring the values of src0_needs_reg and
      > src1_needs_reg. When we eventually have memory alignment, we can
      > re-enable this check, without mucking around too much in the logic in
      > each shuffle match clause.
      >
      > Bug: v8:9198
      > Change-Id: I264e136f017353019f19954c62c88206f7b90656
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849
      > Reviewed-by: Andreas Haas <ahaas@chromium.org>
      > Reviewed-by: Adam Klein <adamk@chromium.org>
      > Commit-Queue: Adam Klein <adamk@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70848}
      
      Bug: v8:9198
      Change-Id: I40c6c8f0cd8908a2d6ab7016d8ed4d4fb2ab4114
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505250
      Reviewed-by: Adam Klein <adamk@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70862}
  8. 28 Oct, 2020 4 commits
  9. 27 Oct, 2020 1 commit
  10. 22 Oct, 2020 1 commit
  11. 21 Oct, 2020 1 commit
    • Reland "Reland "[deoptimizer] Change deopt entries into builtins"" · c7cb9bec
      Jakob Gruber authored
      This is a reland of fbfa9bf4
      
      The arm64 port was missing proper codegen for CFI, so the exit
      sizes were off.
      
      Original change's description:
      > Reland "[deoptimizer] Change deopt entries into builtins"
      >
      > This is a reland of 7f58ced7
      >
      > It fixes the different exit size emitted on x64/Atom CPUs due to
      > performance tuning in TurboAssembler::Call. Additionally, add
      > cctests to verify the fixed size exits.
      >
      > Original change's description:
      > > [deoptimizer] Change deopt entries into builtins
      > >
      > > While the overall goal of this commit is to change deoptimization
      > > entries into builtins, there are multiple related things happening:
      > >
      > > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      > >   at runtime, guaranteed to be immovable), have been converted into
      > >   builtins. The major restriction is that we now need to preserve the
      > >   kRootRegister, which was formerly used on most architectures to pass
      > >   the deoptimization id. The solution differs based on platform.
      > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > > - Removed heap/ support for immovable Code generation.
      > > - Removed the DeserializerData class (no longer needed).
      > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      > >   in which the final jump to the deoptimization entry is generated
      > >   once per Code object, and deopt exits can continue to emit a
      > >   near-call.
      > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      > >   sizes by 4/8, 5, and 5 bytes, respectively.
      > >
      > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > > by using the same strategy as on arm64 (recalc deopt id from return
      > > address). Before:
      > >
      > >  e300a002       movw r10, <id>
      > >  e59fc024       ldr ip, [pc, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > After:
      > >
      > >  e59acb35       ldr ip, [r10, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > > object (max 32 bytes added overhead per Code object). Before:
      > >
      > >  9401cdae       bl <entry offset>
      > >
      > > After:
      > >
      > >  # eager deoptimization entry jump.
      > >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      > >  d61f0200       br x16
      > >  # lazy deoptimization entry jump.
      > >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      > >  d61f0200       br x16
      > >  # the deopt exit.
      > >  97fffffc       bl <eager deoptimization entry jump offset>
      > >
      > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      > >
      > >  bb00000000     mov ebx,<id>
      > >  e825f5372b     call <entry>
      > >
      > > After:
      > >
      > >  e8ea2256ba     call <entry>
      > >
      > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      > >
      > >  49c7c511000000 REX.W movq r13,<id>
      > >  e8ea2f0700     call <entry>
      > >
      > > After:
      > >
      > >  41ff9560360000 call [r13+<entry offset>]
      > >
      > > Bug: v8:8661,v8:8768
      > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > > Cr-Commit-Position: refs/heads/master@{#70597}
      >
      > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      > Bug: v8:8661,v8:8768,chromium:1140165
      > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
      > Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70655}
      
      Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      Bug: v8:8661
      Bug: v8:8768
      Bug: chromium:1140165
      Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70672}
  12. 20 Oct, 2020 4 commits
    • [ia32,x64] Make more use of the 'leave' instruction · 8f0ab471
      Georg Neis authored
      It is a little shorter and cheaper[1] than the equivalent
      "mov sp,bp; pop bp".
      
      Also remove support for the 'enter' instruction, since
      - it is unused,
      - it is neither shorter nor cheaper than the corresponding
        push and mov (in fact more expensive[1]), and
      - our disassembler doesn't support it.
      
      [1] See https://www.agner.org/optimize/instruction_tables.pdf
      
      Change-Id: I6c99c2f3e53081aea55445a54e18eaf45baa79c2
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2482822
      Commit-Queue: Georg Neis <neis@chromium.org>
      Reviewed-by: Victor Gomes <victorgomes@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70660}
    • Revert "Reland "[deoptimizer] Change deopt entries into builtins"" · 7c7aa4fa
      Maya Lekova authored
      This reverts commit fbfa9bf4.
      
      Reason for revert: Seems to break arm64 sim CFI build (please see DeoptExitSizeIfFixed) - https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20CFI/2808
      
      Original change's description:
      > Reland "[deoptimizer] Change deopt entries into builtins"
      >
      > This is a reland of 7f58ced7
      >
      > It fixes the different exit size emitted on x64/Atom CPUs due to
      > performance tuning in TurboAssembler::Call. Additionally, add
      > cctests to verify the fixed size exits.
      >
      > Original change's description:
      > > [deoptimizer] Change deopt entries into builtins
      > >
      > > While the overall goal of this commit is to change deoptimization
      > > entries into builtins, there are multiple related things happening:
      > >
      > > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      > >   at runtime, guaranteed to be immovable), have been converted into
      > >   builtins. The major restriction is that we now need to preserve the
      > >   kRootRegister, which was formerly used on most architectures to pass
      > >   the deoptimization id. The solution differs based on platform.
      > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > > - Removed heap/ support for immovable Code generation.
      > > - Removed the DeserializerData class (no longer needed).
      > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      > >   in which the final jump to the deoptimization entry is generated
      > >   once per Code object, and deopt exits can continue to emit a
      > >   near-call.
      > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      > >   sizes by 4/8, 5, and 5 bytes, respectively.
      > >
      > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > > by using the same strategy as on arm64 (recalc deopt id from return
      > > address). Before:
      > >
      > >  e300a002       movw r10, <id>
      > >  e59fc024       ldr ip, [pc, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > After:
      > >
      > >  e59acb35       ldr ip, [r10, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > > object (max 32 bytes added overhead per Code object). Before:
      > >
      > >  9401cdae       bl <entry offset>
      > >
      > > After:
      > >
      > >  # eager deoptimization entry jump.
      > >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      > >  d61f0200       br x16
      > >  # lazy deoptimization entry jump.
      > >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      > >  d61f0200       br x16
      > >  # the deopt exit.
      > >  97fffffc       bl <eager deoptimization entry jump offset>
      > >
      > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      > >
      > >  bb00000000     mov ebx,<id>
      > >  e825f5372b     call <entry>
      > >
      > > After:
      > >
      > >  e8ea2256ba     call <entry>
      > >
      > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      > >
      > >  49c7c511000000 REX.W movq r13,<id>
      > >  e8ea2f0700     call <entry>
      > >
      > > After:
      > >
      > >  41ff9560360000 call [r13+<entry offset>]
      > >
      > > Bug: v8:8661,v8:8768
      > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > > Cr-Commit-Position: refs/heads/master@{#70597}
      >
      > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      > Bug: v8:8661,v8:8768,chromium:1140165
      > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
      > Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70655}
      
      TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org
      
      Change-Id: I4739a3475bfd8ee0cfbe4b9a20382f91a6ef1bf0
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true
      Bug: v8:8661
      Bug: v8:8768
      Bug: chromium:1140165
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485223
      Reviewed-by: Maya Lekova <mslekova@chromium.org>
      Commit-Queue: Maya Lekova <mslekova@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70658}
    • Reland "[deoptimizer] Change deopt entries into builtins" · fbfa9bf4
      Jakob Gruber authored
      This is a reland of 7f58ced7
      
      It fixes the different exit size emitted on x64/Atom CPUs due to
      performance tuning in TurboAssembler::Call. Additionally, add
      cctests to verify the fixed size exits.
      
      Original change's description:
      > [deoptimizer] Change deopt entries into builtins
      >
      > While the overall goal of this commit is to change deoptimization
      > entries into builtins, there are multiple related things happening:
      >
      > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      >   at runtime, guaranteed to be immovable), have been converted into
      >   builtins. The major restriction is that we now need to preserve the
      >   kRootRegister, which was formerly used on most architectures to pass
      >   the deoptimization id. The solution differs based on platform.
      > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > - Removed heap/ support for immovable Code generation.
      > - Removed the DeserializerData class (no longer needed).
      > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      >   in which the final jump to the deoptimization entry is generated
      >   once per Code object, and deopt exits can continue to emit a
      >   near-call.
      > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      >   sizes by 4/8, 5, and 5 bytes, respectively.
      >
      > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > by using the same strategy as on arm64 (recalc deopt id from return
      > address). Before:
      >
      >  e300a002       movw r10, <id>
      >  e59fc024       ldr ip, [pc, <entry offset>]
      >  e12fff3c       blx ip
      >
      > After:
      >
      >  e59acb35       ldr ip, [r10, <entry offset>]
      >  e12fff3c       blx ip
      >
      > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > object (max 32 bytes added overhead per Code object). Before:
      >
      >  9401cdae       bl <entry offset>
      >
      > After:
      >
      >  # eager deoptimization entry jump.
      >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      >  d61f0200       br x16
      >  # lazy deoptimization entry jump.
      >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      >  d61f0200       br x16
      >  # the deopt exit.
      >  97fffffc       bl <eager deoptimization entry jump offset>
      >
      > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      >
      >  bb00000000     mov ebx,<id>
      >  e825f5372b     call <entry>
      >
      > After:
      >
      >  e8ea2256ba     call <entry>
      >
      > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      >
      >  49c7c511000000 REX.W movq r13,<id>
      >  e8ea2f0700     call <entry>
      >
      > After:
      >
      >  41ff9560360000 call [r13+<entry offset>]
      >
      > Bug: v8:8661,v8:8768
      > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70597}
      
      Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      Bug: v8:8661,v8:8768,chromium:1140165
      Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70655}
    • Revert "[deoptimizer] Change deopt entries into builtins" · 8bc9a794
      Jakob Gruber authored
      This reverts commit 7f58ced7.
      
      Reason for revert: Segfaults on Atom_x64 https://ci.chromium.org/p/v8-internal/builders/ci/v8_linux64_atom_perf/5686?
      
      Original change's description:
      > [deoptimizer] Change deopt entries into builtins
      >
      > While the overall goal of this commit is to change deoptimization
      > entries into builtins, there are multiple related things happening:
      >
      > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      >   at runtime, guaranteed to be immovable), have been converted into
      >   builtins. The major restriction is that we now need to preserve the
      >   kRootRegister, which was formerly used on most architectures to pass
      >   the deoptimization id. The solution differs based on platform.
      > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > - Removed heap/ support for immovable Code generation.
      > - Removed the DeserializerData class (no longer needed).
      > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      >   in which the final jump to the deoptimization entry is generated
      >   once per Code object, and deopt exits can continue to emit a
      >   near-call.
      > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      >   sizes by 4/8, 5, and 5 bytes, respectively.
      >
      > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > by using the same strategy as on arm64 (recalc deopt id from return
      > address). Before:
      >
      >  e300a002       movw r10, <id>
      >  e59fc024       ldr ip, [pc, <entry offset>]
      >  e12fff3c       blx ip
      >
      > After:
      >
      >  e59acb35       ldr ip, [r10, <entry offset>]
      >  e12fff3c       blx ip
      >
      > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > object (max 32 bytes added overhead per Code object). Before:
      >
      >  9401cdae       bl <entry offset>
      >
      > After:
      >
      >  # eager deoptimization entry jump.
      >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      >  d61f0200       br x16
      >  # lazy deoptimization entry jump.
      >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      >  d61f0200       br x16
      >  # the deopt exit.
      >  97fffffc       bl <eager deoptimization entry jump offset>
      >
      > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      >
      >  bb00000000     mov ebx,<id>
      >  e825f5372b     call <entry>
      >
      > After:
      >
      >  e8ea2256ba     call <entry>
      >
      > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      >
      >  49c7c511000000 REX.W movq r13,<id>
      >  e8ea2f0700     call <entry>
      >
      > After:
      >
      >  41ff9560360000 call [r13+<entry offset>]
      >
      > Bug: v8:8661,v8:8768
      > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70597}
      
      TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org
      
      # Not skipping CQ checks because original CL landed > 1 day ago.
      
      Bug: v8:8661,v8:8768,chromium:1140165
      Change-Id: I3df02ab42f6e02233d9f6fb80e8bb18f76870d91
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485504
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70649}
  13. 19 Oct, 2020 6 commits
    • [wasm-simd][x64] Optimize more ops for AVX · 89d9eb73
      Ng Zhi An authored
      All these opcodes have a simple lowering into a single x64 instruction.
      We can perform a similar optimization when AVX is supported to not force
      dst == src1.
      
      Bug: v8:10116
      Change-Id: I4ad2975b6f241d8209025682202b476c08b3491b
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486383
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70636}
    • [wasm-simd][x64] Consolidate v128.load_zero with movss/movsd · c77dd2ff
      Ng Zhi An authored
      We don't need separate Load32Zero and Load64Zero instructions, since the
      implementation is movss and movsd, which we already have.
      
      Bug: v8:10713
      Change-Id: I5d02e946f3bf9fe08f943a811f2d3cc8aec81ea8
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486233
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70635}
    • [wasm-simd] Rename v128.load32_zero to follow proposal · 9738fb5e
      Ng Zhi An authored
      Not sure why I originally chose to name it LoadMem32Zero instead of
      Load32Zero, as in the proposal. This fixes it.
      
      Bug: v8:10713
      Change-Id: If05603f743213bc6b7aea0ce22c80ae4b3023ccf
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481824
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70630}
    • [wasm-simd][x64] Optimize f32x4 splat and extract lanes · 4068b3d2
      Ng Zhi An authored
      For splats, we can make use of vshufps to avoid a movss. Without
      AVX, we specify dst to be the same as src in the instruction
      selector.

      For extract lane, we can use vshufps to extract a float into a dst
      xmm and leave junk in the higher bits.
      
      On the meshopt_decoder.js benchmark in linked bug, it removes about 7
      movss instructions that did nothing. Hardware can do register renaming,
      but let's not rely on that :)
      
      R=bbudge@chromium.org
      
      Bug: v8:10116
      Change-Id: I4d68c10536a79659de673060d537d58113308477
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481473
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70628}
    • Rename LoadKind to MemoryAccessKind · 0301534c
      Ng Zhi An authored
      LoadKind is no longer just for loads; we use it for stores as well
      (starting with https://crrev.com/c/2473383). Rename it to something
      more generic.
      
      Bug: v8:10975,v8:10933
      Change-Id: I5e5406ea475e06a83eb2eefe22d4824a99029944
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481822
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70626}
    • [deoptimizer] Change deopt entries into builtins · 7f58ced7
      Jakob Gruber authored
      While the overall goal of this commit is to change deoptimization
      entries into builtins, there are multiple related things happening:
      
      - Deoptimization entries, formerly stubs (i.e. Code objects generated
        at runtime, guaranteed to be immovable), have been converted into
        builtins. The major restriction is that we now need to preserve the
        kRootRegister, which was formerly used on most architectures to pass
        the deoptimization id. The solution differs based on platform.
      - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      - Removed heap/ support for immovable Code generation.
      - Removed the DeserializerData class (no longer needed).
      - arm64: to preserve 4-byte deopt exits, introduced a new optimization
        in which the final jump to the deoptimization entry is generated
        once per Code object, and deopt exits can continue to emit a
        near-call.
      - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
        sizes by 4/8, 5, and 5 bytes, respectively.
      
      On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      by using the same strategy as on arm64 (recalc deopt id from return
      address). Before:
      
       e300a002       movw r10, <id>
       e59fc024       ldr ip, [pc, <entry offset>]
       e12fff3c       blx ip
      
      After:
      
       e59acb35       ldr ip, [r10, <entry offset>]
       e12fff3c       blx ip
      
       On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      object (max 32 bytes added overhead per Code object). Before:
      
       9401cdae       bl <entry offset>
      
      After:
      
       # eager deoptimization entry jump.
       f95b1f50       ldr x16, [x26, <eager entry offset>]
       d61f0200       br x16
       # lazy deoptimization entry jump.
       f95b2b50       ldr x16, [x26, <lazy entry offset>]
       d61f0200       br x16
       # the deopt exit.
       97fffffc       bl <eager deoptimization entry jump offset>
      
      On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      
       bb00000000     mov ebx,<id>
       e825f5372b     call <entry>
      
      After:
      
       e8ea2256ba     call <entry>
      
      On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      
       49c7c511000000 REX.W movq r13,<id>
       e8ea2f0700     call <entry>
      
      After:
      
       41ff9560360000 call [r13+<entry offset>]
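      The "recalc deopt id from return address" trick mentioned above
      relies on deopt exits being fixed-size and laid out contiguously. A
      sketch of the arithmetic, with illustrative constants (the real
      values come from the Code object's layout and the per-architecture
      fixed exit size):

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Illustrative constants, not V8's actual values.
      constexpr uintptr_t kDeoptExitSectionStart = 0x1000;
      constexpr uintptr_t kDeoptExitSize = 8;  // e.g. 8 bytes per exit on arm

      // With fixed-size, contiguous exits, the deoptimization id no
      // longer needs to be materialized in a register: the return
      // address, which points just past the exit that was taken,
      // identifies the exit's index.
      int DeoptIdFromReturnAddress(uintptr_t return_address) {
        return static_cast<int>(
            (return_address - kDeoptExitSectionStart) / kDeoptExitSize) - 1;
      }

      int main() {
        assert(DeoptIdFromReturnAddress(0x1008) == 0);  // just past exit 0
        assert(DeoptIdFromReturnAddress(0x1010) == 1);  // just past exit 1
        return 0;
      }
      ```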
      
      Bug: v8:8661,v8:8768
      Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70597}
  14. 16 Oct, 2020 5 commits
  15. 15 Oct, 2020 3 commits
  16. 14 Oct, 2020 2 commits
  17. 13 Oct, 2020 1 commit
  18. 12 Oct, 2020 3 commits
    • [wasm-simd][x64] Don't force dst to be same as src on AVX · 813ae013
      Ng Zhi An authored
      On AVX, many instructions can have 3 operands, unlike SSE which only has
      2. So on SSE we use DefineSameAsFirst on the dst. But on AVX, using that
      will cause some unnecessary moves.
      
      This patch changes a couple of F32x4 and S128 instructions to remove
      this restriction when AVX is supported.
      
      We can't use AvxHelper since it duplicates the dst for the call to the
      AVX instruction, which isn't what we want. The alternative is to
      redefine Mulps and other functions here, but there are other callsites
      that depend on this duplicated-dst behavior, so it's harder to change.
      We can migrate this as we move more logic over to non-DefineSameAsFirst
      for AVX.
      
      With the meshopt_decoder.js in the linked bug, it removes 8 SIMD movs
      (from a function that has 300+ lines of assembly.)
      
      Note that according to Agner Fog's microarchitecture.pdf (page 127,
      "Elimination of move instructions"), such moves can often be
      eliminated by the processor. So this change won't speed up
      performance, but it helps a bit with binary size and decoder
      pressure.
      
      Bug: v8:10116,v8:9561
      Change-Id: I125bfd44e728ef08312620bc00f6433f376e69e3
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465653
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70462}
    • [wasm-simd][x64] Prototype i64x2.bitmask · ceee7cfe
      Ng Zhi An authored
      Implement on interpreter and x64.
      
      Bug: v8:10997
      Change-Id: I3537ce54e1b56cc3b04d91cb07c430c35b88c3aa
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2459109
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70459}
    • [wasm-simd][x64] Prototype load lane · 673be63e
      Ng Zhi An authored
      Load lane loads a value from memory and replaces a single lane of a
      simd value.
      
      This implements the load (no stores yet) for x64 and interpreter.
      
      Bug: v8:10975
      Change-Id: I95d1b5e781ee9adaec23dda749e514f2485eda10
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2444578
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70456}