1. 12 Jan, 2021 1 commit
  2. 08 Jan, 2021 3 commits
  3. 05 Jan, 2021 2 commits
  4. 29 Dec, 2020 4 commits
  5. 23 Dec, 2020 1 commit
  6. 22 Dec, 2020 3 commits
  7. 21 Dec, 2020 1 commit
    • [x64][ia32][wasm-simd] Optimize v128.bitselect · 8e9ad4f8
      Zhi An Ng authored
      Couple of optimizations for v128.bitselect on both ia32 and x64.
      
      1. Remove an extra movaps when AVX is supported, since we have 3-operand
      instructions
      2. Tweak the algorithm from:
           xor(and(xor(src1, src2), mask), src2)

         to:
           or(and(src1, mask), andnot(src2, mask))
         The new form is easier to read and understand, and also
         eliminates a dependency chain (on kScratchDoubleReg) present in
         the older algorithm.
      3. Use the integer forms of the logical ops. Older processors have
      higher throughput for these than for the floating-point forms.
      However, the integer forms are 1 byte longer, so on SSE we stick
      with the floating-point ops.
      
      For AVX, this reduces instruction count from 9948 to 9868.
      
      Change-Id: Idd5d26b99a76255dbfa63e2c304e6af3760c4ec6
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591859
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71845}
  8. 17 Dec, 2020 3 commits
    • [wasm-simd][x64] Optimize arch shuffle if AVX supported · 46ce9b05
      Zhi An Ng authored
      AVX has 3-operand shuffle/unpack operations. We currently require
      that dst == src0 in all cases, which is not necessary when we have
      AVX. For the arch shuffles that map to a single native instruction,
      add support to check for AVX in the instruction-selector, so that
      dst is no longer required to be same as first, and in the code-gen
      to support generating AVX.
      
      The other arch shuffles are slightly more complicated, and can be
      optimized in a future change.
      
      Bug: v8:11270
      Change-Id: I25b271aeff71fbe860d5bcc8abb17c36bcdab32c
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591858
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71820}
    • [x64][wasm-simd] Only require unique registers for shuffles that use temps · 12e37399
      Zhi An Ng authored
      An improvement to the generic shuffle lowering
      (https://crrev.com/c/2152853) required a temporary SIMD register to
      hold the mask, rather than pushing it onto the stack. The temporary
      register requires that we UseUniqueRegister on the inputs, to
      prevent aliasing, since we will write to the temp. However, we only
      need this for the generic shuffle. We accidentally over-constrained
      all the other pattern-matched shuffles, since they don't use any
      temps.
      
      On a ~2000-line function containing ~150 shuffles (not all of which
      are generic shuffles), we get 16 fewer instructions in the native
      code, and actually see a very small improvement in the overall
      benchmarks.
      
      Bug: v8:11270
      Change-Id: I09974f7615e4b8f5e2416ed17ca47cc7613fd6b1
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591857
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71818}
    • [wasm-simd][ia32][x64] More optimization for f32x4.extract_lane · 741e5a66
      Zhi An Ng authored
      We can apply a few more optimizations to this instruction; they
      leave some junk in the top lanes of dst, but that doesn't matter:

      - when the lane is 1: use movshdup, which is 4 bytes long
      - when the lane is 2: use movhlps, which is 3 bytes long
      - otherwise: use shufps (4 bytes) or pshufd (5 bytes)

      All of which are better than insertps (6 bytes).
      
      Change-Id: I0e524431d1832e297e8c8bb418d42382d93fa691
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591850
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71813}
  9. 16 Dec, 2020 4 commits
  10. 15 Dec, 2020 1 commit
    • [x64][wasm-simd] Pattern match 32x4 rotate · 7c98abdb
      Zhi An Ng authored
      Code like:
      
        x = wasm_v32x4_shuffle(x, x, 1, 2, 3, 0);
      
      is currently matched by S8x16Concat, which lowers to two instructions:
      
        movapd xmm_dst, xmm_src
        palignr xmm_dst, xmm_src, 0x4
      
      There is a special case after an S8x16Concat is matched:

      - is_swizzle, i.e. the inputs are the same
      - it is a 32x4 shuffle (offset % 4 == 0)

      which can get better codegen:

      - (dst == src) shufps dst, src, 0b00111001
      - (dst != src) pshufd dst, src, 0b00111001
      
      Add a new simd shuffle matcher which will match 32x4 rotate, and
      construct the appropriate indices referring to the 32x4 elements.

      Note: we also pattern match on 32x4Swizzle, which correctly
      generates pshufd for the given example. However, this matching
      happens after S8x16Concat, so we get the palignr first. We could
      move the pattern matching cases around, but it would lead to some
      cases where it would have matched an S8x16Concat but now matches an
      S32x4Shuffle instead, leading to worse codegen.

      Change-Id: Ie3aca53bbc06826be2cf49632de4c24ec73d0a9a
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589062
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71754}
  11. 14 Dec, 2020 2 commits
  12. 10 Dec, 2020 3 commits
  13. 07 Dec, 2020 1 commit
  14. 03 Dec, 2020 1 commit
  15. 01 Dec, 2020 1 commit
  16. 17 Nov, 2020 1 commit
  17. 12 Nov, 2020 1 commit
    • [heap] Do not use V8_LIKELY on FLAG_disable_write_barriers. · 4a89c018
      Pierre Langlois authored
      FLAG_disable_write_barriers is a constexpr so the V8_LIKELY macro isn't
      necessary. Interestingly, it can also cause clang to warn that the code
      is unreachable, whereas without `__builtin_expect()` the compiler
      doesn't mind. See for example:
      
      ```
      constexpr bool kNo = false;
      
      void warns() {
        if (__builtin_expect(kNo, 0)) {
          int a = 42;
        }
      }
      
      void does_not_warn() {
        if (kNo) {
          int a = 42;
        }
      }
      ```
      
      Compiling V8 for arm64 with both `v8_disable_write_barriers = true` and
      `v8_enable_pointer_compression = false` would trigger this warning.
      
      Bug: v8:9533
      Change-Id: Id2ae156d60217007bb9ebf50628e8908e0193d05
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2534811
      Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Commit-Queue: Pierre Langlois <pierre.langlois@arm.com>
      Cr-Commit-Position: refs/heads/master@{#71157}
  18. 05 Nov, 2020 1 commit
    • [wasm-simd][x64] Optimize integer splats of constant 0 · 7d7b25d9
      Zhi An Ng authored
      Integer splats (especially for sizes < 32 bits) do not directly
      translate to a single instruction on x64. We can do better for
      special values, like 0, which can be lowered to `xor dst, dst`. We
      do this check in the instruction selector, and emit a special
      opcode kX64S128Zero.
      
      Also change the xor operation for kX64S128Zero from xorps to pxor.
      This can help reduce any potential data bypass delay (search for
      this in Agner Fog's microarchitecture manual for more details).
      Since integer splats are likely to be followed by integer ops, we
      should remain in the integer domain, and thus use pxor.
      
      For i64x2.splat the codegen goes from:
      
        xorl rdi,rdi
        vmovq xmm0,rdi
        vmovddup xmm0,xmm0
      
      to:
      
        vpxor xmm0,xmm0,xmm0
      
      Also add a unittest to verify this optimization, and necessary
      raw-assembler methods for the test.
      
      Bug: v8:11093
      Change-Id: I26b092032b6e672f1d5d26e35d79578ebe591cfe
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2516299
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70977}
  19. 04 Nov, 2020 2 commits
  20. 02 Nov, 2020 1 commit
    • [wasm-simd] Enhance Shufps to copy src to dst · 14570fe0
      Zhi An Ng authored
      Extract Shufps to handle both the AVX and SSE cases; in the SSE
      case it will copy src to dst if they are not the same. This allows
      us to use it in Liftoff as well, without the extra copy when AVX is
      supported.

      In other places the use of Shufps is unnecessary, since they are
      within a clause checking for non-AVX support, so we can simply use
      shufps (the non-macro-assembler version).
      
      Bug: v8:9561
      Change-Id: Icb043d7a43397c1b0810ece2666be567f0f5986c
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2513866
      Reviewed-by: Clemens Backes <clemensb@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70911}
  21. 30 Oct, 2020 1 commit
  22. 29 Oct, 2020 2 commits
    • [wasm-simd][x64] Don't fix dst to src on AVX · d4f7ea80
      Zhi An Ng authored
      On AVX, many instructions can have 3 operands, unlike SSE which only has
      2. So on SSE we use DefineSameAsFirst on the dst. But on AVX, using that
      will cause some unnecessary moves.
      
      This change moves a bunch of instructions that have
      single-instruction codegen into a macro list which supports this
      non-restricted AVX codegen.
      
      Bug: v8:9561
      Change-Id: I348a8396e8a1129daf2e1ed08ae8526e1bc3a73b
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505254
      Reviewed-by: Clemens Backes <clemensb@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70888}
    • Reland "[wasm-simd][ia32][x64] Only use registers for shuffles" · 45cb1ce0
      Zhi An Ng authored
      This is a reland of 3fb07882
      
      Original change's description:
      > [wasm-simd][ia32][x64] Only use registers for shuffles
      >
      > Shuffles have pattern matching clauses which, depending on the
      > instruction used, can require src0 or src1 to be register or not.
      > However we do not have 16-byte alignment for SIMD operands yet, so it
      > will segfault when we use an SSE SIMD instruction with unaligned
      > operands.
      >
      > This patch fixes all the shuffle cases to always use a register for the
      > input nodes, and it does so by ignoring the values of src0_needs_reg and
      > src1_needs_reg. When we eventually have memory alignment, we can
      > re-enable this check, without mucking around too much in the logic in
      > each shuffle match clause.
      >
      > Bug: v8:9198
      > Change-Id: I264e136f017353019f19954c62c88206f7b90656
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849
      > Reviewed-by: Andreas Haas <ahaas@chromium.org>
      > Reviewed-by: Adam Klein <adamk@chromium.org>
      > Commit-Queue: Adam Klein <adamk@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70848}
      
      Bug: v8:9198
      Change-Id: I40c6c8f0cd8908a2d6ab7016d8ed4d4fb2ab4114
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505250
      Reviewed-by: Adam Klein <adamk@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70862}