1. 12 Jan, 2021 1 commit
  2. 08 Jan, 2021 3 commits
  3. 05 Jan, 2021 2 commits
  4. 29 Dec, 2020 4 commits
  5. 23 Dec, 2020 1 commit
  6. 22 Dec, 2020 3 commits
  7. 21 Dec, 2020 1 commit
    • [x64][ia32][wasm-simd] Optimize v128.bitselect · 8e9ad4f8
      Zhi An Ng authored
      Couple of optimizations for v128.bitselect on both ia32 and x64.
      
      1. Remove an extra movaps when AVX is supported, since we have 3-operand
      instructions
      2. Tweak the algorithm from:
           xor(and(xor(src1, src2), mask), src2)

         to:
           or(and(src1, mask), andnot(src2, mask))
         The new form is easier to read and understand, and also
         eliminates a dependency chain (on kScratchDoubleReg) present in
         the older algorithm.
      3. Use the integer forms of the logical ops. Older processors have
      higher throughput for these than for the floating-point forms.
      However, the integer forms are 1 byte longer, so on SSE we stick
      with the floating-point ops.
      
      For AVX, this reduces instruction count from 9948 to 9868.
      
      Change-Id: Idd5d26b99a76255dbfa63e2c304e6af3760c4ec6
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591859
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71845}
  8. 17 Dec, 2020 3 commits
    • [wasm-simd][x64] Optimize arch shuffle if AVX supported · 46ce9b05
      Zhi An Ng authored
      AVX has 3-operand shuffle/unpack operations. We currently require
      that dst == src0 in all cases, which is not necessary when we have
      AVX. For the arch shuffles that map to a single native instruction,
      add support to check for AVX in the instruction-selector, so that
      dst is no longer required to be same as first, and in the code-gen
      to support generating AVX.
      
      The other arch shuffles are slightly more complicated, and can be
      optimized in a future change.
      
      Bug: v8:11270
      Change-Id: I25b271aeff71fbe860d5bcc8abb17c36bcdab32c
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591858
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71820}
    • [x64][wasm-simd] Only require unique registers for shuffles that use temps · 12e37399
      Zhi An Ng authored
      An improvement to the generic shuffle lowering
      (https://crrev.com/c/2152853) required a temporary SIMD register to
      hold the mask, rather than pushing it onto the stack. The temporary
      register requires that we UseUniqueRegister on the inputs, to
      prevent aliasing, since we will write to the temp. However, we only
      need this for the generic shuffle. We accidentally over-constrained
      all the other pattern-matched shuffles, since they don't use any
      temps.
      
      On a ~2000-line function containing ~150 shuffles (not all of which
      are generic shuffles), we get 16 fewer instructions in the native
      code, and actually see a very small improvement in the overall
      benchmarks.
      
      Bug: v8:11270
      Change-Id: I09974f7615e4b8f5e2416ed17ca47cc7613fd6b1
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591857
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71818}
    • [wasm-simd][ia32][x64] More optimization for f32x4.extract_lane · 741e5a66
      Zhi An Ng authored
      We can apply a few more optimizations to this instruction; they
      leave some junk in the top lanes of dst, but that doesn't matter:

      - when the lane is 1: use movshdup, which is 4 bytes long
      - when the lane is 2: use movhlps, which is 3 bytes long
      - otherwise: use shufps (4 bytes) or pshufd (5 bytes)

      All of which are better than insertps (6 bytes).
      
      Change-Id: I0e524431d1832e297e8c8bb418d42382d93fa691
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591850
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71813}
  9. 16 Dec, 2020 4 commits
  10. 15 Dec, 2020 1 commit
    • [x64][wasm-simd] Pattern match 32x4 rotate · 7c98abdb
      Zhi An Ng authored
      Code like:
      
        x = wasm_v32x4_shuffle(x, x, 1, 2, 3, 0);
      
      is currently matched by S8x16Concat, which lowers to two instructions:
      
        movapd xmm_dst, xmm_src
        palignr xmm_dst, xmm_src, 0x4
      
      There is a special case after an S8x16Concat is matched:

      - is_swizzle, i.e. the inputs are the same
      - it is a 32x4 shuffle (offset % 4 == 0)

      which can get better codegen:

      - (dst == src) shufps dst, src, 0b00111001
      - (dst != src) pshufd dst, src, 0b00111001
      
      Add a new simd shuffle matcher which will match 32x4 rotate, and
      construct the appropriate indices referring to the 32x4 elements.

      Note: we also pattern match on 32x4Swizzle, which correctly
      generates pshufd for the given example. However, this matching
      happens after S8x16Concat, so we get the palignr first. We could
      move the pattern matching cases around, but it would lead to some
      cases where it would have matched an S8x16Concat but now matches an
      S32x4Shuffle instead, leading to worse codegen.

      Change-Id: Ie3aca53bbc06826be2cf49632de4c24ec73d0a9a
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589062
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#71754}
  11. 14 Dec, 2020 2 commits
  12. 10 Dec, 2020 3 commits
  13. 07 Dec, 2020 1 commit
  14. 03 Dec, 2020 1 commit
  15. 01 Dec, 2020 1 commit
  16. 17 Nov, 2020 1 commit
  17. 12 Nov, 2020 1 commit
    • [heap] Do not use V8_LIKELY on FLAG_disable_write_barriers. · 4a89c018
      Pierre Langlois authored
      FLAG_disable_write_barriers is a constexpr so the V8_LIKELY macro isn't
      necessary. Interestingly, it can also cause clang to warn that the code
      is unreachable, whereas without `__builtin_expect()` the compiler
      doesn't mind. See for example:
      
      ```
      constexpr bool kNo = false;
      
      void warns() {
        if (__builtin_expect(kNo, 0)) {
          int a = 42;
        }
      }
      
      void does_not_warn() {
        if (kNo) {
          int a = 42;
        }
      }
      ```
      
      Compiling V8 for arm64 with both `v8_disable_write_barriers = true` and
      `v8_enable_pointer_compression = false` would trigger this warning.
      
      Bug: v8:9533
      Change-Id: Id2ae156d60217007bb9ebf50628e8908e0193d05
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2534811
      Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Commit-Queue: Pierre Langlois <pierre.langlois@arm.com>
      Cr-Commit-Position: refs/heads/master@{#71157}
  18. 05 Nov, 2020 1 commit
    • [wasm-simd][x64] Optimize integer splats of constant 0 · 7d7b25d9
      Zhi An Ng authored
      Integer splats (especially for sizes < 32 bits) do not directly
      translate to a single instruction on x64. We can do better for
      special values, like 0, which can be lowered to `xor dst, dst`. We
      do this check in the instruction selector, and emit a special
      opcode kX64S128Zero.
      
      Also change the xor operation for kX64S128Zero from xorps to pxor.
      This can help reduce any potential data bypass delay (search for
      this in Agner Fog's microarchitecture manual for more details).
      Since integer splats are likely to be followed by integer ops, we
      should remain in the integer domain, and thus use pxor.
      
      For i64x2.splat the codegen goes from:
      
        xorl rdi,rdi
        vmovq xmm0,rdi
        vmovddup xmm0,xmm0
      
      to:
      
        vpxor xmm0,xmm0,xmm0
      
      Also add a unittest to verify this optimization, and necessary
      raw-assembler methods for the test.
      
      Bug: v8:11093
      Change-Id: I26b092032b6e672f1d5d26e35d79578ebe591cfe
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2516299
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70977}
  19. 04 Nov, 2020 2 commits
  20. 02 Nov, 2020 1 commit
    • [wasm-simd] Enhance Shufps to copy src to dst · 14570fe0
      Zhi An Ng authored
      Extract Shufps to handle both the AVX and SSE cases; in the SSE
      case it will copy src to dst if they are not the same. This allows
      us to use it in Liftoff as well, without the extra copy when AVX is
      supported.

      In other places the use of Shufps is unnecessary, since they are
      within a clause checking for non-AVX support, so we can simply use
      shufps (the non-macro-assembler version).
      
      Bug: v8:9561
      Change-Id: Icb043d7a43397c1b0810ece2666be567f0f5986c
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2513866
      Reviewed-by: Clemens Backes <clemensb@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70911}
  21. 30 Oct, 2020 1 commit
  22. 29 Oct, 2020 2 commits
    • [wasm-simd][x64] Don't fix dst to src on AVX · d4f7ea80
      Zhi An Ng authored
      On AVX, many instructions can have 3 operands, unlike SSE which only has
      2. So on SSE we use DefineSameAsFirst on the dst. But on AVX, using that
      will cause some unnecessary moves.
      
      This change moves a bunch of instructions that have
      single-instruction codegen into a macro list which supports this
      non-restricted AVX codegen.
      
      Bug: v8:9561
      Change-Id: I348a8396e8a1129daf2e1ed08ae8526e1bc3a73b
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505254
      Reviewed-by: Clemens Backes <clemensb@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70888}
    • Reland "[wasm-simd][ia32][x64] Only use registers for shuffles" · 45cb1ce0
      Zhi An Ng authored
      This is a reland of 3fb07882
      
      Original change's description:
      > [wasm-simd][ia32][x64] Only use registers for shuffles
      >
      > Shuffles have pattern matching clauses which, depending on the
      > instruction used, can require src0 or src1 to be register or not.
      > However we do not have 16-byte alignment for SIMD operands yet, so it
      > will segfault when we use an SSE SIMD instruction with unaligned
      > operands.
      >
      > This patch fixes all the shuffle cases to always use a register for the
      > input nodes, and it does so by ignoring the values of src0_needs_reg and
      > src1_needs_reg. When we eventually have memory alignment, we can
      > re-enable this check, without mucking around too much in the logic in
      > each shuffle match clause.
      >
      > Bug: v8:9198
      > Change-Id: I264e136f017353019f19954c62c88206f7b90656
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849
      > Reviewed-by: Andreas Haas <ahaas@chromium.org>
      > Reviewed-by: Adam Klein <adamk@chromium.org>
      > Commit-Queue: Adam Klein <adamk@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70848}
      
      Bug: v8:9198
      Change-Id: I40c6c8f0cd8908a2d6ab7016d8ed4d4fb2ab4114
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505250
      Reviewed-by: Adam Klein <adamk@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70862}