1. 17 Nov, 2020 1 commit
  2. 12 Nov, 2020 1 commit
    • [heap] Do not use V8_LIKELY on FLAG_disable_write_barriers. · 4a89c018
      Pierre Langlois authored
      FLAG_disable_write_barriers is constexpr, so the V8_LIKELY macro
      isn't necessary. Interestingly, it can also cause clang to warn that
      the code is unreachable, whereas without `__builtin_expect()` the
      compiler doesn't mind. See for example:
      
      ```
      constexpr bool kNo = false;
      
      void warns() {
        if (__builtin_expect(kNo, 0)) {
          int a = 42;
        }
      }
      
      void does_not_warn() {
        if (kNo) {
          int a = 42;
        }
      }
      ```
      
      Compiling V8 for arm64 with both `v8_disable_write_barriers = true` and
      `v8_enable_pointer_compression = false` would trigger this warning.
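      For context, a minimal sketch of how a V8_LIKELY-style macro is
      commonly defined (the `LIKELY` name and the guard below are
      illustrative, not V8's actual definition), showing that the hint
      never changes semantics — which is exactly why it is redundant on a
      constexpr condition:

      ```cpp
      #include <cassert>

      // Illustrative sketch, not V8's actual definition: LIKELY-style
      // macros commonly wrap __builtin_expect on GCC/Clang and fall back
      // to the bare condition elsewhere.
      #if defined(__GNUC__) || defined(__clang__)
      #define LIKELY(condition) (__builtin_expect(!!(condition), 1))
      #else
      #define LIKELY(condition) (condition)
      #endif

      constexpr bool kDisabled = false;

      int Guarded() {
        // On a constexpr condition the hint is pointless: the compiler
        // already knows the branch statically, and clang may warn about
        // unreachable code under __builtin_expect where a plain `if`
        // would not.
        if (LIKELY(kDisabled)) {
          return 42;
        }
        return 0;
      }

      int main() {
        assert(Guarded() == 0);  // behavior is identical with or without the hint
        return 0;
      }
      ```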
      
      Bug: v8:9533
      Change-Id: Id2ae156d60217007bb9ebf50628e8908e0193d05
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2534811
      Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Commit-Queue: Pierre Langlois <pierre.langlois@arm.com>
      Cr-Commit-Position: refs/heads/master@{#71157}
  3. 05 Nov, 2020 1 commit
    • [wasm-simd][x64] Optimize integer splats of constant 0 · 7d7b25d9
      Zhi An Ng authored
      Integer splats (especially for sizes < 32 bits) do not directly
      translate to a single instruction on x64. We can do better for
      special values, like 0, which can be lowered to `xor dst, dst`. We
      do this check in the instruction selector and emit a special opcode,
      kX64S128Zero.
      
      Also change the xor operation for kX64S128Zero from xorps to pxor.
      This can help reduce any potential data bypass delay (see Agner
      Fog's microarchitecture manual for details). Since integer splats
      are likely to be followed by integer ops, we should remain in the
      integer domain, and thus use pxor.
      
      For i64x2.splat the codegen goes from:
      
        xorl rdi,rdi
        vmovq xmm0,rdi
        vmovddup xmm0,xmm0
      
      to:
      
        vpxor xmm0,xmm0,xmm0
      
      Also add a unittest to verify this optimization, and necessary
      raw-assembler methods for the test.
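      The selection logic described above can be sketched abstractly as
      follows. The opcode names come from the commit, but the function
      shape and types are illustrative stand-ins, not V8's actual
      instruction-selector API:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>

      // Illustrative stand-in for the instruction-selector check: a
      // splat of constant 0 is lowered to the single-instruction
      // kX64S128Zero opcode; everything else takes the generic path.
      std::string SelectI64x2Splat(bool is_constant, int64_t value) {
        if (is_constant && value == 0) {
          return "kX64S128Zero";   // emitted as a single pxor dst,dst
        }
        return "kX64I64x2Splat";   // generic movq + movddup splat sequence
      }

      int main() {
        assert(SelectI64x2Splat(true, 0) == "kX64S128Zero");
        assert(SelectI64x2Splat(true, 5) == "kX64I64x2Splat");
        assert(SelectI64x2Splat(false, 0) == "kX64I64x2Splat");
        return 0;
      }
      ```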
      
      Bug: v8:11093
      Change-Id: I26b092032b6e672f1d5d26e35d79578ebe591cfe
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2516299
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70977}
  4. 04 Nov, 2020 2 commits
  5. 02 Nov, 2020 1 commit
    • [wasm-simd] Enhance Shufps to copy src to dst · 14570fe0
      Zhi An Ng authored
      Extract Shufps to handle both AVX and SSE cases; in the SSE case it
      will copy src to dst if they are not the same. This allows us to use
      it in Liftoff as well, without the extra copy when AVX is supported.

      In other places, the usage of Shufps is unnecessary, since those
      call sites are within a clause checking for non-AVX support, so we
      can simply use shufps (the non-macro-assembler version).
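      The dispatch described above can be modeled like this. The register
      operands and emitted strings are illustrative stand-ins; V8's real
      macro-assembler helper takes XMMRegister operands and an immediate:

      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>

      // Illustrative model of the Shufps helper: with AVX, the
      // three-operand vshufps writes straight to dst; without AVX, the
      // two-operand shufps requires dst == src, so we copy first when
      // they differ.
      std::vector<std::string> Shufps(bool has_avx, int dst, int src) {
        std::vector<std::string> emitted;
        if (has_avx) {
          emitted.push_back("vshufps dst, src, src, imm");
        } else {
          if (dst != src) emitted.push_back("movaps dst, src");
          emitted.push_back("shufps dst, imm");
        }
        return emitted;
      }

      int main() {
        assert(Shufps(true, 0, 1).size() == 1);   // AVX: no copy needed
        assert(Shufps(false, 0, 1).size() == 2);  // SSE, dst != src: copy + shuffle
        assert(Shufps(false, 0, 0).size() == 1);  // SSE, dst == src: shuffle only
        return 0;
      }
      ```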
      
      Bug: v8:9561
      Change-Id: Icb043d7a43397c1b0810ece2666be567f0f5986c
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2513866
      Reviewed-by: Clemens Backes <clemensb@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70911}
  6. 30 Oct, 2020 1 commit
  7. 29 Oct, 2020 2 commits
    • [wasm-simd][x64] Don't fix dst to src on AVX · d4f7ea80
      Zhi An Ng authored
      On AVX, many instructions can have 3 operands, unlike SSE which only has
      2. So on SSE we use DefineSameAsFirst on the dst. But on AVX, using that
      will cause some unnecessary moves.
      
      This change moves a number of instructions that have
      single-instruction codegen into a macro list that supports this
      non-restricted AVX codegen.
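      The operand-constraint choice behind this can be sketched as
      follows. DefineSameAsFirst is a real V8 instruction-selector
      constraint name; the surrounding function and the alternative name
      used here are illustrative:

      ```cpp
      #include <cassert>
      #include <string>

      // Illustrative: two-operand SSE instructions force dst == first
      // input (DefineSameAsFirst), which can make the register allocator
      // insert moves; three-operand AVX forms let it pick any dst.
      std::string DstConstraint(bool has_avx) {
        return has_avx ? "DefineAsRegister" : "DefineSameAsFirst";
      }

      int main() {
        assert(DstConstraint(false) == "DefineSameAsFirst");
        assert(DstConstraint(true) == "DefineAsRegister");
        return 0;
      }
      ```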
      
      Bug: v8:9561
      Change-Id: I348a8396e8a1129daf2e1ed08ae8526e1bc3a73b
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505254
      Reviewed-by: Clemens Backes <clemensb@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70888}
    • Reland "[wasm-simd][ia32][x64] Only use registers for shuffles" · 45cb1ce0
      Zhi An Ng authored
      This is a reland of 3fb07882
      
      Original change's description:
      > [wasm-simd][ia32][x64] Only use registers for shuffles
      >
      > Shuffles have pattern matching clauses which, depending on the
      > instruction used, can require src0 or src1 to be register or not.
      > However we do not have 16-byte alignment for SIMD operands yet, so it
      > will segfault when we use an SSE SIMD instruction with unaligned
      > operands.
      >
      > This patch fixes all the shuffle cases to always use a register for the
      > input nodes, and it does so by ignoring the values of src0_needs_reg and
      > src1_needs_reg. When we eventually have memory alignment, we can
      > re-enable this check, without mucking around too much in the logic in
      > each shuffle match clause.
      >
      > Bug: v8:9198
      > Change-Id: I264e136f017353019f19954c62c88206f7b90656
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849
      > Reviewed-by: Andreas Haas <ahaas@chromium.org>
      > Reviewed-by: Adam Klein <adamk@chromium.org>
      > Commit-Queue: Adam Klein <adamk@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70848}
      
      Bug: v8:9198
      Change-Id: I40c6c8f0cd8908a2d6ab7016d8ed4d4fb2ab4114
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505250
      Reviewed-by: Adam Klein <adamk@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70862}
  8. 28 Oct, 2020 4 commits
  9. 27 Oct, 2020 1 commit
  10. 22 Oct, 2020 1 commit
  11. 21 Oct, 2020 1 commit
    • Reland "Reland "[deoptimizer] Change deopt entries into builtins"" · c7cb9bec
      Jakob Gruber authored
      This is a reland of fbfa9bf4
      
      The arm64 port was missing proper codegen for CFI, so the exit
      sizes were off.
      
      Original change's description:
      > Reland "[deoptimizer] Change deopt entries into builtins"
      >
      > This is a reland of 7f58ced7
      >
      > It fixes the different exit size emitted on x64/Atom CPUs due to
      > performance tuning in TurboAssembler::Call. Additionally, add
      > cctests to verify the fixed size exits.
      >
      > Original change's description:
      > > [deoptimizer] Change deopt entries into builtins
      > >
      > > While the overall goal of this commit is to change deoptimization
      > > entries into builtins, there are multiple related things happening:
      > >
      > > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      > >   at runtime, guaranteed to be immovable), have been converted into
      > >   builtins. The major restriction is that we now need to preserve the
      > >   kRootRegister, which was formerly used on most architectures to pass
      > >   the deoptimization id. The solution differs based on platform.
      > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > > - Removed heap/ support for immovable Code generation.
      > > - Removed the DeserializerData class (no longer needed).
      > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      > >   in which the final jump to the deoptimization entry is generated
      > >   once per Code object, and deopt exits can continue to emit a
      > >   near-call.
      > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      > >   sizes by 4/8, 5, and 5 bytes, respectively.
      > >
      > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > > by using the same strategy as on arm64 (recalc deopt id from return
      > > address). Before:
      > >
      > >  e300a002       movw r10, <id>
      > >  e59fc024       ldr ip, [pc, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > After:
      > >
      > >  e59acb35       ldr ip, [r10, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > > object (max 32 bytes added overhead per Code object). Before:
      > >
      > >  9401cdae       bl <entry offset>
      > >
      > > After:
      > >
      > >  # eager deoptimization entry jump.
      > >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      > >  d61f0200       br x16
      > >  # lazy deoptimization entry jump.
      > >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      > >  d61f0200       br x16
      > >  # the deopt exit.
      > >  97fffffc       bl <eager deoptimization entry jump offset>
      > >
      > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      > >
      > >  bb00000000     mov ebx,<id>
      > >  e825f5372b     call <entry>
      > >
      > > After:
      > >
      > >  e8ea2256ba     call <entry>
      > >
      > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      > >
      > >  49c7c511000000 REX.W movq r13,<id>
      > >  e8ea2f0700     call <entry>
      > >
      > > After:
      > >
      > >  41ff9560360000 call [r13+<entry offset>]
      > >
      > > Bug: v8:8661,v8:8768
      > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > > Cr-Commit-Position: refs/heads/master@{#70597}
      >
      > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      > Bug: v8:8661,v8:8768,chromium:1140165
      > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
      > Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70655}
      
      Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      Bug: v8:8661
      Bug: v8:8768
      Bug: chromium:1140165
      Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70672}
  12. 20 Oct, 2020 4 commits
    • [ia32,x64] Make more use of the 'leave' instruction · 8f0ab471
      Georg Neis authored
      It is a little shorter and cheaper[1] than the equivalent
      "mov sp,bp; pop bp".
      
      Also remove support for the 'enter' instruction, since
      - it is unused,
      - it is neither shorter nor cheaper than the corresponding
        push and mov (in fact more expensive[1]), and
      - our disassembler doesn't support it.
      
      [1] See https://www.agner.org/optimize/instruction_tables.pdf
      
      Change-Id: I6c99c2f3e53081aea55445a54e18eaf45baa79c2
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2482822
      Commit-Queue: Georg Neis <neis@chromium.org>
      Reviewed-by: Victor Gomes <victorgomes@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70660}
    • Revert "Reland "[deoptimizer] Change deopt entries into builtins"" · 7c7aa4fa
      Maya Lekova authored
      This reverts commit fbfa9bf4.
      
      Reason for revert: Seems to break arm64 sim CFI build (please see DeoptExitSizeIfFixed) - https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20CFI/2808
      
      Original change's description:
      > Reland "[deoptimizer] Change deopt entries into builtins"
      >
      > This is a reland of 7f58ced7
      >
      > It fixes the different exit size emitted on x64/Atom CPUs due to
      > performance tuning in TurboAssembler::Call. Additionally, add
      > cctests to verify the fixed size exits.
      >
      > Original change's description:
      > > [deoptimizer] Change deopt entries into builtins
      > >
      > > While the overall goal of this commit is to change deoptimization
      > > entries into builtins, there are multiple related things happening:
      > >
      > > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      > >   at runtime, guaranteed to be immovable), have been converted into
      > >   builtins. The major restriction is that we now need to preserve the
      > >   kRootRegister, which was formerly used on most architectures to pass
      > >   the deoptimization id. The solution differs based on platform.
      > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > > - Removed heap/ support for immovable Code generation.
      > > - Removed the DeserializerData class (no longer needed).
      > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      > >   in which the final jump to the deoptimization entry is generated
      > >   once per Code object, and deopt exits can continue to emit a
      > >   near-call.
      > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      > >   sizes by 4/8, 5, and 5 bytes, respectively.
      > >
      > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > > by using the same strategy as on arm64 (recalc deopt id from return
      > > address). Before:
      > >
      > >  e300a002       movw r10, <id>
      > >  e59fc024       ldr ip, [pc, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > After:
      > >
      > >  e59acb35       ldr ip, [r10, <entry offset>]
      > >  e12fff3c       blx ip
      > >
      > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > > object (max 32 bytes added overhead per Code object). Before:
      > >
      > >  9401cdae       bl <entry offset>
      > >
      > > After:
      > >
      > >  # eager deoptimization entry jump.
      > >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      > >  d61f0200       br x16
      > >  # lazy deoptimization entry jump.
      > >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      > >  d61f0200       br x16
      > >  # the deopt exit.
      > >  97fffffc       bl <eager deoptimization entry jump offset>
      > >
      > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      > >
      > >  bb00000000     mov ebx,<id>
      > >  e825f5372b     call <entry>
      > >
      > > After:
      > >
      > >  e8ea2256ba     call <entry>
      > >
      > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      > >
      > >  49c7c511000000 REX.W movq r13,<id>
      > >  e8ea2f0700     call <entry>
      > >
      > > After:
      > >
      > >  41ff9560360000 call [r13+<entry offset>]
      > >
      > > Bug: v8:8661,v8:8768
      > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > > Cr-Commit-Position: refs/heads/master@{#70597}
      >
      > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      > Bug: v8:8661,v8:8768,chromium:1140165
      > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
      > Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70655}
      
      TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org
      
      Change-Id: I4739a3475bfd8ee0cfbe4b9a20382f91a6ef1bf0
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true
      Bug: v8:8661
      Bug: v8:8768
      Bug: chromium:1140165
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485223
      Reviewed-by: Maya Lekova <mslekova@chromium.org>
      Commit-Queue: Maya Lekova <mslekova@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70658}
    • Reland "[deoptimizer] Change deopt entries into builtins" · fbfa9bf4
      Jakob Gruber authored
      This is a reland of 7f58ced7
      
      It fixes the different exit size emitted on x64/Atom CPUs due to
      performance tuning in TurboAssembler::Call. Additionally, add
      cctests to verify the fixed size exits.
      
      Original change's description:
      > [deoptimizer] Change deopt entries into builtins
      >
      > While the overall goal of this commit is to change deoptimization
      > entries into builtins, there are multiple related things happening:
      >
      > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      >   at runtime, guaranteed to be immovable), have been converted into
      >   builtins. The major restriction is that we now need to preserve the
      >   kRootRegister, which was formerly used on most architectures to pass
      >   the deoptimization id. The solution differs based on platform.
      > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > - Removed heap/ support for immovable Code generation.
      > - Removed the DeserializerData class (no longer needed).
      > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      >   in which the final jump to the deoptimization entry is generated
      >   once per Code object, and deopt exits can continue to emit a
      >   near-call.
      > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      >   sizes by 4/8, 5, and 5 bytes, respectively.
      >
      > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > by using the same strategy as on arm64 (recalc deopt id from return
      > address). Before:
      >
      >  e300a002       movw r10, <id>
      >  e59fc024       ldr ip, [pc, <entry offset>]
      >  e12fff3c       blx ip
      >
      > After:
      >
      >  e59acb35       ldr ip, [r10, <entry offset>]
      >  e12fff3c       blx ip
      >
      > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > object (max 32 bytes added overhead per Code object). Before:
      >
      >  9401cdae       bl <entry offset>
      >
      > After:
      >
      >  # eager deoptimization entry jump.
      >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      >  d61f0200       br x16
      >  # lazy deoptimization entry jump.
      >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      >  d61f0200       br x16
      >  # the deopt exit.
      >  97fffffc       bl <eager deoptimization entry jump offset>
      >
      > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      >
      >  bb00000000     mov ebx,<id>
      >  e825f5372b     call <entry>
      >
      > After:
      >
      >  e8ea2256ba     call <entry>
      >
      > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      >
      >  49c7c511000000 REX.W movq r13,<id>
      >  e8ea2f0700     call <entry>
      >
      > After:
      >
      >  41ff9560360000 call [r13+<entry offset>]
      >
      > Bug: v8:8661,v8:8768
      > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70597}
      
      Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
      Bug: v8:8661,v8:8768,chromium:1140165
      Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70655}
    • Revert "[deoptimizer] Change deopt entries into builtins" · 8bc9a794
      Jakob Gruber authored
      This reverts commit 7f58ced7.
      
      Reason for revert: Segfaults on Atom_x64 https://ci.chromium.org/p/v8-internal/builders/ci/v8_linux64_atom_perf/5686?
      
      Original change's description:
      > [deoptimizer] Change deopt entries into builtins
      >
      > While the overall goal of this commit is to change deoptimization
      > entries into builtins, there are multiple related things happening:
      >
      > - Deoptimization entries, formerly stubs (i.e. Code objects generated
      >   at runtime, guaranteed to be immovable), have been converted into
      >   builtins. The major restriction is that we now need to preserve the
      >   kRootRegister, which was formerly used on most architectures to pass
      >   the deoptimization id. The solution differs based on platform.
      > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      > - Removed heap/ support for immovable Code generation.
      > - Removed the DeserializerData class (no longer needed).
      > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
      >   in which the final jump to the deoptimization entry is generated
      >   once per Code object, and deopt exits can continue to emit a
      >   near-call.
      > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
      >   sizes by 4/8, 5, and 5 bytes, respectively.
      >
      > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      > by using the same strategy as on arm64 (recalc deopt id from return
      > address). Before:
      >
      >  e300a002       movw r10, <id>
      >  e59fc024       ldr ip, [pc, <entry offset>]
      >  e12fff3c       blx ip
      >
      > After:
      >
      >  e59acb35       ldr ip, [r10, <entry offset>]
      >  e12fff3c       blx ip
      >
      > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      > object (max 32 bytes added overhead per Code object). Before:
      >
      >  9401cdae       bl <entry offset>
      >
      > After:
      >
      >  # eager deoptimization entry jump.
      >  f95b1f50       ldr x16, [x26, <eager entry offset>]
      >  d61f0200       br x16
      >  # lazy deoptimization entry jump.
      >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
      >  d61f0200       br x16
      >  # the deopt exit.
      >  97fffffc       bl <eager deoptimization entry jump offset>
      >
      > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      >
      >  bb00000000     mov ebx,<id>
      >  e825f5372b     call <entry>
      >
      > After:
      >
      >  e8ea2256ba     call <entry>
      >
      > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      >
      >  49c7c511000000 REX.W movq r13,<id>
      >  e8ea2f0700     call <entry>
      >
      > After:
      >
      >  41ff9560360000 call [r13+<entry offset>]
      >
      > Bug: v8:8661,v8:8768
      > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#70597}
      
      TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org
      
      # Not skipping CQ checks because original CL landed > 1 day ago.
      
      Bug: v8:8661,v8:8768,chromium:1140165
      Change-Id: I3df02ab42f6e02233d9f6fb80e8bb18f76870d91
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485504
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70649}
  13. 19 Oct, 2020 6 commits
    • [wasm-simd][x64] Optimize more ops for AVX · 89d9eb73
      Ng Zhi An authored
      All these opcodes have a simple lowering into a single x64 instruction.
      We can perform a similar optimization when AVX is supported to not force
      dst == src1.
      
      Bug: v8:10116
      Change-Id: I4ad2975b6f241d8209025682202b476c08b3491b
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486383
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70636}
    • [wasm-simd][x64] Consolidate v128.load_zero with movss/movsd · c77dd2ff
      Ng Zhi An authored
      We don't need separate Load32Zero and Load64Zero instructions, since the
      implementation is movss and movsd, which we already have.
      
      Bug: v8:10713
      Change-Id: I5d02e946f3bf9fe08f943a811f2d3cc8aec81ea8
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486233
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70635}
    • [wasm-simd] Rename v128.load32_zero to follow proposal · 9738fb5e
      Ng Zhi An authored
      Not sure why I originally chose to name it LoadMem32Zero instead of
      Load32Zero, as in the proposal. This fixes it.
      
      Bug: v8:10713
      Change-Id: If05603f743213bc6b7aea0ce22c80ae4b3023ccf
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481824
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70630}
    • [wasm-simd][x64] Optimize f32x4 splat and extract lanes · 4068b3d2
      Ng Zhi An authored
      For splats, we can make use of vshufps to avoid a movss. Without
      AVX, we specify dst to be the same as src in the instruction
      selector.

      For extract lane, we can use vshufps to extract a float into a dst
      xmm and leave junk in the higher bits.
      
      On the meshopt_decoder.js benchmark in linked bug, it removes about 7
      movss instructions that did nothing. Hardware can do register renaming,
      but let's not rely on that :)
      
      R=bbudge@chromium.org
      
      Bug: v8:10116
      Change-Id: I4d68c10536a79659de673060d537d58113308477
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481473
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70628}
    • Rename LoadKind to MemoryAccessKind · 0301534c
      Ng Zhi An authored
      LoadKind is no longer just for loads; we use it for stores as well
      (starting with https://crrev.com/c/2473383). Rename it to something
      more generic.
      
      Bug: v8:10975,v8:10933
      Change-Id: I5e5406ea475e06a83eb2eefe22d4824a99029944
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481822
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Georg Neis <neis@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70626}
    • [deoptimizer] Change deopt entries into builtins · 7f58ced7
      Jakob Gruber authored
      While the overall goal of this commit is to change deoptimization
      entries into builtins, there are multiple related things happening:
      
      - Deoptimization entries, formerly stubs (i.e. Code objects generated
        at runtime, guaranteed to be immovable), have been converted into
        builtins. The major restriction is that we now need to preserve the
        kRootRegister, which was formerly used on most architectures to pass
        the deoptimization id. The solution differs based on platform.
      - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
      - Removed heap/ support for immovable Code generation.
      - Removed the DeserializerData class (no longer needed).
      - arm64: to preserve 4-byte deopt exits, introduced a new optimization
        in which the final jump to the deoptimization entry is generated
        once per Code object, and deopt exits can continue to emit a
        near-call.
      - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
        sizes by 4/8, 5, and 5 bytes, respectively.
      
      On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
      by using the same strategy as on arm64 (recalc deopt id from return
      address). Before:
      
       e300a002       movw r10, <id>
       e59fc024       ldr ip, [pc, <entry offset>]
       e12fff3c       blx ip
      
      After:
      
       e59acb35       ldr ip, [r10, <entry offset>]
       e12fff3c       blx ip
      
       On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases
      with CFI). Additionally, up to 4 builtin jumps are emitted per Code
      object (max 32 bytes added overhead per Code object). Before:
      
       9401cdae       bl <entry offset>
      
      After:
      
       # eager deoptimization entry jump.
       f95b1f50       ldr x16, [x26, <eager entry offset>]
       d61f0200       br x16
       # lazy deoptimization entry jump.
       f95b2b50       ldr x16, [x26, <lazy entry offset>]
       d61f0200       br x16
       # the deopt exit.
       97fffffc       bl <eager deoptimization entry jump offset>
      
      On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
      
       bb00000000     mov ebx,<id>
       e825f5372b     call <entry>
      
      After:
      
       e8ea2256ba     call <entry>
      
      On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
      
       49c7c511000000 REX.W movq r13,<id>
       e8ea2f0700     call <entry>
      
      After:
      
       41ff9560360000 call [r13+<entry offset>]
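      The "recalc deopt id from return address" trick mentioned above
      relies on deopt exits being fixed-size and laid out contiguously. A
      sketch of the arithmetic, with illustrative constants (the real
      values come from the Code object's layout and the per-architecture
      fixed exit size):

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Illustrative constants, not V8's actual values.
      constexpr uintptr_t kDeoptExitSectionStart = 0x1000;
      constexpr uintptr_t kDeoptExitSize = 8;  // e.g. 8 bytes per exit on arm

      // With fixed-size, contiguous exits, the deoptimization id no
      // longer needs to be materialized in a register: the return
      // address, which points just past the exit that was taken,
      // identifies the exit's index.
      int DeoptIdFromReturnAddress(uintptr_t return_address) {
        return static_cast<int>(
            (return_address - kDeoptExitSectionStart) / kDeoptExitSize) - 1;
      }

      int main() {
        assert(DeoptIdFromReturnAddress(0x1008) == 0);  // just past exit 0
        assert(DeoptIdFromReturnAddress(0x1010) == 1);  // just past exit 1
        return 0;
      }
      ```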
      
      Bug: v8:8661,v8:8768
      Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70597}
  14. 16 Oct, 2020 5 commits
  15. 15 Oct, 2020 3 commits
  16. 14 Oct, 2020 2 commits
  17. 13 Oct, 2020 1 commit
  18. 12 Oct, 2020 3 commits
    • [wasm-simd][x64] Don't force dst to be same as src on AVX · 813ae013
      Ng Zhi An authored
      On AVX, many instructions can have 3 operands, unlike SSE which only has
      2. So on SSE we use DefineSameAsFirst on the dst. But on AVX, using that
      will cause some unnecessary moves.
      
      This patch changes a couple of F32x4 and S128 instructions to remove
      this restriction when AVX is supported.
      
      We can't use AvxHelper since it duplicates the dst for the call to the
      AVX instruction, which isn't what we want. The alternative is to
      redefine Mulps and other functions here, but there are other callsites
      that depend on this duplicated-dst behavior, so it's harder to change.
      We can migrate this as we move more logic over to non-DefineSameAsFirst
      for AVX.
      
      With the meshopt_decoder.js in the linked bug, it removes 8 SIMD movs
      (from a function that has 300+ lines of assembly.)
      
      Note that according to Agner Fog's microarchitecture.pdf (page 127,
      "Elimination of move instructions"), such moves can often be
      eliminated by the processor. So this change won't speed up
      performance, but it helps a bit with binary size and decoder
      pressure.
      
      Bug: v8:10116,v8:9561
      Change-Id: I125bfd44e728ef08312620bc00f6433f376e69e3
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465653
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70462}
    • [wasm-simd][x64] Prototype i64x2.bitmask · ceee7cfe
      Ng Zhi An authored
      Implement on interpreter and x64.
      
      Bug: v8:10997
      Change-Id: I3537ce54e1b56cc3b04d91cb07c430c35b88c3aa
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2459109
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70459}
    • [wasm-simd][x64] Prototype load lane · 673be63e
      Ng Zhi An authored
      Load lane loads a value from memory and replaces a single lane of a
      simd value.
      
      This implements the load (no stores yet) for x64 and interpreter.
      
      Bug: v8:10975
      Change-Id: I95d1b5e781ee9adaec23dda749e514f2485eda10
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2444578
      Commit-Queue: Zhi An Ng <zhin@chromium.org>
      Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
      Reviewed-by: Bill Budge <bbudge@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#70456}