- 12 Jan, 2021 1 commit
-
-
Zhi An Ng authored
Prototype these 4 instructions:
  - i64x2.widen_low_i32x4_s
  - i64x2.widen_high_i32x4_s
  - i64x2.widen_low_i32x4_u
  - i64x2.widen_high_i32x4_u
Implementation is the same as x64. Drive-by fix to add a missing CpuFeatureScope to x64.
Bug: v8:10972 Change-Id: Iacc84bce156053d0ac39b1a419727c93c499a8c9 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2612339 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Deepti Gandluri <gdeepti@chromium.org> Cr-Commit-Position: refs/heads/master@{#72025}
-
- 08 Jan, 2021 3 commits
-
-
Zhi An Ng authored
For Float64Abs, Float64Neg, F64x2Abs, and F64x2Neg, we can use the Abspd and Negpd helpers. These helpers will load the necessary masks as an ExternalReference. We cannot do the same with AVX, since the AVX codegen can already have one of the inputs as an Operand. Change-Id: I85f0a7437747b9cfe8bff735d7b27a957736818c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2599850 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Deepti Gandluri <gdeepti@chromium.org> Cr-Commit-Position: refs/heads/master@{#71967}
-
Zhi An Ng authored
When AVX2 is available, we can use vbroadcastss. On AVX, use vshufps, since it is non-destructive. On SSE, shufps is 1 byte shorter. FIXED=b/175364402 Change-Id: I5bd10914579d8db012192a9c04f7b0038ec1c812 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2599849 Reviewed-by:
Deepti Gandluri <gdeepti@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71964}
-
Zhi An Ng authored
This is a reland of 2d5f981a
The fix is in liftoff-assembler-x64, to call S128Select with dst as the mask when AVX is not supported and dst != mask.
Original change's description:
> [wasm-simd][liftoff][x64] Move v128.select into macro-assembler
>
> This allows us to reuse this optimized code sequence in Liftoff.
>
> We can't do the same thing in IA32 yet, there is no kScratchDoubleReg
> defined in the macro-assembler-ia32.cc, it is defined in code-generator-ia32
> as xmm0 but I'm not sure if it is safe to just use that in the macro assembler.
>
> Change-Id: I6c761857c49d2518fbc82cd0796c62fc86665cb5
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2596581
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Reviewed-by: Clemens Backes <clemensb@chromium.org>
> Reviewed-by: Bill Budge <bbudge@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#71915}
Change-Id: Ib96ce0e1d5762f6513ef87f240b25ef3ae59441f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2612324 Reviewed-by:
Clemens Backes <clemensb@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71961}
-
- 05 Jan, 2021 2 commits
-
-
Clemens Backes authored
This reverts commit 2d5f981a.
Reason for revert: Fails on noavx: https://ci.chromium.org/p/v8/builders/ci/V8%20Linux64%20-%20debug/35318
Original change's description:
> [wasm-simd][liftoff][x64] Move v128.select into macro-assembler
>
> This allows us to reuse this optimized code sequence in Liftoff.
>
> We can't do the same thing in IA32 yet, there is no kScratchDoubleReg
> defined in the macro-assembler-ia32.cc, it is defined in code-generator-ia32
> as xmm0 but I'm not sure if it is safe to just use that in the macro assembler.
>
> Change-Id: I6c761857c49d2518fbc82cd0796c62fc86665cb5
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2596581
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Reviewed-by: Clemens Backes <clemensb@chromium.org>
> Reviewed-by: Bill Budge <bbudge@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#71915}
TBR=bbudge@chromium.org,clemensb@chromium.org,zhin@chromium.org Change-Id: I2aacee02c89a16516a9cd6686d8cc6180362f78e No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2610730 Reviewed-by:
Clemens Backes <clemensb@chromium.org> Commit-Queue: Clemens Backes <clemensb@chromium.org> Cr-Commit-Position: refs/heads/master@{#71916}
-
Zhi An Ng authored
This allows us to reuse this optimized code sequence in Liftoff. We can't do the same thing in IA32 yet, there is no kScratchDoubleReg defined in the macro-assembler-ia32.cc, it is defined in code-generator-ia32 as xmm0 but I'm not sure if it is safe to just use that in the macro assembler. Change-Id: I6c761857c49d2518fbc82cd0796c62fc86665cb5 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2596581 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Clemens Backes <clemensb@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71915}
-
- 29 Dec, 2020 4 commits
-
-
Zhi An Ng authored
Prototype these 4 instructions:
  - i64x2.widen_low_i32x4_s
  - i64x2.widen_high_i32x4_s
  - i64x2.widen_low_i32x4_u
  - i64x2.widen_high_i32x4_u
Bug: v8:10972 Change-Id: I3defd0a2431252bc3f5bb45e022e62b37beb34ca Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2601012 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71888}
-
Zhi An Ng authored
Bug: v8:10971 Change-Id: I60186a445f3a5ad366cba4e6bcb16519098aa6ad Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2601009 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71886}
-
Zhi An Ng authored
This will make these functions usable from Liftoff when we later implement extended multiply instructions in Liftoff. Bug: v8:11262 Change-Id: I5fb105bc0184675eb60cd8ae63cc13955b0f767d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2601876 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71885}
-
Zhi An Ng authored
In AVX, it is better to use the appropriate integer or floating point moves depending on which instructions produce/consume these moves, since there can be a delay moving from integer to floating point domain. On SSE systems, it is less important, and we can use movaps/movups, which are 1 byte shorter than movdqa/movdqu. This patch cleans up a couple of places, and defines macro-assembler functions Movdqa, Movdqu, Movapd, to call into movaps/movups when AVX is not supported. Change-Id: Iba6c54e218875f1a70f61792978d7b3f69edfb4b Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2599843 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71884}
-
- 23 Dec, 2020 1 commit
-
-
Zhi An Ng authored
Detect AVX2 support and use vpbroadcastb or vpbroadcastw. No new assembler helpers required because we are only emitting the VEX-128 versions of these instructions. Bug: v8:11258 Change-Id: Ic50178daa6fc8fe767dfc788e61e67538066bdea Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2596582 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71866}
-
- 22 Dec, 2020 3 commits
-
-
Zhi An Ng authored
When an 8x16 shuffle matches a 32x4 shuffle (every group of 4 indices is consecutive), and the first 2 indices are in the range [0-3], and the other 2 indices are in the range [4-7], then we can match it to a shufps. E.g. [0,2,4,6], [1,3,5,7]. These shuffles are commonly used to extract odd/even floats. Change-Id: I031fe44f71a13bbc72115c22b02a5eaaf29d3794 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2596579 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71860}
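The matching rule described in this commit can be sketched as follows. This is only an illustrative model, not V8's matcher: the names `Expand32x4`, `TryMatch32x4` and `TryMatchShufps` are hypothetical, and the sketch assumes that each group of 4 byte indices must also be 4-byte aligned. The imm8 layout follows the documented shufps encoding (two 2-bit selectors into the first input, then two into the second).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Shuffle8x16 = std::array<uint8_t, 16>;
using Shuffle32x4 = std::array<uint8_t, 4>;

// Hypothetical helper: expands 32-bit lane indices (0-7 across the two
// inputs) into the equivalent 16 byte indices.
Shuffle8x16 Expand32x4(const Shuffle32x4& lanes) {
  Shuffle8x16 bytes{};
  for (int i = 0; i < 4; ++i) {
    assert(lanes[i] < 8);
    for (int j = 0; j < 4; ++j) {
      bytes[4 * i + j] = static_cast<uint8_t>(4 * lanes[i] + j);
    }
  }
  return bytes;
}

// Succeeds when every group of 4 byte indices is aligned and consecutive,
// i.e. the byte shuffle is really a shuffle of 32-bit lanes.
bool TryMatch32x4(const Shuffle8x16& bytes, Shuffle32x4* lanes) {
  for (int i = 0; i < 4; ++i) {
    uint8_t first = bytes[4 * i];
    if (first % 4 != 0) return false;
    for (int j = 1; j < 4; ++j) {
      if (bytes[4 * i + j] != first + j) return false;
    }
    (*lanes)[i] = static_cast<uint8_t>(first / 4);
  }
  return true;
}

// Succeeds when lanes 0-1 come from the first input ([0-3]) and lanes 2-3
// come from the second input ([4-7]); writes the shufps immediate.
bool TryMatchShufps(const Shuffle8x16& bytes, uint8_t* imm8) {
  Shuffle32x4 l{};
  if (!TryMatch32x4(bytes, &l)) return false;
  if (l[0] > 3 || l[1] > 3) return false;
  if (l[2] < 4 || l[2] > 7 || l[3] < 4 || l[3] > 7) return false;
  *imm8 = static_cast<uint8_t>(l[0] | (l[1] << 2) | ((l[2] - 4) << 4) |
                               ((l[3] - 4) << 6));
  return true;
}
```

Under this model the two examples from the message, [0,2,4,6] and [1,3,5,7], map to the immediates 0x88 and 0xDD respectively.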
-
Zhi An Ng authored
For pblendw and palignr, if AVX is supported, we can use the 3-operand AVX instruction, which can save us a move. Bug: v8:11270 Change-Id: Ifd837e29c76886a3008bc63c17d4a68bc6aae364 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2596578 Reviewed-by:
Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71857}
-
Zhi An Ng authored
Optimize:
  - i32x4.widen_high_i16x8_s
  - i32x4.widen_high_i16x8_u
  - i16x8.widen_high_i8x16_s
  - i16x8.widen_high_i8x16_u
These optimizations were suggested in http://b/175364869. The main change is to move away from palignr, which has a dependency on dst, and also the AVX version is 2 bytes longer than the punpckhqdq. For the signed and unsigned variants, we have slightly different optimizations. Unsigned variants can use a punpckh* instruction with a zeroed scratch register, which effectively zero-extends. Signed variants use the movhlps instruction to move the high half to the low half of dst, then use packed sign-extension instructions. The common fallback for these instructions is to use pshufd, which does not have a dependency on dst, but is 1 byte longer than the punpckh* instructions.
FIXED=b/175364869 Change-Id: If28da2aaa8f6e39a58e63b01cc9a81bbbb294606 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591853 Reviewed-by:
Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71856}
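To illustrate why interleaving with a zeroed register zero-extends, and how the signed path differs, here is a small scalar model of i16x8.widen_high_i8x16_u and i16x8.widen_high_i8x16_s. It is a sketch under stated assumptions: the function names are hypothetical, lanes are modeled as little-endian byte arrays, and the instruction behavior (punpckhbw, movhlps, pmovsxbw) is modeled from the instruction set documentation rather than taken from the V8 code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Models `punpckhbw`: interleaves the high 8 bytes of `a` and `b`.
Bytes16 Punpckhbw(const Bytes16& a, const Bytes16& b) {
  Bytes16 out{};
  for (int i = 0; i < 8; ++i) {
    out[2 * i] = a[8 + i];
    out[2 * i + 1] = b[8 + i];
  }
  return out;
}

// Unsigned widen: interleaving with zero puts a zero byte above every source
// byte, which is exactly a zero-extension to 16 bits.
std::array<uint16_t, 8> WidenHighU(const Bytes16& src) {
  Bytes16 zero{};  // the zeroed scratch register
  Bytes16 interleaved = Punpckhbw(src, zero);
  std::array<uint16_t, 8> out{};
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<uint16_t>(interleaved[2 * i] |
                                   (interleaved[2 * i + 1] << 8));
  }
  return out;
}

// Signed widen: move the high half down (movhlps), then sign-extend the low
// 8 bytes (pmovsxbw).
std::array<int16_t, 8> WidenHighS(const Bytes16& src) {
  Bytes16 tmp = src;
  for (int i = 0; i < 8; ++i) tmp[i] = src[8 + i];  // movhlps
  std::array<int16_t, 8> out{};
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<int16_t>(static_cast<int8_t>(tmp[i]));  // pmovsxbw
  }
  return out;
}
```

For a source byte 0xFF in the high half, the unsigned variant yields 0x00FF while the signed variant yields -1, which is the difference the two code paths exist to preserve.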
-
- 21 Dec, 2020 1 commit
-
-
Zhi An Ng authored
Couple of optimizations for v128.bitselect on both ia32 and x64.
  1. Remove an extra movaps when AVX is supported, since we have 3-operand instructions.
  2. Tweak the algorithm from: xor(and(xor(src1, src2), mask), src2) to: or(and(src1, mask), andnot(src2, mask)). It is easier to read and understand, and also eliminates a dependency chain (on kScratchDoubleReg) in the older algorithm.
  3. Use integer forms of the logical ops. Older processors have higher throughput on these, compared to the floating point ops. However, the integer forms are 1 byte longer, so on SSE, we stick to the floating point ops.
For AVX, this reduces instruction count from 9948 to 9868.
Change-Id: Idd5d26b99a76255dbfa63e2c304e6af3760c4ec6 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591859 Reviewed-by:
Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71845}
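The two bitselect formulations named in point 2 compute the same value, which a scalar sketch on 64-bit words can show. The function names are hypothetical, and `andnot(src2, mask)` is read here as `src2 & ~mask`; the operand order of the real andnps/pandn instructions is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Older formulation: xor(and(xor(src1, src2), mask), src2).
constexpr uint64_t BitselectXor(uint64_t src1, uint64_t src2, uint64_t mask) {
  return ((src1 ^ src2) & mask) ^ src2;
}

// Newer formulation: or(and(src1, mask), andnot(src2, mask)).
// Bits set in mask pick src1, cleared bits pick src2.
constexpr uint64_t BitselectAndOr(uint64_t src1, uint64_t src2, uint64_t mask) {
  return (src1 & mask) | (src2 & ~mask);
}

// Hypothetical checker: asserts that both forms agree for one input triple.
inline void CheckBitselectFormsAgree(uint64_t src1, uint64_t src2,
                                     uint64_t mask) {
  assert(BitselectXor(src1, src2, mask) == BitselectAndOr(src1, src2, mask));
}
```

In the second form the two `and` operations are independent of each other, so they do not have to wait on one another the way each step of the xor chain does.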
-
- 17 Dec, 2020 3 commits
-
-
Zhi An Ng authored
AVX has 3-operand shuffle/unpack operations. We currently always require that dst == src0 in all cases, which is not required if we have AVX. For the arch shuffles that map to a single native instruction, add support to check for AVX in the instruction-selector, to not require same as first, and in the code-gen to support generating AVX. The other arch shuffles are slightly more complicated, and can be optimized in a future change. Bug: v8:11270 Change-Id: I25b271aeff71fbe860d5bcc8abb17c36bcdab32c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591858 Reviewed-by:
Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71820}
-
Zhi An Ng authored
An improvement to the generic shuffle (https://crrev.com/c/2152853) required a temporary SIMD register to hold the mask, rather than pushing it onto a stack. The temporary register requires that we UseUniqueRegister on the inputs, to prevent aliasing, as we will write to the temp. However, we only need this for the generic shuffle. We accidentally over-constrained all other pattern matched shuffles, even though they don't use any temps. On a ~2000 line function containing ~150 shuffles (not all of which are generic shuffles), we get 16 fewer instructions in the native code, and actually see a very small improvement in the overall benchmarks. Bug: v8:11270 Change-Id: I09974f7615e4b8f5e2416ed17ca47cc7613fd6b1 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591857 Reviewed-by:
Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71818}
-
Zhi An Ng authored
We can have more optimizations for this instruction, they leave some junk in the top lanes of dst, but that doesn't matter:
  - when lane is 1: we use movshdup, this is 4 bytes long
  - when lane is 2: use movhlps, this is 3 bytes long
  - otherwise use shufps (4 bytes) or pshufd (5 bytes)
All of which are better than insertps (6 bytes).
Change-Id: I0e524431d1832e297e8c8bb418d42382d93fa691 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591850 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71813}
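The per-lane choice listed above can be modeled in scalar code to show why junk in the upper lanes is harmless: only lane 0 of the result is read. This is a sketch, not the V8 code generator. `ExtractLane` and the helper names are hypothetical, the lane 0 case (a plain move) is an assumption not stated in the message, and the fallback is modeled as pshufd with the lane index as immediate.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using F32x4 = std::array<float, 4>;

// Models `movshdup`: duplicates the odd lanes into the even lanes.
F32x4 Movshdup(const F32x4& src) { return {src[1], src[1], src[3], src[3]}; }

// Models `movhlps dst, src`: the high two lanes of src overwrite the low two
// lanes of dst; the upper lanes of dst keep whatever they held.
void Movhlps(F32x4& dst, const F32x4& src) {
  dst[0] = src[2];
  dst[1] = src[3];
}

// Models `pshufd dst, src, imm8`: every result lane is selected from src.
F32x4 Pshufd(const F32x4& src, uint8_t imm8) {
  return {src[imm8 & 3], src[(imm8 >> 2) & 3], src[(imm8 >> 4) & 3],
          src[(imm8 >> 6) & 3]};
}

// Hypothetical model of f32x4.extract_lane: the scalar result is lane 0 of
// dst, so the remaining lanes may contain anything.
float ExtractLane(const F32x4& src, int lane) {
  assert(lane >= 0 && lane < 4);
  F32x4 dst{};
  switch (lane) {
    case 0: dst = src; break;                // assumption: plain move
    case 1: dst = Movshdup(src); break;      // 4 bytes
    case 2: Movhlps(dst, src); break;        // 3 bytes
    default: dst = Pshufd(src, static_cast<uint8_t>(lane)); break;
  }
  return dst[0];
}
```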
-
- 16 Dec, 2020 4 commits
-
-
Ross McIlroy authored
This is a reland of b2a611d8
Original change's description:
> [Turboprop] Move dynamic check maps immediate args to deopt exit.
>
> Rather than loading the immediate arguments required by the
> dynamic check maps builtin into registers in the fast-path,
> instead insert them into the instruction stream in the deopt
> exit and have the builtin load them into registers itself.
>
> BUG=v8:10582
>
> Change-Id: I66716570b408501374eed8f5e6432df64c6deb7c
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589736
> Commit-Queue: Ross McIlroy <rmcilroy@chromium.org>
> Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#71790}
TBR=tebbi@chromium.org,gsathya@chromium.org Bug: v8:10582 Change-Id: Ieda0295ee135bff983c67c3f04bb47115f0a2739 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2595311 Reviewed-by:
Ross McIlroy <rmcilroy@chromium.org> Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> Cr-Commit-Position: refs/heads/master@{#71803}
-
Clemens Backes authored
This reverts commit b2a611d8.
Reason for revert: Several failures on https://ci.chromium.org/ui/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20CFI/3743/overview
Original change's description:
> [Turboprop] Move dynamic check maps immediate args to deopt exit.
>
> Rather than loading the immediate arguments required by the
> dynamic check maps builtin into registers in the fast-path,
> instead insert them into the instruction stream in the deopt
> exit and have the builtin load them into registers itself.
>
> BUG=v8:10582
>
> Change-Id: I66716570b408501374eed8f5e6432df64c6deb7c
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589736
> Commit-Queue: Ross McIlroy <rmcilroy@chromium.org>
> Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#71790}
TBR=rmcilroy@chromium.org,gsathya@chromium.org,tebbi@chromium.org Change-Id: I4c56bee156ffcea8de0aeaff9ac1bf03e03134c9 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:10582 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2595308 Reviewed-by:
Clemens Backes <clemensb@chromium.org> Commit-Queue: Clemens Backes <clemensb@chromium.org> Cr-Commit-Position: refs/heads/master@{#71793}
-
Ross McIlroy authored
Rather than loading the immediate arguments required by the dynamic check maps builtin into registers in the fast-path, instead insert them into the instruction stream in the deopt exit and have the builtin load them into registers itself. BUG=v8:10582 Change-Id: I66716570b408501374eed8f5e6432df64c6deb7c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589736 Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by:
Sathya Gunasekaran <gsathya@chromium.org> Reviewed-by:
Tobias Tebbi <tebbi@chromium.org> Cr-Commit-Position: refs/heads/master@{#71790}
-
Zhi An Ng authored
The definition of Shufps is wrong, we are incorrectly passing 0 as the immediate in all cases. No tests broke because we only used Shufps for splats, which have imm8 == 0 anyway. Also, it was using movss, which only moves a single 32-bit value. Because we were using it only for f32x4 splat, this ended up being enough (imm8 == 0 meant that we only shuffled the low 32 bits). This is fixed to use movaps, which moves the entire 128-bit register.
Also tweak the definition of Shufps to take 4 arguments. `vshufps dst, src1, src2, imm8` shuffles src1 and src2 into dst. `shufps dst, src, imm8` shuffles dst and src into dst. So `Shufps(dst, src, imm8)` is ambiguous in the AVX case, it could be:
  1. vshufps(dst, src, src, imm8), or
  2. vshufps(dst, dst, src, imm8)
Option 2 is more likely to be the intended behavior, but it introduces a false dependency on the value of dst. With `Shufps(dst, src1, src2, imm8)`, it is clearer what the behavior should be: shufps(dst, src2, imm8) matches the AVX behavior IFF dst == src1.
Change-Id: I60dc4ec868023d28d00f2b09d2c53b82a729bc4d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591849 Reviewed-by:
Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71775}
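The ambiguity described in this commit is easier to see with a scalar model of the two instructions. The sketch below is not the macro-assembler code: `Vshufps`, `ShufpsSse` and `ShufpsModel` are hypothetical names, and the SSE branch simply asserts the `dst == src1` condition stated in the message instead of emitting a move.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using F32x4 = std::array<float, 4>;

// Models `vshufps dst, src1, src2, imm8`: the low two result lanes are
// selected from src1, the high two from src2. dst is write-only.
F32x4 Vshufps(const F32x4& src1, const F32x4& src2, uint8_t imm8) {
  return {src1[imm8 & 3], src1[(imm8 >> 2) & 3], src2[(imm8 >> 4) & 3],
          src2[(imm8 >> 6) & 3]};
}

// Models SSE `shufps dst, src, imm8`: dst is both the first input and the
// output, so the result depends on the previous value of dst.
void ShufpsSse(F32x4& dst, const F32x4& src, uint8_t imm8) {
  dst = Vshufps(dst, src, imm8);
}

// Hypothetical model of the 4-argument helper. With AVX all three operands
// are independent; without AVX the 2-operand form is only equivalent when
// dst already aliases src1.
void ShufpsModel(F32x4& dst, const F32x4& src1, const F32x4& src2,
                 uint8_t imm8, bool has_avx) {
  if (has_avx) {
    dst = Vshufps(src1, src2, imm8);
  } else {
    assert(&dst == &src1);
    ShufpsSse(dst, src2, imm8);
  }
}
```

With imm8 == 0 and both sources equal, every result lane is lane 0 of the source, which is why the splat use case kept working despite the wrong immediate.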
-
- 15 Dec, 2020 1 commit
-
-
Zhi An Ng authored
Code like:
  x = wasm_v32x4_shuffle(x, x, 1, 2, 3, 0);
is currently matched by S8x16Concat, which lowers to two instructions:
  movapd xmm_dst, xmm_src
  palignr xmm_dst, xmm_src, 0x4
There is a special case after a S8x16Concat is matched:
  - is_swizzle, the inputs are the same
  - it is a 32x4 shuffle (offset % 4 == 0)
Which can have a better codegen:
  - (dst == src) shufps dst, src, 0b00111001
  - (dst != src) pshufd dst, src, 0b00111001
Add a new simd shuffle matcher which will match 32x4 rotate, and construct the appropriate indices referring to the 32x4 elements.
Note: we also pattern match on 32x4Swizzle, which correctly generates pshufd for the given example. However, this matching happens after S8x16Concat, so we get the palignr first. We could move the pattern matching cases around, but it will lead to some cases where it would have matched a S8x16Concat, but now matches a S32x4shuffle instead, leading to worse codegen.
Change-Id: Ie3aca53bbc06826be2cf49632de4c24ec73d0a9a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589062 Reviewed-by:
Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71754}
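A rough sketch of what matching a 32x4 rotate and building the immediate could look like is given below. The names `TryMatch32x4Rotate`, `PackImm8` and `Pshufd` are hypothetical, not the matcher added by this change, and the sketch assumes the swizzle has already been reduced to four lane indices in the range 0-3.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Lanes32x4 = std::array<uint8_t, 4>;

// Succeeds when lanes == [r, r+1, r+2, r+3] (mod 4) for some r in 1..3 and
// writes r. A rotate by r lanes corresponds to a palignr byte offset of 4*r.
bool TryMatch32x4Rotate(const Lanes32x4& lanes, int* rotate) {
  int r = lanes[0];
  if (r == 0 || r > 3) return false;
  for (int i = 1; i < 4; ++i) {
    if (lanes[i] != (r + i) % 4) return false;
  }
  *rotate = r;
  return true;
}

// Packs four 2-bit lane selectors into the imm8 used by pshufd/shufps.
uint8_t PackImm8(const Lanes32x4& lanes) {
  uint8_t imm8 = 0;
  for (int i = 0; i < 4; ++i) {
    assert(lanes[i] < 4);
    imm8 = static_cast<uint8_t>(imm8 | (lanes[i] << (2 * i)));
  }
  return imm8;
}

// Models `pshufd dst, src, imm8` on 32-bit lanes.
std::array<uint32_t, 4> Pshufd(const std::array<uint32_t, 4>& src,
                               uint8_t imm8) {
  std::array<uint32_t, 4> dst{};
  for (int i = 0; i < 4; ++i) dst[i] = src[(imm8 >> (2 * i)) & 3];
  return dst;
}
```

For the lanes [1, 2, 3, 0] from the example, `PackImm8` gives 0x39, which is the 0b00111001 immediate quoted in the message.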
-
- 14 Dec, 2020 2 commits
-
-
Zhi An Ng authored
pextrq + movq crosses register files twice, which is not efficient. Optimize this by:
  - checking if lane 0, do nothing if dst == src (macro-assembler helper)
  - use vmovhlps on AVX, with src as the operands to avoid false dependency on dst
  - use movhlps otherwise, this is shorter than shufpd, and faster on older systems
Change-Id: I3486d87224c048b3229c2f92359b8b8e6d5fd025 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589056 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71751}
-
Zhi An Ng authored
Change the codegen for f32x4.extract_lane from shufps to insertps when AVX is supported. They have the same performance, but shufps has a false dependency on dst (it shuffles dst and src, but we don't care about dst at all). Also for SSE, extractps + movd crosses register files, so change it to use insertps as well. Change-Id: Idf45849d37ac3499bf3371ba2fa6ae05829aa8a7 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589048 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71747}
-
- 10 Dec, 2020 3 commits
-
-
Bill Budge authored
This reverts commit cddaf66c.
Reason for revert: Multiple fuzzer failures
TBR=neis@chromium.org,ahaas@chromium.org
Original change's description:
> [compiler][wasm] Align Frame slots to value size
>
> - Adds an AlignedSlotAllocator class and tests, to unify slot
> allocation. This attempts to use alignment holes for smaller
> values.
> - Reworks Frame to use the new allocator for stack slots.
> - Reworks LinkageAllocator to use the new allocator for stack
> slots and for ARMv7 FP register aliasing.
> - Fixes the RegisterAllocator to align spill slots.
> - Fixes InstructionSelector to align spill slots.
>
> Bug: v8:9198
>
> Change-Id: Ida148db428be89ef95de748ec5fc0e7b0358f523
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2512840
> Commit-Queue: Bill Budge <bbudge@chromium.org>
> Reviewed-by: Georg Neis <neis@chromium.org>
> Reviewed-by: Andreas Haas <ahaas@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#71644}
TBR=bbudge@chromium.org,neis@chromium.org,ahaas@chromium.org
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: v8:9198 Change-Id: Ib26d016df6f30f333d30b5ac14eed9630bba8252 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2584200 Commit-Queue: Bill Budge <bbudge@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71703}
-
Zhi An Ng authored
Add new macro-assembler instructions that can handle both AVX and SSE. In the SSE case it checks that dst == src1. (This is different from what the AvxHelper does, which passes dst as the first operand to AVX instructions.) Sorted SSSE3_INSTRUCTION_LIST by instruction code. Header additions are added by clangd, we were already using something from those headers via transitive includes, adding them explicitly gets us closer to IWYU. Codegen sequences are from https://github.com/WebAssembly/simd/pull/380 and also https://github.com/WebAssembly/simd/pull/380#issuecomment-707440671. Bug: v8:11086 Change-Id: I4c04f836e471ed8b00f9ff1a1b2e6348a593d4de Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2578797 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by:
Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71688}
-
Zhi An Ng authored
Bug: v8:11008 Change-Id: Ic72e71eb10a5b47c97467bf6d25e55d20425273a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2575784 Reviewed-by:
Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71686}
-
- 07 Dec, 2020 1 commit
-
-
Bill Budge authored
  - Adds an AlignedSlotAllocator class and tests, to unify slot allocation. This attempts to use alignment holes for smaller values.
  - Reworks Frame to use the new allocator for stack slots.
  - Reworks LinkageAllocator to use the new allocator for stack slots and for ARMv7 FP register aliasing.
  - Fixes the RegisterAllocator to align spill slots.
  - Fixes InstructionSelector to align spill slots.
Bug: v8:9198 Change-Id: Ida148db428be89ef95de748ec5fc0e7b0358f523 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2512840 Commit-Queue: Bill Budge <bbudge@chromium.org> Reviewed-by:
Georg Neis <neis@chromium.org> Reviewed-by:
Andreas Haas <ahaas@chromium.org> Cr-Commit-Position: refs/heads/master@{#71644}
-
- 03 Dec, 2020 1 commit
-
-
Zhi An Ng authored
Movddup can take a memory operand, so we can save a move from gp reg to xmm reg in that case. No problem with unaligned memory since we are loading 64 bits (not 128 bits). Also drive-by comment on i32x4.splat, it uses pshufd, which can also take a memory operand (saving a mov), but we need aligned memory for that first. Bug: v8:9198 Change-Id: I55969888db1debb6ed4d193f767589d0da598386 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2567538 Reviewed-by:
Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71580}
-
- 01 Dec, 2020 1 commit
-
-
Bill Budge authored
- Uses linkage location information, to keep in sync with how LinkageAllocator and Frame work to assign stack slots. Bug: v8:9198 Change-Id: I299038e4cff706355263f00603ba32515449fefe Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2556259 Reviewed-by:
Maya Lekova <mslekova@chromium.org> Reviewed-by:
Andreas Haas <ahaas@chromium.org> Reviewed-by:
Thibaud Michaud <thibaudm@chromium.org> Commit-Queue: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71532}
-
- 17 Nov, 2020 1 commit
-
-
John Xu authored
Bug: v8:10927 Change-Id: Icbdc0d7329ddd466e7d67a954246a35795b4dece Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2507310 Commit-Queue: Ulan Degenbaev <ulan@chromium.org> Reviewed-by:
Peter Marshall <petermarshall@chromium.org> Reviewed-by:
Michael Lippautz <mlippautz@chromium.org> Reviewed-by:
Clemens Backes <clemensb@chromium.org> Reviewed-by:
Toon Verwaest <verwaest@chromium.org> Reviewed-by:
Ulan Degenbaev <ulan@chromium.org> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#71220}
-
- 12 Nov, 2020 1 commit
-
-
Pierre Langlois authored
FLAG_disable_write_barriers is a constexpr so the V8_LIKELY macro isn't necessary. Interestingly, it can also cause clang to warn that the code is unreachable, whereas without `__builtin_expect()` the compiler doesn't mind. See for example:
```
constexpr bool kNo = false;

void warns() {
  if (__builtin_expect(kNo, 0)) {
    int a = 42;
  }
}

void does_not_warn() {
  if (kNo) {
    int a = 42;
  }
}
```
Compiling V8 for arm64 with both `v8_disable_write_barriers = true` and `v8_enable_pointer_compression = false` would trigger this warning.
Bug: v8:9533 Change-Id: Id2ae156d60217007bb9ebf50628e8908e0193d05 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2534811 Reviewed-by:
Ulan Degenbaev <ulan@chromium.org> Reviewed-by:
Georg Neis <neis@chromium.org> Commit-Queue: Pierre Langlois <pierre.langlois@arm.com> Cr-Commit-Position: refs/heads/master@{#71157}
-
- 05 Nov, 2020 1 commit
-
-
Zhi An Ng authored
Integer splats (especially for sizes < 32-bits) do not directly translate to a single instruction on x64. We can do better for special values, like 0, which can be lowered to `xor dst dst`. We do this check in the instruction selector, and emit a special opcode kX64S128Zero. Also change the xor operation for kX64S128Zero from xorps to pxor. This can help reduce any potential data bypass delay (search for this in Agner's microarchitecture manual for more details). Since integer splats are likely to be followed by integer ops, we should remain in the integer domain, thus use pxor.
For i64x2.splat the codegen goes from:
  xorl rdi,rdi
  vmovq xmm0,rdi
  vmovddup xmm0,xmm0
to:
  vpxor xmm0,xmm0,xmm0
Also add a unittest to verify this optimization, and necessary raw-assembler methods for the test.
Bug: v8:11093 Change-Id: I26b092032b6e672f1d5d26e35d79578ebe591cfe Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2516299 Reviewed-by:
Tobias Tebbi <tebbi@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70977}
-
- 04 Nov, 2020 2 commits
-
-
Clemens Backes authored
This reverts commit 3c4e434f.
Reason for revert: Fails noavx tests: https://ci.chromium.org/p/v8/builders/ci/V8%20Linux64%20-%20debug/34613
Original change's description:
> [wasm-simd][x64] Optimize pmin/pmax and add horiz for AVX
>
> The AVX versions of these instructions can take 3 operands, so we don't
> need to force dst == src.
>
> Bug: v8:9561
> Change-Id: If346a05f7d599bf0d636263cafc3bc823c3b8452
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2515337
> Reviewed-by: Clemens Backes <clemensb@chromium.org>
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#70958}
TBR=clemensb@chromium.org,zhin@chromium.org Change-Id: I5fcdd2e51d418cb32a1b1e2bec7c0dff19f29154 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:9561 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2519558 Reviewed-by:
Clemens Backes <clemensb@chromium.org> Commit-Queue: Clemens Backes <clemensb@chromium.org> Cr-Commit-Position: refs/heads/master@{#70961}
-
Zhi An Ng authored
The AVX versions of these instructions can take 3 operands, so we don't need to force dst == src. Bug: v8:9561 Change-Id: If346a05f7d599bf0d636263cafc3bc823c3b8452 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2515337 Reviewed-by:
Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70958}
-
- 02 Nov, 2020 1 commit
-
-
Zhi An Ng authored
Extract Shufps to handle both AVX and SSE cases, in the SSE case it will copy src to dst if they are not the same. This allows us to use it in Liftoff as well, without the extra copy when AVX is supported. In other places, the usage of Shufps is unnecessary, since they are within a clause checking for non-AVX support, so we can simply use the shufps (non-macro-assembler). Bug: v8:9561 Change-Id: Icb043d7a43397c1b0810ece2666be567f0f5986c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2513866 Reviewed-by:
Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70911}
-
- 30 Oct, 2020 1 commit
-
-
Zhi An Ng authored
These operations can be moved into an existing macro list, since they are simple operations that generate only 1 instruction. The benefit is that they support AVX 3-operand instructions, and do not have to force dst to be equal to src. Bug: v8:9561 Change-Id: I9ec1d2496d14cb9f0fb3b4854ca39887eb5bf49b Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505240 Reviewed-by:
Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70893}
-
- 29 Oct, 2020 2 commits
-
-
Zhi An Ng authored
On AVX, many instructions can have 3 operands, unlike SSE which only has 2. So on SSE we use DefineSameAsFirst on the dst. But on AVX, using that will cause some unnecessary moves. This change moves a bunch of instructions that have single instruction codegen into a macro list which supports this non-restricted AVX codegen. Bug: v8:9561 Change-Id: I348a8396e8a1129daf2e1ed08ae8526e1bc3a73b Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505254 Reviewed-by:
Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70888}
-
Zhi An Ng authored
This is a reland of 3fb07882
Original change's description:
> [wasm-simd][ia32][x64] Only use registers for shuffles
>
> Shuffles have pattern matching clauses which, depending on the
> instruction used, can require src0 or src1 to be register or not.
> However we do not have 16-byte alignment for SIMD operands yet, so it
> will segfault when we use an SSE SIMD instruction with unaligned
> operands.
>
> This patch fixes all the shuffle cases to always use a register for the
> input nodes, and it does so by ignoring the values of src0_needs_reg and
> src1_needs_reg. When we eventually have memory alignment, we can
> re-enable this check, without mucking around too much in the logic in
> each shuffle match clause.
>
> Bug: v8:9198
> Change-Id: I264e136f017353019f19954c62c88206f7b90656
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849
> Reviewed-by: Andreas Haas <ahaas@chromium.org>
> Reviewed-by: Adam Klein <adamk@chromium.org>
> Commit-Queue: Adam Klein <adamk@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#70848}
Bug: v8:9198 Change-Id: I40c6c8f0cd8908a2d6ab7016d8ed4d4fb2ab4114 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505250 Reviewed-by:
Adam Klein <adamk@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70862}
-