- 17 Dec, 2020 3 commits
-
-
Zhi An Ng authored
AVX has 3-operand shuffle/unpack operations. We currently always require that dst == src0, which is not necessary when AVX is available. For the arch shuffles that map to a single native instruction, add support to check for AVX in the instruction selector (so dst is no longer required to be the same as the first input) and in the code generator (to emit the AVX form). The other arch shuffles are slightly more complicated and can be optimized in a future change. Bug: v8:11270 Change-Id: I25b271aeff71fbe860d5bcc8abb17c36bcdab32c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591858 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71820}
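A rough standalone illustration of the 3-operand difference (compiler intrinsics rather than V8's code generator; the function and values are made up for the example): SSE unpcklps overwrites its first operand, so a distinct destination needs a copy first, while AVX vunpcklps can write a separate destination directly.
```
// Compile with -msse2 or -mavx to compare the generated code.
#include <immintrin.h>
#include <cstdio>

__m128 interleave_low(__m128 a, __m128 b) {
  // SSE:  movaps dst, a ; unpcklps dst, b   (dst must start as a copy of a)
  // AVX:  vunpcklps dst, a, b               (dst need not equal a)
  return _mm_unpacklo_ps(a, b);
}

int main() {
  __m128 a = _mm_setr_ps(0.f, 1.f, 2.f, 3.f);
  __m128 b = _mm_setr_ps(4.f, 5.f, 6.f, 7.f);
  float out[4];
  _mm_storeu_ps(out, interleave_low(a, b));
  std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  // 0 4 1 5
  return 0;
}
```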
-
Zhi An Ng authored
A previous improvement to generic shuffles (https://crrev.com/c/2152853) required a temporary SIMD register to hold the mask, rather than pushing it onto the stack. The temporary register requires that we UseUniqueRegister on the inputs to prevent aliasing, since we write to the temp. However, we only need this for the generic shuffle; we accidentally over-constrained all other pattern-matched shuffles, which don't use any temps. On a ~2000-line function containing ~150 shuffles (not all of which are generic shuffles), we get 16 fewer instructions in the native code, and actually see a very small improvement in the overall benchmarks. Bug: v8:11270 Change-Id: I09974f7615e4b8f5e2416ed17ca47cc7613fd6b1 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591857 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71818}
-
Zhi An Ng authored
We can optimize this instruction further. The alternatives leave some junk in the top lanes of dst, but that doesn't matter:
- when the lane is 1: use movshdup (4 bytes)
- when the lane is 2: use movhlps (3 bytes)
- otherwise: use shufps (4 bytes) or pshufd (5 bytes)
All of these are shorter than insertps (6 bytes). Change-Id: I0e524431d1832e297e8c8bb418d42382d93fa691 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591850 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71813}
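A standalone sketch of those per-lane choices, assuming the instruction being optimized is a 32-bit float lane extract (the entry above does not name it); intrinsics stand in for the generated code, and the extract_lane helper is hypothetical. The junk left in the upper lanes is irrelevant because only lane 0 is read back.
```
#include <immintrin.h>
#include <cstdio>

float extract_lane(__m128 v, int lane) {
  __m128 t;
  switch (lane) {
    case 0: t = v; break;
    case 1: t = _mm_movehdup_ps(v); break;            // movshdup: lane 1 -> lane 0
    case 2: t = _mm_movehl_ps(v, v); break;           // movhlps: lane 2 -> lane 0
    default: t = _mm_shuffle_ps(v, v, 0b11); break;   // shufps: lane 3 -> lane 0
  }
  return _mm_cvtss_f32(t);
}

int main() {
  __m128 v = _mm_setr_ps(10.f, 11.f, 12.f, 13.f);
  for (int i = 0; i < 4; i++) std::printf("%g ", extract_lane(v, i));  // 10 11 12 13
  std::printf("\n");
  return 0;
}
```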
-
- 16 Dec, 2020 4 commits
-
-
Ross McIlroy authored
This is a reland of b2a611d8 Original change's description: > [Turboprop] Move dynamic check maps immediate args to deopt exit. > > Rather than loading the immediate arguments required by the > dynamic check maps builtin into registers in the fast-path, > instead insert them into the instruction stream in the deopt > exit and have the builtin load them into registers itself. > > BUG=v8:10582 > > Change-Id: I66716570b408501374eed8f5e6432df64c6deb7c > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589736 > Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> > Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Cr-Commit-Position: refs/heads/master@{#71790} TBR=tebbi@chromium.org,gsathya@chromium.org Bug: v8:10582 Change-Id: Ieda0295ee135bff983c67c3f04bb47115f0a2739 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2595311Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> Cr-Commit-Position: refs/heads/master@{#71803}
-
Clemens Backes authored
This reverts commit b2a611d8. Reason for revert: Several failures on https://ci.chromium.org/ui/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20CFI/3743/overview Original change's description: > [Turboprop] Move dynamic check maps immediate args to deopt exit. > > Rather than loading the immediate arguments required by the > dynamic check maps builtin into registers in the fast-path, > instead insert them into the instruction stream in the deopt > exit and have the builtin load them into registers itself. > > BUG=v8:10582 > > Change-Id: I66716570b408501374eed8f5e6432df64c6deb7c > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589736 > Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> > Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Cr-Commit-Position: refs/heads/master@{#71790} TBR=rmcilroy@chromium.org,gsathya@chromium.org,tebbi@chromium.org Change-Id: I4c56bee156ffcea8de0aeaff9ac1bf03e03134c9 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:10582 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2595308Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Clemens Backes <clemensb@chromium.org> Cr-Commit-Position: refs/heads/master@{#71793}
-
Ross McIlroy authored
Rather than loading the immediate arguments required by the dynamic check maps builtin into registers in the fast-path, instead insert them into the instruction stream in the deopt exit and have the builtin load them into registers itself. BUG=v8:10582 Change-Id: I66716570b408501374eed8f5e6432df64c6deb7c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589736 Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org> Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Cr-Commit-Position: refs/heads/master@{#71790}
-
Zhi An Ng authored
The definition of Shufps is wrong: we are incorrectly passing 0 as the immediate in all cases. No tests broke because we only used Shufps for splats, which use imm8 == 0 anyway. Also, it was using movss, which only moves a single 32-bit value. Because we were using it only for f32x4 splat, this ended up being enough (imm8 == 0 meant that we only shuffled the low 32 bits). This is fixed to use movaps, which moves the entire 128-bit register. Also tweak the definition of Shufps to take 4 arguments. `vshufps dst, src1, src2, imm8` shuffles src1 and src2 into dst; `shufps dst, src, imm8` shuffles dst and src into dst. So `Shufps(dst, src, imm8)` is ambiguous in the AVX case; it could be:
1. vshufps(dst, src, src, imm8), or
2. vshufps(dst, dst, src, imm8)
Option 2 is more likely to be the intended behavior, but it introduces a false dependency on the value of dst. With `Shufps(dst, src1, src2, imm8)`, it is clearer what the behavior should be: shufps(dst, src2, imm8) matches the AVX behavior iff dst == src1. Change-Id: I60dc4ec868023d28d00f2b09d2c53b82a729bc4d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591849 Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71775}
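A small self-contained intrinsics example of the shufps lane-selection rule (the values are arbitrary): the two low result lanes come from the first operand and the two high lanes from the second, which is why a two-argument Shufps is ambiguous whenever dst != src.
```
#include <immintrin.h>
#include <cstdio>

int main() {
  __m128 src1 = _mm_setr_ps(0.f, 1.f, 2.f, 3.f);
  __m128 src2 = _mm_setr_ps(4.f, 5.f, 6.f, 7.f);
  // vshufps dst, src1, src2, 0 produces [src1[0], src1[0], src2[0], src2[0]],
  // so the result depends on *both* sources, not just one of them.
  __m128 three_operand = _mm_shuffle_ps(src1, src2, 0);
  float out[4];
  _mm_storeu_ps(out, three_operand);
  std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  // 0 0 4 4
  return 0;
}
```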
-
- 15 Dec, 2020 1 commit
-
-
Zhi An Ng authored
Code like x = wasm_v32x4_shuffle(x, x, 1, 2, 3, 0); is currently matched by S8x16Concat, which lowers to two instructions:
movapd xmm_dst, xmm_src
palignr xmm_dst, xmm_src, 0x4
There is a special case after a S8x16Concat is matched:
- is_swizzle, i.e. the inputs are the same
- it is a 32x4 shuffle (offset % 4 == 0)
which can have better codegen:
- (dst == src) shufps dst, src, 0b00111001
- (dst != src) pshufd dst, src, 0b00111001
Add a new simd shuffle matcher which will match a 32x4 rotate and construct the appropriate indices referring to the 32x4 elements (pshufd for the given example). However, this matching happens after S8x16Concat, so we get the palignr first. We could move the pattern-matching cases around, but that would lead to some cases where it would have matched a S8x16Concat but now matches a S32x4shuffle instead, leading to worse codegen. Note: we also pattern match on 32x4Swizzle, which correctly generates
Change-Id: Ie3aca53bbc06826be2cf49632de4c24ec73d0a9a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589062 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71754}
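A quick standalone check (intrinsics, made-up values) that mask 0b00111001 really is the 32x4 rotate-by-one matching the shuffle indices (1, 2, 3, 0) in the example above:
```
#include <immintrin.h>
#include <cstdio>

int main() {
  __m128i x = _mm_setr_epi32(100, 101, 102, 103);
  // pshufd dst, src, 0x39: dst = [src[1], src[2], src[3], src[0]]
  __m128i rotated = _mm_shuffle_epi32(x, 0b00111001);
  int out[4];
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), rotated);
  std::printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);  // 101 102 103 100
  return 0;
}
```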
-
- 14 Dec, 2020 2 commits
-
-
Zhi An Ng authored
pextrq + movq crosses register files twice, which is not efficient. Optimize this by:
- for lane 0, doing nothing if dst == src (macro-assembler helper)
- using vmovhlps on AVX, with src as both operands to avoid a false dependency on dst
- using movhlps otherwise; this is shorter than shufpd and faster on older systems
Change-Id: I3486d87224c048b3229c2f92359b8b8e6d5fd025 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589056 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71751}
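A sketch of the lane-1 path with intrinsics, assuming the instruction is a 64x2 lane extract (the entry above does not name it); the extract_lane_1 helper is hypothetical, not V8's macro-assembler code. The high quadword is moved down with movhlps and then read from lane 0, so the register files are crossed only once.
```
#include <immintrin.h>
#include <cstdio>
#include <cstdint>

int64_t extract_lane_1(__m128i v) {
  __m128 f = _mm_castsi128_ps(v);
  __m128 hi = _mm_movehl_ps(f, f);                  // movhlps: high qword -> low qword
  return _mm_cvtsi128_si64(_mm_castps_si128(hi));   // movq gp, xmm (lane 0)
}

int main() {
  __m128i v = _mm_set_epi64x(INT64_C(0x1111222233334444),   // lane 1
                             INT64_C(0x5555666677778888));  // lane 0
  std::printf("%llx\n", static_cast<unsigned long long>(extract_lane_1(v)));
  return 0;  // prints 1111222233334444
}
```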
-
Zhi An Ng authored
Change the codegen for f32x4.extract_lane from shufps to insertps when AVX is supported. They have the same performance, but shufps has a false dependency on dst (it shuffles dst and src, but we don't care about dst at all). Also for SSE, extractps + movd crosses register files, so change it to use insertps as well. Change-Id: Idf45849d37ac3499bf3371ba2fa6ae05829aa8a7 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589048 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71747}
-
- 10 Dec, 2020 3 commits
-
-
Bill Budge authored
This reverts commit cddaf66c. Reason for revert: Multiple fuzzer failures TBR=neis@chromium.org,ahaas@chromium.org Original change's description: > [compiler][wasm] Align Frame slots to value size > > - Adds an AlignedSlotAllocator class and tests, to unify slot > allocation. This attempts to use alignment holes for smaller > values. > - Reworks Frame to use the new allocator for stack slots. > - Reworks LinkageAllocator to use the new allocator for stack > slots and for ARMv7 FP register aliasing. > - Fixes the RegisterAllocator to align spill slots. > - Fixes InstructionSelector to align spill slots. > > Bug: v8:9198 > > Change-Id: Ida148db428be89ef95de748ec5fc0e7b0358f523 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2512840 > Commit-Queue: Bill Budge <bbudge@chromium.org> > Reviewed-by: Georg Neis <neis@chromium.org> > Reviewed-by: Andreas Haas <ahaas@chromium.org> > Cr-Commit-Position: refs/heads/master@{#71644} TBR=bbudge@chromium.org,neis@chromium.org,ahaas@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: v8:9198 Change-Id: Ib26d016df6f30f333d30b5ac14eed9630bba8252 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2584200 Commit-Queue: Bill Budge <bbudge@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71703}
-
Zhi An Ng authored
Add new macro-assembler instructions that can handle both AVX and SSE. In the SSE case the helper checks that dst == src1. (This is different from what the AvxHelper does, which passes dst as the first operand to AVX instructions.) Sorted SSSE3_INSTRUCTION_LIST by instruction code. Header additions were suggested by clangd; we were already using something from those headers via transitive includes, and adding them explicitly gets us closer to IWYU. Codegen sequences are from https://github.com/WebAssembly/simd/pull/380 and also https://github.com/WebAssembly/simd/pull/380#issuecomment-707440671. Bug: v8:11086 Change-Id: I4c04f836e471ed8b00f9ff1a1b2e6348a593d4de Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2578797 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71688}
-
Zhi An Ng authored
Bug: v8:11008 Change-Id: Ic72e71eb10a5b47c97467bf6d25e55d20425273a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2575784 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71686}
-
- 07 Dec, 2020 1 commit
-
-
Bill Budge authored
- Adds an AlignedSlotAllocator class and tests, to unify slot allocation. This attempts to use alignment holes for smaller values.
- Reworks Frame to use the new allocator for stack slots.
- Reworks LinkageAllocator to use the new allocator for stack slots and for ARMv7 FP register aliasing.
- Fixes the RegisterAllocator to align spill slots.
- Fixes InstructionSelector to align spill slots.
Bug: v8:9198 Change-Id: Ida148db428be89ef95de748ec5fc0e7b0358f523 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2512840 Commit-Queue: Bill Budge <bbudge@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Reviewed-by: Andreas Haas <ahaas@chromium.org> Cr-Commit-Position: refs/heads/master@{#71644}
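A minimal sketch of the aligned-slot idea, assuming slot-granularity allocation with power-of-two widths; this is an illustration of the technique, not V8's actual AlignedSlotAllocator. Wide values are aligned to their own width, and the padding slots this creates are remembered so later narrow values can fill the holes.
```
#include <cstdio>
#include <vector>

class SlotAllocator {
 public:
  // width is in slots and must be a power of two (1, 2, 4, ...).
  int Allocate(int width) {
    // First try to reuse a recorded hole that is big enough and aligned.
    for (size_t i = 0; i < holes_.size(); ++i) {
      if (holes_[i].size >= width && holes_[i].start % width == 0) {
        int slot = holes_[i].start;
        holes_[i].start += width;
        holes_[i].size -= width;
        return slot;
      }
    }
    // Otherwise align the frontier, recording any skipped slots as a hole.
    int aligned = (next_ + width - 1) & ~(width - 1);
    if (aligned != next_) holes_.push_back({next_, aligned - next_});
    next_ = aligned + width;
    return aligned;
  }
  int Size() const { return next_; }

 private:
  struct Hole { int start; int size; };
  std::vector<Hole> holes_;
  int next_ = 0;
};

int main() {
  SlotAllocator a;
  std::printf("%d ", a.Allocate(1));   // 0
  std::printf("%d ", a.Allocate(2));   // 2 (slot 1 becomes a hole)
  std::printf("%d ", a.Allocate(1));   // 1 (hole reused)
  std::printf("%d\n", a.Allocate(2));  // 4
  return 0;
}
```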
-
- 03 Dec, 2020 1 commit
-
-
Zhi An Ng authored
Movddup can take a memory operand, so we can save a move from a gp reg to an xmm reg in that case. Unaligned memory is not a problem since we are loading 64 bits (not 128 bits). Also a drive-by comment on i32x4.splat: it uses pshufd, which can also take a memory operand (saving a mov), but we need aligned memory for that first. Bug: v8:9198 Change-Id: I55969888db1debb6ed4d193f767589d0da598386 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2567538 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71580}
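A standalone intrinsics sketch of the movddup point (the value is made up): an f64x2 splat can load its 64-bit source straight from memory, with no gp-to-xmm move, and the 64-bit load has no alignment requirement.
```
#include <immintrin.h>
#include <cstdio>

int main() {
  double d = 3.5;
  __m128d splat = _mm_loaddup_pd(&d);  // movddup xmm, qword ptr [mem]
  double out[2];
  _mm_storeu_pd(out, splat);
  std::printf("%g %g\n", out[0], out[1]);  // 3.5 3.5
  return 0;
}
```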
-
- 01 Dec, 2020 1 commit
-
-
Bill Budge authored
- Uses linkage location information to keep in sync with how LinkageAllocator and Frame work to assign stack slots. Bug: v8:9198 Change-Id: I299038e4cff706355263f00603ba32515449fefe Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2556259 Reviewed-by: Maya Lekova <mslekova@chromium.org> Reviewed-by: Andreas Haas <ahaas@chromium.org> Reviewed-by: Thibaud Michaud <thibaudm@chromium.org> Commit-Queue: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#71532}
-
- 17 Nov, 2020 1 commit
-
-
John Xu authored
Bug: v8:10927 Change-Id: Icbdc0d7329ddd466e7d67a954246a35795b4dece Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2507310 Commit-Queue: Ulan Degenbaev <ulan@chromium.org> Reviewed-by: Peter Marshall <petermarshall@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Clemens Backes <clemensb@chromium.org> Reviewed-by: Toon Verwaest <verwaest@chromium.org> Reviewed-by: Ulan Degenbaev <ulan@chromium.org> Reviewed-by: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#71220}
-
- 12 Nov, 2020 1 commit
-
-
Pierre Langlois authored
FLAG_disable_write_barriers is a constexpr so the V8_LIKELY macro isn't necessary. Interestingly, it can also cause clang to warn that the code is unreachable, whereas without `__builtin_expect()` the compiler doesn't mind. See for example:
```
constexpr bool kNo = false;

void warns() {
  if (__builtin_expect(kNo, 0)) {
    int a = 42;
  }
}

void does_not_warn() {
  if (kNo) {
    int a = 42;
  }
}
```
Compiling V8 for arm64 with both `v8_disable_write_barriers = true` and `v8_enable_pointer_compression = false` would trigger this warning. Bug: v8:9533 Change-Id: Id2ae156d60217007bb9ebf50628e8908e0193d05 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2534811 Reviewed-by: Ulan Degenbaev <ulan@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Commit-Queue: Pierre Langlois <pierre.langlois@arm.com> Cr-Commit-Position: refs/heads/master@{#71157}
-
- 05 Nov, 2020 1 commit
-
-
Zhi An Ng authored
Integer splats (especially for sizes < 32 bits) do not directly translate to a single instruction on x64. We can do better for special values, like 0, which can be lowered to `xor dst, dst`. We do this check in the instruction selector and emit a special opcode, kX64S128Zero. Also change the xor operation for kX64S128Zero from xorps to pxor. This can help reduce any potential data bypass delay (see Agner Fog's microarchitecture manual for details). Since integer splats are likely to be followed by integer ops, we should remain in the integer domain, thus use pxor. For i64x2.splat the codegen goes from:
xorl rdi,rdi
vmovq xmm0,rdi
vmovddup xmm0,xmm0
to:
vpxor xmm0,xmm0,xmm0
Also add a unittest to verify this optimization, and the necessary raw-assembler methods for the test. Bug: v8:11093 Change-Id: I26b092032b6e672f1d5d26e35d79578ebe591cfe Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2516299 Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70977}
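A standalone sketch of the zero case (intrinsics, not V8 codegen): a constant-zero vector needs no gp register or splat sequence at all, just an xor of the destination with itself; _mm_setzero_si128 typically compiles to pxor (vpxor under AVX), which keeps the value in the integer domain.
```
#include <immintrin.h>
#include <cstdio>
#include <cstdint>

int main() {
  __m128i zero = _mm_setzero_si128();  // pxor xmm, xmm / vpxor xmm, xmm, xmm
  int64_t out[2];
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), zero);
  std::printf("%lld %lld\n", static_cast<long long>(out[0]),
              static_cast<long long>(out[1]));  // 0 0
  return 0;
}
```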
-
- 04 Nov, 2020 2 commits
-
-
Clemens Backes authored
This reverts commit 3c4e434f. Reason for revert: Fails noavx tests: https://ci.chromium.org/p/v8/builders/ci/V8%20Linux64%20-%20debug/34613 Original change's description: > [wasm-simd][x64] Optimize pmin/pmax and add horiz for AVX > > The AVX versions of these instructions can take 3 operands, so we don't > need to force dst == src. > > Bug: v8:9561 > Change-Id: If346a05f7d599bf0d636263cafc3bc823c3b8452 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2515337 > Reviewed-by: Clemens Backes <clemensb@chromium.org> > Commit-Queue: Zhi An Ng <zhin@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70958} TBR=clemensb@chromium.org,zhin@chromium.org Change-Id: I5fcdd2e51d418cb32a1b1e2bec7c0dff19f29154 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:9561 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2519558Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Clemens Backes <clemensb@chromium.org> Cr-Commit-Position: refs/heads/master@{#70961}
-
Zhi An Ng authored
The AVX versions of these instructions can take 3 operands, so we don't need to force dst == src. Bug: v8:9561 Change-Id: If346a05f7d599bf0d636263cafc3bc823c3b8452 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2515337 Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70958}
-
- 02 Nov, 2020 1 commit
-
-
Zhi An Ng authored
Extract Shufps to handle both AVX and SSE cases; in the SSE case it will copy src to dst if they are not the same. This allows us to use it in Liftoff as well, without the extra copy when AVX is supported. In other places the use of Shufps is unnecessary, since those call sites are within a clause checking for non-AVX support, so we can simply use shufps (the non-macro-assembler version). Bug: v8:9561 Change-Id: Icb043d7a43397c1b0810ece2666be567f0f5986c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2513866 Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70911}
-
- 30 Oct, 2020 1 commit
-
-
Zhi An Ng authored
These operations can be moved into an existing macro list, since they are simple operations that generate only one instruction. The benefit is that they gain support for the 3-operand AVX forms and do not have to force dst to equal src. Bug: v8:9561 Change-Id: I9ec1d2496d14cb9f0fb3b4854ca39887eb5bf49b Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505240 Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70893}
-
- 29 Oct, 2020 2 commits
-
-
Zhi An Ng authored
On AVX, many instructions can have 3 operands, unlike SSE, which only allows 2. So on SSE we use DefineSameAsFirst on the dst, but on AVX, using that causes some unnecessary moves. This change moves a bunch of instructions whose codegen is a single instruction into a macro list that supports this non-restricted AVX codegen. Bug: v8:9561 Change-Id: I348a8396e8a1129daf2e1ed08ae8526e1bc3a73b Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505254 Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70888}
-
Zhi An Ng authored
This is a reland of 3fb07882 Original change's description: > [wasm-simd][ia32][x64] Only use registers for shuffles > > Shuffles have pattern matching clauses which, depending on the > instruction used, can require src0 or src1 to be register or not. > However we do not have 16-byte alignment for SIMD operands yet, so it > will segfault when we use an SSE SIMD instruction with unaligned > operands. > > This patch fixes all the shuffle cases to always use a register for the > input nodes, and it does so by ignoring the values of src0_needs_reg and > src1_needs_reg. When we eventually have memory alignment, we can > re-enable this check, without mucking around too much in the logic in > each shuffle match clause. > > Bug: v8:9198 > Change-Id: I264e136f017353019f19954c62c88206f7b90656 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849 > Reviewed-by: Andreas Haas <ahaas@chromium.org> > Reviewed-by: Adam Klein <adamk@chromium.org> > Commit-Queue: Adam Klein <adamk@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70848} Bug: v8:9198 Change-Id: I40c6c8f0cd8908a2d6ab7016d8ed4d4fb2ab4114 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505250Reviewed-by: Adam Klein <adamk@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70862}
-
- 28 Oct, 2020 4 commits
-
-
Francis McCabe authored
This reverts commit 3fb07882. Reason for revert: failing noavx tests: https://ci.chromium.org/p/v8/builders/ci/V8%20Linux/39390? Original change's description: > [wasm-simd][ia32][x64] Only use registers for shuffles > > Shuffles have pattern matching clauses which, depending on the > instruction used, can require src0 or src1 to be register or not. > However we do not have 16-byte alignment for SIMD operands yet, so it > will segfault when we use an SSE SIMD instruction with unaligned > operands. > > This patch fixes all the shuffle cases to always use a register for the > input nodes, and it does so by ignoring the values of src0_needs_reg and > src1_needs_reg. When we eventually have memory alignment, we can > re-enable this check, without mucking around too much in the logic in > each shuffle match clause. > > Bug: v8:9198 > Change-Id: I264e136f017353019f19954c62c88206f7b90656 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849 > Reviewed-by: Andreas Haas <ahaas@chromium.org> > Reviewed-by: Adam Klein <adamk@chromium.org> > Commit-Queue: Adam Klein <adamk@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70848} TBR=adamk@chromium.org,ahaas@chromium.org,zhin@chromium.org Change-Id: Icc7cc1ceb7ca5aa5d859239330743dde2e5f213c No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:9198 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505719Reviewed-by: Francis McCabe <fgm@chromium.org> Commit-Queue: Francis McCabe <fgm@chromium.org> Cr-Commit-Position: refs/heads/master@{#70852}
-
Shu-yu Guo authored
Change-Id: I4ab54dac771bb551c2435a98f9e53194a6f27853 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2495494 Commit-Queue: Shu-yu Guo <syg@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Cr-Commit-Position: refs/heads/master@{#70851}
-
Zhi An Ng authored
Shuffles have pattern matching clauses which, depending on the instruction used, can require src0 or src1 to be register or not. However we do not have 16-byte alignment for SIMD operands yet, so it will segfault when we use an SSE SIMD instruction with unaligned operands. This patch fixes all the shuffle cases to always use a register for the input nodes, and it does so by ignoring the values of src0_needs_reg and src1_needs_reg. When we eventually have memory alignment, we can re-enable this check, without mucking around too much in the logic in each shuffle match clause. Bug: v8:9198 Change-Id: I264e136f017353019f19954c62c88206f7b90656 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849Reviewed-by: Andreas Haas <ahaas@chromium.org> Reviewed-by: Adam Klein <adamk@chromium.org> Commit-Queue: Adam Klein <adamk@chromium.org> Cr-Commit-Position: refs/heads/master@{#70848}
-
Zhi An Ng authored
Prototype i8x16, i16x8, i32x4, i64x2 sign select on x64 and interpreter. Bug: v8:10983 Change-Id: I7d6f39a2cb4c2aefe31daac782978fe8b363dd1a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486235 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#70818}
-
- 27 Oct, 2020 1 commit
-
-
Ng Zhi An authored
i8x16.extract_lane_u is pextrb and i16x8.extract_lane_u is pextrw; we can merge them instead of having separate opcodes. R=bbudge@chromium.org Bug: v8:10975 Change-Id: I7793a795905157b6094b1470d3437988c982af91 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481834 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70771}
-
- 22 Oct, 2020 1 commit
-
-
Georg Neis authored
This reverts half of commit 8f0ab471. Reason for revert: some performance regressions, possibly due to 'leave' needing MSROM on some microarchitectures. The half that is not reverted is the removal of 'enter'. Original change's description: > [ia32,x64] Make more use of the 'leave' instruction > > It is a little shorter and cheaper[1] than the equivalent > "mov sp,bp; pop bp". > > Also remove support for the 'enter' instruction, since > - it is unused, > - it is neither shorter nor cheaper than the corresponding > push and mov (in fact more expensive[1]), and > - our disassembler doesn't support it. > > [1] See https://www.agner.org/optimize/instruction_tables.pdf > > Change-Id: I6c99c2f3e53081aea55445a54e18eaf45baa79c2 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2482822 > Commit-Queue: Georg Neis <neis@chromium.org> > Reviewed-by: Victor Gomes <victorgomes@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70660} TBR=neis@chromium.org,victorgomes@chromium.org Bug: chromium:1141069 # Not skipping CQ checks because original CL landed > 1 day ago. Change-Id: I5c9ad64ee06b71c93eff256044ce49d1523737fb Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2492327 Commit-Queue: Georg Neis <neis@chromium.org> Reviewed-by: Victor Gomes <victorgomes@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Cr-Commit-Position: refs/heads/master@{#70718}
-
- 21 Oct, 2020 1 commit
-
-
Jakob Gruber authored
This is a reland of fbfa9bf4 The arm64 was missing proper codegen for CFI, thus sizes were off. Original change's description: > Reland "[deoptimizer] Change deopt entries into builtins" > > This is a reland of 7f58ced7 > > It fixes the different exit size emitted on x64/Atom CPUs due to > performance tuning in TurboAssembler::Call. Additionally, add > cctests to verify the fixed size exits. > > Original change's description: > > [deoptimizer] Change deopt entries into builtins > > > > While the overall goal of this commit is to change deoptimization > > entries into builtins, there are multiple related things happening: > > > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > > at runtime, guaranteed to be immovable), have been converted into > > builtins. The major restriction is that we now need to preserve the > > kRootRegister, which was formerly used on most architectures to pass > > the deoptimization id. The solution differs based on platform. > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > > - Removed heap/ support for immovable Code generation. > > - Removed the DeserializerData class (no longer needed). > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > > in which the final jump to the deoptimization entry is generated > > once per Code object, and deopt exits can continue to emit a > > near-call. > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > > sizes by 4/8, 5, and 5 bytes, respectively. > > > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > > by using the same strategy as on arm64 (recalc deopt id from return > > address). Before: > > > > e300a002 movw r10, <id> > > e59fc024 ldr ip, [pc, <entry offset>] > > e12fff3c blx ip > > > > After: > > > > e59acb35 ldr ip, [r10, <entry offset>] > > e12fff3c blx ip > > > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > > object (max 32 bytes added overhead per Code object). Before: > > > > 9401cdae bl <entry offset> > > > > After: > > > > # eager deoptimization entry jump. > > f95b1f50 ldr x16, [x26, <eager entry offset>] > > d61f0200 br x16 > > # lazy deoptimization entry jump. > > f95b2b50 ldr x16, [x26, <lazy entry offset>] > > d61f0200 br x16 > > # the deopt exit. > > 97fffffc bl <eager deoptimization entry jump offset> > > > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > > > bb00000000 mov ebx,<id> > > e825f5372b call <entry> > > > > After: > > > > e8ea2256ba call <entry> > > > > On x64 the deopt exit size is reduced from 12 to 7 bytes. 
Before: > > > > 49c7c511000000 REX.W movq r13,<id> > > e8ea2f0700 call <entry> > > > > After: > > > > 41ff9560360000 call [r13+<entry offset>] > > > > Bug: v8:8661,v8:8768 > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > > Cr-Commit-Position: refs/heads/master@{#70597} > > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org > Bug: v8:8661,v8:8768,chromium:1140165 > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506 > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70655} Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org Bug: v8:8661 Bug: v8:8768 Bug: chromium:1140165 Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682Reviewed-by: Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70672}
-
- 20 Oct, 2020 4 commits
-
-
Georg Neis authored
It is a little shorter and cheaper[1] than the equivalent "mov sp,bp; pop bp". Also remove support for the 'enter' instruction, since - it is unused, - it is neither shorter nor cheaper than the corresponding push and mov (in fact more expensive[1]), and - our disassembler doesn't support it. [1] See https://www.agner.org/optimize/instruction_tables.pdf Change-Id: I6c99c2f3e53081aea55445a54e18eaf45baa79c2 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2482822 Commit-Queue: Georg Neis <neis@chromium.org> Reviewed-by: Victor Gomes <victorgomes@chromium.org> Cr-Commit-Position: refs/heads/master@{#70660}
-
Maya Lekova authored
This reverts commit fbfa9bf4. Reason for revert: Seems to break arm64 sim CFI build (please see DeoptExitSizeIfFixed) - https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20CFI/2808 Original change's description: > Reland "[deoptimizer] Change deopt entries into builtins" > > This is a reland of 7f58ced7 > > It fixes the different exit size emitted on x64/Atom CPUs due to > performance tuning in TurboAssembler::Call. Additionally, add > cctests to verify the fixed size exits. > > Original change's description: > > [deoptimizer] Change deopt entries into builtins > > > > While the overall goal of this commit is to change deoptimization > > entries into builtins, there are multiple related things happening: > > > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > > at runtime, guaranteed to be immovable), have been converted into > > builtins. The major restriction is that we now need to preserve the > > kRootRegister, which was formerly used on most architectures to pass > > the deoptimization id. The solution differs based on platform. > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > > - Removed heap/ support for immovable Code generation. > > - Removed the DeserializerData class (no longer needed). > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > > in which the final jump to the deoptimization entry is generated > > once per Code object, and deopt exits can continue to emit a > > near-call. > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > > sizes by 4/8, 5, and 5 bytes, respectively. > > > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > > by using the same strategy as on arm64 (recalc deopt id from return > > address). Before: > > > > e300a002 movw r10, <id> > > e59fc024 ldr ip, [pc, <entry offset>] > > e12fff3c blx ip > > > > After: > > > > e59acb35 ldr ip, [r10, <entry offset>] > > e12fff3c blx ip > > > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > > object (max 32 bytes added overhead per Code object). Before: > > > > 9401cdae bl <entry offset> > > > > After: > > > > # eager deoptimization entry jump. > > f95b1f50 ldr x16, [x26, <eager entry offset>] > > d61f0200 br x16 > > # lazy deoptimization entry jump. > > f95b2b50 ldr x16, [x26, <lazy entry offset>] > > d61f0200 br x16 > > # the deopt exit. > > 97fffffc bl <eager deoptimization entry jump offset> > > > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > > > bb00000000 mov ebx,<id> > > e825f5372b call <entry> > > > > After: > > > > e8ea2256ba call <entry> > > > > On x64 the deopt exit size is reduced from 12 to 7 bytes. 
Before: > > > > 49c7c511000000 REX.W movq r13,<id> > > e8ea2f0700 call <entry> > > > > After: > > > > 41ff9560360000 call [r13+<entry offset>] > > > > Bug: v8:8661,v8:8768 > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > > Cr-Commit-Position: refs/heads/master@{#70597} > > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org > Bug: v8:8661,v8:8768,chromium:1140165 > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506 > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70655} TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org Change-Id: I4739a3475bfd8ee0cfbe4b9a20382f91a6ef1bf0 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:8661 Bug: v8:8768 Bug: chromium:1140165 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485223Reviewed-by: Maya Lekova <mslekova@chromium.org> Commit-Queue: Maya Lekova <mslekova@chromium.org> Cr-Commit-Position: refs/heads/master@{#70658}
-
Jakob Gruber authored
This is a reland of 7f58ced7 It fixes the different exit size emitted on x64/Atom CPUs due to performance tuning in TurboAssembler::Call. Additionally, add cctests to verify the fixed size exits. Original change's description: > [deoptimizer] Change deopt entries into builtins > > While the overall goal of this commit is to change deoptimization > entries into builtins, there are multiple related things happening: > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > at runtime, guaranteed to be immovable), have been converted into > builtins. The major restriction is that we now need to preserve the > kRootRegister, which was formerly used on most architectures to pass > the deoptimization id. The solution differs based on platform. > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > - Removed heap/ support for immovable Code generation. > - Removed the DeserializerData class (no longer needed). > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > in which the final jump to the deoptimization entry is generated > once per Code object, and deopt exits can continue to emit a > near-call. > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > sizes by 4/8, 5, and 5 bytes, respectively. > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > by using the same strategy as on arm64 (recalc deopt id from return > address). Before: > > e300a002 movw r10, <id> > e59fc024 ldr ip, [pc, <entry offset>] > e12fff3c blx ip > > After: > > e59acb35 ldr ip, [r10, <entry offset>] > e12fff3c blx ip > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > object (max 32 bytes added overhead per Code object). Before: > > 9401cdae bl <entry offset> > > After: > > # eager deoptimization entry jump. > f95b1f50 ldr x16, [x26, <eager entry offset>] > d61f0200 br x16 > # lazy deoptimization entry jump. > f95b2b50 ldr x16, [x26, <lazy entry offset>] > d61f0200 br x16 > # the deopt exit. > 97fffffc bl <eager deoptimization entry jump offset> > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > bb00000000 mov ebx,<id> > e825f5372b call <entry> > > After: > > e8ea2256ba call <entry> > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > 49c7c511000000 REX.W movq r13,<id> > e8ea2f0700 call <entry> > > After: > > 41ff9560360000 call [r13+<entry offset>] > > Bug: v8:8661,v8:8768 > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70597} Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org Bug: v8:8661,v8:8768,chromium:1140165 Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506Reviewed-by: Jakob Gruber <jgruber@chromium.org> Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70655}
-
Jakob Gruber authored
This reverts commit 7f58ced7. Reason for revert: Segfaults on Atom_x64 https://ci.chromium.org/p/v8-internal/builders/ci/v8_linux64_atom_perf/5686? Original change's description: > [deoptimizer] Change deopt entries into builtins > > While the overall goal of this commit is to change deoptimization > entries into builtins, there are multiple related things happening: > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > at runtime, guaranteed to be immovable), have been converted into > builtins. The major restriction is that we now need to preserve the > kRootRegister, which was formerly used on most architectures to pass > the deoptimization id. The solution differs based on platform. > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > - Removed heap/ support for immovable Code generation. > - Removed the DeserializerData class (no longer needed). > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > in which the final jump to the deoptimization entry is generated > once per Code object, and deopt exits can continue to emit a > near-call. > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > sizes by 4/8, 5, and 5 bytes, respectively. > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > by using the same strategy as on arm64 (recalc deopt id from return > address). Before: > > e300a002 movw r10, <id> > e59fc024 ldr ip, [pc, <entry offset>] > e12fff3c blx ip > > After: > > e59acb35 ldr ip, [r10, <entry offset>] > e12fff3c blx ip > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > object (max 32 bytes added overhead per Code object). Before: > > 9401cdae bl <entry offset> > > After: > > # eager deoptimization entry jump. > f95b1f50 ldr x16, [x26, <eager entry offset>] > d61f0200 br x16 > # lazy deoptimization entry jump. > f95b2b50 ldr x16, [x26, <lazy entry offset>] > d61f0200 br x16 > # the deopt exit. > 97fffffc bl <eager deoptimization entry jump offset> > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > bb00000000 mov ebx,<id> > e825f5372b call <entry> > > After: > > e8ea2256ba call <entry> > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > 49c7c511000000 REX.W movq r13,<id> > e8ea2f0700 call <entry> > > After: > > 41ff9560360000 call [r13+<entry offset>] > > Bug: v8:8661,v8:8768 > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70597} TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: v8:8661,v8:8768,chromium:1140165 Change-Id: I3df02ab42f6e02233d9f6fb80e8bb18f76870d91 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485504Reviewed-by: Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70649}
-
- 19 Oct, 2020 4 commits
-
-
Ng Zhi An authored
All these opcodes have a simple lowering into a single x64 instruction. When AVX is supported we can perform a similar optimization and not force dst == src1. Bug: v8:10116 Change-Id: I4ad2975b6f241d8209025682202b476c08b3491b Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486383 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70636}
-
Ng Zhi An authored
We don't need separate Load32Zero and Load64Zero instructions, since the implementations are movss and movsd, which we already have. Bug: v8:10713 Change-Id: I5d02e946f3bf9fe08f943a811f2d3cc8aec81ea8 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486233 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70635}
-
Ng Zhi An authored
Not sure why I originally chose to name it LoadMem32Zero instead of Load32Zero as in the proposal. This fixes it. Bug: v8:10713 Change-Id: If05603f743213bc6b7aea0ce22c80ae4b3023ccf Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481824 Reviewed-by: Bill Budge <bbudge@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70630}
-
Ng Zhi An authored
For splats, we can make use of vshufps to avoid a movss. Without AVX, specify dst to be the same as src in the instruction selector. For extract lane, we can use vshufps to extract a float into the dst xmm and leave junk in the higher bits. On the meshopt_decoder.js benchmark in the linked bug, this removes about 7 movss instructions that did nothing. Hardware can do register renaming, but let's not rely on that :) R=bbudge@chromium.org Bug: v8:10116 Change-Id: I4d68c10536a79659de673060d537d58113308477 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481473 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#70628}
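A standalone intrinsics sketch of the splat (the values are made up): shufps with mask 0 broadcasts lane 0, and the AVX form vshufps can write any destination register without a preceding movss or movaps.
```
#include <immintrin.h>
#include <cstdio>

__m128 splat_lane0(__m128 v) {
  // SSE: shufps dst, v, 0 (dst must already hold v)
  // AVX: vshufps dst, v, v, 0 (any dst)
  return _mm_shuffle_ps(v, v, 0);
}

int main() {
  __m128 v = _mm_setr_ps(7.f, 1.f, 2.f, 3.f);
  float out[4];
  _mm_storeu_ps(out, splat_lane0(v));
  std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  // 7 7 7 7
  return 0;
}
```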
-