- 17 Nov, 2020 1 commit
-
-
John Xu authored
Bug: v8:10927 Change-Id: Icbdc0d7329ddd466e7d67a954246a35795b4dece Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2507310 Commit-Queue: Ulan Degenbaev <ulan@chromium.org> Reviewed-by: Peter Marshall <petermarshall@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Clemens Backes <clemensb@chromium.org> Reviewed-by: Toon Verwaest <verwaest@chromium.org> Reviewed-by: Ulan Degenbaev <ulan@chromium.org> Reviewed-by: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#71220}
-
- 12 Nov, 2020 1 commit
-
-
Pierre Langlois authored
FLAG_disable_write_barriers is a constexpr so the V8_LIKELY macro isn't necessary. Interestingly, it can also cause clang to warn that the code is unreachable, whereas without `__builtin_expect()` the compiler doesn't mind. See for example:

```
constexpr bool kNo = false;

void warns() {
  if (__builtin_expect(kNo, 0)) {
    int a = 42;
  }
}

void does_not_warn() {
  if (kNo) {
    int a = 42;
  }
}
```

Compiling V8 for arm64 with both `v8_disable_write_barriers = true` and `v8_enable_pointer_compression = false` would trigger this warning.

Bug: v8:9533 Change-Id: Id2ae156d60217007bb9ebf50628e8908e0193d05 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2534811 Reviewed-by: Ulan Degenbaev <ulan@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Commit-Queue: Pierre Langlois <pierre.langlois@arm.com> Cr-Commit-Position: refs/heads/master@{#71157}
-
- 05 Nov, 2020 1 commit
-
-
Zhi An Ng authored
Integer splats (especially for sizes < 32 bits) do not directly translate to a single instruction on x64. We can do better for special values, like 0, which can be lowered to `xor dst dst`. We do this check in the instruction selector, and emit a special opcode kX64S128Zero.

Also change the xor operation for kX64S128Zero from xorps to pxor. This can help reduce any potential data bypass delay (search for this in Agner Fog's microarchitecture manual for more details). Since integer splats are likely to be followed by integer ops, we should remain in the integer domain, thus use pxor.

For i64x2.splat the codegen goes from:

    xorl rdi,rdi
    vmovq xmm0,rdi
    vmovddup xmm0,xmm0

to:

    vpxor xmm0,xmm0,xmm0

Also add a unittest to verify this optimization, and necessary raw-assembler methods for the test.

Bug: v8:11093 Change-Id: I26b092032b6e672f1d5d26e35d79578ebe591cfe Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2516299 Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70977}
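The selection step described in this commit can be sketched roughly as follows. This is a toy model and not V8's instruction selector: `SplatInput`, `Opcode`, and `SelectI64x2Splat` are hypothetical names introduced for illustration. Only the idea (emit a dedicated zero opcode when the splat input is the constant 0) comes from the commit message.

```cpp
#include <cstdint>

// Hypothetical stand-ins for compiler IR; not V8's real types.
enum class Opcode { kS128Zero, kI64x2Splat };

struct SplatInput {
  bool is_constant;  // true when the splatted value is known at compile time
  int64_t value;     // only meaningful when is_constant is true
};

// Pick the cheaper opcode when the splat input is the constant 0.
// A zero vector can then be produced with a single `pxor dst, dst`
// instead of moving a scalar into an xmm register and broadcasting it.
inline Opcode SelectI64x2Splat(const SplatInput& input) {
  if (input.is_constant && input.value == 0) {
    return Opcode::kS128Zero;
  }
  return Opcode::kI64x2Splat;
}
```

Doing the check at selection time keeps the code generator simple: it only needs one extra case that emits the xor.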
-
- 04 Nov, 2020 2 commits
-
-
Clemens Backes authored
This reverts commit 3c4e434f. Reason for revert: Fails noavx tests: https://ci.chromium.org/p/v8/builders/ci/V8%20Linux64%20-%20debug/34613 Original change's description: > [wasm-simd][x64] Optimize pmin/pmax and add horiz for AVX > > The AVX versions of these instructions can take 3 operands, so we don't > need to force dst == src. > > Bug: v8:9561 > Change-Id: If346a05f7d599bf0d636263cafc3bc823c3b8452 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2515337 > Reviewed-by: Clemens Backes <clemensb@chromium.org> > Commit-Queue: Zhi An Ng <zhin@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70958} TBR=clemensb@chromium.org,zhin@chromium.org Change-Id: I5fcdd2e51d418cb32a1b1e2bec7c0dff19f29154 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:9561 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2519558 Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Clemens Backes <clemensb@chromium.org> Cr-Commit-Position: refs/heads/master@{#70961}
-
Zhi An Ng authored
The AVX versions of these instructions can take 3 operands, so we don't need to force dst == src. Bug: v8:9561 Change-Id: If346a05f7d599bf0d636263cafc3bc823c3b8452 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2515337 Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70958}
-
- 02 Nov, 2020 1 commit
-
-
Zhi An Ng authored
Extract Shufps to handle both AVX and SSE cases; in the SSE case it will copy src to dst if they are not the same. This allows us to use it in Liftoff as well, without the extra copy when AVX is supported. In other places, the usage of Shufps is unnecessary, since they are within a clause checking for non-AVX support, so we can simply use the plain shufps (non-macro-assembler) instruction. Bug: v8:9561 Change-Id: Icb043d7a43397c1b0810ece2666be567f0f5986c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2513866 Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70911}
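A minimal sketch of such a helper, assuming a toy assembler that records mnemonics instead of emitting machine code. `ToyAssembler` and the exact `Shufps(dst, src, imm8)` signature are hypothetical, and the choice of `movaps` for the SSE copy is an assumption; the commit message only says that src is copied to dst when they differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy assembler that records mnemonics instead of emitting machine code.
struct ToyAssembler {
  bool has_avx;
  std::vector<std::string> emitted;

  // Macro-style helper: every result lane is selected from `src` by imm8.
  void Shufps(int dst, int src, int imm8) {
    assert(imm8 >= 0 && imm8 <= 0xFF);
    const std::string d = "xmm" + std::to_string(dst);
    const std::string s = "xmm" + std::to_string(src);
    const std::string i = std::to_string(imm8);
    if (has_avx) {
      // Three-operand form: dst can differ from both sources, no copy needed.
      emitted.push_back("vshufps " + d + "," + s + "," + s + "," + i);
      return;
    }
    // Two-operand form reads dst as its first source, so copy src in first.
    if (dst != src) emitted.push_back("movaps " + d + "," + s);
    emitted.push_back("shufps " + d + "," + s + "," + i);
  }
};
```

With AVX the helper always emits one instruction; without AVX it emits two only when dst and src differ.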
-
- 30 Oct, 2020 1 commit
-
-
Zhi An Ng authored
These operations can be moved into an existing macro list, since they are simple operations that generate only 1 instruction. The benefit is that they get support for the AVX 3-operand instruction forms, and do not have to force dst to be equal to src. Bug: v8:9561 Change-Id: I9ec1d2496d14cb9f0fb3b4854ca39887eb5bf49b Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505240 Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70893}
-
- 29 Oct, 2020 2 commits
-
-
Zhi An Ng authored
On AVX, many instructions can have 3 operands, unlike SSE which only has 2. So on SSE we use DefineSameAsFirst on the dst. But on AVX, using that will cause some unnecessary moves. This change moves a bunch of instructions that have single instruction codegen into a macro list which supports this non-restricted AVX codegen. Bug: v8:9561 Change-Id: I348a8396e8a1129daf2e1ed08ae8526e1bc3a73b Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505254 Reviewed-by: Clemens Backes <clemensb@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70888}
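To illustrate why the output constraint matters, here is a small model of lowering `dst = lhs OP rhs` when `lhs` must stay intact. `OutputPolicy`, `PolicyFor`, and `Lower` are hypothetical names, and `movaps` stands in for whatever register move the real code generator would pick; the sketch only mirrors the two-operand versus three-operand distinction from the commit message.

```cpp
#include <string>
#include <vector>

// Hypothetical output-register policies for a binary SIMD operation.
enum class OutputPolicy { kSameAsFirst, kAnyRegister };

// SSE's two-operand encodings overwrite the first input, so the output must
// alias it. AVX's three-operand encodings lift that restriction.
inline OutputPolicy PolicyFor(bool has_avx) {
  return has_avx ? OutputPolicy::kAnyRegister : OutputPolicy::kSameAsFirst;
}

// Instructions needed for `dst = lhs OP rhs` when `lhs` must be preserved.
inline std::vector<std::string> Lower(bool has_avx, const std::string& op) {
  if (PolicyFor(has_avx) == OutputPolicy::kAnyRegister) {
    return {"v" + op + " dst, lhs, rhs"};
  }
  // The extra move is what the same-as-first constraint costs.
  return {"movaps dst, lhs", op + " dst, rhs"};
}
```

For example, `Lower(true, "paddd")` yields a single `vpaddd`, while `Lower(false, "paddd")` yields a move followed by `paddd`.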
-
Zhi An Ng authored
This is a reland of 3fb07882 Original change's description: > [wasm-simd][ia32][x64] Only use registers for shuffles > > Shuffles have pattern matching clauses which, depending on the > instruction used, can require src0 or src1 to be register or not. > However we do not have 16-byte alignment for SIMD operands yet, so it > will segfault when we use an SSE SIMD instruction with unaligned > operands. > > This patch fixes all the shuffle cases to always use a register for the > input nodes, and it does so by ignoring the values of src0_needs_reg and > src1_needs_reg. When we eventually have memory alignment, we can > re-enable this check, without mucking around too much in the logic in > each shuffle match clause. > > Bug: v8:9198 > Change-Id: I264e136f017353019f19954c62c88206f7b90656 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849 > Reviewed-by: Andreas Haas <ahaas@chromium.org> > Reviewed-by: Adam Klein <adamk@chromium.org> > Commit-Queue: Adam Klein <adamk@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70848} Bug: v8:9198 Change-Id: I40c6c8f0cd8908a2d6ab7016d8ed4d4fb2ab4114 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505250 Reviewed-by: Adam Klein <adamk@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70862}
-
- 28 Oct, 2020 4 commits
-
-
Francis McCabe authored
This reverts commit 3fb07882. Reason for revert: failing noavx tests: https://ci.chromium.org/p/v8/builders/ci/V8%20Linux/39390? Original change's description: > [wasm-simd][ia32][x64] Only use registers for shuffles > > Shuffles have pattern matching clauses which, depending on the > instruction used, can require src0 or src1 to be register or not. > However we do not have 16-byte alignment for SIMD operands yet, so it > will segfault when we use an SSE SIMD instruction with unaligned > operands. > > This patch fixes all the shuffle cases to always use a register for the > input nodes, and it does so by ignoring the values of src0_needs_reg and > src1_needs_reg. When we eventually have memory alignment, we can > re-enable this check, without mucking around too much in the logic in > each shuffle match clause. > > Bug: v8:9198 > Change-Id: I264e136f017353019f19954c62c88206f7b90656 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849 > Reviewed-by: Andreas Haas <ahaas@chromium.org> > Reviewed-by: Adam Klein <adamk@chromium.org> > Commit-Queue: Adam Klein <adamk@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70848} TBR=adamk@chromium.org,ahaas@chromium.org,zhin@chromium.org Change-Id: Icc7cc1ceb7ca5aa5d859239330743dde2e5f213c No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:9198 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2505719 Reviewed-by: Francis McCabe <fgm@chromium.org> Commit-Queue: Francis McCabe <fgm@chromium.org> Cr-Commit-Position: refs/heads/master@{#70852}
-
Shu-yu Guo authored
Change-Id: I4ab54dac771bb551c2435a98f9e53194a6f27853 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2495494 Commit-Queue: Shu-yu Guo <syg@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Cr-Commit-Position: refs/heads/master@{#70851}
-
Zhi An Ng authored
Shuffles have pattern matching clauses which, depending on the instruction used, can require src0 or src1 to be a register or not. However, we do not have 16-byte alignment for SIMD operands yet, so it will segfault when we use an SSE SIMD instruction with unaligned memory operands. This patch fixes all the shuffle cases to always use a register for the input nodes, and it does so by ignoring the values of src0_needs_reg and src1_needs_reg. When we eventually have memory alignment, we can re-enable this check, without mucking around too much in the logic in each shuffle match clause. Bug: v8:9198 Change-Id: I264e136f017353019f19954c62c88206f7b90656 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2504849 Reviewed-by: Andreas Haas <ahaas@chromium.org> Reviewed-by: Adam Klein <adamk@chromium.org> Commit-Queue: Adam Klein <adamk@chromium.org> Cr-Commit-Position: refs/heads/master@{#70848}
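The policy described here could look roughly like the sketch below. `OperandPolicy`, `ShuffleInputPolicy`, and the `simd_memory_aligned` switch are hypothetical: the commit message simply ignores `src0_needs_reg` and `src1_needs_reg`, and the switch only models the "re-enable this check later" remark.

```cpp
// Hypothetical operand policies; not V8's real operand generator API.
enum class OperandPolicy { kRegisterOnly, kRegisterOrMemory };

// Decide how a shuffle input may be materialized.
// While SIMD stack slots are not guaranteed to be 16-byte aligned, a memory
// operand on an SSE instruction could fault, so the per-pattern hint is
// ignored and a register is always requested.
inline OperandPolicy ShuffleInputPolicy(bool src_needs_reg,
                                        bool simd_memory_aligned) {
  if (!simd_memory_aligned) return OperandPolicy::kRegisterOnly;
  return src_needs_reg ? OperandPolicy::kRegisterOnly
                       : OperandPolicy::kRegisterOrMemory;
}
```

Keeping the per-pattern flag as an input means the match clauses stay untouched; only this one decision point changes once alignment is available.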
-
Zhi An Ng authored
Prototype i8x16, i16x8, i32x4, i64x2 sign select on x64 and interpreter. Bug: v8:10983 Change-Id: I7d6f39a2cb4c2aefe31daac782978fe8b363dd1a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486235 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#70818}
-
- 27 Oct, 2020 1 commit
-
-
Ng Zhi An authored
i8x16.extract_lane_u is pextrb, and i16x8.extract_lane_u is pextrw, so we can merge them instead of having separate opcodes. R=bbudge@chromium.org Bug: v8:10975 Change-Id: I7793a795905157b6094b1470d3437988c982af91 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481834 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70771}
-
- 22 Oct, 2020 1 commit
-
-
Georg Neis authored
This reverts half of commit 8f0ab471. Reason for revert: some performance regressions, possibly due to 'leave' needing MSROM on some microarchitectures. The half that is not reverted is the removal of 'enter'. Original change's description: > [ia32,x64] Make more use of the 'leave' instruction > > It is a little shorter and cheaper[1] than the equivalent > "mov sp,bp; pop bp". > > Also remove support for the 'enter' instruction, since > - it is unused, > - it is neither shorter nor cheaper than the corresponding > push and mov (in fact more expensive[1]), and > - our disassembler doesn't support it. > > [1] See https://www.agner.org/optimize/instruction_tables.pdf > > Change-Id: I6c99c2f3e53081aea55445a54e18eaf45baa79c2 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2482822 > Commit-Queue: Georg Neis <neis@chromium.org> > Reviewed-by: Victor Gomes <victorgomes@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70660} TBR=neis@chromium.org,victorgomes@chromium.org Bug: chromium:1141069 # Not skipping CQ checks because original CL landed > 1 day ago. Change-Id: I5c9ad64ee06b71c93eff256044ce49d1523737fb Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2492327 Commit-Queue: Georg Neis <neis@chromium.org> Reviewed-by: Victor Gomes <victorgomes@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Cr-Commit-Position: refs/heads/master@{#70718}
-
- 21 Oct, 2020 1 commit
-
-
Jakob Gruber authored
This is a reland of fbfa9bf4 The arm64 was missing proper codegen for CFI, thus sizes were off. Original change's description: > Reland "[deoptimizer] Change deopt entries into builtins" > > This is a reland of 7f58ced7 > > It fixes the different exit size emitted on x64/Atom CPUs due to > performance tuning in TurboAssembler::Call. Additionally, add > cctests to verify the fixed size exits. > > Original change's description: > > [deoptimizer] Change deopt entries into builtins > > > > While the overall goal of this commit is to change deoptimization > > entries into builtins, there are multiple related things happening: > > > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > > at runtime, guaranteed to be immovable), have been converted into > > builtins. The major restriction is that we now need to preserve the > > kRootRegister, which was formerly used on most architectures to pass > > the deoptimization id. The solution differs based on platform. > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > > - Removed heap/ support for immovable Code generation. > > - Removed the DeserializerData class (no longer needed). > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > > in which the final jump to the deoptimization entry is generated > > once per Code object, and deopt exits can continue to emit a > > near-call. > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > > sizes by 4/8, 5, and 5 bytes, respectively. > > > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > > by using the same strategy as on arm64 (recalc deopt id from return > > address). Before: > > > > e300a002 movw r10, <id> > > e59fc024 ldr ip, [pc, <entry offset>] > > e12fff3c blx ip > > > > After: > > > > e59acb35 ldr ip, [r10, <entry offset>] > > e12fff3c blx ip > > > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > > with CFI). 
Additionally, up to 4 builtin jumps are emitted per Code > > object (max 32 bytes added overhead per Code object). Before: > > > > 9401cdae bl <entry offset> > > > > After: > > > > # eager deoptimization entry jump. > > f95b1f50 ldr x16, [x26, <eager entry offset>] > > d61f0200 br x16 > > # lazy deoptimization entry jump. > > f95b2b50 ldr x16, [x26, <lazy entry offset>] > > d61f0200 br x16 > > # the deopt exit. > > 97fffffc bl <eager deoptimization entry jump offset> > > > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > > > bb00000000 mov ebx,<id> > > e825f5372b call <entry> > > > > After: > > > > e8ea2256ba call <entry> > > > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > > > 49c7c511000000 REX.W movq r13,<id> > > e8ea2f0700 call <entry> > > > > After: > > > > 41ff9560360000 call [r13+<entry offset>] > > > > Bug: v8:8661,v8:8768 > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > > Cr-Commit-Position: refs/heads/master@{#70597} > > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org > Bug: v8:8661,v8:8768,chromium:1140165 > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506 > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70655} Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org Bug: v8:8661 Bug: v8:8768 Bug: chromium:1140165 Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682Reviewed-by: Jakob Gruber 
<jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70672}
-
- 20 Oct, 2020 4 commits
-
-
Georg Neis authored
It is a little shorter and cheaper[1] than the equivalent "mov sp,bp; pop bp". Also remove support for the 'enter' instruction, since - it is unused, - it is neither shorter nor cheaper than the corresponding push and mov (in fact more expensive[1]), and - our disassembler doesn't support it. [1] See https://www.agner.org/optimize/instruction_tables.pdf Change-Id: I6c99c2f3e53081aea55445a54e18eaf45baa79c2 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2482822 Commit-Queue: Georg Neis <neis@chromium.org> Reviewed-by: Victor Gomes <victorgomes@chromium.org> Cr-Commit-Position: refs/heads/master@{#70660}
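The size claim can be checked against the x64 encodings of the two sequences. The byte values below are the standard encodings (`leave` is `C9`, `mov rsp, rbp` is `48 89 EC`, `pop rbp` is `5D`); the constant names are made up for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// x64 machine code for the two equivalent frame teardown sequences.
constexpr std::uint8_t kLeave[] = {0xC9};              // leave
constexpr std::uint8_t kMovPop[] = {0x48, 0x89, 0xEC,  // mov rsp, rbp
                                    0x5D};             // pop rbp

// Code size saved per epilogue by using `leave`.
constexpr std::size_t kBytesSaved = sizeof(kMovPop) - sizeof(kLeave);
static_assert(kBytesSaved == 3, "leave is three bytes shorter on x64");
```

On ia32 the `mov` needs no REX prefix, so the saving there is two bytes instead of three. Code size says nothing about execution cost, which depends on the microarchitecture.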
-
Maya Lekova authored
This reverts commit fbfa9bf4. Reason for revert: Seems to break arm64 sim CFI build (please see DeoptExitSizeIfFixed) - https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20CFI/2808 Original change's description: > Reland "[deoptimizer] Change deopt entries into builtins" > > This is a reland of 7f58ced7 > > It fixes the different exit size emitted on x64/Atom CPUs due to > performance tuning in TurboAssembler::Call. Additionally, add > cctests to verify the fixed size exits. > > Original change's description: > > [deoptimizer] Change deopt entries into builtins > > > > While the overall goal of this commit is to change deoptimization > > entries into builtins, there are multiple related things happening: > > > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > > at runtime, guaranteed to be immovable), have been converted into > > builtins. The major restriction is that we now need to preserve the > > kRootRegister, which was formerly used on most architectures to pass > > the deoptimization id. The solution differs based on platform. > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > > - Removed heap/ support for immovable Code generation. > > - Removed the DeserializerData class (no longer needed). > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > > in which the final jump to the deoptimization entry is generated > > once per Code object, and deopt exits can continue to emit a > > near-call. > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > > sizes by 4/8, 5, and 5 bytes, respectively. > > > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > > by using the same strategy as on arm64 (recalc deopt id from return > > address). 
Before: > > > > e300a002 movw r10, <id> > > e59fc024 ldr ip, [pc, <entry offset>] > > e12fff3c blx ip > > > > After: > > > > e59acb35 ldr ip, [r10, <entry offset>] > > e12fff3c blx ip > > > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > > object (max 32 bytes added overhead per Code object). Before: > > > > 9401cdae bl <entry offset> > > > > After: > > > > # eager deoptimization entry jump. > > f95b1f50 ldr x16, [x26, <eager entry offset>] > > d61f0200 br x16 > > # lazy deoptimization entry jump. > > f95b2b50 ldr x16, [x26, <lazy entry offset>] > > d61f0200 br x16 > > # the deopt exit. > > 97fffffc bl <eager deoptimization entry jump offset> > > > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > > > bb00000000 mov ebx,<id> > > e825f5372b call <entry> > > > > After: > > > > e8ea2256ba call <entry> > > > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > > > 49c7c511000000 REX.W movq r13,<id> > > e8ea2f0700 call <entry> > > > > After: > > > > 41ff9560360000 call [r13+<entry offset>] > > > > Bug: v8:8661,v8:8768 > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > > Cr-Commit-Position: refs/heads/master@{#70597} > > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org > Bug: v8:8661,v8:8768,chromium:1140165 > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506 > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: 
refs/heads/master@{#70655} TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org Change-Id: I4739a3475bfd8ee0cfbe4b9a20382f91a6ef1bf0 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:8661 Bug: v8:8768 Bug: chromium:1140165 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485223 Reviewed-by: Maya Lekova <mslekova@chromium.org> Commit-Queue: Maya Lekova <mslekova@chromium.org> Cr-Commit-Position: refs/heads/master@{#70658}
-
Jakob Gruber authored
This is a reland of 7f58ced7 It fixes the different exit size emitted on x64/Atom CPUs due to performance tuning in TurboAssembler::Call. Additionally, add cctests to verify the fixed size exits. Original change's description: > [deoptimizer] Change deopt entries into builtins > > While the overall goal of this commit is to change deoptimization > entries into builtins, there are multiple related things happening: > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > at runtime, guaranteed to be immovable), have been converted into > builtins. The major restriction is that we now need to preserve the > kRootRegister, which was formerly used on most architectures to pass > the deoptimization id. The solution differs based on platform. > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > - Removed heap/ support for immovable Code generation. > - Removed the DeserializerData class (no longer needed). > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > in which the final jump to the deoptimization entry is generated > once per Code object, and deopt exits can continue to emit a > near-call. > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > sizes by 4/8, 5, and 5 bytes, respectively. > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > by using the same strategy as on arm64 (recalc deopt id from return > address). Before: > > e300a002 movw r10, <id> > e59fc024 ldr ip, [pc, <entry offset>] > e12fff3c blx ip > > After: > > e59acb35 ldr ip, [r10, <entry offset>] > e12fff3c blx ip > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > object (max 32 bytes added overhead per Code object). Before: > > 9401cdae bl <entry offset> > > After: > > # eager deoptimization entry jump. > f95b1f50 ldr x16, [x26, <eager entry offset>] > d61f0200 br x16 > # lazy deoptimization entry jump. 
> f95b2b50 ldr x16, [x26, <lazy entry offset>] > d61f0200 br x16 > # the deopt exit. > 97fffffc bl <eager deoptimization entry jump offset> > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > bb00000000 mov ebx,<id> > e825f5372b call <entry> > > After: > > e8ea2256ba call <entry> > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > 49c7c511000000 REX.W movq r13,<id> > e8ea2f0700 call <entry> > > After: > > 41ff9560360000 call [r13+<entry offset>] > > Bug: v8:8661,v8:8768 > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70597} Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org Bug: v8:8661,v8:8768,chromium:1140165 Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506Reviewed-by: Jakob Gruber <jgruber@chromium.org> Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70655}
-
Jakob Gruber authored
This reverts commit 7f58ced7. Reason for revert: Segfaults on Atom_x64 https://ci.chromium.org/p/v8-internal/builders/ci/v8_linux64_atom_perf/5686? Original change's description: > [deoptimizer] Change deopt entries into builtins > > While the overall goal of this commit is to change deoptimization > entries into builtins, there are multiple related things happening: > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > at runtime, guaranteed to be immovable), have been converted into > builtins. The major restriction is that we now need to preserve the > kRootRegister, which was formerly used on most architectures to pass > the deoptimization id. The solution differs based on platform. > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > - Removed heap/ support for immovable Code generation. > - Removed the DeserializerData class (no longer needed). > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > in which the final jump to the deoptimization entry is generated > once per Code object, and deopt exits can continue to emit a > near-call. > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > sizes by 4/8, 5, and 5 bytes, respectively. > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > by using the same strategy as on arm64 (recalc deopt id from return > address). Before: > > e300a002 movw r10, <id> > e59fc024 ldr ip, [pc, <entry offset>] > e12fff3c blx ip > > After: > > e59acb35 ldr ip, [r10, <entry offset>] > e12fff3c blx ip > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > object (max 32 bytes added overhead per Code object). Before: > > 9401cdae bl <entry offset> > > After: > > # eager deoptimization entry jump. > f95b1f50 ldr x16, [x26, <eager entry offset>] > d61f0200 br x16 > # lazy deoptimization entry jump. 
> f95b2b50 ldr x16, [x26, <lazy entry offset>] > d61f0200 br x16 > # the deopt exit. > 97fffffc bl <eager deoptimization entry jump offset> > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > bb00000000 mov ebx,<id> > e825f5372b call <entry> > > After: > > e8ea2256ba call <entry> > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > 49c7c511000000 REX.W movq r13,<id> > e8ea2f0700 call <entry> > > After: > > 41ff9560360000 call [r13+<entry offset>] > > Bug: v8:8661,v8:8768 > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70597} TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: v8:8661,v8:8768,chromium:1140165 Change-Id: I3df02ab42f6e02233d9f6fb80e8bb18f76870d91 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485504Reviewed-by: Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70649}
-
- 19 Oct, 2020 6 commits
-
-
Ng Zhi An authored
All these opcodes have a simple lowering into a single x64 instruction. We can perform a similar optimization when AVX is supported to not force dst == src1. Bug: v8:10116 Change-Id: I4ad2975b6f241d8209025682202b476c08b3491b Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486383 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70636}
-
Ng Zhi An authored
We don't need separate Load32Zero and Load64Zero instructions, since the implementation is movss and movsd, which we already have. Bug: v8:10713 Change-Id: I5d02e946f3bf9fe08f943a811f2d3cc8aec81ea8 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486233 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70635}
-
Ng Zhi An authored
Not sure why I originally chose to name it LoadMem32Zero instead of Load32Zero like the proposal. This fixes it. Bug: v8:10713 Change-Id: If05603f743213bc6b7aea0ce22c80ae4b3023ccf Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481824 Reviewed-by: Bill Budge <bbudge@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70630}
-
Ng Zhi An authored
For splats, we can make use of vshufps to avoid a movss. Without AVX, we specify dst to be the same as src in the instruction selector. For extract lane, we can use vshufps to extract a float into a dst xmm, and leave junk in the higher bits. On the meshopt_decoder.js benchmark in the linked bug, it removes about 7 movss instructions that did nothing. Hardware can do register renaming, but let's not rely on that :) R=bbudge@chromium.org Bug: v8:10116 Change-Id: I4d68c10536a79659de673060d537d58113308477 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481473 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#70628}
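The lane selection that makes this work can be modelled in scalar code. The sketch below is a portable model of the three-operand `vshufps` semantics, not an intrinsic or V8 code; `F32x4`, `Vshufps`, `SplatLane0`, and `ExtractLane` are names chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using F32x4 = std::array<float, 4>;

// Scalar model of AVX `vshufps dst, src1, src2, imm8`: the low two result
// lanes are picked from src1 and the high two from src2, each selected by a
// 2-bit field of imm8.
inline F32x4 Vshufps(const F32x4& src1, const F32x4& src2, std::uint8_t imm8) {
  F32x4 result = {{src1[imm8 & 3], src1[(imm8 >> 2) & 3],
                   src2[(imm8 >> 4) & 3], src2[(imm8 >> 6) & 3]}};
  return result;
}

// Splat: with both sources equal and imm8 == 0, every lane reads lane 0.
inline F32x4 SplatLane0(const F32x4& v) { return Vshufps(v, v, 0); }

// Extract: put the wanted lane into lane 0; the upper lanes hold leftovers
// that the caller ignores.
inline float ExtractLane(const F32x4& v, int lane) {
  assert(lane >= 0 && lane < 4);
  return Vshufps(v, v, static_cast<std::uint8_t>(lane))[0];
}
```

Because the three-operand form writes a separate destination, neither operation needs a preceding register copy.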
-
Ng Zhi An authored
LoadKind is no longer just for loads; we use it for stores as well (starting with https://crrev.com/c/2473383). Rename it to something more generic. Bug: v8:10975,v8:10933 Change-Id: I5e5406ea475e06a83eb2eefe22d4824a99029944 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2481822 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Cr-Commit-Position: refs/heads/master@{#70626}
-
Jakob Gruber authored
While the overall goal of this commit is to change deoptimization entries into builtins, there are multiple related things happening:

- Deoptimization entries, formerly stubs (i.e. Code objects generated at runtime, guaranteed to be immovable), have been converted into builtins. The major restriction is that we now need to preserve the kRootRegister, which was formerly used on most architectures to pass the deoptimization id. The solution differs based on platform.
- Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
- Removed heap/ support for immovable Code generation.
- Removed the DeserializerData class (no longer needed).
- arm64: to preserve 4-byte deopt exits, introduced a new optimization in which the final jump to the deoptimization entry is generated once per Code object, and deopt exits can continue to emit a near-call.
- arm,ia32,x64: change to fixed-size deopt exits. This reduces exit sizes by 4/8, 5, and 5 bytes, respectively.

On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes by using the same strategy as on arm64 (recalc deopt id from return address). Before:
  e300a002 movw r10, <id>
  e59fc024 ldr ip, [pc, <entry offset>]
  e12fff3c blx ip
After:
  e59acb35 ldr ip, [r10, <entry offset>]
  e12fff3c blx ip

On arm64 the deopt exit size remains 4 bytes (or 8 bytes in some cases with CFI). Additionally, up to 4 builtin jumps are emitted per Code object (max 32 bytes added overhead per Code object). Before:
  9401cdae bl <entry offset>
After:
  # eager deoptimization entry jump.
  f95b1f50 ldr x16, [x26, <eager entry offset>]
  d61f0200 br x16
  # lazy deoptimization entry jump.
  f95b2b50 ldr x16, [x26, <lazy entry offset>]
  d61f0200 br x16
  # the deopt exit.
  97fffffc bl <eager deoptimization entry jump offset>

On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
  bb00000000 mov ebx,<id>
  e825f5372b call <entry>
After:
  e8ea2256ba call <entry>

On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
  49c7c511000000 REX.W movq r13,<id>
  e8ea2f0700 call <entry>
After:
  41ff9560360000 call [r13+<entry offset>]

Bug: v8:8661,v8:8768 Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Reviewed-by: Ulan Degenbaev <ulan@chromium.org> Cr-Commit-Position: refs/heads/master@{#70597}
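The message says the deopt id is recalculated from the return address once exits have a fixed size. A minimal sketch of that arithmetic follows, assuming each exit is a single call instruction (so the return address points at the end of the exit) and that exits are laid out back to back. The names `kDeoptExitSize`, `exits_start` and `DeoptExitIndexFromReturnAddress` are hypothetical and are not taken from the V8 sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: bytes per deopt exit (7 matches the x64 figure
// quoted in the commit message; other architectures differ).
constexpr std::uintptr_t kDeoptExitSize = 7;

// Recovers the index of a deopt exit from the return address pushed by its
// call. `exits_start` is the (assumed) address of the first exit in the
// Code object. Because every exit has the same size, no id needs to be
// materialized in a register before the call.
inline std::uintptr_t DeoptExitIndexFromReturnAddress(
    std::uintptr_t exits_start, std::uintptr_t return_address) {
  assert(return_address >= exits_start + kDeoptExitSize);
  const std::uintptr_t offset = return_address - exits_start;
  assert(offset % kDeoptExitSize == 0);  // must land on an exit boundary
  // The return address is the end of the exit, hence the "- 1".
  return offset / kDeoptExitSize - 1;
}
```

With this layout the first exit returns to `exits_start + 7` and maps to index 0, the second to index 1, and so on.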
-
- 16 Oct, 2020 5 commits
-
-
Ng Zhi An authored
Store lane stores a single lane of a simd value to memory. This implements store lane for x64 and interpreter. Bug: v8:10975 Change-Id: Ida79a03e0fd2bc18f2c06687311936b3cb550ed5 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2473383 Reviewed-by: Bill Budge <bbudge@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70586}
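As a rough illustration of store-lane semantics (one lane of a 128-bit value written to memory, the other lanes untouched and unused), here is a scalar sketch for 32-bit lanes. The `I32x4` alias and `StoreLane32` are hypothetical names for this example only; this is not the interpreter or x64 code from the change.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of a 128-bit vector with four 32-bit lanes.
using I32x4 = std::array<std::int32_t, 4>;

// Writes lane `lane` of `value` to `memory`. Only sizeof(int32_t) bytes are
// written; memcpy avoids alignment assumptions about `memory`.
inline void StoreLane32(const I32x4& value, std::size_t lane,
                        std::uint8_t* memory) {
  assert(lane < value.size());
  std::memcpy(memory, &value[lane], sizeof(std::int32_t));
}
```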
-
Ng Zhi An authored
With AVX, we don't need to force dst to be the same as the first operand, which can eliminate some moves. (On the js file in linked bug, we can eliminate all movs before shifts, saving ~20 movs.) Bug: v8:10116 Change-Id: I7951b5d8e42995098ddee2a326d0fe6f183c0fb9 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2477494 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#70581}
-
Ross McIlroy authored
This is a reland of cdc8d9a5 Skipped tests on gc_stress and fixed CONSTEXPR_DCHECK for gcc. Original change's description: > [TurboProp] Avoid marking the output of a call live in its catch handler > > The output of a call won't be live if an exception is thrown while the > call is on the stack and we unwind to a catch handler. > > BUG=chromium:1138075,v8:9684 > > Change-Id: I95bf535bac388940869eb213e25565d64fe96df1 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2476317 > Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> > Reviewed-by: Georg Neis <neis@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70562} Bug: chromium:1138075 Bug: v8:9684 Change-Id: I685c94ee2ffcf06658df07fcef06f58c4f01f54b Cq-Include-Trybots: luci.v8.try:v8_linux64_gcc_compile_dbg Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2479009 Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Auto-Submit: Ross McIlroy <rmcilroy@chromium.org> Cr-Commit-Position: refs/heads/master@{#70573}
-
Michael Achenbach authored
This reverts commit cdc8d9a5. Reason for revert: The regression test is too slow: https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20gc%20stress/30454 Also gcc failures: https://ci.chromium.org/p/v8/builders/ci/V8%20Linux64%20gcc%20-%20debug/9528 Original change's description: > [TurboProp] Avoid marking the output of a call live in its catch handler > > The output of a call won't be live if an exception is thrown while the > call is on the stack and we unwind to a catch handler. > > BUG=chromium:1138075,v8:9684 > > Change-Id: I95bf535bac388940869eb213e25565d64fe96df1 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2476317 > Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> > Reviewed-by: Georg Neis <neis@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70562} TBR=rmcilroy@chromium.org,neis@chromium.org Change-Id: I0f6b9378d516a70401fc429fb3612bbf962b0fb2 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: chromium:1138075 Bug: v8:9684 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2479007 Reviewed-by: Michael Achenbach <machenbach@chromium.org> Commit-Queue: Michael Achenbach <machenbach@chromium.org> Cr-Commit-Position: refs/heads/master@{#70564}
-
Ross McIlroy authored
The output of a call won't be live if an exception is thrown while the call is on the stack and we unwind to a catch handler. BUG=chromium:1138075,v8:9684 Change-Id: I95bf535bac388940869eb213e25565d64fe96df1 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2476317 Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Cr-Commit-Position: refs/heads/master@{#70562}
-
- 15 Oct, 2020 3 commits
-
-
Ng Zhi An authored
Rename AddSaturate and SubSaturate to the shorter version, AddSat and SubSat, following the spec. Bug: v8:10946,v8:10933 Change-Id: Idf74b3a1eb2e2f6d4e37d2b8e5fa6d96ea090db4 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2436615 Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Reviewed-by: Jakob Kummerow <jkummerow@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70549}
-
Victor Gomes authored
- Shortcut return when argc < param_count - Simplify return code due to PopAndReturn invariant Change-Id: Ie41d559cdbe0ba2cc4fdbfbbb622b0aec8429f03 Bug: v8:10201 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2474777 Commit-Queue: Victor Gomes <victorgomes@chromium.org> Commit-Queue: Georg Neis <neis@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Reviewed-by: Igor Sheludko <ishell@chromium.org> Auto-Submit: Victor Gomes <victorgomes@chromium.org> Cr-Commit-Position: refs/heads/master@{#70541}
-
Georg Neis authored
In particular: initial values of local ArchOpcode variables that get overwritten anyways. Creating these variables uninitialized makes that obvious. Change-Id: Ia205b5397c60769a46bf28ed60b299ac652f4b28 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2470557 Auto-Submit: Georg Neis <neis@chromium.org> Reviewed-by: Nico Hartmann <nicohartmann@chromium.org> Commit-Queue: Nico Hartmann <nicohartmann@chromium.org> Cr-Commit-Position: refs/heads/master@{#70526}
-
- 14 Oct, 2020 2 commits
-
-
Ng Zhi An authored
Make everything consistent; the pinsr family was converted in https://crrev.com/c/2443494. Bug: v8:10933 Change-Id: I9d09bd477520ce71fccdcf4336135b54c058185c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2470203 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70517}
-
Ng Zhi An authored
We were already using it to define the SSE instructions. Bug: v8:10933 Change-Id: I8c70c027449ee8b0d00a06298087310ced11cafc Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2470200 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70516}
-
- 13 Oct, 2020 1 commit
-
-
Ng Zhi An authored
The only one that doesn't use a pinsr* is f32x4, which uses insertps, so that is kept as it is. Bug: v8:10933 Change-Id: I7442668812c674d4242949e13ef595978290bc8d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2458787 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70493}
-
- 12 Oct, 2020 3 commits
-
-
Ng Zhi An authored
On AVX, many instructions can have 3 operands, unlike SSE which only has 2. So on SSE we use DefineSameAsFirst on the dst. But on AVX, using that will cause some unnecessary moves. This patch changes a couple of F32x4 and S128 instructions to remove this restriction when AVX is supported. We can't use AvxHelper since it duplicates the dst for the call to the AVX instruction, which isn't what we want. The alternative is to redefine Mulps and other functions here, but there are other callsites that depend on this duplicated-dst behavior, so it's harder to change. We can migrate this as we move more logic over to non-DefineSameAsFirst for AVX. With the meshopt_decoder.js in the linked bug, it removes 8 SIMD movs (from a function that has 300+ lines of assembly). Note that according to agner's microarchitecture.pdf, page 127, "Elimination of move instructions", such moves can often be eliminated by the processor. So this change won't speed up perf, but it helps a bit with binary size and decoder pressure. Bug: v8:10116,v8:9561 Change-Id: I125bfd44e728ef08312620bc00f6433f376e69e3 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465653 Reviewed-by: Bill Budge <bbudge@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#70462}
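To make the extra move concrete, the toy emitter below contrasts a two-operand (SSE-style) multiply, where dst doubles as the first input, with a three-operand (AVX-style) one. It is only a sketch: `EmitTwoOperandMul` and `EmitThreeOperandMul` are hypothetical helpers that produce assembly text, not V8's instruction selector or macro-assembler, and the sketch assumes dst is never the same register as the second input.

```cpp
#include <cassert>
#include <string>
#include <vector>

// SSE-style: "mulps dst,rhs" computes dst = dst * rhs, so when dst is a
// different register from lhs, lhs must first be copied into dst.
// Assumes dst != rhs (otherwise the copy would clobber rhs).
inline std::vector<std::string> EmitTwoOperandMul(const std::string& dst,
                                                  const std::string& lhs,
                                                  const std::string& rhs) {
  assert(dst != rhs);
  std::vector<std::string> code;
  if (dst != lhs) code.push_back("movaps " + dst + "," + lhs);
  code.push_back("mulps " + dst + "," + rhs);
  return code;
}

// AVX-style: "vmulps dst,lhs,rhs" names all three registers, so no copy is
// needed regardless of which register dst is.
inline std::vector<std::string> EmitThreeOperandMul(const std::string& dst,
                                                    const std::string& lhs,
                                                    const std::string& rhs) {
  return {"vmulps " + dst + "," + lhs + "," + rhs};
}
```

For dst=xmm0, lhs=xmm1, rhs=xmm2 the two-operand form needs two instructions and the three-operand form needs one, which is the kind of saved mov the message counts.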
-
Ng Zhi An authored
Implement on interpreter and x64. Bug: v8:10997 Change-Id: I3537ce54e1b56cc3b04d91cb07c430c35b88c3aa Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2459109 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Cr-Commit-Position: refs/heads/master@{#70459}
-
Ng Zhi An authored
Load lane loads a value from memory and replaces a single lane of a simd value. This implements the load (no stores yet) for x64 and interpreter. Bug: v8:10975 Change-Id: I95d1b5e781ee9adaec23dda749e514f2485eda10 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2444578 Commit-Queue: Zhi An Ng <zhin@chromium.org> Reviewed-by: Tobias Tebbi <tebbi@chromium.org> Reviewed-by: Bill Budge <bbudge@chromium.org> Cr-Commit-Position: refs/heads/master@{#70456}
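A scalar sketch of the load-lane behavior described above, shown here for 16-bit lanes: one lane is overwritten with a value read from memory and the remaining lanes are passed through unchanged. `I16x8` and `LoadLane16` are hypothetical names for illustration and do not come from the change itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of a 128-bit vector with eight 16-bit lanes.
using I16x8 = std::array<std::int16_t, 8>;

// Returns a copy of `value` in which lane `lane` has been replaced by the
// 16-bit value read from `memory`. memcpy avoids alignment assumptions.
inline I16x8 LoadLane16(const std::uint8_t* memory, I16x8 value,
                        std::size_t lane) {
  assert(lane < value.size());
  std::memcpy(&value[lane], memory, sizeof(std::int16_t));
  return value;
}
```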
-