1. 25 Apr, 2017 1 commit
    • [wasm] [interpreter] Precompute side table for breaks · 92bf8327
      Clemens Hammacher authored
      Instead of dynamically tracking the block nesting, precompute the
      information statically.
      The interpreter was already using a side table to store the pc diff for
      each break, conditional break and others. The information needed to
      adjust the stack was tracked dynamically, however. This CL also
      precomputes this information, as it is statically known.
      Instead of just storing the pc diff in the side table, we now store the
      pc diff, the stack height diff and the arity of the target block.
      
      Local measurements show speedups of 5-6% on average, sometimes >10%.
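
A minimal sketch of how such a side-table entry could be consumed, written in Python for brevity rather than the interpreter's own C++. The names `ControlTransfer`, `sp_diff` and `do_break` are hypothetical, and the sketch assumes the stack height diff counts the slots discarded beneath the values carried to the target block.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ControlTransfer:
    """One precomputed side-table entry (hypothetical name and layout)."""
    pc_diff: int       # how far the program counter moves
    sp_diff: int       # how many stack slots the break discards (assumed meaning)
    target_arity: int  # how many values the target block receives


def do_break(pc: int, stack: List[int],
             side_table: Dict[int, ControlTransfer]) -> int:
    """Apply the break at `pc` using only precomputed data; return the new pc."""
    entry = side_table[pc]
    # The top `target_arity` slots are the values carried to the target block.
    kept = stack[len(stack) - entry.target_arity:]
    # Drop the carried values plus the `sp_diff` slots beneath them ...
    del stack[len(stack) - entry.target_arity - entry.sp_diff:]
    # ... then push the carried values back on the shortened stack.
    stack.extend(kept)
    return pc + entry.pc_diff
```

No block nesting is tracked at run time here: everything the break needs comes from the table lookup.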
      
      R=ahaas@chromium.org
      BUG=v8:5822
      
      Change-Id: I986cfa989aabe1488f2ff79ddbfbb28aeffe1452
      Reviewed-on: https://chromium-review.googlesource.com/485482
      Reviewed-by: Andreas Haas <ahaas@chromium.org>
      Commit-Queue: Clemens Hammacher <clemensh@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#44837}
  2. 24 Apr, 2017 1 commit
  3. 21 Apr, 2017 1 commit
    • [WASM SIMD] Remove opcodes that are slow on some platforms. · dddfcfd0
      bbudge authored
      These can be synthesized from existing operations and scheduled for
      better performance than if we have to generate blocks of instructions
      that take many cycles to complete.
      - Remove F32x4RecipRefine, F32x4RecipSqrtRefine. Clients are better off
        synthesizing these from splats, multiplies and adds.
      - Remove F32x4Div, F32x4Sqrt, F32x4MinNum, F32x4MaxNum. Clients are
        better off synthesizing these or using the reciprocal approximations,
        possibly with a refinement step.
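
As an illustration of the synthesis mentioned above, a reciprocal estimate can be refined with Newton-Raphson steps built only from a splat, multiplies and a subtract. This is a sketch under assumptions: Python lists stand in for F32x4 values (Python floats are doubles, not 32-bit), and the helper names are hypothetical, not V8 or WebAssembly opcodes.

```python
from typing import List

Vec = List[float]  # stands in for an F32x4 value


def splat(x: float) -> Vec:
    """Broadcast one scalar to all four lanes."""
    return [x] * 4


def mul(a: Vec, b: Vec) -> Vec:
    return [x * y for x, y in zip(a, b)]


def sub(a: Vec, b: Vec) -> Vec:
    return [x - y for x, y in zip(a, b)]


def recip_refine(d: Vec, estimate: Vec) -> Vec:
    """One Newton-Raphson step towards 1/d: x' = x * (2 - d * x)."""
    return mul(estimate, sub(splat(2.0), mul(d, estimate)))


def div_via_recip(a: Vec, d: Vec, estimate: Vec, steps: int = 2) -> Vec:
    """Approximate a / d as a * (1/d), refining the estimate `steps` times."""
    x = estimate
    for _ in range(steps):
        x = recip_refine(d, x)
    return mul(a, x)
```

Each step roughly squares the relative error of the estimate, so the caller can trade accuracy for fewer multiplies.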
      
      LOG=N
      BUG=v8:6020
      
      Review-Url: https://codereview.chromium.org/2827143002
      Cr-Commit-Position: refs/heads/master@{#44784}
  4. 19 Apr, 2017 1 commit
    • [WASM SIMD] Implement primitive shuffles. · 5806d862
      bbudge authored
      - Adds unary Reverse shuffles (swizzles): S32x2Reverse, S16x4Reverse,
        S16x2Reverse, S8x8Reverse, S8x4Reverse, S8x2Reverse. Reversals are
        done within the sub-vectors that prefix the opcode name, e.g. S8x2
        reverses the 8 consecutive pairs in an S8x16 vector.
      
      - Adds binary Zip (interleave) left and right half-shuffles to return a
        single vector: S32x4ZipLeft, S32x4ZipRight, S16x8ZipLeft, S16x8ZipRight,
        S8x16ZipLeft, S8x16ZipRight.
      
      - Adds binary Unzip (de-interleave) left and right half shuffles to return
        a single vector: S32x4UnzipLeft, S32x4UnzipRight, S16x8UnzipLeft,
        S16x8UnzipRight, S8x16UnzipLeft, S8x16UnzipRight.
      
      - Adds binary Transpose left and right half shuffles to return
        a single vector: S32x4TransposeLeft, S32x4TransposeRight,
        S16x8TransposeLeft, S16x8TransposeRight, S8x16TransposeLeft,
        S8x16TransposeRight.
      
      - Adds binary Concat (concatenate) byte shuffle: S8x16Concat #bytes to
        paste two vectors together.
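
The shuffle families listed above can be sketched on plain Python lists. The lane ordering below follows the usual zip/unzip/transpose/extract conventions (as in ARM NEON) and is an assumption; the CL itself does not spell out lane order, and the function names are hypothetical.

```python
from typing import List

Vec = List[int]


def reverse(v: Vec, group: int) -> Vec:
    """Reverse lanes within each consecutive group of `group` lanes."""
    out: Vec = []
    for i in range(0, len(v), group):
        out.extend(reversed(v[i:i + group]))
    return out


def zip_left(a: Vec, b: Vec) -> Vec:
    """Interleave the first halves of a and b."""
    half = len(a) // 2
    return [x for pair in zip(a[:half], b[:half]) for x in pair]


def zip_right(a: Vec, b: Vec) -> Vec:
    """Interleave the second halves of a and b."""
    half = len(a) // 2
    return [x for pair in zip(a[half:], b[half:]) for x in pair]


def unzip_left(a: Vec, b: Vec) -> Vec:
    """Even-indexed lanes of a followed by even-indexed lanes of b."""
    return (a + b)[0::2]


def unzip_right(a: Vec, b: Vec) -> Vec:
    """Odd-indexed lanes of a followed by odd-indexed lanes of b."""
    return (a + b)[1::2]


def transpose_left(a: Vec, b: Vec) -> Vec:
    """Pair up the even-indexed lanes of a and b."""
    return [x for pair in zip(a[0::2], b[0::2]) for x in pair]


def transpose_right(a: Vec, b: Vec) -> Vec:
    """Pair up the odd-indexed lanes of a and b."""
    return [x for pair in zip(a[1::2], b[1::2]) for x in pair]


def concat(a: Vec, b: Vec, n: int) -> Vec:
    """Paste a and b together, then take len(a) lanes starting at lane n."""
    return (a + b)[n:n + len(a)]
```

With `a = [0, 1, 2, 3]` and `b = [4, 5, 6, 7]`, `zip_left(a, b)` gives `[0, 4, 1, 5]` and `reverse(a, 2)` gives `[1, 0, 3, 2]`, the four-lane analogue of S32x2Reverse.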
      
      LOG=N
      BUG=v8:6020
      
      Review-Url: https://codereview.chromium.org/2801183002
      Cr-Commit-Position: refs/heads/master@{#44734}
  5. 10 Apr, 2017 1 commit
    • [WASM SIMD] Implement packing and unpacking integer conversions. · dbfc0300
      bbudge authored
      - Adds WASM opcodes I32x4SConvertI16x8Low, I32x4SConvertI16x8High,
        I32x4UConvertI16x8Low, I32x4UConvertI16x8High, which unpack half of
        an I16x8 register into a whole I32x4 register, with signed or unsigned
        extension. Having separate Low/High opcodes works around the difficulty
        of having multiple output registers, which would be necessary if we unpacked
        the entire I16x8 register.
      
      - Adds WASM opcodes I16x8SConvertI8x16Low, I16x8SConvertI8x16High,
        I16x8UConvertI8x16Low, I16x8UConvertI8x16High, similarly to above.
      
      - Adds WASM opcodes I16x8SConvertI32x4, I16x8UConvertI32x4,
        I8x16SConvertI16x8, I8x16UConvertI16x8, which pack two source registers
        into a single destination register with signed or unsigned saturation. These
        could have been separated into half operations, but this is simpler to
        implement with SSE, AVX, and is acceptable on ARM. It also avoids adding
        operations that only modify half of their destination register.
      
      - Implements these opcodes for ARM.
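
The unpack and pack conversions described above can be modelled lane by lane. This is a sketch with hypothetical helper names; in particular, treating the source lanes of the unsigned pack as signed values before saturating is an assumption, not something the CL states.

```python
from typing import List

Vec = List[int]


def sign_extend(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x & (1 << (bits - 1)) else x


def zero_extend(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as an unsigned value."""
    return x & ((1 << bits) - 1)


def unpack_low(v: Vec, bits: int, signed: bool) -> Vec:
    """Widen the first half of v, e.g. I32x4SConvertI16x8Low with bits=16."""
    ext = sign_extend if signed else zero_extend
    return [ext(x, bits) for x in v[:len(v) // 2]]


def unpack_high(v: Vec, bits: int, signed: bool) -> Vec:
    """Widen the second half of v, e.g. I32x4UConvertI16x8High with bits=16."""
    ext = sign_extend if signed else zero_extend
    return [ext(x, bits) for x in v[len(v) // 2:]]


def saturate(x: int, bits: int, signed: bool) -> int:
    """Clamp x to the range of a `bits`-wide signed or unsigned lane."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, x))


def pack(a: Vec, b: Vec, bits: int, signed: bool) -> Vec:
    """Narrow two sources into one vector, e.g. I16x8SConvertI32x4 with bits=16."""
    return [saturate(x, bits, signed) for x in a + b]
```

Splitting the unpack into Low and High halves keeps every operation at one output vector, while the pack consumes two inputs and fills the whole destination.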
      
      LOG=N
      BUG=v8:6020
      
      Review-Url: https://codereview.chromium.org/2800523002
      Cr-Commit-Position: refs/heads/master@{#44541}
  6. 16 Mar, 2017 1 commit
  7. 15 Mar, 2017 1 commit
  8. 08 Mar, 2017 1 commit
    • [WASM] Implement remaining F32x4 operations for ARM. · 78382d72
      bbudge authored
      - Implements Float32x4 Mul, Min, Max for ARM.
      - Implements Float32x4 relational ops for ARM.
      - Implements reciprocal, reciprocal square root estimate/refinement ops for ARM.
      - Reorganizes tests to eliminate need for specialized float ref fns in tests.
      - Rephrases Gt, Ge in terms of Lt, Le, and eliminates the redundant machine
        operators.
      - Renames test-run-wasm-simd test names to match instructions.
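
The Gt/Ge rephrasing works because swapping the operands turns "greater than" into "less than". A small sketch with hypothetical function names, using Python lists of floats as stand-ins for F32x4 values and booleans for the result lanes:

```python
from typing import List


def f32x4_lt(a: List[float], b: List[float]) -> List[bool]:
    return [x < y for x, y in zip(a, b)]


def f32x4_le(a: List[float], b: List[float]) -> List[bool]:
    return [x <= y for x, y in zip(a, b)]


def f32x4_gt(a: List[float], b: List[float]) -> List[bool]:
    # a > b is the same comparison as b < a, so no separate operator is needed.
    return f32x4_lt(b, a)


def f32x4_ge(a: List[float], b: List[float]) -> List[bool]:
    # a >= b is the same comparison as b <= a.
    return f32x4_le(b, a)
```

The swap also behaves consistently for NaN lanes, where every ordered comparison is false in either operand order.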
      
      LOG=N
      BUG=v8:6020
      
      Review-Url: https://codereview.chromium.org/2729943002
      Cr-Commit-Position: refs/heads/master@{#43658}
  9. 02 Mar, 2017 1 commit
    • Implement remaining Boolean SIMD operations on ARM. · 386e5a11
      bbudge authored
      - Implements Select instructions using a single ARM vbsl instruction.
      - Renames boolean machine operators to match renamed S1xN machine types.
      - Implements S1xN vector logical ops, AND, OR, XOR, NOT for ARM.
      - Implements S1xN AnyTrue, AllTrue ops for ARM.
      - Eliminates unused SIMD op categories in opcodes.h.
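
A bitwise select of the kind a single vbsl performs, together with AnyTrue and AllTrue reductions, can be sketched as follows. The 32-bit lane width, the representation of a boolean vector as all-ones or all-zeros lanes, and the function names are assumptions made for illustration.

```python
from typing import List

LANE_MASK = 0xFFFFFFFF  # model each lane as an unsigned 32-bit integer


def bitwise_select(mask: List[int], a: List[int], b: List[int]) -> List[int]:
    """Take bits from a where the mask bit is 1, and from b where it is 0."""
    return [((m & x) | (~m & y)) & LANE_MASK for m, x, y in zip(mask, a, b)]


def any_true(v: List[int]) -> bool:
    """True if at least one lane is non-zero."""
    return any(lane != 0 for lane in v)


def all_true(v: List[int]) -> bool:
    """True if every lane is non-zero."""
    return all(lane != 0 for lane in v)
```

With a mask of `[0xFFFFFFFF, 0, 0xFFFFFFFF, 0]`, the result takes lanes 0 and 2 from `a` and lanes 1 and 3 from `b`.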
      
      LOG=N
      BUG=v8:6020
      
      Review-Url: https://codereview.chromium.org/2711863002
      Cr-Commit-Position: refs/heads/master@{#43556}
  10. 13 Feb, 2017 1 commit
    • [Turbofan] Add more non-arithmetic SIMD operations. · 11f88ef5
      bbudge authored
      - Renames select, swizzle, and shuffle to be consistent with the S128 and
        existing S32x4 ops, and reflect that these aren't arithmetic.
        e.g. I16x8Swizzle -> S16x8Swizzle.
      - Implements S16x8 and S8x16 Select operations and tests.
      - Implements S128And, Or, Xor, Not operations and tests.
      - Implements Swizzle for 32x4 formats.
      - Refactors test macros that generate SIMD code.
      
      TEST=cctest/test-run-wasm-simd/*
      
      LOG=N
      BUG=v8:4124
      
      Review-Url: https://codereview.chromium.org/2683713003
      Cr-Commit-Position: refs/heads/master@{#43168}
  11. 20 Jan, 2017 1 commit
  12. 21 Dec, 2016 1 commit
  13. 15 Dec, 2016 2 commits
    • [wasm] TrapIf and TrapUnless TurboFan operators implemented on ia32. · f435d622
      ahaas authored
      Original commit message:
      [wasm] Introduce the TrapIf and TrapUnless operators to generate trap code.
      
      Some instructions in WebAssembly trap for some inputs, which means that the
      execution is terminated and (at least at the moment) a JavaScript exception is
      thrown. Examples for traps are out-of-bounds memory accesses, or integer
      divisions by zero.
      
      Without the TrapIf and TrapUnless operators, a trap check in WebAssembly introduces 5
      TurboFan nodes (branch, if_true, if_false, trap-reason constant, trap-position
      constant), in addition to the trap condition itself. Additionally, each
      WebAssembly function has four TurboFan nodes (merge, effect_phi, 2 phis) whose
      number of inputs is linear in the number of trap checks in the function.
      Especially for functions with high numbers of trap checks we observe a
      significant slowdown in compilation time, down to 0.22 MiB/s in the sqlite
      benchmark instead of the average of 3 MiB/s in other benchmarks. By introducing
      a TrapIf common operator only a single node is necessary per trap check, in
      addition to the trap condition. Also the nodes which are shared between trap
      checks (merge, effect_phi, 2 phis) would disappear. First measurements suggest a
      speedup of 30-50% on average.
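
The node counts quoted above can be turned into a quick back-of-the-envelope comparison. This sketch only counts the nodes named in the message (it excludes the trap condition itself), and the function names are hypothetical.

```python
def nodes_without_trap_if(trap_checks: int) -> int:
    """5 nodes per trap check, plus the 4 per-function nodes
    (merge, effect_phi, 2 phis)."""
    return 5 * trap_checks + 4


def nodes_with_trap_if(trap_checks: int) -> int:
    """One TrapIf/TrapUnless node per trap check; the shared nodes disappear."""
    return trap_checks


def nodes_saved(trap_checks: int) -> int:
    return nodes_without_trap_if(trap_checks) - nodes_with_trap_if(trap_checks)
```

For a function with 1000 trap checks this gives 5004 nodes versus 1000, before accounting for the inputs of the shared nodes, which also grow with the number of checks.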
      
      This CL only implements TrapIf and TrapUnless on x64. The implementation is also
      hidden behind the --wasm-trap-if flag.
      
      Please take a special look at how the source position is transferred from the
      instruction selector to the code generator, and at the context that is used for
      the runtime call.
      
      R=titzer@chromium.org
      
      Review-Url: https://codereview.chromium.org/2571813002
      Cr-Commit-Position: refs/heads/master@{#41735}
    • [wasm] Introduce the TrapIf and TrapUnless operators to generate trap code. · 7bd61b60
      ahaas authored
      Some instructions in WebAssembly trap for some inputs, which means that the
      execution is terminated and (at least at the moment) a JavaScript exception is
      thrown. Examples for traps are out-of-bounds memory accesses, or integer
      divisions by zero.
      
      Without the TrapIf and TrapUnless operators, a trap check in WebAssembly introduces 5
      TurboFan nodes (branch, if_true, if_false, trap-reason constant, trap-position
      constant), in addition to the trap condition itself. Additionally, each
      WebAssembly function has four TurboFan nodes (merge, effect_phi, 2 phis) whose
      number of inputs is linear in the number of trap checks in the function.
      Especially for functions with high numbers of trap checks we observe a
      significant slowdown in compilation time, down to 0.22 MiB/s in the sqlite
      benchmark instead of the average of 3 MiB/s in other benchmarks. By introducing
      a TrapIf common operator only a single node is necessary per trap check, in
      addition to the trap condition. Also the nodes which are shared between trap
      checks (merge, effect_phi, 2 phis) would disappear. First measurements suggest a
      speedup of 30-50% on average.
      
      This CL only implements TrapIf and TrapUnless on x64. The implementation is also
      hidden behind the --wasm-trap-if flag.
      
      Please take a special look at how the source position is transferred from the
      instruction selector to the code generator, and at the context that is used for
      the runtime call.
      
      R=titzer@chromium.org
      
      Review-Url: https://codereview.chromium.org/2562393002
      Cr-Commit-Position: refs/heads/master@{#41720}
  14. 26 Oct, 2016 1 commit
  15. 12 Sep, 2016 1 commit
  16. 08 Sep, 2016 1 commit
  17. 02 Sep, 2016 1 commit
    • [wasm] Fix wasm decoder tracing for prefix opcodes. · eed164b3
      gdeepti authored
      Using --trace-wasm-decoder prints unknowns for prefix opcodes, example:
      
        @3      #01:Block               |  env = 0x5547c10, state = R, reason = block:start, control = #0:Start
      
        @4      #14:GetLocal            | i@4:GetLocal[0]
        @6      #e5:Unknown             | s@6:Unknown
        @8      #15:SetLocal            | s@8:SetLocal[1]
        @10     #14:GetLocal            | s@8:SetLocal[1] i@10:GetLocal[0]
        @12     #14:GetLocal            | s@8:SetLocal[1] i@10:GetLocal[0] s@12:GetLocal[1]
        @14     #cb:I8Const             | s@8:SetLocal[1] i@10:GetLocal[0] s@12:GetLocal[1] i@14:I8Const
        @16     #e5:Unknown             | s@8:SetLocal[1] i@10:GetLocal[0] i@16:Unknown
      
      Fixed to print:
      
        @3        #01:Block               |  env = 0x45cac10, state = R, reason = block:start, control = #0:Start
      
        @4        #14:GetLocal            | i@4:GetLocal[0]
        @6    #e5 #1b:I32x4Splat          | s@6:I32x4Splat
        @8        #15:SetLocal            | s@8:SetLocal[1]
        @10       #14:GetLocal            | s@8:SetLocal[1] i@10:GetLocal[0]
        @12       #14:GetLocal            | s@8:SetLocal[1] i@10:GetLocal[0] s@12:GetLocal[1]
        @14       #cb:I8Const             | s@8:SetLocal[1] i@10:GetLocal[0] s@12:GetLocal[1] i@14:I8Const
        @16   #e5 #1c:I32x4ExtractLane    | s@8:SetLocal[1] i@10:GetLocal[0] i@16:I32x4ExtractLane
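
The fix amounts to looking at the byte after the prefix when naming an opcode. A minimal sketch of that lookup, using only the opcode values visible in the traces above; the table and function names are hypothetical and the tables are deliberately incomplete.

```python
SIMD_PREFIX = 0xE5

# Single-byte opcodes seen in the trace above.
OPCODE_NAMES = {
    0x01: "Block",
    0x14: "GetLocal",
    0x15: "SetLocal",
    0xCB: "I8Const",
}

# Opcodes that follow the 0xe5 prefix byte in the trace above.
SIMD_OPCODE_NAMES = {
    0x1B: "I32x4Splat",
    0x1C: "I32x4ExtractLane",
}


def opcode_name(code: bytes, pc: int) -> str:
    """Name the opcode at `pc`, consulting the next byte for prefixed opcodes."""
    opcode = code[pc]
    if opcode == SIMD_PREFIX:
        if pc + 1 >= len(code):
            return "Unknown"  # the prefix is the last byte, nothing to look up
        return SIMD_OPCODE_NAMES.get(code[pc + 1], "Unknown")
    return OPCODE_NAMES.get(opcode, "Unknown")
```

Looking up the prefix byte alone in the single-byte table is what yields "Unknown" in the first trace.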
      
      R=ahaas@chromium.org, bbudge@chromium.org
      
      Review-Url: https://codereview.chromium.org/2307733002
      Cr-Commit-Position: refs/heads/master@{#39142}
  18. 16 Jul, 2016 1 commit
  19. 15 Jul, 2016 2 commits
  20. 28 Jun, 2016 1 commit
  21. 23 Jun, 2016 1 commit
  22. 12 May, 2016 1 commit
    • [wasm] Implement parallel compilation. · 4aec7ba1
      ahaas authored
      With this CL it is possible to compile a wasm module with multiple
      threads in parallel. Parallel compilation works as follows:
      
      1)   The main thread allocates a compilation unit for each wasm function.
      2)   The main thread spawns WasmCompilationTasks which run on the
           background threads.
      3.a) The background threads and the main thread pick one compilation unit
           at a time and execute the parallel phase of the compilation unit.
           After finishing the execution of the parallel phase, the compilation
           unit is stored in a result queue.
      3.b) If the result queue contains a compilation unit, the main thread
           dequeues it and finishes its compilation.
      4)   After the execution of the parallel phase of all compilation units has
           started, the main thread waits for all WasmCompilationTasks to finish.
      5)   The main thread finalizes the compilation of the module.
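
The five steps above can be sketched with Python threads and queues. This is an illustration of the scheduling scheme only, not V8's implementation: the function names are hypothetical and string suffixes stand in for the actual compilation work.

```python
import queue
import threading
from typing import List


def compile_module(functions: List[str], num_workers: int = 2) -> List[str]:
    # 1) The main thread allocates one compilation unit per function.
    units: queue.Queue = queue.Queue()
    for name in functions:
        units.put(name)
    results: queue.Queue = queue.Queue()
    finished: List[str] = []  # only ever touched by the main thread

    def run_parallel_phase() -> bool:
        """3.a) Pick one unit, run its parallel phase, queue the result."""
        try:
            unit = units.get_nowait()
        except queue.Empty:
            return False
        results.put(unit + ":compiled")  # stand-in for the real work
        return True

    def finish_ready_units() -> None:
        """3.b) Main thread only: finish every unit already in the result queue."""
        while True:
            try:
                finished.append(results.get_nowait() + ":finished")
            except queue.Empty:
                return

    def background_task() -> None:
        while run_parallel_phase():
            pass

    # 2) Spawn the background tasks.
    tasks = [threading.Thread(target=background_task) for _ in range(num_workers)]
    for task in tasks:
        task.start()

    # 3) The main thread also picks units and finishes ready results in between.
    while run_parallel_phase():
        finish_ready_units()

    # 4) Every unit has been picked up; wait for the background tasks.
    for task in tasks:
        task.join()

    # 5) Finalize: finish whatever is still queued.
    finish_ready_units()
    return sorted(finished)
```

Keeping the finishing step on the main thread mirrors the split in the description between a parallel phase and a main-thread-only phase.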
      
      I'm going to add some additional tests before committing this CL.
      
      R=titzer@chromium.org, bmeurer@chromium.org, mlippautz@chromium.org, mstarzinger@chromium.org
      
      Committed: https://crrev.com/17215438659d8ff2d7d55f95226bf8a1477ccd79
      Cr-Commit-Position: refs/heads/master@{#36178}
      
      Review-Url: https://codereview.chromium.org/1961973002
      Cr-Commit-Position: refs/heads/master@{#36207}
  23. 11 May, 2016 1 commit
  24. 03 May, 2016 1 commit
  25. 29 Apr, 2016 1 commit
    • [wasm] Binary 11: WASM AST is now postorder. · 2aa4656e
      titzer authored
      [wasm] Binary 11: br_table takes a value.
      [wasm] Binary 11: Add implicit blocks to if arms.
      [wasm] Binary 11: Add arities to call, return, and breaks
      [wasm] Binary 11: Add experimental version.
      
      This CL changes the encoder, decoder, and tests to use a postorder
      encoding of the AST, which is more efficient in decode time and
      space.
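
To illustrate why a postorder encoding is cheap to decode: operands arrive before their operator, so a decoder needs only a value stack and a single pass. The sketch below uses a toy expression tree with binary operators; the token format and function names are hypothetical and unrelated to the actual binary format.

```python
from typing import List, Tuple, Union

Expr = Union[int, Tuple[str, "Expr", "Expr"]]
Token = Union[int, str]


def encode_postorder(expr: Expr) -> List[Token]:
    """Emit both operands first, then the operator."""
    if isinstance(expr, int):
        return [expr]
    op, left, right = expr
    return encode_postorder(left) + encode_postorder(right) + [op]


def decode_postorder(tokens: List[Token]) -> Expr:
    """Rebuild the tree in one pass over the tokens using a stack."""
    stack: List[Expr] = []
    for tok in tokens:
        if isinstance(tok, int):
            stack.append(tok)
        else:
            # The operands are already on the stack when the operator arrives.
            right = stack.pop()
            left = stack.pop()
            stack.append((tok, left, right))
    if len(stack) != 1:
        raise ValueError("malformed postorder sequence")
    return stack[0]
```

For `("add", 1, ("mul", 2, 3))` the encoder produces `[1, 2, 3, "mul", "add"]`, and decoding that sequence returns the original tree.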
      
      R=bradnelson@chromium.org,rossberg@chromium.org,binji@chromium.org
      BUG=chromium:575167
      LOG=Y
      
      Review-Url: https://codereview.chromium.org/1830663002
      Cr-Commit-Position: refs/heads/master@{#35896}
  26. 19 Apr, 2016 1 commit
  27. 03 Mar, 2016 2 commits
  28. 20 Jan, 2016 1 commit
  29. 17 Dec, 2015 1 commit
  30. 11 Dec, 2015 2 commits