src/codegen/ia32/sse-instr.h · 4db784674ec558eacbb461cac7088a59e2b11017 · Linshizhi / V8

[x64][ia32][wasm-simd] Optimize v128.bitselect · 8e9ad4f8

Zhi An Ng authored Dec 16, 2020

A couple of optimizations for v128.bitselect on both ia32 and x64:

1. Remove an extra movaps when AVX is supported, since AVX provides
   3-operand (non-destructive) instructions.
2. Tweak the algorithm from:
     xor(and(xor(src1, src2), mask), src2)

   to:
     or(and(src1, mask), andnot(src2, mask))

   where andnot(src2, mask) means src2 & ~mask. The new form is easier to
   read and understand, and it also eliminates a dependency chain (on
   kScratchDoubleReg) present in the older algorithm.
3. Use integer forms of the logical ops. Older processors have higher
   throughput on these than on the floating-point ops. However, the
   integer forms are 1 byte longer, so on SSE we stick to the
   floating-point ops.
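Here is a minimal, portable sketch of the two formulations from item 2, written to check that they agree. It does not use SSE intrinsics. Instead it models a 128-bit register as two `uint64_t` halves, so it runs on any host. The names `V128`, `BitselectOld`, `BitselectNew`, and the helper ops are hypothetical and are not V8 code. `AndNot(a, b)` follows the x86 `andnps`/`pandn` convention (`~a & b`), so the commit's `andnot(src2, mask)`, read as `src2 & ~mask`, is written `AndNot(mask, src2)` here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit SIMD register as two 64-bit halves.
// Bitwise ops act on each bit independently, so this matches the
// behavior of the real 128-bit logical instructions.
struct V128 {
  uint64_t lo;
  uint64_t hi;
};

inline bool operator==(V128 a, V128 b) { return a.lo == b.lo && a.hi == b.hi; }

inline V128 And(V128 a, V128 b) { return {a.lo & b.lo, a.hi & b.hi}; }  // andps / pand
inline V128 Or(V128 a, V128 b) { return {a.lo | b.lo, a.hi | b.hi}; }   // orps / por
inline V128 Xor(V128 a, V128 b) { return {a.lo ^ b.lo, a.hi ^ b.hi}; }  // xorps / pxor
// x86 andnps / pandn invert the FIRST operand: result = ~a & b.
inline V128 AndNot(V128 a, V128 b) { return {~a.lo & b.lo, ~a.hi & b.hi}; }

// Old formulation: xor(and(xor(src1, src2), mask), src2).
// Every step consumes the previous result: one serial chain of three ops.
inline V128 BitselectOld(V128 src1, V128 src2, V128 mask) {
  V128 t = Xor(src1, src2);
  t = And(t, mask);
  return Xor(t, src2);
}

// New formulation: or(and(src1, mask), andnot(src2, mask)),
// i.e. (src1 & mask) | (src2 & ~mask). The two inner ops do not depend
// on each other, so they can execute in parallel.
inline V128 BitselectNew(V128 src1, V128 src2, V128 mask) {
  V128 from_src1 = And(src1, mask);
  V128 from_src2 = AndNot(mask, src2);
  return Or(from_src1, from_src2);
}
```

For example, both functions return `src1` when the mask is all ones and `src2` when the mask is all zeros. For mixed masks they pick each bit from `src1` where the mask bit is set and from `src2` elsewhere. The structural difference is that `BitselectNew`'s `And` and `AndNot` are independent, whereas each step of `BitselectOld` waits on the previous one.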

For AVX, this reduces instruction count from 9948 to 9868.
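To illustrate items 1 and 3, here is a hypothetical sketch of how an SSE path and an AVX path might emit the new select, run on a toy register model. The `Machine` type, the register numbers, and the exact sequences are assumptions for illustration, not V8's emitted code. The sketch only shows why a 3-operand form can drop the `movaps` copy, which in this model gives 3 instructions per select instead of 4. The 9948 and 9868 counts come from the real code and are not derived from this model.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy machine: 16 xmm registers, each modeled with 64 bits
// (enough to show the data flow, since all ops are bitwise). Every emitted
// instruction is recorded in `trace` so the sequences can be counted.
struct Machine {
  std::array<uint64_t, 16> xmm{};
  std::vector<std::string> trace;

  // SSE: two-operand, destructive forms (dst = dst OP src).
  // FP forms are used because e.g. andps (0F 54 /r) is one byte shorter
  // than pand (66 0F DB /r).
  void movaps(int d, int s) { xmm[d] = xmm[s]; trace.push_back("movaps"); }
  void andps(int d, int s) { xmm[d] &= xmm[s]; trace.push_back("andps"); }
  void andnps(int d, int s) { xmm[d] = ~xmm[d] & xmm[s]; trace.push_back("andnps"); }
  void orps(int d, int s) { xmm[d] |= xmm[s]; trace.push_back("orps"); }

  // AVX: three-operand, non-destructive integer forms (dst = a OP b).
  void vpand(int d, int a, int b) { xmm[d] = xmm[a] & xmm[b]; trace.push_back("vpand"); }
  void vpandn(int d, int a, int b) { xmm[d] = ~xmm[a] & xmm[b]; trace.push_back("vpandn"); }
  void vpor(int d, int a, int b) { xmm[d] = xmm[a] | xmm[b]; trace.push_back("vpor"); }
};

// SSE select. Assumes dst aliases mask and all other registers are distinct.
// Because every op overwrites its first operand, mask must be copied first.
void SelectSse(Machine& m, int dst_mask, int src1, int src2, int scratch) {
  m.movaps(scratch, dst_mask);  // scratch = mask
  m.andnps(scratch, src2);      // scratch = ~mask & src2
  m.andps(dst_mask, src1);      // dst = mask & src1
  m.orps(dst_mask, scratch);    // dst = (src1 & mask) | (src2 & ~mask)
}

// AVX select. Assumes scratch is distinct from the other registers.
// The non-destructive third operand makes the movaps copy unnecessary.
void SelectAvx(Machine& m, int dst, int mask, int src1, int src2, int scratch) {
  m.vpandn(scratch, mask, src2);  // scratch = ~mask & src2
  m.vpand(dst, src1, mask);       // dst = src1 & mask
  m.vpor(dst, dst, scratch);      // dst = (src1 & mask) | (src2 & ~mask)
}
```

For example, given the same mask, `src1`, and `src2`, both paths write the same bit-selected value. Afterwards `trace` holds four entries for the SSE path and three for the AVX path.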

Change-Id: Idd5d26b99a76255dbfa63e2c304e6af3760c4ec6
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591859
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71845}
