Commit 3ea458be authored by Zhi An Ng, committed by Commit Bot

[x64][wasm-simd] Optimize f32x4.extract_lane

Change the codegen for f32x4.extract_lane from shufps to insertps when
AVX is supported. They have the same performance, but shufps has a false
dependency on dst (it shuffles dst and src, but we don't care about dst
at all).

Also for SSE, extractps + movd crosses register files, so change it to
use insertps as well.

Change-Id: Idf45849d37ac3499bf3371ba2fa6ae05829aa8a7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589048
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71747}
parent 231bc86c
@@ -2563,15 +2563,21 @@ CodeGenerator::CodeGenResult CodeGenerator::AssembleArchInstruction(
       break;
     }
     case kX64F32x4ExtractLane: {
+      XMMRegister dst = i.OutputDoubleRegister();
+      XMMRegister src = i.InputSimd128Register(0);
+      uint8_t lane = i.InputUint8(1);
+      DCHECK_LT(lane, 4);
+      if (lane == 0 && dst == src) {
+        break;
+      }
+      uint8_t zmask = 0xE;  // Zero top 3 lanes.
       if (CpuFeatures::IsSupported(AVX)) {
         CpuFeatureScope avx_scope(tasm(), AVX);
-        XMMRegister src = i.InputSimd128Register(0);
-        // vshufps and leave junk in the 3 high lanes.
-        __ vshufps(i.OutputDoubleRegister(), src, src, i.InputInt8(1));
+        // Use src for both operands to avoid false-dependency on dst.
+        __ vinsertps(dst, src, src, zmask | (lane << 6));
       } else {
-        __ extractps(kScratchRegister, i.InputSimd128Register(0),
-                     i.InputUint8(1));
-        __ movd(i.OutputDoubleRegister(), kScratchRegister);
+        __ insertps(dst, src, zmask | (lane << 6));
       }
       break;
     }
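
The immediate `zmask | (lane << 6)` passed to insertps/vinsertps can be read as three fields: bits 7:6 pick the source lane, bits 5:4 pick the destination lane (0 here), and bits 3:0 select result lanes to zero. The following is a minimal software sketch of that data movement for the register-source form, not V8 code. `Lanes`, `ModelInsertps`, and `ExtractLaneImm` are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Four 32-bit float lanes standing in for an XMM register (illustrative only).
using Lanes = std::array<float, 4>;

// Software model of the insertps data movement with a register source.
//   imm8[7:6]: lane of `src` to read
//   imm8[5:4]: lane of the result to overwrite
//   imm8[3:0]: bit i set means result lane i is zeroed
// `first` plays the role of dst in the SSE form, or of the first source
// operand in the three-operand AVX form.
inline Lanes ModelInsertps(Lanes first, const Lanes& src, uint8_t imm8) {
  const unsigned src_lane = (imm8 >> 6) & 0x3;
  const unsigned dst_lane = (imm8 >> 4) & 0x3;
  const unsigned zmask = imm8 & 0xF;
  first[dst_lane] = src[src_lane];
  for (unsigned i = 0; i < 4; ++i) {
    if (zmask & (1u << i)) first[i] = 0.0f;
  }
  return first;
}

// Builds the immediate the same way the patch does: zero lanes 1..3 (0xE),
// read from `lane`, and write to lane 0 (bits 5:4 left as 0).
inline uint8_t ExtractLaneImm(uint8_t lane) {
  assert(lane < 4);  // Mirrors DCHECK_LT(lane, 4) in the patch.
  const uint8_t zmask = 0xE;
  return static_cast<uint8_t>(zmask | (lane << 6));
}
```

In this model, `ModelInsertps(anything, src, ExtractLaneImm(lane))` yields `src[lane]` in lane 0 and zeros elsewhere, whatever `anything` holds: lane 0 is overwritten and lanes 1 to 3 are cleared by the zero mask, so the previous contents of the destination never reach the result.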