Commit c0d1f24b authored by Ng Zhi An, committed by V8 LUCI CQ

[x64] Optimize v128.const when two int64 halves are the same

Previously this generated a movq + pinsrq pair; now it generates a single
punpcklqdq.

punpcklqdq is smaller in code size and also faster on most architectures
(latency 1, reciprocal throughput 1, 1 uop, uses 1 port) than pinsrq
(latency 2, 2 uops, uses 2 ports) (see https://uops.info/table.html).

punpcklqdq is meant to operate in the integer domain, and although we can't
be certain what a v128.const will be used for, movq is considered an
integer-domain instruction, so using punpcklqdq avoids unnecessary domain
transitions (unlike movddup, which has similar performance and code size
but operates in the floating-point domain).
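
For illustration only (not V8 code), a minimal standalone C++/SSE2 sketch of
why a single punpcklqdq suffices once the low half has been materialized with
movq: _mm_unpacklo_epi64(x, x) compiles to punpcklqdq and duplicates the low
64-bit lane into both lanes. Names like `half` and `out` are hypothetical.

  #include <cstdint>
  #include <cstdio>
  #include <emmintrin.h>  // SSE2 intrinsics

  int main() {
    const uint64_t half = 0x0123456789abcdefull;  // both halves of the v128 constant
    // movq: put the 64-bit value in the low lane, upper lane zeroed.
    __m128i low = _mm_cvtsi64_si128(static_cast<long long>(half));
    // punpcklqdq dst, dst: interleave the low qwords of both operands,
    // i.e. broadcast the low 64 bits into both lanes.
    __m128i both = _mm_unpacklo_epi64(low, low);

    uint64_t out[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), both);
    std::printf("%016llx %016llx\n",
                static_cast<unsigned long long>(out[0]),
                static_cast<unsigned long long>(out[1]));
    // Prints: 0123456789abcdef 0123456789abcdef
    return 0;
  }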

Bug: v8:11033
Change-Id: Iab81168ffad84488b90ff307d440bed15c9f90a3
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3169322
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76972}
parent 2db50670
@@ -1635,6 +1635,12 @@ void TurboAssembler::Move(XMMRegister dst, uint64_t src) {
 }

 void TurboAssembler::Move(XMMRegister dst, uint64_t high, uint64_t low) {
+  if (high == low) {
+    Move(dst, low);
+    Punpcklqdq(dst, dst);
+    return;
+  }
+
   Move(dst, low);
   movq(kScratchRegister, high);
   Pinsrq(dst, dst, kScratchRegister, uint8_t{1});