Ng Zhi An authored
Previously it generated a movq+pinsrq pair; now it generates a single punpcklqdq. punpcklqdq is smaller in code size and also faster on most architectures (latency 1, reciprocal throughput 1, 1 uop, 1 port) than pinsrq (latency 2, 2 uops, 2 ports), per https://uops.info/table.html.

punpcklqdq is meant to operate in the integer domain. Although we can't be certain what a v128.const will be used for, movq is considered an integer-domain instruction, so using punpcklqdq (instead of movddup, which is similar in performance and code size) avoids unnecessary domain transitions.

Bug: v8:11033
Change-Id: Iab81168ffad84488b90ff307d440bed15c9f90a3
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3169322
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76972}
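The lane semantics of the new sequence can be illustrated with SSE2 intrinsics. This is a minimal sketch, assuming an x86-64 target (where SSE2 is baseline) and assuming both 64-bit halves of the constant are first placed in XMM registers with movq. The helper names `make_v128_const` and `v128_lanes` are hypothetical and do not come from V8. The compiler is free to pick its own instructions for these intrinsics, so the sketch shows what movq and punpcklqdq compute, not the exact code V8 emits.

```c
#include <emmintrin.h> /* SSE2: _mm_cvtsi64_si128 (movq), _mm_unpacklo_epi64 (punpcklqdq) */
#include <stdint.h>

/* Hypothetical helper: build a 128-bit constant from two 64-bit halves
 * using only integer-domain operations (movq, then punpcklqdq). */
static __m128i make_v128_const(uint64_t lo, uint64_t hi) {
    __m128i vlo = _mm_cvtsi64_si128((long long)lo); /* movq: lane 0 = lo, lane 1 = 0 */
    __m128i vhi = _mm_cvtsi64_si128((long long)hi); /* movq: lane 0 = hi, lane 1 = 0 */
    /* punpcklqdq: result lane 0 = vlo lane 0, result lane 1 = vhi lane 0 */
    return _mm_unpacklo_epi64(vlo, vhi);
}

/* Hypothetical helper: copy both 64-bit lanes out (lane 0 to out[0] on x86). */
static void v128_lanes(__m128i v, uint64_t out[2]) {
    _mm_storeu_si128((__m128i *)out, v);
}
```

The result is the same as movq followed by pinsrq with lane index 1; the difference is that punpcklqdq takes its second operand from a register that a prior movq has already loaded, and it stays in the integer domain, unlike movddup.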
c0d1f24b