Zhi An Ng authored
Optimize:
- i32x4.widen_high_i16x8_s
- i32x4.widen_high_i16x8_u
- i16x8.widen_high_i8x16_s
- i16x8.widen_high_i8x16_u

These optimizations were suggested in http://b/175364869.

The main change is to move away from palignr, which has a dependency on dst and whose AVX version is 2 bytes longer than punpckhqdq.

The signed and unsigned variants get slightly different optimizations. Unsigned variants can use a punpckh* instruction with a zeroed scratch register, which effectively zero-extends. Signed variants use the movhlps instruction to move the high half to the low half of dst, then use packed sign-extension instructions.

The common fallback for these instructions is pshufd, which does not have a dependency on dst, but is 1 byte longer than the punpckh* instructions.

FIXED=b/175364869
Change-Id: If28da2aaa8f6e39a58e63b01cc9a81bbbb294606
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591853
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71856}
b145152d
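
To illustrate the two strategies from the commit message, here is a minimal scalar sketch of the lane semantics for the i32x4.widen_high_i16x8 pair. It is not the V8 code generator: every type and function name below (I16x8, UnpackHighWords, MoveHighToLow, WidenHighU, WidenHighS, and so on) is hypothetical, and each helper only models, in portable C++, what the corresponding x64 instruction does to its lanes. It assumes the usual little-endian lane layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane containers; lane 0 is the least significant lane.
using I16x8 = std::array<int16_t, 8>;
using I32x4 = std::array<int32_t, 4>;

// Models `punpckhwd dst, src`: interleaves the high four 16-bit lanes of
// dst and src as dst[4], src[4], dst[5], src[5], ...
I16x8 UnpackHighWords(const I16x8& dst, const I16x8& src) {
  I16x8 out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[2 * i] = dst[4 + i];
    out[2 * i + 1] = src[4 + i];
  }
  return out;
}

// Reinterprets eight 16-bit lanes as four 32-bit lanes (little-endian).
I32x4 AsI32x4(const I16x8& v) {
  I32x4 out{};
  for (std::size_t i = 0; i < 4; ++i) {
    const uint32_t lo = static_cast<uint16_t>(v[2 * i]);
    const uint32_t hi = static_cast<uint16_t>(v[2 * i + 1]);
    out[i] = static_cast<int32_t>(lo | (hi << 16));
  }
  return out;
}

// Models `movhlps dst, src`: copies the high 64 bits of src into the low
// 64 bits of dst and leaves the high 64 bits of dst unchanged.
I16x8 MoveHighToLow(const I16x8& dst, const I16x8& src) {
  I16x8 out = dst;
  for (std::size_t i = 0; i < 4; ++i) out[i] = src[4 + i];
  return out;
}

// Models `pmovsxwd`: sign-extends the low four 16-bit lanes to 32 bits.
I32x4 SignExtendLowWords(const I16x8& v) {
  I32x4 out{};
  for (std::size_t i = 0; i < 4; ++i) out[i] = static_cast<int32_t>(v[i]);
  return out;
}

// Unsigned variant: interleaving with a zeroed scratch register puts a
// zero word above each source word, which is exactly zero extension.
I32x4 WidenHighU(const I16x8& src) {
  const I16x8 zeroed_scratch{};
  return AsI32x4(UnpackHighWords(src, zeroed_scratch));
}

// Signed variant: move the high half down, then sign-extend the low half.
I32x4 WidenHighS(const I16x8& src) {
  return SignExtendLowWords(MoveHighToLow(src, src));
}

// Small usage example: the high half holds -1, -2, 32767, -32768.
void CheckExample() {
  const I16x8 v = {1, 2, 3, 4, -1, -2, 32767, -32768};
  assert((WidenHighU(v) == I32x4{65535, 65534, 32767, 32768}));
  assert((WidenHighS(v) == I32x4{-1, -2, 32767, -32768}));
}
```

The i16x8.widen_high_i8x16 pair follows the same pattern with byte lanes. In this model the unsigned path needs only the zeroed scratch register as a second operand, which mirrors the commit message's point about using punpckh* for zero extension.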