Commits · 423375d4f06ae7103e575a31c23e62e3ba440845 · Linshizhi / ffmpeg.wasm-core

13 Oct, 2015 · 15 commits
- x86/vp9itxfm: fix register clobbering in ff_vp9_idct_idct_4x4_add_12_sse2 · 74a87ae2
  James Almer authored 9 years ago
```
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
```
- vp9: use registers for constant loading where possible. · e5786383
  Ronald S. Bultje authored 9 years ago
  
- vp9: refactor itx coefficients and share between 8 and 10/12bpp. · 408bb855
  Ronald S. Bultje authored 9 years ago
  
- vp9: add itxfm_add eob shortcuts to 10/12bpp functions. · eb4b5ff7
  Ronald S. Bultje authored 9 years ago
```
These aren't quite as helpful as the ones in 8bpp, since over there,
we can use pmulhrsw, but here the coefficients have too many bits to
be able to take advantage of pmulhrsw. However, we can still skip
cols for which all coefs are 0, and instead just zero the input data
for the row itx. This helps a few % on overall decoding speed.
```
- vp9: add 10/12bpp idct_idct_32x32 sse2 SIMD version. · 488fadeb
  Ronald S. Bultje authored 9 years ago
  
- vp9: 10/12bpp sse2 SIMD for iadst16. · 3d0ca2fe
  Ronald S. Bultje authored 9 years ago
  
- vp9: refactor 10/12bpp dc-only code in 4x4/8x8 and add to 16x16. · 0e80265b
  Ronald S. Bultje authored 9 years ago
  
- vp9: add 10/12bpp sse2 SIMD version for idct_idct_16x16. · 1338fb79
  Ronald S. Bultje authored 9 years ago
  
- vp9: add 10/12bpp sse2 SIMD versions of iadst8x8. · cb054d06
  Ronald S. Bultje authored 9 years ago
  
- vp9: add 10/12bpp sse2 SIMD for idct_idct_8x8. · e0610787
  Ronald S. Bultje authored 9 years ago
  
- vp9: add 12bpp sse2 versions of iadst4. · a35f6bdb
  Ronald S. Bultje authored 9 years ago
  
- vp9: initial attempt at a idct_idct_4x4 12bpp x86 simd (sse2) impl. · 235e76ae
  Ronald S. Bultje authored 9 years ago
```
The trouble with this function is that intermediates overflow 31+sign
bits, so I've added some helpers (that will also be used in 10/12bpp
8x8, 16x16 and 32x32) to make that easier, basically emulating a half-
assed pmaddqd using 2xpmaddwd. It's currently sse2-only, if anyone sees
potential in adding ssse3, I'd love to hear it.
```
- vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions. · f76423d0
  Ronald S. Bultje authored 9 years ago
  
- vp9: add 10bpp simd (mmxext/ssse3) for idct_idct_4x4. · 6b579cf5
  Ronald S. Bultje authored 9 years ago
  
- vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function. · 1c3be325
  Ronald S. Bultje authored 9 years ago
  
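
The message of commit 235e76ae mentions emulating a wider multiply-add with two `pmaddwd` operations because 12bpp intermediates overflow 31 bits plus sign. The scalar C sketch below illustrates one way such a split can work; it is not taken from the commit. It assumes the 32-bit input is split at bit 14 and that coefficients are 14-bit fixed point (for example 11585, roughly cos(pi/4) * 2^14). The names `pmaddwd_lane`, `mul2_round14` and `ref_mul2_round14` are hypothetical, and `pmaddwd_lane` only models one output lane of the real instruction.

```c
#include <stdint.h>

/* Scalar model of one 32-bit output lane of SSE2 pmaddwd:
 * two signed 16x16 products, summed into 32 bits. */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Hypothetical helper: computes (a*ca + b*cb + 8192) >> 14 without a
 * 64-bit intermediate. Each input is split as x = hi * 2^14 + lo with
 * 0 <= lo < 2^14, so both halves fit in signed 16 bits as long as
 * |x| < 2^29. Assumes >> on negative values is an arithmetic shift,
 * which holds on mainstream compilers. */
static int32_t mul2_round14(int32_t a, int32_t b, int16_t ca, int16_t cb)
{
    int16_t a_lo = (int16_t)((uint32_t)a & 0x3fff);
    int16_t b_lo = (int16_t)((uint32_t)b & 0x3fff);
    int16_t a_hi = (int16_t)(a >> 14);
    int16_t b_hi = (int16_t)(b >> 14);

    int32_t lo = pmaddwd_lane(a_lo, b_lo, ca, cb); /* low-half products  */
    int32_t hi = pmaddwd_lane(a_hi, b_hi, ca, cb); /* high-half products */

    /* The high part is already scaled by 2^14, so only the low part
     * needs rounding and shifting. */
    return hi + ((lo + 8192) >> 14);
}

/* Reference using a 64-bit intermediate, for comparison only. */
static int32_t ref_mul2_round14(int32_t a, int32_t b, int16_t ca, int16_t cb)
{
    int64_t sum = (int64_t)a * ca + (int64_t)b * cb + 8192;
    return (int32_t)(sum >> 14);
}
```

With `a = 300000` and `ca = 11585` the plain product is about 3.5e9, which no longer fits in a signed 32-bit value, while both split products stay well inside that range.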