1. 15 Sep, 2019 1 commit
  2. 13 Aug, 2019 1 commit
      avutil/mips: refine msa macros CLIP_*. · a3e572d9
      gxw authored
      Details of the changes:
      1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in
         the source vector.
      2. Refine the implementation of macro 'CLIP_SH_0_255' and 'CLIP_SW_0_255'.
         VP8 decoding is about 1.1% faster (from 7.03x to 7.11x).
         H264 decoding is about 0.5% faster (from 4.35x to 4.37x).
         Theora decoding is about 0.7% faster (from 5.79x to 5.83x).
      3. Remove the redundant macros 'CLIP_SH/Wn_0_255_MAX_SATU' and use 'CLIP_SH/Wn_0_255'
         instead, because the two macros have the same effect.
      Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
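The claim in item 3, that two clamp variants have the same effect, can be illustrated with a scalar model. This is only a sketch: the function names are hypothetical, and it assumes one variant clamps with an explicit max/min pair while the other zeroes negatives and then saturates the unsigned value to 8 bits. The real macros operate on MSA vectors and are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical scalar model: clamp with an explicit max/min pair. */
static int16_t clip_0_255_minmax(int16_t x)
{
    if (x < 0)
        x = 0;
    if (x > 255)
        x = 255;
    return x;
}

/* Hypothetical scalar model: zero out negatives, then saturate the
 * unsigned value to 8 bits (upper limit 2^8 - 1 = 255). */
static int16_t clip_0_255_satu(int16_t x)
{
    const uint16_t limit = (1u << 8) - 1;
    uint16_t u = (uint16_t)(x < 0 ? 0 : x);

    return (int16_t)(u > limit ? limit : u);
}

/* Returns 1 if both models agree for every signed 16-bit input. */
static int clip_variants_agree(void)
{
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++) {
        if (clip_0_255_minmax((int16_t)v) != clip_0_255_satu((int16_t)v))
            return 0;
    }
    return 1;
}
```

Checking all 65536 half-word inputs is cheap, so an exhaustive comparison like `clip_variants_agree()` is a reasonable way to confirm that two clamp formulations are interchangeable before removing one of them.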
  3. 18 Jul, 2019 1 commit
      avutil/mips: refactor msa load and store macros. · 153c6075
      Shiyou Yin authored
      Replace STnxm_UB and LDnxm_SH with new macros ST_{H/W/D}{1/2/4/8}.
      The old macros are difficult to use because they do not follow consistent parameter-passing rules.
      Details of the changes:
      1. remove LD4x4_SH.
      2. replace ST2x4_UB with ST_H4.
      3. replace ST4x2_UB with ST_W2.
      4. replace ST4x4_UB with ST_W4.
      5. replace ST4x8_UB with ST_W8.
      6. replace ST6x4_UB with ST_W2 and ST_H2.
      7. replace ST8x1_UB with ST_D1.
      8. replace ST8x2_UB with ST_D2.
      9. replace ST8x4_UB with ST_D4.
      10. replace ST8x8_UB with ST_D8.
      11. replace ST12x4_UB with ST_D4 and ST_W4.
      
      Example of a new macro: ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride)
      ST_H4 stores four half-word elements of vector 'in' to pdst with the given stride.
      About the macro name:
      1) 'ST' means a store operation.
      2) 'H/W/D' means the vector element type is 'half-word/word/double-word'.
      3) The number '1/2/4/8' is how many elements are stored.
      About the macro parameters:
      1) 'in0, in1...': 128-bit vectors.
      2) 'idx0, idx1...': element indices.
      3) 'pdst': destination pointer to store to.
      4) 'stride': stride between consecutive store operations.
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
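The ST_H4 description above can be sketched as a portable scalar model. The names `vec128_model` and `st_h4_model` are hypothetical, and the sketch assumes that 'pdst' is a byte pointer and that 'stride' is measured in bytes; the real macro works on MSA vector registers rather than a struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector viewed as 8 half-words. */
typedef struct {
    uint16_t h[8];
} vec128_model;

/* Scalar model of ST_H4: store the half-word elements selected by
 * idx0..idx3 to pdst, pdst + stride, pdst + 2 * stride, pdst + 3 * stride.
 * Assumes stride is in bytes. */
static void st_h4_model(const vec128_model *in,
                        int idx0, int idx1, int idx2, int idx3,
                        uint8_t *pdst, ptrdiff_t stride)
{
    const int idx[4] = { idx0, idx1, idx2, idx3 };

    for (int i = 0; i < 4; i++)
        memcpy(pdst + i * stride, &in->h[idx[i]], sizeof(uint16_t));
}

/* Stores elements 0, 2, 4, 6 into a buffer with an 8-byte stride and
 * returns 1 if each row starts with the expected half-word. */
static int st_h4_model_demo(void)
{
    vec128_model v = { { 10, 11, 12, 13, 14, 15, 16, 17 } };
    const uint16_t expected[4] = { 10, 12, 14, 16 };
    uint8_t buf[4 * 8] = { 0 };

    st_h4_model(&v, 0, 2, 4, 6, buf, 8);

    for (int i = 0; i < 4; i++) {
        uint16_t got;
        memcpy(&got, buf + i * 8, sizeof(got));
        if (got != expected[i])
            return 0;
    }
    return 1;
}
```

Under the same assumptions, the W and D variants would differ only in the element width (4 or 8 bytes per store), and the trailing digit in the macro name would set the number of stores.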
  4. 18 Jun, 2015 1 commit