Commits · af5922a79a13e7ab48679c619bfcbf3a8491de1e · Linshizhi / ffmpeg.wasm-core

12 Apr, 2020 1 commit

avcodec/mips: fix get_cabac_inline_mips function name · 4fa4ab97

Rosen Penev authored 4 years ago

On other platforms, the functions are named get_cabac_inline_xxx but not
this one. There's also a define.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

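The define mentioned above appears to follow a common override pattern. An architecture header names its own version, then points the generic name at it with a macro. The generic header compiles its portable version only when that macro is absent. The sketch below reproduces the pattern with hypothetical names (get_bit_inline, get_bit_inline_mips, HAVE_ARCH_GET_BIT). It is not FFmpeg's cabac code.

```c
/* Hypothetical architecture header: its version carries a _mips suffix
 * and is published under the generic name through a macro. */
#define HAVE_ARCH_GET_BIT 1

#if HAVE_ARCH_GET_BIT
static inline int get_bit_inline_mips(int state)
{
    return (state & 1) + 100; /* +100 only marks which version ran */
}
#define get_bit_inline get_bit_inline_mips
#endif

/* Hypothetical generic header: the portable version is compiled only if
 * no architecture header has already claimed the name. */
#ifndef get_bit_inline
static inline int get_bit_inline(int state)
{
    return state & 1;
}
#endif

static int decode_one(int state)
{
    return get_bit_inline(state); /* expands to get_bit_inline_mips here */
}
```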

11 Apr, 2020 1 commit

avcodec/aacdec: fix compilation under soft float MIPS · 875ba233

Rosen Penev authored 4 years ago

Place HAVE_MIPSFPU further up so that functions that use floating-point
ASM are defined away. Otherwise, compilation fails when soft float
is enabled on the toolchain.
Signed-off-by: Rosen Penev <rosenp@gmail.com>

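Here is a minimal sketch of why the guard's position matters. It uses a hypothetical gain function (gain_c, gain_fpu) in place of the AAC decoder's FPU-asm helpers. When HAVE_MIPSFPU is 0, the asm function must not be compiled at all. The asm body is illustrative MIPS single-precision code, and it is only built when HAVE_MIPSFPU is 1.

```c
/* In FFmpeg this value comes from the build configuration; defaulting it
 * to 0 here models a soft-float toolchain (assumption for the sketch). */
#ifndef HAVE_MIPSFPU
#define HAVE_MIPSFPU 0
#endif

/* Portable C version, always available. */
static float gain_c(float x, float g)
{
    return x * g;
}

/* The guard has to enclose the FPU-asm function itself. If only the code
 * that selects it were guarded, a soft-float toolchain would still see
 * the asm below and could reject the FPU instruction and constraints. */
#if HAVE_MIPSFPU
static float gain_fpu(float x, float g)
{
    float r;
    __asm__ ("mul.s %0, %1, %2" : "=f"(r) : "f"(x), "f"(g));
    return r;
}
#define gain gain_fpu
#else
#define gain gain_c
#endif

static float apply_gain(float x, float g)
{
    return gain(x, g);
}
```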

12 Dec, 2019 1 commit

lavc/mips: simplify the switch code · bffb9326

Linjie Fu authored 5 years ago

Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

30 Oct, 2019 1 commit

avcodec/mips: msa optimizations for vc1dsp · 648b422e

gxw authored 5 years ago

WMV3 decoding speed improved from 3.66x to 5.23x (tested on 3A4000).
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

13 Oct, 2019 1 commit

avcodec/mips: Fixed four warnings in vc1dsp · 21d19f49

gxw authored 5 years ago

Change the stride argument to ptrdiff_t in the following functions:
ff_put_no_rnd_vc1_chroma_mc8_mmi, ff_put_no_rnd_vc1_chroma_mc4_mmi,
ff_avg_no_rnd_vc1_chroma_mc8_mmi, ff_avg_no_rnd_vc1_chroma_mc4_mmi.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

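For illustration, here is a hypothetical DSP function-pointer type with a ptrdiff_t stride and a matching C implementation. chroma_mc_fn and put_block4_c are made-up names, and the signature is simplified. If the implementation declared its stride as int instead, assigning it to the pointer would produce an incompatible-pointer-type warning. That is presumably the kind of warning fixed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DSP function-pointer type whose stride is ptrdiff_t. */
typedef void (*chroma_mc_fn)(uint8_t *dst, const uint8_t *src,
                             ptrdiff_t stride, int h);

/* Matching implementation: copies a 4-byte-wide block of h rows.
 * ptrdiff_t matches pointer arithmetic and may be negative. */
static void put_block4_c(uint8_t *dst, const uint8_t *src,
                         ptrdiff_t stride, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = src[x];
        dst += stride;
        src += stride;
    }
}

/* Types agree exactly, so this assignment compiles without a warning. */
static const chroma_mc_fn put_block4 = put_block4_c;
```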

15 Sep, 2019 1 commit

avutil/mips: refactor msa SLDI_Bn_0 and SLDI_Bn macros. · 92fc0bfa

gxw authored 5 years ago

Details of the changes:
1. The previous parameter order was irregular and difficult to
   understand. Adjust the order of the parameters according to the
   rule: (RTYPE, input registers, input mask/input index/..., output registers).
   Most of the existing msa macros follow this rule.
2. Remove the redundant macro SLDI_Bn_0 and use SLDI_Bn instead.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

10 Sep, 2019 1 commit

avcodec/mips: Fix a warning about indentation not reflecting the block structure. · de5543d8

Shiyou Yin authored 5 years ago

The indentation of the code does not reflect the if-block structure in
'apply_ltp_mips', which generates a warning when building with
'-Wall' or '-Wmisleading-indentation'.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

13 Aug, 2019 1 commit

avutil/mips: refine msa macros CLIP_*. · a3e572d9

gxw authored 5 years ago

Details of the changes:
1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in the
source vector.
2. Refine the implementation of the macros 'CLIP_SH_0_255' and 'CLIP_SW_0_255'.
Performance of VP8 decoding improved by about 1.1% (from 7.03x to 7.11x).
Performance of H264 decoding improved by about 0.5% (from 4.35x to 4.37x).
Performance of Theora decoding improved by about 0.7% (from 5.79x to 5.83x).
3. Remove the redundant macros 'CLIP_SH/Wn_0_255_MAX_SATU' and use 'CLIP_SH/Wn_0_255'
instead, because there is no difference in the effect of these two macros.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

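As a reference for what the CLIP_SH_0_255 family computes, here is a scalar model of an in-place clip of eight signed halfwords to [0, 255]. The array stands in for one 128-bit MSA vector. This is a sketch, not the MSA macro. Writing back into the source mirrors change 1.

```c
#include <stdint.h>

/* Scalar model of one v8i16 vector clipped to [0, 255] in place:
 * no temporary output vector is used. */
static void clip_sh_0_255(int16_t v[8])
{
    for (int i = 0; i < 8; i++) {
        if (v[i] < 0)
            v[i] = 0;
        else if (v[i] > 255)
            v[i] = 255;
    }
}
```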

02 Aug, 2019 1 commit

avutil/mips: Avoid instruction exception caused by gssqc1/gslqc1. · 11f99a9a

Shiyou Yin authored 5 years ago

Ensure that the addresses accessed by gssqc1/gslqc1 are 16-byte aligned.

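Here is a small sketch of the two usual C11 tools for meeting that requirement. The first checks a pointer's low four bits. The second declares storage with alignas(16). The names is_aligned16 and block16 are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* True when p is a multiple of 16, i.e. safe for 16-byte
 * load/store instructions that require natural alignment. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Storage explicitly aligned for 16-byte accesses. */
typedef struct {
    alignas(16) uint8_t data[32];
} block16;
```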
28 Jul, 2019 1 commit

avcodec/mips: [loongson] refine process of setting block as 0 in h264dsp_mmi. · 62e6b634

Shiyou Yin authored 5 years ago

In function ff_h264_add_pixels4_8_mmi, there is no need to reset '%[ftmp0]'
to 0, because its value never changes after the start of the asm block.
This patch removes the redundant 'xor' and sets src to zero once it has been loaded.

In function ff_h264_idct_add_8_mmi, 'block' is set to zero twice.
This patch removes the first zeroing operation and moves the second one
after the load operation of block.

In function ff_h264_idct8_add_8_mmi, 'block' is also set to zero twice.
This patch just removes the second zeroing operation.

This patch mainly simplifies the implementation of the functions above;
the effect on the performance of the whole H264 decoding process is not obvious.
According to the perf data, the proportion of ff_h264_idct_add_8_mmi decreased from
0.29% to 0.26% and ff_h264_idct8_add_8_mmi decreased from 0.62% to 0.59% when decoding
H264 on Loongson 3A3000 (for reference only; the data is not very stable).
Reviewed-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

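Here is a scalar sketch of the load-then-zero pattern described above. It assumes a 4x4 block of int16_t coefficients added to 8-bit pixels. Each coefficient is cleared right after it is read, so no separate zeroing pass is needed. The function name is hypothetical, and the addition is plain modulo-256 arithmetic. It is not a transcription of the MMI code.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a 4x4 coefficient block to pixels and clear the block in the
 * same pass: load a coefficient, zero it, then use the loaded value. */
static void add_pixels4_and_clear(uint8_t *pix, int16_t *block,
                                  ptrdiff_t stride)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int16_t c = block[4 * y + x]; /* load ... */
            block[4 * y + x] = 0;         /* ... then zero immediately */
            pix[x] = (uint8_t)(pix[x] + c);
        }
        pix += stride;
    }
}
```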

18 Jul, 2019 1 commit

avutil/mips: refactor msa load and store macros. · 153c6075

Shiyou Yin authored 5 years ago

Replace STnxm_UB and LDnxm_SH with new macros ST_{H/W/D}{1/2/4/8}.
The old macros are difficult to use because they don't follow the same parameter passing rules.
Details of the changes:
1. remove LD4x4_SH.
2. replace ST2x4_UB with ST_H4.
3. replace ST4x2_UB with ST_W2.
4. replace ST4x4_UB with ST_W4.
5. replace ST4x8_UB with ST_W8.
6. replace ST6x4_UB with ST_W2 and ST_H2.
7. replace ST8x1_UB with ST_D1.
8. replace ST8x2_UB with ST_D2.
9. replace ST8x4_UB with ST_D4.
10. replace ST8x8_UB with ST_D8.
11. replace ST12x4_UB with ST_D4 and ST_W4.

Example of a new macro: ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride)
ST_H4 stores four half-word elements of vector 'in' to pdst with stride.
About the macro name:
1) 'ST' means store operation.
2) 'H/W/D' means the vector element type is 'half-word/word/double-word'.
3) The number '1/2/4/8' means how many elements will be stored.
About the macro parameters:
1) 'in0, in1...' 128-bit vectors.
2) 'idx0, idx1...' element indices.
3) 'pdst' destination pointer to store to.
4) 'stride' stride of each store operation.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

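Here is a portable scalar model of the ST_H4 semantics described above. It assumes the vector is represented as eight uint16_t lanes (v8u16 is a made-up type) and that stride is in bytes. memcpy keeps each 2-byte store valid at any alignment. This is not the MSA implementation.

```c
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for a 128-bit vector of eight half-words. */
typedef struct { uint16_t h[8]; } v8u16;

/* Store half-word element idx_k of 'in' to pdst + k * stride (bytes). */
#define ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride)       \
    do {                                                      \
        uint8_t *pd_m = (uint8_t *)(pdst);                    \
        memcpy(pd_m,                &(in).h[idx0], 2);        \
        memcpy(pd_m + (stride),     &(in).h[idx1], 2);        \
        memcpy(pd_m + 2 * (stride), &(in).h[idx2], 2);        \
        memcpy(pd_m + 3 * (stride), &(in).h[idx3], 2);        \
    } while (0)
```

Usage: `ST_H4(v, 7, 5, 3, 1, out, 8)` writes lanes 7, 5, 3 and 1 of `v` to four rows spaced 8 bytes apart.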

10 Jul, 2019 1 commit

avcodec/mips/cabac: replace addi with addiu · 925e33b2

YunQiang Su authored 5 years ago

addi/daddi have been deprecated by MIPS for years, and MIPS r6 removes
them.

They should be replaced with addiu:
   ADDIU performs the same arithmetic operation but
   does not trap on overflow.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

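Here is a scalar model of the difference for a 32-bit register and a sign-extended 16-bit immediate. ADDIU wraps modulo 2^32, while ADDI raises an exception on signed overflow. The helper names are hypothetical, and the trap is modelled as a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* ADDIU: add the sign-extended immediate, wrap modulo 2^32, never trap. */
static uint32_t addiu(uint32_t rs, int16_t imm)
{
    return rs + (uint32_t)(int32_t)imm;
}

/* ADDI: same sum, but signed 32-bit overflow raises an exception.
 * Here we only report whether it would have trapped. */
static bool addi_would_trap(int32_t rs, int16_t imm)
{
    int64_t sum = (int64_t)rs + imm;
    return sum > INT32_MAX || sum < INT32_MIN;
}
```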

26 May, 2019 1 commit

avcodec/mips: [loongson] fix mpeg4 decoding error on loongson platform. · 6b67daa3

Shiyou Yin authored 5 years ago

In function ff_dct_unquantize_mpeg2_intra_mmi,
addr0 should not be changed before the store operation.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

27 Feb, 2019 1 commit

avcodec/mips: [loongson] mmi optimizations for VP9 put and avg functions · 4571c7c0

gxw authored 6 years ago

VP9 decoding speed improved by about 60.5% (from 38fps to 61fps, tested on Loongson 3A3000).
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

16 Feb, 2019 1 commit

avcodec/mips: [loongson] optimize theora decoding with mmi. · 1466dc14

gxw authored 6 years ago

Optimize theora decoding with mmi in functions:
1. ff_vp3_idct_add_mmi
2. ff_vp3_idct_put_mmi
3. ff_vp3_idct_dc_add_mmi
4. ff_put_no_rnd_pixels_l2_mmi

Theora decoding speed improved by about 32% (from 88fps to 116fps, tested on Loongson 3A3000).
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

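For reference, this shows the difference between a rounding average and a no_rnd average. It also includes a scalar sketch of an l2-style routine that averages two sources without rounding. The parameter layout of put_no_rnd_l2 is an assumption, not the signature of ff_put_no_rnd_pixels_l2_mmi.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounding average: ties round up. */
static inline uint8_t avg_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* "no_rnd" average: ties round down. */
static inline uint8_t avg_no_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* Average two w x h source blocks into dst without rounding. */
static void put_no_rnd_l2(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                          ptrdiff_t stride, int w, int h)
{
    for (int y = 0; y < h; y++, dst += stride, a += stride, b += stride)
        for (int x = 0; x < w; x++)
            dst[x] = avg_no_rnd(a[x], b[x]);
}
```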

02 Feb, 2019 4 commits

avcodec/mips: [loongson] optimize put_hevc_qpel_h_8 with mmi. · b429c86d

Shiyou Yin authored 6 years ago

Optimize put_hevc_qpel_h_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance by 2% (from 2.39x to 2.44x, tested on Loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

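As background for the qpel functions in this and the following commits, here is a scalar sketch of a horizontal quarter-sample pass. It applies an 8-tap filter using the HEVC luma interpolation coefficients. Three details are assumptions for the sketch: the function name, the unshifted 64x-scaled int16_t output, and the edge handling (the caller supplies 3 samples of left context and 4 of right).

```c
#include <stdint.h>

/* HEVC luma interpolation taps for fractional positions 1/4, 2/4, 3/4;
 * each row sums to 64. */
static const int8_t qpel_taps[3][8] = {
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* One row, 8-bit input, mx in 1..3: dst[x] = sum_k taps[k] * src[x+k-3]. */
static void qpel_h_row(int16_t *dst, const uint8_t *src, int width, int mx)
{
    const int8_t *f = qpel_taps[mx - 1];
    for (int x = 0; x < width; x++) {
        int sum = 0;
        for (int k = 0; k < 8; k++)
            sum += f[k] * src[x + k - 3];
        dst[x] = (int16_t)sum;
    }
}
```

Because each tap row sums to 64, a flat input of value v yields 64 * v at every position.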

avcodec/mips: [loongson] optimize put_hevc_qpel_bi_h_8 with mmi. · dceefb2b

Shiyou Yin authored 6 years ago

Optimize put_hevc_qpel_bi_h_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance by 2.1% (from 2.34x to 2.39x, tested on Loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/mips: [loongson] optimize put_hevc_epel_bi_hv_8 with mmi. · c0942b7a

Shiyou Yin authored 6 years ago

Optimize put_hevc_epel_bi_hv_8 with mmi in the case width=4/8/12/16/24/32.
This optimization improved HEVC decoding performance by 1.7% (from 2.30x to 2.34x, tested on Loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/mips: [loongson] optimize put_hevc_qpel_uni_hv_8 with mmi. · 0c434292

Shiyou Yin authored 6 years ago

Optimize put_hevc_qpel_uni_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance by 2.7% (from 2.24x to 2.30x, tested on Loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

21 Jan, 2019 2 commits

avcodec/mips: [loongson] optimize put_hevc_qpel_bi_hv_8 with mmi. · 83aa2cd7

Shiyou Yin authored 6 years ago

Optimize put_hevc_qpel_bi_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance by 11.4% (from 2.01x to 2.24x, tested on Loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/mips: [loongson] optimize put_hevc_qpel_hv_8 with mmi. · 6d191648

Shiyou Yin authored 6 years ago

Optimize put_hevc_qpel_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance by 11% (from 1.81x to 2.01x, tested on Loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

19 Jan, 2019 1 commit

avcodec/mips: [loongson] optimize put_hevc_pel_bi_pixels_8 with mmi. · 32421602

Shiyou Yin authored 6 years ago

Optimize put_hevc_pel_bi_pixels_8 with mmi in the case width=8/16/24/32/48/64.
This optimization improved HEVC decoding performance by 2% (from 1.77x to 1.81x, tested on Loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

27 Dec, 2018 1 commit

avcodec/mips: [loongson] optimize theora decoding in vp3dsp. · d86f698e

gxw authored 6 years ago

Optimize theora decoding with msa in functions:
1. ff_vp3_idct_add_msa
2. ff_vp3_idct_put_msa
3. ff_vp3_idct_dc_add_msa
4. ff_vp3_v_loop_filter_msa
5. ff_vp3_h_loop_filter_msa
6. ff_put_no_rnd_pixels_l2_msa

Theora decoding speed improved by about 36% (from 22fps to 30fps, tested on Loongson 2K1000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

24 Dec, 2018 1 commit

avcodec/mips: Fix failed case: hevc-conformance-AMP_A_Samsung_* when enable msa · f652c7a4

gxw authored 6 years ago

AV_INPUT_BUFFER_PADDING_SIZE has been increased to 64, but the value 32 was still used
in function ff_hevc_sao_edge_filter_8_msa. So, use AV_INPUT_BUFFER_PADDING_SIZE directly.
Also, use MAX_PB_SIZE directly instead of 64. FATE tests pass.
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

18 Dec, 2018 1 commit

avcodec/mips: [loongson] enable MSA optimization for loongson platform. · 76952aa4

Shiyou Yin authored 6 years ago

Initialize MSA after MMI so that the MSA optimizations take effect on Loongson platforms (MSA is supported by Loongson 2K, 3A4000, etc.).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

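Here is a sketch of why the order matters. It uses a hypothetical context with one function pointer. Every init step overwrites the pointers it implements, so the step that runs last wins. Running the MSA step after the MMI step selects the MSA version on CPUs that support both.

```c
/* Hypothetical DSP context with a single function pointer. */
typedef struct {
    int (*idct)(void);
} DSPContext;

static int idct_c(void)   { return 0; }
static int idct_mmi(void) { return 1; }
static int idct_msa(void) { return 2; }

/* Later steps overwrite earlier ones, so the last applicable step wins. */
static void dsp_init(DSPContext *c, int have_mmi, int have_msa)
{
    c->idct = idct_c;
    if (have_mmi)
        c->idct = idct_mmi;   /* MMI step */
    if (have_msa)
        c->idct = idct_msa;   /* MSA step runs after MMI */
}
```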

01 Dec, 2018 1 commit

avcodec/mips: [loongson] refine optimization in h264_chroma. · 5982614a

Shiyou Yin authored 6 years ago

Remove an invalid operation in the case where x and y both equal 0.
This refinement gives about a 2% speedup for H264 decoding on Loongson platforms.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

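The commit does not say which operation was removed. However, the scalar H.264 chroma interpolation shows why the x = y = 0 case needs no filtering. The weights become A = 64 and B = C = D = 0, so (64 * s + 32) >> 6 == s and the result is a plain copy. The function below is a simplified sketch with a fixed width of 8 and a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-wide H.264-style chroma MC; x, y are eighth-sample offsets (0..7).
 * Bilinear weights A=(8-x)(8-y), B=x(8-y), C=(8-x)y, D=xy sum to 64. */
static void put_chroma_mc8(uint8_t *dst, const uint8_t *src,
                           ptrdiff_t stride, int h, int x, int y)
{
    const int A = (8 - x) * (8 - y), B = x * (8 - y);
    const int C = (8 - x) * y,       D = x * y;

    if (x == 0 && y == 0) {
        /* A == 64: the filter reduces to a copy, so skip the arithmetic. */
        for (int i = 0; i < h; i++, dst += stride, src += stride)
            for (int j = 0; j < 8; j++)
                dst[j] = src[j];
        return;
    }
    for (int i = 0; i < h; i++, dst += stride, src += stride)
        for (int j = 0; j < 8; j++)
            dst[j] = (uint8_t)((A * src[j] + B * src[j + 1] +
                                C * src[j + stride] +
                                D * src[j + stride + 1] + 32) >> 6);
}
```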

19 Sep, 2018 1 commit

avcodec: [loongson] optimize get_cabac_inline. · ba175578

Shiyou Yin authored 6 years ago

This optimization improved H264 decoding performance by about 4% (from 74fps to 77fps, tested on Loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

18 Sep, 2018 1 commit

avcodec/mips: [loongson] refine ff_vc1_inv_trans_8x8_mmi. · 2b646dac

Shiyou Yin authored 6 years ago

Combine the first and second loops into a single inline asm block in ff_vc1_inv_trans_8x8_mmi to
reduce memory operations, and make some small optimizations in ff_vc1_inv_trans_4x8_mmi.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

13 Sep, 2018 1 commit

avcodec/mips: [loongson] fix bug of svq3-watermark failed in fate test. · a55adf24

Shiyou Yin authored 6 years ago

Failed case: svq3-watermark
svq3-watermark failed when the minimum loop count of the following functions was greater than the parameter h passed to them:
1. ff_put_pixels4_8_mmi
2. ff_avg_pixels4_8_mmi
3. ff_put_pixels4_l2_8_mmi
4. ff_avg_pixels4_l2_8_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

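Here is a sketch of a loop structure that cannot overrun h. It assumes an unroll of 4 rows per iteration, which the message does not state. Full groups of 4 are processed first, then the remaining rows one at a time, so the routine never touches more than h rows. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a 4-byte-wide block of h rows without ever exceeding h. */
static void put_pixels4(uint8_t *dst, const uint8_t *src,
                        ptrdiff_t stride, int h)
{
    int i = 0;
    for (; i + 4 <= h; i += 4)          /* unrolled: whole groups of 4 rows */
        for (int r = 0; r < 4; r++)
            for (int x = 0; x < 4; x++)
                dst[(i + r) * stride + x] = src[(i + r) * stride + x];
    for (; i < h; i++)                  /* tail: fewer than 4 rows left */
        for (int x = 0; x < 4; x++)
            dst[i * stride + x] = src[i * stride + x];
}
```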

09 Sep, 2018 2 commits

avutil/mips: [loongson] simplify macro TRANSPOSE_4H and TRANSPOSE_8B · 5161f7bc

Shiyou Yin authored 6 years ago

Simplify macro TRANSPOSE_4H in mmiutils.h and add TRANSPOSE_8B as a common macro.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

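For reference, here is a scalar model of what a 4x4 half-word transpose computes. Four rows of four int16_t lanes (one 64-bit MMI register each) become the columns. transpose_4h is a hypothetical name and does not reflect the macro's interface.

```c
#include <stdint.h>

/* In-place 4x4 transpose: m[r][c] <-> m[c][r]. */
static void transpose_4h(int16_t m[4][4])
{
    for (int r = 0; r < 4; r++)
        for (int c = r + 1; c < 4; c++) {
            int16_t t = m[r][c];
            m[r][c] = m[c][r];
            m[c][r] = t;
        }
}
```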

avcodec/mips: [loongson] optimize vp8 decoding in vp8dsp. · 090647da

gxw authored 6 years ago

Optimize vp8 loop filter with mmi, four functions optimized:
1. ff_vp8_h_loop_filter8uv_mmi.
2. ff_vp8_v_loop_filter8uv_mmi.
3. ff_vp8_h_loop_filter16_mmi.
4. ff_vp8_v_loop_filter16_mmi.

VP8 decoding speed improved by about 50% (from 73fps to 110fps, tested on Loongson 3A3000).
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

07 Sep, 2018 1 commit

avcodec/mips: [loongson] fix improper use of register constraints. · 9f60c585

Shiyou Yin authored 6 years ago

Constraint "g" allows the compiler to place a variable in memory or in a register.
If a variable with constraint "g" is used as an operand of an instruction that
only supports register operands, an "invalid operands" error may result.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

05 Sep, 2018 1 commit

avcodec/mips: [loongson] reoptimize put and add pixels clamped functions. · 776909e4

Shiyou Yin authored 6 years ago

Simplify the usage of the intermediate variable addr and remove the unused variable all64
in the following functions:
1. ff_put_pixels_clamped_mmi
2. ff_put_signed_pixels_clamped_mmi
3. ff_add_pixels_clamped_mmi

This optimization speeds up MPEG-4 decoding by about 2% on Loongson platforms (tested on 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

04 Sep, 2018 2 commits

avcodec/mips: [loongson] simplify the usage of intermediate variable addr. · 17c635e6

Shiyou Yin authored 6 years ago

Simplify the usage of the intermediate variable addr in the following functions:
1. ff_put_pixels4_8_mmi
2. ff_put_pixels8_8_mmi
3. ff_put_pixels16_8_mmi
4. ff_avg_pixels16_8_mmi.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec: [loongson] fix bug of mss2-wmv failed in fate test. · 61eeb40a

Shiyou Yin authored 6 years ago

Failed case: mss2-wmv
In the following functions, pmullh was used to multiply two 16-bit values, which causes overflow:
1. ff_vc1_inv_trans_8x8_dc_mmi
2. ff_vc1_inv_trans_8x8_mmi
3. ff_vc1_inv_trans_8x4_mmi
4. ff_vc1_inv_trans_4x8_mmi
5. ff_vc1_inv_trans_4x4_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

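Here is a scalar illustration of the overflow. pmullh keeps only the low 16 bits of each 16 x 16-bit product, so a product such as 300 * 200 = 60000 wraps. Widening to 32 bits keeps the exact value. The helper names are hypothetical.

```c
#include <stdint.h>

/* Low 16 bits of the product, as pmullh produces per lane. The final
 * conversion to int16_t wraps on GCC/Clang (implementation-defined in C). */
static int16_t mul_low16(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((int32_t)a * b);
}

/* Full-precision product: no truncation. */
static int32_t mul_full(int16_t a, int16_t b)
{
    return (int32_t)a * b;
}
```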

02 Sep, 2018 3 commits

avcodec/mips: [loongson] optimize memset in h264dsp. · 93b35a05

Shiyou Yin authored 6 years ago

Optimize memset with mmi in the following functions:
1. ff_h264_add_pixels4_8_mmi.
2. ff_h264_idct_add_8_mmi.
3. ff_h264_idct8_add_8_mmi.

This optimization improved H264 decoding performance by about 1.3% (tested on Loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/mips: [loongson] reoptimize h264_chroma_mc8_mmi v2. · f91237ba

Shiyou Yin authored 6 years ago

Reoptimize functions ff_put_h264_chroma_mc8_mmi and ff_avg_h264_chroma_mc8_mmi.
Performance of H264 decoding improved by about 5% (from 69fps to 73fps, tested on Loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/mips: [loongson] reoptimize simple idct with mmi. · df13b75a

Shiyou Yin authored 6 years ago

Performance of MPEG-4 decoding improved by about 23% (from 128fps to 158fps, tested on Loongson 3A3000).
Reoptimized the following functions with mmi:
1. ff_simple_idct_put_8_mmi
2. ff_simple_idct_add_8_mmi
3. ff_simple_idct_8_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

14 Jul, 2018 1 commit

avcodec/mips: fix conflicting types error of ff_vc1_h_s_overlap_mmi. · c0b42987

Shiyou Yin authored 6 years ago

In commit 975a1a81, function ff_vc1_h_s_overlap_mmi was refactored,
but the declaration in libavcodec/mips/vc1dsp_mips.h was unchanged.

Change-Id: I90beae683511622a0cc1130ab1660ac8669ec3ef
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Reviewed-by: Jerome Borsboom <jerome.borsboom@carpalis.nl>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

28 Jun, 2018 1 commit

avcodec/vc1: fix overlap filter for frame interlaced pictures · 975a1a81

Jerome Borsboom authored 6 years ago

The overlap filter is not correct for vertical edges in frame interlaced
I and P pictures. When filtering macroblocks with different FIELDTX values,
we have to match the lines at both sides of the vertical border. In addition,
we have to use the correct rounding values, depending on the line we are
filtering.
Signed-off-by: Jerome Borsboom <jerome.borsboom@carpalis.nl>
