- 15 Sep, 2019 1 commit
-
-
gxw authored
Details of the change:
1. The previous parameter order was irregular and difficult to understand. Reorder the parameters according to the rule (RTYPE, input registers, input mask/input index/..., output registers), which most of the existing MSA macros already follow.
2. Remove the redundant macro SLDI_Bn_0 and use SLDI_Bn instead.
Reviewed-by:
Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 14 Aug, 2019 1 commit
-
-
Shiyou Yin authored
Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 13 Aug, 2019 1 commit
-
-
gxw authored
Details of the change:
1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in the source vector.
2. Refine the implementation of the macros 'CLIP_SH_0_255' and 'CLIP_SW_0_255'. VP8 decoding speeds up by about 1.1% (from 7.03x to 7.11x), H264 decoding by about 0.5% (from 4.35x to 4.37x), and Theora decoding by about 0.7% (from 5.79x to 5.83x).
3. Remove the redundant macros 'CLIP_SH/Wn_0_255_MAX_SATU' and use 'CLIP_SH/Wn_0_255' instead, because the two macros have the same effect.
Reviewed-by:
Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 02 Aug, 2019 1 commit
-
-
Shiyou Yin authored
Ensure the addresses accessed by gssqc1/gslqc1 are 16-byte aligned.
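A 16-byte alignment requirement like the one above can be checked or enforced in plain C. The following is a minimal sketch; the helper names `is_aligned16_sketch` and `align_up16_sketch` are hypothetical and do not come from the patch, which does not show how it guarantees alignment.

```c
#include <stdint.h>

/* Return non-zero when p sits on a 16-byte boundary. */
static int is_aligned16_sketch(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Round p up to the next 16-byte boundary (p itself if already aligned).
 * The caller must make sure the buffer has up to 15 spare bytes. */
static uint8_t *align_up16_sketch(uint8_t *p)
{
    uintptr_t addr = (uintptr_t)p;
    return p + ((16 - (addr & 15)) & 15);
}
```

A check of this kind is useful in a debug assertion before a code path that assumes aligned access; the rounding helper is one way to carve an aligned region out of a larger scratch buffer.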
-
- 18 Jul, 2019 1 commit
-
-
Shiyou Yin authored
Replace STnxm_UB and LDnxm_SH with new macros ST_{H/W/D}{1/2/4/8}. The old macros are difficult to use because they don't follow the same parameter passing rules.
Details of the change:
1. Remove LD4x4_SH.
2. Replace ST2x4_UB with ST_H4.
3. Replace ST4x2_UB with ST_W2.
4. Replace ST4x4_UB with ST_W4.
5. Replace ST4x8_UB with ST_W8.
6. Replace ST6x4_UB with ST_W2 and ST_H2.
7. Replace ST8x1_UB with ST_D1.
8. Replace ST8x2_UB with ST_D2.
9. Replace ST8x4_UB with ST_D4.
10. Replace ST8x8_UB with ST_D8.
11. Replace ST12x4_UB with ST_D4 and ST_W4.
Example of a new macro: ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride)
ST_H4 stores four half-word elements of vector 'in' to pdst with stride.
About the macro name:
1) 'ST' means store operation.
2) 'H/W/D' means the type of vector element is 'half-word/word/double-word'.
3) The number '1/2/4/8' means how many elements will be stored.
About the macro parameters:
1) 'in0, in1...': 128-bit vectors.
2) 'idx0, idx1...': element indices.
3) 'pdst': destination pointer to store to.
4) 'stride': stride of each store operation.
Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
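The ST_H4 description above can be illustrated with a portable C model. This is a sketch under stated assumptions, not the macro from the patch: `vec8h_sketch`, `st_h1_sketch` and `st_h4_sketch` are hypothetical names, the struct stands in for a 128-bit vector viewed as eight half-words, and the stride is assumed to be in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector of eight half-words. */
typedef struct { uint16_t h[8]; } vec8h_sketch;

/* Store the half-word in lane idx to dst; memcpy tolerates unaligned dst. */
static void st_h1_sketch(const vec8h_sketch *in, int idx, uint8_t *dst)
{
    assert(idx >= 0 && idx < 8);
    memcpy(dst, &in->h[idx], sizeof(uint16_t));
}

/* Store four selected lanes to four rows that are stride bytes apart
 * (byte units are an assumption of this sketch). */
static void st_h4_sketch(const vec8h_sketch *in,
                         int idx0, int idx1, int idx2, int idx3,
                         uint8_t *pdst, ptrdiff_t stride)
{
    st_h1_sketch(in, idx0, pdst);
    st_h1_sketch(in, idx1, pdst + stride);
    st_h1_sketch(in, idx2, pdst + 2 * stride);
    st_h1_sketch(in, idx3, pdst + 3 * stride);
}
```

The argument order in the sketch follows the one quoted in the commit message: input vector first, then the element indices, then the destination pointer and stride.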
-
- 10 Jul, 2019 1 commit
-
-
Shiyou Yin authored
Loongson 3A4000 and 2k1000 support MSA2.0. This patch optimizes SAD_UB2_UH, UNPCK_R_SH_SW, UNPCK_SB_SH and UNPCK_SH_SW with MSA2.0 instructions. Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 27 Feb, 2019 1 commit
-
-
gxw authored
VP9 decoding speed improved by about 60.5% (from 38fps to 61fps, tested on Loongson 3A3000). Reviewed-by:
Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 21 Jan, 2019 1 commit
-
-
Shiyou Yin authored
Optimize put_hevc_qpel_hv_8 with mmi for width=4/8/12/16/24/32/48/64. This optimization improves HEVC decoding performance by 11% (from 1.81x to 2.01x, tested on Loongson 3A3000). Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 09 Sep, 2018 2 commits
-
-
Shiyou Yin authored
Simplify macro TRANSPOSE_4H in mmiutils.h and add TRANSPOSE_8B as a common macro. Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
gxw authored
Optimize the VP8 loop filter with mmi. Four functions are optimized:
1. ff_vp8_h_loop_filter8uv_mmi.
2. ff_vp8_v_loop_filter8uv_mmi.
3. ff_vp8_h_loop_filter16_mmi.
4. ff_vp8_v_loop_filter16_mmi.
VP8 decoding speed improved by about 50% (from 73fps to 110fps, tested on Loongson 3A3000). Signed-off-by:
Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 02 Sep, 2018 1 commit
-
-
Shiyou Yin authored
MPEG-4 decoding performance improved by about 23% (from 128fps to 158fps, tested on Loongson 3A3000). The following functions are reoptimized with mmi:
1. ff_simple_idct_put_8_mmi
2. ff_simple_idct_add_8_mmi
3. ff_simple_idct_8_mmi
Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 25 Oct, 2017 1 commit
-
-
Kaustubh Raste authored
Use immediate unsigned saturation for clip to max saving one vector register. Signed-off-by:
Kaustubh Raste <kaustubh.raste@imgtec.com> Reviewed-by:
Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 10 Oct, 2017 1 commit
-
-
Kaustubh Raste authored
Replace generic with block size specific function. Signed-off-by:
Kaustubh Raste <kaustubh.raste@imgtec.com> Reviewed-by:
Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 27 Sep, 2017 1 commit
-
-
Kaustubh Raste authored
Replace generic with block size specific function. Signed-off-by:
Kaustubh Raste <kaustubh.raste@imgtec.com> Reviewed-by:
Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 24 Sep, 2017 1 commit
-
-
Kaustubh Raste authored
Load the specific destination bytes instead of MSA load and pack. Pack the data to half word before clipping. Use immediate unsigned saturation for clip to max saving one vector register. Signed-off-by:
Kaustubh Raste <kaustubh.raste@imgtec.com> Reviewed-by:
Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 15 Sep, 2017 1 commit
-
-
Kaustubh Raste authored
Preload data in band filter 0-8 for better pipeline parallelization. Signed-off-by:
Kaustubh Raste <kaustubh.raste@imgtec.com> Reviewed-by:
Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 08 Sep, 2017 1 commit
-
-
Kaustubh Raste authored
Load the specific destination bytes instead of MSA load and pack. Signed-off-by:
Kaustubh Raste <kaustubh.raste@imgtec.com> Reviewed-by:
Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 25 Jul, 2017 1 commit
-
-
Kaustubh Raste authored
Removed memset call and improved performance. Signed-off-by:
Kaustubh Raste <kaustubh.raste@imgtec.com> Reviewed-by:
Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 21 Jul, 2017 1 commit
-
-
Kaustubh Raste authored
Reduced msa load-store code. Removed inline asm of GP load-store for 64 bit. Updated variable names in GP load-store macros for naming consistency. Corrected macro descriptions. Signed-off-by:
Kaustubh Raste <kaustubh.raste@imgtec.com> Reviewed-by:
Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 23 Oct, 2016 1 commit
-
-
Zhou Xiaoyong authored
1. mmiutils.h defines MMI_ load/store macros for loongson2e/2f/3a.
2. mmiutils.h defines some mmi assembly macros.
Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 05 Oct, 2016 1 commit
-
-
Shivraj Patil authored
Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 14 May, 2016 1 commit
-
-
ZhouXiaoyong authored
Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 09 Mar, 2016 1 commit
-
-
Vicente Olivert Riera authored
Understanding the mips32r6 and mips64r6 ISAs in the configure script is not enough. In order to have full support for MIPS R6 in FFmpeg we need to be able to build it, and for that we need to make sure we don't use incompatible assembler code which makes the build fail. Ifdefing the offending code is sufficient to fix the problem. Signed-off-by:
Vicente Olivert Riera <Vincent.Riera@imgtec.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 31 Jan, 2016 1 commit
-
-
Timothy Gu authored
-
- 29 Sep, 2015 1 commit
-
-
Vicente Olivert Riera authored
MIPS R6 supports unaligned memory access and does not have the load/store-left/right family of instructions. Signed-off-by: Vicente Olivert Riera <Vincent.Riera at imgtec.com> Signed-off-by: Luca Barbato <lu_zero at gentoo.org> Signed-off-by:
Luca Barbato <lu_zero@gentoo.org>
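Code that must not depend on load/store-left/right instructions can express an unaligned access portably and let the compiler choose the instructions. The sketch below uses `memcpy` for that; the helper names are hypothetical and this is not necessarily how the patch itself handles MIPS R6.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee. */
static uint32_t load_u32_unaligned_sketch(const uint8_t *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof(value));
    return value;
}

/* Write a 32-bit value to an address with no alignment guarantee. */
static void store_u32_unaligned_sketch(uint8_t *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof(value));
}
```

Both helpers use the host byte order, so a value stored with one and loaded with the other on the same machine round-trips unchanged.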
-
- 23 Jul, 2015 1 commit
-
-
Shivraj Patil authored
Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Reviewed-by:
"Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by:
Michael Niedermayer <michael@niedermayer.cc>
-
- 07 Jul, 2015 1 commit
-
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for idctdsp functions in new file idctdsp_msa.c and simple_idct_msa.c Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 06 Jul, 2015 2 commits
-
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for me_cmp functions in new file me_cmp_msa.c Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for mpegvideoencdsp functions in new file mpegvideoencdsp_msa.c Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 01 Jul, 2015 1 commit
-
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for mpegvideo functions in new file mpegvideo_msa.c Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 29 Jun, 2015 1 commit
-
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for pixblock functions in new file pixblockdsp_msa.c Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 19 Jun, 2015 1 commit
-
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for hpel functions in new file hpeldsp_msa.c Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 18 Jun, 2015 1 commit
-
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for qpel functions in new file qpeldsp_msa.c Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 13 Jun, 2015 1 commit
-
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for AVC qpel functions in new file h264qpel_msa.c Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h Added const to local static array. Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 11 Jun, 2015 3 commits
-
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for AVC idct functions in new file h264idct_msa.c Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for AVC intra prediction functions in new file h264pred_msa.c Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for AVC chroma mc functions in new file h264chroma_msa.c Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 10 Jun, 2015 2 commits
-
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC intra prediction functions in new file hevcpred_msa.c Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC loop filter and sao functions in new file hevc_lpf_sao_msa.c Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h In this patch, in comparison with the previous patch, duplicated C functions are removed. Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 04 Jun, 2015 1 commit
-
-
Shivraj Patil authored
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC idct functions in new file hevc_idct_msa.c Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h Signed-off-by:
Shivraj Patil <shivraj.patil@imgtec.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-