- 20 Mar, 2019 2 commits
-
-
Lauri Kasanen authored
-
Lauri Kasanen authored
-
- 05 Feb, 2019 1 commit
-
-
Lauri Kasanen authored
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
-s 1920x1728 -f null -vframes 100 -v error -nostats -

9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
Fate passes, each format tested with an image to video conversion.

Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out of the 16-bit function. This includes the vec_mulo/mule functions too, not just vmuluwm.

With TIMER_REPORT skips disabled:

yuv420p9le
12412 UNITS in planarX, 131072 runs, 0 skips
73136 UNITS in planarX, 131072 runs, 0 skips

yuv420p9be
12481 UNITS in planarX, 131072 runs, 0 skips
73410 UNITS in planarX, 131072 runs, 0 skips

yuv420p10le
12322 UNITS in planarX, 131072 runs, 0 skips
72546 UNITS in planarX, 131072 runs, 0 skips

yuv420p10be
12291 UNITS in planarX, 131072 runs, 0 skips
72935 UNITS in planarX, 131072 runs, 0 skips

yuv420p12le
12316 UNITS in planarX, 131072 runs, 0 skips
72708 UNITS in planarX, 131072 runs, 0 skips

yuv420p12be
12319 UNITS in planarX, 131072 runs, 0 skips
72577 UNITS in planarX, 131072 runs, 0 skips

yuv420p14le
12259 UNITS in planarX, 131072 runs, 0 skips
72516 UNITS in planarX, 131072 runs, 0 skips

yuv420p14be
12440 UNITS in planarX, 131072 runs, 0 skips
72962 UNITS in planarX, 131072 runs, 0 skips

yuv420p16le
10548 UNITS in planarX, 131072 runs, 0 skips
73429 UNITS in planarX, 131072 runs, 0 skips

yuv420p16be
10634 UNITS in planarX, 131072 runs, 0 skips
150959 UNITS in planarX, 131072 runs, 0 skips

Signed-off-by: Lauri Kasanen <cand@gmx.com>
-
- 04 Dec, 2018 1 commit
-
-
Lauri Kasanen authored
Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied).
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Tested-by: Michael Kostylev on BE
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 26 Nov, 2018 1 commit
-
-
Lauri Kasanen authored
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
-f null -vframes 100 -v error -nostats -

1158 UNITS in planar1, 65528 runs, 8 skips

-cpuflags 0

19082 UNITS in planar1, 65533 runs, 3 skips

16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version takes as many cycles as the x86 SSE2 version, yikes it's fast.

Note that this function uses VSX instructions, but is not marked so. This is because several existing functions also make that mistake. I'll submit a patch moving them once this is reviewed.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 14 Aug, 2018 1 commit
-
-
Sergey Lavrushkin authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 09 Nov, 2016 1 commit
-
-
Michael Niedermayer authored
Found-by: Luca Barbato
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 12 Oct, 2016 1 commit
-
-
Michael Niedermayer authored
Implemented for AV_PIX_FMT_GBRP12.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
-
- 27 Sep, 2016 1 commit
-
-
Luca Barbato authored
It is used to select functions that work with 9-15bits.
-
- 31 Mar, 2016 1 commit
-
-
Pedro Arthur authored
Removed previous swscale code under '#ifndef NEW_FILTER' and removed unused fields of SwsContext
-
- 31 May, 2015 1 commit
-
-
Luca Barbato authored
In Little Endian the vec_ld/vec_st operations work as expected only for byte-vectors.
-
- 27 Apr, 2015 1 commit
-
-
Rong Yan authored
swscale/ppc/swscale_altivec.c: POWER LE support in yuv2planeX_8()
Delete the macro GET_VF(); it was wrong.
GCC had a bug in how it interpreted PPC intrinsics, which was fixed in GCC 4.9.1. This bug led to errors in two of our previous patches. We found this when we updated our GCC tools to 4.9.1 and read the related information on the GCC website. We fix our previous errors in two separate commits.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 14 Mar, 2015 1 commit
-
-
Christophe Gisquet authored
The latter may yield incorrect code for on-stack variables.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 12 Nov, 2014 1 commit
-
-
Rong Yan authored
libswscale/ppc/swscale_altivec.c: fix hScale_altivec_real(), yuv2planeX_16_altivec() and yuv2planeX_8() for little endian
Add macros GET_LS(), GET_VF(), LOAD_FILTER(), LOAD_L1(), GET_VF4(), FIRST_LOAD(), UPDATE_PTR(), LOAD_SRCV(), LOAD_SRCV8() and GET_VFD() for POWER LE.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 29 Aug, 2013 1 commit
-
-
Diego Biurrun authored
Also give consistent names to init functions.
-
- 08 Oct, 2012 1 commit
-
-
Anton Khirnov authored
-
- 05 Oct, 2012 1 commit
-
-
Mans Rullgard authored
This gets rid of the variable-length scratch buffer by filtering 16 pixels at a time and writing directly to the destination. The extra loads this requires to load the source values are compensated by not doing a round-trip to memory before shifting.
Signed-off-by: Mans Rullgard <mans@mansr.com>
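The block-at-a-time idea described above can be sketched in scalar C. This is only an illustration, not the AltiVec code from the commit: the function name vfilter_blocks, the 12-bit coefficients, the 19-bit shift and the rounding constant are assumptions chosen to resemble a typical 8-bit vertical filter. Each group of 16 outputs is accumulated in a small fixed-size array (standing in for vector registers), then shifted, clipped and stored straight to the destination, so no width-sized scratch buffer is needed.

```c
#include <stdint.h>

#define BLOCK 16 /* pixels handled per iteration, as in the commit text */

/* Clamp to the 8-bit output range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar sketch of a vertical filter:
 * dest[x] = clip((sum_j src[j][x] * filter[j] + round) >> 19).
 * Assumes 15-bit intermediate samples and coefficients summing to 4096. */
static void vfilter_blocks(const int16_t *filter, int filter_size,
                           const int16_t *const *src,
                           uint8_t *dest, int width)
{
    for (int x0 = 0; x0 < width; x0 += BLOCK) {
        int n = width - x0 < BLOCK ? width - x0 : BLOCK;
        int32_t acc[BLOCK]; /* fixed size, independent of width */

        for (int i = 0; i < n; i++)
            acc[i] = 1 << 18; /* rounding term (assumption) */

        /* Re-load the source rows for every block instead of keeping
         * a full-width intermediate line in memory. */
        for (int j = 0; j < filter_size; j++)
            for (int i = 0; i < n; i++)
                acc[i] += src[j][x0 + i] * filter[j];

        /* Shift, clip and write directly to the destination. */
        for (int i = 0; i < n; i++)
            dest[x0 + i] = acc[i] < 0 ? 0 : clip_u8(acc[i] >> 19);
    }
}
```

The trade-off is the one the commit message names: the source rows are read once per block, but the sums never make a round-trip through memory before the shift.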
-
- 22 Jul, 2012 1 commit
-
-
Diego Biurrun authored
-
- 04 Jul, 2012 1 commit
-
-
Michael Niedermayer authored
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 06 Mar, 2012 1 commit
-
-
Ronald S. Bultje authored
Fixes overflows for large image sizes. Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC: libav-stable@libav.org
-
- 21 Feb, 2012 1 commit
-
-
Diego Biurrun authored
-
- 25 Jan, 2012 1 commit
-
-
Diego Biurrun authored
-
- 22 Oct, 2011 2 commits
-
-
Ronald S. Bultje authored
-
Kieran Kunhya authored
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 25 Sep, 2011 1 commit
-
-
Mans Rullgard authored
Use uintptr_t instead of plain int. Without this change, the comparisons will come out wrong for pointers in certain ranges. Fixes random failures on ppc64. Also fixes some compiler warnings.
Signed-off-by: Mans Rullgard <mans@mansr.com>
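Why a plain int breaks pointer comparisons on a 64-bit target can be shown with a small sketch. It is not the code touched by the commit: the helper names are hypothetical, and addresses are modelled as uint64_t values so the example behaves the same on any host. The truncation to a 32-bit two's-complement int is computed by hand to keep the result fully defined.

```c
#include <stdint.h>

/* Value a 32-bit two's-complement int would hold if a 64-bit address
 * were squeezed into it (hypothetical helper, computed manually). */
static int64_t as_truncated_int(uint64_t addr)
{
    uint32_t low = (uint32_t)addr; /* upper 32 bits are lost */
    return low >= 0x80000000u ? (int64_t)low - INT64_C(4294967296)
                              : (int64_t)low;
}

/* "a < b" as seen when both addresses were stored in plain int. */
static int below_via_int(uint64_t a, uint64_t b)
{
    return as_truncated_int(a) < as_truncated_int(b);
}

/* "a < b" with a full-width unsigned type, standing in for uintptr_t
 * on a 64-bit platform. */
static int below_via_uintptr(uint64_t a, uint64_t b)
{
    return a < b;
}
```

With a = 0x7FFFFFF0 and b = 0x80000010, the full-width comparison says a is below b, while the int version sees b as negative and reports the opposite. Two addresses that differ only above bit 31 are likewise ordered by their low halves alone, which fits the "certain ranges" wording in the commit message.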
-
- 18 Aug, 2011 1 commit
-
-
Ronald S. Bultje authored
This allows using more specific implementations for chroma/luma, e.g. we can make assumptions on filterSize being constant, thus avoiding that test at runtime.
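A minimal sketch of the idea, assuming a simplified horizontal scaler: all names here (ScalerSketch, hscale_generic, hscale_4tap, pick_hscale, scaler_init) are hypothetical and do not come from swscale. Luma and chroma each get their own function pointer, chosen once at init time, so a variant that hard-codes the filter size can be used where it applies and the inner loop no longer tests it.

```c
#include <stdint.h>

typedef void (*hscale_fn)(int16_t *dst, int dst_w, const uint8_t *src,
                          const int16_t *filter, const int32_t *filter_pos,
                          int filter_size);

/* Clamp to the 15-bit intermediate range. */
static int16_t clip15(int v)
{
    return v > 32767 ? 32767 : (int16_t)v;
}

/* Generic version: filter_size is only known at run time.
 * The >> 7 assumes an arithmetic shift for negative sums. */
static void hscale_generic(int16_t *dst, int dst_w, const uint8_t *src,
                           const int16_t *filter, const int32_t *filter_pos,
                           int filter_size)
{
    for (int i = 0; i < dst_w; i++) {
        int val = 0;
        for (int j = 0; j < filter_size; j++)
            val += src[filter_pos[i] + j] * filter[filter_size * i + j];
        dst[i] = clip15(val >> 7);
    }
}

/* Specialised version: assumes exactly 4 taps, so no inner tap loop. */
static void hscale_4tap(int16_t *dst, int dst_w, const uint8_t *src,
                        const int16_t *filter, const int32_t *filter_pos,
                        int filter_size)
{
    (void)filter_size; /* fixed at 4 by construction */
    for (int i = 0; i < dst_w; i++) {
        const uint8_t *s = src + filter_pos[i];
        const int16_t *f = filter + 4 * i;
        int val = s[0] * f[0] + s[1] * f[1] + s[2] * f[2] + s[3] * f[3];
        dst[i] = clip15(val >> 7);
    }
}

/* Separate pointers for luma and chroma (hypothetical context struct). */
typedef struct ScalerSketch {
    hscale_fn luma;
    hscale_fn chroma;
} ScalerSketch;

static hscale_fn pick_hscale(int filter_size)
{
    return filter_size == 4 ? hscale_4tap : hscale_generic;
}

/* The size check happens once here, not per pixel. */
static void scaler_init(ScalerSketch *c, int luma_size, int chroma_size)
{
    c->luma   = pick_hscale(luma_size);
    c->chroma = pick_hscale(chroma_size);
}
```

Because the planes are dispatched independently, a 4-tap luma filter can use the fixed-size routine even when the chroma filter has a different length.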
-
- 12 Aug, 2011 2 commits
-
-
Luca Barbato authored
It just does that part in scalar form; I doubt using a vector store over two arrays would speed it up particularly. The function should be written to not use a scratch buffer.
-
Ronald S. Bultje authored
-
- 11 Jul, 2011 1 commit
-
-
Ronald S. Bultje authored
For 9/10bit, it means we don't have to upscale to 16bit before actual scaling or pixel format conversion, and thus a performance gain.
-
- 01 Jul, 2011 1 commit
-
-
Ronald S. Bultje authored
For 9/10bit, it means we don't have to upscale to 16bit before actual scaling or pixel format conversion, and thus a performance gain.
-
- 30 Jun, 2011 1 commit
-
-
Ronald S. Bultje authored
This means that precision is retained when scaling between sample formats with >8 bits per component (48bit RGB, 16bit grayscale, 9/10/16bit YUV).
-
- 29 Jun, 2011 1 commit
-
-
Ronald S. Bultje authored
This means that precision is retained when scaling between sample formats with >8 bits per component (48bit RGB, 16bit grayscale, 9/10/16bit YUV).
-
- 28 Jun, 2011 3 commits
-
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Ronald S. Bultje authored
Remove unused variables "flags" and "dstFormat" in yuv2packed1, merge source rows per plane for yuv2packed[12], and make every source argument int16_t (some were invalidly set to uint16_t). This prevents stack pollution and is part of the Great Evil Plan to simplify swscale.
-
Ronald S. Bultje authored
This will likely lead to a considerable performance boost, since it removes a branch from the inner loop. Part of the Great Evil Plan to simplify swscale.
-
- 26 Jun, 2011 1 commit
-
-
Ronald S. Bultje authored
-
- 07 Jun, 2011 2 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
Make yuv2yuvX16_c a function pointer for yuv2yuvX(), so that the function pointer becomes bitdepth-independent.
-
- 03 Jun, 2011 2 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-