- 13 Oct, 2015 22 commits
-
-
Ronald S. Bultje authored
These aren't quite as helpful as the ones in 8bpp, since over there, we can use pmulhrsw, but here the coefficients have too many bits to be able to take advantage of pmulhrsw. However, we can still skip cols for which all coefs are 0, and instead just zero the input data for the row itx. This helps a few % on overall decoding speed.
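The column-skipping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 4x4 Walsh-Hadamard matrix stands in for the real 1-D transform, and `transform_1d` and `inverse_2d` are hypothetical names, not functions from the actual code. It shows that replacing the column pass with zeros for all-zero columns leaves the 2-D result unchanged while running fewer column passes.

```python
# Stand-in 1-D transform (assumption): a 4x4 Walsh-Hadamard matrix.
# Any linear transform maps an all-zero input to an all-zero output,
# which is the only property this sketch relies on.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def transform_1d(vec):
    """Apply the stand-in 1-D transform to one column or row."""
    return [sum(h * v for h, v in zip(row, vec)) for row in H4]


def inverse_2d(block, skip_zero_cols):
    """Column pass followed by row pass on a square block.

    Returns (output_block, number_of_column_passes_executed).
    """
    n = len(block)
    cols_out = []
    col_passes = 0
    for x in range(n):
        col = [block[y][x] for y in range(n)]
        if skip_zero_cols and not any(col):
            # All coefficients in this column are 0: skip the transform
            # and feed zeros to the row pass instead.
            cols_out.append([0] * n)
        else:
            cols_out.append(transform_1d(col))
            col_passes += 1
    # Row pass: row y is built from element y of every column result.
    rows = [[cols_out[x][y] for x in range(n)] for y in range(n)]
    return [transform_1d(r) for r in rows], col_passes
```

Calling `inverse_2d` on a block whose nonzero coefficients sit in the first two columns gives the same output with and without `skip_zero_cols`, but only two column passes run in the skipping case.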
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
The trouble with this function is that intermediates overflow 31+sign bits, so I've added some helpers (that will also be used in 10/12bpp 8x8, 16x16 and 32x32) to make that easier, basically emulating a half-assed pmaddqd using 2xpmaddwd. It's currently sse2-only; if anyone sees potential in adding ssse3, I'd love to hear it.
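One way to emulate a wide multiply with two narrower ones is to split the input into high and low parts. The Python sketch below models a single product in a single lane, not a full pmaddwd (which sums two products). The 14-bit coefficient width, the rounding constant and all function names are assumptions made for illustration; the commit message does not state how the actual helpers split their operands.

```python
COEF_BITS = 14                      # assumed coefficient precision
ROUND = 1 << (COEF_BITS - 1)        # rounding term added before the shift
LOW_MASK = (1 << COEF_BITS) - 1     # mask for the low 14 bits of the input


def fits_int32(v):
    """True if v is representable as a signed 32-bit integer."""
    return -(1 << 31) <= v < (1 << 31)


def mul_round_wide(x, c):
    """Reference: (x * c + ROUND) >> COEF_BITS with unlimited precision."""
    return (x * c + ROUND) >> COEF_BITS


def mul_round_split(x, c):
    """Same result using two products that each stay within 32 bits.

    x = hi * 2**14 + lo, so x * c = hi * c * 2**14 + lo * c.
    The first term is a multiple of 2**14, so it passes through the
    rounding shift unchanged and only the low product needs rounding.
    """
    hi = x >> COEF_BITS             # arithmetic shift keeps the sign
    lo = x & LOW_MASK               # always in [0, 2**14)
    hi_prod = hi * c
    lo_prod = lo * c + ROUND
    assert fits_int32(hi_prod) and fits_int32(lo_prod)
    return hi_prod + (lo_prod >> COEF_BITS)
```

For an input of magnitude 2^18 and a coefficient of 16384 the direct product no longer fits in 31+sign bits, while both partial products in `mul_round_split` do.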
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Christophe Gisquet authored
In particular for 10 and 12 bits. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
This makes it easier to see where a failure happens Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Christophe Gisquet authored
On 12 frames of a 444p 12 bits DNxHR sequence, _put function:
C: 78902 decicycles in idct, 262071 runs, 73 skips
avx: 32478 decicycles in idct, 262045 runs, 99 skips

Difference between the 2:
stddev: 0.39 PSNR:104.47 MAXDIFF: 2

This is unavoidable and due to the scale factors used in the x86 version, which cannot match the C ones. In addition, the trick of adding an initial bias to the input of a pass can overflow, as the input coefficients are already 15bits, which is the maximum this function can handle.

Overall, however, the omse on 12 bits samples goes from 0.16916 to 0.16883. Reducing rowshift by 1 improves to 0.0908, but causes overflows.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Christophe Gisquet authored
Modeled from the prores version. Clips to [0;1023] and is bitexact. Bitexactness requires adding offsets in different places compared to prores or C, and makes the function approximately 2% slower.

For 16 frames of a DNxHD 4:2:2 10bits test sequence:
C: 60861 decicycles in idct, 1048205 runs, 371 skips
sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
avx: 26272 decicycles in idct, 1048171 runs, 405 skips

The add version is not implemented, so the corresponding dsp function is set to NULL, which makes any attempt to call it fail visibly.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Christophe Gisquet authored
When the input of a pass has 15 or 16 bits of precision (in particular the column pass), the addition of a bias to W4 may lead to overflows in the input to pmaddwd. This requires postponing the addition of the bias until after the first butterfly. To do so, the fact that m15 is zeroed but otherwise unused is exploited. If the pass is safe, an address can be used directly, and the number of xmm regs can be decreased. Otherwise, the 32bits bias is loaded into m15. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Ganesh Ajjanagadde authored
-
Ganesh Ajjanagadde authored
-
Ganesh Ajjanagadde authored
-
Christophe Gisquet authored
It was useful to (accidentally?) spot an overflow in the column pass of the x86 simple_idct10 implementation. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Christophe Gisquet authored
omse goes from 0.03060703 (which fails for dct-test) to 0.01663750. This also actually improves the error of decoding the sample generated by fate-vsynth3-dnxhd1080i-10bit using simple_idct10 compared to FAANI, which goes (when resampled to yuv422p) from: stddev: 0.06 PSNR: 72.28 MAXDIFF: 1 to identical. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 12 Oct, 2015 18 commits
-
-
Christophe Gisquet authored
This should be reused for a generic simple_idct10 function. Requires a bit of trickery to declare common constants in C. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Rostislav Pehlivanov authored
To keep it similar to the other functions which are all named *_pred.
-
Rostislav Pehlivanov authored
Left out of last commit which added support for eight channel audio.
-
Christophe Gisquet authored
Dequant or encoding were trying to reverse a scan that hadn't been applied... Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Ricardo Constantino authored
Signed-off-by: Ricardo Constantino <wiiaboo@gmail.com>
-
Ricardo Constantino authored
Includes escapes that should now be supported and a few features not yet fully supported, like comments, regions, classes, ruby, and lang. All were tested with https://quuz.org/webvtt/ for validation, except regions because the validator doesn't support them yet, and I couldn't find any other way to validate WebVTT. Signed-off-by: Ricardo Constantino <wiiaboo@gmail.com>
-
Ricardo Constantino authored
Bare ampersand characters are still accepted, even though out-of-spec. Also fixes adjacent tags not being parsed. Fixes trac #4915 Signed-off-by: Ricardo Constantino <wiiaboo@gmail.com>
-
Lou Logan authored
Signed-off-by: Lou Logan <lou@lrcd.com> Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Derek Buitenhuis authored
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Rostislav Pehlivanov authored
Fails on SunOS and old GCC (<=4.6 is ancient) versions.
-
Rostislav Pehlivanov authored
-
Rostislav Pehlivanov authored
This commit adds the ability for a profile to set the default options, as well as for the user to override such options by simply stating them in the command line while still keeping the same profile, as long as those options are still permitted by the profile. Example: setting the profile to aac_low (the default) will turn PNS and IS on. They can be disabled by -aac_pns 0 and -aac_is 0, respectively. Turning on -aac_pred 1 will cause the profile to be elevated to aac_main, as long as no options forbidding aac_main have been entered (like AAC-LTP, which will be pushed soon). A useful feature is that by setting the profile to mpeg2_aac_low, all MPEG4 features will be disabled and if the user tries to enable them then the program will exit with an error. This profile is signalled with the same bitstream as aac_low (MPEG4) but some devices and decoders will fail if any MPEG4 features have been enabled.
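The rules described above can be modelled as a small toy resolver. This is a sketch, not the encoder's logic: `resolve_options` is a hypothetical name, only the options named in the message are modelled, and treating PNS as the one MPEG4-only feature is an assumption made to keep the example short.

```python
# Defaults per profile, limited to what the message states (aac_low turns
# PNS and IS on). The mpeg2_aac_low entry is an assumption for illustration.
PROFILE_DEFAULTS = {
    "aac_low": {"aac_pns": 1, "aac_is": 1, "aac_pred": 0},
    "mpeg2_aac_low": {"aac_pns": 0},
}

# Assumption: PNS stands in for "MPEG4 features" in this toy model.
MPEG4_ONLY = {"aac_pns"}


def resolve_options(profile, user_opts):
    """Merge profile defaults with user overrides.

    Returns (effective_profile, effective_options) or raises ValueError
    when the user enables something the profile forbids.
    """
    opts = dict(PROFILE_DEFAULTS[profile])
    opts.update(user_opts)          # user settings override the defaults

    if profile == "mpeg2_aac_low":
        forbidden = sorted(k for k in MPEG4_ONLY if opts.get(k))
        if forbidden:
            raise ValueError(
                "not allowed with mpeg2_aac_low: " + ", ".join(forbidden)
            )

    # Enabling prediction elevates aac_low to aac_main.
    if profile == "aac_low" and opts.get("aac_pred"):
        profile = "aac_main"

    return profile, opts
```

In this model, `resolve_options("aac_low", {"aac_pns": 0, "aac_is": 0})` keeps the profile and disables both tools, mirroring the `-aac_pns 0 -aac_is 0` example in the message, while passing `{"aac_pred": 1}` returns `aac_main` as the effective profile.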
-
Alex Agranovsky authored
Signed-off-by: Alex Agranovsky <alex@sighthound.com>
-
Bela Bodecs authored
This makes it possible to put multiple stream specifiers, separated by commas, into the select option, e.g. select=\'a:0,v\' Signed-off-by: Bela Bodecs <bodecsb@vivanet.hu> Signed-off-by: Nicolas George <george@nsup.org>
-
Rostislav Pehlivanov authored
This commit implements support for 7.1 channel audio. There are no more predefined bitstream channel mappings, so going beyond 8 channels (and 7 channels exactly) will require programmable channel elements, which are already underway.
-
Rostislav Pehlivanov authored
Two guesses as to which file was used as boilerplate.
-
Claudio Freire authored
The bulk of calls to quantize_band_cost are replaced by a call to a version that memoizes, greatly improving performance, since during coefficient search there is a great deal of repeat work. Memoization cannot always be applied, so do this in a different function, and leave the original as-is.
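The memoizing-wrapper pattern described here can be sketched as follows. The class name, the cache key `(band, scale_idx, codebook)` and the toy cost function are all assumptions chosen for illustration; the message does not say what the real cache is keyed on. The original function stays untouched, and callers that cannot use the cache keep calling it directly.

```python
class BandCostCache:
    """Memoizing front end for an expensive per-band cost function."""

    def __init__(self, cost_fn):
        self._cost_fn = cost_fn     # the original, un-cached function
        self._cache = {}
        self.misses = 0             # how often the real function ran

    def cost(self, band, scale_idx, codebook):
        """Return the cached cost, computing it on the first request."""
        key = (band, scale_idx, codebook)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._cost_fn(band, scale_idx, codebook)
        return self._cache[key]

    def invalidate(self):
        """Drop all entries, e.g. when the underlying input changes."""
        self._cache.clear()


def toy_cost(band, scale_idx, codebook):
    """Hypothetical stand-in for the real cost computation."""
    return abs(band - scale_idx) * 1.5 + codebook
```

During a search that revisits the same `(band, scale_idx, codebook)` combination many times, only the first lookup pays for the computation; every repeat is a dictionary hit.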
-