Commits · 26ee83acc4ebd765529b666c7f050243b7677d76 · Linshizhi / ffmpeg.wasm-core

11 Mar, 2017 38 commits

aarch64: vp9itxfm: Reorder iadst16 coeffs · 26ee83ac

Martin Storsjö authored Dec 31, 2016

This matches the order they are in the 16 bpp version.

There they are in this order, to make sure we access them in the
same order they are declared, easing loading only half of the
coefficients at a time.

This makes the 8 bpp version match the 16 bpp version better.

This is cherrypicked from libav commit
b8f66c08.
Signed-off-by: Martin Storsjö <martin@martin.st>

26ee83ac

arm: vp9itxfm: Reorder iadst16 coeffs · b2e20d89

Martin Storsjö authored Dec 31, 2016

This matches the order they are in the 16 bpp version.

There they are in this order, to make sure we access them in the
same order they are declared, easing loading only half of the
coefficients at a time.

This makes the 8 bpp version match the 16 bpp version better.

This is cherrypicked from libav commit
08074c09.
Signed-off-by: Martin Storsjö <martin@martin.st>

b2e20d89

aarch64: vp9itxfm: Reorder the idct coefficients for better pairing · f9522730

Martin Storsjö authored Dec 31, 2016

All elements are used pairwise, except for the first one.
Previously, the 16th element was unused. Move the unused element
to the second slot, to make the later element pairs not split
across registers.

This simplifies loading only parts of the coefficients,
reducing the difference to the 16 bpp version.

This is cherrypicked from libav commit
09eb88a1.
Signed-off-by: Martin Storsjö <martin@martin.st>

f9522730

arm: vp9itxfm: Reorder the idct coefficients for better pairing · 4f693b56

Martin Storsjö authored Dec 31, 2016

All elements are used pairwise, except for the first one.
Previously, the 16th element was unused. Move the unused element
to the second slot, to make the later element pairs not split
across registers.

This simplifies loading only parts of the coefficients,
reducing the difference to the 16 bpp version.

This is cherrypicked from libav commit
de06bdfe.
Signed-off-by: Martin Storsjö <martin@martin.st>

4f693b56
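
The pairing argument in the two reordering commits above can be checked with a little index arithmetic. The following is a minimal sketch, not code from the commits: `count_split_pairs` is a hypothetical helper, and the lane counts (4 or 8 16-bit lanes per register) are assumptions chosen for illustration. It counts how many consecutive element pairs straddle a register boundary for a given table layout.

```c
/* Count how many two-element pairs straddle a register boundary.
 * first_pair: table index of the first paired element.
 * n_pairs:    number of consecutive pairs starting at that index.
 * lanes:      16-bit lanes per register (assumed: 4 or 8). */
static int count_split_pairs(int first_pair, int n_pairs, int lanes)
{
    int split = 0;
    for (int i = 0; i < n_pairs; i++) {
        int lo = first_pair + 2 * i; /* index of the pair's first element */
        int hi = lo + 1;             /* index of the pair's second element */
        if (lo / lanes != hi / lanes)
            split++;                 /* the two halves land in different registers */
    }
    return split;
}

/* Old layout: [single, 7 pairs, unused]  -> pairs start at index 1.
 * New layout: [single, unused, 7 pairs]  -> pairs start at index 2. */
```

With 4 lanes per register, the old layout (`count_split_pairs(1, 7, 4)`) splits three pairs and the new layout (`count_split_pairs(2, 7, 4)`) splits none. With 8 lanes the old layout splits one pair and the new layout again splits none.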

aarch64: vp9itxfm: Avoid reloading the idct32 coefficients · 2905657b

Martin Storsjö authored Jan 02, 2017

The idct32x32 function actually pushed d8-d15 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.

After this, we still can skip pushing d12-d15.

Before:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3
After:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3

This is cherrypicked from libav commit
65aa002d.
Signed-off-by: Martin Storsjö <martin@martin.st>

2905657b

arm: vp9itxfm: Avoid reloading the idct32 coefficients · 600f4c9b

Martin Storsjö authored Jan 02, 2017

The idct32x32 function actually pushed q4-q7 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.

Since the idct16 core transform avoids clobbering q4-q7 (but clobbers
q2-q3 instead, to avoid needing to back up and restore q4-q7 at all
in the idct16 function), and the lanewise vmul needs a register in
the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5
while doing idct16.

While keeping these coefficients in registers, we still can skip pushing
q7.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_32x32_sub32_add_neon:  18553.8  17182.7  14303.3  12089.7
After:
vp9_inv_dct_dct_32x32_sub32_add_neon:  18470.3  16717.7  14173.6  11860.8

This is cherrypicked from libav commit
402546a1.
Signed-off-by: Martin Storsjö <martin@martin.st>

600f4c9b

arm: vp9lpf: Implement the mix2_44 function with one single filter pass · a88db8b9

Martin Storsjö authored Jan 14, 2017

For this case, with 8 inputs but only changing 4 of them, we can fit
all 16 input pixels into a q register, and still have enough temporary
registers for doing the loop filter.

The wd=8 filters would require too many temporary registers for
processing all 16 pixels at once though.

Before:                          Cortex A7      A8     A9     A53
vp9_loop_filter_mix2_v_44_16_neon:   289.7   256.2  237.5   181.2
After:
vp9_loop_filter_mix2_v_44_16_neon:   221.2   150.5  177.7   138.0

This is cherrypicked from libav commit
575e31e9.
Signed-off-by: Martin Storsjö <martin@martin.st>

a88db8b9

aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1 · f32690a2

Martin Storsjö authored Feb 23, 2017

This is one cycle faster in total, and three instructions fewer.

Before:
vp9_loop_filter_mix2_v_44_16_neon: 123.2
After:
vp9_loop_filter_mix2_v_44_16_neon: 122.2

This is cherrypicked from libav commit
3bf9c483.
Signed-off-by: Martin Storsjö <martin@martin.st>

f32690a2

arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit · 3fbbad29

Martin Storsjö authored Jan 14, 2017

The theoretical maximum value of E is 193, so we can just
saturate the addition to 255.

Before:                     Cortex A7      A8      A9     A53  A53/AArch64
vp9_loop_filter_v_4_8_neon:     143.0   127.7   114.8    88.0         87.7
vp9_loop_filter_v_8_8_neon:     241.0   197.2   173.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    497.0   419.5   379.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   965.2   818.7   731.4   579.0        452.0
After:
vp9_loop_filter_v_4_8_neon:     136.0   125.7   112.6    84.0         83.0
vp9_loop_filter_v_8_8_neon:     234.0   195.5   171.5   136.0        133.7
vp9_loop_filter_v_16_8_neon:    490.0   417.5   377.7   289.0        271.0
vp9_loop_filter_v_16_16_neon:   951.2   814.7   732.3   571.0        446.7

This is cherrypicked from libav commit
c582cb85.
Signed-off-by: Martin Storsjö <martin@martin.st>

3fbbad29
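
The reasoning in the commit above (E never exceeds 193, so saturating at 255 is safe) can be illustrated in scalar C. This is a sketch under stated assumptions, not the NEON code from the commit: the edge test is assumed to have the form `2*|p0 - q0| + |p1 - q1|/2 <= E`, and all function names are hypothetical. A saturated sum equals `min(true_sum, 255)`, so comparing it against any E below 255 gives the same answer as the wide computation.

```c
#include <stdint.h>

/* Unsigned 8-bit saturating add, the scalar analogue of a per-lane saturating add. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > 255 ? 255 : (uint8_t)sum;
}

/* Reference: evaluate the (assumed) edge test in a wider type.
 * d0 = |p0 - q0|, d1 = |p1 - q1|. */
static int edge_ok_wide(uint8_t d0, uint8_t d1, uint8_t E)
{
    return 2u * d0 + (d1 >> 1) <= E;
}

/* Same test kept within 8 bits by saturating both additions. */
static int edge_ok_sat8(uint8_t d0, uint8_t d1, uint8_t E)
{
    uint8_t twice = sat_add_u8(d0, d0);
    return sat_add_u8(twice, d1 >> 1) <= E;
}

/* Returns 1 if both versions agree for every d0, d1 and every E <= max_E. */
static int versions_agree(unsigned max_E)
{
    for (unsigned E = 0; E <= max_E; E++)
        for (unsigned d0 = 0; d0 < 256; d0++)
            for (unsigned d1 = 0; d1 < 256; d1++)
                if (edge_ok_wide(d0, d1, E) != edge_ok_sat8(d0, d1, E))
                    return 0;
    return 1;
}
```

An exhaustive check with `versions_agree(193)` passes. The two forms only diverge if E itself could reach 255, because a saturated sum of 255 can no longer be told apart from a larger true sum.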

aarch64: Add parentheses around the offset parameter in movrel · dda45c08

Martin Storsjö authored Feb 16, 2017

This fixes building with clang for linux with PIC enabled.

This is cherrypicked from libav commit
8847eeaa.
Signed-off-by: Martin Storsjö <martin@martin.st>

dda45c08

aarch64: vp9lpf: Fix broken indentation/vertical alignment · c8d6eec8

Martin Storsjö authored Jan 11, 2017

This is cherrypicked from libav commit
07b5136c.
Signed-off-by: Martin Storsjö <martin@martin.st>

c8d6eec8

aarch64: vp9lpf: Interleave the start of flat8in into the calculation above · 9f3a8863

Martin Storsjö authored Jan 10, 2017

This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.

This is cherrypicked from libav commit
b0806088.
Signed-off-by: Martin Storsjö <martin@martin.st>

9f3a8863

arm: vp9lpf: Interleave the start of flat8in into the calculation above · 83399cf5

Martin Storsjö authored Jan 10, 2017

This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.

This is cherrypicked from libav commit
e18c3900.
Signed-off-by: Martin Storsjö <martin@martin.st>

83399cf5

arm: vp9lpf: Use orrs instead of orr+cmp · 92ab8374

Martin Storsjö authored Jan 13, 2017

This is cherrypicked from libav commit
435cd7bc.
Signed-off-by: Martin Storsjö <martin@martin.st>

92ab8374

arm/aarch64: vp9lpf: Calculate !hev directly · f0ecbb13

Martin Storsjö authored Jan 12, 2017

Previously we first calculated hev, and then negated it.

Since we were able to schedule the negation in the middle
of another calculation, we don't see a gain in every case.

Before:                     Cortex A7      A8      A9     A53  A53/AArch64
vp9_loop_filter_v_4_8_neon:     147.0   129.0   115.8    89.0         88.7
vp9_loop_filter_v_8_8_neon:     242.0   198.5   174.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    500.0   419.5   382.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   971.2   825.5   731.5   579.0        453.0
After:
vp9_loop_filter_v_4_8_neon:     143.0   127.7   114.8    88.0         87.7
vp9_loop_filter_v_8_8_neon:     241.0   197.2   173.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    497.0   419.5   379.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   965.2   818.7   731.4   579.0        452.0

This is cherrypicked from libav commit
e1f9de86.
Signed-off-by: Martin Storsjö <martin@martin.st>

f0ecbb13
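
A scalar sketch of the change described in the commit above, assuming the high-edge-variance flag is defined as `max(|p1 - p0|, |q1 - q0|) > H` (the function names are hypothetical, and the real code operates on NEON vectors rather than single pixels). Computing `!hev` directly just flips the comparison, which removes the separate negation step.

```c
#include <stdint.h>
#include <stdlib.h>

/* Two-step form: compute hev, then negate it. */
static int not_hev_two_step(uint8_t p1, uint8_t p0, uint8_t q0, uint8_t q1, uint8_t H)
{
    int d_p = abs(p1 - p0);
    int d_q = abs(q1 - q0);
    int hev = (d_p > d_q ? d_p : d_q) > H;
    return !hev;
}

/* Direct form: a single "less than or equal" comparison yields !hev. */
static int not_hev_direct(uint8_t p1, uint8_t p0, uint8_t q0, uint8_t q1, uint8_t H)
{
    int d_p = abs(p1 - p0);
    int d_q = abs(q1 - q0);
    return (d_p > d_q ? d_p : d_q) <= H;
}
```

Both functions return the same value for every input; the direct form simply saves one operation per evaluation.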

aarch64: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling · 148cc0bb

Martin Storsjö authored Jan 04, 2017

This work is sponsored by, and copyright, Google.

Before:                           Cortex A53
vp9_inv_dct_dct_16x16_sub1_add_neon:   235.3
vp9_inv_dct_dct_32x32_sub1_add_neon:   555.1
After:
vp9_inv_dct_dct_16x16_sub1_add_neon:   180.2
vp9_inv_dct_dct_32x32_sub1_add_neon:   475.3

This is cherrypicked from libav commit
3fcf788f.
Signed-off-by: Martin Storsjö <martin@martin.st>

148cc0bb

arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling · 758302e4

Martin Storsjö authored Jan 04, 2017

This work is sponsored by, and copyright, Google.

Before:                            Cortex A7      A8      A9     A53
vp9_inv_dct_dct_16x16_sub1_add_neon:   273.0   189.5   211.7   235.8
vp9_inv_dct_dct_32x32_sub1_add_neon:   752.0   459.2   862.2   553.9
After:
vp9_inv_dct_dct_16x16_sub1_add_neon:   226.5   145.0   225.1   171.8
vp9_inv_dct_dct_32x32_sub1_add_neon:   721.2   415.7   727.6   475.0

This is cherrypicked from libav commit
a76bf8cf.
Signed-off-by: Martin Storsjö <martin@martin.st>

758302e4

aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter · 045e33ae

Martin Storsjö authored Dec 17, 2016

No measured speedup on a Cortex A53, but other cores might benefit.

This is cherrypicked from libav commit
388e0d25.
Signed-off-by: Martin Storsjö <martin@martin.st>

045e33ae

arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter · bff07715

Martin Storsjö authored Dec 17, 2016

Before:                    Cortex A7      A8     A9     A53
vp9_put_8tap_smooth_4h_neon:   378.1   273.2  340.7   229.5
After:
vp9_put_8tap_smooth_4h_neon:   352.1   222.2  290.5   229.5

This is cherrypicked from libav commit
fea92a4b.
Signed-off-by: Martin Storsjö <martin@martin.st>

bff07715

aarch64: vp9mc: Simplify the extmla macro parameters · ac6cb8ae

Martin Storsjö authored Dec 16, 2016

Fold the field lengths into the macro.

This makes the macro invocations much more readable, when the
lines are shorter.

This also makes it easier to use only half the registers within
the macro.

This is cherrypicked from libav commit
5e0c2158.
Signed-off-by: Martin Storsjö <martin@martin.st>

ac6cb8ae

aarch64: vp9itxfm: Fix incorrect vertical alignment · 16ef0007

Martin Storsjö authored Jan 03, 2017

This is cherrypicked from libav commit
0c0b87f1.
Signed-off-by: Martin Storsjö <martin@martin.st>

16ef0007

aarch64: vp9itxfm: Update a comment to refer to a register with a different name · d0fbf7f3

Martin Storsjö authored Jan 03, 2017

This is cherrypicked from libav commit
8476eb0d.
Signed-off-by: Martin Storsjö <martin@martin.st>

d0fbf7f3

aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability · 6752318c

Martin Storsjö authored Jan 03, 2017

This is cherrypicked from libav commit
3dd78272.
Signed-off-by: Martin Storsjö <martin@martin.st>

6752318c

aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible · 19a0f952

Martin Storsjö authored Jan 03, 2017

The ld1r is a leftover from the arm version, where this trick is
beneficial on some cores.

Use a single-lane load where we don't need the semantics of ld1r.

This is cherrypicked from libav commit
ed8d2933.
Signed-off-by: Martin Storsjö <martin@martin.st>

19a0f952

aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function · 3006e525

Martin Storsjö authored Jan 03, 2017

This is cherrypicked from libav commit
4da4b2b8.
Signed-off-by: Martin Storsjö <martin@martin.st>

3006e525

arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function · 1d8ab576

Martin Storsjö authored Jan 03, 2017

This is cherrypicked from libav commit
3933b86b.
Signed-off-by: Martin Storsjö <martin@martin.st>

1d8ab576

aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 · 9532a7d4

Martin Storsjö authored Nov 22, 2016

This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 14740 bytes to 24292 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2

This is cherrypicked from libav commit
a63da451.
Signed-off-by: Martin Storsjö <martin@martin.st>

9532a7d4

arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible · 82458955

Martin Storsjö authored Nov 22, 2016

This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 12388 bytes to 19784 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    212.0    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2102.1   1521.7   1736.2   1265.8
vp9_inv_dct_dct_16x16_sub4_add_neon:    2104.5   1533.0   1736.6   1265.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2484.8   1828.7   2014.4   1506.5
vp9_inv_dct_dct_16x16_sub12_add_neon:   2851.2   2117.8   2294.8   1753.2
vp9_inv_dct_dct_16x16_sub16_add_neon:   3239.4   2408.3   2543.5   1994.9
vp9_inv_dct_dct_32x32_sub1_add_neon:     758.3    456.7    864.5    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10776.7   7949.8   8567.7   6819.7
vp9_inv_dct_dct_32x32_sub4_add_neon:   10865.6   8131.5   8589.6   6816.3
vp9_inv_dct_dct_32x32_sub8_add_neon:   12053.9   9271.3   9387.7   7564.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  13328.3  10463.2  10217.0   8321.3
vp9_inv_dct_dct_32x32_sub16_add_neon:  14176.4  11509.5  11018.7   9062.3
vp9_inv_dct_dct_32x32_sub20_add_neon:  15301.5  12999.9  11855.1   9828.2
vp9_inv_dct_dct_32x32_sub24_add_neon:  16482.7  14931.5  12650.1  10575.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17589.5  15811.9  13482.8  11333.4
vp9_inv_dct_dct_32x32_sub32_add_neon:  18696.2  17049.2  14355.6  12089.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    1203.5    998.2   1035.3    763.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1203.5    998.1   1035.5    760.8
vp9_inv_dct_dct_16x16_sub8_add_neon:    1926.1   1610.6   1722.1   1271.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2873.2   2129.7   2285.1   1757.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   3221.4   2520.3   2557.6   2002.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     753.0    457.5    866.6    554.6
vp9_inv_dct_dct_32x32_sub2_add_neon:    7554.6   5652.4   6048.4   4920.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    7549.9   5685.0   6046.9   4925.7
vp9_inv_dct_dct_32x32_sub8_add_neon:    8336.9   6704.5   6604.0   5478.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  10914.0   9777.2   9240.4   7416.9
vp9_inv_dct_dct_32x32_sub16_add_neon:  11859.2  11223.3   9966.3   8095.1
vp9_inv_dct_dct_32x32_sub20_add_neon:  15237.1  13029.4  11838.3   9829.4
vp9_inv_dct_dct_32x32_sub24_add_neon:  16293.2  14379.8  12644.9  10572.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17424.3  15734.7  13473.0  11326.9
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.3  17457.0  14298.6  12080.0

This is cherrypicked from libav commit
5eb5aec4.
Signed-off-by: Martin Storsjö <martin@martin.st>

82458955

aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function · a681c793

Martin Storsjö authored Feb 05, 2017

This allows reusing the macro for a separate implementation of the
pass2 function.

This is cherrypicked from libav commit
79d332eb.
Signed-off-by: Martin Storsjö <martin@martin.st>

a681c793

arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function · 3bd9b391

Martin Storsjö authored Feb 05, 2017

This allows reusing the macro for a separate implementation of the
pass2 function.

This is cherrypicked from libav commit
47b3c2c1.
Signed-off-by: Martin Storsjö <martin@martin.st>

3bd9b391

aarch64: vp9itxfm: Make the larger core transforms standalone functions · dc47bf38

Martin Storsjö authored Nov 23, 2016

This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from
19496 to 14740 bytes.

This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.

Before:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   8095.7

After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   1390.1
vp9_inv_dct_dct_32x32_sub4_add_neon:    5199.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8125.8

This is cherrypicked from libav commit
11547601.
Signed-off-by: Martin Storsjö <martin@martin.st>

dc47bf38

arm: vp9itxfm: Make the larger core transforms standalone functions · f8fcee0d

Martin Storsjö authored Nov 23, 2016

This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from
15324 to 12388 bytes.

This gives a small slowdown of a couple tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_neon:    2063.4   1516.0   1719.5   1245.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3279.3   2454.5   2525.2   1982.3
vp9_inv_dct_dct_32x32_sub4_add_neon:   10750.0   7955.4   8525.6   6754.2
vp9_inv_dct_dct_32x32_sub32_add_neon:  18574.0  17108.4  14216.7  12010.2

After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    2060.8   1608.5   1735.7   1262.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.2   2443.5   2546.1   1999.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10682.0   8043.8   8581.3   6810.1
vp9_inv_dct_dct_32x32_sub32_add_neon:  18522.4  17277.4  14286.7  12087.9

This is cherrypicked from libav commit
0331c3f5.
Signed-off-by: Martin Storsjö <martin@martin.st>

f8fcee0d

aarch64: vp9itxfm: Restructure the idct32 store macros · 52c7366c

Martin Storsjö authored Dec 01, 2016

This avoids concatenation, which can't be used if the whole macro
is wrapped within another macro.

This is also arguably more readable.

This is cherrypicked from libav commit
58d87e0f.
Signed-off-by: Martin Storsjö <martin@martin.st>

52c7366c

arm: vp9itxfm: Avoid .irp when it doesn't save any lines · 31e41350

Martin Storsjö authored Feb 04, 2017

This makes it more readable.

This is cherrypicked from libav commit
3bc5b28d.
Signed-off-by: Martin Storsjö <martin@martin.st>

31e41350

libavfilter/avf_showwaves: make sqrt and cbrt scale option values available to showwavespic by name · 114bbb0b

Moritz Barsnick authored Mar 09, 2017

The 'sqrt' and 'cbrt' scalers were added in commit
80262d8c, but their symbolic option values were
only made available to the showwaves filter, not showwavespic, despite
the scalers working properly by their numerical option values.
Signed-off-by: Moritz Barsnick <barsnick@gmx.net>

114bbb0b

ffprobe: add AVCodecContext help message into ffprobe · 51e35019

Steven Liu authored Mar 11, 2017

This is because ffprobe can use AVCodecContext parameters.
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>

51e35019

avcodec/vp56: Reset have_undamaged_frame on resolution changes · 6e913f21

Michael Niedermayer authored Mar 09, 2017

Fixes: timeout in 758/clusterfuzz-testcase-4720832028868608

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

6e913f21

avcodec/h264_ps: Forward errors from decode_scaling_list() · dc0b9b21

Michael Niedermayer authored Mar 09, 2017

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

dc0b9b21

10 Mar, 2017 2 commits

libavcodec/libopenjpegenc: enable lossless option, remove layer option, and improve defaults · 195784ec

Aaron Boxer authored Mar 10, 2017

1. limit to single layer, as there is currently no support for setting distortion/quality of multiple layers
2. encoder mode should be kept at default setting (0)
3. remove fixed_alloc parameter from context: seldom if ever used, and no way of properly configuring it at the moment
4. add irreversible setting, to allow for lossless encoding. Set to OpenJPEG default (enabled)
5. set numresolution max to 33, which is the maximum number of allowed resolutions according to the J2K spec
Signed-off-by: Michael Bradshaw <mjbshaw@google.com>

195784ec

avcodec/vp8: Fix hang with slice threads · 9bbc73ae

Thomas Guilbert authored Mar 09, 2017

Fixes: 447860.webm
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

9bbc73ae