1. 19 Feb, 2019 14 commits
    • arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2 · cef914e0
      Martin Storsjö authored
      This makes it similar to put_epel16_v6, and gives a 10-25%
      speedup of this function.
      
      Before:                   Cortex A7       A8       A9      A53     A72
      vp8_put_epel16_h6v6_neon:    3058.0   2218.5   2459.8   2183.0  1572.2
      After:
      vp8_put_epel16_h6v6_neon:    2670.8   1934.2   2244.4   1729.4  1503.9
      Signed-off-by: Martin Storsjö <martin@martin.st>
      cef914e0
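The speedup quoted in the commit message above can be checked directly against its cycle counts. The following is a minimal sketch that uses only the figures from that table; the names `speedup_percent` and `speedups` are illustrative and not part of the project.

```python
# Cycle counts for vp8_put_epel16_h6v6_neon, copied from the commit message.
cores = ["A7", "A8", "A9", "A53", "A72"]
before = [3058.0, 2218.5, 2459.8, 2183.0, 1572.2]
after = [2670.8, 1934.2, 2244.4, 1729.4, 1503.9]


def speedup_percent(old, new):
    """Percentage gain in speed when the cycle count drops from old to new."""
    return (old / new - 1.0) * 100.0


speedups = {core: speedup_percent(b, a) for core, b, a in zip(cores, before, after)}

for core, pct in speedups.items():
    print(f"Cortex {core}: {pct:.1f}% faster")
```

Computed this way, the gain is largest on the A53 and smallest on the A72, so the range in the commit message is best read as approximate.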
    • aarch64: vp8: Port bilin functions from arm version · e39a9212
      Martin Storsjö authored
                            Cortex A53     A72     A73
      vp8_put_bilin4_h_c:        303.8   102.2   161.8
      vp8_put_bilin4_h_neon:     100.0    40.9    41.2
      vp8_put_bilin4_hv_c:       322.8   201.0   305.9
      vp8_put_bilin4_hv_neon:    156.8    72.6    77.0
      vp8_put_bilin4_v_c:        304.7   101.7   166.5
      vp8_put_bilin4_v_neon:      82.7    41.2    33.0
      vp8_put_bilin8_h_c:       1192.7   352.5   623.8
      vp8_put_bilin8_h_neon:     213.5    70.2    87.8
      vp8_put_bilin8_hv_c:      1098.6   769.2  1041.9
      vp8_put_bilin8_hv_neon:    324.0   123.5   146.0
      vp8_put_bilin8_v_c:       1193.9   350.4   617.7
      vp8_put_bilin8_v_neon:     183.9    60.7    64.7
      vp8_put_bilin16_h_c:      2353.1   671.2  1223.3
      vp8_put_bilin16_h_neon:    261.9   140.7   145.0
      vp8_put_bilin16_hv_c:     2453.2  1470.9  2355.2
      vp8_put_bilin16_hv_neon:   383.9   196.0   217.0
      vp8_put_bilin16_v_c:      2349.3   669.8  1251.2
      vp8_put_bilin16_v_neon:    202.9   110.7    96.2
      Signed-off-by: Martin Storsjö <martin@martin.st>
      e39a9212
    • aarch64: vp8: Port epel4 functions from arm version · 58d15492
      Martin Storsjö authored
                            Cortex A53    A72    A73
      vp8_put_epel4_h4_c:        631.4  291.7  367.8
      vp8_put_epel4_h4_neon:     241.0  131.0  155.7
      vp8_put_epel4_h4v4_c:      967.5  529.3  667.7
      vp8_put_epel4_h4v4_neon:   429.3  241.8  279.7
      vp8_put_epel4_h4v6_c:     1374.7  657.5  864.5
      vp8_put_epel4_h4v6_neon:   515.5  295.5  334.7
      vp8_put_epel4_h6_c:        851.0  421.0  486.0
      vp8_put_epel4_h6_neon:     321.5  195.0  217.7
      vp8_put_epel4_h6v4_c:     1111.3  621.1  781.2
      vp8_put_epel4_h6v4_neon:   539.2  328.0  365.3
      vp8_put_epel4_h6v6_c:     1561.3  763.3  999.7
      vp8_put_epel4_h6v6_neon:   645.5  401.0  434.7
      vp8_put_epel4_v4_c:        663.8  298.3  357.0
      vp8_put_epel4_v4_neon:     116.0   81.5   72.5
      vp8_put_epel4_v6_c:        870.5  437.0  507.4
      vp8_put_epel4_v6_neon:     147.7  108.8   92.0
      Signed-off-by: Martin Storsjö <martin@martin.st>
      58d15492
    • aarch64: vp8: Port missing epel8 functions from arm version · cc7ba00c
      Martin Storsjö authored
                            Cortex A53     A72     A73
      vp8_put_epel8_h4_c:       2594.8  1159.6  1374.8
      vp8_put_epel8_h4_neon:     506.4   244.2   314.0
      vp8_put_epel8_h6_c:       3445.8  1677.1  1811.3
      vp8_put_epel8_h6_neon:     634.4   371.7   433.0
      vp8_put_epel8_v4_c:       2614.0  1174.8  1378.0
      vp8_put_epel8_v4_neon:     321.0   221.7   235.8
      vp8_put_epel8_v6_c:       3635.5  1703.0  2079.2
      vp8_put_epel8_v6_neon:     416.9   317.0   295.5
      Signed-off-by: Martin Storsjö <martin@martin.st>
      cc7ba00c
    • aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version · 52c9b0a6
      Martin Storsjö authored
                           Cortex A53    A72    A73
      vp8_luma_dc_wht_c:        115.7   75.7   90.7
      vp8_luma_dc_wht_neon:      60.7   41.2   45.7
      vp8_idct_dc_add4uv_c:     376.1  262.9  282.5
      vp8_idct_dc_add4uv_neon:   52.0   29.0   37.0
      Signed-off-by: Martin Storsjö <martin@martin.st>
      52c9b0a6
    • aarch64: vp8: Fix a typo in a comment · c513fcd7
      Martin Storsjö authored
      Signed-off-by: Martin Storsjö <martin@martin.st>
      c513fcd7
    • aarch64: vp8: Move the vp8dsp makefile entries to the right places · b4b27dce
      Martin Storsjö authored
      Even if NEON is disabled, the init functions should be built,
      as they are called as long as ARCH_AARCH64 is set.
      
      These functions are part of a generic DSP subsystem, not tied directly
      to one decoder. (They should be built if the vp7 decoder is enabled,
      even if the vp8 decoder is disabled.)
      Signed-off-by: Martin Storsjö <martin@martin.st>
      b4b27dce
    • aarch64: vp8: Remove superfluous includes · ad32f7b1
      Martin Storsjö authored
      This fixes building with MSVC, which lacks unistd.h.
      Signed-off-by: Martin Storsjö <martin@martin.st>
      ad32f7b1
    • aarch64: vp8: Use the proper aarch64 form for conditional branches · 85bfaa49
      Martin Storsjö authored
      The previous form also does seem to assemble on current tools,
      but I think it might fail on some older aarch64 tools.
      Signed-off-by: Martin Storsjö <martin@martin.st>
      85bfaa49
    • 2eeac799
    • aarch64: vp8: Fix assembling with clang · 26d7af4c
      Martin Storsjö authored
      This also partially fixes assembling with MS armasm64 (via
      gas-preprocessor).
      
      The movrel macro invocations need to pass the offset via a separate
      parameter. Mach-o and COFF relocations don't allow a negative
      offset to a symbol, which is handled properly if the offset is passed
      via the parameter. If no offset parameter is given, the macro
      evaluates to something like "adrp x17, subpel_filters-16+(0)", which
      older clang versions also fail to parse (the older clang versions
      only support one single offset term, although it can be a parenthesis).
      Signed-off-by: Martin Storsjö <martin@martin.st>
      26d7af4c
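The difference between the two ways of passing the offset can be illustrated with a small model of the macro expansion. This is a simplified sketch, not the project's macro: `movrel_lines` is a hypothetical helper, the `sub` step for negative offsets is an assumption about how a separate offset parameter might be handled, and the page-offset `add` that normally follows `adrp` is omitted.

```python
def movrel_lines(rd, symbol, offset=0):
    """Model the text a movrel-like macro might emit (hypothetical, simplified)."""
    if offset < 0:
        # Assumed handling: address the bare symbol, then subtract, so the
        # relocation itself never carries a negative offset.
        return [f"adrp {rd}, {symbol}", f"sub {rd}, {rd}, #{-offset}"]
    # Default path: the macro appends its own offset term to the symbol.
    return [f"adrp {rd}, {symbol}+({offset})"]


# Offset folded into the symbol argument: two offset terms end up in one operand.
print(movrel_lines("x17", "subpel_filters-16"))

# Offset passed separately: the symbol reference stays free of a negative offset.
print(movrel_lines("x17", "subpel_filters", -16))
```

The first call reproduces the expression quoted in the commit message; the second shows why a separate parameter gives the macro room to treat negative offsets specially.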
    • libavcodec: vp8 neon optimizations for aarch64 · 0801853e
      Magnus Röös authored
      Partial port of the ARM NEON code to aarch64.
      
      Benchmarks from fate:
      
      benchmarking with Linux Perf Monitoring API
      nop: 58.6
      checkasm: using random seed 1760970128
      NEON:
       - vp8dsp.idct       [OK]
       - vp8dsp.mc         [OK]
       - vp8dsp.loopfilter [OK]
      checkasm: all 21 tests passed
      vp8_idct_add_c: 201.6
      vp8_idct_add_neon: 83.1
      vp8_idct_dc_add_c: 107.6
      vp8_idct_dc_add_neon: 33.8
      vp8_idct_dc_add4y_c: 426.4
      vp8_idct_dc_add4y_neon: 59.4
      vp8_loop_filter8uv_h_c: 688.1
      vp8_loop_filter8uv_h_neon: 216.3
      vp8_loop_filter8uv_inner_h_c: 649.3
      vp8_loop_filter8uv_inner_h_neon: 195.3
      vp8_loop_filter8uv_inner_v_c: 544.8
      vp8_loop_filter8uv_inner_v_neon: 131.3
      vp8_loop_filter8uv_v_c: 706.1
      vp8_loop_filter8uv_v_neon: 141.1
      vp8_loop_filter16y_h_c: 668.8
      vp8_loop_filter16y_h_neon: 242.8
      vp8_loop_filter16y_inner_h_c: 647.3
      vp8_loop_filter16y_inner_h_neon: 224.6
      vp8_loop_filter16y_inner_v_c: 647.8
      vp8_loop_filter16y_inner_v_neon: 128.8
      vp8_loop_filter16y_v_c: 721.8
      vp8_loop_filter16y_v_neon: 154.3
      vp8_loop_filter_simple_h_c: 387.8
      vp8_loop_filter_simple_h_neon: 187.6
      vp8_loop_filter_simple_v_c: 384.1
      vp8_loop_filter_simple_v_neon: 78.6
      vp8_put_epel8_h4v4_c: 3971.1
      vp8_put_epel8_h4v4_neon: 855.1
      vp8_put_epel8_h4v6_c: 5060.1
      vp8_put_epel8_h4v6_neon: 989.6
      vp8_put_epel8_h6v4_c: 4320.8
      vp8_put_epel8_h6v4_neon: 1007.3
      vp8_put_epel8_h6v6_c: 5449.3
      vp8_put_epel8_h6v6_neon: 1158.1
      vp8_put_epel16_h6_c: 6683.8
      vp8_put_epel16_h6_neon: 831.8
      vp8_put_epel16_h6v6_c: 11110.8
      vp8_put_epel16_h6v6_neon: 2214.8
      vp8_put_epel16_v6_c: 7024.8
      vp8_put_epel16_v6_neon: 799.6
      vp8_put_pixels8_c: 112.8
      vp8_put_pixels8_neon: 78.1
      vp8_put_pixels16_c: 131.3
      vp8_put_pixels16_neon: 129.8
      
      This contains a fix to include guards by Carl Eugen Hoyos.
      Signed-off-by: Martin Storsjö <martin@martin.st>
      0801853e
    • Unbreak travis on macos · 899ee030
      Luca Barbato authored
      899ee030
  2. 16 Feb, 2019 11 commits
  3. 12 Feb, 2019 1 commit
    • srt: Set srto_sender flag to sender srt socket · 90b15f60
      Sven Dueking authored
      SRT API Documentation:
      This flag is superfluous if both parties are at least version 1.3.0
      (this shall be enforced by setting this value to SRTO_MINVERSION if
      you expect that it be true) and therefore support HSv5 handshake,
      where the SRT extended handshake is done with the overall handshake
      process.
      
      This flag is however obligatory if at least one party may be using
      SRT below version 1.3.0 and does not support HSv5.
      90b15f60
  4. 27 Jan, 2019 1 commit
  5. 26 Jan, 2019 5 commits
    • libopenh264dec: Use a newer decoding entry point function · eec93e57
      Martin Storsjö authored
      The "new" entry point actually has existed since OpenH264 1.4 in
      2015 and is the recommended decoding entry point.
      
      The name of this function, DecodeFrameNoDelay, is rather backwards
      considering that it doesn't return the latest decoded frame immediately,
      but actually does proper delaying and reordering of frames.
      Signed-off-by: Martin Storsjö <martin@martin.st>
      eec93e57
    • h264/aarch64: add intra loop filter neon asm · 28a8b541
      Janne Grunau authored
      Add my neon asm from x264 relicensed under the LGPL 2.1 or later. Ported
      (x264 uses nv12 chroma) and optimized.
      
      Cycle count for checkasm --bench on a Snapdragon 820e:
      h264_h_loop_filter_luma_intra_8bpp_c: 60.0
      h264_h_loop_filter_luma_intra_8bpp_neon: 54.2
      h264_v_loop_filter_luma_intra_8bpp_c: 148.3
      h264_v_loop_filter_luma_intra_8bpp_neon: 73.8
      h264_h_loop_filter_chroma_intra_8bpp_c: 27.8
      h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4
      h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8
      h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7
      h264_v_loop_filter_chroma_intra_8bpp_c: 45.8
      h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3
      28a8b541
    • h264/aarch64: optimize neon loop filter · 846c3d6a
      Janne Grunau authored
      Exit as soon as possible if no filtering will be done.
      
      Improves the checkasm --bench cycle count on a Snapdragon 820e:
      h264_h_loop_filter_luma_8bpp_c:      72.4 ->  72.5
      h264_h_loop_filter_luma_8bpp_neon:   97.1 ->  56.3
      h264_v_loop_filter_luma_8bpp_c:     174.0 -> 173.5
      h264_v_loop_filter_luma_8bpp_neon:   62.9 ->  60.9
      h264_h_loop_filter_chroma_8bpp_c:    30.2 ->  30.3
      h264_h_loop_filter_chroma_8bpp_neon: 51.6 ->  25.7
      h264_v_loop_filter_chroma_8bpp_c:    57.3 ->  57.3
      h264_v_loop_filter_chroma_8bpp_neon: 28.0 ->  24.0
      846c3d6a
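The idea of exiting early when no filtering will be done can be sketched with a scalar model of the H.264 edge test. This is an illustration under stated assumptions, not the NEON code from the commit: `needs_filtering`, `filter_edge`, and the `filter_line` callback are hypothetical names, and the exact exit points in the assembly are not shown in the commit message.

```python
def needs_filtering(p1, p0, q0, q1, alpha, beta):
    """H.264 edge test for one line of samples across a block edge."""
    return abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta


def filter_edge(lines, alpha, beta, filter_line):
    """Call filter_line on each line that passes the edge test.

    Returns the number of lines filtered. filter_line is a hypothetical
    callback standing in for the actual filter arithmetic.
    """
    # A zero threshold can never be satisfied, so leave before reading samples.
    if alpha == 0 or beta == 0:
        return 0
    mask = [needs_filtering(*line, alpha, beta) for line in lines]
    # Early exit: no line passed the test, so skip the filter entirely.
    if not any(mask):
        return 0
    count = 0
    for line, selected in zip(lines, mask):
        if selected:
            filter_line(line)
            count += 1
    return count
```

In a vectorised implementation the mask covers a whole register of samples at once, so a single check of the mask can skip all of the filter arithmetic for that edge.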
    • checkasm/h264: add loop filter tests · d7f4f5c4
      Janne Grunau authored
      d7f4f5c4
    • bb515e3a
  6. 25 Jan, 2019 1 commit
    • arm: Create proper .rdata sections for COFF · 41cf3e3b
      Martin Storsjö authored
      As .rodata isn't one of the default created sections for COFF, it was
      created as a read-write data section. By using the default .rdata
      section name for COFF, it automatically becomes a read-only data section.
      The existing ".section .rodata" works as intended for ELF though.
      
      This is based on an original patch and diagnosis by Tom Tan
      <Tom.Tan@microsoft.com>.
      Signed-off-by: Martin Storsjö <martin@martin.st>
      41cf3e3b
  7. 23 Jan, 2019 1 commit
  8. 17 Jan, 2019 1 commit
  9. 12 Dec, 2018 1 commit
    • libdav1d: update API usage to the first stable release · 70ab2778
      James Almer authored
      The color fields were moved to another struct, and a way to propagate
      timestamps and other input metadata was introduced, so the packet
      fifo can be removed.
      
      Add support for 12bit streams, an option to disable film grain, and
      read the profile from the sequence header referenced by the output
      picture instead of guessing based on output pix_fmt.
      Signed-off-by: James Almer <jamrial@gmail.com>
      70ab2778
  10. 15 Nov, 2018 1 commit
  11. 13 Nov, 2018 1 commit
    • qsvenc: Add VDENC support for H264 and HEVC · e716323f
      Linjie Fu authored
      Add VDENC (low-power mode) support for QSV H264 and HEVC.
      
      It's an experimental function (like lowpower in vaapi) with
      some limitations:
      - CBR/VBR require HuC, which should be explicitly loaded via an i915
      module parameter (i915.enable_guc=2 for Linux kernel version >= 4.16)
      - HEVC VDENC is supported on Ice Lake and later
      
      Use option "-low_power 1" to enable VDENC.
      Signed-off-by: Linjie Fu <linjie.fu@intel.com>
      e716323f
  12. 06 Nov, 2018 2 commits