  1. 15 Oct, 2014 1 commit
  2. 27 Aug, 2014 1 commit
  3. 26 Aug, 2014 1 commit
  4. 04 Aug, 2014 1 commit
    • vc-1: Optimise parser (with special attention to ARM) · 701e8b42
      Ben Avison authored
      The previous implementation of the parser made four passes over each input
      buffer (reduced to two if the container format already guaranteed the input
      buffer corresponded to frames, such as with MKV). But these buffers are
      often 200K in size, certainly enough to flush the data out of L1 cache, and
      for many CPUs, all the way out to main memory. The passes were:
      
      1) locate frame boundaries (not needed for MKV etc)
      2) copy the data into a contiguous block (not needed for MKV etc)
      3) locate the start codes within each frame
      4) unescape the data between start codes
      
      After this, the unescaped data was parsed to extract certain header fields,
      but because the unescape operation was so large, this was usually also
      effectively operating on uncached memory. Most of the unescaped data was
      simply thrown away and never processed further. Only step 2 - because it
      used memcpy - was using prefetch, making things even worse.
      
      This patch reorganises these steps so that, aside from the copying, the
      operations are interleaved within a single pass over the data rather than
      performed one after another, maximising cache utilisation. No more than
      the worst-case number of bytes needed for header parsing is unescaped.
      Most of the data is, in practice, only read in order to search for a start
      code, for which optimised implementations already existed in the H264 codec
      (notably the ARM version uses prefetch, so we end up doing both remaining
      passes at maximum speed). For MKV files, we know when we've found the last
      start code of interest in a given frame, so we are able to avoid doing even
      that one remaining pass for most of the buffer.
      
      In some use-cases (such as the Raspberry Pi) video decode is handled by the
      GPU, but the entire elementary stream is still fed through the parser to
      pick out certain elements of the header which are necessary to manage the
      decode process. As you might expect, in these cases, the performance of the
      parser is significant.
      
      To measure parser performance, I used the same VC-1 elementary stream in
      either an MPEG-2 transport stream or an MKV file, and fed it through avconv
      with -c:v copy -c:a copy -f null. These are the gperftools counts for
      those streams, both filtered to only include vc1_parse() and its callees,
      and unfiltered (to include the whole binary). Lower numbers are better:
      
                      Before          After
      File  Filtered  Mean   StdDev   Mean   StdDev  Confidence  Change
      M2TS  No        861.7  8.2      650.5  8.1     100.0%      +32.5%
      MKV   No        868.9  7.4      731.7  9.0     100.0%      +18.8%
      M2TS  Yes       250.0  11.2     27.2   3.4     100.0%      +817.9%
      MKV   Yes       149.0  12.8     1.7    0.8     100.0%      +8526.3%
      
      Yes, that last case shows vc1_parse() running 86 times faster! The M2TS
      case does show a larger absolute improvement though, since it was worse
      to begin with.
      
      This patch has been tested with the FATE suite (albeit on x86 for speed).
      Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
  5. 29 Apr, 2014 1 commit
  6. 25 Apr, 2014 1 commit
    • vc-1: Optimise parser (with special attention to ARM) · a0d7f9ec
      Ben Avison authored
      The previous implementation of the parser made four passes over each input
      buffer (reduced to two if the container format already guaranteed the input
      buffer corresponded to frames, such as with MKV). But these buffers are
      often 200K in size, certainly enough to flush the data out of L1 cache, and
      for many CPUs, all the way out to main memory. The passes were:
      
      1) locate frame boundaries (not needed for MKV etc)
      2) copy the data into a contiguous block (not needed for MKV etc)
      3) locate the start codes within each frame
      4) unescape the data between start codes
      
      After this, the unescaped data was parsed to extract certain header fields,
      but because the unescape operation was so large, this was usually also
      effectively operating on uncached memory. Most of the unescaped data was
      simply thrown away and never processed further. Only step 2 - because it
      used memcpy - was using prefetch, making things even worse.
      
      This patch reorganises these steps so that, aside from the copying, the
      operations are interleaved within a single pass over the data rather than
      performed one after another, maximising cache utilisation. No more than
      the worst-case number of bytes needed for header parsing is unescaped.
      Most of the data is, in practice, only read in order to search for a start
      code, for which optimised implementations already existed in the H264 codec
      (notably the ARM version uses prefetch, so we end up doing both remaining
      passes at maximum speed). For MKV files, we know when we've found the last
      start code of interest in a given frame, so we are able to avoid doing even
      that one remaining pass for most of the buffer.
      
      In some use-cases (such as the Raspberry Pi) video decode is handled by the
      GPU, but the entire elementary stream is still fed through the parser to
      pick out certain elements of the header which are necessary to manage the
      decode process. As you might expect, in these cases, the performance of the
      parser is significant.
      
      To measure parser performance, I used the same VC-1 elementary stream in
      either an MPEG-2 transport stream or an MKV file, and fed it through ffmpeg
      with -c:v copy -c:a copy -f null. These are the gperftools counts for
      those streams, both filtered to only include vc1_parse() and its callees,
      and unfiltered (to include the whole binary). Lower numbers are better:
      
                      Before          After
      File  Filtered  Mean   StdDev   Mean   StdDev  Confidence  Change
      M2TS  No        861.7  8.2      650.5  8.1     100.0%      +32.5%
      MKV   No        868.9  7.4      731.7  9.0     100.0%      +18.8%
      M2TS  Yes       250.0  11.2     27.2   3.4     100.0%      +817.9%
      MKV   Yes       149.0  12.8     1.7    0.8     100.0%      +8526.3%
      
      Yes, that last case shows vc1_parse() running 86 times faster! The M2TS
      case does show a larger absolute improvement though, since it was worse
      to begin with.
      
      This patch has been tested with the FATE suite (albeit on x86 for speed).
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
  7. 27 Oct, 2013 1 commit
  8. 14 Jun, 2013 1 commit
  9. 04 May, 2013 1 commit
  10. 03 May, 2013 1 commit
  11. 07 Aug, 2012 1 commit
  12. 18 Feb, 2012 1 commit
    • vc1parse: call vc1_init_common(). · c742ab4e
      Ronald S. Bultje authored
      The parser uses VLC tables initialized in vc1_init_common(), so this
      function must also be called on parser init.
      
      Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
      CC: libav-stable@libav.org
  13. 15 Feb, 2012 1 commit
  14. 10 Feb, 2012 1 commit
  15. 06 Jan, 2012 1 commit
    • parsers: initialize MpegEncContext.slice_context_count to 1 · f907615f
      Janne Grunau authored
      The mpeg4 video, H264 and VC-1 parsers hold (directly or indirectly)
      an MpegEncContext in their private context. Since they do not call the
      common mpegvideo init function, slice_context_count has to be set to 1
      explicitly.
      Prevents a null pointer dereference in the h264 parser and fixes
      bug 193.
  16. 12 Dec, 2011 1 commit
  17. 02 Nov, 2011 1 commit
  18. 25 Aug, 2011 1 commit
    • vc1: fix VC-1 Pulldown handling. · 0d802ac5
      John Stebbins authored
      Pulldown flags are being set incorrectly and AVFrame->repeat_pict is not
      being set. Also, skipped frames exit header parsing too early and do not
      set pulldown flags appropriately. ticks_per_frame needs to be set and
      time_base adjusted so the player can extend the frame duration by one
      field time.
      
      This fixes problems encountered when attempting to transcode HD-DVD EVOB
      files with HandBrake. Also makes these files play smoothly in avplay.
      Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
  19. 02 May, 2011 2 commits
  20. 19 Mar, 2011 1 commit
  21. 28 Jan, 2011 1 commit
  22. 26 Jan, 2011 1 commit
  23. 20 Apr, 2010 1 commit
  24. 30 May, 2009 1 commit
  25. 22 Feb, 2009 1 commit
  26. 01 Feb, 2009 1 commit
  27. 07 May, 2007 1 commit
  28. 06 May, 2007 1 commit
  29. 05 May, 2007 1 commit
  30. 04 May, 2007 2 commits