1. 18 Jul, 2014 25 commits
  2. 17 Jul, 2014 15 commits
    • Michael Niedermayer's avatar
      c527c14d
    • Ben Avison's avatar
      armv6: Accelerate butterflies_float · 5a272190
      Ben Avison authored
      I benchmarked the result by measuring the number of gperftools samples that
      hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
      specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
      same sample AAC stream:
      
                         Before          After
                         Mean   StdDev   Mean   StdDev  Confidence  Change
      Audio decode       1542.8 43.7     1470.5 41.5    100.0%      +4.9%
      butterflies_float  130.0  11.9     70.2   12.1    100.0%      +85.2%
      Signed-off-by: Martin Storsjö <martin@martin.st>
      5a272190
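      The Change column in the table above appears to be the ratio of the two
      means minus one, expressed as a percentage. The commit message does not
      state this formula; it is inferred from the figures. A minimal sketch under
      that assumption, where `speedup_percent` is a hypothetical helper name:

      ```c
      #include <assert.h>

      /* Percentage speed-up implied by two mean sample counts.
       * Assumption: Change = before / after - 1, which reproduces the
       * figures in the table; the commit message does not give the formula. */
      double speedup_percent(double before_mean, double after_mean)
      {
          assert(before_mean > 0.0 && after_mean > 0.0);
          return (before_mean / after_mean - 1.0) * 100.0;
      }
      ```

      For the butterflies_float row, `speedup_percent(130.0, 70.2)` evaluates to
      roughly 85.2, in line with the reported +85.2%.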
    • Ben Avison's avatar
      armv6: Accelerate vector_fmul_window · 5edad2c4
      Ben Avison authored
      I benchmarked the result by measuring the number of gperftools samples that
      hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
      specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
      same sample AAC stream:
      
                          Before          After
                          Mean   StdDev   Mean   StdDev  Confidence  Change
      Audio decode        1598.2 47.4     1529.2 25.4    100.0%      +4.5%
      vector_fmul_window  244.0  22.1     188.9  22.3    100.0%      +29.2%
      Signed-off-by: Martin Storsjö <martin@martin.st>
      5edad2c4
    • Ben Avison's avatar
      armv6: Accelerate ff_fft_calc for general case (nbits != 4) · 87552d54
      Ben Avison authored
      The previous implementation targeted DTS Coherent Acoustics, which only
      requires nbits == 4 (fft16()). This case was (and still is) linked directly
      rather than being indirected through ff_fft_calc_vfp(), but now the full
      range from radix-4 up to radix-65536 is available. This benefits other codecs
      such as AAC and AC3.
      
      The implementation is based upon the C version, with each routine larger than
      radix-16 calling a hierarchy of smaller FFT functions, then performing a
      post-processing pass. This pass benefits a lot from loop unrolling to
      counter the long pipelines in the VFP. A relaxed calling standard also
      reduces the overhead of the call hierarchy, and avoiding the excessive
      inlining performed by GCC probably helps with I-cache utilisation too.
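
      The call hierarchy described above can be sketched by counting how many
      post-processing passes one transform triggers. This assumes a split-radix
      recursion in which a size-n routine calls one size-n/2 and two size-n/4
      routines before its own pass, with sizes up to 16 treated as leaf routines.
      `count_passes` is a hypothetical name and the sketch performs no FFT
      arithmetic:

      ```c
      #include <assert.h>

      /* Counts the post-processing passes run for a transform of size
       * 1 << nbits.
       * Assumption: fft(n) calls fft(n/2), fft(n/4), fft(n/4), then pass();
       * sizes up to 16 (nbits <= 4) are leaf routines with no pass. */
      int count_passes(int nbits)
      {
          assert(nbits >= 2 && nbits <= 16);
          if (nbits <= 4)
              return 0;
          return count_passes(nbits - 1) + 2 * count_passes(nbits - 2) + 1;
      }
      ```

      Under these assumptions a 512-point transform (nbits == 9) runs 21 passes,
      which suggests why the cost of each call and each pass matters.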
      
      I benchmarked the result by measuring the number of gperftools samples that
      hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
      specifically in the FFT routines (fft4() to fft512() and pass()) for the
      same sample AAC stream:
      
                    Before          After
                    Mean   StdDev   Mean   StdDev  Confidence  Change
      Audio decode  2245.5 53.1     1599.6 43.8    100.0%      +40.4%
      FFT routines  940.6  22.0     348.1  20.8    100.0%      +170.2%
      Signed-off-by: Martin Storsjö <martin@martin.st>
      87552d54
    • Ben Avison's avatar
      armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6) · 5c22e8e4
      Ben Avison authored
      The previous implementation targeted DTS Coherent Acoustics, which only
      requires mdct_bits == 6. This relatively small size lent itself to
      unrolling the loops a small number of times, and encoding offsets
      calculated at assembly time within the load/store instructions of each
      iteration.
      
      In the more general case (codecs such as AAC and AC3) much larger arrays
      are used - mdct_bits == [8, 9, 11]. The old method does not scale for
      these cases, so more integer registers are used with non-unrolled versions
      of the loops (and with some stack spillage). The postrotation filter loop
      is still unrolled by a factor of 2 to permit the double-buffering of some
      VFP registers to facilitate overlap of neighbouring iterations.
      
      I benchmarked the result by measuring the number of gperftools samples
      that hit anywhere in the AAC decoder (starting from aac_decode_frame())
      or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
      example AAC stream:
      
                        Before          After
                        Mean   StdDev   Mean   StdDev  Confidence  Change
      aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
      ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%
      Signed-off-by: Martin Storsjö <martin@martin.st>
      5c22e8e4
    • Michael Niedermayer's avatar
      avcodec/me_cmp: restore author attribution and copyrights · 162cffca
      Michael Niedermayer authored
      These were removed by libav in commit 2d604443.
      
      See: git show -C 2d604443
      diff --git a/libavcodec/dsputil.c b/libavcodec/me_cmp.c
      similarity index 98%
      rename from libavcodec/dsputil.c
      rename to libavcodec/me_cmp.c
      index ba71a99..9fcc937 100644
      --- a/libavcodec/dsputil.c
      +++ b/libavcodec/me_cmp.c
      @@ -1,8 +1,4 @@
       /*
      - * DSP utils
      - * Copyright (c) 2000, 2001 Fabrice Bellard
      - * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at>
      - *
        * This file is part of Libav.
        *
        * Libav is free software; you can redistribute it and/or
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
      162cffca
    • Michael Niedermayer's avatar
      Merge commit '2d604443' · 3a2d1465
      Michael Niedermayer authored
      * commit '2d604443':
        dsputil: Split motion estimation compare bits off into their own context
      
      Conflicts:
      	configure
      	libavcodec/Makefile
      	libavcodec/arm/Makefile
      	libavcodec/dvenc.c
      	libavcodec/error_resilience.c
      	libavcodec/h264.h
      	libavcodec/h264_slice.c
      	libavcodec/me_cmp.c
      	libavcodec/me_cmp.h
      	libavcodec/motion_est.c
      	libavcodec/motion_est_template.c
      	libavcodec/mpeg4videoenc.c
      	libavcodec/mpegvideo.c
      	libavcodec/mpegvideo_enc.c
      	libavcodec/x86/Makefile
      	libavcodec/x86/me_cmp_init.c
      Merged-by: Michael Niedermayer <michaelni@gmx.at>
      3a2d1465
    • Michael Niedermayer's avatar
      Merge commit 'a578b040' · 6be71e99
      Michael Niedermayer authored
      * commit 'a578b040':
        configure: Assume runtime cpu detection on arm on --target-os=android as well
      Merged-by: Michael Niedermayer <michaelni@gmx.at>
      6be71e99
    • Michael Niedermayer's avatar
      Merge commit 'c23ce454' · d6676a16
      Michael Niedermayer authored
      * commit 'c23ce454':
        x86: dsputil: Coalesce all init files
      
      Conflicts:
      	libavcodec/x86/dsputil_init.c
      	libavcodec/x86/dsputil_x86.h
      	libavcodec/x86/motion_est.c
      Merged-by: Michael Niedermayer <michaelni@gmx.at>
      d6676a16
    • Nicolas George's avatar
      lavd/x11grab: reindent after last commit. · 8e297686
      Nicolas George authored
      8e297686
    • Nicolas George's avatar
      lavfi: check refcount before merging. · 099aff5c
      Nicolas George authored
      When merging the formats around the automatically inserted
      convert filters, the refcount of the format lists cannot be 0.
      Coverity does not detect this and suspects a memory leak,
      because if the refcount were 0 the newly allocated lists would not be
      stored anywhere. That gives CIDs 1224282, 1224283 and 1224284.
      Lists with refcount 0 are used in can_merge_formats(), so the
      asserts cannot be moved inside the merge functions.
      099aff5c
    • Nicolas George's avatar
      lavd/x11grab: add an option to disable MIT-SHM. · 1d12df1a
      Nicolas George authored
      With remote displays supporting the MIT-SHM extension,
      the extension is detected and used, but attaching fails
      asynchronously.
      1d12df1a
    • Nicolas George's avatar
      lavd/x11grab: check 32-bits color masks. · 16c67954
      Nicolas George authored
      The X11 server provided by VNC, at 32-bit depth, reports the following masks:
      R:0x000007ff G:0x003ff800 B:0xffc00000
      This is not compatible with AV_PIX_FMT_0RGB32, and the result
      is a successful grab with completely wrong colors.
      16c67954
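      A check of this kind might look like the sketch below. It assumes that
      AV_PIX_FMT_0RGB32 corresponds to a native 32-bit word laid out as
      0x00RRGGBB. `struct color_masks` and `masks_match_0rgb32` are hypothetical
      names standing in for the mask fields of an X11 XImage; this is not the
      code from the commit:

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical stand-in for the three mask fields of an X11 XImage. */
      struct color_masks {
          unsigned long red_mask;
          unsigned long green_mask;
          unsigned long blue_mask;
      };

      /* Returns 1 if the masks describe 8 bits per channel laid out as
       * 0x00RRGGBB in a native 32-bit word (the layout assumed here for
       * AV_PIX_FMT_0RGB32), and 0 otherwise. */
      int masks_match_0rgb32(const struct color_masks *m)
      {
          assert(m != NULL);
          return m->red_mask   == 0xff0000UL &&
                 m->green_mask == 0x00ff00UL &&
                 m->blue_mask  == 0x0000ffUL;
      }
      ```

      With the VNC masks quoted above (R:0x000007ff G:0x003ff800 B:0xffc00000)
      the function returns 0, so a caller could reject the format instead of
      grabbing frames with wrong colors.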
    • Nicolas George's avatar
      lavd/x11grab: disable drawing mouse without XFixes. · a65c0a3f
      Nicolas George authored
      Fix a segfault if the XFixes extension is not available on
      the X11 server.
      Can be reproduced using the VNC server.
      a65c0a3f