1. 18 Jul, 2014 31 commits
  2. 17 Jul, 2014 9 commits
    • Michael Niedermayer
      c527c14d
    • Ben Avison
      armv6: Accelerate butterflies_float · 5a272190
      Ben Avison authored
      I benchmarked the result by measuring the number of gperftools samples that
      hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
      specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
      same sample AAC stream:
      
                         Before          After
                         Mean   StdDev   Mean   StdDev  Confidence  Change
      Audio decode       1542.8 43.7     1470.5 41.5    100.0%      +4.9%
      butterflies_float  130.0  11.9     70.2   12.1    100.0%      +85.2%
      Signed-off-by: Martin Storsjö <martin@martin.st>
      5a272190
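The Change column appears to be the relative speedup: the Before mean divided by the After mean, minus one. That formula is inferred from the figures, not stated in the commit message. A minimal Python sketch (the helper name `speedup_percent` is hypothetical) that reproduces the two rows of the table above:

```python
def speedup_percent(before_mean: float, after_mean: float) -> float:
    """Relative speedup in percent, assuming fewer profiler samples means less time.

    Computed as before / after - 1. This is an inference from the table,
    not a formula given in the commit message.
    """
    return (before_mean / after_mean - 1.0) * 100.0


# Mean sample counts (Before, After) copied from the table above.
rows = {
    "Audio decode": (1542.8, 1470.5),
    "butterflies_float": (130.0, 70.2),
}

for name, (before, after) in rows.items():
    print(f"{name:18s} {speedup_percent(before, after):+.1f}%")
```

Rounded to one decimal place, the two results match the +4.9% and +85.2% reported in the table.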
    • Ben Avison
      armv6: Accelerate vector_fmul_window · 5edad2c4
      Ben Avison authored
      I benchmarked the result by measuring the number of gperftools samples that
      hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
      specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
      same sample AAC stream:
      
                          Before          After
                          Mean   StdDev   Mean   StdDev  Confidence  Change
      Audio decode        1598.2 47.4     1529.2 25.4    100.0%      +4.5%
      vector_fmul_window  244.0  22.1     188.9  22.3    100.0%      +29.2%
      Signed-off-by: Martin Storsjö <martin@martin.st>
      5edad2c4
    • Ben Avison
      armv6: Accelerate ff_fft_calc for general case (nbits != 4) · 87552d54
      Ben Avison authored
      The previous implementation targeted DTS Coherent Acoustics, which only
      requires nbits == 4 (fft16()). This case was (and still is) linked directly
      rather than being indirected through ff_fft_calc_vfp(), but now the full
      range from radix-4 up to radix-65536 is available. This benefits other codecs
      such as AAC and AC3.
      
      The implementation is based upon the C version, with each routine larger than
      radix-16 calling a hierarchy of smaller FFT functions, then performing a
      post-processing pass. This pass benefits a lot from loop unrolling to
      counter the long pipelines in the VFP. A relaxed calling standard also
      reduces the overhead of the call hierarchy, and avoiding the excessive
      inlining performed by GCC probably helps with I-cache utilisation too.
      
      I benchmarked the result by measuring the number of gperftools samples that
      hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
      specifically in the FFT routines (fft4() to fft512() and pass()) for the
      same sample AAC stream:
      
                    Before          After
                    Mean   StdDev   Mean   StdDev  Confidence  Change
      Audio decode  2245.5 53.1     1599.6 43.8    100.0%      +40.4%
      FFT routines  940.6  22.0     348.1  20.8    100.0%      +170.2%
      Signed-off-by: Martin Storsjö <martin@martin.st>
      87552d54
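The structure described in this commit message, where a larger transform calls a hierarchy of smaller FFTs and then runs a post-processing pass, can be illustrated with a textbook split-radix recursion. The Python sketch below is only an illustration under that assumption. It is not FFmpeg's C code or the VFP assembly, it makes no attempt to match FFmpeg's data layout or twiddle tables, and the function names are hypothetical.

```python
import cmath


def split_radix_fft(x):
    """Recursive split-radix FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]

    # Hierarchy of smaller transforms: one half-size and two quarter-size FFTs.
    even = split_radix_fft(x[0::2])
    odd1 = split_radix_fft(x[1::4])
    odd3 = split_radix_fft(x[3::4])

    # Post-processing pass: combine the sub-transforms using twiddle factors.
    out = [0j] * n
    q = n // 4
    for k in range(q):
        t1 = cmath.exp(-2j * cmath.pi * k / n) * odd1[k]
        t3 = cmath.exp(-2j * cmath.pi * 3 * k / n) * odd3[k]
        out[k] = even[k] + (t1 + t3)
        out[k + 2 * q] = even[k] - (t1 + t3)
        out[k + q] = even[k + q] - 1j * (t1 - t3)
        out[k + 3 * q] = even[k + q] + 1j * (t1 - t3)
    return out


def naive_dft(x):
    """Direct O(n^2) DFT, used only as a reference to check the sketch."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]
```

The loop at the end plays the role of the post-processing pass. In a hand-written implementation it is the natural candidate for unrolling, since each iteration is independent of the others.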
    • Ben Avison
      armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6) · 5c22e8e4
      Ben Avison authored
      The previous implementation targeted DTS Coherent Acoustics, which only
      requires mdct_bits == 6. This relatively small size lent itself to
      unrolling the loops a small number of times, and encoding offsets
      calculated at assembly time within the load/store instructions of each
      iteration.
      
      In the more general case (codecs such as AAC and AC3) much larger arrays
      are used - mdct_bits == [8, 9, 11]. The old method does not scale for
      these cases, so more integer registers are used with non-unrolled versions
      of the loops (and with some stack spillage). The postrotation filter loop
      is still unrolled by a factor of 2 to permit the double-buffering of some
      VFP registers to facilitate overlap of neighbouring iterations.
      
      I benchmarked the result by measuring the number of gperftools samples
      that hit anywhere in the AAC decoder (starting from aac_decode_frame())
      or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
      example AAC stream:
      
                        Before          After
                        Mean   StdDev   Mean   StdDev  Confidence  Change
      aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
      ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%
      Signed-off-by: Martin Storsjö <martin@martin.st>
      5c22e8e4
    • Michael Niedermayer
      avcodec/me_cmp: restore author attribution and copyrights · 162cffca
      Michael Niedermayer authored
      These were removed by libav in
      
      See: git show -C 2d604443
      diff --git a/libavcodec/dsputil.c b/libavcodec/me_cmp.c
      similarity index 98%
      rename from libavcodec/dsputil.c
      rename to libavcodec/me_cmp.c
      index ba71a99..9fcc937 100644
      --- a/libavcodec/dsputil.c
      +++ b/libavcodec/me_cmp.c
      @@ -1,8 +1,4 @@
       /*
      - * DSP utils
      - * Copyright (c) 2000, 2001 Fabrice Bellard
      - * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at>
      - *
        * This file is part of Libav.
        *
        * Libav is free software; you can redistribute it and/or
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
      162cffca
    • Michael Niedermayer
      Merge commit '2d604443' · 3a2d1465
      Michael Niedermayer authored
      * commit '2d604443':
        dsputil: Split motion estimation compare bits off into their own context
      
      Conflicts:
      	configure
      	libavcodec/Makefile
      	libavcodec/arm/Makefile
      	libavcodec/dvenc.c
      	libavcodec/error_resilience.c
      	libavcodec/h264.h
      	libavcodec/h264_slice.c
      	libavcodec/me_cmp.c
      	libavcodec/me_cmp.h
      	libavcodec/motion_est.c
      	libavcodec/motion_est_template.c
      	libavcodec/mpeg4videoenc.c
      	libavcodec/mpegvideo.c
      	libavcodec/mpegvideo_enc.c
      	libavcodec/x86/Makefile
      	libavcodec/x86/me_cmp_init.c
      Merged-by: Michael Niedermayer <michaelni@gmx.at>
      3a2d1465
    • Michael Niedermayer
      Merge commit 'a578b040' · 6be71e99
      Michael Niedermayer authored
      * commit 'a578b040':
        configure: Assume runtime cpu detection on arm on --target-os=android as well
      Merged-by: Michael Niedermayer <michaelni@gmx.at>
      6be71e99
    • Michael Niedermayer
      Merge commit 'c23ce454' · d6676a16
      Michael Niedermayer authored
      * commit 'c23ce454':
        x86: dsputil: Coalesce all init files
      
      Conflicts:
      	libavcodec/x86/dsputil_init.c
      	libavcodec/x86/dsputil_x86.h
      	libavcodec/x86/motion_est.c
      Merged-by: Michael Niedermayer <michaelni@gmx.at>
      d6676a16