    x86: hevc_mc: split differently calls · 3e892b2b
    Christophe Gisquet authored
    In some cases, 2 or 3 calls were made to narrower functions to handle
    unusual widths. Instead, perform 2 calls with different widths to split
    the workload.
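    A minimal sketch of the idea in C. Everything here is hypothetical:
    put_fixed() is a scalar stand-in for the fixed-width SIMD kernels, the
    names are not FFmpeg's, and the example assumes an 8-bit block of width
    12 that was previously handled as three 4-wide calls. It only shows that
    one 8-wide call plus one 4-wide call covers the same pixels with one
    call fewer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts kernel invocations so the two strategies can be compared. */
static int calls;

/* Hypothetical scalar stand-in for a fixed-width SIMD kernel: widens
 * exactly `w` 8-bit pixels per row to 16 bits. Strides are in elements. */
static void put_fixed(int w, int16_t *dst, ptrdiff_t dststride,
                      const uint8_t *src, ptrdiff_t srcstride, int height)
{
    calls++;
    for (int y = 0; y < height; y++, dst += dststride, src += srcstride)
        for (int x = 0; x < w; x++)
            dst[x] = src[x];
}

/* Repeated-call approach (sketch): width 12 as three 4-wide calls. */
static void put_w12_rep(int16_t *dst, ptrdiff_t dststride,
                        const uint8_t *src, ptrdiff_t srcstride, int height)
{
    for (int i = 0; i < 12; i += 4)
        put_fixed(4, dst + i, dststride, src + i, srcstride, height);
}

/* Mixed-width approach (sketch): one 8-wide call, then one 4-wide call
 * starting 8 pixels into the row. */
static void put_w12_mix(int16_t *dst, ptrdiff_t dststride,
                        const uint8_t *src, ptrdiff_t srcstride, int height)
{
    put_fixed(8, dst,     dststride, src,     srcstride, height);
    put_fixed(4, dst + 8, dststride, src + 8, srcstride, height);
}

/* Runs both strategies on the same 12x4 block; returns 1 if the outputs
 * match and reports how many kernel calls each strategy made. */
static int compare_w12(int *rep_calls, int *mix_calls)
{
    enum { H = 4, STRIDE = 16 };
    uint8_t src[H * STRIDE];
    int16_t a[H * STRIDE] = {0}, b[H * STRIDE] = {0};

    for (int i = 0; i < H * STRIDE; i++)
        src[i] = (uint8_t)(i * 7);

    calls = 0;
    put_w12_rep(a, STRIDE, src, STRIDE, H);
    *rep_calls = calls;

    calls = 0;
    put_w12_mix(b, STRIDE, src, STRIDE, H);
    *mix_calls = calls;

    return memcmp(a, b, sizeof(a)) == 0;
}
```

    compare_w12() reports 3 calls for the repeated version and 2 for the
    mixed one, with identical output.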
    
    The 8+16 and 4+8 width splits, for 8-bit and higher bit depths
    respectively, can't be processed that way without modifications: some
    calls would use unaligned buffers, and adding branches to handle this
    brought no micro-benchmark benefit.
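    The commit does not say which buffers end up unaligned. One plausible
    reading, offered only as an assumption: with the narrower call first,
    the wider kernel in the second call starts at a byte offset that is not
    a multiple of 16, so it cannot rely on 16-byte aligned accesses. The
    hypothetical helpers below only do that offset arithmetic, assuming the
    row itself starts on a 16-byte boundary.

```c
#include <stddef.h>

/* Byte offset at which the second kernel of a two-call split starts:
 * the first call covers `first_w` pixels of `bytes_per_px` bytes each. */
static size_t second_call_offset(int first_w, int bytes_per_px)
{
    return (size_t)first_w * (size_t)bytes_per_px;
}

/* Assumption for illustration: the second kernel needs 16-byte alignment
 * and the buffer base is 16-byte aligned, so only the offset matters. */
static int second_call_aligned(int first_w, int bytes_per_px)
{
    return second_call_offset(first_w, bytes_per_px) % 16 == 0;
}
```

    For 8+16 on 8-bit pixels the second call starts 8 x 1 = 8 bytes into
    the row, and for 4+8 on 2-byte pixels it starts 4 x 2 = 8 bytes in;
    neither is a multiple of 16. The same 8-pixel offset into a 16-bit
    intermediate buffer is 16 bytes and stays aligned.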
    
    For block_w == 12 (around 1% of the pixels of the sequence):
    Before:
    12758 decicycles in epel_uni, 4093 runs, 3 skips
    19389 decicycles in qpel_uni, 8187 runs, 5 skips
    22699 decicycles in epel_bi, 32743 runs, 25 skips
    34736 decicycles in qpel_bi, 32733 runs, 35 skips
    
    After:
    11929 decicycles in epel_uni, 4096 runs, 0 skips
    18131 decicycles in qpel_uni, 8184 runs, 8 skips
    20065 decicycles in epel_bi, 32750 runs, 18 skips
    31458 decicycles in qpel_bi, 32753 runs, 15 skips
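    To put the block_w == 12 figures in relative terms, the sketch below
    computes the reduction per function; units cancel, so decicycles can be
    used directly. The struct and function names are illustrative only.

```c
#include <stdio.h>

/* Relative reduction, in percent, between two timer readings. */
static double reduction_pct(double before, double after)
{
    return 100.0 * (before - after) / before;
}

struct bench { const char *name; double before, after; };

/* Decicycle figures copied from the block_w == 12 measurements. */
static const struct bench results[] = {
    { "epel_uni", 12758, 11929 },
    { "qpel_uni", 19389, 18131 },
    { "epel_bi",  22699, 20065 },
    { "qpel_bi",  34736, 31458 },
};

/* Prints one line per function with its reduction to one decimal. */
static void print_reductions(void)
{
    for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
        printf("%-8s %.1f%%\n", results[i].name,
               reduction_pct(results[i].before, results[i].after));
}
```

    print_reductions() prints about 6.5% for epel_uni and qpel_uni, 11.6%
    for epel_bi and 9.4% for qpel_bi.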

    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
hevcdsp_init.c