    h264/aarch64: add intra loop filter neon asm · 28a8b541
    Janne Grunau authored
    Add my NEON asm from x264, relicensed under the LGPL 2.1 or later. Ported
    (x264 uses NV12 chroma) and optimized.
    
    Cycle counts from checkasm --bench on a Snapdragon 820e:
    h264_h_loop_filter_luma_intra_8bpp_c: 60.0
    h264_h_loop_filter_luma_intra_8bpp_neon: 54.2
    h264_v_loop_filter_luma_intra_8bpp_c: 148.3
    h264_v_loop_filter_luma_intra_8bpp_neon: 73.8
    h264_h_loop_filter_chroma_intra_8bpp_c: 27.8
    h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4
    h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8
    h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7
    h264_v_loop_filter_chroma_intra_8bpp_c: 45.8
    h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3
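
    The relative gain per function can be read off the cycle counts above by
    dividing the C figure by the NEON figure. The following is only a rough
    sketch for doing that arithmetic; the names CYCLES, speedup and speedups
    are made up for illustration and are not part of checkasm or the patch.

```python
# Cycle counts copied from the checkasm --bench figures above,
# stored as (C cycles, NEON cycles) per function.
CYCLES = {
    "h264_h_loop_filter_luma_intra_8bpp": (60.0, 54.2),
    "h264_v_loop_filter_luma_intra_8bpp": (148.3, 73.8),
    "h264_h_loop_filter_chroma_intra_8bpp": (27.8, 21.4),
    "h264_h_loop_filter_chroma_mbaff_intra_8bpp": (15.8, 15.7),
    "h264_v_loop_filter_chroma_intra_8bpp": (45.8, 17.3),
}


def speedup(c_cycles, neon_cycles):
    """Return how many times faster the NEON version is than the C version."""
    return c_cycles / neon_cycles


def speedups(cycles=CYCLES):
    """Map each function name to its C/NEON cycle ratio."""
    return {name: speedup(c, neon) for name, (c, neon) in cycles.items()}


if __name__ == "__main__":
    # Print one ratio per function, e.g. for pasting into a review comment.
    for name, ratio in speedups().items():
        print(f"{name}: {ratio:.2f}x")
```

    On these numbers the vertical filters gain the most, while the
    horizontal mbaff chroma filter is roughly on par with the C version.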
h264dsp_neon.S 29 KB