1. 28 Mar, 2016 26 commits
  2. 26 Mar, 2016 1 commit
      aarch64: Make transpose_4x4H do a regular transpose · cdb1665f
      Martin Storsjö authored
      Previously, ff_h264_idct_add_neon (originally in the arm version) used
      a non-regular transpose in order to be able to use more instructions
      that deal with registers as 128-bit register pairs. The aarch64
      translation doesn't do it to the same extent, but brought along the
      same structure since it was a straight translation.
      
      This reshuffles ff_h264_idct_add_neon, bringing it closer to
      the C implementation, making the transpose_4x4H macro do a regular
      transpose, usable for other algorithms as well.
      
      Previously, the third and fourth outputs from transpose_4x4H were
      swapped, and prior to cc29d96d, the third and fourth inputs as well.
      In addition to just swapping the outputs, this also renumbers the
      intermediate registers for better readability (making the register
      order match transpose_4x8B).
      
      This runs with the same number of cycles as before.
      Signed-off-by: Martin Storsjö <martin@martin.st>
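To illustrate the difference the commit message describes, here is a minimal C sketch of a regular 4x4 transpose of 16-bit values next to a variant whose third and fourth output rows are swapped. It is only a scalar model of the described behaviour, not the NEON macro itself: the function names `transpose_4x4_regular` and `transpose_4x4_swapped` and the flat row-major layout are assumptions made for this example.

```c
#include <stdint.h>

/* Regular transpose of a row-major 4x4 block: out[c][r] = in[r][c]. */
static void transpose_4x4_regular(const int16_t *in, int16_t *out)
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            out[c * 4 + r] = in[r * 4 + c];
}

/* Hypothetical model of the earlier behaviour: the same transpose,
 * except that output rows 2 and 3 (the third and fourth) trade places. */
static void transpose_4x4_swapped(const int16_t *in, int16_t *out)
{
    static const int row_map[4] = { 0, 1, 3, 2 };

    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            out[row_map[c] * 4 + r] = in[r * 4 + c];
}
```

A caller of the swapped variant has to remember that the third and fourth result rows are exchanged, which is why a regular transpose is easier to reuse in other algorithms.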
  3. 25 Mar, 2016 13 commits