    x86: vc1: fix and enable optimised loop filter · f2fd1678
    Mans Rullgard authored
    The problem is that the ssse3 psign instruction does the wrong
    thing here.  Commit ea60dfe2 incorrectly removed a macro emulating
    this instruction for pre-ssse3 code.  However, the emulation is
    incorrect, and the code relies on the behaviour of the macro.
    Specifically, psign sets destination elements to zero where
    the corresponding source element is zero, whereas the emulation
    only negates destination elements where the source is negative.
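
The difference described above can be sketched with scalar models of a single 16-bit lane. This is an illustration only, not the assembly from the tree: the function names are hypothetical, and the second function models the emulation purely as this message describes it (negate where the source is negative, otherwise leave the destination alone).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of the SSSE3 psignw instruction:
 * negate dst where src < 0, zero it where src == 0, keep it where src > 0.
 * (Negating -32768 is not handled specially in this sketch.) */
static int16_t psignw_lane(int16_t dst, int16_t src)
{
    if (src < 0)
        return (int16_t)-dst;
    if (src == 0)
        return 0;
    return dst;
}

/* Hypothetical scalar model of the emulation as described in the message:
 * negate dst where src < 0, otherwise leave dst unchanged. */
static int16_t negate_if_negative_lane(int16_t dst, int16_t src)
{
    return src < 0 ? (int16_t)-dst : dst;
}

/* Returns 1 if both models give the same result for this pair, else 0. */
static int lanes_agree(int16_t dst, int16_t src)
{
    return psignw_lane(dst, src) == negate_if_negative_lane(dst, src);
}

/* Small self-check: the models match for non-zero sources and
 * diverge when the source is zero and the destination is not. */
static void self_check(void)
{
    assert(lanes_agree(7, -3));
    assert(lanes_agree(7, 3));
    assert(!lanes_agree(7, 0));
}
```

In this model the two only diverge when the source lane is zero and the destination lane is non-zero, which is the case the message says the loop filter code depends on.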
    
    Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus,
    which is why the original VC-1 code had an additional right shift
    when using it.  Since the psign instruction cannot be used here,
    skip all the macro hell and use the working instruction sequence
    directly.
    
    None of this was noticed due to a stray return statement in
    ff_vc1dsp_init_mmx() which meant that only the mmx version of the
    loop filter was ever used (before being removed in ea60dfe2).
    Signed-off-by: Mans Rullgard <mans@mansr.com>
vc1dsp_yasm.asm 7.62 KB