    avformat/matroskadec: Introduce a "last known good" position · a3db9f62
    Andreas Rheinhardt authored
    Currently, resyncing while reading packets works as follows:
    The current position is recorded, then matroska_parse_cluster is
    called; if that call fails, the demuxer tries to resync from the
    recorded position. If the call neither fails nor delivers a packet,
    the process is repeated.
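    The loop described above can be modelled with a small C sketch. It is
    not the demuxer's code. The ParseCall struct and current_resync_pos()
    are hypothetical, and each call to matroska_parse_cluster is replaced by
    a scripted outcome (a return value plus the number of bytes consumed),
    so only the position bookkeeping is shown.

```c
#include <stdint.h>

/* Hypothetical: outcome of one simulated call to matroska_parse_cluster(). */
typedef struct {
    int     ret;       /* < 0: error, 0: no packet, > 0: packet delivered */
    int64_t consumed;  /* bytes the call read before returning */
} ParseCall;

/* Replays the current read-packet loop over a scripted sequence of calls.
 * Returns the position resync would start from if a call fails,
 * or -1 if a packet is delivered first (or the script runs out). */
static int64_t current_resync_pos(const ParseCall *calls, int n, int64_t pos)
{
    for (int i = 0; i < n; i++) {
        int64_t recorded = pos;        /* position recorded before each call */
        pos += calls[i].consumed;      /* the call advances through the file */
        if (calls[i].ret < 0)
            return recorded;           /* resync from the recorded position */
        if (calls[i].ret > 0)
            return -1;                 /* a packet was delivered */
        /* ret == 0: loop again; 'recorded' is refreshed next iteration */
    }
    return -1;
}
```

    Take a script that starts at offset 1000, makes two calls that skip 100
    and 50 bytes without delivering a packet, and then makes a failing call.
    The sketch returns 1150 for it. The 150 bytes skipped by the first two
    calls lie before the resync position and are never searched.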
    
    There are two problems with this approach:
    1. The Matroska file format aims to be forward-compatible; to achieve
    this, a demuxer should simply ignore and skip elements it doesn't
    know about. But it is not possible to reliably distinguish unknown
    elements from junk. If matroska_parse_cluster encounters an unknown
    element, it therefore cannot simply error out; instead it returns zero
    and the loop is iterated again, which updates the position that is
    intended to be used in case of errors. Consequently, if a later call to
    matroska_parse_cluster returns an error, the skipped element is never
    searched for level 1 element ids to resync to.
    Notice that once sync has been lost, there can be a chain of several
    unknown (possibly junk) elements before an error is detected, all of
    which are excluded from the resync search.
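    A simplified EBML header reader shows why junk cannot be told apart
    from an unknown element. The sketch below decodes an element id and
    size as EBML variable-length integers. Its helpers, read_vint() and
    looks_like_element(), are hypothetical and not libavformat functions,
    and it ignores reserved values such as the all-ones "unknown size"
    encoding. Almost any byte sequence whose first byte is nonzero yields a
    syntactically valid id. A header whose size fits the available data
    therefore looks like a perfectly good element.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read an EBML variable-length integer (1 to 8 bytes).
 * keep_marker != 0 keeps the length-marker bit (as for element ids),
 * otherwise it is masked off (as for element sizes).
 * Returns the number of bytes consumed, or 0 if invalid or truncated. */
static int read_vint(const uint8_t *buf, size_t len, int keep_marker,
                     uint64_t *out)
{
    if (len == 0 || buf[0] == 0)
        return 0;                       /* empty, or longer than 8 bytes */
    int n = 1;
    while (!(buf[0] & (0x80 >> (n - 1))))
        n++;                            /* leading zero bits give the length */
    if ((size_t)n > len)
        return 0;                       /* truncated */
    uint64_t v = keep_marker ? buf[0] : (uint64_t)(buf[0] & (0xFF >> n));
    for (int i = 1; i < n; i++)
        v = (v << 8) | buf[i];
    *out = v;
    return n;
}

/* Hypothetical helper: returns 1 if buf starts with a syntactically valid
 * element header (id + size) whose payload fits in the buffer. */
static int looks_like_element(const uint8_t *buf, size_t len,
                              uint64_t *id, uint64_t *size)
{
    int n = read_vint(buf, len, 1, id);
    if (!n)
        return 0;
    int m = read_vint(buf + n, len - n, 0, size);
    if (!m)
        return 0;
    return *size <= len - n - m;
}
```

    For example, the arbitrary bytes 0x85 0x82 0xAA 0xBB decode as an
    element with id 0x85 and a two-byte payload that fits exactly. Nothing
    in the bytes themselves reveals that this is junk rather than an
    element from a newer version of the format.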
    
    2. Even if a call to matroska_parse_cluster delivers a packet, this does
    not mean that everything is fine. E.g. some of the block's data might
    be missing, so that the data presumed to belong to the block just read
    actually contains the beginning of the next element. This only becomes
    apparent at the next call of matroska_read_packet, which uses the
    (false) end of the earlier block as resync position. So in the (not
    unlikely) case that this call to matroska_parse_cluster fails, the data
    believed to be part of the earlier block is not searched for a level 1
    element to resync to.
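    The effect of problem 2 can be sketched under one assumed layout. A
    block starts at offset 0 and its damaged size field claims 20 bytes,
    while a real Cluster actually begins at offset 10. find_cluster() is a
    hypothetical linear scan for the 4-byte Cluster id (0x1F43B675). It
    stands in for the search for level 1 elements, and for brevity no other
    level 1 ids are considered. resync_from() is likewise hypothetical and
    contrasts the current resync position with starting from the block's
    own beginning.

```c
#include <stddef.h>
#include <stdint.h>

/* Matroska Cluster element id, a level 1 element. */
#define CLUSTER_ID 0x1F43B675u

/* Hypothetical: scan buf[from, len) for the Cluster id; offset or -1. */
static long find_cluster(const uint8_t *buf, size_t len, size_t from)
{
    uint32_t state = 0;
    for (size_t i = from; i < len; i++) {
        state = (state << 8) | buf[i];          /* last four bytes seen */
        if (i - from >= 3 && state == CLUSTER_ID)
            return (long)(i - 3);
    }
    return -1;
}

/* Hypothetical: where the resync scan starts after a block at blk_start
 * whose (possibly damaged) size field makes it end at claimed_end. */
static size_t resync_from(size_t blk_start, size_t claimed_end,
                          int last_known_good)
{
    /* current behaviour: the (false) end of the block;
     * "last known good": only the block element's start is trusted */
    return last_known_good ? blk_start : claimed_end;
}
```

    Scanning from the claimed end (offset 20) finds nothing. Scanning from
    the start of the block finds the Cluster at offset 10.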
    
    To counter this, a "last known good" position is introduced. When an
    element id that is known to be allowed at this position in the hierarchy
    (according to the syntax currently in use for parsing) is read and
    passes some further checks (regarding the length of the element and its
    containing master element), the beginning of the current element is
    treated as a "good" position and recorded as such in the
    MatroskaDemuxContext. Because of problem 2, only the start of the element
    is treated as a "good" position, not the whole element. If an error
    occurs later while parsing clusters, the resync process starts at the
    last known good position.
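    Here is a minimal sketch of the bookkeeping, under stated assumptions.
    DemuxState, last_good_pos, note_element() and resync_start() are
    hypothetical names, not the fields or functions added to matroskadec.c.
    The length check (the element must end within its parent) is only an
    illustrative stand-in for the "further checks" mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for part of the demuxer context. */
typedef struct {
    int64_t last_good_pos;  /* start of the last element that passed checks */
} DemuxState;

/* Hypothetical hook, called after an element header has been read.
 * id_allowed: the id is allowed here according to the syntax in use.
 * elem_start: offset of the element's id; elem_end: one past its data;
 * parent_end: one past the data of the containing master element. */
static void note_element(DemuxState *s, int id_allowed, int64_t elem_start,
                         int64_t elem_end, int64_t parent_end)
{
    assert(elem_start <= elem_end);
    /* Illustrative length check: the element must fit inside its parent. */
    if (id_allowed && elem_end <= parent_end)
        s->last_good_pos = elem_start;  /* only the start is trusted */
}

/* After a parse error inside a cluster, resync starts here. */
static int64_t resync_start(const DemuxState *s)
{
    return s->last_good_pos;
}
```

    The sketch records only elem_start, never elem_end. This mirrors the
    point made above: the element's contents may later prove to be damaged,
    so only its beginning is trusted.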
    
    The "last known good" concept is not used when the header is damaged:
    the subsequent resync never skips over data there and is therefore
    unaffected by both issues.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>