    [regexp] Fix BoyerMooreLookahead behavior at submatches · bc4cbe92
    Jakob Gruber authored
    Since https://codereview.chromium.org/2777583003, the Boyer-Moore
    lookahead (used by the irregexp engine) also looks inside submatches
    to narrow down its range of accepted characters at specific offsets.
    
    But the end of a submatch, designated by a PositiveSubmatchSuccess
    action node, was not handled correctly. When a submatch terminates,
    we have no knowledge of what may follow, and thus must accept any
    character at following positions. This is done by the SetRest call
    added in this CL.
    
    An example, since this is fairly obscure:
    
    /^.*?Y(((?=B?).)*)Y$/s
    
    The initial non-greedy loop, together with the s flag, will trigger
    an attempted Boyer-Moore lookahead. It is followed by an unconditional
    Y, a *-quantified loop that matches any character and contains a
    lookahead matching either one B or none, and an unconditional
    trailing Y.
    
    When the BM lookahead scans the subject string for the beginning of
    this pattern after the non-greedy loop, it should look for a Y at
    offset 0 and, at offset 1, either a B, a Y, or '.' (i.e. any
    character), so offset 1 effectively accepts any character.
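The intended per-offset filter can be illustrated with a small, hypothetical model. This sketch assumes one-byte (Latin-1) subjects and represents each offset's accepted characters as a `std::bitset<256>`; `CharSet`, `Single`, `Any`, `CandidateStarts` and `ExpectedTable` are illustrative names, not V8's `BoyerMooreLookahead` API, and the real scan also uses skip distances that this model omits.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative model only: one set of accepted characters (one-byte,
// 256 entries) per offset from a candidate match start. Not V8's layout.
using CharSet = std::bitset<256>;

// Set containing exactly one character.
CharSet Single(char c) {
  CharSet s;
  s.set(static_cast<unsigned char>(c));
  return s;
}

// Set containing every character, as '.' does under the s flag.
CharSet Any() {
  CharSet s;
  s.set();
  return s;
}

// Returns every start index i such that subject[i + k] is in table[k] for
// all offsets k. Starts where the table would run past the end of the
// subject are not reported (a simplification of the real scan).
std::vector<std::size_t> CandidateStarts(const std::string& subject,
                                         const std::vector<CharSet>& table) {
  std::vector<std::size_t> starts;
  for (std::size_t i = 0; i + table.size() <= subject.size(); ++i) {
    bool accepted = true;
    for (std::size_t k = 0; k < table.size() && accepted; ++k) {
      accepted = table[k].test(static_cast<unsigned char>(subject[i + k]));
    }
    if (accepted) starts.push_back(i);
  }
  return starts;
}

// Expected table for the part after ^.*? in /^.*?Y(((?=B?).)*)Y$/s:
// offset 0 is 'Y'; offset 1 is 'B' | 'Y' | '.', which is any character.
std::vector<CharSet> ExpectedTable() {
  return {Single('Y'), Single('B') | Single('Y') | Any()};
}
```

For example, `CandidateStarts("xxYaY", ExpectedTable())` reports only index 2: it is the only position with a Y at offset 0 that still leaves room for offset 1.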
    
    Prior to this CL this was not the case:
    
    - The lookaround is internally generated as a submatch.
    - The optional 'B?' is unrolled into 'either B followed by submatch
      end' or 'submatch end'.
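    - Filling in BM infos terminates when encountering a submatch end.
      Thus in the former case we added B to the set of accepted characters
      and terminated, while in the latter case we simply terminated.
    - As a result, offset 1 accepted only B and Y rather than any
      character, so the lookahead could skip over valid match starts.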
    
    This CL ensures that BM will accept any character at any offset at or
    exceeding the first encountered submatch end.
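The difference between the old and new behavior can be reproduced in a simplified, hypothetical model. This sketch assumes one-byte subjects and hand-linearized paths through the pattern, cut off at two offsets; `Step`, `Fill` and `Accepts` are illustrative names and do not correspond to V8's `FillInBMInfo` or `SetRest`. `Fill(..., false)` stops a path at a submatch end (as described above for the code before this CL), while `Fill(..., true)` makes every offset at or past that point accept any character.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative model only, not V8's code: per-offset sets of accepted
// one-byte characters, filled from linearized regexp paths.
using CharSet = std::bitset<256>;

// One step along a path: a single character consuming one offset, or the
// end of a submatch (a PositiveSubmatchSuccess node in V8 terms).
struct Step {
  bool submatch_end;
  CharSet chars;
};

Step Char(char c) {
  Step step{false, CharSet()};
  step.chars.set(static_cast<unsigned char>(c));
  return step;
}

Step SubmatchEnd() { return Step{true, CharSet()}; }

// Unions all paths into `length` per-offset sets. A path stops at a
// submatch end. If widen_at_submatch_end is true, every offset at or past
// that point first accepts any character (the effect of this CL); if
// false, the path just stops (the behavior before this CL).
std::vector<CharSet> Fill(const std::vector<std::vector<Step>>& paths,
                          std::size_t length, bool widen_at_submatch_end) {
  std::vector<CharSet> table(length);
  for (const std::vector<Step>& path : paths) {
    std::size_t offset = 0;
    for (const Step& step : path) {
      if (offset >= length) break;
      if (step.submatch_end) {
        if (widen_at_submatch_end) {
          for (std::size_t k = offset; k < length; ++k) table[k].set();
        }
        break;
      }
      table[offset] |= step.chars;
      ++offset;
    }
  }
  return table;
}

// True if subject[start + k] is in table[k] for every offset k.
bool Accepts(const std::vector<CharSet>& table, const std::string& subject,
             std::size_t start) {
  if (start + table.size() > subject.size()) return false;
  for (std::size_t k = 0; k < table.size(); ++k) {
    if (!table[k].test(static_cast<unsigned char>(subject[start + k]))) {
      return false;
    }
  }
  return true;
}

// Paths after ^.*? in /^.*?Y(((?=B?).)*)Y$/s, cut off at two offsets.
std::vector<std::vector<Step>> ExamplePaths() {
  return {
      {Char('Y'), Char('Y')},                 // loop runs zero times
      {Char('Y'), Char('B'), SubmatchEnd()},  // lookahead matches one B
      {Char('Y'), SubmatchEnd()},             // lookahead matches no B
  };
}
```

Without widening, offset 1 holds only {B, Y}, so the model rejects the start of "YaY" even though the pattern matches that string. With widening, the submatch end reached at offset 1 by the third path makes offset 1 accept any character, and the start is kept. Note that with a mandatory `(?=B)` only the second path would exist; its submatch end is reached at offset 2, so offset 1 would correctly stay restricted to B.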
    
    Bug: v8:8770
    Change-Id: Iff998ba307cd9669203846a9182798b8cf6a85dc
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1679506
    Commit-Queue: Jakob Gruber <jgruber@chromium.org>
    Reviewed-by: Erik Corry <erikcorry@chromium.org>
    Reviewed-by: Yang Guo <yangguo@chromium.org>
    Auto-Submit: Jakob Gruber <jgruber@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#62460}
arm
arm64
ia32
mips
mips64
ppc
s390
x64
OWNERS
property-sequences.cc
property-sequences.h
regexp-ast.cc
regexp-ast.h
regexp-bytecodes.h
regexp-compiler-tonode.cc
regexp-compiler.cc
regexp-compiler.h
regexp-dotprinter.cc
regexp-dotprinter.h
regexp-interpreter.cc
regexp-interpreter.h
regexp-macro-assembler-arch.h
regexp-macro-assembler-irregexp-inl.h
regexp-macro-assembler-irregexp.cc
regexp-macro-assembler-irregexp.h
regexp-macro-assembler-tracer.cc
regexp-macro-assembler-tracer.h
regexp-macro-assembler.cc
regexp-macro-assembler.h
regexp-nodes.h
regexp-parser.cc
regexp-parser.h
regexp-stack.cc
regexp-stack.h
regexp-utils.cc
regexp-utils.h
regexp.cc
regexp.h