    [regexp] Fix BoyerMooreLookahead behavior at submatches · bc4cbe92
    Jakob Gruber authored
    Since https://codereview.chromium.org/2777583003, the Boyer-Moore
    lookahead (used by the irregexp engine) also looks inside submatches
    to narrow down its range of accepted characters at specific offsets.
    
    But the end of a submatch, designated by a PositiveSubmatchSuccess
    action node, was not handled correctly. When a submatch terminates,
    we have no knowledge of what may follow, and thus must accept any
    character at following positions. This is done by the SetRest call
    added in this CL.
    
    An example, since this is fairly obscure:
    
    /^.*?Y(((?=B?).)*)Y$/s
    
    The initial non-greedy loop, together with the s flag,
    will trigger an attempted Boyer-Moore lookahead. After this follows
    an unconditional Y, a *-quantified loop matching any char and
    containing a lookahead that matches either 1 B or 0 B's, and an
    unconditional trailing Y.
    
    When the BM lookahead scans the subject string for the beginning of
    this pattern after the non-greedy loop, it should look for: a Y at
    offset 0, and either a B, a Y, or '.' (-> any character) at offset 1.
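
As a rough illustration (not the regression test from this CL), the pattern can be run against a subject whose character after the first Y is neither B nor Y. The subject string below is an assumption chosen for this sketch; the match only succeeds if '.' is accepted at offset 1.

```javascript
// Pattern copied verbatim from the commit message above.
const re = /^.*?Y(((?=B?).)*)Y$/s;

// Hypothetical subject: the character right after the first Y is 'A',
// which is neither B nor Y, so only the '.' case can accept it.
const subject = "YABCY";
const result = re.exec(subject);

// Per the regexp semantics the whole subject matches, and the outer
// group captures everything between the two Y characters.
console.log(result !== null);
console.log(result && result[0], result && result[1]);
```

An engine whose lookahead only accepted B at offset 1 would skip the only valid start position and report no match for this subject.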
    
    Prior to this CL this was not the case:
    
    - The lookaround is internally generated as a submatch.
    - The optional 'B?' is unrolled into 'either B followed by submatch
      end' or 'submatch end'.
    - Filling in BM infos terminates when encountering a submatch end.
      Thus in the former case we added B to the set of accepted characters
      and terminated, while in the latter case we simply terminated.
    
    This CL ensures that BM will accept any character at any offset at or
    exceeding the first encountered submatch end.
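
The difference can be sketched with a toy model. This is not V8 code: `fillInfo`, `ANY`, the step objects, and the `handleSubmatchEnd` switch are all hypothetical names invented for this illustration, and the real BoyerMooreLookahead data structures are more involved.

```javascript
// Toy model only: every name in this block is hypothetical.
const ANY = Symbol("any character");

// Each alternative is a list of steps the lookahead walks through.
// { ch: "Y" } consumes one specific character; { submatchEnd: true }
// stands in for a PositiveSubmatchSuccess action node.
function fillInfo(alternatives, length, handleSubmatchEnd) {
  const accepted = Array.from({ length }, () => new Set());
  for (const steps of alternatives) {
    let offset = 0;
    for (const step of steps) {
      if (offset >= length) break;
      if (step.submatchEnd) {
        if (handleSubmatchEnd) {
          // Analogous in spirit to the SetRest call: accept any
          // character at this offset and at all following offsets.
          for (let i = offset; i < length; i++) accepted[i].add(ANY);
        }
        break; // Filling stops at a submatch end in both variants.
      }
      accepted[offset].add(step.ch);
      offset++;
    }
  }
  return accepted;
}

// 'B?' unrolled as described above: B then submatch end, or submatch end.
const alternatives = [
  [{ ch: "Y" }, { ch: "B" }, { submatchEnd: true }],
  [{ ch: "Y" }, { submatchEnd: true }],
];

const before = fillInfo(alternatives, 2, false); // offset 1 accepts only B
const after = fillInfo(alternatives, 2, true);   // offset 1 also accepts ANY
console.log(before[1].has(ANY), after[1].has(ANY));
```

In the `before` variant the second alternative contributes nothing at offset 1, so the accepted set is too narrow; in the `after` variant the submatch end widens it to any character.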
    
    Bug: v8:8770
    Change-Id: Iff998ba307cd9669203846a9182798b8cf6a85dc
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1679506
    Commit-Queue: Jakob Gruber <jgruber@chromium.org>
    Reviewed-by: Erik Corry <erikcorry@chromium.org>
    Reviewed-by: Yang Guo <yangguo@chromium.org>
    Auto-Submit: Jakob Gruber <jgruber@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#62460}
Name
benchmarks
build_overrides
custom_deps
docs
gni
include
infra
samples
src
test
testing
third_party
tools
.clang-format
.clang-tidy
.editorconfig
.git-blame-ignore-revs
.gitattributes
.gitignore
.gn
.vpython
.ycm_extra_conf.py
AUTHORS
BUILD.gn
CODE_OF_CONDUCT.md
COMMON_OWNERS
ChangeLog
DEPS
ENG_REVIEW_OWNERS
INFRA_OWNERS
INTL_OWNERS
LICENSE
LICENSE.fdlibm
LICENSE.strongtalk
LICENSE.v8
LICENSE.valgrind
MIPS_OWNERS
OWNERS
PPC_OWNERS
PRESUBMIT.py
README.md
S390_OWNERS
WATCHLISTS
codereview.settings