    [regexp] Support assertions in experimental engine · e83511c2
    Martin Bidlingmaier authored
    Assertions are implemented with the new ASSERTION instruction.  The NFA
    interpreter evaluates the assertion based on the current context in the
    subject string every time a thread executes ASSERTION.  This is
    analogous to what re2 and rust/regex do.
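
As a rough illustration of evaluating an assertion on demand from the
current context, here is a minimal Python sketch. It is not the V8
implementation: the names `Assertion`, `is_word_char` and
`satisfies_assertion` are hypothetical, and the word-character and
line-terminator definitions are assumptions chosen to resemble JavaScript
regexp semantics.

```python
from enum import Enum, auto


class Assertion(Enum):
    # Hypothetical assertion kinds; not V8's actual identifiers.
    START_OF_INPUT = auto()
    END_OF_INPUT = auto()
    START_OF_LINE = auto()
    END_OF_LINE = auto()
    WORD_BOUNDARY = auto()
    NOT_WORD_BOUNDARY = auto()


# Assumed set of line terminators (as in JavaScript regexps).
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}


def is_word_char(c):
    # Assumption: a word character is ASCII [A-Za-z0-9_].
    return c.isascii() and (c.isalnum() or c == "_")


def satisfies_assertion(kind, subject, pos):
    """Decide an assertion by looking only at the characters around pos.

    Called lazily, i.e. only when a thread executes an ASSERTION
    instruction; nothing is precomputed when the input position advances.
    """
    at_start = pos == 0
    at_end = pos == len(subject)
    if kind is Assertion.START_OF_INPUT:
        return at_start
    if kind is Assertion.END_OF_INPUT:
        return at_end
    if kind is Assertion.START_OF_LINE:
        return at_start or subject[pos - 1] in LINE_TERMINATORS
    if kind is Assertion.END_OF_LINE:
        return at_end or subject[pos] in LINE_TERMINATORS
    # Word boundary: exactly one neighbour is a word character.
    before = not at_start and is_word_char(subject[pos - 1])
    after = not at_end and is_word_char(subject[pos])
    boundary = before != after
    if kind is Assertion.WORD_BOUNDARY:
        return boundary
    return not boundary
```

The check costs a constant number of character lookups per executed
ASSERTION, and advancing the input position stays free of assertion work.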
    
    Alternatives to this approach:
    - The interpreter could calculate eagerly for all assertion types
      whether they are satisfied whenever the current input position is
      advanced.  This would make evaluating the ASSERTION instruction itself
      cheaper, but at the cost of making every advance in the input string
      more expensive.  I suspect this would be slower on average because
      assertions are not so common that we typically evaluate >= 2
      assertions at every input position.
    - Assertions in a regexp could be desugared into CONSUME_RANGE
      instructions, so that no new instruction would be necessary.  For
      example, the word boundary assertion \b is satisfied at a given
      position/state if we have just consumed a word character and will
      consume a non-word character next, or vice versa.  The tricky part
      about this is that the assertion itself should not consume input, so
      we'd have to split (automaton) states according to whether we've
      arrived at them via a word character or not.  The current compiler is
      not really equipped for this kind of transformation.  For {start,end}
      of {line,file} assertions, we'd need to introduce dummy characters
      indicating start/end of input (say, 0x10000 and 0x10001) which we feed
      to the interpreter before and after the actual input, respectively.
      I suspect that this approach wouldn't make much of a difference for
      NFA execution. It would likely speed up (lazy) DFA execution though
      because assertions would be dealt with in the fast path.
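
The desugaring alternative for \b can be sketched as follows. This is a
hypothetical Python illustration, not code from this change: it feeds
sentinel values around the UTF-16 code units of the input and carries one
bit of state ("did we arrive via a word character?"), which stands in for
the state splitting described above. The function names and the ASCII-only
word-character definition are assumptions.

```python
# Sentinel values outside the UTF-16 code unit range 0x0000..0xFFFF,
# matching the example values suggested above.
START_SENTINEL = 0x10000
END_SENTINEL = 0x10001


def is_word_unit(u):
    # Assumption: word characters are ASCII [A-Za-z0-9_]; sentinels and
    # all non-ASCII code units count as non-word.
    return u < 0x80 and (chr(u).isalnum() or u == ord("_"))


def utf16_units(s):
    # Split a string into UTF-16 code units so sentinels cannot collide
    # with real input values.
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def word_boundaries_desugared(subject):
    """Return all positions (in code units) at which \\b would hold.

    No context lookup is performed: the decision uses only the split-state
    bit and the next value to be consumed from the stream.
    """
    stream = [START_SENTINEL] + utf16_units(subject) + [END_SENTINEL]
    boundaries = []
    arrived_via_word = False  # state after consuming START_SENTINEL
    for i in range(1, len(stream)):
        next_is_word = is_word_unit(stream[i])
        if arrived_via_word != next_is_word:
            # stream[i] is the unit at subject position i - 1.
            boundaries.append(i - 1)
        arrived_via_word = next_is_word
    return boundaries
```

For the input "ab cd" this reports boundaries at positions 0, 2, 3 and 5.
A real compiler transformation would have to duplicate automaton states
per value of the bit rather than track it at run time.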
    
    Cq-Include-Trybots: luci.v8.try:v8_linux64_fyi_rel_ng
    Bug: v8:10765
    Change-Id: Ic2012c943e0ce54eb8662789fb3d4c1b6cd8d606
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2398644
    Commit-Queue: Martin Bidlingmaier <mbid@google.com>
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Reviewed-by: Jakob Gruber <jgruber@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#70026}
Name
build_overrides
custom_deps
docs
gni
include
infra
samples
src
test
testing
third_party
tools
.clang-format
.clang-tidy
.editorconfig
.flake8
.git-blame-ignore-revs
.gitattributes
.gitignore
.gn
.vpython
.ycm_extra_conf.py
AUTHORS
BUILD.gn
CODE_OF_CONDUCT.md
COMMON_OWNERS
DEPS
ENG_REVIEW_OWNERS
INFRA_OWNERS
INTL_OWNERS
LICENSE
LICENSE.fdlibm
LICENSE.strongtalk
LICENSE.v8
MIPS_OWNERS
OWNERS
PPC_OWNERS
PRESUBMIT.py
README.md
S390_OWNERS
WATCHLISTS
codereview.settings