    [regexp] Support assertions in experimental engine · e83511c2
    Martin Bidlingmaier authored
    Assertions are implemented with the new ASSERTION instruction.  The NFA
    interpreter evaluates the assertion based on the current context in the
    subject string every time a thread executes ASSERTION.  This is
    analogous to what RE2 and rust/regex do.
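
    A minimal sketch of this lazy evaluation, for illustration only.  All
    names (AssertionKind, SatisfiesAssertion, IsWordChar, IsLineTerminator)
    are hypothetical and not taken from the V8 sources, and the subject is
    assumed to be an ASCII std::string rather than a JS string:

```cpp
#include <cstddef>
#include <string>

// Hypothetical assertion kinds; the real instruction payload may differ.
enum class AssertionKind {
  kStartOfInput,
  kEndOfInput,
  kStartOfLine,
  kEndOfLine,
  kWordBoundary,
  kNotWordBoundary,
};

// ASCII-only word character test (assumption: no unicode/ignore-case handling).
inline bool IsWordChar(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_';
}

// Only \n and \r are representable here; JS has further line terminators.
inline bool IsLineTerminator(char c) { return c == '\n' || c == '\r'; }

// Evaluates one assertion at `pos` (0 <= pos <= input.size()) by looking at
// the characters immediately before and after the position.  Nothing is
// consumed; the caller decides whether the thread continues or dies.
bool SatisfiesAssertion(AssertionKind kind, const std::string& input,
                        std::size_t pos) {
  const bool at_start = pos == 0;
  const bool at_end = pos == input.size();
  switch (kind) {
    case AssertionKind::kStartOfInput:
      return at_start;
    case AssertionKind::kEndOfInput:
      return at_end;
    case AssertionKind::kStartOfLine:
      return at_start || IsLineTerminator(input[pos - 1]);
    case AssertionKind::kEndOfLine:
      return at_end || IsLineTerminator(input[pos]);
    case AssertionKind::kWordBoundary:
    case AssertionKind::kNotWordBoundary: {
      const bool prev_is_word = !at_start && IsWordChar(input[pos - 1]);
      const bool next_is_word = !at_end && IsWordChar(input[pos]);
      const bool boundary = prev_is_word != next_is_word;
      return kind == AssertionKind::kWordBoundary ? boundary : !boundary;
    }
  }
  return false;
}
```

    In this sketch the work is done only when a thread actually reaches an
    assertion, so input positions without assertions pay nothing.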
    
    Alternatives to this approach:
    - The interpreter could calculate eagerly for all assertion types
      whether they are satisfied whenever the current input position is
      advanced.  This would make evaluating the ASSERTION instruction itself
      cheaper, but at the cost of making every advance in the input string
      more expensive.  I suspect this would be slower on average because
      assertions are not so common that we would typically evaluate two or
      more assertions at every input position.
    - Assertions in a regexp could be desugared into CONSUME_RANGE
      instructions, so that no new instruction would be necessary.  For
      example, the word boundary assertion \b is satisfied at a given
      position/state if we have just consumed a word character and will
      consume a non-word character next, or vice-versa.  The tricky part
      about this is that the assertion itself should not consume input, so
      we'd have to split (automaton) states according to whether we've
      arrived at them via a word character or not.  The current compiler is
      not really equipped for this kind of transformation.  For {start,end}
      of {line,file} assertions, we'd need to introduce dummy characters
      indicating start/end of input (say, 0x10000 and 0x10001), which we'd
      feed to the interpreter before and after the actual input,
      respectively.
      I suspect that this approach wouldn't make much of a difference for
      NFA execution. It would likely speed up (lazy) DFA execution though
      because assertions would be dealt with in the fast path.
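
    To make the first alternative concrete, here is a rough sketch of eager
    evaluation: every time the input position advances, a bitmask of all
    currently satisfied assertions is computed, and an ASSERTION instruction
    then reduces to a single bit test.  The names (AssertionBit,
    ComputeAssertionMask, IsAsciiWordChar, IsAsciiLineTerminator) are
    hypothetical, and the subject is again assumed to be an ASCII
    std::string:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// One bit per assertion type (hypothetical layout).
enum AssertionBit : std::uint8_t {
  kStartOfInputBit = 1 << 0,
  kEndOfInputBit = 1 << 1,
  kStartOfLineBit = 1 << 2,
  kEndOfLineBit = 1 << 3,
  kWordBoundaryBit = 1 << 4,
  kNotWordBoundaryBit = 1 << 5,
};

inline bool IsAsciiWordChar(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_';
}

inline bool IsAsciiLineTerminator(char c) { return c == '\n' || c == '\r'; }

// Computes all assertion results for `pos` (0 <= pos <= input.size()) at
// once.  This cost is paid on every advance, even if no thread ever
// executes an assertion at this position.
std::uint8_t ComputeAssertionMask(const std::string& input, std::size_t pos) {
  const bool at_start = pos == 0;
  const bool at_end = pos == input.size();
  std::uint8_t mask = 0;
  if (at_start) mask |= kStartOfInputBit;
  if (at_end) mask |= kEndOfInputBit;
  if (at_start || IsAsciiLineTerminator(input[pos - 1])) mask |= kStartOfLineBit;
  if (at_end || IsAsciiLineTerminator(input[pos])) mask |= kEndOfLineBit;
  const bool prev_is_word = !at_start && IsAsciiWordChar(input[pos - 1]);
  const bool next_is_word = !at_end && IsAsciiWordChar(input[pos]);
  mask |= (prev_is_word != next_is_word) ? kWordBoundaryBit
                                         : kNotWordBoundaryBit;
  return mask;
}

// With the mask precomputed, executing an assertion is a single bit test.
inline bool AssertionHolds(std::uint8_t mask, AssertionBit bit) {
  return (mask & bit) != 0;
}
```

    The trade-off described above shows up directly: AssertionHolds is
    trivial, but ComputeAssertionMask runs at every input position.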
    
    Cq-Include-Trybots: luci.v8.try:v8_linux64_fyi_rel_ng
    Bug: v8:10765
    Change-Id: Ic2012c943e0ce54eb8662789fb3d4c1b6cd8d606
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2398644
    Commit-Queue: Martin Bidlingmaier <mbid@google.com>
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Reviewed-by: Jakob Gruber <jgruber@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#70026}
char-predicates-inl.h 4.08 KB