    [regexp] Support assertions in experimental engine · e83511c2
    Martin Bidlingmaier authored
    Assertions are implemented with the new ASSERTION instruction.  The NFA
    interpreter evaluates the assertion based on the current context in the
    subject string every time a thread executes ASSERTION.  This is
    analogous to what re2 and rust/regex do.
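A minimal sketch of evaluating an assertion from the current context, as described above. All names here (AssertionKind, EvalAssertion, the helper predicates) are hypothetical and not taken from the V8 sources. The sketch also assumes ASCII-only word characters and treats only '\n' and '\r' as line terminators, which is a simplification.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical assertion kinds; illustrative names, not V8's.
enum class AssertionKind {
  kStartOfInput,
  kEndOfInput,
  kStartOfLine,
  kEndOfLine,
  kWordBoundary,
  kNotWordBoundary,
};

// Simplified word-character test (ASCII only).
inline bool IsWordChar(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_';
}

// Simplified line-terminator test (assumption: only \n and \r).
inline bool IsLineTerminator(char c) { return c == '\n' || c == '\r'; }

// Evaluates `kind` at position `pos` (0 <= pos <= input.size()) by looking
// only at the characters directly before and after `pos`. Nothing is
// consumed; the caller's position stays unchanged.
inline bool EvalAssertion(AssertionKind kind, std::string_view input,
                          std::size_t pos) {
  const bool at_start = pos == 0;
  const bool at_end = pos == input.size();
  switch (kind) {
    case AssertionKind::kStartOfInput:
      return at_start;
    case AssertionKind::kEndOfInput:
      return at_end;
    case AssertionKind::kStartOfLine:
      return at_start || IsLineTerminator(input[pos - 1]);
    case AssertionKind::kEndOfLine:
      return at_end || IsLineTerminator(input[pos]);
    case AssertionKind::kWordBoundary:
    case AssertionKind::kNotWordBoundary: {
      const bool word_before = !at_start && IsWordChar(input[pos - 1]);
      const bool word_after = !at_end && IsWordChar(input[pos]);
      const bool is_boundary = word_before != word_after;
      return is_boundary == (kind == AssertionKind::kWordBoundary);
    }
  }
  return false;
}
```

In this sketch the cost is paid only when a thread reaches an assertion; advancing the input position does no extra work.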
    
    Alternatives to this approach:
    - The interpreter could calculate eagerly for all assertion types
      whether they are satisfied whenever the current input position is
      advanced.  This would make evaluating the ASSERTION instruction itself
      cheaper, but at the cost of making every advance in the input string
      more expensive.  I suspect this would be slower on average, because
      assertions are not common enough that we typically evaluate two or
      more of them at every input position.
    - Assertions in a regexp could be desugared into CONSUME_RANGE
      instructions, so that no new instruction would be necessary.  For
      example, the word boundary assertion \b is satisfied at a given
      position/state if we have just consumed a word character and will
      consume a non-word character next, or vice-versa.  The tricky part
      about this is that the assertion itself should not consume input, so
      we'd have to split (automaton) states according to whether we've
      arrived at them via a word character or not.  The current compiler is
      not really equipped for this kind of transformation.  For {start,end}
      of {line,file} assertions, we'd need to introduce dummy characters
      indicating start/end of input (say, 0x10000 and 0x10001), which we
      would feed to the interpreter before and after the actual input,
      respectively.
      I suspect that this approach wouldn't make much of a difference for
      NFA execution. It would likely speed up (lazy) DFA execution though
      because assertions would be dealt with in the fast path.
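The dummy-character idea from the second alternative could look roughly like the sketch below. It is an illustration only: PadWithMarkers and the constant names are hypothetical, and the marker values are just the examples given above. Both values lie outside the 16-bit code unit range, so they cannot collide with real input, but the padded input then needs a wider element type.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Example marker values (assumption: taken from the "say, ..." suggestion
// above). Neither fits in a 16-bit code unit.
constexpr std::uint32_t kStartOfInputMarker = 0x10000;
constexpr std::uint32_t kEndOfInputMarker = 0x10001;

// Returns the input widened to 32-bit units and wrapped in the two markers.
// A start-of-input assertion could then be compiled as "consume
// kStartOfInputMarker", and an end-of-input assertion as "consume
// kEndOfInputMarker".
inline std::vector<std::uint32_t> PadWithMarkers(const std::u16string& input) {
  std::vector<std::uint32_t> padded;
  padded.reserve(input.size() + 2);
  padded.push_back(kStartOfInputMarker);
  for (char16_t unit : input) {
    padded.push_back(static_cast<std::uint32_t>(unit));
  }
  padded.push_back(kEndOfInputMarker);
  return padded;
}
```

Match positions reported against the padded sequence would be off by one relative to the original input and would need to be shifted back.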
    
    Cq-Include-Trybots: luci.v8.try:v8_linux64_fyi_rel_ng
    Bug: v8:10765
    Change-Id: Ic2012c943e0ce54eb8662789fb3d4c1b6cd8d606
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2398644
    Commit-Queue: Martin Bidlingmaier <mbid@google.com>
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Reviewed-by: Jakob Gruber <jgruber@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#70026}
Name
api
asmjs
ast
base
builtins
codegen
common
compiler
compiler-dispatcher
d8
date
debug
deoptimizer
diagnostics
execution
extensions
flags
handles
heap
ic
init
inspector
interpreter
json
libplatform
libsampler
logging
numbers
objects
parsing
profiler
protobuf
regexp
roots
runtime
sanitizer
snapshot
strings
tasks
third_party
torque
tracing
trap-handler
utils
wasm
zone
DEPS
OWNERS