• Iain Ireland's avatar
    [regexp] Propagate eats_at_least for negative lookahead · 363ab5ae
    Iain Ireland authored
    In issue 11290, we disabled the propagation of EAL data out of
    lookarounds, because it was incorrect for lookahead nodes in
    loops. This caused performance regressions: for example,
    `/^\P{Letter}+$/u` (matching only characters that are not in Unicode's
    Letter category) uses negative lookahead when matching lone
    surrogates, and became about 2x slower. I spent some time looking into
    fixes, and this is what I've settled on.
    
    Some background: the implementation of lookarounds in irregexp is
    split between positive and negative lookaheads. (Lookbehinds aren't
    relevant here, because backwards matches always have EAL=0.)  Positive
    lookaheads are wrapped in BEGIN_SUBMATCH and POSITIVE_SUBMATCH_SUCCESS
    ActionNodes. BEGIN_SUBMATCH saves the current state.
    POSITIVE_SUBMATCH_SUCCESS restores the necessary state (while leaving
    any captures that occurred during the lookaround intact).
    
    Negative lookaheads also begin with a BEGIN_SUBMATCH node, but follow
    it with a NegativeLookaroundChoiceNode. This node has two successors:
    a lookaround node, and a continue node. It only executes the continue
    node if the lookaround node backtracks, which automatically restores
    the previous state. Negative lookarounds also can't update captures.
    
    This affects EAL calculations. It turns out that negative lookaheads
    are already doing the right thing: EatsAtLeastPropagator only
    propagates information from the continue node, ignoring the lookaround
    node. The same is true for quick checks (see the comment in
    RegExpLookaround:Builder::ForMatch). A BEGIN_SUBMATCH for a negative
    lookahead can simply propagate the EAL data from its successor like
    any other ActionNode, and everything works.
    
    Positive lookaheads are harder. I tried saving a pointer to the
    successor in BEGIN_SUBMATCH, but ran into problems in FillInBMInfo,
    because the EAL value corresponded to the nodes after the lookahead,
    but the analysis was still looking at the nodes inside. I fell back
    to a more modest approach: split BEGIN_SUBMATCH in two, and propagate
    EAL info for BEGIN_NEGATIVE_SUBMATCH while keeping the current
    behaviour for BEGIN_POSITIVE_SUBMATCH. This fixes the performance
    regression at hand.
    
    Two potential approaches for fixing EAL for positive lookahead are:
     1. Handling positive lookahead with its own dedicated choice node,
        like NegativeLookaroundChoiceNode.
     2. Adding an eats_at_least_inside_loop field to EatsAtLeastInfo,
        which is <= eats_at_least_from_possibly_start, and using that
        value in EatsAtLeastFromLoopEntry.
    
    Both of those approaches are more complex than I want to tackle
    right now, though.
    
    Bug: v8:11844
    Change-Id: I2a43509c2c21194b8c18f0a587fa21c194db76c2
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2934858Reviewed-by: 's avatarJakob Gruber <jgruber@chromium.org>
    Commit-Queue: Jakob Gruber <jgruber@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#75031}
    363ab5ae
Name
Last commit
Last update
..
api Loading commit data...
asmjs Loading commit data...
ast Loading commit data...
base Loading commit data...
baseline Loading commit data...
bigint Loading commit data...
builtins Loading commit data...
codegen Loading commit data...
common Loading commit data...
compiler Loading commit data...
compiler-dispatcher Loading commit data...
d8 Loading commit data...
date Loading commit data...
debug Loading commit data...
deoptimizer Loading commit data...
diagnostics Loading commit data...
execution Loading commit data...
extensions Loading commit data...
flags Loading commit data...
handles Loading commit data...
heap Loading commit data...
ic Loading commit data...
init Loading commit data...
inspector Loading commit data...
interpreter Loading commit data...
json Loading commit data...
libplatform Loading commit data...
libsampler Loading commit data...
logging Loading commit data...
numbers Loading commit data...
objects Loading commit data...
parsing Loading commit data...
profiler Loading commit data...
protobuf Loading commit data...
regexp Loading commit data...
roots Loading commit data...
runtime Loading commit data...
sanitizer Loading commit data...
snapshot Loading commit data...
strings Loading commit data...
tasks Loading commit data...
third_party Loading commit data...
torque Loading commit data...
tracing Loading commit data...
trap-handler Loading commit data...
utils Loading commit data...
wasm Loading commit data...
web-snapshot Loading commit data...
zone Loading commit data...
DEPS Loading commit data...
DIR_METADATA Loading commit data...
OWNERS Loading commit data...