[regexp] Propagate eats_at_least for negative lookahead
In issue 11290, we disabled the propagation of EAL data out of lookarounds, because it was incorrect for lookahead nodes in loops. This caused performance regressions: for example, `/^\P{Letter}+$/u` (matching only characters that are not in Unicode's Letter category) uses negative lookahead when matching lone surrogates, and became about 2x slower. I spent some time looking into fixes, and this is what I've settled on. Some background: the implementation of lookarounds in irregexp is split between positive and negative lookaheads. (Lookbehinds aren't relevant here, because backwards matches always have EAL=0.) Positive lookaheads are wrapped in BEGIN_SUBMATCH and POSITIVE_SUBMATCH_SUCCESS ActionNodes. BEGIN_SUBMATCH saves the current state. POSITIVE_SUBMATCH_SUCCESS restores the necessary state (while leaving any captures that occurred during the lookaround intact). Negative lookaheads also begin with a BEGIN_SUBMATCH node, but follow it with a NegativeLookaroundChoiceNode. This node has two successors: a lookaround node, and a continue node. It only executes the continue node if the lookaround node backtracks, which automatically restores the previous state. Negative lookarounds also can't update captures. This affects EAL calculations. It turns out that negative lookaheads are already doing the right thing: EatsAtLeastPropagator only propagates information from the continue node, ignoring the lookaround node. The same is true for quick checks (see the comment in RegExpLookaround:Builder::ForMatch). A BEGIN_SUBMATCH for a negative lookahead can simply propagate the EAL data from its successor like any other ActionNode, and everything works. Positive lookaheads are harder. I tried saving a pointer to the successor in BEGIN_SUBMATCH, but ran into problems in FillInBMInfo, because the EAL value corresponded to the nodes after the lookahead, but the analysis was still looking at the nodes inside. I fell back to a more modest approach: split BEGIN_SUBMATCH in two, and propagate EAL info for BEGIN_NEGATIVE_SUBMATCH while keeping the current behaviour for BEGIN_POSITIVE_SUBMATCH. This fixes the performance regression at hand. Two potential approaches for fixing EAL for positive lookahead are: 1. Handling positive lookahead with its own dedicated choice node, like NegativeLookaroundChoiceNode. 2. Adding an eats_at_least_inside_loop field to EatsAtLeastInfo, which is <= eats_at_least_from_possibly_start, and using that value in EatsAtLeastFromLoopEntry. Both of those approaches are more complex than I want to tackle right now, though. Bug: v8:11844 Change-Id: I2a43509c2c21194b8c18f0a587fa21c194db76c2 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2934858Reviewed-by: Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#75031}
Showing
Please
register
or
sign in
to comment