Commit bc4cbe92 authored by Jakob Gruber's avatar Jakob Gruber Committed by Commit Bot

[regexp] Fix BoyerMooreLookahead behavior at submatches

Since https://codereview.chromium.org/2777583003, the Boyer-Moore
lookahead (used by the irregexp engine) also looks inside submatches
to narrow down its range of accepted characters at specific offsets.

But the end of a submatch, designated by a PositiveSubmatchSuccess
action node, was not handled correctly. When a submatch terminates,
we have no knowledge of what may follow, and thus must accept any
character at following positions. This is done by the SetRest call
added in this CL.

An example, since this is fairly obscure:

/^.*?Y(((?=B?).)*)Y$/s

The initial non-greedy loop, together with the s flag,
will trigger an attempted Boyer-Moore lookahead. After this follows
an unconditional Y, a *-quantified loop matching any char and
containing a lookahead that matches either 1 B or 0 B's, and an
unconditional trailing Y.

When the BM lookahead scans the subject string for the beginning of
this pattern after the non-greedy loop, it should look for: a Y at
offset 0, and either a B, a Y, or '.' (-> any character) at offset 1.

Prior to this CL this was not the case:

- The lookaround is internally generated as a submatch.
- The optional 'B?' is unrolled into 'either B followed by submatch
  end' or 'submatch end'.
- Filling in BM infos terminates when encountering a submatch end.
  Thus in the former case we added B to the set of accepted characters
  and terminated, while in the latter case we simply terminated.o

This CL ensures that BM will accept any character at any offset at or
exceeding the first encountered submatch end.

Bug: v8:8770
Change-Id: Iff998ba307cd9669203846a9182798b8cf6a85dc
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1679506
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: 's avatarErik Corry <erikcorry@chromium.org>
Reviewed-by: 's avatarYang Guo <yangguo@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62460}
parent 91aa3078
......@@ -1334,7 +1334,11 @@ int ActionNode::EatsAtLeast(int still_to_find, int budget, bool not_at_start) {
void ActionNode::FillInBMInfo(Isolate* isolate, int offset, int budget,
BoyerMooreLookahead* bm, bool not_at_start) {
if (action_type_ != POSITIVE_SUBMATCH_SUCCESS) {
if (action_type_ == POSITIVE_SUBMATCH_SUCCESS) {
// Anything may follow a positive submatch success, thus we need to accept
// all characters from this position onwards.
bm->SetRest(offset);
} else {
on_success()->FillInBMInfo(isolate, offset, budget - 1, bm, not_at_start);
}
SaveBMInfo(bm, not_at_start, offset);
......
// Copyright 2019 the V8 project authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
const re = /^.*?Y((?=X?).)*Y$/s;
const sult = "YABCY";
const result = re.exec(sult);
assertNotNull(result);
assertArrayEquals([sult, "C"], result);
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment