-
Jakob Gruber authored
Since https://codereview.chromium.org/2777583003, the Boyer-Moore lookahead (used by the irregexp engine) also looks inside submatches to narrow down its range of accepted characters at specific offsets. But the end of a submatch, designated by a PositiveSubmatchSuccess action node, was not handled correctly. When a submatch terminates, we have no knowledge of what may follow, and thus must accept any character at following positions. This is done by the SetRest call added in this CL. An example, since this is fairly obscure: /^.*?Y(((?=B?).)*)Y$/s The initial non-greedy loop, together with the s flag, will trigger an attempted Boyer-Moore lookahead. After this follows an unconditional Y, a *-quantified loop matching any char and containing a lookahead that matches either 1 B or 0 B's, and an unconditional trailing Y. When the BM lookahead scans the subject string for the beginning of this pattern after the non-greedy loop, it should look for: a Y at offset 0, and either a B, a Y, or '.' (-> any character) at offset 1. Prior to this CL this was not the case: - The lookaround is internally generated as a submatch. - The optional 'B?' is unrolled into 'either B followed by submatch end' or 'submatch end'. - Filling in BM infos terminates when encountering a submatch end. Thus in the former case we added B to the set of accepted characters and terminated, while in the latter case we simply terminated.o This CL ensures that BM will accept any character at any offset at or exceeding the first encountered submatch end. Bug: v8:8770 Change-Id: Iff998ba307cd9669203846a9182798b8cf6a85dc Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1679506 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by: Erik Corry <erikcorry@chromium.org> Reviewed-by: Yang Guo <yangguo@chromium.org> Auto-Submit: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#62460}
bc4cbe92