[regexp] Use stricter bounds check to avoid additional iteration
The motivating example is JetStream 2's UniPoker test, which tests whether a sorted string of Unicode playing cards contains a five-card straight using a regular expression. In the top-level generated loop for this RegExp, we see this loop exit condition:

  00000350000C2067    27  83fffe         cmpl rdi,0xfe
  00000350000C206A    2a  0f8da8e40000   jge 00000350000D0518  <+0xe4d8>

Meaning if the current position is pointing at the very last (16-bit) character, then we exit the loop. Otherwise we go on and try to find various matches starting at the current position. However, we can see in the original expression that any possible match is at least 10 characters (5 astral-plane Unicode values), so we're wasting a lot of time attempting to find matches in cases where we're too close to the end of the string for any match to succeed.

This example might be a bit contrived, but I expect that an improvement in this bounds check would help a larger family of regular expressions, where the minimum match length is large relative to the string being matched and we don't meet the other necessary criteria for fast Boyer-Moore lookahead.

To get the desired bounds check in this case, this patch does the following:

1. Compute accurate EatsAtLeast values for every node during the analysis phase. This could end up doing more work than the current implementation, but analysis already has to touch every node, so it seems like a cache-friendly time to compute these values. In some cases, this might be less total work than the current implementation, because the current implementation might recompute the same node multiple times.
2. When emitting a quick check, use the EatsAtLeast value from the predecessor ChoiceNode for the bounds check.

This improves the UniPoker score on my machine by about 4%, because it cuts the time spent checking for straights roughly in half, and checking for straights originally accounted for about 8% of the total time.
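As a rough illustration of the two steps above (not V8's actual code), the following Python sketch computes a minimum match length once per node in a single bottom-up pass and then uses it to bound the start positions the top-level loop tries. The classes Char, Seq and Alt, and the functions analyze and start_positions, are hypothetical names invented for this sketch. The looser bound is modeled as "at least 2 code units remain", which is an assumption based on the exit condition described above.

```python
class Node:
    def __init__(self):
        # Filled in once by analyze(); hypothetical stand-in for EatsAtLeast.
        self.eats_at_least = None


class Char(Node):
    def __init__(self, code_units=1):
        super().__init__()
        # An astral-plane character takes 2 code units in UTF-16.
        self.code_units = code_units


class Seq(Node):
    def __init__(self, *children):
        super().__init__()
        self.children = children


class Alt(Node):
    def __init__(self, *children):
        super().__init__()
        self.children = children


def analyze(node):
    """Visit each node once, bottom-up, caching its minimum match length."""
    if isinstance(node, Char):
        node.eats_at_least = node.code_units
    elif isinstance(node, Seq):
        # A sequence consumes at least the sum of its parts.
        node.eats_at_least = sum(analyze(c) for c in node.children)
    elif isinstance(node, Alt):
        # An alternation consumes at least as much as its shortest branch.
        node.eats_at_least = min(analyze(c) for c in node.children)
    else:
        raise TypeError("unknown node type")
    return node.eats_at_least


def start_positions(length, min_match):
    """Start positions worth trying when a match needs min_match code units."""
    return range(0, max(length - min_match + 1, 0))


# Five consecutive cards, each an alternation of astral-plane characters.
card = lambda: Alt(Char(2), Char(2))
straight = Seq(card(), card(), card(), card(), card())

min_len = analyze(straight)
strict = start_positions(20, min_len)  # bound from the cached minimum length
loose = start_positions(20, 2)         # assumed model of the looser bound
print(min_len, len(strict), len(loose))
```

In this toy model, a 20-code-unit subject is scanned from 11 start positions under the stricter bound instead of 19, which is the kind of saving the patch is after when the minimum match length is large relative to the string.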
Bug: v8:9305
Change-Id: I110b190c2578f73b2263259d5aa5750e921b01be
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1702125
Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62919}