[regexp] Better quick checks on loop entry nodes
Like the predecessor change https://crrev.com/c/v8/v8/+/1702125 , this change is inspired by attempting to exit earlier from generated RegExp code, when no further matches are possible because any match would be too long. The motivating example this time is the following expression, which tests whether a string of Unicode playing cards has five of the same suit in a row: /([🂡-🂮]{5})|([🂱-🂾]{5})|([🃁-🃎]{5})|([🃑-🃞]{5})/u A human reading this expression can readily see that any match requires at least 10 characters (5 surrogate pairs), but the LoopChoiceNode for each repeated option reports its minimum distance to the end of a match as zero. This is correct, because the LoopChoiceNode's behavior depends on additional state (the loop counter). However, the preceding node, a SET_REGISTER action that initializes the loop counter, could confidently state that it consumes at least 10 characters. Furthermore, when we try to emit a quick check for that action, we could follow only paths from the LoopChoiceNode that are possible based on the minimum iteration count. This change implements both of those "could"s. I expect this improvement to apply pretty broadly to expressions that use minimum repetition counts and that don't meet the criteria for unrolling. In this particular case, I get about 12% improvement on the overall UniPoker test, due to reducing the execution time of this expression by 85% and the execution time of another similar expression that checks for n-of-a-kind by 20%. Bug: v8:9305 Change-Id: I319e381743967bdf83324be75bae943fbb5dd496 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1704941 Commit-Queue: Seth Brenith <seth.brenith@microsoft.com> Reviewed-by: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#62963}
Showing
Please
register
or
sign in
to comment