Jakob Gruber authored
Large character classes may easily be created when Unicode properties (e.g. /\p{L}/u and /\P{L}/u) are used: these are expanded internally into character classes that consist of hundreds of character ranges. Prior to this CL, we'd emit branching code for each of these ranges, leading to very large regexp code objects.

This CL adds a new codegen mode for large character classes (where 'large' currently means > 16 ranges). Instead of emitting branching code inline, the ranges are written into a ByteArray and we call into the C function IsCharacterInRangeArray for the actual branching logic. The ByteArray is smaller than the emitted code would be, and it is deduplicated if the same character class is matched repeatedly in the same pattern.

Note that this mode is *not* implemented for the interpreter, since we currently don't have a constant pool for irregexp bytecode and thus cannot reference ByteArrays.

Bug: v8:11069
Change-Id: I2d728e42d85114b796c637f791848731a104cd54
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3229377
Reviewed-by: Patrick Thier <pthier@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77463}
8bbb44e5
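The commit message above names IsCharacterInRangeArray but does not show its logic or the ByteArray layout. The sketch below illustrates one plausible way a range-array lookup can work. It assumes the sorted, non-overlapping ranges are flattened into a single sorted array of boundaries, `[from0, to0 + 1, from1, to1 + 1, ...]`, so that a character is in the class exactly when an odd number of boundaries are less than or equal to it. All names here (`CharRange`, `kMaxInlineRanges`, `UseRangeArrayMode`, `FlattenRanges`, `IsInRangeArray`, `RangeArrayPool`) are hypothetical. `std::vector` stands in for V8's ByteArray, and `std::map` stands in for whatever deduplication the CL actually uses. This is not V8's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical: one character range with inclusive bounds [from, to].
struct CharRange {
  std::uint32_t from;
  std::uint32_t to;
};

// Hypothetical threshold mirroring the CL's "large means > 16 ranges" rule.
constexpr std::size_t kMaxInlineRanges = 16;

// True when a class should use the table-based lookup instead of inline branches.
bool UseRangeArrayMode(const std::vector<CharRange>& ranges) {
  return ranges.size() > kMaxInlineRanges;
}

// Flattens sorted, non-overlapping ranges into boundaries. Even indices hold
// inclusive starts, odd indices hold exclusive ends (to + 1).
std::vector<std::uint32_t> FlattenRanges(const std::vector<CharRange>& ranges) {
  std::vector<std::uint32_t> boundaries;
  boundaries.reserve(ranges.size() * 2);
  for (std::size_t i = 0; i < ranges.size(); ++i) {
    assert(ranges[i].from <= ranges[i].to);                 // well-formed range
    assert(i == 0 || ranges[i - 1].to < ranges[i].from);    // sorted, disjoint
    boundaries.push_back(ranges[i].from);
    boundaries.push_back(ranges[i].to + 1);
  }
  return boundaries;
}

// Binary search: c is in the class iff the count of boundaries <= c is odd,
// i.e. the last boundary passed was a range start.
bool IsInRangeArray(std::uint32_t c, const std::vector<std::uint32_t>& boundaries) {
  auto it = std::upper_bound(boundaries.begin(), boundaries.end(), c);
  return (it - boundaries.begin()) % 2 == 1;
}

// Hypothetical per-pattern pool: identical classes share one table.
class RangeArrayPool {
 public:
  // Returns the index of the table for these boundaries, adding it only once.
  std::size_t Intern(const std::vector<std::uint32_t>& boundaries) {
    auto [it, inserted] = index_.emplace(boundaries, tables_.size());
    if (inserted) tables_.push_back(boundaries);
    return it->second;
  }
  const std::vector<std::uint32_t>& table(std::size_t i) const { return tables_[i]; }
  std::size_t size() const { return tables_.size(); }

 private:
  std::map<std::vector<std::uint32_t>, std::size_t> index_;
  std::vector<std::vector<std::uint32_t>> tables_;
};
```

With this layout, one binary search answers membership in O(log n) comparisons over a compact table, rather than a chain of up to hundreds of inline compare-and-branch sequences. The real ByteArray encoding and element width in V8 may differ from this sketch.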
Name | Last commit | Last update
---|---|---
regexp-macro-assembler-x64.cc | |
regexp-macro-assembler-x64.h | |