src/regexp/ppc · 31abbcfb4d708743ecadf5a9ea3e0a0cf6977936 · Linshizhi / V8

PPC/s390: [regexp] Compact codegen for large character classes · 841d33a5

Milad Fa authored Oct 19, 2021

Port 8bbb44e5

Original Commit Message:

    Large character classes may easily be created when unicode
    properties (e.g.: /\p{L}/u and /\P{L}/u) are used - these are
    expanded internally into character classes that consist of hundreds
    of character ranges. Previously to this CL, we'd emit branching code
    for each of these ranges, leading to very large regexp code objects.

    This CL adds a new codegen mode for large character classes (where
    'large' currently means > 16 ranges). Instead of emitting branching
    code inline, the ranges are written into a ByteArray and we call into
    the C function IsCharacterInRangeArray for the actual branching logic.
    The ByteArray is smaller than emitted code and is deduplicated if the
    same character class is matched repeatedly in the same pattern.

    Note this mode is *not* implemented for the interpreter, since we
    currently don't have a constant pool for irregexp bytecode, and thus
    cannot reference ByteArrays.

R=jgruber@chromium.org, joransiu@ca.ibm.com, junyan@redhat.com, midawson@redhat.com
BUG=
LOG=N

Change-Id: I2ded01fa2767e56e72be81b949eefb5fb85b7013
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3231981Reviewed-by: Junliang Yan <junyan@redhat.com>
Commit-Queue: Milad Fa <mfarazma@redhat.com>
Cr-Commit-Position: refs/heads/main@{#77473}

841d33a5

Name	Last commit	Last update
..
regexp-macro-assembler-ppc.cc		Loading commit data...
regexp-macro-assembler-ppc.h		Loading commit data...