    [regexp] Fix CharacterRange limits again again again · 2e17aaca
    Jakob Gruber authored
    When emitting code, character ranges must only specify ranges which
    the actual subject string (one- or two-byte) may contain.
    
    This was not always the case, specifically for ranges with
    `from <= kMaxUint8` and `to > kMaxUint8`.
    
    The reason this is so tricky: 1. not all parts of the pipeline know
    whether we are compiling for one- or two-byte subjects; 2. for
    case-insensitive regexps, an out-of-bounds CharacterRange may have an
    in-bounds case equivalent (e.g. /[Ÿ]/i also matches 'ÿ' == \u{ff}),
    which only gets added somewhere in the middle of the pipeline.
    
    Our current solution is to clamp immediately before code emission. We
    also keep the existing handling/dchecks of the 0x10ffff marker value
    which may occur in the two-byte subject case.
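
The clamping step described above can be sketched roughly as follows. This is an illustration only, not the code from this change: `Range`, `ClampToMaxChar`, and `kMaxOneByteChar` are hypothetical stand-ins for V8's own types and constants, and the 0x10ffff marker handling for two-byte subjects is deliberately not modeled.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the one-byte limit (kMaxUint8 in the text above).
constexpr uint32_t kMaxOneByteChar = 0xFF;

// Hypothetical simplified character range; both bounds are inclusive.
struct Range {
  uint32_t from;
  uint32_t to;
};

// Returns a copy of `ranges` restricted to [0, max_char]. Ranges that start
// above max_char are dropped; ranges that straddle it are truncated.
std::vector<Range> ClampToMaxChar(const std::vector<Range>& ranges,
                                  uint32_t max_char) {
  std::vector<Range> out;
  for (const Range& r : ranges) {
    if (r.from > max_char) continue;  // Entirely out of bounds: drop.
    out.push_back({r.from, std::min(r.to, max_char)});  // Truncate the top.
  }
  return out;
}
```

For a one-byte subject, a range such as [0xF0, 0x178] would become [0xF0, 0xFF], while [0x178, 0x178] (the 'Ÿ' from the example above) would be dropped. Running this only right before code emission means the in-bounds case equivalent 'ÿ' has already been added by the time the out-of-bounds range is discarded.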
    
    Bug: v8:11069
    Change-Id: Ic7b34a13a900ea2aa3df032daac9236bf5682a42
    Fixed: chromium:1275096
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3306569
    Commit-Queue: Jakob Gruber <jgruber@chromium.org>
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Cr-Commit-Position: refs/heads/main@{#78186}
regexp-compiler-tonode.cc 64.4 KB