Commit 4735f85f authored by jgruber, committed by Commit Bot

[regexp] Shortcut case-folding of entire non-bmp range

When the range of all non-bmp characters is passed to
AddUnicodeCaseEquivalents, icu::UnicodeSet::closeOver dutifully tries to
case-fold every single character in that range. Since we already know
this to be a nop, we can simply return instead.

This improves compilation time of /ui regexps by around 100x.

Bug: v8:6727
Change-Id: I79d73c77d6a54cbb5ad2cad0355214ed712b59b9
Reviewed-on: https://chromium-review.googlesource.com/635303
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Cr-Commit-Position: refs/heads/master@{#47636}
parent 8f1a92ce
@@ -5060,6 +5060,16 @@ RegExpNode* UnanchoredAdvance(RegExpCompiler* compiler,
void AddUnicodeCaseEquivalents(ZoneList<CharacterRange>* ranges, Zone* zone) {
#ifdef V8_INTL_SUPPORT
DCHECK(CharacterRange::IsCanonical(ranges));
+// Micro-optimization to avoid passing large ranges to UnicodeSet::closeOver.
+// See also https://crbug.com/v8/6727.
+// TODO(jgruber): This only covers the special case of the {0,0x10FFFF} range,
+// which we use frequently internally. But large ranges can also easily be
+// created by the user. We might want to have a more general caching mechanism
+// for such ranges.
+if (ranges->length() == 1 && ranges->at(0).IsEverything(kNonBmpEnd)) return;
// Use ICU to compute the case fold closure over the ranges.
icu::UnicodeSet set;
for (int i = 0; i < ranges->length(); i++) {
......
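The early return in the hunk above can be illustrated with a minimal, self-contained sketch. It is not V8 code: `CharacterRange` is reduced to a plain struct, `std::vector` stands in for `ZoneList`, `uc32` is assumed to be a 32-bit unsigned integer, and `ExpensiveCaseFold` is a hypothetical placeholder for the `icu::UnicodeSet::closeOver` call that only counts how many code points the slow path would visit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: simplified stand-ins for V8's types, for illustration only.
using uc32 = uint32_t;
constexpr uc32 kNonBmpEnd = 0x10FFFF;  // last Unicode code point

struct CharacterRange {
  uc32 from;
  uc32 to;
  bool IsEverything(uc32 max) const { return from == 0 && to >= max; }
};

// Hypothetical placeholder for the ICU case-fold closure. Instead of folding,
// it returns the number of code points it would have to visit.
uint64_t ExpensiveCaseFold(const std::vector<CharacterRange>& ranges) {
  uint64_t visited = 0;
  for (const CharacterRange& r : ranges) {
    assert(r.from <= r.to);
    visited += static_cast<uint64_t>(r.to) - r.from + 1;
  }
  return visited;
}

// Returns how many code points were handed to the slow path.
uint64_t AddUnicodeCaseEquivalents(const std::vector<CharacterRange>& ranges) {
  // Shortcut: the set of all code points is already closed under case
  // folding, so there is nothing to add.
  if (ranges.size() == 1 && ranges[0].IsEverything(kNonBmpEnd)) return 0;
  return ExpensiveCaseFold(ranges);
}
```

In this sketch the full range {0, 0x10FFFF} hands nothing to the slow path, while {0, 0xFFFF} still visits 65536 code points. That mirrors the TODO: only the exact full range is short-circuited, and other large ranges remain expensive.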
@@ -110,7 +110,7 @@ class CharacterRange {
uc32 to() const { return to_; }
void set_to(uc32 value) { to_ = value; }
bool is_valid() { return from_ <= to_; }
-bool IsEverything(uc16 max) { return from_ == 0 && to_ >= max; }
+bool IsEverything(uc32 max) { return from_ == 0 && to_ >= max; }
bool IsSingleton() { return (from_ == to_); }
static void AddCaseEquivalents(Isolate* isolate, Zone* zone,
ZoneList<CharacterRange>* ranges,
......
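The commit message does not say why the `IsEverything` parameter was widened from `uc16` to `uc32`. A plausible reading is that a 16-bit parameter cannot hold 0x10FFFF: the argument would be narrowed to 0xFFFF, and a BMP-only range would then pass the check. The sketch below demonstrates that narrowing under stated assumptions: `uc16` and `uc32` are modeled as `uint16_t` and `uint32_t` (V8's actual typedefs may differ), and `Range`, `IsEverythingOld`, `IsEverythingNew`, and `DemoNarrowing` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: fixed-width stand-ins for V8's character types.
using uc16 = uint16_t;
using uc32 = uint32_t;
constexpr uc32 kNonBmpEnd = 0x10FFFF;  // last Unicode code point

struct Range {
  uc32 from;
  uc32 to;
  // Old signature: the bound can only carry 16 bits.
  constexpr bool IsEverythingOld(uc16 max) const { return from == 0 && to >= max; }
  // New signature: the bound can carry any code point.
  constexpr bool IsEverythingNew(uc32 max) const { return from == 0 && to >= max; }
};

constexpr Range kBmpOnly{0, 0xFFFF};
constexpr Range kAll{0, kNonBmpEnd};

// The explicit cast below performs the same narrowing that an implicit
// conversion to a uc16 parameter would: 0x10FFFF becomes 0xFFFF.
static_assert(kBmpOnly.IsEverythingOld(static_cast<uc16>(kNonBmpEnd)),
              "old check wrongly accepts a BMP-only range");
static_assert(!kBmpOnly.IsEverythingNew(kNonBmpEnd),
              "new check rejects a BMP-only range");
static_assert(kAll.IsEverythingNew(kNonBmpEnd),
              "new check accepts the full range");

// Runtime usage example of the same narrowing.
inline void DemoNarrowing() {
  assert(static_cast<uc16>(kNonBmpEnd) == 0xFFFF);
}
```

With the 16-bit signature, the early return guarded by `IsEverything(kNonBmpEnd)` would also have fired for the range {0, 0xFFFF}, where skipping case folding is not obviously safe.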