Commits · 1ac46e46a1f5dff94850977a3f73bf5f4fe2ca2a · Linshizhi / V8

10 Jun, 2020 1 commit

[regexp] Fix integer overflows in TextNode::GetQuickCheckDetails · a305d2de

Jakob Gruber authored 4 years ago

Several uc32 (= int32_t) fields were incorrectly treated as uc16
(= uint16_t):

CharacterRange::from()
CharacterRange::to()
QuickCheckDetails::Position::mask
QuickCheckDetails::Position::value

Bug: v8:10568
Change-Id: I9ea7d76e4a0cbc6ee681de2136c398cdc622bca2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2230527
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68290}

a305d2de

28 Apr, 2020 1 commit

[regexp] Handlify RegExpCompileData::code · 6bb3f0c0

Iain Ireland authored 4 years ago

RegExpMacroAssembler::GetCode returns a Handle<Object>. However, that
Handle is almost immediately dereferenced, and is stored as a bare
Object in both RegExpCompiler::CompilationResult and RegExpCompileData.

This makes SpiderMonkey's rooting hazard analysis somewhat
antsy. While RegExpCompileData is alive on the stack, the hazard
analysis will not allow any calls that might GC, because it isn't
smart enough to prove that the code field can't be clobbered by a GC.

As far as I can tell, there is no real hazard here, but storing a
Handle in RegExpCompileData instead of a bare Object will simplify SM
and prevent a future patch from accidentally breaking something.

Bug: v8:10406
Change-Id: I9642dd05c591bfd23b340a89df2f2bf5c9fcac2c
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2161578Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#67441}

6bb3f0c0

21 Apr, 2020 1 commit

[regexp] Factor out PreprocessRegExp · 58ac66b7

Iain Ireland authored 4 years ago

RegExpImpl::Compile does a number of transformations that require
directly manipulating the internal representation of the regexp. For
example, when matching a (non-sticky, non-anchored) regular
expression, the pattern must be wrapped in .* so that it can match
anywhere in the input.

In the interest of moving towards a cleaner division between irregexp
and the outside world, it makes sense to move this code into
RegExpCompiler.

R=jgruber@chromium.org

Bug: v8:10406
Change-Id: I6da251c91c0016914a51480f80bb46c337fd0b23
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2140246Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#67262}

58ac66b7

19 Mar, 2020 3 commits

Reland "[regexp] Rewrite error handling" · 560f2d8b

Iain Ireland authored 4 years ago

This is a reland of e80ca24c

Original change's description:
> [regexp] Rewrite error handling
>
> This patch modifies irregexp's error handling. Instead of representing
> errors as C strings, they are represented as an enumeration value
> (RegExpError), and only converted to strings when throwing the error
> object in regexp.cc. This makes it significantly easier to integrate
> into SpiderMonkey. A few notes:
>
> 1. Depending on whether the stack overflows during parsing or
>    analysis, the stack overflow message can vary ("Stack overflow" or
>    "Maximum call stack size exceeded"). I kept that behaviour in this
>    patch, under the assumption that stack overflow messages are
>    (sadly) the sorts of things that real world code ends up depending
>    on.
>
> 2. Depending on the point in code where the error was identified,
>    invalid unicode escapes could be reported as "Invalid Unicode
>    escape", "Invalid unicode escape", or "Invalid Unicode escape
>    sequence". I fervently hope that nobody depends on the specific
>    wording of a syntax error, so I standardized on the first one. (It
>    was both the most common, and the most consistent with other
>    "Invalid X escape" messages.)
>
> 3. In addition to changing the representation, this patch also adds an
>    error_pos field to RegExpParser and RegExpCompileData, which stores
>    the position at which an error occurred. This is used by
>    SpiderMonkey to provide more helpful messages about where a syntax
>    error occurred in large regular expressions.
>
> 4. This model is closer to V8's existing MessageTemplate
>    infrastructure. I considered trying to integrate it more closely
>    with MessageTemplate, but since one of our stated goals for this
>    project was to make it easier to use irregexp outside of V8, I
>    decided to hold off.
>
> R=jgruber@chromium.org
>
> Bug: v8:10303
> Change-Id: I62605fd2def2fc539f38a7e0eefa04d36e14bbde
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091863
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#66784}

R=jgruber@chromium.org

Bug: v8:10303
Change-Id: Iad1f11a0e0b9e525d7499aacb56c27eff9e7c7b5
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2109952Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66798}

560f2d8b

Revert "[regexp] Rewrite error handling" · 2193f691

Leszek Swirski authored 4 years ago

This reverts commit e80ca24c.

Reason for revert: Causes failures in the fast/regex/non-pattern-characters.html Blink web test (https://ci.chromium.org/p/v8/builders/ci/V8%20Blink%20Linux/3679)

Original change's description:
> [regexp] Rewrite error handling
> 
> This patch modifies irregexp's error handling. Instead of representing
> errors as C strings, they are represented as an enumeration value
> (RegExpError), and only converted to strings when throwing the error
> object in regexp.cc. This makes it significantly easier to integrate
> into SpiderMonkey. A few notes:
> 
> 1. Depending on whether the stack overflows during parsing or
>    analysis, the stack overflow message can vary ("Stack overflow" or
>    "Maximum call stack size exceeded"). I kept that behaviour in this
>    patch, under the assumption that stack overflow messages are
>    (sadly) the sorts of things that real world code ends up depending
>    on.
> 
> 2. Depending on the point in code where the error was identified,
>    invalid unicode escapes could be reported as "Invalid Unicode
>    escape", "Invalid unicode escape", or "Invalid Unicode escape
>    sequence". I fervently hope that nobody depends on the specific
>    wording of a syntax error, so I standardized on the first one. (It
>    was both the most common, and the most consistent with other
>    "Invalid X escape" messages.)
> 
> 3. In addition to changing the representation, this patch also adds an
>    error_pos field to RegExpParser and RegExpCompileData, which stores
>    the position at which an error occurred. This is used by
>    SpiderMonkey to provide more helpful messages about where a syntax
>    error occurred in large regular expressions.
> 
> 4. This model is closer to V8's existing MessageTemplate
>    infrastructure. I considered trying to integrate it more closely
>    with MessageTemplate, but since one of our stated goals for this
>    project was to make it easier to use irregexp outside of V8, I
>    decided to hold off.
> 
> R=jgruber@chromium.org
> 
> Bug: v8:10303
> Change-Id: I62605fd2def2fc539f38a7e0eefa04d36e14bbde
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091863
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#66784}

TBR=jgruber@chromium.org,iireland@mozilla.com

Change-Id: I9247635f3c5b17c943b9c4abaf82ebe7b2de165e
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: v8:10303
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2108550Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66786}

2193f691

[regexp] Rewrite error handling · e80ca24c

Iain Ireland authored 4 years ago

This patch modifies irregexp's error handling. Instead of representing
errors as C strings, they are represented as an enumeration value
(RegExpError), and only converted to strings when throwing the error
object in regexp.cc. This makes it significantly easier to integrate
into SpiderMonkey. A few notes:

1. Depending on whether the stack overflows during parsing or
   analysis, the stack overflow message can vary ("Stack overflow" or
   "Maximum call stack size exceeded"). I kept that behaviour in this
   patch, under the assumption that stack overflow messages are
   (sadly) the sorts of things that real world code ends up depending
   on.

2. Depending on the point in code where the error was identified,
   invalid unicode escapes could be reported as "Invalid Unicode
   escape", "Invalid unicode escape", or "Invalid Unicode escape
   sequence". I fervently hope that nobody depends on the specific
   wording of a syntax error, so I standardized on the first one. (It
   was both the most common, and the most consistent with other
   "Invalid X escape" messages.)

3. In addition to changing the representation, this patch also adds an
   error_pos field to RegExpParser and RegExpCompileData, which stores
   the position at which an error occurred. This is used by
   SpiderMonkey to provide more helpful messages about where a syntax
   error occurred in large regular expressions.

4. This model is closer to V8's existing MessageTemplate
   infrastructure. I considered trying to integrate it more closely
   with MessageTemplate, but since one of our stated goals for this
   project was to make it easier to use irregexp outside of V8, I
   decided to hold off.

R=jgruber@chromium.org

Bug: v8:10303
Change-Id: I62605fd2def2fc539f38a7e0eefa04d36e14bbde
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091863
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66784}

e80ca24c

12 Mar, 2020 1 commit

[regexp] Use ZoneVector in parser and compiler · 5b44c169

Iain Ireland authored 4 years ago

For a variety of reasons related to OOM handling and custom
allocators, SpiderMonkey wants to be able to see all memory
allocations. To enforce this, we have a static analysis that verifies
that we don't link in malloc/new/etc in unexpected places. One
consequence of this is that we can't use STL containers without a
custom allocator, because they call operator new internally.

This is mostly not an issue in irregexp, which makes heavy use of zone
allocation. The main exceptions are a handful of uses of std::vector
in regexp-compiler.* and regexp-parser.*. If these vectors are
converted to ZoneVectors, then our static analysis is satisfied.

R=jgruber@chromium.org

Bug: v8:10303
Change-Id: I8b14a2eb54d3b20959e3fbe878f77effae124a2c
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091402Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66674}

5b44c169

04 Sep, 2019 1 commit

[regexp] Minor performance improvements and cleanups · 019fa2b5

Patrick Thier authored 5 years ago

Instead of checking code flags to decide if the irregexp code object is
an off-heap trampoline, we now directly load the builtin index offset
and treat the code as on-heap if the offset is -1.

In addition the regexp stack now has its own external reference for top
of stack address. This prevents calculating the top of stack address
using the base address and size at every invocation.

Bug: chromium:999993
Change-Id: I23649e8b410a56276f26846b0b12ad29310c3db7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1782565Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Patrick Thier <pthier@google.com>
Cr-Commit-Position: refs/heads/master@{#63548}

019fa2b5

29 Aug, 2019 1 commit

[regexp] Consolidate calls to jitted irregexp and regexp interpreter · 213504b9

Patrick Thier authored 5 years ago

The code fields in a JSRegExp object now either contain irregexp
compiled code or a trampoline to the interpreter. This way the code
can be executed without explicitly checking if the regexp shall be
interpreted or executed natively.
In case of interpreted regexp the generated bytecode is now stored in
its own fields instead of the code fields for Latin1 and UC16
respectively.
The signatures of the jitted irregexp match and the regexp interpreter
have been equalized.

Bug: v8:9516
Change-Id: I30e3d86f4702a902d3387bccc1ee91dea501fe4e
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1762513
Commit-Queue: Patrick Thier <pthier@google.com>
Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63457}

213504b9

31 Jul, 2019 1 commit

Reland "[regexp] Better quick checks on loop entry nodes" · bea0ffd0

Seth Brenith authored 5 years ago

This is a reland of 4b15b984

Updates since original: fix an arithmetic overflow bug, remove an invalid
DCHECK, add a unit test that would trigger that DCHECK.

Original change's description:
> [regexp] Better quick checks on loop entry nodes
>
> Like the predecessor change https://crrev.com/c/v8/v8/+/1702125 , this
> change is inspired by attempting to exit earlier from generated RegExp
> code, when no further matches are possible because any match would be
> too long. The motivating example this time is the following expression,
> which tests whether a string of Unicode playing cards has five of the
> same suit in a row:
>
> /([🂡-🂮]{5})|([🂱-🂾]{5})|([🃁-🃎]{5})|([🃑-🃞]{5})/u
>
> A human reading this expression can readily see that any match requires
> at least 10 characters (5 surrogate pairs), but the LoopChoiceNode for
> each repeated option reports its minimum distance to the end of a match
> as zero. This is correct, because the LoopChoiceNode's behavior depends
> on additional state (the loop counter). However, the preceding node, a
> SET_REGISTER action that initializes the loop counter, could confidently
> state that it consumes at least 10 characters. Furthermore, when we try
> to emit a quick check for that action, we could follow only paths from
> the LoopChoiceNode that are possible based on the minimum iteration
> count. This change implements both of those "could"s.
>
> I expect this improvement to apply pretty broadly to expressions that
> use minimum repetition counts and that don't meet the criteria for
> unrolling. In this particular case, I get about 12% improvement on the
> overall UniPoker test, due to reducing the execution time of this
> expression by 85% and the execution time of another similar expression
> that checks for n-of-a-kind by 20%.
>
> Bug: v8:9305
>
> Change-Id: I319e381743967bdf83324be75bae943fbb5dd496
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1704941
> Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#62963}

Bug: v8:9305
Change-Id: I992070d383009013881bf778242254c27134b650
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1726674Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
Cr-Commit-Position: refs/heads/master@{#63009}

bea0ffd0

30 Jul, 2019 1 commit

Revert "[regexp] Better quick checks on loop entry nodes" · 51afbd1a

Leszek Swirski authored 5 years ago

This reverts commit 4b15b984.

Reason for revert: UBSan failure (https://logs.chromium.org/logs/v8/buildbucket/cr-buildbucket.appspot.com/8906578530303352544/+/steps/Check/0/logs/regress-126412/0).

Original change's description:
> [regexp] Better quick checks on loop entry nodes
> 
> Like the predecessor change https://crrev.com/c/v8/v8/+/1702125 , this
> change is inspired by attempting to exit earlier from generated RegExp
> code, when no further matches are possible because any match would be
> too long. The motivating example this time is the following expression,
> which tests whether a string of Unicode playing cards has five of the
> same suit in a row:
> 
> /([🂡-🂮]{5})|([🂱-🂾]{5})|([🃁-🃎]{5})|([🃑-🃞]{5})/u
> 
> A human reading this expression can readily see that any match requires
> at least 10 characters (5 surrogate pairs), but the LoopChoiceNode for
> each repeated option reports its minimum distance to the end of a match
> as zero. This is correct, because the LoopChoiceNode's behavior depends
> on additional state (the loop counter). However, the preceding node, a
> SET_REGISTER action that initializes the loop counter, could confidently
> state that it consumes at least 10 characters. Furthermore, when we try
> to emit a quick check for that action, we could follow only paths from
> the LoopChoiceNode that are possible based on the minimum iteration
> count. This change implements both of those "could"s.
> 
> I expect this improvement to apply pretty broadly to expressions that
> use minimum repetition counts and that don't meet the criteria for
> unrolling. In this particular case, I get about 12% improvement on the
> overall UniPoker test, due to reducing the execution time of this
> expression by 85% and the execution time of another similar expression
> that checks for n-of-a-kind by 20%.
> 
> Bug: v8:9305
> 
> Change-Id: I319e381743967bdf83324be75bae943fbb5dd496
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1704941
> Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#62963}

TBR=jgruber@chromium.org,seth.brenith@microsoft.com

Change-Id: Iac085b75e054fdf0d218987cfe449be1f1630545
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: v8:9305
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1725621Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62977}

51afbd1a

29 Jul, 2019 1 commit

[regexp] Better quick checks on loop entry nodes · 4b15b984

Seth Brenith authored 5 years ago

Like the predecessor change https://crrev.com/c/v8/v8/+/1702125 , this
change is inspired by attempting to exit earlier from generated RegExp
code, when no further matches are possible because any match would be
too long. The motivating example this time is the following expression,
which tests whether a string of Unicode playing cards has five of the
same suit in a row:

/([🂡-🂮]{5})|([🂱-🂾]{5})|([🃁-🃎]{5})|([🃑-🃞]{5})/u

A human reading this expression can readily see that any match requires
at least 10 characters (5 surrogate pairs), but the LoopChoiceNode for
each repeated option reports its minimum distance to the end of a match
as zero. This is correct, because the LoopChoiceNode's behavior depends
on additional state (the loop counter). However, the preceding node, a
SET_REGISTER action that initializes the loop counter, could confidently
state that it consumes at least 10 characters. Furthermore, when we try
to emit a quick check for that action, we could follow only paths from
the LoopChoiceNode that are possible based on the minimum iteration
count. This change implements both of those "could"s.

I expect this improvement to apply pretty broadly to expressions that
use minimum repetition counts and that don't meet the criteria for
unrolling. In this particular case, I get about 12% improvement on the
overall UniPoker test, due to reducing the execution time of this
expression by 85% and the execution time of another similar expression
that checks for n-of-a-kind by 20%.

Bug: v8:9305

Change-Id: I319e381743967bdf83324be75bae943fbb5dd496
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1704941
Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62963}

4b15b984

25 Jul, 2019 1 commit

[regexp] Use stricter bounds check to avoid additional iteration · f42b1a5d

Seth Brenith authored 5 years ago

The motivating example is JetStream 2's UniPoker test, which tests
whether a sorted string of Unicode playing cards contains a five-card
straight using a regular expression. In the top-level generated loop for
this RegExp, we see this loop exit condition:

00000350000C2067    27  83fffe         cmpl rdi,0xfe
00000350000C206A    2a  0f8da8e40000   jge 00000350000D0518  <+0xe4d8>

Meaning if the current position is pointing at the very last (16-bit)
character, then we exit the loop. Otherwise we go on and try to find
various matches starting at the current position. However, we can see
in the original expression that any possible match is at least 10
characters (5 astral-plane Unicode values), so we're wasting a lot of
time attempting to find matches in cases where we're too close to the
end of the string for any match to succeed.

This example might be a bit contrived, but I expect that an improvement
in this bounds check would help a larger family of regular expressions,
where the minimum match length is large relative to the string being
matched and we don't meet the other necessary criteria for fast Boyer-
Moore lookahead.

To get the desired bounds check in this case, this patch does the
following:
1. Compute accurate EatsAtLeast values for every node during the
   analysis phase. This could end up doing more work than the current
   implementation, but analysis already has to touch every node, so it
   seems like a cache-friendly time to compute these values. In some
   cases, this might be less total work than the current implementation,
   because the current implementation might recompute the same node
   multiple times.
2. When emitting a quick check, use the EatsAtLeast value from the
   predecessor ChoiceNode for the bounds check.

This improves the UniPoker score on my machine by about 4%, because it
cuts the time spent checking for straights roughly in half, and checking
for straights originally accounted for about 8% of the total time.

Bug: v8:9305
Change-Id: I110b190c2578f73b2263259d5aa5750e921b01be
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1702125
Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62919}

f42b1a5d

27 Jun, 2019 1 commit

[regexp] Refactor BoyerMoorePositionInfo uses · ad68a376

Jakob Gruber authored 5 years ago

BoyerMoorePositionInfo is a simple wrapper around a bitset and an
associated ContainedInLattice field. This CL refactors bitset-related
operations that used to be implemented naively (e.g.: loop over all
bits to find a single set bit, or to generate a union of two bitsets).

Instead, use more suitable methods from base::bits and std::bitset.

Drive-by: Remove dead class members.
Drive-by: Zero the ByteArray with memset.

Bug: v8:9359
Change-Id: I33923c7d216320f4e3d7e4a6df2967f4aa86ab05
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1667407
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62419}

ad68a376

19 Jun, 2019 2 commits

[regexp] Refactor OutSet and BoyerMoorePositionInfo · 4fe611ec

Jakob Gruber authored 5 years ago

Outset:
The more advanced features of OutSet are no longer used, thus the
rename to DynamicBitSet to reflect its current purpose.

BoyerMoorePositionInfo:
Use bitset backing store in BoyerMoorePositionInfo (previously this
was based on a (statically-sized) ZoneList<bool>).

Bug: v8:9359
Change-Id: I40ca89467ae90ee90c616be5fd0d51e54e94e157
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1664064
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62277}

4fe611ec

[regexp] Remove unused DispatchTable and ZoneSplayTree · 3663e834

Jakob Gruber authored 5 years ago

Bug: v8:9359
Change-Id: I237f16324ff036f2cbfb7ca97b4ac208442b06cc
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1664056
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Sigurd Schneider <sigurds@chromium.org>
Reviewed-by: Sigurd Schneider <sigurds@chromium.org>
Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62268}

3663e834

18 Jun, 2019 3 commits

[regexp] Simplify UnicodeRangeSplitter · 83da1c2d

Jakob Gruber authored 5 years ago

This class used to be based on DispatchTable, which itself uses an
interval tree to both categorize and canonicalize ranges
(i.e. such that no overlap and all immediately adjacent ranges are
merged). The produced ranges were then entered into lists for
{bmp,lead_surrogate,trail_surrogate,non_bmp} splits.

With this CL, we simplify to a plain loop over all character range
kinds instead. The dispatch table (and ZoneSplayList, perhaps
SplayList) can be removed in follow-ups.

Bug: v8:9359
Change-Id: I9c6b72f3bc44d1557af7c74419709ae5662611f1
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1664053
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62260}

83da1c2d

[regexp] Remove dead DispatchTableConstructor · 6155b33a

Jakob Gruber authored 5 years ago

Bug: v8:9359
Change-Id: I1b490c928ed884f4ad33e005699f98614be75233
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1662306
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62242}

6155b33a

[regexp] Move AddRange utility function · 2c2dd893

Jakob Gruber authored 5 years ago

Move this straggler to its use-site in regexp-compiler.cc.

Bug: v8:9359
Change-Id: Ia5393140de5a1c8d70ac410ef6239eabfec130b3
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1662303
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62241}

2c2dd893

17 Jun, 2019 2 commits

[regexp] Reduce public API surface · c7d57dd3

Jakob Gruber authored 5 years ago

This further reduces the number of things declared in the public
regexp API file, currently still named jsregexp.h.

* Move JSRegExp::Flags convenience functions to regexp-compiler.h.
* Set RegExpImpl methods private if possible (these will later be
  moved to a new hidden impl class).
* Merge RegExpEngine::CompilationResult into RegExpCompileData.
* Move remaining RegExpEngine methods to RegExpImpl and delete
  RegExpEngine.
* Extract RegExpGlobalCache.
* Document a few data structures.

Upcoming CLs will rename RegExpImpl to RegExp and jsregexp.h to
regexp.h. This should then be the only header included from other
directories.

Bug: v8:9359
Change-Id: I78c8f4cca495a2b95735a48b6181583bc3310bdf
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1662294Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62218}

c7d57dd3

[regexp] Extract more parts of the regexp compiler · def9aa5d

Jakob Gruber authored 5 years ago

Bug: v8:9359
Change-Id: I06a4ccc53abff25237a1113774a0b17bdf861c86
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1658157Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62198}

def9aa5d

13 Jun, 2019 3 commits

Reland "[regexp] Move AST-to-Node code to a dedicated file" · d61a558a

Jakob Gruber authored 5 years ago

This is a reland of 811bfbbc

Original change's description:
> [regexp] Move AST-to-Node code to a dedicated file
>
> Prior to this CL, jsregexp contains a bunch of things that are slightly
> related but would be cleaner in separate files, including: AST-to-Node
> transformations, the compiler implementation, and a debugging printer.
>
> This CL extracts AST-to-Node transformations.
>
> Bug: v8:9359
> Change-Id: I030cfca5c40cfd72e3a7abe2188e4654cfe2277c
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1655303
> Auto-Submit: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Yang Guo <yangguo@chromium.org>
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#62148}

Tbr: yangguo@chromium.org
Bug: v8:9359
Change-Id: I68a16086dc56c9a059547033ca8bc1e9de1080db
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1658568Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62154}

d61a558a

Revert "[regexp] Move AST-to-Node code to a dedicated file" · ee279dc2

Leszek Swirski authored 5 years ago

This reverts commit 811bfbbc.

Reason for revert: Breaks noi18n build (https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20noi18n%20-%20debug/27201)

Original change's description:
> [regexp] Move AST-to-Node code to a dedicated file
> 
> Prior to this CL, jsregexp contains a bunch of things that are slightly
> related but would be cleaner in separate files, including: AST-to-Node
> transformations, the compiler implementation, and a debugging printer.
> 
> This CL extracts AST-to-Node transformations.
> 
> Bug: v8:9359
> Change-Id: I030cfca5c40cfd72e3a7abe2188e4654cfe2277c
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1655303
> Auto-Submit: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Yang Guo <yangguo@chromium.org>
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#62148}

TBR=yangguo@chromium.org,jgruber@chromium.org,petermarshall@chromium.org

Change-Id: I079e15b02d73d81aef806992f324f08d7008e367
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: v8:9359
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1658160Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62149}

ee279dc2

[regexp] Move AST-to-Node code to a dedicated file · 811bfbbc

Jakob Gruber authored 5 years ago

Prior to this CL, jsregexp contains a bunch of things that are slightly
related but would be cleaner in separate files, including: AST-to-Node
transformations, the compiler implementation, and a debugging printer.

This CL extracts AST-to-Node transformations.

Bug: v8:9359
Change-Id: I030cfca5c40cfd72e3a7abe2188e4654cfe2277c
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1655303
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62148}

811bfbbc