Commits · ea28ceee128a9b8c13e02e74e9b84b19bf1672f0 · Linshizhi / V8

13 May, 2022 2 commits

Replace STATIC_ASSERT with static_assert · dd74a023

Clemens Backes authored 2 years ago

Now that we require C++17 support, we can just use the standard
static_assert without message, instead of our STATIC_ASSERT macro.

R=leszeks@chromium.org

Bug: v8:12425
Change-Id: I1d4e39c310b533bcd3a4af33d027827e6c083afe
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3647353
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/main@{#80524}

dd74a023
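
A minimal sketch of what the change in dd74a023 looks like at a call site, assuming a C++17 compiler. The `Header` struct and `HeaderSizeInBytes` are hypothetical and exist only to give the assertions something to check; they are not V8 code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct, used only to have something to assert on.
struct Header {
  uint32_t tag;
  uint32_t length;
};

// Since C++17 the message argument of static_assert is optional, so a
// wrapper macro that supplies one is no longer needed.
static_assert(sizeof(Header) == 2 * sizeof(uint32_t));
static_assert(offsetof(Header, length) == sizeof(uint32_t));

// Returns the size that the assertions above pin down at compile time.
constexpr std::size_t HeaderSizeInBytes() { return sizeof(Header); }
```

If either condition were false, the build would stop at the corresponding static_assert line.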

Remove redundant (internal) FatalProcessOutOfMemory · 5d48c41f

Clemens Backes authored 2 years ago

Use V8::FatalProcessOutOfMemory directly instead.

R=mlippautz@chromium.org

Bug: chromium:1323177
Change-Id: Ib1efd9e8099c76cd9ae0ac412b2e37307a698f4f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3641176
Reviewed-by: Patrick Thier <pthier@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/main@{#80517}

5d48c41f

20 Jan, 2022 1 commit

[regexp] Standardize handling of stack overflow crash in ToNode · 2edff884

Jakob Gruber authored 3 years ago

Use the FatalProcessOutOfMemory function such that tooling recognizes
these crashes as OOMs.

Drive-by: Skip one more test that leads to such stack overflows.

Fixed: v8:12555, chromium:1288456
Bug: v8:12472
Change-Id: Ib9203a4aa0487744f7cea9a212aeeffda579ae23
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3401861
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/main@{#78692}

2edff884

18 Jan, 2022 1 commit

[regexp] Periodically check for stack overflow during node generation · cbddd61d

Jakob Gruber authored 3 years ago

Recursive ToNode node generation may overflow the stack for large
graphs. As a quick fix, insert periodic stack overflow checks in
selected ToNode methods.

As a more permanent fix, in the future we could abort gracefully
(instead of crashing on a CHECK), and/or refactor into iterative node
generation.

Bug: v8:12472
Change-Id: Ie5fbe838c5f6a5192d7d9b44bfe6f6c76a8d26e7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3398112
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#78667}

cbddd61d
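
The idea in cbddd61d can be sketched as a recursive walk that checks a budget on the way down. This is an illustration under assumptions, not V8's code: `Tree`, `kMaxDepth` and `MakeChain` are hypothetical, a depth counter stands in for a real stack-limit check, and the function returns false where the commit would crash on a CHECK.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical tree node; stands in for a regexp AST node.
struct Tree {
  std::unique_ptr<Tree> child;
};

// Hypothetical budget. A real engine would compare the stack pointer
// against a stack limit rather than count frames.
constexpr int kMaxDepth = 1000;

// Recursively "generates" nodes. Returns false once the depth budget is
// exhausted instead of overflowing the native stack.
bool ToNode(const Tree* tree, int depth = 0) {
  if (depth > kMaxDepth) return false;  // stand-in for the overflow check
  if (tree == nullptr) return true;
  return ToNode(tree->child.get(), depth + 1);
}

// Builds a chain of `length` nested nodes.
std::unique_ptr<Tree> MakeChain(int length) {
  std::unique_ptr<Tree> head;
  for (int i = 0; i < length; ++i) {
    auto node = std::make_unique<Tree>();
    node->child = std::move(head);
    head = std::move(node);
  }
  return head;
}
```

A shallow chain is processed normally, while a chain deeper than the budget is rejected before the recursion can run away.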

01 Dec, 2021 1 commit

[regexp] Fix CharacterRange limits again again again · 2e17aaca

Jakob Gruber authored 3 years ago

When emitting code, character ranges must only specify ranges which
the actual subject string (one- or two-byte) may contain.

This was not always the case, specifically for ranges with
`from <= kMaxUint8` and `to > kMaxUint8`.

The reason this is so tricky: 1. not all parts of the pipeline know
whether we are compiling for one- or two-byte subjects; 2. for
case-insensitive regexps, an out-of-bounds CharacterRange may have an
in-bounds case equivalent (e.g. /[Ÿ]/i also matches 'ÿ' == \u{ff}),
which only gets added somewhere in the middle of the pipeline.

Our current solution is to clamp immediately before code emission. We
also keep the existing handling/dchecks of the 0x10ffff marker value
which may occur in the two-byte subject case.

Bug: v8:11069
Change-Id: Ic7b34a13a900ea2aa3df032daac9236bf5682a42
Fixed: chromium:1275096
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3306569
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/main@{#78186}

2e17aaca
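
A minimal sketch of the "clamp immediately before code emission" step described in 2e17aaca. The `Range` struct, the constants and `ClampRanges` are hypothetical names chosen for illustration; the handling of the 0x10ffff marker value mentioned in the commit is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a character range with inclusive bounds.
struct Range {
  uint32_t from;
  uint32_t to;
};

constexpr uint32_t kMaxOneByteChar = 0xFF;    // one-byte subject strings
constexpr uint32_t kMaxTwoByteChar = 0xFFFF;  // two-byte subject strings

// Drops ranges that lie entirely above max_char and clamps ranges that
// straddle it, so the result only names characters the subject can hold.
std::vector<Range> ClampRanges(const std::vector<Range>& ranges,
                               uint32_t max_char) {
  std::vector<Range> result;
  for (const Range& r : ranges) {
    assert(r.from <= r.to);
    if (r.from > max_char) continue;  // entirely out of bounds
    result.push_back({r.from, std::min(r.to, max_char)});
  }
  return result;
}
```

For a one-byte subject, a range such as [0xF0, 0x178] (from <= kMaxUint8, to > kMaxUint8) becomes [0xF0, 0xFF], and a range starting above 0xFF disappears.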

09 Nov, 2021 1 commit

[regexp] Fix -Wshadow warnings · 7ce84cbb

Ng Zhi An authored 3 years ago

Bug: v8:12244,v8:12245
Change-Id: I5b908f056222c57e796fb76e86ceea9a77cde77f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3265066
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77782}

7ce84cbb

26 Oct, 2021 1 commit

[regexp] Allow empty ranges in GetQuickCheckDetails · c1e32791

Jakob Gruber authored 3 years ago

A follow-up to crrev.com/c/3240782.

Drive-by: extend JSRegExp printing.

Fixed: chromium:1263327
Bug: v8:11069
Change-Id: Iff64ded27ca93641f0f572df2ce0a9f846948f7f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3245110
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77536}

c1e32791

25 Oct, 2021 1 commit

[regexp] Only emit valid ranges in MakeRangeArray · b7dc9915

Jakob Gruber authored 3 years ago

Character class handling in the irregexp pipeline is quite complex;
codepoints outside the BMP (basic multilingual plane) are only
translated into surrogate pairs when needed, e.g. when the subject
string is two-byte. If not needed, the codepoints simply stay part of
the list of CharacterRanges.

In EmitCharClass, we determine the valid subset of ranges through
ranges_length; until this CL, we forgot to pass that information on to
MakeRangeArray. Do that now by truncating the list of CharacterRanges.

Fixed: chromium:1262423
Change-Id: I5bb5b839e9935890ca2d10908ad66d72c3217178
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3240782
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77514}

b7dc9915

19 Oct, 2021 1 commit

[regexp] Compact codegen for large character classes · 8bbb44e5

Jakob Gruber authored 3 years ago

Large character classes may easily be created when unicode
properties (e.g.: /\p{L}/u and /\P{L}/u) are used - these are
expanded internally into character classes that consist of hundreds
of character ranges. Prior to this CL, we'd emit branching code
for each of these ranges, leading to very large regexp code objects.

This CL adds a new codegen mode for large character classes (where
'large' currently means > 16 ranges). Instead of emitting branching
code inline, the ranges are written into a ByteArray and we call into
the C function IsCharacterInRangeArray for the actual branching logic.
The ByteArray is smaller than emitted code and is deduplicated if the
same character class is matched repeatedly in the same pattern.

Note this mode is *not* implemented for the interpreter, since we
currently don't have a constant pool for irregexp bytecode, and thus
cannot reference ByteArrays.

Bug: v8:11069
Change-Id: I2d728e42d85114b796c637f791848731a104cd54
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3229377
Reviewed-by: Patrick Thier <pthier@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77463}

8bbb44e5
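
One way to picture the table-driven mode from 8bbb44e5 is a sorted boundary list searched with a binary search. The sketch below is an assumption about the general technique, not the layout of V8's ByteArray or the body of IsCharacterInRangeArray; `Range`, `BuildBoundaries` and `InRanges` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical inclusive character range.
struct Range {
  uint32_t from;
  uint32_t to;
};

// Flattens ranges into [from0, to0 + 1, from1, to1 + 1, ...].
// Assumes the input is sorted, non-overlapping, non-adjacent, and that
// every `to` is below UINT32_MAX.
std::vector<uint32_t> BuildBoundaries(const std::vector<Range>& ranges) {
  std::vector<uint32_t> boundaries;
  boundaries.reserve(ranges.size() * 2);
  for (const Range& r : ranges) {
    boundaries.push_back(r.from);
    boundaries.push_back(r.to + 1);
  }
  return boundaries;
}

// Counts the boundaries that are <= c with a binary search. An odd count
// means c lies inside one of the ranges.
bool InRanges(const std::vector<uint32_t>& boundaries, uint32_t c) {
  auto it = std::upper_bound(boundaries.begin(), boundaries.end(), c);
  return ((it - boundaries.begin()) % 2) == 1;
}
```

The lookup cost grows logarithmically with the number of ranges, and the same boundary list can be shared by every use of the same character class.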

14 Oct, 2021 1 commit

[regexp] More cleanups · a2b9710f

Jakob Gruber authored 3 years ago

- Anonymous namespaces instead of static functions.
- Comments.
- Reserve enough space in the range ZoneList.

Change-Id: Ie79fda770974796cd590a155dc5fd504472e5bc9
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3220341
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77391}

a2b9710f

12 Oct, 2021 1 commit

[regexp] Add dedicated enums for standard character sets · b4aa41d0

Jakob Gruber authored 3 years ago

.. instead of referring to them through magic chars {s,S,w,W,d,D,n,.,*}.

Change-Id: Ib50937a2a7d4229a021377586a54be3db9ed8c1d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3217196
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77337}

b4aa41d0
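
A sketch of the kind of enum b4aa41d0 describes. The enumerator names and the `FromMagicChar` helper are hypothetical, and the meanings given to 'n', '.' and '*' are assumptions; only the list of magic chars comes from the commit message.

```cpp
#include <cassert>
#include <optional>

// Hypothetical enum naming each standard character set explicitly.
enum class StandardCharacterSet {
  kWhitespace,         // 's'
  kNotWhitespace,      // 'S'
  kWord,               // 'w'
  kNotWord,            // 'W'
  kDigit,              // 'd'
  kNotDigit,           // 'D'
  kLineTerminator,     // 'n' (assumed meaning)
  kNotLineTerminator,  // '.' (assumed meaning)
  kEverything,         // '*' (assumed meaning)
};

// Maps a legacy magic char to the enum; unknown chars yield no value.
constexpr std::optional<StandardCharacterSet> FromMagicChar(char c) {
  switch (c) {
    case 's': return StandardCharacterSet::kWhitespace;
    case 'S': return StandardCharacterSet::kNotWhitespace;
    case 'w': return StandardCharacterSet::kWord;
    case 'W': return StandardCharacterSet::kNotWord;
    case 'd': return StandardCharacterSet::kDigit;
    case 'D': return StandardCharacterSet::kNotDigit;
    case 'n': return StandardCharacterSet::kLineTerminator;
    case '.': return StandardCharacterSet::kNotLineTerminator;
    case '*': return StandardCharacterSet::kEverything;
    default: return std::nullopt;
  }
}
```

With a scoped enum, a switch over the value can be checked by the compiler for missing cases, which a bare char cannot offer.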

19 Aug, 2021 2 commits

[regexp] Replace JSRegExp::Flags uses by RegExpFlags · 66a85b8e

Jakob Gruber authored 3 years ago

.. and decrease the include-ball size.

Change-Id: Id35358a6882156f6684475b7f0b0193f8ca5eaf5
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3103313
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76386}

66a85b8e

[regexp] Break dependency on JSRegExp::Flags · d586518a

Jakob Gruber authored 3 years ago

The JSRegExp heap object should not be the source of truth for regexp
flags, which are also relevant in places that don't need or want to
care about the heap object layout (e.g.: the regexp parser).

Introduce RegExpFlags as a new source of truth, and base everything
else on these flags.

As a first change, remove the js-regexp.h dependency from the regexp
parser. Other files in src/regexp/ should be updated in follow-up
work.

Change-Id: Id9a6706c7f09e93f743b08b647b211d0cb0b9c76
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3103306
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76379}

d586518a

10 Aug, 2021 1 commit

[regexp] Handle another regexp-too-big path for fuzzer suppressions · 3e21b6d0

Jakob Gruber authored 3 years ago

The behavior here depends on the platform and may also differ between
fast and slow paths [0]. Crash to let the fuzzer know there's nothing
interesting here.

[0] The reason for the fast-slow-path difference is that sometimes we
may trigger different compile jobs on these paths. One example is
`split`, which creates a new regexp instance on the slow path, but
reuses an existing instance on the fast path.

Bug: chromium:1236845
Change-Id: I87d9eb2601b235440014530d98df0e938b717650
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3080577
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>
Reviewed-by: Michael Achenbach <machenbach@chromium.org>
Cr-Commit-Position: refs/heads/master@{#76197}

3e21b6d0

27 Jul, 2021 1 commit

[regexp] Remove experimental mode modifiers feature · 7e97b2cf

Jakob Gruber authored 3 years ago

The implementation came in with
https://chromium-review.googlesource.com/758999.

This feature was never enabled by default, is not used anywhere, and
is not on any standardization path.

Bug: v8:10953
Change-Id: Ia2b0a556c1fb504a4cd05bdfa9f0a9c5be608d26
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3053589
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#75934}

7e97b2cf

01 Jul, 2021 1 commit

Fix most instances of -Wunreachable-code-aggressive. · ae1eee10

Peter Kasting authored 3 years ago

There are still a few cases remaining that seem more controversial;
I'll upload those separately.

Bug: chromium:1066980
Change-Id: Iabbaf23f9bbe97781857c0c589f2b3db685dfdc2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2994804
Commit-Queue: Peter Kasting <pkasting@chromium.org>
Auto-Submit: Peter Kasting <pkasting@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Cr-Commit-Position: refs/heads/master@{#75494}

ae1eee10

24 Jun, 2021 3 commits

Reland "[base] Move most of src/numbers into base" · 44e73e0b

Dan Elphick authored 3 years ago

This is a reland of 9701d4a4
with a small fix for some code landed in between the dry-run and
submission.

Original change's description:
> [base] Move most of src/numbers into base
>
> Moves all but conversions.*, hash-seed-inl.h and math-random.* into
> base, in preparation for moving the parts of conversions that don't
> access HeapObjects.
>
> Also moves uc16 and uc32 out of common/globals.h into base/strings.h.
>
> Bug: v8:11917
> Change-Id: Ife359148bb0961a63833aff40d26331454b6afb6
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2979595
> Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
> Reviewed-by: Clemens Backes <clemensb@chromium.org>
> Commit-Queue: Ross McIlroy <rmcilroy@chromium.org>
> Auto-Submit: Dan Elphick <delphick@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#75354}

Bug: v8:11917
Change-Id: Ie1ec9032fe56646a7c7303185cecc70fce5694ae
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2982607
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Commit-Queue: Dan Elphick <delphick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#75368}

44e73e0b

Revert "[base] Move most of src/numbers into base" · 10f6151d

Nico Hartmann authored 3 years ago

This reverts commit 9701d4a4.

Reason for revert: https://ci.chromium.org/ui/p/v8/builders/ci/V8%20Mac64/40802/overview

Original change's description:
> [base] Move most of src/numbers into base
>
> Moves all but conversions.*, hash-seed-inl.h and math-random.* into
> base, in preparation for moving the parts of conversions that don't
> access HeapObjects.
>
> Also moves uc16 and uc32 out of common/globals.h into base/strings.h.
>
> Bug: v8:11917
> Change-Id: Ife359148bb0961a63833aff40d26331454b6afb6
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2979595
> Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
> Reviewed-by: Clemens Backes <clemensb@chromium.org>
> Commit-Queue: Ross McIlroy <rmcilroy@chromium.org>
> Auto-Submit: Dan Elphick <delphick@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#75354}

Bug: v8:11917
Change-Id: Iacf796c95256016fa74f0a910c5bb1a86baa425a
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2982605
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Reviewed-by: Nico Hartmann <nicohartmann@chromium.org>
Commit-Queue: Nico Hartmann <nicohartmann@chromium.org>
Cr-Commit-Position: refs/heads/master@{#75356}

10f6151d

[base] Move most of src/numbers into base · 9701d4a4

Dan Elphick authored 3 years ago

Moves all but conversions.*, hash-seed-inl.h and math-random.* into
base, in preparation for moving the parts of conversions that don't
access HeapObjects.

Also moves uc16 and uc32 out of common/globals.h into base/strings.h.

Bug: v8:11917
Change-Id: Ife359148bb0961a63833aff40d26331454b6afb6
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2979595
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Commit-Queue: Ross McIlroy <rmcilroy@chromium.org>
Auto-Submit: Dan Elphick <delphick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#75354}

9701d4a4

18 Jun, 2021 1 commit

[base] Move utils/vector.h to base/vector.h · 7f5383e8

Dan Elphick authored 3 years ago

The adding of base:: was mostly prepared using git grep and sed:
git grep -l <pattern> | grep -v base/vector.h | \
  xargs sed -i 's/\b<pattern>\b/base::<pattern>/'
with lots of manual clean-ups due to the resulting
v8::internal::base::Vectors.

#includes were fixed using:
git grep -l "src/utils/vector.h" | \
  xargs sed -i 's!src/utils/vector.h!src/base/vector.h!'

Bug: v8:11879
Change-Id: I3e6d622987fee4478089c40539724c19735bd625
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2968412
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Commit-Queue: Dan Elphick <delphick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#75243}

7f5383e8

09 Jun, 2021 1 commit

[regexp] Propagate eats_at_least for negative lookahead · 363ab5ae

Iain Ireland authored 3 years ago

In issue 11290, we disabled the propagation of EAL data out of
lookarounds, because it was incorrect for lookahead nodes in
loops. This caused performance regressions: for example,
`/^\P{Letter}+$/u` (matching only characters that are not in Unicode's
Letter category) uses negative lookahead when matching lone
surrogates, and became about 2x slower. I spent some time looking into
fixes, and this is what I've settled on.

Some background: the implementation of lookarounds in irregexp is
split between positive and negative lookaheads. (Lookbehinds aren't
relevant here, because backwards matches always have EAL=0.) Positive
lookaheads are wrapped in BEGIN_SUBMATCH and POSITIVE_SUBMATCH_SUCCESS
ActionNodes. BEGIN_SUBMATCH saves the current state.
POSITIVE_SUBMATCH_SUCCESS restores the necessary state (while leaving
any captures that occurred during the lookaround intact).

Negative lookaheads also begin with a BEGIN_SUBMATCH node, but follow
it with a NegativeLookaroundChoiceNode. This node has two successors:
a lookaround node, and a continue node. It only executes the continue
node if the lookaround node backtracks, which automatically restores
the previous state. Negative lookarounds also can't update captures.

This affects EAL calculations. It turns out that negative lookaheads
are already doing the right thing: EatsAtLeastPropagator only
propagates information from the continue node, ignoring the lookaround
node. The same is true for quick checks (see the comment in
RegExpLookaround::Builder::ForMatch). A BEGIN_SUBMATCH for a negative
lookahead can simply propagate the EAL data from its successor like
any other ActionNode, and everything works.

Positive lookaheads are harder. I tried saving a pointer to the
successor in BEGIN_SUBMATCH, but ran into problems in FillInBMInfo,
because the EAL value corresponded to the nodes after the lookahead,
but the analysis was still looking at the nodes inside. I fell back
to a more modest approach: split BEGIN_SUBMATCH in two, and propagate
EAL info for BEGIN_NEGATIVE_SUBMATCH while keeping the current
behaviour for BEGIN_POSITIVE_SUBMATCH. This fixes the performance
regression at hand.

Two potential approaches for fixing EAL for positive lookahead are:
1. Handling positive lookahead with its own dedicated choice node,
like NegativeLookaroundChoiceNode.
2. Adding an eats_at_least_inside_loop field to EatsAtLeastInfo,
which is <= eats_at_least_from_possibly_start, and using that
value in EatsAtLeastFromLoopEntry.

Both of those approaches are more complex than I want to tackle
right now, though.

Bug: v8:11844
Change-Id: I2a43509c2c21194b8c18f0a587fa21c194db76c2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2934858
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#75031}

363ab5ae
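
The split described in 363ab5ae can be sketched as a per-action-type decision. This is a simplified model under assumptions: the enum, its enumerator names and `EatsAtLeastForAction` are hypothetical, and a single uint8_t stands in for the real eats_at_least bookkeeping.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of action node types after the split.
enum class ActionType {
  kBeginPositiveSubmatch,
  kBeginNegativeSubmatch,
  kOther,
};

// Returns the eats_at_least value an action node reports, given the value
// computed for its successor.
constexpr uint8_t EatsAtLeastForAction(ActionType type,
                                       uint8_t successor_eal) {
  // Positive lookahead: the successor is inside the lookaround, so its
  // value says nothing about what follows. Stay conservative.
  if (type == ActionType::kBeginPositiveSubmatch) return 0;
  // Negative lookahead and other actions: pass the successor's value on.
  return successor_eal;
}
```

Under this model only positive lookaheads lose EAL information, which matches the commit's claim that the negative case can behave like any other ActionNode.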

08 Apr, 2021 2 commits

[regexp] Don't propagate lookaround eats_at_least to surroundings · 59e218c8

Jakob Gruber authored 3 years ago

Lookarounds rewind the position after matching, and thus don't play
well with eats_at_least (EAL). This CL disables EAL propagation from
lookarounds.

In the future we could be a bit smarter by skipping over lookarounds
instead of resetting to 0.

Bug: v8:11290
Change-Id: I935400a7f9cda96d9c5a80e412ba7d04de70a84f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2808944
Reviewed-by: Seth Brenith <seth.brenith@microsoft.com>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#73849}

59e218c8

[regexp] Don't use eats_at_least for backwards loops · c977b65b

Jakob Gruber authored 3 years ago

The eats_at_least (EAL) value is applied in the forward direction only.
Two reasons for that which are relevant to this CL:

- EALs of neighboring nodes are combined additively, irrespective of
  their read_backward value.
- EatsAtLeastPropagator::VisitText uses the successor's
  eats_at_least_from_not_start value, which doesn't work properly for
  read_backward successors (which may end at the start).

A symptom of this bug was that we applied an incorrect EAL of 255
starting at the initial 'x' of /x(?<=^x{4})/; for subject strings
shorter than 255 chars, this would result in an incorrect failure
result.

Bug: v8:11616
Change-Id: I4b2b1b78f0cea8f59e4beb1037ee46035d83c927
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2807596
Reviewed-by: Seth Brenith <seth.brenith@microsoft.com>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#73848}

c977b65b

08 Feb, 2021 1 commit

[regexp] Change rangeBoundaries to use uc32 · f905e3f4

Iain Ireland authored 4 years ago

Some of the DCHECK_LT assertions in GenerateBranches were generating
signed-vs-unsigned comparisons in SM. While I was looking at this code,
it seemed reasonable to just fix the whole thing to use uc32/uint32_t
where appropriate.

Bug: v8:11380
Change-Id: I7e27fb7e34ce962349d7204d6306217292746e33
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2666986
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#72557}

f905e3f4

14 Jan, 2021 1 commit

[regexp] Throw when length of text nodes in alternatives is too large. · 3466daa7

Patrick Thier authored 4 years ago

Offsets in regular expressions are limited to 16 bits.
It was possible to exceed this limit when emitting greedy loops where
the length of text nodes exceeded 16 bits, resulting in overflowing
offsets.
With this CL we throw a SyntaxError "Regular expression too large" to
prevent this overflow.

Bug: chromium:1166138
Change-Id: Ica624a243bf9827083ff883d9a976f13c8da02e5
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2629286
Commit-Queue: Patrick Thier <pthier@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#72095}

3466daa7
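
A minimal sketch of the guard described in 3466daa7, assuming the 16-bit limit is a signed one. `CompileStatus`, `kMaxOffset` and `CheckTextLength` are hypothetical names; the real check sits inside the compiler and reports a SyntaxError rather than returning an enum.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical result type for the length check.
enum class CompileStatus { kOk, kRegExpTooLarge };

// Assumed limit: the largest offset representable in signed 16 bits.
constexpr int kMaxOffset = std::numeric_limits<int16_t>::max();

// Rejects text lengths whose offsets would not fit, instead of letting
// the offset silently overflow.
CompileStatus CheckTextLength(int text_length) {
  assert(text_length >= 0);
  return text_length > kMaxOffset ? CompileStatus::kRegExpTooLarge
                                  : CompileStatus::kOk;
}
```

The caller would turn kRegExpTooLarge into the "Regular expression too large" error mentioned in the commit message.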

24 Nov, 2020 1 commit

[cleanup] Replace all remaining Min/Max uses with std::min/max · 3836aeb0

Georg Neis authored 4 years ago

Apart from removing Min and Max (utils.h), this is mostly a renaming.

In a few cases I had to add a cast. In a bunch of cases I had to use
initializer lists to force call-by-value for static member constants
because call-by-reference wouldn't compile (like in the previous CL).
In a few places I used initializer lists in place of nested min/max
operations.

Bug: v8:11074
Change-Id: I53a5411be6334ff41e7a8517e6b87fb46f14d086
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2545523
Commit-Queue: Georg Neis <neis@chromium.org>
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71380}

3836aeb0
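
The initializer-list trick mentioned in 3836aeb0 can be sketched as follows. `Limits`, `ClampLength` and `Smallest` are hypothetical; the point is only the difference between the reference-taking and the by-value overloads of std::min.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical class with in-class initialized static constants that have
// no out-of-line definition.
struct Limits {
  static const int kMaxLength = 64;
};

int ClampLength(int requested) {
  // std::min(Limits::kMaxLength, requested) would bind a const reference
  // to kMaxLength, which requires a definition and may fail to build.
  // The initializer-list overload copies the values instead.
  return std::min({Limits::kMaxLength, requested});
}

int Smallest(int a, int b, int c) {
  // One initializer list replaces a nested std::min(a, std::min(b, c)).
  return std::min({a, b, c});
}
```

Both functions rely on the `std::min(std::initializer_list<T>)` overload from `<algorithm>`.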

09 Nov, 2020 1 commit

[cleanup] Remove DISALLOW_COPY_AND_ASSIGN in regexp/ · b06f7da4

Zhi An Ng authored 4 years ago

Bug: v8:11074
Change-Id: I8deefa9cf5ac10b769e4ebb7029a82957cf669c3
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2525540
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71029}

b06f7da4

10 Jul, 2020 1 commit

[zone] Cleanup zone allocations in src/regexp and tests · 921c2476

Igor Sheludko authored 4 years ago

... by migrating old-style code
  MyObject* obj = new (zone) MyObject(...)

to the new style
  MyObject* obj = zone->New<MyObject>(...)

Bug: v8:10689
Change-Id: Icc60fdbf247ec05f9b5688b3d2d73d4fed06ea89
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2289770
Commit-Queue: Igor Sheludko <ishell@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68784}

921c2476
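
To show why a `New<T>` member function can replace placement new at call sites, here is a toy arena. It is a sketch under assumptions and not V8's Zone: the fixed capacity, the nullptr result on exhaustion and the `Point` type are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Toy bump allocator with a fixed capacity. Objects are never destroyed
// individually; the whole buffer is released with the arena.
class Zone {
 public:
  explicit Zone(std::size_t capacity)
      : buffer_(new unsigned char[capacity]),
        cursor_(buffer_.get()),
        remaining_(capacity) {}

  // Constructs a T inside the arena, forwarding the constructor arguments.
  // Returns nullptr when the arena is full.
  template <typename T, typename... Args>
  T* New(Args&&... args) {
    static_assert(std::is_trivially_destructible_v<T>,
                  "this toy arena never runs destructors");
    void* p = cursor_;
    if (std::align(alignof(T), sizeof(T), p, remaining_) == nullptr) {
      return nullptr;
    }
    cursor_ = static_cast<unsigned char*>(p) + sizeof(T);
    remaining_ -= sizeof(T);
    return new (p) T(std::forward<Args>(args)...);
  }

 private:
  std::unique_ptr<unsigned char[]> buffer_;
  unsigned char* cursor_;
  std::size_t remaining_;
};

// Hypothetical payload type.
struct Point {
  Point(int x_in, int y_in) : x(x_in), y(y_in) {}
  int x;
  int y;
};
```

A call such as `zone.New<Point>(3, 4)` keeps the allocation and the construction in one expression, and the allocator can enforce its own constraints in one place.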

10 Jun, 2020 3 commits

[clang-tidy] Prefer static_cast to c style casts · f3646333

Ng Zhi An authored 4 years ago

See
https://clang.llvm.org/extra/clang-tidy/checks/google-readability-casting.html
and https://google.github.io/styleguide/cppguide.html#Casting.

Change-Id: Ib5a3bb8873bc6d050c4d0abe36a3ae813bbd448a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2233987
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68301}

f3646333

[regexp] Fix integer overflows in TextNode::GetQuickCheckDetails · a305d2de

Jakob Gruber authored 4 years ago

Several uc32 (= int32_t) fields were incorrectly treated as uc16
(= uint16_t):

CharacterRange::from()
CharacterRange::to()
QuickCheckDetails::Position::mask
QuickCheckDetails::Position::value

Bug: v8:10568
Change-Id: I9ea7d76e4a0cbc6ee681de2136c398cdc622bca2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2230527
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68290}

a305d2de
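
The class of bug fixed in a305d2de can be illustrated with a range test that narrows to 16 bits. This is a hypothetical example, not the code from TextNode::GetQuickCheckDetails; the type aliases follow the commit message, and both function names are invented.

```cpp
#include <cassert>
#include <cstdint>

using uc16 = uint16_t;
using uc32 = int32_t;  // as stated in the commit message above

// Buggy variant: narrows every value to 16 bits before comparing, so a
// code point above 0xFFFF aliases to an unrelated BMP character.
constexpr bool InRangeTruncating(uc32 c, uc32 from, uc32 to) {
  return static_cast<uc16>(from) <= static_cast<uc16>(c) &&
         static_cast<uc16>(c) <= static_cast<uc16>(to);
}

// Fixed variant: compares at full width.
constexpr bool InRange(uc32 c, uc32 from, uc32 to) {
  return from <= c && c <= to;
}
```

With the truncating variant, 0x10041 is reported as a member of [0x41, 0x5A] because its low 16 bits equal 0x41.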

[globals] Change uc32 to be unsigned · f6874c73

Jakob Gruber authored 4 years ago

Prior to this change, uc16 was typedef'd to (unsigned) uint16_t while
uc32 was typedef'd to (signed) int32_t.

For consistency, and to avoid unexpected behavior around
signed/unsigned comparisons, this changes uc32 to the unsigned
uint32_t type.

As part of this change, old-style error passing (return -1, check for
negative return values) was updated to use named error values.

Bug: v8:10568
Change-Id: I8524e66ee20e8738749cd34c4fe82c14e885dcb3
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2235533
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Jakob Kummerow <jkummerow@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68282}

f6874c73
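
A sketch of what "named error values" can look like once uc32 is unsigned, as in f6874c73. `kInvalidHexValue` and `HexValue` are hypothetical; the commit message does not name the actual constants.

```cpp
#include <cassert>
#include <cstdint>

using uc32 = uint32_t;

// Hypothetical named sentinel that replaces the old "return -1" idiom,
// which no longer works once the return type is unsigned.
constexpr uc32 kInvalidHexValue = 0xFFFFFFFFu;

// Returns the numeric value of a hex digit, or kInvalidHexValue.
constexpr uc32 HexValue(char c) {
  if (c >= '0' && c <= '9') return static_cast<uc32>(c - '0');
  if (c >= 'a' && c <= 'f') return static_cast<uc32>(c - 'a' + 10);
  if (c >= 'A' && c <= 'F') return static_cast<uc32>(c - 'A' + 10);
  return kInvalidHexValue;
}
```

Callers compare against the named constant rather than testing for a negative result, which would always be false for an unsigned type.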

03 Jun, 2020 1 commit

[regexp] Fix non-unicode ignore-case backreferences · b65fcfe9

Iain Ireland authored 4 years ago

https://crrev.com/c/2072858 rewrote the implementation of non-unicode
ignore-case matches to comply with the JS spec in some corner
cases. It fixed character matches and character class matches.

We missed a similar bug in the implementation of back references. This
CL fixes that bug.

The main change is in regexp-macro-assembler.cc, where
CaseInsensitiveCompareUC16 is split into CaseInsensitiveCompareUnicode
(which has the same semantics as before) and
CaseInsensitiveCompareNonUnicode (which has the semantics described
here: https://tc39.es/ecma262/#sec-runtime-semantics-canonicalize-ch).

Most of the rest of the patch undoes https://crrev.com/c/2081816 to
once again make the unicode flag available to the macroassembler, so
that we can decide which helper function to call.

The testcase is a version of test/intl/regress-10248.js, modified to
test backreferences.

Bug: v8:10573
Change-Id: I70ef7d134d37f99b1f75a5eba17020e82d59f1b9
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2219284
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68129}

b65fcfe9

28 Apr, 2020 1 commit

[regexp] Handlify RegExpCompileData::code · 6bb3f0c0

Iain Ireland authored 4 years ago

RegExpMacroAssembler::GetCode returns a Handle<Object>. However, that
Handle is almost immediately dereferenced, and is stored as a bare
Object in both RegExpCompiler::CompilationResult and RegExpCompileData.

This makes SpiderMonkey's rooting hazard analysis somewhat
antsy. While RegExpCompileData is alive on the stack, the hazard
analysis will not allow any calls that might GC, because it isn't
smart enough to prove that the code field can't be clobbered by a GC.

As far as I can tell, there is no real hazard here, but storing a
Handle in RegExpCompileData instead of a bare Object will simplify SM
and prevent a future patch from accidentally breaking something.

Bug: v8:10406
Change-Id: I9642dd05c591bfd23b340a89df2f2bf5c9fcac2c
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2161578
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#67441}

6bb3f0c0

21 Apr, 2020 2 commits

[regexp] Consistent expectations for output registers · fe609139

Jakob Gruber authored 4 years ago

... between the interpreter and generated code.

Prior to this CL, pre- and postconditions on the output register
array differed between the interpreter and generated code.

Interpreter
Pre: `output` fits captures and temporary registers.
Post: None.

Generated code
Pre:  `output` fits capture registers.
Post: `output` is modified if and only if the match succeeded.

This CL changes the interpreter to match generated code pre- and
postconditions by allocating space for temporary registers inside
the interpreter.

Drive-by: Add MaxRegisterCount, RegistersForCaptureCount helpers.

Bug: chromium:1067270
Change-Id: I2900ef2f31207d817ec7ead3e0e2215b23b398f0
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2135642
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#67268}

fe609139
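
The contract that fe609139 establishes can be sketched with a toy interpreter entry point. Everything here is an assumption made for illustration: the body of `RegistersForCaptureCount` (two registers per capture plus two for the whole match), the `Interpret` signature, and the fake match that fills registers with their own index.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed formula: a start and an end register per capture group, plus
// one pair for the overall match.
constexpr int RegistersForCaptureCount(int capture_count) {
  return (capture_count + 1) * 2;
}

// Toy interpreter. Pre: `output` fits the capture registers only.
// Post: `output` is modified if and only if the match succeeded.
bool Interpret(bool match_succeeds, int capture_count, int temp_count,
               std::vector<int>& output) {
  const int capture_registers = RegistersForCaptureCount(capture_count);
  assert(static_cast<int>(output.size()) >= capture_registers);

  // Temporary registers are allocated here, not by the caller.
  std::vector<int> registers(capture_registers + temp_count, -1);
  for (int i = 0; i < capture_registers; ++i) registers[i] = i;

  if (!match_succeeds) return false;  // output stays untouched
  std::copy(registers.begin(), registers.begin() + capture_registers,
            output.begin());
  return true;
}
```

Because the scratch space is internal, a caller can size `output` the same way for the interpreter and for generated code.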

[regexp] Factor out PreprocessRegExp · 58ac66b7

Iain Ireland authored 4 years ago

RegExpImpl::Compile does a number of transformations that require
directly manipulating the internal representation of the regexp. For
example, when matching a (non-sticky, non-anchored) regular
expression, the pattern must be wrapped in .* so that it can match
anywhere in the input.

In the interest of moving towards a cleaner division between irregexp
and the outside world, it makes sense to move this code into
RegExpCompiler.

R=jgruber@chromium.org

Bug: v8:10406
Change-Id: I6da251c91c0016914a51480f80bb46c337fd0b23
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2140246
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#67262}

58ac66b7

19 Mar, 2020 3 commits

Reland "[regexp] Rewrite error handling" · 560f2d8b

Iain Ireland authored 4 years ago

This is a reland of e80ca24c

Original change's description:
> [regexp] Rewrite error handling
>
> This patch modifies irregexp's error handling. Instead of representing
> errors as C strings, they are represented as an enumeration value
> (RegExpError), and only converted to strings when throwing the error
> object in regexp.cc. This makes it significantly easier to integrate
> into SpiderMonkey. A few notes:
>
> 1. Depending on whether the stack overflows during parsing or
>    analysis, the stack overflow message can vary ("Stack overflow" or
>    "Maximum call stack size exceeded"). I kept that behaviour in this
>    patch, under the assumption that stack overflow messages are
>    (sadly) the sorts of things that real world code ends up depending
>    on.
>
> 2. Depending on the point in code where the error was identified,
>    invalid unicode escapes could be reported as "Invalid Unicode
>    escape", "Invalid unicode escape", or "Invalid Unicode escape
>    sequence". I fervently hope that nobody depends on the specific
>    wording of a syntax error, so I standardized on the first one. (It
>    was both the most common, and the most consistent with other
>    "Invalid X escape" messages.)
>
> 3. In addition to changing the representation, this patch also adds an
>    error_pos field to RegExpParser and RegExpCompileData, which stores
>    the position at which an error occurred. This is used by
>    SpiderMonkey to provide more helpful messages about where a syntax
>    error occurred in large regular expressions.
>
> 4. This model is closer to V8's existing MessageTemplate
>    infrastructure. I considered trying to integrate it more closely
>    with MessageTemplate, but since one of our stated goals for this
>    project was to make it easier to use irregexp outside of V8, I
>    decided to hold off.
>
> R=jgruber@chromium.org
>
> Bug: v8:10303
> Change-Id: I62605fd2def2fc539f38a7e0eefa04d36e14bbde
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091863
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#66784}

R=jgruber@chromium.org

Bug: v8:10303
Change-Id: Iad1f11a0e0b9e525d7499aacb56c27eff9e7c7b5
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2109952
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66798}

560f2d8b

Revert "[regexp] Rewrite error handling" · 2193f691

Leszek Swirski authored 4 years ago

This reverts commit e80ca24c.

Reason for revert: Causes failures in the fast/regex/non-pattern-characters.html Blink web test (https://ci.chromium.org/p/v8/builders/ci/V8%20Blink%20Linux/3679)

Original change's description:
> [regexp] Rewrite error handling
> 
> This patch modifies irregexp's error handling. Instead of representing
> errors as C strings, they are represented as an enumeration value
> (RegExpError), and only converted to strings when throwing the error
> object in regexp.cc. This makes it significantly easier to integrate
> into SpiderMonkey. A few notes:
> 
> 1. Depending on whether the stack overflows during parsing or
>    analysis, the stack overflow message can vary ("Stack overflow" or
>    "Maximum call stack size exceeded"). I kept that behaviour in this
>    patch, under the assumption that stack overflow messages are
>    (sadly) the sorts of things that real world code ends up depending
>    on.
> 
> 2. Depending on the point in code where the error was identified,
>    invalid unicode escapes could be reported as "Invalid Unicode
>    escape", "Invalid unicode escape", or "Invalid Unicode escape
>    sequence". I fervently hope that nobody depends on the specific
>    wording of a syntax error, so I standardized on the first one. (It
>    was both the most common, and the most consistent with other
>    "Invalid X escape" messages.)
> 
> 3. In addition to changing the representation, this patch also adds an
>    error_pos field to RegExpParser and RegExpCompileData, which stores
>    the position at which an error occurred. This is used by
>    SpiderMonkey to provide more helpful messages about where a syntax
>    error occurred in large regular expressions.
> 
> 4. This model is closer to V8's existing MessageTemplate
>    infrastructure. I considered trying to integrate it more closely
>    with MessageTemplate, but since one of our stated goals for this
>    project was to make it easier to use irregexp outside of V8, I
>    decided to hold off.
> 
> R=jgruber@chromium.org
> 
> Bug: v8:10303
> Change-Id: I62605fd2def2fc539f38a7e0eefa04d36e14bbde
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091863
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#66784}

TBR=jgruber@chromium.org,iireland@mozilla.com

Change-Id: I9247635f3c5b17c943b9c4abaf82ebe7b2de165e
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: v8:10303
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2108550
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66786}

2193f691

[regexp] Rewrite error handling · e80ca24c

Iain Ireland authored 4 years ago

This patch modifies irregexp's error handling. Instead of representing
errors as C strings, they are represented as an enumeration value
(RegExpError), and only converted to strings when throwing the error
object in regexp.cc. This makes it significantly easier to integrate
into SpiderMonkey. A few notes:

1. Depending on whether the stack overflows during parsing or
   analysis, the stack overflow message can vary ("Stack overflow" or
   "Maximum call stack size exceeded"). I kept that behaviour in this
   patch, under the assumption that stack overflow messages are
   (sadly) the sorts of things that real world code ends up depending
   on.

2. Depending on the point in code where the error was identified,
   invalid unicode escapes could be reported as "Invalid Unicode
   escape", "Invalid unicode escape", or "Invalid Unicode escape
   sequence". I fervently hope that nobody depends on the specific
   wording of a syntax error, so I standardized on the first one. (It
   was both the most common, and the most consistent with other
   "Invalid X escape" messages.)

3. In addition to changing the representation, this patch also adds an
   error_pos field to RegExpParser and RegExpCompileData, which stores
   the position at which an error occurred. This is used by
   SpiderMonkey to provide more helpful messages about where a syntax
   error occurred in large regular expressions.

4. This model is closer to V8's existing MessageTemplate
   infrastructure. I considered trying to integrate it more closely
   with MessageTemplate, but since one of our stated goals for this
   project was to make it easier to use irregexp outside of V8, I
   decided to hold off.
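
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: only the names RegExpError and error_pos come from the commit message, while the enumerator names, CompileData, CheckUnicodeEscapes and DescribeError are hypothetical. The position convention (index of the backslash that starts the bad escape) is also an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative subset of an error enumeration. The enumerator names are
// assumptions; the real RegExpError list is longer.
enum class RegExpError {
  kNone,
  kStackOverflow,          // overflow detected while parsing
  kAnalysisStackOverflow,  // overflow detected during analysis
  kInvalidUnicodeEscape,
};

// Hypothetical stand-in for RegExpCompileData: the error kind plus the
// position at which it was detected.
struct CompileData {
  RegExpError error = RegExpError::kNone;
  int error_pos = -1;
};

// The only place where an error becomes a string. The two stack overflow
// messages stay distinct, as note 1 describes.
const char* RegExpErrorString(RegExpError error) {
  switch (error) {
    case RegExpError::kNone:
      return "";
    case RegExpError::kStackOverflow:
      return "Maximum call stack size exceeded";
    case RegExpError::kAnalysisStackOverflow:
      return "Stack overflow";
    case RegExpError::kInvalidUnicodeEscape:
      return "Invalid Unicode escape";
  }
  assert(false && "unknown RegExpError value");
  return "";
}

// Toy check (hypothetical): every "\u" must be followed by four hex digits.
// On failure it records an enum value and a position, never a string.
bool CheckUnicodeEscapes(const std::string& pattern, CompileData* data) {
  for (std::size_t i = 0; i + 1 < pattern.size(); ++i) {
    if (pattern[i] != '\\') continue;
    if (pattern[i + 1] != 'u') {
      ++i;  // skip the character escaped by this backslash
      continue;
    }
    for (std::size_t j = i + 2; j < i + 6; ++j) {
      if (j >= pattern.size() ||
          !std::isxdigit(static_cast<unsigned char>(pattern[j]))) {
        data->error = RegExpError::kInvalidUnicodeEscape;
        data->error_pos = static_cast<int>(i);
        return false;
      }
    }
    i += 5;  // skip the 'u' and the four hex digits
  }
  return true;
}

// Hypothetical embedder-side formatting, done only when reporting.
std::string DescribeError(const CompileData& data) {
  return std::string(RegExpErrorString(data.error)) + " at position " +
         std::to_string(data.error_pos);
}
```

Because the parser only records an enum value and an offset, an embedder is free to map the value to its own message format instead of reusing V8's strings.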

R=jgruber@chromium.org

Bug: v8:10303
Change-Id: I62605fd2def2fc539f38a7e0eefa04d36e14bbde
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091863
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66784}

e80ca24c

17 Mar, 2020 1 commit

[arm64] Use BTI instructions for forward CFI · ea82d031

Georgia Kouveli authored 4 years ago

Generate a BTI instruction at each target of an indirect branch
(BR/BLR). An indirect branch that doesn't jump to a BTI instruction
will generate an exception on a BTI-enabled core. On cores that do
not support the BTI extension, the BTI instruction is a NOP.

Targets of indirect branch instructions include, among other things,
function entrypoints, exception handlers and jump tables. Lazy deopt
exits can potentially be reached through an indirect branch when an
exception is thrown, so they also get an additional BTI instruction.
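
The rule in the first paragraph can be modelled with a toy simulation. This is only a sketch of the two stated behaviours (fault on a BTI-enabled core, NOP otherwise); Op, BranchResult and IndirectBranch are hypothetical names, and the model ignores everything else about the real architecture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction set: kBti marks a valid indirect-branch landing pad.
enum class Op { kBti, kAdd, kRet };

enum class BranchResult { kOk, kException };

// Models an indirect branch (BR/BLR) to code[target].
// On a core that enforces BTI, the target must be a BTI instruction.
// On a core without the extension, BTI behaves as a NOP and any target
// is accepted.
BranchResult IndirectBranch(const std::vector<Op>& code, std::size_t target,
                            bool core_enforces_bti) {
  assert(target < code.size());
  if (core_enforces_bti && code[target] != Op::kBti) {
    return BranchResult::kException;
  }
  return BranchResult::kOk;
}
```

In this model, a function entrypoint, exception handler or jump table entry that starts with kBti is reachable on both kinds of core, which is why the patch emits the instruction at each such target.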

Bug: v8:10026
Change-Id: I0ebf51071f1b604f60f524096e013dfd64fcd7ff
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1967315
Commit-Queue: Georgia Kouveli <georgia.kouveli@arm.com>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66751}

ea82d031

16 Mar, 2020 1 commit

[regexp] Simplify allocation of RegExpMacroAssemblerTracer · e5fd9cba

Iain Ireland authored 4 years ago

This change is motivated by SpiderMonkey's policy against bare
new/delete. (I also think it's just a nicer way to write this.)
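
One common way to avoid bare new/delete is to hold the optional tracer in a std::unique_ptr, sketched below. Whether the patch does exactly this is an assumption; MacroAssembler, RealAssembler, TracingAssembler and MaybeWrapInTracer are hypothetical stand-ins, not V8 classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal interface standing in for a regexp macro assembler.
class MacroAssembler {
 public:
  virtual ~MacroAssembler() = default;
  virtual void AdvanceCurrentPosition(int by) = 0;
};

class RealAssembler : public MacroAssembler {
 public:
  void AdvanceCurrentPosition(int by) override { position_ += by; }
  int position() const { return position_; }

 private:
  int position_ = 0;
};

// Logs each call, then forwards it. It does not own the wrapped assembler.
class TracingAssembler : public MacroAssembler {
 public:
  TracingAssembler(MacroAssembler* inner, std::vector<std::string>* log)
      : inner_(inner), log_(log) {
    assert(inner != nullptr && log != nullptr);
  }

  void AdvanceCurrentPosition(int by) override {
    log_->push_back("AdvanceCurrentPosition(by=" + std::to_string(by) + ")");
    inner_->AdvanceCurrentPosition(by);
  }

 private:
  MacroAssembler* inner_;
  std::vector<std::string>* log_;
};

// Returns an owning pointer to a tracer when tracing is requested, and
// nullptr otherwise. The tracer is freed when the pointer goes out of
// scope, so no explicit delete is needed on any exit path.
std::unique_ptr<MacroAssembler> MaybeWrapInTracer(
    MacroAssembler* inner, bool trace, std::vector<std::string>* log) {
  if (!trace) return nullptr;
  return std::make_unique<TracingAssembler>(inner, log);
}
```

A caller would keep the returned pointer alive for the duration of code generation and use `tracer ? tracer.get() : &real` as the active assembler.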

Note: I'm not importing regexp.cc into SpiderMonkey, but the change
here is the same as the change I made in the equivalent SM code.

R=jgruber@chromium.org

Bug: v8:10303
Change-Id: I3c81727eb7dea9c0ec78241e3c82ffc9e7007827
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091858
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66713}

e5fd9cba