- 14 Jan, 2021 1 commit
-
-
Patrick Thier authored
Offsets in regular expressions are limited to 16 bits. It was possible to exceed this limit when emitting greedy loops where the length of text nodes exceeded 16 bits, resulting in overflowing offsets. With this CL we throw a SyntaxError "Regular expression too large" to prevent this overflow. Bug: chromium:1166138 Change-Id: Ica624a243bf9827083ff883d9a976f13c8da02e5 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2629286 Commit-Queue: Patrick Thier <pthier@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#72095}
-
- 24 Nov, 2020 1 commit
-
-
Georg Neis authored
Apart from removing Min and Max (utils.h), this is mostly a renaming. In a few cases I had to add a cast. In a bunch of cases I had to use initializer lists to force call-by-value for static member constants because call-by-reference wouldn't compile (like in the previous CL). In a few places I used initializer lists in place of nested min/max operations. Bug: v8:11074 Change-Id: I53a5411be6334ff41e7a8517e6b87fb46f14d086 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2545523 Commit-Queue: Georg Neis <neis@chromium.org> Reviewed-by:
Hannes Payer <hpayer@chromium.org> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Ulan Degenbaev <ulan@chromium.org> Cr-Commit-Position: refs/heads/master@{#71380}
-
- 09 Nov, 2020 1 commit
-
-
Zhi An Ng authored
Bug: v8:11074 Change-Id: I8deefa9cf5ac10b769e4ebb7029a82957cf669c3 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2525540Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#71029}
-
- 10 Jul, 2020 1 commit
-
-
Igor Sheludko authored
... by migrating old-style code MyObject* obj = new (zone) MyObject(...) to the new style MyObject* obj = zone->New<MyObject>(...) Bug: v8:10689 Change-Id: Icc60fdbf247ec05f9b5688b3d2d73d4fed06ea89 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2289770 Commit-Queue: Igor Sheludko <ishell@chromium.org> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#68784}
-
- 10 Jun, 2020 3 commits
-
-
Ng Zhi An authored
See https://clang.llvm.org/extra/clang-tidy/checks/google-readability-casting.html and https://google.github.io/styleguide/cppguide.html#Casting. Change-Id: Ib5a3bb8873bc6d050c4d0abe36a3ae813bbd448a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2233987Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Zhi An Ng <zhin@chromium.org> Cr-Commit-Position: refs/heads/master@{#68301}
-
Jakob Gruber authored
Several uc32 (= int32_t) fields were incorrectly treated as uc16 (= uint16_t): CharacterRange::from() CharacterRange::to() QuickCheckDetails::Position::mask QuickCheckDetails::Position::value Bug: v8:10568 Change-Id: I9ea7d76e4a0cbc6ee681de2136c398cdc622bca2 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2230527 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Leszek Swirski <leszeks@chromium.org> Cr-Commit-Position: refs/heads/master@{#68290}
-
Jakob Gruber authored
Prior to this change, uc16 was typedef'd to (unsigned) uint16_t while uc32 was typedef'd to (signed) int32_t. For consistency, and to avoid unexpected behavior around signed/unsigned comparisons, this changes uc32 to the unsigned uint32_t type. As part of this change, old-style error passing (return -1, check for negative return values) was updated to use named error values. Bug: v8:10568 Change-Id: I8524e66ee20e8738749cd34c4fe82c14e885dcb3 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2235533Reviewed-by:
Leszek Swirski <leszeks@chromium.org> Reviewed-by:
Clemens Backes <clemensb@chromium.org> Reviewed-by:
Jakob Kummerow <jkummerow@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#68282}
-
- 03 Jun, 2020 1 commit
-
-
Iain Ireland authored
https://crrev.com/c/2072858 rewrote the implementation of non-unicode ignore-case matches to comply with the JS spec in some corner cases. It fixed character matches and character class matches. We missed a similar bug in the implementation of back references. This CL fixes that bug. The main change is in regexp-macro-assembler.cc, where CaseInsensitiveCompareUC16 is split into CaseInsensitiveCompareUnicode (which has the same semantics as before) and CaseInsensitiveCompareNonUnicode (which has the semantics described here: https://tc39.es/ecma262/#sec-runtime-semantics-canonicalize-ch). Most of the rest of the patch undoes https://crrev.com/c/2081816 to once again make the unicode flag available to the macroassembler, so that we can decide which helper function to call. The testcase is a version of test/intl/regress-10248.js, modified to test backreferences. Bug: v8:10573 Change-Id: I70ef7d134d37f99b1f75a5eba17020e82d59f1b9 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2219284Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#68129}
-
- 28 Apr, 2020 1 commit
-
-
Iain Ireland authored
RegExpMacroAssembler::GetCode returns a Handle<Object>. However, that Handle is almost immediately dereferenced, and is stored as a bare Object in both RegExpCompiler::CompilationResult and RegExpCompileData. This makes SpiderMonkey's rooting hazard analysis somewhat antsy. While RegExpCompileData is alive on the stack, the hazard analysis will not allow any calls that might GC, because it isn't smart enough to prove that the code field can't be clobbered by a GC. As far as I can tell, there is no real hazard here, but storing a Handle in RegExpCompileData instead of a bare Object will simplify SM and prevent a future patch from accidentally breaking something. Bug: v8:10406 Change-Id: I9642dd05c591bfd23b340a89df2f2bf5c9fcac2c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2161578Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#67441}
-
- 21 Apr, 2020 2 commits
-
-
Jakob Gruber authored
... between the interpreter and generated code. Prior to this CL, pre- and post conditions on the output register array differed between the interpreter and generated code. Interpreter Pre: `output` fits captures and temporary registers. Post: None. Generated code Pre: `output` fits capture registers. Post: `output` is modified if and only if the match succeeded. This CL changes the interpreter to match generated code pre- and post conditions by allocating space for temporary registers inside the interpreter. Drive-by: Add MaxRegisterCount, RegistersForCaptureCount helpers. Bug: chromium:1067270 Change-Id: I2900ef2f31207d817ec7ead3e0e2215b23b398f0 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2135642 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Leszek Swirski <leszeks@chromium.org> Cr-Commit-Position: refs/heads/master@{#67268}
-
Iain Ireland authored
RegExpImpl::Compile does a number of transformations that require directly manipulating the internal representation of the regexp. For example, when matching a (non-sticky, non-anchored) regular expression, the pattern must be wrapped in .* so that it can match anywhere in the input. In the interest of moving towards a cleaner division between irregexp and the outside world, it makes sense to move this code into RegExpCompiler. R=jgruber@chromium.org Bug: v8:10406 Change-Id: I6da251c91c0016914a51480f80bb46c337fd0b23 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2140246Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#67262}
-
- 19 Mar, 2020 3 commits
-
-
Iain Ireland authored
This is a reland of e80ca24c Original change's description: > [regexp] Rewrite error handling > > This patch modifies irregexp's error handling. Instead of representing > errors as C strings, they are represented as an enumeration value > (RegExpError), and only converted to strings when throwing the error > object in regexp.cc. This makes it significantly easier to integrate > into SpiderMonkey. A few notes: > > 1. Depending on whether the stack overflows during parsing or > analysis, the stack overflow message can vary ("Stack overflow" or > "Maximum call stack size exceeded"). I kept that behaviour in this > patch, under the assumption that stack overflow messages are > (sadly) the sorts of things that real world code ends up depending > on. > > 2. Depending on the point in code where the error was identified, > invalid unicode escapes could be reported as "Invalid Unicode > escape", "Invalid unicode escape", or "Invalid Unicode escape > sequence". I fervently hope that nobody depends on the specific > wording of a syntax error, so I standardized on the first one. (It > was both the most common, and the most consistent with other > "Invalid X escape" messages.) > > 3. In addition to changing the representation, this patch also adds an > error_pos field to RegExpParser and RegExpCompileData, which stores > the position at which an error occurred. This is used by > SpiderMonkey to provide more helpful messages about where a syntax > error occurred in large regular expressions. > > 4. This model is closer to V8's existing MessageTemplate > infrastructure. I considered trying to integrate it more closely > with MessageTemplate, but since one of our stated goals for this > project was to make it easier to use irregexp outside of V8, I > decided to hold off. > > R=jgruber@chromium.org > > Bug: v8:10303 > Change-Id: I62605fd2def2fc539f38a7e0eefa04d36e14bbde > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091863 > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#66784} R=jgruber@chromium.org Bug: v8:10303 Change-Id: Iad1f11a0e0b9e525d7499aacb56c27eff9e7c7b5 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2109952Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#66798}
-
Leszek Swirski authored
This reverts commit e80ca24c. Reason for revert: Causes failures in the fast/regex/non-pattern-characters.html Blink web test (https://ci.chromium.org/p/v8/builders/ci/V8%20Blink%20Linux/3679) Original change's description: > [regexp] Rewrite error handling > > This patch modifies irregexp's error handling. Instead of representing > errors as C strings, they are represented as an enumeration value > (RegExpError), and only converted to strings when throwing the error > object in regexp.cc. This makes it significantly easier to integrate > into SpiderMonkey. A few notes: > > 1. Depending on whether the stack overflows during parsing or > analysis, the stack overflow message can vary ("Stack overflow" or > "Maximum call stack size exceeded"). I kept that behaviour in this > patch, under the assumption that stack overflow messages are > (sadly) the sorts of things that real world code ends up depending > on. > > 2. Depending on the point in code where the error was identified, > invalid unicode escapes could be reported as "Invalid Unicode > escape", "Invalid unicode escape", or "Invalid Unicode escape > sequence". I fervently hope that nobody depends on the specific > wording of a syntax error, so I standardized on the first one. (It > was both the most common, and the most consistent with other > "Invalid X escape" messages.) > > 3. In addition to changing the representation, this patch also adds an > error_pos field to RegExpParser and RegExpCompileData, which stores > the position at which an error occurred. This is used by > SpiderMonkey to provide more helpful messages about where a syntax > error occurred in large regular expressions. > > 4. This model is closer to V8's existing MessageTemplate > infrastructure. I considered trying to integrate it more closely > with MessageTemplate, but since one of our stated goals for this > project was to make it easier to use irregexp outside of V8, I > decided to hold off. > > R=jgruber@chromium.org > > Bug: v8:10303 > Change-Id: I62605fd2def2fc539f38a7e0eefa04d36e14bbde > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091863 > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#66784} TBR=jgruber@chromium.org,iireland@mozilla.com Change-Id: I9247635f3c5b17c943b9c4abaf82ebe7b2de165e No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:10303 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2108550Reviewed-by:
Leszek Swirski <leszeks@chromium.org> Commit-Queue: Leszek Swirski <leszeks@chromium.org> Cr-Commit-Position: refs/heads/master@{#66786}
-
Iain Ireland authored
This patch modifies irregexp's error handling. Instead of representing errors as C strings, they are represented as an enumeration value (RegExpError), and only converted to strings when throwing the error object in regexp.cc. This makes it significantly easier to integrate into SpiderMonkey. A few notes: 1. Depending on whether the stack overflows during parsing or analysis, the stack overflow message can vary ("Stack overflow" or "Maximum call stack size exceeded"). I kept that behaviour in this patch, under the assumption that stack overflow messages are (sadly) the sorts of things that real world code ends up depending on. 2. Depending on the point in code where the error was identified, invalid unicode escapes could be reported as "Invalid Unicode escape", "Invalid unicode escape", or "Invalid Unicode escape sequence". I fervently hope that nobody depends on the specific wording of a syntax error, so I standardized on the first one. (It was both the most common, and the most consistent with other "Invalid X escape" messages.) 3. In addition to changing the representation, this patch also adds an error_pos field to RegExpParser and RegExpCompileData, which stores the position at which an error occurred. This is used by SpiderMonkey to provide more helpful messages about where a syntax error occurred in large regular expressions. 4. This model is closer to V8's existing MessageTemplate infrastructure. I considered trying to integrate it more closely with MessageTemplate, but since one of our stated goals for this project was to make it easier to use irregexp outside of V8, I decided to hold off. R=jgruber@chromium.org Bug: v8:10303 Change-Id: I62605fd2def2fc539f38a7e0eefa04d36e14bbde Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091863 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#66784}
-
- 17 Mar, 2020 1 commit
-
-
Georgia Kouveli authored
Generate a BTI instruction at each target of an indirect branch (BR/BLR). An indirect branch that doesn't jump to a BTI instruction will generate an exception on a BTI-enabled core. On cores that do not support the BTI extension, the BTI instruction is a NOP. Targets of indirect branch instructions include, among other things, function entrypoints, exception handlers and jump tables. Lazy deopt exits can potentially be reached through an indirect branch when an exception is thrown, so they also get an additional BTI instruction. Bug: v8:10026 Change-Id: I0ebf51071f1b604f60f524096e013dfd64fcd7ff Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1967315 Commit-Queue: Georgia Kouveli <georgia.kouveli@arm.com> Reviewed-by:
Ross McIlroy <rmcilroy@chromium.org> Reviewed-by:
Georg Neis <neis@chromium.org> Reviewed-by:
Clemens Backes <clemensb@chromium.org> Cr-Commit-Position: refs/heads/master@{#66751}
-
- 16 Mar, 2020 1 commit
-
-
Iain Ireland authored
This change is motivated by SpiderMonkey's policy against bare new/delete. (I also think it's just a nicer way to write this.) R=jgruber@chromium.org here is the same as the change I made in the equivalent SM code. Note: I'm not importing regexp.cc into SpiderMonkey, but the change Bug: v8:10303 Change-Id: I3c81727eb7dea9c0ec78241e3c82ffc9e7007827 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091858 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#66713}
-
- 12 Mar, 2020 3 commits
-
-
Iain Ireland authored
This is a grab-bag of small compatibility fixes to make it easier to import irregexp into SpiderMonkey. For changes where the commit message was longer than the change itself, it didn't seem worth opening a separate review. [regexp] Use uc16 in FilterOneByte SpiderMonkey uses char16_t instead of uint16_t for its two-byte strings. (This matches ICU. It looks like V8 considered making the same change, but decided against it: see https://bugs.chromium.org/p/v8/issues/detail?id=6487.) Fortunately, irregexp is careful about only using uc16, so SpiderMonkey can just define uc16 = char16_t and *almost* everything works out. This patch fixes the single place in irregexp where that is not true. [regexp] Remove unreachable return The return statement at the end of RegExpParser::ParseClassCharacterEscape is unreachable, because every branch of the switch returns. This triggered static analysis errors in SpiderMonkey. [regexp] Remove trivial assertion The assertion in BytecodeSequenceNode::ArgumentMapping cannot fail, because size_t is an unsigned type. This triggered static analysis warnings in SpiderMonkey. [regexp] Make RegExpStack constructor public In V8, the RegExpStack's private constructor is called from Isolate, which is a friend class. In SpiderMonkey, we use a wrapper around new to control where memory is allocated, so we need the RegExpStack constructor to be visible outside of Isolate. [regexp] Refactor Isolate::IncreaseTotalRegexpCodeGenerated The call-site of Isolate::IncreaseTotalRegexpCodeGenerated is the only place inside irregexp where HeapObject::Size is called. SpiderMonkey's heap-allocated objects live in arenas, and don't have a standardized way of finding the size. In this particular case it would be safe to hardcode a size of 0, but leaving HeapObject::Size undefined will ensure that SpiderMonkey doesn't silently do the wrong thing if somebody in V8 adds a new, more meaningful call to HeapObject::Size. R=jgruber@chromium.org Bug: v8:10303 Change-Id: I5b81e1a261fec8c85a63f71f34cd12d68f638334 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2090191 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#66676}
-
Iain Ireland authored
For a variety of reasons related to OOM handling and custom allocators, SpiderMonkey wants to be able to see all memory allocations. To enforce this, we have a static analysis that verifies that we don't link in malloc/new/etc in unexpected places. One consequence of this is that we can't use STL containers without a custom allocator, because they call operator new internally. This is mostly not an issue in irregexp, which makes heavy use of zone allocation. The main exceptions are a handful of uses of std::vector in regexp-compiler.* and regexp-parser.*. If these vectors are converted to ZoneVectors, then our static analysis is satisfied. R=jgruber@chromium.org Bug: v8:10303 Change-Id: I8b14a2eb54d3b20959e3fbe878f77effae124a2c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091402Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#66674}
-
Iain Ireland authored
TextEmitPass uses a function pointer to determine which pass to call. This function pointer is only assigned inside TextEmitPass, and can easily be eliminated by moving the calls to each possible target inside the switch statement that assigns the function pointer. I made this change because SpiderMonkey uses a static analysis pass to verify that everything is rooted properly across calls that might GC, and that analysis is conservative when calling function pointers. We can white-list function pointers that are known to be safe, but the code being called through this function pointer is complex enough (and the function pointer is unnecessary enough) that it seemed best to just remove the function pointer entirely. R=jgruber@chromium.org Bug: v8:10303 Change-Id: I5fbb0df290a2288c4d3db6d43a563385337162ea Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2091398 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#66672}
-
- 10 Mar, 2020 1 commit
-
-
Iain Ireland authored
Non-unicode, case-insensitive regexps (e.g. /foo/i, not foo/iu) use a case-folding algorithm that doesn't quite match the Unicode definition. There are two places in irregexp that need to do case-folding. Prior to this patch, neither of them quite matched the spec (https://tc39.es/ecma262/#sec-runtime-semantics-canonicalize-ch). This patch implements the "Canonicalize" algorithm in src/regexp/special-case.h, and uses it in the relevant places. It replaces special-case logic around upper-casing / ASCII characters with the following approach: 1. For most characters, calling UnicodeSet::closeOver on a set containing that character will produce the correct set of case-insensitive matches. 2. For a small handful of characters (like the sharp S that prompted this change), UnicodeSet::closeOver will include some characters that should be omitted. For example, although closeOver('ß') = "ßẞ", uppercase('ß') is "SS", so step 3.e means that 'ß' canonicalizes to itself, and should not match 'ẞ'. In these cases, we can skip the closeOver entirely, because it will never add an equivalent character. These characters are in the IgnoreSet. 3. For an even smaller handful of characters, UnicodeSet::closeOver will produce some characters that should be omitted, but also some characters that should be included. For example, closeOver('k') = "kKK" (lowercase k, uppercase K, U+212A KELVIN SIGN), but KELVIN SIGN should not match either of the other two (step 3.g). To handle this, we put such characters in the SpecialAddSet. In these cases, we closeOver the original character, but filter out the results that do not have the same canonical value. The computation of IgnoreSet and SpecialAddSet happens at build time, using the pre-existing gen-regexp-special-case.cc step. R=jgruber@chromium.org Bug: v8:10248 Change-Id: I00d48b180c83bb8e645cc59eda57b01eab134f0b Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2072858Reviewed-by:
Frank Tang <ftang@chromium.org> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#66641}
-
- 02 Mar, 2020 2 commits
-
-
Michael Achenbach authored
Bug: chromium:1055060 Change-Id: I9fd06642ba47d1452b33b0e97715dd2e64943c6c Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2083027Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Michael Achenbach <machenbach@chromium.org> Cr-Commit-Position: refs/heads/master@{#66529}
-
Jakob Gruber authored
In the past we've used the isolate argument to signal whether we were in unicode mode (nullptr) or not (the real isolate). This is no longer needed, and in fact breaks no-i18n mode which always expects to have a real isolate. Bug: v8:10120 Change-Id: I2f848c4ff8c2ff0e9b84278cbcdf3c3670e44e58 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2081816Reviewed-by:
Jakob Kummerow <jkummerow@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#66520}
-
- 09 Oct, 2019 1 commit
-
-
Frank Tang authored
Move the "if (!((start >= 128) && (character < 128))) {" check outside while (start <= end) loop. Bug: v8:9731 Change-Id: I1f7e2fea189f0eba0b52100ac8d4063e7d79a306 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1832911Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Frank Tang <ftang@chromium.org> Cr-Commit-Position: refs/heads/master@{#64194}
-
- 27 Sep, 2019 2 commits
-
-
Frank Tang authored
Address special case condiction for U+017F. Bug: v8:9356 Change-Id: Id24e5e2c999b198bf0f696aea8c98f223508c051 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1827683 Commit-Queue: Frank Tang <ftang@chromium.org> Reviewed-by:
Yang Guo <yangguo@chromium.org> Reviewed-by:
Jakob Kummerow <jkummerow@chromium.org> Cr-Commit-Position: refs/heads/master@{#64028}
-
Frank Tang authored
Add logic stated in Bug: v8:9731 Change-Id: I0b3468bbad11a178f36d682febd0e44214646de8 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1828279 Commit-Queue: Frank Tang <ftang@chromium.org> Reviewed-by:
Jakob Kummerow <jkummerow@chromium.org> Cr-Commit-Position: refs/heads/master@{#64027}
-
- 29 Aug, 2019 2 commits
-
-
Jakob Gruber authored
Similar to CheckNotAtStart, one can now apply an offset to the CheckAtStart operation. Due to a recent change, all callsites of CheckNotAtStart now need to pass an offset, whereas previously the offset was just assumed to be zero. Bug: chromium:996391 Change-Id: Ia59a584e93e5384479f05abddef7859b420b023a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1773272 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Auto-Submit: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Peter Marshall <petermarshall@chromium.org> Cr-Commit-Position: refs/heads/master@{#63444}
-
Jakob Gruber authored
Printing regexp code used to behind the generic --print-code flag, but there was no way to distinguish between irregexp-generated code; and printing regexp bytecode was not supported at all (the --trace-regexp-bytecodes flag *did* exist, but prints the execution trace at runtime and not the generated bytecode sequence). This CL adds two new flags: --print-regexp-code --print-regexp-bytecode Regexp code is no longer printed as part of --print-code. Example output for --print-regexp-bytecode: generated bytecode for regexp pattern: .(?<!^.) 0x1ddcc614cbd0 0 PUSH_BT, 02, 00, 00, 00, c0, 00, 00, 00 ....... 0x1ddcc614cbd8 8 LOAD_CURRENT_CHAR, 11, 00, 00, 00, b0, 00, 00, 00 ....... 0x1ddcc614cbe0 10 CHECK_CHAR, 18, 0a, 00, 00, b0, 00, 00, 00 ....... 0x1ddcc614cbe8 18 CHECK_CHAR, 18, 0d, 00, 00, b0, 00, 00, 00 ....... 0x1ddcc614cbf0 20 PUSH_CP, 01, 00, 00, 00 ... Bug: chromium:996391 Change-Id: I731defbd7cf9ed29753a39bb1d7205dc136ca950 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1773249 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Auto-Submit: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Peter Marshall <petermarshall@chromium.org> Cr-Commit-Position: refs/heads/master@{#63442}
-
- 28 Aug, 2019 1 commit
-
-
Jakob Gruber authored
This fixes an invalid assumption when emitting code for matching '^' (start of line) in multiline regexps and '\b', '\B' in general. What we used to do: if the current trace's cp_offset (the offset from the current position) was non-zero, we assumed that we were looking at subject string index 1 or greater (i.e.: not at the start of the string or before). This is no longer valid since cp_offsets can now be negative. This CL changes the logic to omit start- and bounds-checks only for strictly positive cp_offsets, where the above assumption still holds. Bug: chromium:996391 Change-Id: I79be4fc295c6f0b63e41c13d1e91fdd00f2f2b42 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1771794 Commit-Queue: Erik Corry <erikcorry@chromium.org> Auto-Submit: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Erik Corry <erikcorry@chromium.org> Cr-Commit-Position: refs/heads/master@{#63424}
-
- 23 Aug, 2019 1 commit
-
-
Ana Peško authored
is enabled. Change-Id: Iab87b9c7a0d0600782b02537844338ff065622ab Bug: chromium:996234 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1765531Reviewed-by:
Peter Marshall <petermarshall@chromium.org> Reviewed-by:
Yang Guo <yangguo@chromium.org> Commit-Queue: Ana Pesko <anapesko@google.com> Cr-Commit-Position: refs/heads/master@{#63360}
-
- 31 Jul, 2019 1 commit
-
-
Seth Brenith authored
This is a reland of 4b15b984 Updates since original: fix an arithmetic overflow bug, remove an invalid DCHECK, add a unit test that would trigger that DCHECK. Original change's description: > [regexp] Better quick checks on loop entry nodes > > Like the predecessor change https://crrev.com/c/v8/v8/+/1702125 , this > change is inspired by attempting to exit earlier from generated RegExp > code, when no further matches are possible because any match would be > too long. The motivating example this time is the following expression, > which tests whether a string of Unicode playing cards has five of the > same suit in a row: > > /([🂡-🂮]{5})|([🂱-🂾]{5})|([🃁-🃎]{5})|([🃑-🃞]{5})/u > > A human reading this expression can readily see that any match requires > at least 10 characters (5 surrogate pairs), but the LoopChoiceNode for > each repeated option reports its minimum distance to the end of a match > as zero. This is correct, because the LoopChoiceNode's behavior depends > on additional state (the loop counter). However, the preceding node, a > SET_REGISTER action that initializes the loop counter, could confidently > state that it consumes at least 10 characters. Furthermore, when we try > to emit a quick check for that action, we could follow only paths from > the LoopChoiceNode that are possible based on the minimum iteration > count. This change implements both of those "could"s. > > I expect this improvement to apply pretty broadly to expressions that > use minimum repetition counts and that don't meet the criteria for > unrolling. In this particular case, I get about 12% improvement on the > overall UniPoker test, due to reducing the execution time of this > expression by 85% and the execution time of another similar expression > that checks for n-of-a-kind by 20%. > > Bug: v8:9305 > > Change-Id: I319e381743967bdf83324be75bae943fbb5dd496 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1704941 > Commit-Queue: Seth Brenith <seth.brenith@microsoft.com> > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#62963} Bug: v8:9305 Change-Id: I992070d383009013881bf778242254c27134b650 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1726674Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Commit-Queue: Seth Brenith <seth.brenith@microsoft.com> Cr-Commit-Position: refs/heads/master@{#63009}
-
- 30 Jul, 2019 1 commit
-
-
Leszek Swirski authored
This reverts commit 4b15b984. Reason for revert: UBSan failure (https://logs.chromium.org/logs/v8/buildbucket/cr-buildbucket.appspot.com/8906578530303352544/+/steps/Check/0/logs/regress-126412/0). Original change's description: > [regexp] Better quick checks on loop entry nodes > > Like the predecessor change https://crrev.com/c/v8/v8/+/1702125 , this > change is inspired by attempting to exit earlier from generated RegExp > code, when no further matches are possible because any match would be > too long. The motivating example this time is the following expression, > which tests whether a string of Unicode playing cards has five of the > same suit in a row: > > /([🂡-🂮]{5})|([🂱-🂾]{5})|([🃁-🃎]{5})|([🃑-🃞]{5})/u > > A human reading this expression can readily see that any match requires > at least 10 characters (5 surrogate pairs), but the LoopChoiceNode for > each repeated option reports its minimum distance to the end of a match > as zero. This is correct, because the LoopChoiceNode's behavior depends > on additional state (the loop counter). However, the preceding node, a > SET_REGISTER action that initializes the loop counter, could confidently > state that it consumes at least 10 characters. Furthermore, when we try > to emit a quick check for that action, we could follow only paths from > the LoopChoiceNode that are possible based on the minimum iteration > count. This change implements both of those "could"s. > > I expect this improvement to apply pretty broadly to expressions that > use minimum repetition counts and that don't meet the criteria for > unrolling. In this particular case, I get about 12% improvement on the > overall UniPoker test, due to reducing the execution time of this > expression by 85% and the execution time of another similar expression > that checks for n-of-a-kind by 20%. > > Bug: v8:9305 > > Change-Id: I319e381743967bdf83324be75bae943fbb5dd496 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1704941 > Commit-Queue: Seth Brenith <seth.brenith@microsoft.com> > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#62963} TBR=jgruber@chromium.org,seth.brenith@microsoft.com Change-Id: Iac085b75e054fdf0d218987cfe449be1f1630545 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: v8:9305 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1725621Reviewed-by:
Leszek Swirski <leszeks@chromium.org> Commit-Queue: Leszek Swirski <leszeks@chromium.org> Cr-Commit-Position: refs/heads/master@{#62977}
-
- 29 Jul, 2019 1 commit
-
-
Seth Brenith authored
Like the predecessor change https://crrev.com/c/v8/v8/+/1702125 , this change is inspired by attempting to exit earlier from generated RegExp code, when no further matches are possible because any match would be too long. The motivating example this time is the following expression, which tests whether a string of Unicode playing cards has five of the same suit in a row: /([🂡-🂮]{5})|([🂱-🂾]{5})|([🃁-🃎]{5})|([🃑-🃞]{5})/u A human reading this expression can readily see that any match requires at least 10 characters (5 surrogate pairs), but the LoopChoiceNode for each repeated option reports its minimum distance to the end of a match as zero. This is correct, because the LoopChoiceNode's behavior depends on additional state (the loop counter). However, the preceding node, a SET_REGISTER action that initializes the loop counter, could confidently state that it consumes at least 10 characters. Furthermore, when we try to emit a quick check for that action, we could follow only paths from the LoopChoiceNode that are possible based on the minimum iteration count. This change implements both of those "could"s. I expect this improvement to apply pretty broadly to expressions that use minimum repetition counts and that don't meet the criteria for unrolling. In this particular case, I get about 12% improvement on the overall UniPoker test, due to reducing the execution time of this expression by 85% and the execution time of another similar expression that checks for n-of-a-kind by 20%. Bug: v8:9305 Change-Id: I319e381743967bdf83324be75bae943fbb5dd496 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1704941 Commit-Queue: Seth Brenith <seth.brenith@microsoft.com> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#62963}
-
- 25 Jul, 2019 1 commit
-
-
Seth Brenith authored
The motivating example is JetStream 2's UniPoker test, which tests whether a sorted string of Unicode playing cards contains a five-card straight using a regular expression. In the top-level generated loop for this RegExp, we see this loop exit condition: 00000350000C2067 27 83fffe cmpl rdi,0xfe 00000350000C206A 2a 0f8da8e40000 jge 00000350000D0518 <+0xe4d8> Meaning if the current position is pointing at the very last (16-bit) character, then we exit the loop. Otherwise we go on and try to find various matches starting at the current position. However, we can see in the original expression that any possible match is at least 10 characters (5 astral-plane Unicode values), so we're wasting a lot of time attempting to find matches in cases where we're too close to the end of the string for any match to succeed. This example might be a bit contrived, but I expect that an improvement in this bounds check would help a larger family of regular expressions, where the minimum match length is large relative to the string being matched and we don't meet the other necessary criteria for fast Boyer- Moore lookahead. To get the desired bounds check in this case, this patch does the following: 1. Compute accurate EatsAtLeast values for every node during the analysis phase. This could end up doing more work than the current implementation, but analysis already has to touch every node, so it seems like a cache-friendly time to compute these values. In some cases, this might be less total work than the current implementation, because the current implementation might recompute the same node multiple times. 2. When emitting a quick check, use the EatsAtLeast value from the predecessor ChoiceNode for the bounds check. This improves the UniPoker score on my machine by about 4%, because it cuts the time spent checking for straights roughly in half, and checking for straights originally accounted for about 8% of the total time. Bug: v8:9305 Change-Id: I110b190c2578f73b2263259d5aa5750e921b01be Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1702125 Commit-Queue: Seth Brenith <seth.brenith@microsoft.com> Reviewed-by:
Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#62919}
-
- 01 Jul, 2019 1 commit
-
-
Jakob Gruber authored
Since https://codereview.chromium.org/2777583003, the Boyer-Moore lookahead (used by the irregexp engine) also looks inside submatches to narrow down its range of accepted characters at specific offsets. But the end of a submatch, designated by a PositiveSubmatchSuccess action node, was not handled correctly. When a submatch terminates, we have no knowledge of what may follow, and thus must accept any character at following positions. This is done by the SetRest call added in this CL. An example, since this is fairly obscure: /^.*?Y(((?=B?).)*)Y$/s The initial non-greedy loop, together with the s flag, will trigger an attempted Boyer-Moore lookahead. After this follows an unconditional Y, a *-quantified loop matching any char and containing a lookahead that matches either 1 B or 0 B's, and an unconditional trailing Y. When the BM lookahead scans the subject string for the beginning of this pattern after the non-greedy loop, it should look for: a Y at offset 0, and either a B, a Y, or '.' (-> any character) at offset 1. Prior to this CL this was not the case: - The lookaround is internally generated as a submatch. - The optional 'B?' is unrolled into 'either B followed by submatch end' or 'submatch end'. - Filling in BM infos terminates when encountering a submatch end. Thus in the former case we added B to the set of accepted characters and terminated, while in the latter case we simply terminated.o This CL ensures that BM will accept any character at any offset at or exceeding the first encountered submatch end. Bug: v8:8770 Change-Id: Iff998ba307cd9669203846a9182798b8cf6a85dc Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1679506 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Erik Corry <erikcorry@chromium.org> Reviewed-by:
Yang Guo <yangguo@chromium.org> Auto-Submit: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#62460}
-
- 27 Jun, 2019 1 commit
-
-
Jakob Gruber authored
BoyerMoorePositionInfo is a simple wrapper around a bitset and an associated ContainedInLattice field. This CL refactors bitset-related operations that used to be implemented naively (e.g.: loop over all bits to find a single set bit, or to generate a union of two bitsets). Instead, use more suitable methods from base::bits and std::bitset. Drive-by: Remove dead class members. Drive-by: Zero the ByteArray with memset. Bug: v8:9359 Change-Id: I33923c7d216320f4e3d7e4a6df2967f4aa86ab05 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1667407 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Leszek Swirski <leszeks@chromium.org> Cr-Commit-Position: refs/heads/master@{#62419}
-
- 19 Jun, 2019 2 commits
-
-
Jakob Gruber authored
Outset: The more advanced features of OutSet are no longer used, thus the rename to DynamicBitSet to reflect its current purpose. BoyerMoorePositionInfo: Use bitset backing store in BoyerMoorePositionInfo (previously this was based on a (statically-sized) ZoneList<bool>). Bug: v8:9359 Change-Id: I40ca89467ae90ee90c616be5fd0d51e54e94e157 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1664064 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Peter Marshall <petermarshall@chromium.org> Cr-Commit-Position: refs/heads/master@{#62277}
-
Jakob Gruber authored
Bug: v8:9359 Change-Id: I237f16324ff036f2cbfb7ca97b4ac208442b06cc Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1664056 Auto-Submit: Jakob Gruber <jgruber@chromium.org> Commit-Queue: Sigurd Schneider <sigurds@chromium.org> Reviewed-by:
Sigurd Schneider <sigurds@chromium.org> Reviewed-by:
Peter Marshall <petermarshall@chromium.org> Cr-Commit-Position: refs/heads/master@{#62268}
-
- 18 Jun, 2019 3 commits
-
-
Jakob Gruber authored
This class used to be based on DispatchTable, which itself uses an interval tree to both categorize and canonicalize ranges (i.e. such that no overlap and all immediately adjacent ranges are merged). The produced ranges were then entered into lists for {bmp,lead_surrogate,trail_surrogate,non_bmp} splits. With this CL, we simplify to a plain loop over all character range kinds instead. The dispatch table (and ZoneSplayList, perhaps SplayList) can be removed in follow-ups. Bug: v8:9359 Change-Id: I9c6b72f3bc44d1557af7c74419709ae5662611f1 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1664053 Auto-Submit: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Michael Lippautz <mlippautz@chromium.org> Reviewed-by:
Peter Marshall <petermarshall@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#62260}
-
Jakob Gruber authored
Bug: v8:9359 Change-Id: I1b490c928ed884f4ad33e005699f98614be75233 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1662306 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Auto-Submit: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Peter Marshall <petermarshall@chromium.org> Cr-Commit-Position: refs/heads/master@{#62242}
-
Jakob Gruber authored
Move this straggler to its use-site in regexp-compiler.cc. Bug: v8:9359 Change-Id: Ia5393140de5a1c8d70ac410ef6239eabfec130b3 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1662303 Commit-Queue: Jakob Gruber <jgruber@chromium.org> Auto-Submit: Jakob Gruber <jgruber@chromium.org> Reviewed-by:
Peter Marshall <petermarshall@chromium.org> Cr-Commit-Position: refs/heads/master@{#62241}
-