  1. 13 May, 2022 1 commit
  2. 19 Oct, 2021 1 commit
    • [regexp] Compact codegen for large character classes · 8bbb44e5
      Jakob Gruber authored
      Large character classes may easily be created when unicode
      properties (e.g.: /\p{L}/u and /\P{L}/u) are used - these are
      expanded internally into character classes that consist of hundreds
      of character ranges. Prior to this CL, we'd emit branching code
      for each of these ranges, leading to very large regexp code objects.
      
      This CL adds a new codegen mode for large character classes (where
      'large' currently means > 16 ranges). Instead of emitting branching
      code inline, the ranges are written into a ByteArray and we call into
      the C function IsCharacterInRangeArray for the actual branching logic.
      The ByteArray is smaller than emitted code and is deduplicated if the
      same character class is matched repeatedly in the same pattern.
      
      Note this mode is *not* implemented for the interpreter, since we
      currently don't have a constant pool for irregexp bytecode, and thus
      cannot reference ByteArrays.
      
      Bug: v8:11069
      Change-Id: I2d728e42d85114b796c637f791848731a104cd54
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3229377
      Reviewed-by: Patrick Thier <pthier@chromium.org>
      Auto-Submit: Jakob Gruber <jgruber@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/main@{#77463}
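
The table-driven lookup described in the commit above can be sketched as follows, in Python for brevity (the real code is C++ inside V8). The flat boundary layout, the half-open ranges, and the names `encode_ranges`, `is_char_in_range_array` and `RangeArrayCache` are illustrative assumptions, not V8's actual data layout or API.

```python
from bisect import bisect_right


def encode_ranges(ranges):
    """Flatten sorted, disjoint inclusive (lo, hi) ranges into one boundary array.

    Assumed layout: [lo0, hi0 + 1, lo1, hi1 + 1, ...], i.e. half-open ranges.
    """
    boundaries = []
    for lo, hi in ranges:
        boundaries.append(lo)
        boundaries.append(hi + 1)
    return tuple(boundaries)


def is_char_in_range_array(boundaries, code_point):
    """Binary search: an odd number of boundaries <= code_point means 'inside'."""
    return bisect_right(boundaries, code_point) % 2 == 1


class RangeArrayCache:
    """Hands out one shared array per distinct character class (deduplication)."""

    def __init__(self):
        self._arrays = {}

    def get(self, ranges):
        key = encode_ranges(ranges)
        # Reuse the array stored for an identical class, if there is one.
        return self._arrays.setdefault(key, key)
```

With this layout a class of several hundred ranges costs one array plus a single call per match attempt, rather than one emitted compare-and-branch per range.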
  3. 24 Jun, 2021 3 commits
  4. 10 May, 2021 1 commit
  5. 18 Jun, 2020 1 commit
  6. 03 Jun, 2020 1 commit
  7. 19 May, 2020 1 commit
  8. 02 Mar, 2020 1 commit
  9. 01 Oct, 2019 1 commit
    • Reland "[regexp] Bytecode peephole optimization" · 282a74c7
      Jakob Gruber authored
      This is a reland of 66129430
      
      Fixed: Unaligned reads, unspecified evaluation order.
      
      Original change's description:
      > [regexp] Bytecode peephole optimization
      >
      > Bytecodes used by the regular expression interpreter often occur in
      > specific sequences. The number of dispatches in the interpreter can be
      > reduced if those sequences are combined into a single bytecode.
      >
      > This CL adds a peephole optimization pass for regexp bytecodes.
      > This pass checks the generated bytecode for pre-defined sequences that
      > can be merged into a single bytecode.
      >
      > With the currently implemented bytecode sequences a speedup of 1.12x on
      > regex-dna and octane-regexp is achieved.
      >
      > Bug: v8:9330
      > Change-Id: I827f93273a5848e5963c7e3329daeb898995d151
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1813743
      > Commit-Queue: Patrick Thier <pthier@google.com>
      > Reviewed-by: Peter Marshall <petermarshall@chromium.org>
      > Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#63992}
      
      Cq-Include-Trybots: luci.v8.try:v8_linux64_ubsan_rel_ng
      Cq-Include-Trybots: luci.v8.try:v8_linux_gcc_rel
      Bug: v8:9330,chromium:1008502,chromium:1008631
      Change-Id: Ib9fc395b6809aa1debdb54d9fba5b7f09a235e5b
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1828917
      Reviewed-by: Peter Marshall <petermarshall@chromium.org>
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#64064}
  10. 26 Sep, 2019 2 commits
    • Revert "[regexp] Bytecode peephole optimization" · 05eda1ac
      Clemens Backes [né Hammacher] authored
      This reverts commit 66129430.
      
      Reason for revert: Fails on gcc: https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20gcc/3394
      
      Original change's description:
      > [regexp] Bytecode peephole optimization
      > 
      > Bytecodes used by the regular expression interpreter often occur in
      > specific sequences. The number of dispatches in the interpreter can be
      > reduced if those sequences are combined into a single bytecode.
      > 
      > This CL adds a peephole optimization pass for regexp bytecodes.
      > This pass checks the generated bytecode for pre-defined sequences that
      > can be merged into a single bytecode.
      > 
      > With the currently implemented bytecode sequences a speedup of 1.12x on
      > regex-dna and octane-regexp is achieved.
      > 
      > Bug: v8:9330
      > Change-Id: I827f93273a5848e5963c7e3329daeb898995d151
      > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1813743
      > Commit-Queue: Patrick Thier <pthier@google.com>
      > Reviewed-by: Peter Marshall <petermarshall@chromium.org>
      > Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#63992}
      
      TBR=jgruber@chromium.org,petermarshall@chromium.org,pthier@google.com
      
      Change-Id: Ie526fe3691f6abdd16b51979000fdafb7afce8ef
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true
      Bug: v8:9330
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1826727
      Reviewed-by: Clemens Backes [né Hammacher] <clemensb@chromium.org>
      Commit-Queue: Clemens Backes [né Hammacher] <clemensb@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#63998}
    • [regexp] Bytecode peephole optimization · 66129430
      Patrick Thier authored
      Bytecodes used by the regular expression interpreter often occur in
      specific sequences. The number of dispatches in the interpreter can be
      reduced if those sequences are combined into a single bytecode.
      
      This CL adds a peephole optimization pass for regexp bytecodes.
      This pass checks the generated bytecode for pre-defined sequences that
      can be merged into a single bytecode.
      
      With the currently implemented bytecode sequences a speedup of 1.12x on
      regex-dna and octane-regexp is achieved.
      
      Bug: v8:9330
      Change-Id: I827f93273a5848e5963c7e3329daeb898995d151
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1813743
      Commit-Queue: Patrick Thier <pthier@google.com>
      Reviewed-by: Peter Marshall <petermarshall@chromium.org>
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#63992}
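
As a rough illustration of the peephole pass described in the commit above, the sketch below scans a bytecode list and replaces known sequences with one fused opcode. It is written in Python for brevity; the fused opcode names and the `(opcode, operands)` representation are hypothetical, and a real pass would also have to fix up jump targets, which this sketch ignores.

```python
# Hypothetical sequences and fused opcode names, chosen for illustration only.
FUSED_SEQUENCES = {
    ("LOAD_CURRENT_CHAR", "CHECK_CHAR"): "LOAD_AND_CHECK_CHAR",
    ("ADVANCE_CP", "GOTO"): "ADVANCE_CP_AND_GOTO",
}


def peephole_optimize(bytecodes):
    """Replace each known opcode sequence with its single fused opcode.

    `bytecodes` is a list of (opcode, operands) tuples. Operands of the merged
    instructions are concatenated so that no information is lost.
    """
    optimized = []
    i = 0
    while i < len(bytecodes):
        for sequence, fused in FUSED_SEQUENCES.items():
            window = bytecodes[i:i + len(sequence)]
            if tuple(op for op, _ in window) == sequence:
                operands = tuple(arg for _, args in window for arg in args)
                optimized.append((fused, operands))
                i += len(sequence)
                break
        else:
            # No pattern starts here: copy the instruction unchanged.
            optimized.append(bytecodes[i])
            i += 1
    return optimized
```

Each fused instruction saves one trip through the interpreter's dispatch loop, which is where the reported speedup comes from.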
  11. 12 Sep, 2019 1 commit
    • [regexp] Secure interpreter dispatch. · 67a70d7e
      Patrick Thier authored
      Currently the dispatch table could be accessed out of bounds if something
      is wrong with the generated bytecode.
      OOB access of the dispatch table can lead to jumps to arbitrary addresses
      in the code space.
      
      This CL prevents this issue by changing the following:
      BYTECODE_MASK now filters out all bits not currently used for bytecodes.
      All unused slots between the last actually defined bytecode and
      BYTECODE_MASK are now filled with BREAK Bytecodes (invalid operation).
      This way we cannot access the dispatch table out of bounds if
      something is broken/tampered with, preventing jumps to arbitrary code.
      
      Bug: v8:9699
      Change-Id: Ibce591ae94b52472ba74a9fd0666e55185af7b2c
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1795349
      Commit-Queue: Patrick Thier <pthier@google.com>
      Reviewed-by: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Peter Marshall <petermarshall@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#63708}
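
The hardening described in the commit above can be sketched as follows, in Python for brevity. The mask width, the handler names, and the three-entry opcode set are assumptions made for illustration; only the idea (mask the index, fill every unused slot with a BREAK handler) comes from the commit message.

```python
BYTECODE_MASK = 0x3F  # assumed width: 6 bits, so the table has 64 slots


def op_break(state):
    """Handler for invalid bytecodes: stop instead of jumping anywhere."""
    raise RuntimeError("invalid bytecode")


def op_push(state):
    state.append("push")


def op_pop(state):
    state.append("pop")


# Hypothetical set of defined bytecodes; the index in the list is the opcode.
DEFINED_HANDLERS = [op_break, op_push, op_pop]

# Every slot reachable through the mask gets a handler; unused ones get BREAK.
DISPATCH_TABLE = DEFINED_HANDLERS + [op_break] * (
    BYTECODE_MASK + 1 - len(DEFINED_HANDLERS)
)


def dispatch(instruction_word, state):
    """Mask off non-opcode bits, so the index is always within the table."""
    handler = DISPATCH_TABLE[instruction_word & BYTECODE_MASK]
    handler(state)
```

Because the table length equals `BYTECODE_MASK + 1`, no instruction word, however corrupted, can index past the end of the table.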
  12. 29 Aug, 2019 1 commit
    • [regexp] Add dedicated flags for printing regexp code and bytecode · eebb18d3
      Jakob Gruber authored
      Printing regexp code used to be behind the generic --print-code flag, but
      there was no way to print only irregexp-generated code; and
      printing regexp bytecode was not supported at all (the
      --trace-regexp-bytecodes flag *did* exist, but it prints the execution
      trace at runtime and not the generated bytecode sequence).
      
      This CL adds two new flags:
      
      --print-regexp-code
      --print-regexp-bytecode
      
      Regexp code is no longer printed as part of --print-code.
      
      Example output for --print-regexp-bytecode:
      
      generated bytecode for regexp pattern: .(?<!^.)
      0x1ddcc614cbd0     0  PUSH_BT, 02, 00, 00, 00, c0, 00, 00, 00 .......
      0x1ddcc614cbd8     8  LOAD_CURRENT_CHAR, 11, 00, 00, 00, b0, 00, 00, 00 .......
      0x1ddcc614cbe0    10  CHECK_CHAR, 18, 0a, 00, 00, b0, 00, 00, 00 .......
      0x1ddcc614cbe8    18  CHECK_CHAR, 18, 0d, 00, 00, b0, 00, 00, 00 .......
      0x1ddcc614cbf0    20  PUSH_CP, 01, 00, 00, 00 ...
      
      Bug: chromium:996391
      Change-Id: I731defbd7cf9ed29753a39bb1d7205dc136ca950
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1773249
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Auto-Submit: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Peter Marshall <petermarshall@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#63442}
  13. 26 Jul, 2019 1 commit
  14. 12 Jun, 2019 1 commit
  15. 23 Jan, 2019 1 commit
  16. 27 Jan, 2016 1 commit
  17. 18 Jan, 2016 1 commit
  18. 17 Nov, 2015 3 commits
  19. 30 Sep, 2015 1 commit
  20. 13 Aug, 2015 1 commit
  21. 29 Apr, 2014 1 commit
  22. 30 Mar, 2012 1 commit
  23. 29 Nov, 2011 1 commit
  24. 19 Oct, 2010 3 commits
  25. 25 May, 2009 1 commit
  26. 19 Feb, 2009 1 commit
  27. 18 Feb, 2009 1 commit
  28. 20 Jan, 2009 1 commit
  29. 19 Jan, 2009 1 commit
  30. 08 Jan, 2009 1 commit
  31. 19 Dec, 2008 1 commit
  32. 08 Dec, 2008 1 commit
    • Irregexp: · ba09ec5e
      erik.corry@gmail.com authored
      * Facility for generating a node several ways.  This allows
        code to be generated for a node knowing where it is trying
        to match relative to the 'current position' and it allows
        code to be generated that knows where to backtrack to.  Both
        allow dramatic reductions in the amount of popping and pushing
        on the stack and the number of indirect jumps.
      * Generate special backtracking for greedy quantifiers on
        constant-length atoms.  This allows .* to run in constant
        space relative to input string size.
      * When we are checking a long sequence of characters or character
        classes in the input then we do them right to left and only the
        first (rightmost) needs to check for end-of-string.
      * Record the pattern in the profile instead of just <CompiledRegExp>
      * Nodes no longer contain an on_failure_ node.  This was only used
        for lookaheads and they are now handled with a choice node instead.
      Review URL: http://codereview.chromium.org/12900
      
      git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@930 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
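
One of the points in the commit above, checking a run of characters right to left so that only the rightmost needs an end-of-string test, can be sketched as follows. This is a Python illustration under simplifying assumptions (a plain literal, a non-negative start position); the function name is hypothetical and Irregexp itself emits machine code or bytecode for this rather than calling a helper.

```python
def match_literal_at(subject, position, literal):
    """Match `literal` at `position`, comparing characters right to left."""
    last = position + len(literal) - 1
    # Single bounds check: if the rightmost character is in range, so are
    # all characters to its left (position is assumed to be non-negative).
    if last >= len(subject):
        return False
    for offset in range(len(literal) - 1, -1, -1):
        if subject[position + offset] != literal[offset]:
            return False
    return True
```

Checking left to right would need an end-of-string test before every character; starting from the right folds those tests into one.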
  33. 28 Nov, 2008 1 commit