1. 20 Jan, 2022 1 commit
  2. 18 Jan, 2022 2 commits
  3. 22 Dec, 2021 1 commit
  4. 21 Dec, 2021 1 commit
  5. 16 Dec, 2021 1 commit
  6. 01 Dec, 2021 1 commit
    • [regexp] Fix CharacterRange limits again again again · 2e17aaca
      Jakob Gruber authored
      When emitting code, character ranges must only specify ranges which
      the actual subject string (one- or two-byte) may contain.
      
      This was not always the case, specifically for ranges with
      `from <= kMaxUint8` and `to > kMaxUint8`.
      
      The reason this is so tricky: 1. not all parts of the pipeline know
      whether we are compiling for one- or two-byte subjects; 2. for
      case-insensitive regexps, an out-of-bounds CharacterRange may have an
      in-bounds case equivalent (e.g. /[Ÿ]/i also matches 'ÿ' == \u{ff}),
      which only gets added somewhere in the middle of the pipeline.
      
      Our current solution is to clamp immediately before code emission. We
      also keep the existing handling/dchecks of the 0x10ffff marker value
      which may occur in the two-byte subject case.
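
      The clamping step can be sketched in isolation as follows. This is a
      minimal illustration, not the code from this CL: `Range`,
      `ClampRanges`, and the two constants are hypothetical stand-ins for
      V8's CharacterRange machinery, the input is assumed to be sorted and
      non-overlapping, and the 0x10ffff marker handling is not modelled.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Hypothetical stand-in for a character range; `to` is inclusive.
      struct Range {
        uint32_t from;
        uint32_t to;
      };

      constexpr uint32_t kMaxOneByteChar = 0xFF;    // same value as kMaxUint8
      constexpr uint32_t kMaxTwoByteChar = 0xFFFF;

      // Returns the ranges restricted to [0, max_char]: ranges that start
      // above max_char are dropped, ranges that straddle it are truncated.
      std::vector<Range> ClampRanges(const std::vector<Range>& ranges,
                                     uint32_t max_char) {
        std::vector<Range> out;
        for (const Range& r : ranges) {
          assert(r.from <= r.to);
          if (r.from > max_char) continue;  // entirely out of bounds
          out.push_back({r.from, r.to > max_char ? max_char : r.to});
        }
        return out;
      }
      ```

      With a one-byte subject, a range such as [0xF0, 0x178] would become
      [0xF0, 0xFF] under this sketch. Running it only at the very end
      matters because of the case-insensitive example above: clamping
      earlier would discard 'Ÿ' before its in-bounds equivalent 'ÿ' is
      added.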
      
      Bug: v8:11069
      Change-Id: Ic7b34a13a900ea2aa3df032daac9236bf5682a42
      Fixed: chromium:1275096
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3306569
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Leszek Swirski <leszeks@chromium.org>
      Cr-Commit-Position: refs/heads/main@{#78186}
  7. 15 Nov, 2021 2 commits
  8. 09 Nov, 2021 1 commit
  9. 08 Nov, 2021 2 commits
  10. 05 Nov, 2021 1 commit
  11. 04 Nov, 2021 1 commit
    • [string] Micro-optimize String::Flatten · 4593f3c6
      Jakob Gruber authored
      - Use a StringShape instead of repeatedly querying type.
      - Add a shortcut for already-flat strings.
      - Unhandlify where possible (all except SlowFlatten).
      - Mark String::Flatten and StringShape methods V8_INLINE.
      - Add a specialized ConsString::IsFlat overload.
      
      Drive-by: Various (add const, remove this->, helper methods).
      
      Bug: v8:12195
      Change-Id: If20df12bc29c29cff2005fdc9bd826ed9f303463
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3259527
      Auto-Submit: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Camillo Bruni <cbruni@chromium.org>
      Commit-Queue: Camillo Bruni <cbruni@chromium.org>
      Cr-Commit-Position: refs/heads/main@{#77701}
  12. 03 Nov, 2021 3 commits
  13. 27 Oct, 2021 1 commit
  14. 26 Oct, 2021 1 commit
  15. 25 Oct, 2021 1 commit
    • [regexp] Only emit valid ranges in MakeRangeArray · b7dc9915
      Jakob Gruber authored
      Character class handling in the irregexp pipeline is quite complex;
      codepoints outside the BMP (basic multilingual plane) are only
      translated into surrogate pairs when needed, e.g. when the subject
      string is two-byte. If not needed, the codepoints simply stay part of
      the list of CharacterRanges.
      
      In EmitCharClass, we determine the valid subset of ranges through
      ranges_length; until this CL, we forgot to pass that information on to
      MakeRangeArray. Do that now by truncating the list of CharacterRanges.
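
      A rough sketch of the truncation idea, under stated assumptions: the
      ranges are sorted by `from`, and the valid subset is taken to be the
      leading run of ranges that start at or below the largest character
      the subject can contain. `Range`, `ValidPrefixLength`, and
      `TruncateToValid` are hypothetical names for illustration and do not
      appear in V8.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Hypothetical stand-in for a character range; `to` is inclusive.
      struct Range {
        uint32_t from;
        uint32_t to;
      };

      // Counts the leading ranges that start within [0, max_char]. Plays
      // the role that ranges_length plays in the description above.
      size_t ValidPrefixLength(const std::vector<Range>& ranges,
                               uint32_t max_char) {
        size_t n = ranges.size();
        while (n > 0 && ranges[n - 1].from > max_char) --n;
        return n;
      }

      // Returns only the valid prefix, so that a later consumer never
      // sees the trailing out-of-bounds ranges.
      std::vector<Range> TruncateToValid(const std::vector<Range>& ranges,
                                         uint32_t max_char) {
        const size_t n = ValidPrefixLength(ranges, max_char);
        assert(n <= ranges.size());
        return std::vector<Range>(
            ranges.begin(), ranges.begin() + static_cast<std::ptrdiff_t>(n));
      }
      ```

      For example, with ranges [0x30, 0x39], [0x41, 0x5A], and
      [0x10000, 0x10FFFF] and a maximum of 0xFFFF, only the first two
      ranges survive in this sketch.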
      
      Fixed: chromium:1262423
      Change-Id: I5bb5b839e9935890ca2d10908ad66d72c3217178
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3240782
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Auto-Submit: Jakob Gruber <jgruber@chromium.org>
      Reviewed-by: Mathias Bynens <mathias@chromium.org>
      Cr-Commit-Position: refs/heads/main@{#77514}
  16. 21 Oct, 2021 1 commit
  17. 20 Oct, 2021 2 commits
    • PPC/s390: [regexp] Compact codegen for large character classes · 841d33a5
      Milad Fa authored
      Port 8bbb44e5
      
      Original Commit Message:
      
          Large character classes may easily be created when unicode
          properties (e.g.: /\p{L}/u and /\P{L}/u) are used - these are
          expanded internally into character classes that consist of hundreds
          of character ranges. Prior to this CL, we'd emit branching code
          for each of these ranges, leading to very large regexp code objects.
      
          This CL adds a new codegen mode for large character classes (where
          'large' currently means > 16 ranges). Instead of emitting branching
          code inline, the ranges are written into a ByteArray and we call into
          the C function IsCharacterInRangeArray for the actual branching logic.
          The ByteArray is smaller than emitted code and is deduplicated if the
          same character class is matched repeatedly in the same pattern.
      
          Note this mode is *not* implemented for the interpreter, since we
          currently don't have a constant pool for irregexp bytecode, and thus
          cannot reference ByteArrays.
      
      R=jgruber@chromium.org, joransiu@ca.ibm.com, junyan@redhat.com, midawson@redhat.com
      BUG=
      LOG=N
      
      Change-Id: I2ded01fa2767e56e72be81b949eefb5fb85b7013
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3231981
      Reviewed-by: Junliang Yan <junyan@redhat.com>
      Commit-Queue: Milad Fa <mfarazma@redhat.com>
      Cr-Commit-Position: refs/heads/main@{#77473}
    • [loong64][mips][regexp] Compact codegen for large character classes · 58559fb7
      Zhao Jiazhong authored
      Port commit 8bbb44e5
      
      Bug: v8:11069
      Change-Id: I66532e8410390bc220d7811e320bb44181b00d1f
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3234303
      Reviewed-by: Liu yu <liuyu@loongson.cn>
      Commit-Queue: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
      Cr-Commit-Position: refs/heads/main@{#77468}
  18. 19 Oct, 2021 1 commit
    • [regexp] Compact codegen for large character classes · 8bbb44e5
      Jakob Gruber authored
      Large character classes may easily be created when unicode
      properties (e.g.: /\p{L}/u and /\P{L}/u) are used - these are
      expanded internally into character classes that consist of hundreds
      of character ranges. Prior to this CL, we'd emit branching code
      for each of these ranges, leading to very large regexp code objects.
      
      This CL adds a new codegen mode for large character classes (where
      'large' currently means > 16 ranges). Instead of emitting branching
      code inline, the ranges are written into a ByteArray and we call into
      the C function IsCharacterInRangeArray for the actual branching logic.
      The ByteArray is smaller than emitted code and is deduplicated if the
      same character class is matched repeatedly in the same pattern.
      
      Note this mode is *not* implemented for the interpreter, since we
      currently don't have a constant pool for irregexp bytecode, and thus
      cannot reference ByteArrays.
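
      The table-driven approach can be illustrated with a small sketch. It
      is not V8's IsCharacterInRangeArray, and the array layout is an
      assumption made for illustration: each range contributes its start
      and its exclusive end to one sorted array of boundaries, and a
      binary search then decides membership. `Range`, `MakeBoundaries`,
      and `IsInRanges` are hypothetical names; the ranges are assumed to
      be sorted and non-overlapping.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Hypothetical stand-in for a two-byte character range; `to` is
      // inclusive.
      struct Range {
        uint16_t from;
        uint16_t to;
      };

      // Flattens ranges into [from0, to0 + 1, from1, to1 + 1, ...].
      // Boundaries are widened to 32 bits so that to == 0xFFFF cannot
      // overflow.
      std::vector<uint32_t> MakeBoundaries(const std::vector<Range>& ranges) {
        std::vector<uint32_t> boundaries;
        boundaries.reserve(ranges.size() * 2);
        for (const Range& r : ranges) {
          assert(r.from <= r.to);
          boundaries.push_back(r.from);
          boundaries.push_back(uint32_t{r.to} + 1);
        }
        return boundaries;
      }

      // Counts the boundaries <= c with a binary search. An odd count
      // means c lies after a range start and before the matching end.
      bool IsInRanges(const std::vector<uint32_t>& boundaries, uint32_t c) {
        auto it = std::upper_bound(boundaries.begin(), boundaries.end(), c);
        return ((it - boundaries.begin()) % 2) == 1;
      }
      ```

      A class with hundreds of ranges then costs one array plus a
      logarithmic lookup instead of one emitted branch per range, which is
      the trade-off this CL describes.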
      
      Bug: v8:11069
      Change-Id: I2d728e42d85114b796c637f791848731a104cd54
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3229377
      Reviewed-by: Patrick Thier <pthier@chromium.org>
      Auto-Submit: Jakob Gruber <jgruber@chromium.org>
      Commit-Queue: Jakob Gruber <jgruber@chromium.org>
      Cr-Commit-Position: refs/heads/main@{#77463}
  19. 14 Oct, 2021 1 commit
  20. 13 Oct, 2021 1 commit
  21. 12 Oct, 2021 7 commits
  22. 11 Oct, 2021 1 commit
  23. 06 Oct, 2021 4 commits
  24. 04 Oct, 2021 1 commit
  25. 30 Sep, 2021 1 commit
    • PPC/s390: [regexp] Fix stack growth for global regexps · 9227a8da
      Milad Fa authored
      Port 3e3a027d
      
      Original Commit Message:
      
          Irregexp reentrancy (crrev.com/c/3162604) introduced a bug for global
          regexp execution in which each iteration would use a new stack region
          (i.e. we forgot to pop the regexp stack pointer when starting a new
          iteration).
      
          This CL fixes that by popping the stack pointer on the loop backedge.
      
          At a high level:
      
          - Initialize the backtrack_stackpointer earlier and avoid clobbering
            it by setup code.
          - Pop it on the loop backedge.
          - Slightly refactor Push/Pop operations to avoid unneeded memory
            accesses.
      
      R=jgruber@chromium.org, joransiu@ca.ibm.com, junyan@redhat.com, midawson@redhat.com
      BUG=
      LOG=N
      
      Change-Id: Iafe6814d3695e83fced6a46209accf5e712d56f6
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3198391
      Reviewed-by: Junliang Yan <junyan@redhat.com>
      Commit-Queue: Milad Fa <mfarazma@redhat.com>
      Cr-Commit-Position: refs/heads/main@{#77180}