1. 31 Jul, 2018 1 commit
    • Fix canonicalization of grandfathered tags · f24b575d
      Jungshik Shin authored
      ICU maps a few grandfathered tags to made-up values even when there
      is no preferred value entry in the IANA language tag registry. [1]
      
      1. Check upfront for grandfathered tags without a preferred value
         and return them as they are.
      2. Lowercase the input before the structural validity check. This
         simplifies both the check for grandfathered tags without a
         preferred value and the regexps used in the structural validity
         check.
      
      intl/general/grandfathered_tags_without_preferred_value is added and
      intl/general/language_tags_with_preferred_values is changed to check
      for case-insensitive matching of grandfathered tags.
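
The two steps above can be sketched roughly as follows. This is a minimal Python illustration, not the V8 code: the names `canonicalize_tag` and `full_canonicalize` are hypothetical, and returning the lowercased form is an assumption. The five tags listed are the grandfathered entries that carry no Preferred-Value in the IANA registry [1].

```python
# Grandfathered tags with no Preferred-Value entry in the IANA registry.
GRANDFATHERED_WITHOUT_PREFERRED_VALUE = frozenset(
    {"cel-gaulish", "i-default", "i-enochian", "i-mingo", "zh-min"}
)


def canonicalize_tag(tag, full_canonicalize):
    """Hypothetical helper: short-circuit tags that must not be remapped.

    `full_canonicalize` stands in for the ICU-backed canonicalization and is
    an assumption of this sketch.
    """
    # Step 2: lowercase first, so one set lookup covers every casing.
    lowered = tag.lower()
    # Step 1: return these tags before the full canonicalization sees them
    # (assumption: the lowercased form is what gets returned).
    if lowered in GRANDFATHERED_WITHOUT_PREFERRED_VALUE:
        return lowered
    return full_canonicalize(lowered)
```

With a stub in place of the full canonicalization, `canonicalize_tag("I-Default", stub)` returns `"i-default"` without ever calling the stub.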
      
      [1] https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
      
      Bug: v8:7669
      Test: test262/intl402/Intl/getCanonicalLocales/preferred-grandfathered
      Test: intl/general/grandfathered_tags_without_preferred_value
      Cq-Include-Trybots: luci.v8.try:v8_linux_noi18n_rel_ng
      Cq-Include-Trybots: luci.chromium.try:linux_chromium_rel_ng
      Change-Id: Ie0520de8712928300fd71fe152909789483ec256
      Reviewed-on: https://chromium-review.googlesource.com/1156529
      Commit-Queue: Jungshik Shin <jshin@chromium.org>
      Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#54829}
  2. 27 Jul, 2018 1 commit
  3. 11 Jun, 2018 1 commit
  4. 26 Apr, 2018 1 commit
    • Fix the fast path for locale canonicalization · 919270e0
      Jungshik Shin authored
      Not all 2 or 3 letter language codes are canonical. Some of them need
      to be canonicalized.
      
      Specifically, exclude {jw,ji,iw,in} and all three-letter codes from the
      fast path except for 'fil'.
      
      {jw,ji,iw,in} are deprecated ISO 639 codes for
      {Javanese, Yiddish, Hebrew, Indonesian}. They should be
      canonicalized to {jv,yi,he,id}. So, do not return early
      in the fast path, but pass it down to the full canonicalization.
      
      In addition, there are 70+ deprecated 3-letter codes that need to be
      replaced by their modern equivalents. Instead of checking and replacing
      in v8, just pass them to ICU to handle.
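
The narrowed fast path described above might look like the following. It is a Python sketch only: `canonicalize_language_tag` and `full_canonicalize` are hypothetical names, and the exact conditions in V8 may differ.

```python
# Deprecated ISO 639 codes that must not take the fast path.
DEPRECATED_TWO_LETTER_CODES = frozenset({"jw", "ji", "iw", "in"})


def canonicalize_language_tag(tag, full_canonicalize):
    """Hypothetical sketch: return early only for tags known to be canonical.

    `full_canonicalize` stands in for the ICU-backed slow path.
    """
    is_lower_ascii_alpha = tag.isascii() and tag.isalpha() and tag.islower()
    if is_lower_ascii_alpha:
        # Two-letter codes are canonical unless they are deprecated.
        if len(tag) == 2 and tag not in DEPRECATED_TWO_LETTER_CODES:
            return tag
        # 'fil' is the only three-letter code kept on the fast path.
        if tag == "fil":
            return tag
    # Everything else, including all other three-letter codes, goes to
    # the full canonicalization.
    return full_canonicalize(tag)
```

Keeping the exclusion list to four entries and delegating the three-letter codes avoids duplicating ICU's replacement table in the caller.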
      
      Along with the following ICU change, two more tests will pass.
      
        https://chromium-review.googlesource.com/c/chromium/deps/icu/+/1026797
      
      These two tests still fail because of the disagreement between ICU and the test
      expectations about 5 grandfathered tags with no preferred value (e.g.
      i-default, zh-min, cel-gaulish).
      
        'intl402/Intl/getCanonicalLocales/canonicalized-tags'
        'intl402/Intl/getCanonicalLocales/preferred-grandfathered'
      
      Bug: v8:5693, v8:7669
      Test: test262/intl402/language-tags-canonicalized.js
      Test: test262/intl402/Intl/preferred-variants.js
      Test: intl/general/language_tags_with_preferred_values.js
      Cq-Include-Trybots: luci.v8.try:v8_linux_noi18n_rel_ng
      Change-Id: Ide7e9c90ac046859604c7b71c641f84ce9c64be5
      Reviewed-on: https://chromium-review.googlesource.com/1023379
      Reviewed-by: Jakob Kummerow <jkummerow@chromium.org>
      Commit-Queue: Jungshik Shin <jshin@chromium.org>
      Cr-Commit-Position: refs/heads/master@{#52823}
  5. 12 Oct, 2017 1 commit
  6. 02 Aug, 2017 1 commit
  7. 29 Jun, 2017 1 commit
  8. 04 May, 2017 1 commit
  9. 13 Jan, 2017 1 commit
    • Fix two DCHECK failures in ICU case mapping code · ac9e6285
      jshin authored
      1. The DCHECK in runtime-i18n.cc for case mapping wrongly assumed
         that the longest primary language tag is 3 characters.
         BCP 47 actually allows up to 8 characters.

      2. GetFlatContent() was called on a string without flattening it first.
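
To illustrate the length limit in item 1, here is a small Python sketch of a well-formedness check for the primary language subtag. The helper name is hypothetical and the check is not taken from runtime-i18n.cc; it only encodes the BCP 47 rule that the primary language subtag is 2 to 8 ASCII letters.

```python
import re

# BCP 47 (RFC 5646): the primary language subtag is 2 to 8 ASCII letters.
PRIMARY_LANGUAGE_RE = re.compile(r"[A-Za-z]{2,8}")


def primary_language_subtag(tag):
    """Hypothetical helper: return the first subtag if well-formed, else None."""
    first = tag.split("-", 1)[0]
    return first if PRIMARY_LANGUAGE_RE.fullmatch(first) else None
```

Tags that start with a single letter, such as private-use `x-...` tags or irregular grandfathered tags like `i-default`, yield `None` here and would need separate handling.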
      
      BUG=680314,680464
      TEST=intl/general/case-mapping (see also the bugs)
      
      Review-Url: https://codereview.chromium.org/2629763003
      Cr-Commit-Position: refs/heads/master@{#42343}
  10. 23 Dec, 2016 1 commit
    • [intl] Add new semantics + compat fallback to Intl constructor · b0a09d78
      littledan authored
      ECMA 402 v2 made Intl constructors more strict in terms of how they would
      initialize objects, refusing to initialize objects which have already
      been constructed. However, when Chrome tried to ship these semantics,
      we ran into web compatibility issues.
      
      This patch tries to square the circle and implement the simpler v2 object
      semantics while including a compatibility workaround to allow objects to
      sort of be initialized later, storing the real underlying Intl object
      in a symbol-named property.
      
      The new semantics are described in this PR against the ECMA 402 spec:
      https://github.com/tc39/ecma402/pull/84
      
      BUG=v8:4360, v8:4870
      LOG=Y
      
      Review-Url: https://codereview.chromium.org/2582993002
      Cr-Commit-Position: refs/heads/master@{#41943}
  11. 19 Dec, 2016 1 commit
  12. 28 Nov, 2016 1 commit
    • Fix the uppercasing of U+00E7(ç) and U+00F7(÷) · 2f5da9a5
      jshin authored
      Due to a typo in runtime-i18n.js, 'ç'(U+00E7) was not uppercased while
      '÷'(U+00F7) was incorrectly uppercased to '×'(U+00D7).
      
      Add a comprehensive test for the Latin-1 Supplement block (U+00A0 ~ U+00FF).
      (The block is special-cased for speed, so the whole range needs a test.)
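
A range-wide check of this kind can be sketched in Python. The function below is an illustrative special-casing of U+00A0 to U+00FF, not the code from the patch; its name and structure are assumptions. Comparing it against Python's built-in `str.upper()` across the whole block catches exactly the class of mistake described above.

```python
def latin1_to_upper(ch):
    """Hypothetical fast path: uppercase one character from U+00A0..U+00FF."""
    cp = ord(ch)
    if cp == 0xDF:   # ß uppercases to two characters
        return "SS"
    if cp == 0xB5:   # µ maps outside Latin-1, to GREEK CAPITAL LETTER MU
        return "\u039c"
    if cp == 0xFF:   # ÿ maps outside Latin-1, to U+0178
        return "\u0178"
    # U+00E0..U+00FE map to the letter 0x20 below, except ÷ (U+00F7),
    # which is a symbol and must stay unchanged.
    if 0xE0 <= cp <= 0xFE and cp != 0xF7:
        return chr(cp - 0x20)
    return ch


# Compare against the built-in mapping for every code point in the block.
mismatches = [
    hex(cp)
    for cp in range(0xA0, 0x100)
    if latin1_to_upper(chr(cp)) != chr(cp).upper()
]
```

An empty `mismatches` list means the special-cased table agrees with the reference mapping for the entire range.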
      
      TEST=intl/general/case-mapping
      BUG=v8:5681
      
      Review-Url: https://codereview.chromium.org/2533033003
      Cr-Commit-Position: refs/heads/master@{#41331}
  13. 15 Nov, 2016 1 commit
    • Use a regular ICU API for el-Upper · 4f224b39
      jshin authored
      ICU now supports uppercasing in Greek via its regular uppercasing API.
      So, there's no need to use a slow transliteration API for uppercasing
      in Greek.
      
      This CL includes rolling ICU to ICU 58.1.
      
      In addition, drop intl402/Intl/getCanonicalLocales/weird-cases from
      test262.status because it passes now with ICU 58.1.
      
      BUG=chromium:637001,v8:5012
      
      Review-Url: https://codereview.chromium.org/2491333003
      Cr-Commit-Position: refs/heads/master@{#41009}
  14. 18 Aug, 2016 1 commit
  15. 11 Aug, 2016 1 commit
  16. 10 Aug, 2016 1 commit
  17. 11 May, 2016 1 commit
    • Use ICU case conversion/transliterator for case conversion · b348d47b
      jshin authored
      When I18N is enabled, use ICU's case conversion API and transliteration
      API [1] to implement String.prototype.to{Upper,Lower}Case and
      String.prototype.toLocale{Upper,Lower}Case.
      
      * ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js
      * The above 4 functions are overridden with those in i18n.js when
        the --icu_case_mapping flag is turned on. To control the override by
        the flag, they're overridden in icu-case-mapping.js
      
      Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't
      support locale-sensitive case conversion for Turkic languages (az, tr),
      Greek (el) and Lithuanian (lt).
      
      Before ICU APIs for the most general case are called, a fast-path for Latin-1
      is tried. It's taken from Blink and adapted as necessary. This fast path
      is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken
      when a locale (explicitly specified or default) is not in {az, el, lt, tr}.
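
The fast-path decision described above can be sketched as a small predicate. This is a Python illustration under stated assumptions: the function name is hypothetical, and matching on the primary language subtag is a guess at how the locale is compared.

```python
# Languages whose case mapping is locale-sensitive, per the text above.
LOCALES_WITH_SPECIAL_CASING = frozenset({"az", "el", "lt", "tr"})


def can_use_latin1_fast_path(locale=None):
    """Hypothetical predicate: may the Latin-1 fast path be tried?

    `locale=None` models to{U,L}Case, which always tries the fast path.
    Otherwise the locale is the explicit or default one for
    toLocale{U,L}Case.
    """
    if locale is None:
        return True
    language = locale.split("-", 1)[0].lower()
    return language not in LOCALES_WITH_SPECIAL_CASING
```

For example, `can_use_latin1_fast_path("tr-TR")` is false because Turkish distinguishes dotted and dotless i, so the generic Latin-1 table would give the wrong result.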
      
      With these changes, a build with --icu_case_mapping=true passes a bunch
      of tests in test262/intl402/Strings/* and intl/* that failed before.
      
      Handling of pure ASCII strings (aligned at word boundary) is not as fast
      as Unibrow's implementation that uses word-by-word case conversion. OTOH,
      Latin-1 input handling is faster than Unibrow. General Unicode input
      handling is slower but more accurate.
      
      See https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_HGSbrg/edit?usp=sharing for the benchmark.
      
      This CL started with http://crrev.com/1544023002#ps200001 by littledan@,
      but has changed significantly since.
      
      [1] See why transliteration API is needed for uppercasing in Greek.
          http://bugs.icu-project.org/trac/ticket/10582
      
      R=yangguo
      BUG=v8:4476,v8:4477
      LOG=Y
      TEST=test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*, mjsunit/string-case,
           intl/general/case*
      
      Review-Url: https://codereview.chromium.org/1812673005
      Cr-Commit-Position: refs/heads/master@{#36187}
  18. 10 Oct, 2014 1 commit
    • Allow identifier code points from supplementary multilingual planes. · 0dd69ec4
      yangguo@chromium.org authored
      ES5.1 section 6 ("Source Text"):
      "Throughout the rest of this document, the phrase “code unit” and the
      word “character” will be used to refer to a 16-bit unsigned value
      used to represent a single 16-bit unit of text."
      
      This changed in ES6 draft section 10.1 ("Source Text"):
      "The ECMAScript code is expressed using Unicode, version 5.1 or later.
      ECMAScript source text is a sequence of code points. All Unicode code
      point values from U+0000 to U+10FFFF, including surrogate code points,
      may occur in source text where permitted by the ECMAScript grammars."
      
      This patch is to reflect this spec change.
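
The difference between the two readings is how surrogate pairs are treated. The Python sketch below shows the code-point view by combining 16-bit code units into code points; it only illustrates the concept and is not the scanner change itself. The function name is hypothetical.

```python
def code_points(code_units):
    """Combine UTF-16 code units into code points.

    A high surrogate followed by a low surrogate becomes one supplementary
    code point; lone surrogates pass through unchanged.
    """
    result = []
    i = 0
    while i < len(code_units):
        unit = code_units[i]
        is_pair = (
            0xD800 <= unit <= 0xDBFF
            and i + 1 < len(code_units)
            and 0xDC00 <= code_units[i + 1] <= 0xDFFF
        )
        if is_pair:
            low = code_units[i + 1]
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            result.append(unit)
            i += 1
    return result
```

Under the ES5.1 reading, an identifier check sees the two surrogates separately; under the ES6 reading it sees a single code point above U+FFFF and can test that against the identifier rules.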
      
      BUG=v8:3617
      LOG=Y
      R=jochen@chromium.org
      
      Review URL: https://codereview.chromium.org/640193002
      
      git-svn-id: https://v8.googlecode.com/svn/branches/bleeding_edge@24510 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
  19. 01 Aug, 2013 1 commit
  20. 10 Jul, 2013 1 commit