- 11 May, 2016 1 commit
-
-
jshin authored
When I18N is enabled, use ICU's case conversion API and transliteration API [1] to implement String.prototype.to{Upper,Lower}Case and String.prototype.toLocale{Upper,Lower}Case. * ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js * The above 4 functions are overridden with those in i18n.js when --icu_case_mapping flag is turned on. To control the override by the flag, they're overriden in icu-case-mapping.js Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't support locale-sensitive case conversion for Turkic languages (az, tr), Greek (el) and Lithuanian (lt). Before ICU APIs for the most general case are called, a fast-path for Latin-1 is tried. It's taken from Blink and adopted as necessary. This fast path is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken when a locale (explicitly specified or default) is not in {az, el, lt, tr}. With these changes, a build with --icu_case_mapping=true passes a bunch of tests in test262/intl402/Strings/* and intl/* that failed before. Handling of pure ASCII strings (aligned at word boundary) are not as fast as Unibrow's implementation that uses word-by-word case conversion. OTOH, Latin-1 input handling is faster than Unibrow. General Unicode input handling is slower but more accurate. See https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_HGSbrg/edit?usp=sharing for the benchmark. This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but has changed significantly since. [1] See why transliteration API is needed for uppercasing in Greek. http://bugs.icu-project.org/trac/ticket/10582 R=yangguo BUG=v8:4476,v8:4477 LOG=Y TEST=test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*, mjsunit/string-case, intl/general/case* Review-Url: https://codereview.chromium.org/1812673005 Cr-Commit-Position: refs/heads/master@{#36187}
-
- 10 Oct, 2014 1 commit
-
-
yangguo@chromium.org authored
ES5.1 section 6 ("Source Text"): "Throughout the rest of this document, the phrase “code unit” and the word “character” will be used to refer to a 16-bit unsigned value used to represent a single 16-bit unit of text." This changed in ES6 draft section 10.1 ("Source Text"): "The ECMAScript code is expressed using Unicode, version 5.1 or later. ECMAScript source text is a sequence of code points. All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in source text where permitted by the ECMAScript grammars." This patch is to reflect this spec change. BUG=v8:3617 LOG=Y R=jochen@chromium.org Review URL: https://codereview.chromium.org/640193002 git-svn-id: https://v8.googlecode.com/svn/branches/bleeding_edge@24510 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
-
- 01 Aug, 2013 1 commit
-
-
jochen@chromium.org authored
R=jkummerow@chromium.org Review URL: https://codereview.chromium.org/21511002 git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@16013 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
-
- 10 Jul, 2013 1 commit
-
-
jochen@chromium.org authored
BUG=v8:2745 R=jkummerow@chromium.org Review URL: https://codereview.chromium.org/18687003 git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@15584 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
-