Commit 4e453429 authored by Marja Hölttä, committed by Commit Bot

[script streaming] Fix U+feff handling.

U+feff is the Unicode byte order mark (BOM) when it is the first character, but
if it occurs inside the text, it is a "zero-width no-break space". However, the
UTF-8 decoder in script streaming still treated it as a BOM and skipped it. The
correct way to handle it is to emit a U+feff code point instead; the Scanner
will then handle it as whitespace.
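
The rule described above can be sketched in isolation. This is a minimal illustration, not V8's decoder: the function name DecodeUtf8SkippingLeadingBom is hypothetical, the input is assumed to be well-formed UTF-8 held in a single buffer (no chunking), and the local flag only mirrors the idea behind is_at_first_char_ in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of V8): decodes well-formed UTF-8 into UTF-16
// code units. U+FEFF is dropped only when it is the very first code point;
// anywhere else it is emitted as an ordinary code unit.
std::vector<uint16_t> DecodeUtf8SkippingLeadingBom(const std::string& bytes) {
  std::vector<uint16_t> out;
  bool is_at_first_char = true;
  size_t i = 0;
  while (i < bytes.size()) {
    unsigned char b0 = static_cast<unsigned char>(bytes[i]);
    uint32_t cp;
    size_t len;
    // Assumption: input is well-formed, so b0 is always a valid lead byte.
    if (b0 < 0x80) {
      cp = b0;
      len = 1;
    } else if ((b0 & 0xE0) == 0xC0) {
      cp = b0 & 0x1F;
      len = 2;
    } else if ((b0 & 0xF0) == 0xE0) {
      cp = b0 & 0x0F;
      len = 3;
    } else {
      cp = b0 & 0x07;
      len = 4;
    }
    // Fold in the continuation bytes, six payload bits each.
    for (size_t k = 1; k < len && i + k < bytes.size(); ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
    }
    i += len;

    bool is_bom = is_at_first_char && cp == 0xFEFF;
    is_at_first_char = false;
    if (is_bom) continue;  // a leading BOM is not part of the text

    if (cp >= 0x10000) {
      // Code points above the BMP become a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<uint16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<uint16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
      out.push_back(static_cast<uint16_t>(cp));
    }
  }
  return out;
}
```

With this sketch, the bytes "X\xef\xbb\xbfX" decode to {88, 0xfeff, 88}, while "\xef\xbb\xbfX" decodes to {88}, because only the leading U+feff is treated as a BOM.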

This was a discrepancy between the Blink UTF-8 decoder and the V8 UTF-8 decoder,
and it caused the source positions to be off by one. The bug went unnoticed
because an off-by-one position in this situation normally doesn't break the code.
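
To see where the off-by-one comes from, compare the index of a character in two decodings of the same text. The sketch below is illustrative only: PositionOf and DropAllFeff are hypothetical helpers, and DropAllFeff merely models a decoder that skips U+feff wherever it appears, which is how this message describes the old streaming behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: index of the first occurrence of `unit`, or
// units.size() if it does not occur.
size_t PositionOf(const std::vector<uint16_t>& units, uint16_t unit) {
  for (size_t i = 0; i < units.size(); ++i) {
    if (units[i] == unit) return i;
  }
  return units.size();
}

// Hypothetical model of a decoder that treats every U+FEFF as a BOM and
// drops it, regardless of where it appears in the text.
std::vector<uint16_t> DropAllFeff(const std::vector<uint16_t>& units) {
  std::vector<uint16_t> out;
  for (uint16_t u : units) {
    if (u != 0xFEFF) out.push_back(u);
  }
  return out;
}
```

For the text {'a', 0xfeff, 'b'}, PositionOf reports 'b' at index 2, but at index 1 after DropAllFeff. Every position after the dropped code unit shifts by one, so two decoders that disagree on this rule also disagree on source positions.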

BUG=chromium:758508,chromium:758236

Change-Id: Ib92a3ee65c402e21b77e42537db2a021cff55379
Reviewed-on: https://chromium-review.googlesource.com/632096
Reviewed-by: Adam Klein <adamk@chromium.org>
Commit-Queue: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#47583}
parent 04d3b60e
@@ -507,6 +507,9 @@ class Utf8ChunkSource : public ChunkSource {
   // surrogate pair, hence addition of 1.
   uc16* decoded_data = new uc16[byte_length + 1];
   i::CopyCharsUnsigned(decoded_data, data, ascii_prefix_len);
+  if (ascii_prefix_len > 0) {
+    is_at_first_char_ = false;
+  }
   size_t decoded_len = ascii_prefix_len;
   for (size_t i = ascii_prefix_len; i < byte_length; ++i) {
     unibrow::uchar t =
@@ -498,10 +498,17 @@ TEST(Regress6377) {
       "\x80"     // second chunk - one-byte end of 4-byte seq
       "a\xc3\0"  // and an 'a' + start of 2-byte seq
       "\xbf\0",  // third chunk - end of 2-byte seq
+      // Regression test for
+      // https://bugs.chromium.org/p/chromium/issues/detail?id=758508
+      "X\xef\xbb\xbfX\0",  // first chunk - no BOM but U+feff in the middle
   };
   const std::vector<std::vector<uint16_t>> unicode = {
-      {0xd800, 0xdc00, 97}, {0xfff, 97}, {0xff, 97}, {0xd800, 0xdc00, 97, 0xff},
-  };
+      {0xd800, 0xdc00, 97},
+      {0xfff, 97},
+      {0xff, 97},
+      {0xd800, 0xdc00, 97, 0xff},
+      {88, 0xfeff, 88}};
   CHECK_EQ(unicode.size(), sizeof(cases) / sizeof(cases[0]));
   for (size_t c = 0; c < unicode.size(); ++c) {
     ChunkSource chunk_source(cases[c]);