[parser] Refactor streaming scanner streams.

Unify and simplify the logic, and reduce UTF-8-specific handling. The intent is also to introduce stream views. Stream views can be used concurrently by multiple threads, but only one thread may fetch new data from the underlying source. Together with unified stream view creation, this is intended to be used for parse tasks.

BUG=v8:6093
Change-Id: Ied8e93090c506d4735080298f0fdaeed32043915
Reviewed-on: https://chromium-review.googlesource.com/501789
Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45336}
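
The stream-view idea in the commit message (many readers, one fetcher) can be illustrated with a minimal sketch. It is not the code from this change: `SharedStreamData`, `StreamView`, and the `Fetch` callback are hypothetical names, and the single coarse mutex is an assumption made for brevity. Each view keeps its own read position, while fetching from the underlying source is serialized so that only one thread pulls new data at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch, not V8's implementation.
// Append-only store of UTF-16 code units shared by all views.
class SharedStreamData {
 public:
  // Stand-in for the embedder's source: returns the next chunk, or an
  // empty vector once the data has ended.
  using Fetch = std::function<std::vector<uint16_t>()>;

  explicit SharedStreamData(Fetch fetch) : fetch_(std::move(fetch)) {}

  // Writes the code unit at `pos` to `*out` and returns true, or returns
  // false if `pos` is past the end of input. The lock guarantees that at
  // most one thread fetches from the source at any time.
  bool Get(size_t pos, uint16_t* out) {
    std::lock_guard<std::mutex> lock(mutex_);
    while (pos >= data_.size() && !done_) {
      std::vector<uint16_t> chunk = fetch_();
      if (chunk.empty()) done_ = true;
      data_.insert(data_.end(), chunk.begin(), chunk.end());
    }
    if (pos >= data_.size()) return false;
    *out = data_[pos];
    return true;
  }

 private:
  std::mutex mutex_;
  Fetch fetch_;
  std::vector<uint16_t> data_;
  bool done_ = false;
};

// A view is a cheap cursor over the shared data. Several views may exist
// at once, each owned by a single thread.
class StreamView {
 public:
  static constexpr int32_t kEndOfInput = -1;

  explicit StreamView(std::shared_ptr<SharedStreamData> data, size_t start = 0)
      : data_(std::move(data)), pos_(start) {}

  // Returns the next code unit, or kEndOfInput when the source is exhausted.
  int32_t Advance() {
    uint16_t c = 0;
    if (!data_->Get(pos_, &c)) return kEndOfInput;
    ++pos_;
    return c;
  }

 private:
  std::shared_ptr<SharedStreamData> data_;
  size_t pos_;
};
```

Locking on every read keeps the sketch short. A real implementation would more plausibly take the lock only when a view runs past the data fetched so far.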

Commit ce538f70 authored Apr 28, 2017 by Wiktor Garbacz Committed by Commit Bot May 16, 2017
Showing with 444 additions and 581 deletions

include/v8.h +0 -5
src/parsing/scanner-character-streams.cc +432 -576
test/cctest/parsing/test-scanner-streams.cc +12 -0

--- a/include/v8.h
+++ b/include/v8.h
@@ -1266,11 +1266,6 @@ class V8_EXPORT ScriptCompiler {
     * length of the data returned. When the data ends, GetMoreData should
     * return 0. Caller takes ownership of the data.
     *
-     * When streaming UTF-8 data, V8 handles multi-byte characters split between
-     * two data chunks, but doesn't handle multi-byte characters split between
-     * more than two data chunks. The embedder can avoid this problem by always
-     * returning at least 2 bytes of data.
-     *
     * If the embedder wants to cancel the streaming, they should make the next
     * GetMoreData call return 0. V8 will interpret it as end of data (and most
     * probably, parsing will fail). The streaming task will return as soon as
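
The comment removed in the hunk above described a limit on multi-byte UTF-8 characters split across chunks. One way to avoid such a limit is to carry the decoder state across chunk boundaries, so a character may be split over any number of chunks. The following is a minimal sketch of that technique, not the decoder from this change: `IncrementalUtf8Decoder` is a hypothetical name, dropping only a leading BOM is an assumption modelled on the expectations in the Uft8MultipleBOMChunks test in this commit, and overlong and surrogate encodings are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not V8's decoder. Decodes UTF-8 to UTF-16 and keeps
// its state between Feed() calls, so chunk boundaries may fall anywhere.
class IncrementalUtf8Decoder {
 public:
  // Decodes one chunk, appending UTF-16 code units to `*out`.
  void Feed(const uint8_t* bytes, size_t length, std::vector<uint16_t>* out) {
    assert(out != nullptr);
    for (size_t i = 0; i < length; ++i) {
      const uint8_t b = bytes[i];
      if (pending_ > 0) {
        if ((b & 0xC0) == 0x80) {  // Continuation byte: accumulate 6 bits.
          code_point_ = (code_point_ << 6) | (b & 0x3F);
          if (--pending_ == 0) Emit(code_point_, out);
          continue;
        }
        // Sequence was cut short: emit U+FFFD, then treat `b` as a new
        // lead byte below.
        pending_ = 0;
        Emit(kReplacement, out);
      }
      if (b < 0x80) {
        Emit(b, out);
      } else if ((b & 0xE0) == 0xC0) {
        code_point_ = b & 0x1F;
        pending_ = 1;
      } else if ((b & 0xF0) == 0xE0) {
        code_point_ = b & 0x0F;
        pending_ = 2;
      } else if ((b & 0xF8) == 0xF0) {
        code_point_ = b & 0x07;
        pending_ = 3;
      } else {
        Emit(kReplacement, out);  // Stray continuation or invalid lead byte.
      }
    }
  }

  // Call once at end of input to flush an incomplete trailing sequence.
  void Finish(std::vector<uint16_t>* out) {
    assert(out != nullptr);
    if (pending_ > 0) {
      pending_ = 0;
      Emit(kReplacement, out);
    }
  }

 private:
  static constexpr uint32_t kReplacement = 0xFFFD;

  void Emit(uint32_t cp, std::vector<uint16_t>* out) {
    const bool is_first = !seen_first_;
    seen_first_ = true;
    if (is_first && cp == 0xFEFF) return;  // Drop a leading BOM only.
    if (cp > 0x10FFFF) cp = kReplacement;
    if (cp < 0x10000) {
      out->push_back(static_cast<uint16_t>(cp));
    } else {  // Encode as a surrogate pair.
      cp -= 0x10000;
      out->push_back(static_cast<uint16_t>(0xD800 + (cp >> 10)));
      out->push_back(static_cast<uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }

  uint32_t code_point_ = 0;  // Partially assembled code point.
  int pending_ = 0;          // Continuation bytes still expected.
  bool seen_first_ = false;  // Whether any code point has been emitted.
};
```

Because `pending_` and `code_point_` live in the object rather than on the stack of a single call, feeding a three-byte character one byte per chunk produces the same output as feeding it whole.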

--- a/src/parsing/scanner-character-streams.cc
+++ b/src/parsing/scanner-character-streams.cc
--- a/test/cctest/parsing/test-scanner-streams.cc
+++ b/test/cctest/parsing/test-scanner-streams.cc
@@ -435,6 +435,18 @@ TEST(CharacterStreams) {
  TestCharacterStreams(buffer, arraysize(buffer) - 1, 576, 3298);
 }
+TEST(Uft8MultipleBOMChunks) {
+  const char* chunks = "\xef\xbb\xbf\0\xef\xbb\xbf\0\xef\xbb\xbf\0a\0";
+  const uint16_t unicode[] = {0xFEFF, 0xFEFF, 97};
+  ChunkSource chunk_source(chunks);
+  std::unique_ptr<i::Utf16CharacterStream> stream(i::ScannerStream::For(
+      &chunk_source, v8::ScriptCompiler::StreamedSource::UTF8, nullptr));
+  for (size_t i = 0; i < arraysize(unicode); i++) {
+    CHECK_EQ(unicode[i], stream->Advance());
+  }
+  CHECK_EQ(i::Utf16CharacterStream::kEndOfInput, stream->Advance());
+}
 // Regression test for crbug.com/651333. Read invalid utf-8.
 TEST(Regress651333) {
  const uint8_t bytes[] =