Commit e895b7af authored by Seth Brenith's avatar Seth Brenith Committed by V8 LUCI CQ

Background merging of deserialized scripts

Recently, https://crrev.com/c/v8/v8/+/3681880 added new API functions
with which an embedder could request that V8 merge newly deserialized
script data into an existing Script from the Isolate's compilation
cache. This change implements those new functions. This functionality is
still disabled by default due to the flag
merge_background_deserialized_script_with_compilation_cache.

The goal of this new functionality is to reduce memory usage when
multiple frames load the same script with a long delay between (long
enough for the script to have been evicted from Blink's in-memory cache
and for the top-level SharedFunctionInfo to be flushed). In that case,
there are two Script objects for the same script: one which was found in
the Isolate compilation cache (the "old" script), and one which was
recently deserialized (the "new" script). The new script's object graph
is essentially standalone: it may point to internalized strings and
readonly objects such as the empty feedback metadata, but otherwise
it is unconnected to the rest of the heap. The merging logic takes any
useful data from the new script's object graph and attaches it into the
old script's object graph, so that the new Script object and any other
duplicated objects can be discarded. More specifically:

1. If the new Script has a SharedFunctionInfo for a particular function
   literal, and the old Script does not, then the old Script is updated
   to refer to the new SharedFunctionInfo.
2. If the new Script has a compiled SharedFunctionInfo for a particular
   function literal, and the old Script has an uncompiled
   SharedFunctionInfo, then the old SharedFunctionInfo is updated to
   point to the function_data and feedback_metadata from the new
   SharedFunctionInfo.
3. If any used object from the new object graph points to a
   SharedFunctionInfo, where the old object graph contains a matching
   SharedFunctionInfo for the same function literal, then that pointer
   is updated to point to the old SharedFunctionInfo.

The document at [0] includes diagrams showing an example merge on a very
small script.

Steps 1 and 2 above are pretty simple, but step 3 requires walking a
possibly large set of objects, so this new API lets the embedder run
step 3 from a background thread. Steps 1 and 2 are performed later, on
the main thread.

The next important question is: in what ways can the old script's object
graph be modified during the background execution of step 3, or during
the time after step 3 but before steps 1 and 2?

A. SharedFunctionInfos can go from compiled to uncompiled due to
   flushing. This is okay; the worst outcome is that the function would
   need to be compiled again later. Such a risk is already present,
   since V8 doesn't keep IsCompiledScopes for every compiled function in
   a background-deserialized script.
B. SharedFunctionInfos can go from uncompiled to compiled due to lazy
   compilation. This is also okay; the merge completion logic on the
   main thread will just keep this lazily compiled data rather than
   inserting compiled data from the newly deserialized object graph.
C. SharedFunctionInfos can be cleared from the Script's weak array if
   they are no longer referenced. This is mostly okay, because any
   SharedFunctionInfo that is needed by the background merge is strongly
   referenced and therefore can't be cleared. The only problem arises if
   the top-level SharedFunctionInfo gets cleared, so the merge task must
   deliberately keep a reference to that one.
D. SharedFunctionInfos can be created if they are needed due to lazy
   compilation of a parent function. This change is somewhat troublesome
   because it invalidates the background thread's work and requires a
   re-traversal on the main thread to update any pointers that should
   point to this lazily compiled SharedFunctionInfo.

At a high level, this change implements three previously unimplemented
functions in BackgroundDeserializeTask (in compiler.cc) and updates one:

- BackgroundDeserializeTask::SourceTextAvailable, run on the main
  thread, checks whether there is a matching Script in the Isolate
  compilation cache which doesn't already have a top-level
  SharedFunctionInfo. If so, it saves that Script in a persistent
  handle.
- BackgroundDeserializeTask::ShouldMergeWithExistingScript checks
  whether the persistent handle from the first step exists (a fast
  operation which can be called from any thread).
- BackgroundDeserializeTask::MergeWithExistingScript, run on a
  background thread, performs step 3 of the merge described above and
  generates lists of persistent data describing how the main thread can
  complete the merge.
- BackgroundDeserializeTask::Finish is updated to perform the merge
  steps 1 and 2 listed above, as well as a possible re-traversal of the
  graph if required due to newly created SharedFunctionInfos in the old
  Script.

The merge logic has nothing to do with deserialization, and indeed I
hope to reuse it for background compilation tasks as well, so it is all
contained within a new class BackgroundMergeTask (in compiler.h,cc). It
uses a second class, ForwardPointersVisitor (in compiler.cc) to perform
the object visitation that updates pointers to SharedFunctionInfos.

[0] https://docs.google.com/document/d/1UksB5Vm7TT1-f3S9W1dK_rP9jKn_ly0WVm_UDPpWuBw/edit

Bug: v8:12808
Change-Id: Id405869e9d5b106ca7afd9c4b08cb5813e6852c6
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3739232Reviewed-by: 's avatarLeszek Swirski <leszeks@chromium.org>
Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
Cr-Commit-Position: refs/heads/main@{#81941}
parent 4af62459
......@@ -1161,6 +1161,7 @@ filegroup(
"src/codegen/assembler.cc",
"src/codegen/assembler.h",
"src/codegen/atomic-memory-order.h",
"src/codegen/background-merge-task.h",
"src/codegen/bailout-reason.cc",
"src/codegen/bailout-reason.h",
"src/codegen/callable.h",
......
......@@ -2791,6 +2791,7 @@ v8_header_set("v8_internal_headers") {
"src/codegen/assembler-inl.h",
"src/codegen/assembler.h",
"src/codegen/atomic-memory-order.h",
"src/codegen/background-merge-task.h",
"src/codegen/bailout-reason.h",
"src/codegen/callable.h",
"src/codegen/code-comments.h",
......
// Copyright 2022 the V8 project authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
#ifndef V8_CODEGEN_BACKGROUND_MERGE_TASK_H_
#define V8_CODEGEN_BACKGROUND_MERGE_TASK_H_
#include <vector>
#include "src/handles/maybe-handles.h"
namespace v8 {
namespace internal {
class FeedbackMetadata;
class PersistentHandles;
class Script;
class SharedFunctionInfo;
class String;
struct ScriptDetails;
// Contains data transferred between threads for background merging between a
// newly compiled or deserialized script and an existing script from the Isolate
// compilation cache.
class V8_EXPORT_PRIVATE BackgroundMergeTask {
public:
// Step 1: on the main thread, check whether the Isolate compilation cache
// contains the script.
void SetUpOnMainThread(Isolate* isolate, Handle<String> source_text,
const ScriptDetails& script_details,
LanguageMode language_mode);
// Step 2: on the background thread, update pointers in the new Script's
// object graph to point to corresponding objects from the cached Script where
// appropriate. May only be called if HasCachedScript returned true.
void BeginMergeInBackground(LocalIsolate* isolate, Handle<Script> new_script);
// Step 3: on the main thread again, complete the merge so that all relevant
// objects are reachable from the cached Script. May only be called if
// HasPendingForegroundWork returned true. Returns the top-level
// SharedFunctionInfo that should be used.
Handle<SharedFunctionInfo> CompleteMergeInForeground(
Isolate* isolate, Handle<Script> new_script);
bool HasCachedScript() const { return !cached_script_.is_null(); }
bool HasPendingForegroundWork() const {
return !used_new_sfis_.empty() ||
!new_compiled_data_for_cached_sfis_.empty();
}
private:
std::unique_ptr<PersistentHandles> persistent_handles_;
// Data from main thread:
MaybeHandle<Script> cached_script_;
// Data from background thread:
// The top-level SharedFunctionInfo from the cached script, if one existed,
// just to keep it alive.
MaybeHandle<SharedFunctionInfo> toplevel_sfi_from_cached_script_;
// New SharedFunctionInfos which are used because there was no corresponding
// SharedFunctionInfo in the cached script. The main thread must:
// 1. Check whether the cached script gained corresponding SharedFunctionInfos
// for any of these, and if so, redo the merge.
// 2. Update the cached script's shared_function_infos list to refer to these.
std::vector<Handle<SharedFunctionInfo>> used_new_sfis_;
// SharedFunctionInfos from the cached script which were not compiled, with
// function_data and feedback_metadata from the corresponding new
// SharedFunctionInfo. If the SharedFunctionInfo from the cached script is
// still uncompiled when finishing, the main thread must set the two fields.
struct NewCompiledDataForCachedSfi {
Handle<SharedFunctionInfo> cached_sfi;
Handle<Object> function_data;
Handle<FeedbackMetadata> feedback_metadata;
};
std::vector<NewCompiledDataForCachedSfi> new_compiled_data_for_cached_sfis_;
};
} // namespace internal
} // namespace v8
#endif // V8_CODEGEN_BACKGROUND_MERGE_TASK_H_
This diff is collapsed.
......@@ -11,6 +11,7 @@
#include "src/ast/ast-value-factory.h"
#include "src/base/platform/elapsed-timer.h"
#include "src/base/small-vector.h"
#include "src/codegen/background-merge-task.h"
#include "src/codegen/bailout-reason.h"
#include "src/common/globals.h"
#include "src/execution/isolate.h"
......@@ -635,6 +636,7 @@ class V8_EXPORT_PRIVATE BackgroundDeserializeTask {
Isolate* isolate_for_local_isolate_;
AlignedCachedData cached_data_;
CodeSerializer::OffThreadDeserializeData off_thread_data_;
BackgroundMergeTask background_merge_task_;
};
} // namespace internal
......
......@@ -485,9 +485,16 @@ Handle<CompilationCacheTable> CompilationCacheTable::PutScript(
cache = EnsureScriptTableCapacity(isolate, cache);
entry = cache->FindInsertionEntry(isolate, key.Hash());
}
// TODO(v8:12808): Once all code paths are updated to reuse a Script if
// available, we could DCHECK here that the Script in the existing entry
// matches the Script in the new key. For now, there is no such guarantee.
// We might be tempted to DCHECK here that the Script in the existing entry
// matches the Script in the new key. However, replacing an existing Script
// can still happen in some edge cases that aren't common enough to be worth
// fixing. Consider the following unlikely sequence of events:
// 1. BackgroundMergeTask::SetUpOnMainThread finds a script S1 in the cache.
// 2. DevTools is attached and clears the cache.
// 3. DevTools is detached; the cache is reenabled.
// 4. A new instance of the script, S2, is compiled and placed into the cache.
// 5. The merge from step 1 finishes on the main thread, still using S1, and
// places S1 into the cache, replacing S2.
cache->SetKeyAt(entry, *k);
cache->SetPrimaryValueAt(entry, *value);
if (!found_existing) {
......
......@@ -422,6 +422,9 @@ class WeakArrayList
// duplicates.
V8_EXPORT_PRIVATE bool RemoveOne(const MaybeObjectHandle& value);
// Searches the array (linear time) and returns whether it contains the value.
V8_EXPORT_PRIVATE bool Contains(MaybeObject value);
class Iterator;
private:
......
......@@ -4285,6 +4285,13 @@ bool WeakArrayList::RemoveOne(const MaybeObjectHandle& value) {
return false;
}
bool WeakArrayList::Contains(MaybeObject value) {
for (int i = 0; i < length(); ++i) {
if (Get(i) == value) return true;
}
return false;
}
// static
Handle<WeakArrayList> PrototypeUsers::Add(Isolate* isolate,
Handle<WeakArrayList> array,
......
......@@ -9,6 +9,7 @@
#include "src/base/logging.h"
#include "src/base/platform/elapsed-timer.h"
#include "src/base/platform/platform.h"
#include "src/codegen/background-merge-task.h"
#include "src/common/globals.h"
#include "src/handles/maybe-handles.h"
#include "src/handles/persistent-handles.h"
......@@ -465,6 +466,25 @@ MaybeHandle<SharedFunctionInfo> CodeSerializer::Deserialize(
return scope.CloseAndEscape(result);
}
Handle<Script> CodeSerializer::OffThreadDeserializeData::GetOnlyScript(
LocalHeap* heap) {
std::unique_ptr<PersistentHandles> previous_persistent_handles =
heap->DetachPersistentHandles();
heap->AttachPersistentHandles(std::move(persistent_handles));
DCHECK_EQ(scripts.size(), 1);
// Make a non-persistent handle to return.
Handle<Script> script = handle(*scripts[0], heap);
DCHECK_EQ(*script, maybe_result.ToHandleChecked()->script());
persistent_handles = heap->DetachPersistentHandles();
if (previous_persistent_handles) {
heap->AttachPersistentHandles(std::move(previous_persistent_handles));
}
return script;
}
CodeSerializer::OffThreadDeserializeData
CodeSerializer::StartDeserializeOffThread(LocalIsolate* local_isolate,
AlignedCachedData* cached_data) {
......@@ -496,7 +516,8 @@ CodeSerializer::StartDeserializeOffThread(LocalIsolate* local_isolate,
MaybeHandle<SharedFunctionInfo> CodeSerializer::FinishOffThreadDeserialize(
Isolate* isolate, OffThreadDeserializeData&& data,
AlignedCachedData* cached_data, Handle<String> source,
ScriptOriginOptions origin_options) {
ScriptOriginOptions origin_options,
BackgroundMergeTask* background_merge_task) {
base::ElapsedTimer timer;
if (FLAG_profile_deserialization || FLAG_log_function_events) timer.Start();
......@@ -543,23 +564,32 @@ MaybeHandle<SharedFunctionInfo> CodeSerializer::FinishOffThreadDeserialize(
DCHECK(data.persistent_handles->Contains(result.location()));
result = handle(*result, isolate);
// Fix up the source on the script. This should be the only deserialized
// script, and the off-thread deserializer should have set its source to
// the empty string.
DCHECK_EQ(data.scripts.size(), 1);
DCHECK_EQ(result->script(), *data.scripts[0]);
DCHECK_EQ(Script::cast(result->script()).source(),
ReadOnlyRoots(isolate).empty_string());
Script::cast(result->script()).set_source(*source);
// Fix up the script list to include the newly deserialized script.
Handle<WeakArrayList> list = isolate->factory()->script_list();
for (Handle<Script> script : data.scripts) {
DCHECK(data.persistent_handles->Contains(script.location()));
list =
WeakArrayList::AddToEnd(isolate, list, MaybeObjectHandle::Weak(script));
}
isolate->heap()->SetRootScriptList(*list);
if (background_merge_task &&
background_merge_task->HasPendingForegroundWork()) {
Handle<Script> script = handle(Script::cast(result->script()), isolate);
result = background_merge_task->CompleteMergeInForeground(isolate, script);
DCHECK(Script::cast(result->script()).source().StrictEquals(*source));
DCHECK(isolate->factory()->script_list()->Contains(
MaybeObject::MakeWeak(MaybeObject::FromObject(result->script()))));
} else {
// Fix up the source on the script. This should be the only deserialized
// script, and the off-thread deserializer should have set its source to
// the empty string.
DCHECK_EQ(data.scripts.size(), 1);
DCHECK_EQ(result->script(), *data.scripts[0]);
DCHECK_EQ(Script::cast(result->script()).source(),
ReadOnlyRoots(isolate).empty_string());
Script::cast(result->script()).set_source(*source);
// Fix up the script list to include the newly deserialized script.
Handle<WeakArrayList> list = isolate->factory()->script_list();
for (Handle<Script> script : data.scripts) {
DCHECK(data.persistent_handles->Contains(script.location()));
list = WeakArrayList::AddToEnd(isolate, list,
MaybeObjectHandle::Weak(script));
}
isolate->heap()->SetRootScriptList(*list);
}
if (FLAG_profile_deserialization) {
double ms = timer.Elapsed().InMillisecondsF();
......
......@@ -13,6 +13,7 @@ namespace v8 {
namespace internal {
class PersistentHandles;
class BackgroundMergeTask;
class V8_EXPORT_PRIVATE AlignedCachedData {
public:
......@@ -62,6 +63,10 @@ enum class SerializedCodeSanityCheckResult {
class CodeSerializer : public Serializer {
public:
struct OffThreadDeserializeData {
public:
bool HasResult() const { return !maybe_result.is_null(); }
Handle<Script> GetOnlyScript(LocalHeap* heap);
private:
friend class CodeSerializer;
MaybeHandle<SharedFunctionInfo> maybe_result;
......@@ -87,10 +92,11 @@ class CodeSerializer : public Serializer {
AlignedCachedData* cached_data);
V8_WARN_UNUSED_RESULT static MaybeHandle<SharedFunctionInfo>
FinishOffThreadDeserialize(Isolate* isolate, OffThreadDeserializeData&& data,
AlignedCachedData* cached_data,
Handle<String> source,
ScriptOriginOptions origin_options);
FinishOffThreadDeserialize(
Isolate* isolate, OffThreadDeserializeData&& data,
AlignedCachedData* cached_data, Handle<String> source,
ScriptOriginOptions origin_options,
BackgroundMergeTask* background_merge_task = nullptr);
uint32_t source_hash() const { return source_hash_; }
......
This diff is collapsed.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment