Commit 766b2a4d authored by Seth Brenith's avatar Seth Brenith Committed by V8 LUCI CQ

Reland "Background merging of deserialized scripts"

This is a reland of commit e895b7af

The unit test has been updated to work correctly when
--stress-incremental-marking is enabled.

Original change's description:
> Background merging of deserialized scripts
>
> Recently, https://crrev.com/c/v8/v8/+/3681880 added new API functions
> with which an embedder could request that V8 merge newly deserialized
> script data into an existing Script from the Isolate's compilation
> cache. This change implements those new functions. This functionality is
> still disabled by default due to the flag
> merge_background_deserialized_script_with_compilation_cache.
>
> The goal of this new functionality is to reduce memory usage when
> multiple frames load the same script with a long delay between (long
> enough for the script to have been evicted from Blink's in-memory cache
> and for the top-level SharedFunctionInfo to be flushed). In that case,
> there are two Script objects for the same script: one which was found in
> the Isolate compilation cache (the "old" script), and one which was
> recently deserialized (the "new" script). The new script's object graph
> is essentially standalone: it may point to internalized strings and
> readonly objects such as the empty feedback metadata, but otherwise
> it is unconnected to the rest of the heap. The merging logic takes any
> useful data from the new script's object graph and attaches it into the
> old script's object graph, so that the new Script object and any other
> duplicated objects can be discarded. More specifically:
>
> 1. If the new Script has a SharedFunctionInfo for a particular function
>    literal, and the old Script does not, then the old Script is updated
>    to refer to the new SharedFunctionInfo.
> 2. If the new Script has a compiled SharedFunctionInfo for a particular
>    function literal, and the old Script has an uncompiled
>    SharedFunctionInfo, then the old SharedFunctionInfo is updated to
>    point to the function_data and feedback_metadata from the new
>    SharedFunctionInfo.
> 3. If any used object from the new object graph points to a
>    SharedFunctionInfo, where the old object graph contains a matching
>    SharedFunctionInfo for the same function literal, then that pointer
>    is updated to point to the old SharedFunctionInfo.
>
> The document at [0] includes diagrams showing an example merge on a very
> small script.
>
> Steps 1 and 2 above are pretty simple, but step 3 requires walking a
> possibly large set of objects, so this new API lets the embedder run
> step 3 from a background thread. Steps 1 and 2 are performed later, on
> the main thread.
>
> The next important question is: in what ways can the old script's object
> graph be modified during the background execution of step 3, or during
> the time after step 3 but before steps 1 and 2?
>
> A. SharedFunctionInfos can go from compiled to uncompiled due to
>    flushing. This is okay; the worst outcome is that the function would
>    need to be compiled again later. Such a risk is already present,
>    since V8 doesn't keep IsCompiledScopes for every compiled function in
>    a background-deserialized script.
> B. SharedFunctionInfos can go from uncompiled to compiled due to lazy
>    compilation. This is also okay; the merge completion logic on the
>    main thread will just keep this lazily compiled data rather than
>    inserting compiled data from the newly deserialized object graph.
> C. SharedFunctionInfos can be cleared from the Script's weak array if
>    they are no longer referenced. This is mostly okay, because any
>    SharedFunctionInfo that is needed by the background merge is strongly
>    referenced and therefore can't be cleared. The only problem arises if
>    the top-level SharedFunctionInfo gets cleared, so the merge task must
>    deliberately keep a reference to that one.
> D. SharedFunctionInfos can be created if they are needed due to lazy
>    compilation of a parent function. This change is somewhat troublesome
>    because it invalidates the background thread's work and requires a
>    re-traversal on the main thread to update any pointers that should
>    point to this lazily compiled SharedFunctionInfo.
>
> At a high level, this change implements three previously unimplemented
> functions in BackgroundDeserializeTask (in compiler.cc) and updates one:
>
> - BackgroundDeserializeTask::SourceTextAvailable, run on the main
>   thread, checks whether there is a matching Script in the Isolate
>   compilation cache which doesn't already have a top-level
>   SharedFunctionInfo. If so, it saves that Script in a persistent
>   handle.
> - BackgroundDeserializeTask::ShouldMergeWithExistingScript checks
>   whether the persistent handle from the first step exists (a fast
>   operation which can be called from any thread).
> - BackgroundDeserializeTask::MergeWithExistingScript, run on a
>   background thread, performs step 3 of the merge described above and
>   generates lists of persistent data describing how the main thread can
>   complete the merge.
> - BackgroundDeserializeTask::Finish is updated to perform the merge
>   steps 1 and 2 listed above, as well as a possible re-traversal of the
>   graph if required due to newly created SharedFunctionInfos in the old
>   Script.
>
> The merge logic has nothing to do with deserialization, and indeed I
> hope to reuse it for background compilation tasks as well, so it is all
> contained within a new class BackgroundMergeTask (in compiler.h,cc). It
> uses a second class, ForwardPointersVisitor (in compiler.cc) to perform
> the object visitation that updates pointers to SharedFunctionInfos.
>
> [0] https://docs.google.com/document/d/1UksB5Vm7TT1-f3S9W1dK_rP9jKn_ly0WVm_UDPpWuBw/edit
>
> Bug: v8:12808
> Change-Id: Id405869e9d5b106ca7afd9c4b08cb5813e6852c6
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3739232
> Reviewed-by: Leszek Swirski <leszeks@chromium.org>
> Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
> Cr-Commit-Position: refs/heads/main@{#81941}

Bug: v8:12808
Change-Id: Id2036dfa4eba8670cac899773d7a906825fa2c50
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3787266Reviewed-by: 's avatarLeszek Swirski <leszeks@chromium.org>
Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
Cr-Commit-Position: refs/heads/main@{#82045}
parent c34c85a5
......@@ -1160,6 +1160,7 @@ filegroup(
"src/codegen/assembler.cc",
"src/codegen/assembler.h",
"src/codegen/atomic-memory-order.h",
"src/codegen/background-merge-task.h",
"src/codegen/bailout-reason.cc",
"src/codegen/bailout-reason.h",
"src/codegen/callable.h",
......
......@@ -2792,6 +2792,7 @@ v8_header_set("v8_internal_headers") {
"src/codegen/assembler-inl.h",
"src/codegen/assembler.h",
"src/codegen/atomic-memory-order.h",
"src/codegen/background-merge-task.h",
"src/codegen/bailout-reason.h",
"src/codegen/callable.h",
"src/codegen/code-comments.h",
......
// Copyright 2022 the V8 project authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
#ifndef V8_CODEGEN_BACKGROUND_MERGE_TASK_H_
#define V8_CODEGEN_BACKGROUND_MERGE_TASK_H_
#include <vector>
#include "src/handles/maybe-handles.h"
namespace v8 {
namespace internal {
class FeedbackMetadata;
class PersistentHandles;
class Script;
class SharedFunctionInfo;
class String;
struct ScriptDetails;
// Contains data transferred between threads for background merging between a
// newly compiled or deserialized script and an existing script from the Isolate
// compilation cache.
class V8_EXPORT_PRIVATE BackgroundMergeTask {
public:
// Step 1: on the main thread, check whether the Isolate compilation cache
// contains the script.
void SetUpOnMainThread(Isolate* isolate, Handle<String> source_text,
const ScriptDetails& script_details,
LanguageMode language_mode);
// Step 2: on the background thread, update pointers in the new Script's
// object graph to point to corresponding objects from the cached Script where
// appropriate. May only be called if HasCachedScript returned true.
void BeginMergeInBackground(LocalIsolate* isolate, Handle<Script> new_script);
// Step 3: on the main thread again, complete the merge so that all relevant
// objects are reachable from the cached Script. May only be called if
// HasPendingForegroundWork returned true. Returns the top-level
// SharedFunctionInfo that should be used.
Handle<SharedFunctionInfo> CompleteMergeInForeground(
Isolate* isolate, Handle<Script> new_script);
bool HasCachedScript() const { return !cached_script_.is_null(); }
bool HasPendingForegroundWork() const {
return !used_new_sfis_.empty() ||
!new_compiled_data_for_cached_sfis_.empty();
}
private:
std::unique_ptr<PersistentHandles> persistent_handles_;
// Data from main thread:
MaybeHandle<Script> cached_script_;
// Data from background thread:
// The top-level SharedFunctionInfo from the cached script, if one existed,
// just to keep it alive.
MaybeHandle<SharedFunctionInfo> toplevel_sfi_from_cached_script_;
// New SharedFunctionInfos which are used because there was no corresponding
// SharedFunctionInfo in the cached script. The main thread must:
// 1. Check whether the cached script gained corresponding SharedFunctionInfos
// for any of these, and if so, redo the merge.
// 2. Update the cached script's shared_function_infos list to refer to these.
std::vector<Handle<SharedFunctionInfo>> used_new_sfis_;
// SharedFunctionInfos from the cached script which were not compiled, with
// function_data and feedback_metadata from the corresponding new
// SharedFunctionInfo. If the SharedFunctionInfo from the cached script is
// still uncompiled when finishing, the main thread must set the two fields.
struct NewCompiledDataForCachedSfi {
Handle<SharedFunctionInfo> cached_sfi;
Handle<Object> function_data;
Handle<FeedbackMetadata> feedback_metadata;
};
std::vector<NewCompiledDataForCachedSfi> new_compiled_data_for_cached_sfis_;
};
} // namespace internal
} // namespace v8
#endif // V8_CODEGEN_BACKGROUND_MERGE_TASK_H_
This diff is collapsed.
......@@ -11,6 +11,7 @@
#include "src/ast/ast-value-factory.h"
#include "src/base/platform/elapsed-timer.h"
#include "src/base/small-vector.h"
#include "src/codegen/background-merge-task.h"
#include "src/codegen/bailout-reason.h"
#include "src/common/globals.h"
#include "src/execution/isolate.h"
......@@ -635,6 +636,7 @@ class V8_EXPORT_PRIVATE BackgroundDeserializeTask {
Isolate* isolate_for_local_isolate_;
AlignedCachedData cached_data_;
CodeSerializer::OffThreadDeserializeData off_thread_data_;
BackgroundMergeTask background_merge_task_;
};
} // namespace internal
......
......@@ -485,9 +485,16 @@ Handle<CompilationCacheTable> CompilationCacheTable::PutScript(
cache = EnsureScriptTableCapacity(isolate, cache);
entry = cache->FindInsertionEntry(isolate, key.Hash());
}
// TODO(v8:12808): Once all code paths are updated to reuse a Script if
// available, we could DCHECK here that the Script in the existing entry
// matches the Script in the new key. For now, there is no such guarantee.
// We might be tempted to DCHECK here that the Script in the existing entry
// matches the Script in the new key. However, replacing an existing Script
// can still happen in some edge cases that aren't common enough to be worth
// fixing. Consider the following unlikely sequence of events:
// 1. BackgroundMergeTask::SetUpOnMainThread finds a script S1 in the cache.
// 2. DevTools is attached and clears the cache.
// 3. DevTools is detached; the cache is reenabled.
// 4. A new instance of the script, S2, is compiled and placed into the cache.
// 5. The merge from step 1 finishes on the main thread, still using S1, and
// places S1 into the cache, replacing S2.
cache->SetKeyAt(entry, *k);
cache->SetPrimaryValueAt(entry, *value);
if (!found_existing) {
......
......@@ -422,6 +422,9 @@ class WeakArrayList
// duplicates.
V8_EXPORT_PRIVATE bool RemoveOne(const MaybeObjectHandle& value);
// Searches the array (linear time) and returns whether it contains the value.
V8_EXPORT_PRIVATE bool Contains(MaybeObject value);
class Iterator;
private:
......
......@@ -4285,6 +4285,13 @@ bool WeakArrayList::RemoveOne(const MaybeObjectHandle& value) {
return false;
}
bool WeakArrayList::Contains(MaybeObject value) {
for (int i = 0; i < length(); ++i) {
if (Get(i) == value) return true;
}
return false;
}
// static
Handle<WeakArrayList> PrototypeUsers::Add(Isolate* isolate,
Handle<WeakArrayList> array,
......
......@@ -9,6 +9,7 @@
#include "src/base/logging.h"
#include "src/base/platform/elapsed-timer.h"
#include "src/base/platform/platform.h"
#include "src/codegen/background-merge-task.h"
#include "src/common/globals.h"
#include "src/handles/maybe-handles.h"
#include "src/handles/persistent-handles.h"
......@@ -465,6 +466,25 @@ MaybeHandle<SharedFunctionInfo> CodeSerializer::Deserialize(
return scope.CloseAndEscape(result);
}
Handle<Script> CodeSerializer::OffThreadDeserializeData::GetOnlyScript(
LocalHeap* heap) {
std::unique_ptr<PersistentHandles> previous_persistent_handles =
heap->DetachPersistentHandles();
heap->AttachPersistentHandles(std::move(persistent_handles));
DCHECK_EQ(scripts.size(), 1);
// Make a non-persistent handle to return.
Handle<Script> script = handle(*scripts[0], heap);
DCHECK_EQ(*script, maybe_result.ToHandleChecked()->script());
persistent_handles = heap->DetachPersistentHandles();
if (previous_persistent_handles) {
heap->AttachPersistentHandles(std::move(previous_persistent_handles));
}
return script;
}
CodeSerializer::OffThreadDeserializeData
CodeSerializer::StartDeserializeOffThread(LocalIsolate* local_isolate,
AlignedCachedData* cached_data) {
......@@ -496,7 +516,8 @@ CodeSerializer::StartDeserializeOffThread(LocalIsolate* local_isolate,
MaybeHandle<SharedFunctionInfo> CodeSerializer::FinishOffThreadDeserialize(
Isolate* isolate, OffThreadDeserializeData&& data,
AlignedCachedData* cached_data, Handle<String> source,
ScriptOriginOptions origin_options) {
ScriptOriginOptions origin_options,
BackgroundMergeTask* background_merge_task) {
base::ElapsedTimer timer;
if (FLAG_profile_deserialization || FLAG_log_function_events) timer.Start();
......@@ -543,23 +564,32 @@ MaybeHandle<SharedFunctionInfo> CodeSerializer::FinishOffThreadDeserialize(
DCHECK(data.persistent_handles->Contains(result.location()));
result = handle(*result, isolate);
// Fix up the source on the script. This should be the only deserialized
// script, and the off-thread deserializer should have set its source to
// the empty string.
DCHECK_EQ(data.scripts.size(), 1);
DCHECK_EQ(result->script(), *data.scripts[0]);
DCHECK_EQ(Script::cast(result->script()).source(),
ReadOnlyRoots(isolate).empty_string());
Script::cast(result->script()).set_source(*source);
// Fix up the script list to include the newly deserialized script.
Handle<WeakArrayList> list = isolate->factory()->script_list();
for (Handle<Script> script : data.scripts) {
DCHECK(data.persistent_handles->Contains(script.location()));
list =
WeakArrayList::AddToEnd(isolate, list, MaybeObjectHandle::Weak(script));
}
isolate->heap()->SetRootScriptList(*list);
if (background_merge_task &&
background_merge_task->HasPendingForegroundWork()) {
Handle<Script> script = handle(Script::cast(result->script()), isolate);
result = background_merge_task->CompleteMergeInForeground(isolate, script);
DCHECK(Script::cast(result->script()).source().StrictEquals(*source));
DCHECK(isolate->factory()->script_list()->Contains(
MaybeObject::MakeWeak(MaybeObject::FromObject(result->script()))));
} else {
// Fix up the source on the script. This should be the only deserialized
// script, and the off-thread deserializer should have set its source to
// the empty string.
DCHECK_EQ(data.scripts.size(), 1);
DCHECK_EQ(result->script(), *data.scripts[0]);
DCHECK_EQ(Script::cast(result->script()).source(),
ReadOnlyRoots(isolate).empty_string());
Script::cast(result->script()).set_source(*source);
// Fix up the script list to include the newly deserialized script.
Handle<WeakArrayList> list = isolate->factory()->script_list();
for (Handle<Script> script : data.scripts) {
DCHECK(data.persistent_handles->Contains(script.location()));
list = WeakArrayList::AddToEnd(isolate, list,
MaybeObjectHandle::Weak(script));
}
isolate->heap()->SetRootScriptList(*list);
}
if (FLAG_profile_deserialization) {
double ms = timer.Elapsed().InMillisecondsF();
......
......@@ -13,6 +13,7 @@ namespace v8 {
namespace internal {
class PersistentHandles;
class BackgroundMergeTask;
class V8_EXPORT_PRIVATE AlignedCachedData {
public:
......@@ -62,6 +63,10 @@ enum class SerializedCodeSanityCheckResult {
class CodeSerializer : public Serializer {
public:
struct OffThreadDeserializeData {
public:
bool HasResult() const { return !maybe_result.is_null(); }
Handle<Script> GetOnlyScript(LocalHeap* heap);
private:
friend class CodeSerializer;
MaybeHandle<SharedFunctionInfo> maybe_result;
......@@ -87,10 +92,11 @@ class CodeSerializer : public Serializer {
AlignedCachedData* cached_data);
V8_WARN_UNUSED_RESULT static MaybeHandle<SharedFunctionInfo>
FinishOffThreadDeserialize(Isolate* isolate, OffThreadDeserializeData&& data,
AlignedCachedData* cached_data,
Handle<String> source,
ScriptOriginOptions origin_options);
FinishOffThreadDeserialize(
Isolate* isolate, OffThreadDeserializeData&& data,
AlignedCachedData* cached_data, Handle<String> source,
ScriptOriginOptions origin_options,
BackgroundMergeTask* background_merge_task = nullptr);
uint32_t source_hash() const { return source_hash_; }
......
This diff is collapsed.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment