Background merging of deserialized scripts
Recently, https://crrev.com/c/v8/v8/+/3681880 added new API functions with which an embedder could request that V8 merge newly deserialized script data into an existing Script from the Isolate's compilation cache. This change implements those new functions. This functionality is still disabled by default due to the flag merge_background_deserialized_script_with_compilation_cache. The goal of this new functionality is to reduce memory usage when multiple frames load the same script with a long delay between (long enough for the script to have been evicted from Blink's in-memory cache and for the top-level SharedFunctionInfo to be flushed). In that case, there are two Script objects for the same script: one which was found in the Isolate compilation cache (the "old" script), and one which was recently deserialized (the "new" script). The new script's object graph is essentially standalone: it may point to internalized strings and readonly objects such as the empty feedback metadata, but otherwise it is unconnected to the rest of the heap. The merging logic takes any useful data from the new script's object graph and attaches it into the old script's object graph, so that the new Script object and any other duplicated objects can be discarded. More specifically: 1. If the new Script has a SharedFunctionInfo for a particular function literal, and the old Script does not, then the old Script is updated to refer to the new SharedFunctionInfo. 2. If the new Script has a compiled SharedFunctionInfo for a particular function literal, and the old Script has an uncompiled SharedFunctionInfo, then the old SharedFunctionInfo is updated to point to the function_data and feedback_metadata from the new SharedFunctionInfo. 3. If any used object from the new object graph points to a SharedFunctionInfo, where the old object graph contains a matching SharedFunctionInfo for the same function literal, then that pointer is updated to point to the old SharedFunctionInfo. The document at [0] includes diagrams showing an example merge on a very small script. Steps 1 and 2 above are pretty simple, but step 3 requires walking a possibly large set of objects, so this new API lets the embedder run step 3 from a background thread. Steps 1 and 2 are performed later, on the main thread. The next important question is: in what ways can the old script's object graph be modified during the background execution of step 3, or during the time after step 3 but before steps 1 and 2? A. SharedFunctionInfos can go from compiled to uncompiled due to flushing. This is okay; the worst outcome is that the function would need to be compiled again later. Such a risk is already present, since V8 doesn't keep IsCompiledScopes for every compiled function in a background-deserialized script. B. SharedFunctionInfos can go from uncompiled to compiled due to lazy compilation. This is also okay; the merge completion logic on the main thread will just keep this lazily compiled data rather than inserting compiled data from the newly deserialized object graph. C. SharedFunctionInfos can be cleared from the Script's weak array if they are no longer referenced. This is mostly okay, because any SharedFunctionInfo that is needed by the background merge is strongly referenced and therefore can't be cleared. The only problem arises if the top-level SharedFunctionInfo gets cleared, so the merge task must deliberately keep a reference to that one. D. SharedFunctionInfos can be created if they are needed due to lazy compilation of a parent function. This change is somewhat troublesome because it invalidates the background thread's work and requires a re-traversal on the main thread to update any pointers that should point to this lazily compiled SharedFunctionInfo. At a high level, this change implements three previously unimplemented functions in BackgroundDeserializeTask (in compiler.cc) and updates one: - BackgroundDeserializeTask::SourceTextAvailable, run on the main thread, checks whether there is a matching Script in the Isolate compilation cache which doesn't already have a top-level SharedFunctionInfo. If so, it saves that Script in a persistent handle. - BackgroundDeserializeTask::ShouldMergeWithExistingScript checks whether the persistent handle from the first step exists (a fast operation which can be called from any thread). - BackgroundDeserializeTask::MergeWithExistingScript, run on a background thread, performs step 3 of the merge described above and generates lists of persistent data describing how the main thread can complete the merge. - BackgroundDeserializeTask::Finish is updated to perform the merge steps 1 and 2 listed above, as well as a possible re-traversal of the graph if required due to newly created SharedFunctionInfos in the old Script. The merge logic has nothing to do with deserialization, and indeed I hope to reuse it for background compilation tasks as well, so it is all contained within a new class BackgroundMergeTask (in compiler.h,cc). It uses a second class, ForwardPointersVisitor (in compiler.cc) to perform the object visitation that updates pointers to SharedFunctionInfos. [0] https://docs.google.com/document/d/1UksB5Vm7TT1-f3S9W1dK_rP9jKn_ly0WVm_UDPpWuBw/edit Bug: v8:12808 Change-Id: Id405869e9d5b106ca7afd9c4b08cb5813e6852c6 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3739232Reviewed-by: Leszek Swirski <leszeks@chromium.org> Commit-Queue: Seth Brenith <seth.brenith@microsoft.com> Cr-Commit-Position: refs/heads/main@{#81941}
Showing
This diff is collapsed.
This diff is collapsed.
Please
register
or
sign in
to comment