• Seth Brenith's avatar
    Background merging of deserialized scripts · e895b7af
    Seth Brenith authored
    Recently, https://crrev.com/c/v8/v8/+/3681880 added new API functions
    with which an embedder could request that V8 merge newly deserialized
    script data into an existing Script from the Isolate's compilation
    cache. This change implements those new functions. This functionality is
    still disabled by default due to the flag
    merge_background_deserialized_script_with_compilation_cache.
    
    The goal of this new functionality is to reduce memory usage when
    multiple frames load the same script with a long delay between (long
    enough for the script to have been evicted from Blink's in-memory cache
    and for the top-level SharedFunctionInfo to be flushed). In that case,
    there are two Script objects for the same script: one which was found in
    the Isolate compilation cache (the "old" script), and one which was
    recently deserialized (the "new" script). The new script's object graph
    is essentially standalone: it may point to internalized strings and
    readonly objects such as the empty feedback metadata, but otherwise
    it is unconnected to the rest of the heap. The merging logic takes any
    useful data from the new script's object graph and attaches it into the
    old script's object graph, so that the new Script object and any other
    duplicated objects can be discarded. More specifically:
    
    1. If the new Script has a SharedFunctionInfo for a particular function
       literal, and the old Script does not, then the old Script is updated
       to refer to the new SharedFunctionInfo.
    2. If the new Script has a compiled SharedFunctionInfo for a particular
       function literal, and the old Script has an uncompiled
       SharedFunctionInfo, then the old SharedFunctionInfo is updated to
       point to the function_data and feedback_metadata from the new
       SharedFunctionInfo.
    3. If any used object from the new object graph points to a
       SharedFunctionInfo, where the old object graph contains a matching
       SharedFunctionInfo for the same function literal, then that pointer
       is updated to point to the old SharedFunctionInfo.
    
    The document at [0] includes diagrams showing an example merge on a very
    small script.
    
    Steps 1 and 2 above are pretty simple, but step 3 requires walking a
    possibly large set of objects, so this new API lets the embedder run
    step 3 from a background thread. Steps 1 and 2 are performed later, on
    the main thread.
    
    The next important question is: in what ways can the old script's object
    graph be modified during the background execution of step 3, or during
    the time after step 3 but before steps 1 and 2?
    
    A. SharedFunctionInfos can go from compiled to uncompiled due to
       flushing. This is okay; the worst outcome is that the function would
       need to be compiled again later. Such a risk is already present,
       since V8 doesn't keep IsCompiledScopes for every compiled function in
       a background-deserialized script.
    B. SharedFunctionInfos can go from uncompiled to compiled due to lazy
       compilation. This is also okay; the merge completion logic on the
       main thread will just keep this lazily compiled data rather than
       inserting compiled data from the newly deserialized object graph.
    C. SharedFunctionInfos can be cleared from the Script's weak array if
       they are no longer referenced. This is mostly okay, because any
       SharedFunctionInfo that is needed by the background merge is strongly
       referenced and therefore can't be cleared. The only problem arises if
       the top-level SharedFunctionInfo gets cleared, so the merge task must
       deliberately keep a reference to that one.
    D. SharedFunctionInfos can be created if they are needed due to lazy
       compilation of a parent function. This change is somewhat troublesome
       because it invalidates the background thread's work and requires a
       re-traversal on the main thread to update any pointers that should
       point to this lazily compiled SharedFunctionInfo.
    
    At a high level, this change implements three previously unimplemented
    functions in BackgroundDeserializeTask (in compiler.cc) and updates one:
    
    - BackgroundDeserializeTask::SourceTextAvailable, run on the main
      thread, checks whether there is a matching Script in the Isolate
      compilation cache which doesn't already have a top-level
      SharedFunctionInfo. If so, it saves that Script in a persistent
      handle.
    - BackgroundDeserializeTask::ShouldMergeWithExistingScript checks
      whether the persistent handle from the first step exists (a fast
      operation which can be called from any thread).
    - BackgroundDeserializeTask::MergeWithExistingScript, run on a
      background thread, performs step 3 of the merge described above and
      generates lists of persistent data describing how the main thread can
      complete the merge.
    - BackgroundDeserializeTask::Finish is updated to perform the merge
      steps 1 and 2 listed above, as well as a possible re-traversal of the
      graph if required due to newly created SharedFunctionInfos in the old
      Script.
    
    The merge logic has nothing to do with deserialization, and indeed I
    hope to reuse it for background compilation tasks as well, so it is all
    contained within a new class BackgroundMergeTask (in compiler.h,cc). It
    uses a second class, ForwardPointersVisitor (in compiler.cc) to perform
    the object visitation that updates pointers to SharedFunctionInfos.
    
    [0] https://docs.google.com/document/d/1UksB5Vm7TT1-f3S9W1dK_rP9jKn_ly0WVm_UDPpWuBw/edit
    
    Bug: v8:12808
    Change-Id: Id405869e9d5b106ca7afd9c4b08cb5813e6852c6
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3739232Reviewed-by: 's avatarLeszek Swirski <leszeks@chromium.org>
    Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
    Cr-Commit-Position: refs/heads/main@{#81941}
    e895b7af
Name
Last commit
Last update
.github Loading commit data...
bazel Loading commit data...
build_overrides Loading commit data...
custom_deps Loading commit data...
docs Loading commit data...
gni Loading commit data...
include Loading commit data...
infra Loading commit data...
samples Loading commit data...
src Loading commit data...
test Loading commit data...
testing Loading commit data...
third_party Loading commit data...
tools Loading commit data...
.bazelrc Loading commit data...
.clang-format Loading commit data...
.clang-tidy Loading commit data...
.editorconfig Loading commit data...
.flake8 Loading commit data...
.git-blame-ignore-revs Loading commit data...
.gitattributes Loading commit data...
.gitignore Loading commit data...
.gn Loading commit data...
.mailmap Loading commit data...
.style.yapf Loading commit data...
.vpython Loading commit data...
.vpython3 Loading commit data...
.ycm_extra_conf.py Loading commit data...
AUTHORS Loading commit data...
BUILD.bazel Loading commit data...
BUILD.gn Loading commit data...
CODE_OF_CONDUCT.md Loading commit data...
COMMON_OWNERS Loading commit data...
DEPS Loading commit data...
DIR_METADATA Loading commit data...
ENG_REVIEW_OWNERS Loading commit data...
INFRA_OWNERS Loading commit data...
INTL_OWNERS Loading commit data...
LICENSE Loading commit data...
LICENSE.fdlibm Loading commit data...
LICENSE.strongtalk Loading commit data...
LICENSE.v8 Loading commit data...
LOONG_OWNERS Loading commit data...
MIPS_OWNERS Loading commit data...
OWNERS Loading commit data...
PPC_OWNERS Loading commit data...
PRESUBMIT.py Loading commit data...
README.md Loading commit data...
RISCV_OWNERS Loading commit data...
S390_OWNERS Loading commit data...
WATCHLISTS Loading commit data...
WORKSPACE Loading commit data...
codereview.settings Loading commit data...