    Optimize check for existing files in download_from_google_storage · 1f067b88
    Michael Achenbach authored
    The first call to gsutil triggers a network request, which makes the
    download script slow even in the best case. This CL refactors the
    download script and moves the first gsutil call to after the local
    check of whether the sha1s match.
    
    1) This turns the input acquisition into a generator and buffers the
    files and sha1s in a list before multithreading.
    
    2) This sequentially checks the sha1s and files and bails out early if
    all match. In Chrome-land, we usually call this script with only one
    file. There are some cases with around 4. This could also be
    parallelized if the need arises.
    
    3) The initial gsutil check, which ensures gsutil is updated, is moved
    right in front of the multithreaded downloads.
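
    The three steps above can be sketched roughly as follows. This is an
    illustrative outline only, not the code in this CL: the names
    get_sha1, enumerate_inputs, needs_download and download_all, and the
    ensure_gsutil and download callbacks, are hypothetical stand-ins.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def get_sha1(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def enumerate_inputs(pairs):
    """Step 1: input acquisition as a generator of (sha1, path) pairs."""
    for sha1, path in pairs:
        yield sha1, path


def needs_download(sha1, path):
    """True if the local file is missing or its SHA-1 does not match."""
    return not os.path.exists(path) or get_sha1(path) != sha1


def download_all(pairs, ensure_gsutil, download, num_threads=1):
    """Return the number of files handed to the download callback.

    ensure_gsutil and download are hypothetical callbacks standing in for
    the gsutil update check and the actual per-file download.
    """
    # Step 1: buffer the generator output in a list before any threading.
    work = list(enumerate_inputs(pairs))

    # Step 2: check files sequentially; bail out early if all match.
    pending = [(s, p) for s, p in work if needs_download(s, p)]
    if not pending:
        return 0  # No network call was made.

    # Step 3: only now pay for the gsutil check, right before downloading.
    ensure_gsutil()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(lambda item: download(*item), pending))
    return len(pending)
```

    In this sketch the cost of the up-to-date case is reduced to hashing
    the local files, which matches the observation that the remaining
    time is dominated by the sha1 computation.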
    
    The runtime of one call to download_from_google_storage for an
    existing 500MB file is 2.3s before this CL and 1.2s after (most of the
    remaining time is spent computing the sha1sum).
    
    Example for full gclient runhooks (when everything is up-to-date):
    
    Chromium: 32s before, 12s after
    V8: 12s before, 3s after
    
    Bug: 776311
    Change-Id: Ia7715a6af84b1b336455ea88494d399bdb050317
    Reviewed-on: https://chromium-review.googlesource.com/897562
    Commit-Queue: Michael Achenbach <machenbach@chromium.org>
    Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org>
    Reviewed-by: Ryan Tseng <hinoka@chromium.org>
download_from_google_storage.py 22.4 KB