    [test] Fix occasional hangs on pool termination · cd1ee28b
    Michael Achenbach authored
    On termination of the worker pool in the main process, a SIGTERM is
    sent from pool to worker. It was meant to terminate long-running
    tests in the worker process. The signal handler on the worker side,
    however, was only registered during test execution. During the
    remaining logic (probably <1% of the time), the default system
    behavior for SIGTERM applied, which most likely just kills the
    process. A process killed this way might die while writing to the
    results queue, leaving corrupted data in it. When the main process
    later cleans up the queue, it hangs.
    
    We now register a default handler in the worker process that catches
    the SIGTERM and also gracefully stops the processing loop. This
    way, the SIGTERM signal is always handled in workers and never
    falls back to the default behavior of killing the process.
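
A minimal sketch of the idea, not the actual pool.py code: the worker
installs a SIGTERM handler before it starts processing, and the handler
only flips a flag, so the loop exits between items and a write to the
results queue is never cut short. The names Worker, keep_running and
process are hypothetical, and plain queue.Queue objects stand in for
the inter-process queues.

```python
import queue
import signal


class Worker:
    """Hypothetical worker; names are illustrative, not taken from pool.py."""

    def __init__(self, work_queue, result_queue, process):
        self.work_queue = work_queue
        self.result_queue = result_queue
        self.process = process  # callable that handles one work item
        self.keep_running = True

    def _on_sigterm(self, signum, frame):
        # Only flip a flag. The loop checks it between items, so a result
        # that is currently being written is completed first.
        self.keep_running = False

    def run(self):
        # Install the handler before any work is processed, so SIGTERM
        # never triggers the system default action inside this loop.
        previous = signal.signal(signal.SIGTERM, self._on_sigterm)
        try:
            while self.keep_running:
                try:
                    item = self.work_queue.get(timeout=0.1)
                except queue.Empty:
                    continue
                self.result_queue.put(self.process(item))
        finally:
            # Restore whatever handler was installed before run().
            signal.signal(signal.SIGTERM, previous)
```

signal.signal() may only be called from the main thread, so this
sketch assumes run() is the main loop of the worker process.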
    
    However, a small time window remains in which the SIGTERM is caught
    just as a test process starts, before the test-abort handler is
    registered. Fixing this is left as a TODO. Worst case, the main
    process blocks until the last test run is done.
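
The remaining window can be pictured with the following sketch, which
is an assumption about the general shape of such code rather than the
real test runner. run_test, abort_test and default_handler are
hypothetical names, and the example assumes a POSIX system.

```python
import signal
import subprocess
import sys


def run_test(cmd, default_handler):
    """Run one test command; SIGTERM aborts it once the handler is set."""
    proc = subprocess.Popen(cmd)
    # Window: a SIGTERM arriving here is still handled by default_handler,
    # which stops the worker loop but does not abort the running test.

    def abort_test(signum, frame):
        # Stop the test process, then let the default handler stop the loop.
        proc.terminate()
        default_handler(signum, frame)

    signal.signal(signal.SIGTERM, abort_test)
    try:
        return proc.wait()
    finally:
        # Outside of test execution, fall back to the worker's default handler.
        signal.signal(signal.SIGTERM, default_handler)
```

If the signal lands inside the window, nothing terminates the test
early in this sketch, which matches the worst case described above:
the caller waits until the test finishes on its own.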
    
    Bug: v8:13113
    Change-Id: Ib60f82c6a1569da042c9f44f7b516e2f40a46f93
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3882972
    Reviewed-by: Alexander Schulze <alexschulze@chromium.org>
    Commit-Queue: Michael Achenbach <machenbach@chromium.org>
    Cr-Commit-Position: refs/heads/main@{#83101}
pool.py 12.7 KB