[test] Fix occasional hangs on pool termination
On termination of the worker pool in the main process, a SIGTERM is sent from the pool to the worker. It was meant to terminate long-running tests in the worker process. The signal handler on the worker side, however, was only registered during test execution. During the remaining logic (probably <1% of the time), the default system behavior for SIGTERM would apply, which will likely just kill the process.

A process killed ungracefully might die while writing to the results queue, leaving the queue with corrupted data. When the main process later cleans up the queue, it hangs.

We now register a default handler in the worker process that catches SIGTERM and gracefully stops the processing loop. This way, SIGTERM will always be handled in workers and never fall back to SIGKILL.

However, a small time window remains: SIGTERM may be caught right when a test process is starting, before the test-abort handler is registered. Fixing this is left as a TODO. In the worst case, the main process will block until the last test run is done.

Bug: v8:13113
Change-Id: Ib60f82c6a1569da042c9f44f7b516e2f40a46f93
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3882972
Reviewed-by: Alexander Schulze <alexschulze@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>
Cr-Commit-Position: refs/heads/main@{#83101}
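
For illustration only, the default-handler idea described in this message could look roughly like the sketch below. It is not the code from this change: `Worker`, `keep_running`, and `_on_sigterm` are hypothetical names, a `None` job is an assumed shutdown sentinel, and an in-process `queue.Queue` stands in for the multiprocessing queues a real pool would use. It assumes a POSIX system and that `run()` is called from the main thread, since Python only allows signal handlers to be installed there.

```python
import os
import queue
import signal


class Worker:
    """Hypothetical worker loop; not the actual V8 test runner code."""

    def __init__(self, work_queue, result_queue):
        self.work_queue = work_queue
        self.result_queue = result_queue
        self.keep_running = True

    def _on_sigterm(self, signum, frame):
        # Only flip a flag. The loop then exits at a safe point instead of
        # the process dying in the middle of a write to the result queue.
        self.keep_running = False

    def run(self):
        # Install the default handler before any work starts, so SIGTERM is
        # caught for the whole lifetime of the processing loop.
        previous = signal.signal(signal.SIGTERM, self._on_sigterm)
        try:
            while self.keep_running:
                try:
                    job = self.work_queue.get(timeout=0.1)
                except queue.Empty:
                    continue
                if job is None:  # assumed sentinel for a regular shutdown
                    break
                self.result_queue.put(job())
        finally:
            # Restore whatever handler was active before the loop.
            signal.signal(signal.SIGTERM, previous)
```

In this sketch a job that is already running finishes and its result is written completely before the loop stops, which mirrors the worst case noted above: the main process may have to wait for the last test run.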