Commit 954146a5 authored by Gabriel Charette, committed by Commit Bot

Make TimeTicks::Now() high-resolution whenever possible with low-latency.

It was already always high-resolution on POSIX but was never high
resolution on Windows. Windows does support low latency high-resolution
timers for the majority of our user base.

TimeTicks::HighResolutionNow() was only explicitly requested in testing
frameworks. As such I left the call in place but made it DCHECK that
it's running on a Windows machine on which high-resolution clocks are
used. This confirms that none of our test fleet has regressed with this
change (the previous HighResolutionNow() used to be slightly more
aggressive and also used the high-resolution clock in a few
configurations where we now fall back to low-resolution).
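
As a rough, platform-neutral illustration of the contract described above (not the V8 code itself), the sketch below makes the high-resolution entry point a thin wrapper that asserts the clock is high-resolution and then defers to the ordinary one. `IsHighResolutionClock`, `NowMicros`, and `HighResolutionNowMicros` are hypothetical names, `std::chrono::steady_clock` stands in for the platform clock, and the check only inspects the clock's declared tick period, which is an assumption rather than a measurement.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ratio>

// Hypothetical check: true when the clock's declared tick period is at most
// one microsecond. This inspects the type only; it does not measure anything.
inline bool IsHighResolutionClock() {
  return std::ratio_less_equal<std::chrono::steady_clock::period,
                               std::micro>::value;
}

// Hypothetical stand-in for Now(): microseconds on a monotonic clock.
// The +1 mirrors the "never return 0" convention visible in the diff.
inline int64_t NowMicros() {
  const auto since_epoch = std::chrono::steady_clock::now().time_since_epoch();
  return std::chrono::duration_cast<std::chrono::microseconds>(since_epoch)
             .count() +
         1;
}

// Same value as NowMicros(), but asserts (in debug builds) that the clock
// is high-resolution, in the spirit of the DCHECK the commit adds.
inline int64_t HighResolutionNowMicros() {
  assert(IsHighResolutionClock());
  return NowMicros();
}
```

With this shape, callers that need fine-grained timing fail loudly in debug builds on a coarse clock instead of silently collecting useless measurements.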

This implementation was copied as-is (modulo minor v8 API
compatibility tweaks). These implementations were the same in the
past but had diverged when, sadly, the same bug was fixed separately
years apart, in Chromium and V8:
chromium: https://codereview.chromium.org/1284053004 + https://codereview.chromium.org/2393953003
v8: https://codereview.chromium.org/1304873011

This is a prerequisite to add metrics around parallel task execution
(low-resolution clocks are useless at that level, but we also don't want
to incur high-latency clocks on machines that can't afford it cheaply).

Bug: chromium:807606
Change-Id: Id18e7be895d8431ebd0e565a1bdf358fe7838489
Reviewed-on: https://chromium-review.googlesource.com/897485
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Commit-Queue: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#51027}
parent 0320986a
@@ -415,6 +415,11 @@ struct timeval Time::ToTimeval() const {
 #endif  // V8_OS_WIN

+// static
+TimeTicks TimeTicks::HighResolutionNow() {
+  DCHECK(TimeTicks::IsHighResolution());
+  return TimeTicks::Now();
+}

 Time Time::FromJsTime(double ms_since_epoch) {
   // The epoch is a valid time, so this constructor doesn't interpret
@@ -447,165 +452,221 @@ std::ostream& operator<<(std::ostream& os, const Time& time) {
 #if V8_OS_WIN

-class TickClock {
- public:
-  virtual ~TickClock() {}
-  virtual int64_t Now() = 0;
-  virtual bool IsHighResolution() = 0;
-};
-
-// Overview of time counters:
+namespace {
+
+// We define a wrapper to adapt between the __stdcall and __cdecl call of the
+// mock function, and to avoid a static constructor. Assigning an import to a
+// function pointer directly would require setup code to fetch from the IAT.
+DWORD timeGetTimeWrapper() { return timeGetTime(); }
+
+DWORD (*g_tick_function)(void) = &timeGetTimeWrapper;
+
+// A structure holding the most significant bits of "last seen" and a
+// "rollover" counter.
+union LastTimeAndRolloversState {
+  // The state as a single 32-bit opaque value.
+  base::Atomic32 as_opaque_32;
+
+  // The state as usable values.
+  struct {
+    // The top 8-bits of the "last" time. This is enough to check for rollovers
+    // and the small bit-size means fewer CompareAndSwap operations to store
+    // changes in state, which in turn makes for fewer retries.
+    uint8_t last_8;
+    // A count of the number of detected rollovers. Using this as bits 47-32
+    // of the upper half of a 64-bit value results in a 48-bit tick counter.
+    // This extends the total rollover period from about 49 days to about 8800
+    // years while still allowing it to be stored with last_8 in a single
+    // 32-bit value.
+    uint16_t rollovers;
+  } as_values;
+};
+base::Atomic32 g_last_time_and_rollovers = 0;
+static_assert(sizeof(LastTimeAndRolloversState) <=
+                  sizeof(g_last_time_and_rollovers),
+              "LastTimeAndRolloversState does not fit in a single atomic word");
+
+// We use timeGetTime() to implement TimeTicks::Now(). This can be problematic
+// because it returns the number of milliseconds since Windows has started,
+// which will roll over the 32-bit value every ~49 days. We try to track
+// rollover ourselves, which works if TimeTicks::Now() is called at least every
+// 48.8 days (not 49 days because only changes in the top 8 bits get noticed).
+TimeTicks RolloverProtectedNow() {
+  LastTimeAndRolloversState state;
+  DWORD now;  // DWORD is always unsigned 32 bits.
+
+  while (true) {
+    // Fetch the "now" and "last" tick values, updating "last" with "now" and
+    // incrementing the "rollovers" counter if the tick-value has wrapped back
+    // around. Atomic operations ensure that both "last" and "rollovers" are
+    // always updated together.
+    int32_t original = base::Acquire_Load(&g_last_time_and_rollovers);
+    state.as_opaque_32 = original;
+    now = g_tick_function();
+    uint8_t now_8 = static_cast<uint8_t>(now >> 24);
+    if (now_8 < state.as_values.last_8) ++state.as_values.rollovers;
+    state.as_values.last_8 = now_8;
+
+    // If the state hasn't changed, exit the loop.
+    if (state.as_opaque_32 == original) break;
+
+    // Save the changed state. If the existing value is unchanged from the
+    // original, exit the loop.
+    int32_t check = base::Release_CompareAndSwap(&g_last_time_and_rollovers,
+                                                 original, state.as_opaque_32);
+    if (check == original) break;
+
+    // Another thread has done something in between so retry from the top.
+  }
+
+  return TimeTicks() +
+         TimeDelta::FromMilliseconds(
+             now + (static_cast<uint64_t>(state.as_values.rollovers) << 32));
+}
+
+// Discussion of tick counter options on Windows:
+//
 // (1) CPU cycle counter. (Retrieved via RDTSC)
 // The CPU counter provides the highest resolution time stamp and is the least
-// expensive to retrieve. However, the CPU counter is unreliable and should not
-// be used in production. Its biggest issue is that it is per processor and it
-// is not synchronized between processors. Also, on some computers, the counters
-// will change frequency due to thermal and power changes, and stop in some
-// states.
+// expensive to retrieve. However, on older CPUs, two issues can affect its
+// reliability: First it is maintained per processor and not synchronized
+// between processors. Also, the counters will change frequency due to thermal
+// and power changes, and stop in some states.
 //
 // (2) QueryPerformanceCounter (QPC). The QPC counter provides a high-
-// resolution (100 nanoseconds) time stamp but is comparatively more expensive
-// to retrieve. What QueryPerformanceCounter actually does is up to the HAL.
-// (with some help from ACPI).
-// According to http://blogs.msdn.com/oldnewthing/archive/2005/09/02/459952.aspx
-// in the worst case, it gets the counter from the rollover interrupt on the
+// resolution (<1 microsecond) time stamp. On most hardware running today, it
+// auto-detects and uses the constant-rate RDTSC counter to provide extremely
+// efficient and reliable time stamps.
+//
+// On older CPUs where RDTSC is unreliable, it falls back to using more
+// expensive (20X to 40X more costly) alternate clocks, such as HPET or the ACPI
+// PM timer, and can involve system calls; and all this is up to the HAL (with
+// some help from ACPI). According to
+// http://blogs.msdn.com/oldnewthing/archive/2005/09/02/459952.aspx, in the
+// worst case, it gets the counter from the rollover interrupt on the
 // programmable interrupt timer. In best cases, the HAL may conclude that the
 // RDTSC counter runs at a constant frequency, then it uses that instead. On
 // multiprocessor machines, it will try to verify the values returned from
 // RDTSC on each processor are consistent with each other, and apply a handful
 // of workarounds for known buggy hardware. In other words, QPC is supposed to
-// give consistent result on a multiprocessor computer, but it is unreliable in
-// reality due to bugs in BIOS or HAL on some, especially old computers.
-// With recent updates on HAL and newer BIOS, QPC is getting more reliable but
-// it should be used with caution.
+// give consistent results on a multiprocessor computer, but for older CPUs it
+// can be unreliable due bugs in BIOS or HAL.
 //
-// (3) System time. The system time provides a low-resolution (typically 10ms
-// to 55 milliseconds) time stamp but is comparatively less expensive to
-// retrieve and more reliable.
-class HighResolutionTickClock final : public TickClock {
- public:
-  explicit HighResolutionTickClock(int64_t ticks_per_second)
-      : ticks_per_second_(ticks_per_second) {
-    DCHECK_LT(0, ticks_per_second);
-  }
-  virtual ~HighResolutionTickClock() {}
-
-  int64_t Now() override {
-    uint64_t now = QPCNowRaw();
-
-    // Intentionally calculate microseconds in a round about manner to avoid
-    // overflow and precision issues. Think twice before simplifying!
-    int64_t whole_seconds = now / ticks_per_second_;
-    int64_t leftover_ticks = now % ticks_per_second_;
-    int64_t ticks = (whole_seconds * Time::kMicrosecondsPerSecond) +
-        ((leftover_ticks * Time::kMicrosecondsPerSecond) / ticks_per_second_);
-
-    // Make sure we never return 0 here, so that TimeTicks::HighResolutionNow()
-    // will never return 0.
-    return ticks + 1;
-  }
-
-  bool IsHighResolution() override { return true; }
-
- private:
-  int64_t ticks_per_second_;
-};
-
-class RolloverProtectedTickClock final : public TickClock {
- public:
-  RolloverProtectedTickClock() : rollover_(0) {}
-  virtual ~RolloverProtectedTickClock() {}
-
-  int64_t Now() override {
-    // We use timeGetTime() to implement TimeTicks::Now(), which rolls over
-    // every ~49.7 days. We try to track rollover ourselves, which works if
-    // TimeTicks::Now() is called at least every 24 days.
-    // Note that we do not use GetTickCount() here, since timeGetTime() gives
-    // more predictable delta values, as described here:
-    // http://blogs.msdn.com/b/larryosterman/archive/2009/09/02/what-s-the-difference-between-gettickcount-and-timegettime.aspx
-    // timeGetTime() provides 1ms granularity when combined with
-    // timeBeginPeriod(). If the host application for V8 wants fast timers, it
-    // can use timeBeginPeriod() to increase the resolution.
-    // We use a lock-free version because the sampler thread calls it
-    // while having the rest of the world stopped, that could cause a deadlock.
-    base::Atomic32 rollover = base::Acquire_Load(&rollover_);
-    uint32_t now = static_cast<uint32_t>(timeGetTime());
-    if ((now >> 31) != static_cast<uint32_t>(rollover & 1)) {
-      base::Release_CompareAndSwap(&rollover_, rollover, rollover + 1);
-      ++rollover;
-    }
-    uint64_t ms = (static_cast<uint64_t>(rollover) << 31) | now;
-    return static_cast<int64_t>(ms * Time::kMicrosecondsPerMillisecond);
-  }
-
-  bool IsHighResolution() override { return false; }
-
- private:
-  base::Atomic32 rollover_;
-};
-
-static LazyStaticInstance<RolloverProtectedTickClock,
-                          DefaultConstructTrait<RolloverProtectedTickClock>,
-                          ThreadSafeInitOnceTrait>::type tick_clock =
-    LAZY_STATIC_INSTANCE_INITIALIZER;
-
-struct CreateHighResTickClockTrait {
-  static TickClock* Create() {
-    // Check if the installed hardware supports a high-resolution performance
-    // counter, and if not fallback to the low-resolution tick clock.
-    LARGE_INTEGER ticks_per_second;
-    if (!QueryPerformanceFrequency(&ticks_per_second)) {
-      return tick_clock.Pointer();
-    }
-
-    // If QPC not reliable, fallback to low-resolution tick clock.
-    if (IsQPCReliable()) {
-      return tick_clock.Pointer();
-    }
-
-    return new HighResolutionTickClock(ticks_per_second.QuadPart);
-  }
-};
-
-static LazyDynamicInstance<TickClock, CreateHighResTickClockTrait,
-                           ThreadSafeInitOnceTrait>::type high_res_tick_clock =
-    LAZY_DYNAMIC_INSTANCE_INITIALIZER;
+// (3) System time. The system time provides a low-resolution (from ~1 to ~15.6
+// milliseconds) time stamp but is comparatively less expensive to retrieve and
+// more reliable. Time::EnableHighResolutionTimer() and
+// Time::ActivateHighResolutionTimer() can be called to alter the resolution of
+// this timer; and also other Windows applications can alter it, affecting this
+// one.
+
+TimeTicks InitialTimeTicksNowFunction();
+
+// See "threading notes" in InitializeNowFunctionPointer() for details on how
+// concurrent reads/writes to these globals has been made safe.
+using TimeTicksNowFunction = decltype(&TimeTicks::Now);
+TimeTicksNowFunction g_time_ticks_now_function = &InitialTimeTicksNowFunction;
+int64_t g_qpc_ticks_per_second = 0;
+
+// As of January 2015, use of <atomic> is forbidden in Chromium code. This is
+// what std::atomic_thread_fence does on Windows on all Intel architectures when
+// the memory_order argument is anything but std::memory_order_seq_cst:
+#define ATOMIC_THREAD_FENCE(memory_order) _ReadWriteBarrier();
+
+TimeDelta QPCValueToTimeDelta(LONGLONG qpc_value) {
+  // Ensure that the assignment to |g_qpc_ticks_per_second|, made in
+  // InitializeNowFunctionPointer(), has happened by this point.
+  ATOMIC_THREAD_FENCE(memory_order_acquire);
+
+  DCHECK_GT(g_qpc_ticks_per_second, 0);
+
+  // If the QPC Value is below the overflow threshold, we proceed with
+  // simple multiply and divide.
+  if (qpc_value < TimeTicks::kQPCOverflowThreshold) {
+    return TimeDelta::FromMicroseconds(
+        qpc_value * TimeTicks::kMicrosecondsPerSecond / g_qpc_ticks_per_second);
+  }
+  // Otherwise, calculate microseconds in a round about manner to avoid
+  // overflow and precision issues.
+  int64_t whole_seconds = qpc_value / g_qpc_ticks_per_second;
+  int64_t leftover_ticks = qpc_value - (whole_seconds * g_qpc_ticks_per_second);
+  return TimeDelta::FromMicroseconds(
+      (whole_seconds * TimeTicks::kMicrosecondsPerSecond) +
+      ((leftover_ticks * TimeTicks::kMicrosecondsPerSecond) /
+       g_qpc_ticks_per_second));
+}
+
+TimeTicks QPCNow() { return TimeTicks() + QPCValueToTimeDelta(QPCNowRaw()); }
+
+bool IsBuggyAthlon(const CPU& cpu) {
+  // On Athlon X2 CPUs (e.g. model 15) QueryPerformanceCounter is unreliable.
+  return strcmp(cpu.vendor(), "AuthenticAMD") == 0 && cpu.family() == 15;
+}
+
+void InitializeTimeTicksNowFunctionPointer() {
+  LARGE_INTEGER ticks_per_sec = {};
+  if (!QueryPerformanceFrequency(&ticks_per_sec)) ticks_per_sec.QuadPart = 0;
+
+  // If Windows cannot provide a QPC implementation, TimeTicks::Now() must use
+  // the low-resolution clock.
+  //
+  // If the QPC implementation is expensive and/or unreliable, TimeTicks::Now()
+  // will still use the low-resolution clock. A CPU lacking a non-stop time
+  // counter will cause Windows to provide an alternate QPC implementation that
+  // works, but is expensive to use. Certain Athlon CPUs are known to make the
+  // QPC implementation unreliable.
+  //
+  // Otherwise, Now uses the high-resolution QPC clock. As of 21 August 2015,
+  // ~72% of users fall within this category.
+  TimeTicksNowFunction now_function;
+  CPU cpu;
+  if (ticks_per_sec.QuadPart <= 0 || !cpu.has_non_stop_time_stamp_counter() ||
+      IsBuggyAthlon(cpu)) {
+    now_function = &RolloverProtectedNow;
+  } else {
+    now_function = &QPCNow;
+  }
+
+  // Threading note 1: In an unlikely race condition, it's possible for two or
+  // more threads to enter InitializeNowFunctionPointer() in parallel. This is
+  // not a problem since all threads should end up writing out the same values
+  // to the global variables.
+  //
+  // Threading note 2: A release fence is placed here to ensure, from the
+  // perspective of other threads using the function pointers, that the
+  // assignment to |g_qpc_ticks_per_second| happens before the function pointers
+  // are changed.
+  g_qpc_ticks_per_second = ticks_per_sec.QuadPart;
+  ATOMIC_THREAD_FENCE(memory_order_release);
+  g_time_ticks_now_function = now_function;
+}
+
+TimeTicks InitialTimeTicksNowFunction() {
+  InitializeTimeTicksNowFunctionPointer();
+  return g_time_ticks_now_function();
+}
+
+#undef ATOMIC_THREAD_FENCE
+
+}  // namespace

 // static
 TimeTicks TimeTicks::Now() {
   // Make sure we never return 0 here.
-  TimeTicks ticks(tick_clock.Pointer()->Now());
+  TimeTicks ticks(g_time_ticks_now_function());
   DCHECK(!ticks.IsNull());
   return ticks;
 }

 // static
-TimeTicks TimeTicks::HighResolutionNow() {
-  // Make sure we never return 0 here.
-  TimeTicks ticks(high_res_tick_clock.Pointer()->Now());
-  DCHECK(!ticks.IsNull());
-  return ticks;
-}
-
-// static
-bool TimeTicks::IsHighResolutionClockWorking() {
-  return high_res_tick_clock.Pointer()->IsHighResolution();
+bool TimeTicks::IsHighResolution() {
+  if (g_time_ticks_now_function == &InitialTimeTicksNowFunction)
+    InitializeTimeTicksNowFunctionPointer();
+  return g_time_ticks_now_function == &QPCNow;
 }

 #else  // V8_OS_WIN

 TimeTicks TimeTicks::Now() {
-  return HighResolutionNow();
-}
-
-TimeTicks TimeTicks::HighResolutionNow() {
   int64_t ticks;
 #if V8_OS_MACOSX
   static struct mach_timebase_info info;
@@ -627,11 +688,8 @@ TimeTicks TimeTicks::HighResolutionNow() {
   return TimeTicks(ticks + 1);
 }

 // static
-bool TimeTicks::IsHighResolutionClockWorking() {
-  return true;
-}
+bool TimeTicks::IsHighResolution() { return true; }

 #endif  // V8_OS_WIN
......
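
The rollover tracking in RolloverProtectedNow() can be hard to follow in diff form, so here is a minimal portable sketch of the same idea: keep the top 8 bits of the last tick value and a 16-bit rollover count packed in one 32-bit atomic word, and bump the count whenever the top byte goes backwards. This is an illustration under assumptions, not the V8 code: it uses `std::atomic` rather than `base::Atomic32`, packs the fields with shifts instead of a union, and takes the raw 32-bit tick as a parameter rather than re-reading the clock on each retry. `g_packed_state` and `ExtendTicks` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Packed state (hypothetical layout): bits 0-7 hold the top byte of the last
// tick value seen, bits 8-23 hold the number of rollovers detected so far.
std::atomic<uint32_t> g_packed_state{0};

// Extends a wrapping 32-bit millisecond tick value to a 48-bit value.
// Correct only if it is called often enough that the top byte of |now|
// never wraps unnoticed between two calls.
uint64_t ExtendTicks(uint32_t now) {
  uint32_t original = g_packed_state.load(std::memory_order_acquire);
  while (true) {
    const uint8_t last_8 = static_cast<uint8_t>(original & 0xFF);
    uint16_t rollovers = static_cast<uint16_t>((original >> 8) & 0xFFFF);
    const uint8_t now_8 = static_cast<uint8_t>(now >> 24);

    // The top byte going backwards means the 32-bit counter wrapped around.
    if (now_8 < last_8) rollovers = static_cast<uint16_t>(rollovers + 1);

    const uint32_t updated = (static_cast<uint32_t>(rollovers) << 8) | now_8;

    // Nothing to store, or the store succeeded: done.
    if (updated == original ||
        g_packed_state.compare_exchange_weak(original, updated,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
      return (static_cast<uint64_t>(rollovers) << 32) | now;
    }
    // On failure compare_exchange_weak reloaded |original|; retry with it.
  }
}
```

Storing only the top byte means most calls leave the packed word unchanged and skip the compare-and-swap entirely, which is the motivation the diff's own comment gives for the small field.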
@@ -5,6 +5,8 @@
 #ifndef V8_BASE_PLATFORM_TIME_H_
 #define V8_BASE_PLATFORM_TIME_H_

+#include <stdint.h>
+
 #include <ctime>
 #include <iosfwd>
 #include <limits>
@@ -177,22 +179,29 @@ namespace time_internal {
 template<class TimeClass>
 class TimeBase {
  public:
-  static const int64_t kHoursPerDay = 24;
-  static const int64_t kMillisecondsPerSecond = 1000;
-  static const int64_t kMillisecondsPerDay =
-      kMillisecondsPerSecond * 60 * 60 * kHoursPerDay;
-  static const int64_t kMicrosecondsPerMillisecond = 1000;
-  static const int64_t kMicrosecondsPerSecond =
-      kMicrosecondsPerMillisecond * kMillisecondsPerSecond;
-  static const int64_t kMicrosecondsPerMinute = kMicrosecondsPerSecond * 60;
-  static const int64_t kMicrosecondsPerHour = kMicrosecondsPerMinute * 60;
-  static const int64_t kMicrosecondsPerDay =
-      kMicrosecondsPerHour * kHoursPerDay;
-  static const int64_t kMicrosecondsPerWeek = kMicrosecondsPerDay * 7;
-  static const int64_t kNanosecondsPerMicrosecond = 1000;
-  static const int64_t kNanosecondsPerSecond =
-      kNanosecondsPerMicrosecond * kMicrosecondsPerSecond;
+  static constexpr int64_t kHoursPerDay = 24;
+  static constexpr int64_t kMillisecondsPerSecond = 1000;
+  static constexpr int64_t kMillisecondsPerDay =
+      kMillisecondsPerSecond * 60 * 60 * kHoursPerDay;
+  static constexpr int64_t kMicrosecondsPerMillisecond = 1000;
+  static constexpr int64_t kMicrosecondsPerSecond =
+      kMicrosecondsPerMillisecond * kMillisecondsPerSecond;
+  static constexpr int64_t kMicrosecondsPerMinute = kMicrosecondsPerSecond * 60;
+  static constexpr int64_t kMicrosecondsPerHour = kMicrosecondsPerMinute * 60;
+  static constexpr int64_t kMicrosecondsPerDay =
+      kMicrosecondsPerHour * kHoursPerDay;
+  static constexpr int64_t kMicrosecondsPerWeek = kMicrosecondsPerDay * 7;
+  static constexpr int64_t kNanosecondsPerMicrosecond = 1000;
+  static constexpr int64_t kNanosecondsPerSecond =
+      kNanosecondsPerMicrosecond * kMicrosecondsPerSecond;
+
+#if V8_OS_WIN
+  // To avoid overflow in QPC to Microseconds calculations, since we multiply
+  // by kMicrosecondsPerSecond, then the QPC value should not exceed
+  // (2^63 - 1) / 1E6. If it exceeds that threshold, we divide then multiply.
+  static constexpr int64_t kQPCOverflowThreshold = INT64_C(0x8637BD05AF7);
+#endif

   // Returns true if this object has not been initialized.
   //
   // Warning: Be careful when writing code that performs math on time values,
@@ -345,21 +354,20 @@ class V8_BASE_EXPORT TimeTicks final
  public:
   TimeTicks() : TimeBase(0) {}

-  // Platform-dependent tick count representing "right now."
-  // The resolution of this clock is ~1-15ms. Resolution varies depending
-  // on hardware/operating system configuration.
+  // Platform-dependent tick count representing "right now." When
+  // IsHighResolution() returns false, the resolution of the clock could be as
+  // coarse as ~15.6ms. Otherwise, the resolution should be no worse than one
+  // microsecond.
   // This method never returns a null TimeTicks.
   static TimeTicks Now();

-  // Returns a platform-dependent high-resolution tick count. Implementation
-  // is hardware dependent and may or may not return sub-millisecond
-  // resolution. THIS CALL IS GENERALLY MUCH MORE EXPENSIVE THAN Now() AND
-  // SHOULD ONLY BE USED WHEN IT IS REALLY NEEDED.
-  // This method never returns a null TimeTicks.
+  // This is equivalent to Now() but DCHECKs that IsHighResolution(). Useful for
+  // test frameworks that rely on high resolution clocks (in practice all
+  // platforms but low-end Windows devices have high resolution clocks).
   static TimeTicks HighResolutionNow();

   // Returns true if the high-resolution clock is working on this system.
-  static bool IsHighResolutionClockWorking();
+  static bool IsHighResolution();

  private:
   friend class time_internal::TimeBase<TimeTicks>;
......
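
The overflow threshold introduced in this header drives a two-path conversion from counter ticks to microseconds: multiply first when the product fits in 64 bits, otherwise split into whole seconds and leftover ticks. The sketch below restates that arithmetic in portable C++ so it can be checked in isolation. `TicksToMicros`, `kMicrosPerSecond`, and `kMaxSafeTicks` are hypothetical names, and the slow path assumes a realistic counter frequency (roughly 1 MHz or higher, and well below 9.2e12 Hz) so that neither partial product overflows.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr int64_t kMicrosPerSecond = 1000000;

// Largest tick count that can be multiplied by kMicrosPerSecond without
// overflowing int64_t, i.e. floor((2^63 - 1) / 1E6).
constexpr int64_t kMaxSafeTicks =
    std::numeric_limits<int64_t>::max() / kMicrosPerSecond;

// Converts a raw counter value to microseconds, truncating toward zero.
inline int64_t TicksToMicros(int64_t ticks, int64_t ticks_per_second) {
  assert(ticks >= 0 && ticks_per_second > 0);

  // Fast path: multiply then divide keeps full precision and cannot overflow.
  if (ticks <= kMaxSafeTicks) {
    return ticks * kMicrosPerSecond / ticks_per_second;
  }

  // Slow path: convert whole seconds and the sub-second remainder separately.
  const int64_t whole_seconds = ticks / ticks_per_second;
  const int64_t leftover_ticks = ticks % ticks_per_second;
  return whole_seconds * kMicrosPerSecond +
         leftover_ticks * kMicrosPerSecond / ticks_per_second;
}
```

Under these definitions `kMaxSafeTicks + 1` equals `0x8637BD05AF7`, the constant in the diff, which is why the original code compares with `<` where this sketch uses `<=`.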
@@ -153,21 +153,15 @@ TEST(Time, NowResolution) {

 TEST(TimeTicks, NowResolution) {
-  // We assume that TimeTicks::Now() has at least 16ms resolution.
-  static const TimeDelta kTargetGranularity = TimeDelta::FromMilliseconds(16);
+  // TimeTicks::Now() is documented as having "no worse than one microsecond"
+  // resolution. Unless !TimeTicks::IsHighResolution() in which case the clock
+  // could be as coarse as ~15.6ms.
+  const TimeDelta kTargetGranularity = TimeTicks::IsHighResolution()
+                                           ? TimeDelta::FromMicroseconds(1)
+                                           : TimeDelta::FromMilliseconds(16);
   ResolutionTest<TimeTicks>(&TimeTicks::Now, kTargetGranularity);
 }

-TEST(TimeTicks, HighResolutionNowResolution) {
-  if (!TimeTicks::IsHighResolutionClockWorking()) return;
-
-  // We assume that TimeTicks::HighResolutionNow() has sub-ms resolution.
-  static const TimeDelta kTargetGranularity = TimeDelta::FromMilliseconds(1);
-  ResolutionTest<TimeTicks>(&TimeTicks::HighResolutionNow, kTargetGranularity);
-}
-
 TEST(TimeTicks, IsMonotonic) {
   TimeTicks previous_normal_ticks;
   TimeTicks previous_highres_ticks;
......
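
The updated test compares the clock against a target granularity, but the body of ResolutionTest is not part of this diff. As a hedged illustration of how such a measurement can work, the sketch below spins on a monotonic clock until its reading changes and keeps the smallest step seen. `SmallestObservedStep` is a hypothetical helper, and `std::chrono::steady_clock` is an assumed stand-in for TimeTicks::Now().

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Returns the smallest non-zero step observed between consecutive readings
// of a monotonic clock, over |samples| independent attempts.
inline std::chrono::nanoseconds SmallestObservedStep(int samples) {
  using Clock = std::chrono::steady_clock;
  assert(samples > 0);

  std::chrono::nanoseconds smallest = std::chrono::nanoseconds::max();
  for (int i = 0; i < samples; ++i) {
    const Clock::time_point start = Clock::now();
    Clock::time_point next = Clock::now();
    // Spin until the reading changes; a coarse clock repeats the same value.
    while (next == start) next = Clock::now();
    smallest = std::min(
        smallest,
        std::chrono::duration_cast<std::chrono::nanoseconds>(next - start));
  }
  return smallest;
}
```

A caller could compare the result against one microsecond or 16 milliseconds, matching the two targets in the test, although scheduling noise means the observed step is only an upper bound on the true granularity.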