Why do two logical cores belonging to the same physical core have different frequencies?


When Turbo mode is enabled on the CPU, the operating frequency fluctuates with changes in the workload. Using the turbostat command, I observed that each core runs at a different frequency during operation.

Interestingly, even with hyper-threading, the two logical cores belonging to the same physical core also report different frequencies. Since both logical cores run on the same physical core, why do they have different operating frequencies?

Furthermore, Intel CPUs have a concept of "turbo level" for AVX instructions, where lvl1/2 instructions can cause the CPU to run at a lower frequency. Does this behavior affect the frequency of the other logical core on the same physical core?


Answer by Peter Cordes

That's not physically possible. It probably changed frequencies between your observations.

"Physical core" isn't an abstraction: there literally is one physical core (which executes the instructions of a pair of logical cores), and it has a specific clock speed and voltage at any given time.

Changing frequency involves pausing the clock for a few microseconds while the new voltage and/or frequency stabilize (see "Lost Cycles on Intel? An inconsistency between rdtsc and CPU_CLK_UNHALTED.REF_TSC"). And yes, changing voltage at the same frequency can be necessary for 256-bit or 512-bit instructions, to give some headroom for the larger voltage swings caused by the more widely varying current drawn by SIMD multipliers, which may or may not be idle in a given cycle.

So yes, wide SIMD instructions lowering the frequency of a core will affect both logical cores. But probably not other physical cores; at least for turbo above the "base" speed, even desktop / laptop CPUs should be able to independently change clocks.

For speeds below the rated speed, Intel client CPUs (not server) generally do run all cores at the same speed, so an infinite loop on one core can be enough to stop another core from down-clocking below the base frequency, even if it otherwise would on a memory-bound workload with EPP = balance_performance or less (not performance). See "Slowing down CPU Frequency by imposing memory stress".

Answer by John D McCalpin

"Logical processors" running on the same physical core must be operating at the same instantaneous frequency, but that does not mean that it is always possible to measure that fact. Although Intel processors do have an MSR that reports the current CPU frequency multiplier, that instantaneous value is not very useful. Since MSRs can only be read in the kernel, it is easy for the transition from user space to kernel space (several thousand cycles of relatively low activity) to trigger a frequency change or to hide a pending one (e.g., the MSR read completed before a previously triggered frequency change actually took effect).

For the more recent generations of cores, the recommended configuration includes enabling "Hardware-controlled performance states" (HWP), in which "the processor autonomously selects performance states as deemed appropriate for the applied workload and with consideration of constraining hints that are programmed by the OS." (Intel Arch SW Dev Manual, Volume 3, document 325384-080, section 15.4). Since the operating frequency can change up to ~1000 times per second, isolated samples of the instantaneous frequency are unhelpful.

Intel recommends using hardware performance counters to measure unhalted core cycles and unhalted reference cycles over an interval, then computing the average frequency over that interval from the two counts. This works extremely well for large intervals, but gets tricky for smaller ones. The difficulties include (a) the overhead of reading counters, (b) the inability to atomically read more than one performance counter, (c) the deprecation of "anythread" mode in performance counter events, and (d) the coarse resolution of the unhalted reference cycles counter on many platforms (see footnote).
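As a rough illustration of the interval-based calculation, the sketch below derives an average frequency from two snapshots of each counter. The function name and parameters are hypothetical, and it assumes the reference counter advances at the nominal TSC rate; it does not read any hardware counters itself.

```python
def average_frequency_hz(core_start, core_end, ref_start, ref_end, tsc_hz):
    """Average core frequency over an interval, from counter snapshots.

    core_*  : unhalted core cycle counts at the start and end of the interval
    ref_*   : unhalted reference cycle counts at the same two points
    tsc_hz  : nominal reference (TSC) frequency in Hz (assumed known)
    """
    core_delta = core_end - core_start
    ref_delta = ref_end - ref_start
    if ref_delta <= 0:
        # With a coarse reference counter, very short intervals can show
        # no advance at all, so no frequency can be computed.
        raise ValueError("reference counter did not advance; interval too short")
    # The ratio of the two deltas is the average frequency relative to
    # the reference clock over the time the core was not halted.
    return tsc_hz * core_delta / ref_delta


if __name__ == "__main__":
    # Made-up counts: 3.4M core cycles during 2.1M reference cycles
    # on a part with a 2.1 GHz reference clock.
    hz = average_frequency_hz(0, 3_400_000, 0, 2_100_000, 2_100_000_000)
    print(f"{hz / 1e9:.2f} GHz")
```

Because both deltas cover only unhalted time, the result is the average frequency while the core was actually running, not a wall-clock average.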

Two references that might be helpful in looking at the details:

Footnote: The hardware performance counter event CPU_CLK_UNHALTED.REF_TSC provides an estimate of the reference (TSC) cycles elapsed with the processor not in a HALT state. In the old days (e.g., Haswell?) this used to increment more-or-less continuously. Starting in Skylake Xeon, it became more coarse, apparently counting "crystal clocks" (25 MHz = 40 ns each), then multiplying that number by 4 (to match the nominal 100 MHz base clock) and then by the TSC frequency multiplier (e.g., 21 for a 2.1 GHz processor) to get a scaled value. This gives a minimum granularity of 4 × 21 = 84 counts for this counter, which can bias computed frequencies for short measurement intervals. As an example, reading this counter 64 times as fast as possible on a Xeon Platinum 8160 running at 3.4 GHz gives ~52 differences of zero and ~11 differences of 84. The statistics of the differences vary with core frequency.
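To make the granularity arithmetic concrete, here is a toy model of the coarse counter described in the footnote. It is only a sketch: the function names are hypothetical, the core is assumed never to halt, and the 7 ns spacing between reads is an assumed value (11 crystal ticks of 40 ns spread over 63 intervals works out to roughly that), not a measured one.

```python
def quantized_ref_count(elapsed_ns, crystal_hz=25_000_000,
                        base_ratio=4, tsc_multiplier=21):
    """Toy model of the coarse reference counter.

    Counts whole crystal-clock ticks (40 ns each at 25 MHz), then scales
    by 4 (to 100 MHz) and by the TSC multiplier (21 for a 2.1 GHz part).
    """
    crystal_ticks = elapsed_ns * crystal_hz // 1_000_000_000
    return crystal_ticks * base_ratio * tsc_multiplier


def read_differences(n_reads, read_interval_ns):
    """Differences between back-to-back reads of the modeled counter."""
    samples = [quantized_ref_count(i * read_interval_ns) for i in range(n_reads)]
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


if __name__ == "__main__":
    diffs = read_differences(n_reads=64, read_interval_ns=7)  # 7 ns is assumed
    print("distinct differences:", sorted(set(diffs)))
    print("zeros:", diffs.count(0), "steps of 84:", diffs.count(84))
```

In this model every difference is either 0 or 84, so a frequency computed over a handful of reads depends heavily on whether a crystal tick happened to land inside the interval.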