I have a test application like this:
int main()
{
    // calls sched_setaffinity() to set affinity to core 0
    while(true)
    {
    }
    return 0;
}
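For reference, the pinning step could be written roughly like this. It is a minimal sketch assuming Linux with glibc; pin_to_cpu() and spin_for() are hypothetical helper names rather than my actual program, and the spin loop is bounded here so it ends on its own instead of needing to be killed.

```cpp
// Sketch only: pin_to_cpu() and spin_for() are hypothetical helper names,
// not taken from the original test program.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for the CPU_* macros; g++ normally defines it already
#endif
#include <sched.h>    // sched_setaffinity, sched_getaffinity, cpu_set_t, CPU_* macros
#include <chrono>

// Restrict the calling thread to a single logical CPU.
// Returns true on success; on failure errno is set by sched_setaffinity().
bool pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // pid 0 = calling thread
}

// Busy-loop in user space for roughly `duration` without blocking.
// Returns the number of loop iterations executed.
unsigned long long spin_for(std::chrono::milliseconds duration)
{
    const auto deadline = std::chrono::steady_clock::now() + duration;
    volatile unsigned long long iterations = 0;  // volatile keeps the loop from being optimized away
    while (std::chrono::steady_clock::now() < deadline)
        iterations = iterations + 1;
    return iterations;
}
```

Calling pin_to_cpu(0) and then spin_for() with a few seconds from main() gives a process that should only ever be scheduled on logical CPU 0.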
I have 4x logical cores across 2x physical cores.
I would like to see perf event counters but only for the CPU core my app uses.
I run this perf command (killing it after a few seconds):
sudo perf stat -e cycles:u --cpu=0 --delay=1000 ./app
four times, changing the CPU id each time, and every run shows cycles:u > 0.
How can all four CPUs be executing user-space cycles for an application that is pinned to core 0?
Surely the other three cores should not be executing user-space cycles for my application?
It appears the docs for perf stat --cpu= are wrong or misleading when they say "The -a option is still necessary to activate system-wide monitoring." It seems to behave identically with or without -a, so the counts are from other processes, like GUI animations, cursor blink, etc. Or for cycles:k, from any interrupt handler regardless of which process is the kernel's current on that core.

I tested this on an Arch GNU/Linux system with kernel and perf 6.5. The smoking gun here is running another infinite-loop process pinned to a different core, and seeing the same cycles:u on that core as on the core running the command from the perf command line.

Baseline without any funky options, with awk 'BEGIN{for(i=0;i<100000000;i++){}}' as the command being profiled. (It keeps a CPU core busy for a few seconds, making no system calls except at start and exit, and having a small cache footprint.)

vs. with --cpu, two signs of not limiting itself to the awk process: task-clock is 8x longer than the elapsed time, and the header it prints is "Performance counter stats for 'CPU(s) 0-7':" rather than naming our process.

And with -A aka --no-aggr to not aggregate across CPUs, each one has task-clock running for the full interval even though our process definitely isn't on it (since it and perf itself started pinned to CPU 1):

And with a dummy load (like another awk process, or in this case I ran a program that just loops on a pause instruction, which isn't special other than using less power), we see it uses as many cycles on CPU 2 as awk did on CPU 1.

Note the 10,485,755,009 vs. 10,485,655,861 counts for cycles:u on cores 1 and 2. They are the same to within about one part in 100,000. (The absolute difference is 99,148 cycles, which at 3.9GHz on my i7-6700k is about 25 microseconds and could easily be accounted for by differences in starting / stopping counting on different cores.)

-a --cpu only changes the header message, not counts

Adding -a to any of these (except the one without --cpu) changes nothing except the header message, to "Performance counter stats for 'system wide':" instead of 'CPU(s) 0-7'.

Everything I'm seeing is consistent with --cpu actually counting everything happening on that core while perf stat is running, just like -a mode.
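
The difference between counting a task and counting a CPU is also visible at the system-call level. The following is a minimal sketch assuming Linux with the kernel headers installed; make_cycles_user_attr() and open_cycles_user() are hypothetical helper names, and mapping the two modes onto perf stat's options is my reading of perf_event_open(2), not something I checked in perf's source. With pid = 0 and cpu = -1 the counter follows the calling thread wherever it runs; with pid = -1 and cpu = N it counts whatever runs on CPU N, which matches the behavior observed with --cpu.

```cpp
// Sketch only: both helpers are hypothetical names for illustration.
#include <linux/perf_event.h>  // struct perf_event_attr, PERF_* constants
#include <sys/syscall.h>       // SYS_perf_event_open
#include <sys/types.h>         // pid_t
#include <unistd.h>            // syscall, read, close
#include <cstring>             // std::memset

// Describe a cycles:u style event: hardware CPU cycles, user space only.
perf_event_attr make_cycles_user_attr()
{
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.exclude_kernel = 1;  // corresponds to the ":u" modifier
    attr.exclude_hv = 1;
    return attr;
}

// pid = 0,  cpu = -1 : count the calling thread on whichever CPU it runs.
// pid = -1, cpu = N  : count everything that runs on CPU N (usually needs
//                      root or a permissive perf_event_paranoid setting).
// Returns a file descriptor, or -1 with errno set.
int open_cycles_user(pid_t pid, int cpu)
{
    perf_event_attr attr = make_cycles_user_attr();
    // glibc has no wrapper for this system call, so go through syscall().
    return static_cast<int>(
        syscall(SYS_perf_event_open, &attr, pid, cpu, -1, 0UL));
}
```

If the open succeeds, a read() of 8 bytes from the returned descriptor yields the current count as a 64-bit integer. Opening the event can fail inside containers or VMs that do not expose a hardware PMU, so the return value should be checked before use.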