I measured cycles and dTLB_load_misses.walk_active for an application using the perf tool. I ran the application twice: first in isolation, and then with a single-threaded memory-bound application running on another core of the same socket. I noticed that cycles stays almost constant between the two executions, but dTLB_load_misses.walk_active increases, sometimes by up to 50%.
My understanding is that cycles counts the total CPU cycles the application used throughout its execution, and that dTLB_load_misses.walk_active counts the total cycles during which the page miss handler (PMH) was actively walking the page table. If this is correct, how is it possible that PMH cycles increase significantly while total cycles remain steady, or even decrease slightly in some cases?
The process running in isolation:
Performance counter stats for './wc ../../data/wc/300MB_1M_Keys.txt -p 10':
186904181366 cycles (66.45%)
13002068556 dTLB_load_misses.walk_active (66.78%)
176249928 dTLB-loads-misses # 0.69% of all dTLB cache hits (66.81%)
25500563014 dTLB-loads (66.57%)
469338866 cache-misses # 55.629 % of all cache refs (66.73%)
843696674 cache-references (66.64%)
9.257579987 seconds time elapsed
The process running alongside a single-threaded memory-bound process:
Performance counter stats for './wc ../../data/wc/300MB_1M_Keys.txt -p 10':
184567329729 cycles (66.50%)
14006882920 dTLB_load_misses.walk_active (66.61%)
229413645 dTLB-loads-misses # 0.89% of all dTLB cache hits (66.82%)
25693682363 dTLB-loads (66.78%)
472148745 cache-misses # 55.619 % of all cache refs (66.68%)
848898094 cache-references (66.61%)
9.243842709 seconds time elapsed
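To put the two runs side by side, the ratios can be computed directly from the counts above. This is only a minimal sketch of the arithmetic: the dictionary keys and the helper names `share` and `relative_change` are made up for illustration and are not part of perf. Note also that the percentages in parentheses in the perf output mean the events were multiplexed, so every count is a scaled estimate rather than an exact value.

```python
# Counts copied from the two perf stat runs above.
isolated = {
    "cycles": 186904181366,
    "walk_active": 13002068556,
    "dtlb_load_misses": 176249928,
}
corun = {
    "cycles": 184567329729,
    "walk_active": 14006882920,
    "dtlb_load_misses": 229413645,
}


def share(part, whole):
    """Fraction of `whole` accounted for by `part`."""
    return part / whole


def relative_change(before, after):
    """Relative change from `before` to `after` (0.10 means +10%)."""
    return (after - before) / before


for name, run in (("isolated", isolated), ("co-run", corun)):
    walk_share = share(run["walk_active"], run["cycles"])
    per_miss = run["walk_active"] / run["dtlb_load_misses"]
    print(f"{name}: walk_active/cycles = {walk_share:.2%}, "
          f"walk_active per dTLB load miss = {per_miss:.1f}")

# Change of each counter from the isolated run to the co-run.
for key in isolated:
    print(f"{key}: {relative_change(isolated[key], corun[key]):+.2%}")
```

For these two particular runs, walk_active is about 7.0% of cycles in isolation and about 7.6% in the co-run; walk_active grows by about 8% and dTLB load misses by about 30%, while cycles change by roughly 1%.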
EDIT:
Transparent huge pages, THP defrag, numa_balancing, and turbo boost are disabled.
My CPU is an Intel Skylake Xeon Gold 6142.