I am using Perf to detect the number and locations of DRAM accesses in a workload. For the locations, I have to trace in sampling mode, and to get the total number of accesses, I multiply the number of recorded samples by the sampling period (i.e., the number of events between consecutive samples).
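For example, with a fixed sampling period of 10,000 (the event name below is only an assumption; I substitute whatever DRAM-access event my CPU exposes):

```
# Record one sample per 10,000 occurrences of the event
perf record -e mem_load_retired.l3_miss -c 10000 -- ./my_workload

# perf script prints one line per sample by default,
# so total accesses ~= (number of lines) * 10,000
perf script | wc -l
```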
I also need the total time the application is on the CPU, i.e., a value similar to what top reports, because I need to know the idle period between consecutive DRAM accesses (during which the application is still on the CPU) in this single application. This is not reported in sampling mode, and I cannot use both modes (i.e., both perf record and perf stat) at the same time. Is there any mechanism to achieve this?
The Ftrace dump is huge: it logs every process, while I only need information for my program and its threads. The Ftrace filter can only work on PIDs, but I need process names (a.k.a. comms). So the following is the best that I could do without recompiling the kernel.
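For reference, a PID-based Ftrace filter looks roughly like this (assuming tracefs is mounted at the usual debugfs location):

```
# ftrace filters by PID only; there is no comm-based filter,
# so the PID must already be known
echo $(pidof -s evince) > /sys/kernel/debug/tracing/set_ftrace_pid
```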
I used Systemtap with the script shown below. scheduler.process_exit() is a probe point from the Systemtap tapset library that internally hooks the do_exit() kernel function (I used the former because it seems more portable). execname() returns the process name, and task_current() returns the task for the current context. se (a schedulable entity for the scheduler) is a field in task_struct (the data structure associated with a process or thread in Linux), and sum_exec_runtime holds the total physical runtime of the schedulable entity.
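A minimal sketch of such a script (the @cast-based field access is an assumption and may need adjustment for your kernel and Systemtap versions):

```
#!/usr/bin/stap
# On every process/thread exit, print evince's total on-CPU time.
probe scheduler.process_exit
{
    if (execname() == "evince")
        printf("%d ns\n",
               @cast(task_current(), "task_struct", "kernel<linux/sched.h>")
                   ->se->sum_exec_runtime)
}
```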
So, this is what the script does: at each process (or thread) exit, it checks whether the name of the process (or thread) is evince. If so, it displays the total execution time of evince in nanoseconds (because we are still in the context of the evince process right before exit).
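To run it (assuming the script is saved as exectime.stp; the file name is arbitrary, and stap typically requires root):

```
sudo stap exectime.stp
```

The total is printed once evince exits.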
Obviously, this is not the most portable solution.