The following results were obtained with the PyTorch CUDA profiler.
----------------------------------------------------------------------------
Name Self CPU% ... Self CUDA Self CUDA% ...
aten::mul 0.84% ... 136.329ms 20.82% ...
aten::nonzero ...
.
.
I want to find out whether this function spends more of its time on computation or on memory access when it runs. Does the time reported for aten::mul cover only the computation, or does it also include the memory access time? Thanks!
When I measured the same run with Nsight Systems, a lot of the time went to memory access, so I think the time reported for aten::mul includes the memory access time. I'm not sure, though, so I'm asking here.
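
For reference, here is the kind of rough sanity check I have in mind. It is only a sketch: the function name and the sample numbers are made up (they are not taken from my profile), and it assumes a plain elementwise multiply of two same-shaped tensors with no broadcasting and no cache reuse. If the reported kernel time does include memory traffic, I would expect the effective bandwidth to land near the GPU's peak memory bandwidth while the FLOP rate stays far below peak compute.

```python
def elementwise_mul_estimate(num_elements, bytes_per_element, kernel_time_s):
    """Rough traffic and throughput estimate for one `out = a * b` kernel.

    Assumption: each element is read from `a`, read from `b`, and written
    to `out` exactly once (no broadcasting, no reuse from cache).
    """
    bytes_moved = 3 * num_elements * bytes_per_element  # 2 reads + 1 write
    flops = num_elements                                # 1 multiply per element
    return {
        "arithmetic_intensity": flops / bytes_moved,              # FLOP per byte
        "effective_bandwidth_gbs": bytes_moved / kernel_time_s / 1e9,
        "effective_gflops": flops / kernel_time_s / 1e9,
    }


# Hypothetical inputs for illustration only: 2**24 float32 elements,
# one kernel launch taking 0.5 ms.
est = elementwise_mul_estimate(
    num_elements=1 << 24,
    bytes_per_element=4,
    kernel_time_s=0.5e-3,
)
for name, value in est.items():
    print(f"{name}: {value:.4f}")
```

To use this I would plug in the real tensor size and the per-call kernel duration, then compare the two rates with the GPU's spec-sheet peak bandwidth and peak FLOP/s.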